Attribute Agreement Analysis

Since running an attribute agreement analysis can be time-consuming, expensive, and usually inconvenient for everyone involved (the analysis itself is simple compared with actually running the study), it's best to take a moment to really understand what needs to be done and why. This example uses a repeatability score to illustrate the idea, but it applies equally to reproducibility. The point is that many samples are needed to detect differences in an attribute agreement analysis, and doubling the number of samples from 50 to 100 does not make the test much more sensitive. Of course, the difference that needs to be detected depends on the situation and on the risk the analyst is willing to bear in the decision, but the reality is that with 50 scenarios an analyst can hardly claim a statistical difference in the repeatability of two evaluators with matching rates of 96% and 86%. With 100 scenarios, the analyst will barely be able to tell the difference between 96% and 88%.

An attribute agreement analysis is used to assess the impact of repeatability and reproducibility on accuracy simultaneously. It allows the analyst to examine the responses of multiple appraisers as they review multiple scenarios multiple times. It produces statistics that assess each appraiser's ability to agree with himself or herself (repeatability), with the other appraisers (reproducibility), and with a known master or correct value (overall accuracy) for each characteristic, over and over again.

Despite these difficulties, performing an attribute agreement analysis on bug tracking systems is not a waste of time. In fact, it is (or can be) an extremely informative, valuable and necessary exercise. Attribute agreement analysis just needs to be applied judiciously and with some focus.
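To see why doubling the number of scenarios from 50 to 100 buys so little sensitivity, a quick power calculation helps. The sketch below is a minimal Python illustration using a normal-approximation two-proportion test; the function name two_prop_power is made up for this example, and the matching rates are the ones quoted above. In both cases the estimated power comes out well below the usual 80% target, which is why the differences are so hard to detect.

```python
from math import sqrt
from scipy.stats import norm

def two_prop_power(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z-test with n
    scenarios per appraiser (normal approximation, equal group sizes)."""
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)           # SE under H0: p1 == p2
    se_alt = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)  # SE under H1
    return norm.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

# Matching rates from the text: 96% vs 86% at 50 scenarios,
# 96% vs 88% at 100 scenarios.
for n, p2 in [(50, 0.86), (100, 0.88)]:
    print(f"n={n}: power to detect 96% vs {p2:.0%} is about {two_prop_power(0.96, p2, n):.0%}")
```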

As with any measurement system, the precision and accuracy of the database must be understood before the information is used (or at least while it is being used) to make decisions. At first glance, the obvious starting point seems to be an attribute agreement analysis (or attribute gauge R&R). However, that may not be such a good idea. Click the Agreement Evaluation Tables button to create a table showing each reviewer's percentage of agreement with the standard and the associated 95% confidence intervals. This table shows the extent to which the reviewers agreed with one another. As you can see, the reviewers agreed 40% (6 out of 15) of the time. In addition to the agreement percentage, Statistica also displays Fleiss' kappa and Kendall's coefficient of concordance. Fleiss' kappa indicates how strongly the reviewers agreed on each standard answer; a value close to 1 indicates strong agreement.
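The same summary numbers are easy to reproduce outside Statistica. Below is a minimal sketch in Python using statsmodels; the 15-scenario, 3-reviewer rating matrix is hypothetical, constructed so that 6 of the 15 scenarios show full agreement, mirroring the 40% figure above.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
from statsmodels.stats.proportion import proportion_confint

# Hypothetical ratings: 15 scenarios (rows) x 3 reviewers (columns),
# with defect codes labelled 0, 1 and 2.
ratings = np.array([
    [0, 0, 0], [1, 1, 2], [2, 2, 2], [0, 1, 0], [1, 1, 1],
    [2, 0, 2], [0, 0, 1], [1, 2, 1], [2, 2, 2], [0, 0, 1],
    [1, 1, 1], [2, 2, 0], [0, 1, 0], [1, 0, 1], [2, 2, 2],
])

# Percentage of scenarios where all reviewers agree, with a 95%
# Clopper-Pearson confidence interval.
all_agree = sum(len(set(row)) == 1 for row in ratings)
low, high = proportion_confint(all_agree, len(ratings), method="beta")
print(f"all reviewers agree: {all_agree}/{len(ratings)}, 95% CI ({low:.2f}, {high:.2f})")

# Fleiss' kappa on the same data (values near 1 indicate strong agreement).
table, _ = aggregate_raters(ratings)
print("Fleiss' kappa:", round(fleiss_kappa(table), 3))
```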

Kendall's coefficient of concordance indicates the strength of the relationship among the evaluators. It ranges from 0 to 1, and a value close to 1 indicates strong agreement. Both measures point to a fairly strong consensus among the reviewers. If the audit is planned and designed effectively, it can provide enough information about the causes of accuracy problems to justify a decision not to use attribute agreement analysis at all. In cases where the audit does not provide sufficient information, attribute agreement analysis allows a more detailed investigation that shows how training and fail-safe modifications to the measurement system should be applied. Once it is established that the bug tracking system is an attribute measurement system, the next step is to consider the terms precision and accuracy as they apply to this situation. First, it is useful to understand that precision and accuracy are terms borrowed from the world of continuous (or variable) gauges. For example, it is desirable that a car's speedometer read the correct speed across a range of speeds (e.g., 25 mph, 40 mph, 55 mph and 70 mph), no matter who reads it.
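When the software at hand does not report it, Kendall's coefficient of concordance can also be computed directly from the rank sums. The sketch below (hypothetical severity ratings, no correction for ties) is one way to do it in Python.

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Kendall's coefficient of concordance for an items x raters matrix of
    ordinal ratings (e.g. severity levels). No correction for ties."""
    n_items, n_raters = ratings.shape
    ranks = np.column_stack([rankdata(ratings[:, j]) for j in range(n_raters)])
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (n_raters ** 2 * (n_items ** 3 - n_items))

# Hypothetical severity ratings (1-5) from three reviewers on six defects.
severity = np.array([
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 1],
    [4, 4, 5],
    [2, 1, 2],
    [5, 4, 4],
])
print("Kendall's W:", round(kendalls_w(severity), 3))
```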

The absence of bias across a range of values over time is usually called accuracy (bias can be thought of as being wrong on average). The ability of different people to interpret and agree on the same gauge value multiple times is called precision (and precision problems can stem from a problem with the gauge, not necessarily with the people using it). However, a bug tracking system is not a continuous gauge. The assigned values are either correct or incorrect; there is no (or should be no) grey area. If the codes, locations and severity levels are defined correctly, there is only one correct attribute in each of these categories for a given error. Modern statistical software such as Minitab can be used to collect the study data and perform the analysis. The kappa statistics and graphical output can be used to examine how effectively and accurately the operators perform their assessments. Duncan agreed with the standard only about 53% of the time. Hayes did much better, with about 87% agreement. Simpson agreed 93% of the time, and Holmes and Montgomery agreed with the standard in every trial.
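The "versus standard" figures above pair a simple matching percentage with a kappa statistic for each appraiser. A minimal sketch of that calculation for a single, hypothetical appraiser, using scikit-learn's cohen_kappa_score, looks like this.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical data: the known standard code for 15 defects and one
# appraiser's assignments for the same defects.
standard  = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
appraiser = np.array([0, 1, 2, 0, 1, 0, 0, 2, 2, 0, 1, 2, 1, 1, 2])

match_pct = np.mean(appraiser == standard) * 100
kappa = cohen_kappa_score(standard, appraiser)
print(f"agreement with standard: {match_pct:.0f}%, kappa vs standard: {kappa:.2f}")
```

Unlike the raw percentage, kappa corrects for the agreement that would be expected by chance, which is why most packages report it alongside the match rate.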

This graph shows that Duncan, Hayes and Simpson need additional training. Next, click the Each Reviewer vs. Standard Agreement Tables button to create the following table (partial image below). The audit should help identify which specific people and codes are the main sources of problems, and the attribute agreement analysis should help determine the relative contributions of repeatability and reproducibility problems for those specific codes (and individuals). In addition, many bug tracking systems have trouble accurately recording where an error was created, because the location where the error is found is recorded, not the location where it was caused.