- Jun 22
How Effective Is That Test Item? Item Discrimination Made Simple
The credentialing community relies on exams to find out which candidates are sufficiently competent and which are not. Exam scores help determine whether candidates earn the credential they are seeking.
With a little statistical magic – and we emphasize little – it’s possible to measure how well each test item (a question and its answer options) helps an exam separate competent candidates from those who are not. This is called “item discrimination.”
The statistical magic can be performed with a variety of methods. (Look for our simple example below.) Whatever method is used, the result is often called the discrimination index (DI), and its theoretical range is -1.0 to +1.0.
For example, an item that top performers always get right and low-performing candidates always get wrong would yield a DI of 1.0.
Consider a different item that is correctly answered by the same proportion of high and low performers. Its DI would be 0.0, because it doesn’t discriminate.
What about an item that is always answered correctly by low performers while high performers always get it wrong? Its DI would be -1.0. It discriminates in reverse.
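To make these three cases concrete, here is a minimal Python sketch. It assumes the simple upper-lower group method that this post walks through below: DI = (correct answers in the high group - correct answers in the low group) / number of candidates per group. The function name `discrimination_index` is hypothetical, and the group size of three is an arbitrary choice.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Upper-lower discrimination index: (U - L) / n.

    upper_correct: candidates in the high-scoring group who answered correctly
    lower_correct: candidates in the low-scoring group who answered correctly
    group_size:    candidates in each group (both groups are the same size)
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= upper_correct <= group_size and 0 <= lower_correct <= group_size):
        raise ValueError("correct counts must lie between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


n = 3  # hypothetical group size
print(discrimination_index(n, 0, n))  # top group all right, bottom all wrong: 1.0
print(discrimination_index(2, 2, n))  # same proportion right in both groups: 0.0
print(discrimination_index(0, n, n))  # top group all wrong, bottom all right: -1.0
```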
Psychometricians will be quick to point out that negative DIs indicate invalid items. Perhaps the key – the answer that is supposed to be correct – isn’t. Or maybe the item is outdated. In any case, a negative DI tells us that the item is unfair to candidates.
Items almost never correlate perfectly to exam scores in either a positive or negative direction. Table 1 captures the range of DI values that Kryterion’s psychometricians typically observe.
Table 1

| Discrimination Index (DI) | Efficiency |
|---|---|
| < 0 (i.e., negative value) | Invalid measure |
| 0 – 0.09 | Poor discrimination |
| 0.10 – 0.14 | Low discrimination |
| 0.15 – 0.19 | OK discrimination |
| 0.20 – 0.34 | Moderate discrimination |
| 0.35 and above | High discrimination |

Source: Psychometric Services, Kryterion, Inc.
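To apply Table 1 programmatically, a small lookup function is enough. This is a sketch with a hypothetical function name (`classify_di`). Because the table lists two-decimal ranges with small gaps between them (0.09 to 0.10, for example), it assumes each band runs up to, but not including, the next band's lower bound.

```python
def classify_di(di: float) -> str:
    """Map a discrimination index onto the bands in Table 1.

    Assumption: bands are half-open intervals, so a value that falls between
    two listed ranges (e.g. 0.095) is assigned to the lower band.
    """
    if not -1.0 <= di <= 1.0:
        raise ValueError("a DI must lie between -1.0 and +1.0")
    if di < 0:
        return "Invalid measure"
    if di < 0.10:
        return "Poor discrimination"
    if di < 0.15:
        return "Low discrimination"
    if di < 0.20:
        return "OK discrimination"
    if di < 0.35:
        return "Moderate discrimination"
    return "High discrimination"


for value in (-0.12, 0.05, 0.12, 0.18, 0.33, 0.41):
    print(f"{value:+.2f} -> {classify_di(value)}")
```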
Calculating the Discrimination Index (DI)
There are multiple approaches to calculating a DI. However, the following – much simplified – example should clarify the general concept.
A hypothetical exam was given to 10 candidates. Table 2 lists each candidate’s total score as a percentage, in descending order, and shows whether they got Question One (Q1) right (R) or wrong (W). The DI for Q1 can be estimated in three steps.
Table 2

| Candidate | Total Percentage Correct | Q1 Response |
|---|---|---|
| Candidate 1 | 100 | R |
| Candidate 2 | 90 | R |
| Candidate 3 | 80 | R |
| Candidate 4 | 80 | R |
| Candidate 5 | 70 | R |
| Candidate 6 | 60 | W |
| Candidate 7 | 60 | W |
| Candidate 8 | 50 | R |
| Candidate 9 | 50 | R |
| Candidate 10 | 40 | W |
Step one is to count the correct Q1 answers among the top 30% and the bottom 30% of candidates. In this method, the cutoff that defines the “top” and “bottom” groups can range from 25% to 50%.
In this example, the top three performers (30% of 10 candidates) answered Q1 correctly three (3) times. The three lowest-scoring performers got Q1 right twice (2).
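Step one can be sketched in Python as follows. The data layout and the function name `count_correct_in_groups` are hypothetical. The group size is passed in directly, and candidates tied at the cutoff (Candidates 3 and 4 both scored 80) are assumed to keep their Table 2 order, which matches the grouping used in this example.

```python
# Table 2 as (candidate, total percentage correct, Q1 response: "R" = right, "W" = wrong)
TABLE_2 = [
    ("Candidate 1", 100, "R"),
    ("Candidate 2", 90, "R"),
    ("Candidate 3", 80, "R"),
    ("Candidate 4", 80, "R"),
    ("Candidate 5", 70, "R"),
    ("Candidate 6", 60, "W"),
    ("Candidate 7", 60, "W"),
    ("Candidate 8", 50, "R"),
    ("Candidate 9", 50, "R"),
    ("Candidate 10", 40, "W"),
]


def count_correct_in_groups(rows, group_size):
    """Step one: count right answers in equal-sized top and bottom groups."""
    if not 1 <= group_size <= len(rows) // 2:
        raise ValueError("groups must contain at least one candidate and must not overlap")
    # Rank by total score, highest first. sorted() is stable, so tied candidates
    # keep their Table 2 order (an assumption about how ties are broken).
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    top, bottom = ranked[:group_size], ranked[-group_size:]
    upper_correct = sum(1 for _, _, response in top if response == "R")
    lower_correct = sum(1 for _, _, response in bottom if response == "R")
    return upper_correct, lower_correct


group_size = round(0.30 * len(TABLE_2))  # 30% of 10 candidates -> 3
print(count_correct_in_groups(TABLE_2, group_size))  # (3, 2)
```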
Step two subtracts the low performers’ result from the top performers’ result: 3 – 2 = 1.
In step three, divide that result by the number of candidates in each group: 1/3 ≈ 0.33.
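Steps two and three can be written as a single formula. Here $C_U$ is the number of correct answers in the upper group, $C_L$ the number in the lower group, and $n$ the number of candidates in each group; the symbols are ours, chosen for illustration:

$$
\mathrm{DI} = \frac{C_U - C_L}{n} = \frac{3 - 2}{3} \approx 0.33
$$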
The DI of Q1 is therefore 0.33. According to Table 1, above, this makes Q1 a moderate discriminator between high and low performers.
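Putting the three steps together, the hypothetical function `upper_lower_di` below takes the group size as a fraction of all candidates, so you can see how the 25% to 50% cutoff choice mentioned in step one affects the estimate. It assumes the group size is that fraction of the candidate count, rounded to a whole number, so the example sticks to fractions that give whole groups of the 10 candidates.

```python
# Table 2 restated as (total percentage correct, answered Q1 correctly?)
TABLE_2 = [
    (100, True), (90, True), (80, True), (80, True), (70, True),
    (60, False), (60, False), (50, True), (50, True), (40, False),
]


def upper_lower_di(rows, fraction):
    """Estimate an item's DI by comparing the top and bottom `fraction` of candidates."""
    # Rank candidates by total score, highest first (ties keep their table order).
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    n = round(len(ranked) * fraction)  # candidates per group
    if not 1 <= n <= len(ranked) // 2:
        raise ValueError("fraction must give non-overlapping groups of at least one candidate")
    upper = sum(correct for _, correct in ranked[:n])   # step one: top group
    lower = sum(correct for _, correct in ranked[-n:])  # step one: bottom group
    return (upper - lower) / n                          # steps two and three


for fraction in (0.30, 0.40, 0.50):
    print(f"{fraction:.0%} groups: DI = {upper_lower_di(TABLE_2, fraction):.2f}")
```

This prints DIs of 0.33, 0.50, and 0.60 for the 30%, 40%, and 50% cutoffs. With so few candidates, the choice of cutoff alone moves Q1 from the moderate band of Table 1 into the high band.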
It’s that straightforward to get a rough estimate of DI. However, Kryterion recommends larger candidate samples and more robust methods to determine item discrimination indices for items being used on credentialing exams.
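The post doesn’t say which more robust methods Kryterion has in mind, so treat the following only as an illustration. One widely used alternative to the upper-lower method is the point-biserial correlation: the Pearson correlation between an item scored 1 (right) or 0 (wrong) and candidates’ total scores. Unlike the upper-lower method, it uses every candidate, not just the top and bottom groups. The sketch below, with the hypothetical function name `point_biserial`, applies it to the Table 2 data. It leaves Q1 inside the total score; a “corrected” item-total correlation would remove the item’s own contribution first, which usually lowers the value somewhat.

```python
import math

# Table 2 restated: total percentage correct, and Q1 scored 1 (right) or 0 (wrong)
totals = [100, 90, 80, 80, 70, 60, 60, 50, 50, 40]
q1 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0]


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score."""
    n = len(item_scores)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least two candidates")
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel out).
    cov = sum((x - mean_item) * (y - mean_total) for x, y in zip(item_scores, total_scores))
    ss_item = sum((x - mean_item) ** 2 for x in item_scores)
    ss_total = sum((y - mean_total) ** 2 for y in total_scores)
    if ss_item == 0 or ss_total == 0:
        raise ValueError("correlation is undefined when all item or total scores are identical")
    return cov / math.sqrt(ss_item * ss_total)


print(f"Point-biserial r for Q1: {point_biserial(q1, totals):.2f}")
```

For Q1 this gives r ≈ 0.52, higher than the 0.33 from the upper-lower estimate, a reminder that different methods can give noticeably different values for the same item.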
If you found this moment of psychometric clarity refreshing, we invite you to review our equally accessible discussion of P-value.
Have a question about the psychometric status of your test development project or credentialing initiative? Reach out to Kryterion’s Psychometrics Team.
Our experts specialize in delivering clear, helpful insights, and you might even enjoy the experience.