Fisking the Haladyna Rules #1: Single content and behavior

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Content: Every item should reflect specific content and a single specific mental behavior, as called for in test specifications (two-way grid, test blueprint).

We actually like this rule. It is a good start for the Haladyna list. True, this 2002 rule has no real antecedent in the 1989 list, but that is to the 2002 list's credit. And the rule is endorsed by every source that mentions it, though those sources amount to not quite three quarters of the total.

The problem we have with this rule is that Haladyna and his colleagues never explain it anywhere. We have found that, in practice, the meaning of and the reasoning behind this rule are often unknown. Frankly, we cannot be sure that Haladyna et al. even mean what we would mean by this rule, and that’s a real problem.

We believe that it is important that each item be aligned to one specific assessment target. That targeted cognition should come from a domain model. Quite often this is a standard from a set of state learning standards, or it could be some element from a job or role analysis. We believe in domain modeling and domain analysis; we love ECD (i.e., evidence-centered design). (We recognize that the good work ECD has done to highlight the importance of domain models came after 2002, so we forgive Haladyna et al. for thinking that assessment targets simply come out of test specifications.)

We know that it is important that each item target just one thing because otherwise there would be no way to determine why a test taker responded to the item unsuccessfully. A test taker could be making the same mistake over and over again, each time a particular standard is part of an item, even while having mastery of everything else those items assess. We should not base inferences about the successful learning, teaching, or coverage of a standard (i.e., when evaluating students, teachers, or curricula, respectively) on such ambiguous evidence.

Just as importantly…well, actually more importantly, each item should actually depend appropriately on that targeted cognition. There should not be alternative paths to a successful response available to test takers. They should have to make use of the targeted cognition to reach a successful response, and that targeted cognition should be the key step (i.e., the thing they mess up, if they are going to mess anything up). Otherwise, items can yield false-positive or false-negative evidence.

Is all of that clear in how Haladyna et al. phrased their rule? Is it made clear elsewhere in their article? Is it made any clearer elsewhere in their writings? Not really.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable), and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309–333.

  • Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37–50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51–78.

]