Fisking the Haladyna Rules #19: One right answer

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Make sure that only one of these choices is the right answer.

Oh, I want to hate on this rule so much. It is so dumb. Obviously multiple choice items—the items that dominate large scale standardized tests—should have just one correct response. (And we are talking about the simple multiple choice item type, rather than multiple select (e.g., pick the correct two of the five answer options), matching, multiple true-false, etc.)

Unfortunately, I have seen too many items come from item writers or arrive at external review with multiple correct answers. The technical jargon for this is “double keyed” or “triple keyed.” Occasionally, there is an item in which every answer option is actually a correct response: a quad-keyed item! Not good—though amazing.

Now, multi-keyed items are usually not the fault of the answer options. More often, I think, the problem is in the stem and stimulus. That is, the question can reasonably be interpreted in a number of ways, leading to different correct answer options. This sort of ambiguity can also be found in answer options, though I suspect that is less common. I know of no studies of mid-process multi-keyed items that answer that question definitively.

This might be a good spot to get at a deep problem with this list. It is generally written in the second person, giving orders to item developers. My regular co-author and I far prefer a list that describes the traits of high quality items. That is, let’s all be clear about the goal. Let’s focus on what effective items look like.

Then, we can develop processes and procedures for how to achieve those goals. If we are going to address the actions of item developers, let’s try to provide actually helpful advice. In this case, how might item developers make sure that only one of the answer options is correct? As is, this list pretends to offer advice on what to do, but instead it usually just kinda gestures at item quality.

With RTD (Rigorous Test Development), we have approaches and techniques to accomplish this. We have a Pillar Practice that we call Radical Empathy. We have a rigorous procedure that we call Item Alignment Examination, built on Radical Empathy. In short, test developers need to work through items from the perspective of a range of test takers, not just as themselves or as one mythologized typical test taker. RTD likely needs to develop more procedures just for catching multi-keyed items. This is hard work. Item development is incredibly challenging work.

These Haladyna lists simply do not recognize that. That is probably the most offensive thing about them. They lay out seemingly simple rules that barely scratch the surface of what it means to develop a high quality, valid item (i.e., one that elicits evidence of the targeted cognition for the range of typical test takers), and because of these lists’ absolute dominance in the literature, they evangelize the idea that item development is fairly simple.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable), and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]