[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Phrase choices positively; avoid negatives such as NOT.

This is yet another redundant rule. Rule 17 (Word the stem positively, avoid negatives such as NOT or EXCEPT. If negative words are used, use the word cautiously and always ensure that the word appears capitalized and boldface.) covers the same ground, and though their 2002 article offers some explanation for Rule 17, it offers nothing on this rule.

So, I feel the same way about this rule. That is, if you use the word “not” or some other negating word, be sure to set it in bold and underline it. I do not understand why they offer an out for stems (i.e., bold and all caps), but not for answer options. Their 2004 book also offers that advice, and rewrites Rule 17 to keep that advice out of the rule itself. That seems superior to this 2002 version.

Thus, in their 2004 book, it is clear that these really are more guidelines or advice than they are rules. Their 1989 article calls them “rules” dozens of times, including in the title, and says that it is “a complete and authoritative set of guidelines for writing multiple-choice items” (p. 37). The 2002 article does not call them rules, leaning on the word “guidelines.” They end with wisdom from the 1951 edition of Educational Measurement, the handbook edited by Lindquist. They quote Ebel, rather than Lindquist’s own brilliant chapter.

Each item as it is being written presents new problems and new opportunities. Just as there can be no set formulas for producing a good story or a good painting, so there can be no set of rules that will guarantee the production of good test items. Principles can be established and suggestions offered, but it is the item writer’s judgment in the application (and occasional disregard) of these principles and suggestions that determines whether good items or mediocre ones will be produced. (p. 185)

Yes, this is a great quote. Yes, there is actual wisdom in there. But I do not buy for a second that Haladyna et al. believe this. Rather, it feels like too little, too late. They are 20 (of 22) pages in before they use the word “principle,” and this article offers rather (or very) little to help item developers develop that critical professional judgment. Guidelines without deep and thoughtful explanations have no chance to be understood as true principles. Their approach to presenting these ideas invites them to be understood as rules. Including something like Rule 6 (Avoid opinion-based items), claiming that it is supported unanimously (though it has just 26% support from their sources), and failing to offer any explanation for it is clearly not an effort to support professional judgment in the application of worthy principles. Offering all those numbers in their Table 2 (p. 314) without offering real explanation is about leaning into the misleading precision of numbers to bolster the seriousness and credibility of everything on their list. Their 2002 article claims 24 of these rules have “Unanimous Author Endorsements,” yet these 24 rules average a mere mention by fewer than two-thirds of their sources. Why make such a claim if not to suggest that these are truly rules?

Yeah, I don’t like negatives in answer options, but I don’t think that I can defend a prohibition or even discouragement. Clarity and simplicity of language are good goals, and—as I wrote above—put negative or negating words in bold and underlined type to make sure that test takers don’t miss them. This was all addressed in Rule 17. So, there’s no new principle here in Rule 27, not from me and not from them.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.

]

Complex Variety: Assessment Development, Education and Occasional Other Topics

Dr. Hoffman