Fisking the Haladyna Rules #17: Use positive, no negatives

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the stem: Word the stem positively, avoid negatives such as NOT or EXCEPT. If negative words are used, use the word cautiously and always ensure that the word appears capitalized and boldface.

This rule seems pretty good on the surface. It is intuitive, but does it actually matter? (And set aside the oddity of advising “cautious” use only after the decision to use such a word has already been made, right?)

The 2002 article lays out the evidence, and the evidence does not support their contention. Roughly two-thirds of their sources support the rule, but roughly one-fifth explicitly argue against it. The empirical studies that Haladyna et al. cite do not support this rule. In fact, they cite a study by one of their own (i.e., Downing) that found this rule makes no difference to item difficulty or item discrimination.

So, if it does not show up in the item statistics, then why push for this rule? “Under most circumstances, we suggest that a stem should be worded positively.” This lack of reasoning epitomizes the empty center of their whole endeavor. They endorse some received wisdom, but do nothing to explain why. Recall that in 1989, they called their list “a complete and authoritative set of guidelines for writing multiple-choice items”—in the paper’s abstract! They did not repeat that claim in 2002, but neither did they disclaim any of the rules they report from the literature.

So, why avoid negatives? I can think of a reason: stressed and/or hurried test takers might miss that key word (e.g., “not” or “never”) and therefore misunderstand what is being asked of them. This could lead them to provide an unsuccessful response, even though they could have responded successfully if the stem were clearer. (Of course, there is no good reason to include a distractor that is based on test takers missing a negating word in the stem.)

Yes, clarity is essential. Rule 14 (Ensure that the directions in the stem are very clear) is their best rule.

So, if we suppose that the chance of skipping or missing that key negative word is the reason to avoid negative phrasing, is there something that could be done about that? For example, what if such words were bolded and underlined (something I usually oppose because it can look like garish overkill)? Might that draw sufficient attention to those words to ensure that they are not skipped? And if it would, why avoid negatively worded questions? What reasoning might be left?

It is curious that their 2004 and 2013 books omit mention of the studies cited in the 2002 article suggesting that negative words in the stem do not make a difference in how items function. It is almost as though they eventually realized that their argument is so weak that they are better off not telling the whole truth that they know. But that could only be done in bad faith, and we know that cannot be the case. Right?

Last, they acknowledge in the 2002 article that another scholar found that the impact of negative stems varied based upon the type of cognition being targeted. For the life of me, I cannot figure out why they would mention this without explaining more. It would be useful to know when this advice might actually make a difference. In our own experience (mine and my closest colleagues’), we have seen targeted cognition that really does call for a negative stem, but the broad acceptance of this rule has made it impossible to get such items through item development processes.

In my view, the stem should be clear and hopefully succinct, but never at the expense of getting at the targeted cognition. If a negative stem does not hurt clarity, succinctness or alignment, I do not see a problem.

So, I would suggest bolding and underlining those negative words. But don’t use all caps—that’s just too much.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable), and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]