Fisking the Haladyna Rules #30: Use common errors of students

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Use typical errors of students to write your distractors.

In 1989, 2002 and 2004, this rule and Rule 29 (Make all distractors plausible) were distinct.  

Their 2013 book finally combines these two rules. It puts them together under one heading, preserving the wording of each and simply separating them with a semicolon. They finally get it right. Their explanation contains something really good. Something really, really good.

The most effective way to develop plausible distractors is to either obtain or know what typical learners will be thinking when the stem of the item is presented to them. We refer to this concept as a common error. Knowing common errors can come from a good understanding of teaching and learning for a specific grade level [or other listed methods].

Finally. A decade after the 2002 article and more than two decades after the 1989 article, near the end of the list, after the multi-part cluing rule, they get to the real meat. Unfortunately, all of that delay just shows that they have no clue how important this rule is. This is what matters most. Distractors are the defining feature of multiple choice and other selected response item types, and this rule at last gets near the core of what makes for a high quality distractor.

For multiple choice items to elicit high quality evidence, they must be able to offer credible affirmative evidence (i.e., that the test taker does have proficiency with the targeted cognition) and also be able to offer credible negative evidence (i.e., that the test taker lacks proficiency with the targeted cognition). These two sorts of evidence are built in two different ways.

Affirmative evidence comes from items that require a cognitive path that depends on the targeted cognition to produce a successful response. All that cluing stuff Haladyna et al. keep coming back to? That is about alternative paths to a correct response via test taking savvy instead of through use of the targeted cognition.

Negative evidence is harder to collect. Negative evidence, as Rule 30 implies and their 2013 book says more clearly, requires offering potential responses that test takers might actually work their way to if they misunderstand or misapply the targeted cognition—that is, legitimate results of authentic mistakes. Any other distractor is a waste of everyone's time. Only a guesser would select it, and that does not tell anyone anything—other than, perhaps, that the test taker did not even try to actually work through the item. If a mistaken test taker cannot find their own result among the distractors (i.e., because distractors were wasted on some other basis), they are clued to try again. Rather than the item gathering evidence of their mistake (or mistaken understanding), they are given a second chance. Other sorts of mistakes, the ones that do have corresponding distractors, get no such clue or second chance. That is why it is important that distractors always and only be based on common test taker mistakes.

A substantive and valid meaning for “effective” distractors would be distractors that actually gather negative evidence of proficiency by giving the most common mistakes with the targeted cognition their own corresponding answer options. Now, if there is really only one mistake that test takers make with this particular piece of knowledge or skill, then no one should expect more than one effective distractor. But if there is a common mistake that test takers make with a problem and it is not a mistake with the targeted cognition, then it is not a good or effective distractor! Such a distractor would suggest that test takers lack proficiency with the targeted cognition when, in fact, what they lack is proficiency with other knowledge and/or skills.

Yes, test takers who make other mistakes should be clued to correct those mistakes, because items should not collect information about other cognition. That is, the common mistakes that are relevant are only the ones in understanding or applying the targeted cognition, even if they are not the most common mistakes, overall.
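
To make this concrete, consider a hypothetical fraction-addition item. The little sketch below is mine, not Haladyna et al.'s; the item and the error patterns are illustrative assumptions, but each distractor is generated from a specific common mistake with the targeted cognition, so selecting it is evidence of that mistake:

    from fractions import Fraction

    # Hypothetical item: "What is 1/2 + 1/3?"
    # Targeted cognition: adding fractions with unlike denominators.
    a, b = Fraction(1, 2), Fraction(1, 3)

    key = a + b  # correct response: 5/6

    # Each distractor is the legitimate result of a specific common error
    # with the targeted cognition, so choosing it is negative evidence.
    distractors = {
        "added numerators and denominators":
            Fraction(a.numerator + b.numerator, a.denominator + b.denominator),  # 2/5
        "found a common denominator but forgot to convert the numerators":
            Fraction(a.numerator + b.numerator, 6),                              # 2/6 -> 1/3
        "multiplied instead of adding":
            a * b,                                                               # 1/6
    }

    print("key:", key)
    for error, value in distractors.items():
        print(f"{value}  <- result of: {error}")

A fourth option built from, say, a misreading of the stem might still be “plausible,” but it would tell you nothing about fraction addition.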

Yeah, this is about item focus. Should an item that purports to be aligned with some specific targeted cognition confuse information about other cognition with information about the targeted cognition? Of course not! Other sorts of mistakes should not prevent test takers from being successful, and other skills (e.g., test taking savvy) should not be enough to enable success.

Substantively, effective distractors capture evidence of the lack of targeted proficiency. Anything else is ineffective, regardless of how often test takers select it. And ineffective distractors undermine item quality and every validity claim about a test.

This is the hardest thing about writing high quality items. Developing items that lack alternative paths is hard, and made harder by inappropriate test prep that stresses shortcuts for the particular items on a test over authentic use of the targeted cognition. Developing a full set of distractors is even harder. As Haladyna et al. finally explained in 2013, it really benefits from knowing about teaching and learning of the targeted content. It is made harder because teachers and other educators are always trying to improve teaching and learning, meaning that the most common mistakes or misunderstandings can shift over time as educators address the most common ones they see.

The lure of substantively ineffective distractors that nonetheless masquerade as quantitatively effective distractors (i.e., by popularity) comes in the form of distractors based on other kinds of mistakes, rather than mistakes with the targeted cognition. These can be used to raise or reduce observed empirical item difficulty, and often will not be caught by item discrimination statistics. Haladyna et al. do not understand this, which is why, even though Rules 29 and 30 start to get into the meat of what a good item is, they still fall short.
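
For what it is worth, here is a minimal sketch of the usual quantitative check, with invented response data: classical difficulty and option-level point-biserial correlations. These statistics can tell you that a distractor attracts lower scorers; they cannot tell you which mistake put those test takers there:

    import numpy as np

    # Hypothetical data: each entry is one test taker. 'choices' is the option
    # selected on this item (A is keyed correct); 'totals' is the score on the
    # rest of the test.
    choices = np.array(list("AABCAADBAACA"))
    totals  = np.array([14, 13, 6, 9, 15, 12, 4, 8, 13, 14, 7, 11])

    p_value = np.mean(choices == "A")  # classical item difficulty: proportion correct
    print(f"difficulty (p): {p_value:.2f}")

    for option in "ABCD":
        selected = (choices == option).astype(float)
        # Point-biserial: correlation between selecting this option and total score.
        # A "quantitatively effective" distractor shows a negative correlation, but
        # that says nothing about which mistake (targeted or not) produced the choice.
        r_pb = np.corrcoef(selected, totals)[0, 1]
        print(f"option {option}: chosen {selected.mean():.2f}, r_pb = {r_pb:+.2f}")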

Thus, if they can be taken together, Rules 29 and 30 are perhaps the most important rule(s), and yet, as Haladyna et al. present them, they still are not good.

 

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]