Fisking the Haladyna Rules #23: Choices homogeneous

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Keep choices homogeneous in content and grammatical structure.

Two-thirds of their 2002 sources support this rule, but the only empirical source they mention is a study by one of them that found it makes no difference. Perhaps more importantly, their only logic or reasoning is that when the choices are not all parallel, the mismatch can clue one of them as the key. Their example from the 2002 article makes that obvious.

What reason best explains the phenomenon of levitation?

a. Principles of physics

b. Principles of biology

c. Principles of chemistry

d. Metaphysics

Putting aside magnetism and superconductors (i.e., physics), it’s not hard to see how answer D would draw disproportionate attention. Depending on the stimulus, D might actually be the correct answer. But the problem is not the lack of homogeneity! The problem is that just one option sticks out, not that they are not all the same.

So, clearly D should be “Principles of metaphysics,” to match the others. But then there’s a redundancy with physics… There is, however, a conventional wisdom among item developers on how to deal with that—one that Haladyna et al. never mention. As I wrote for Rule 22, answer options should all be parallel, all be distinct, or come in pairs (when there is an even number of answer options).

a. Principles of astronomy

b. Principles of astrology

c. Principles of physics

d. Principles of metaphysics

Do any of those uniquely jump out? They are not homogeneous, as two of them are sciences and two of them are not. The same guidance works for grammar, length, voice, content, etc. Answer options really do not need to be homogeneous.

But here’s the real issue: There is a far, far, far more important rule for crafting distractors. Rule 29 is the most important rule: make all distractors plausible. If that requires violating homogeneity, fine. Do it! That second set of answer options above is only good if each answer option is deeply plausible, and a shortfall in homogeneity (e.g., creating pairs) is fine if it does not hurt plausibility. It is plausibility that matters, not homogeneity.

The real issue seems to be that so much of the Haladyna list is about undermining guessing strategies in a world in which test takers can either simply recognize the best answer or not. It does not consider the cognitive paths that test takers might take, and it almost never considers that the best distractors are the ones that represent the results of mistakes in understanding and/or application that test takers may make along the way. Perhaps they just assume overly simplistic content?

So, no, I don’t buy this rule.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #22: Choices not overlapping

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Keep choices independent; choices should not be overlapping.

Fewer than one-third of their 2002 sources mention this rule at all, and they have never cited an empirical basis for it. It seems thin.

Their reasoning seems to be based upon cluing and multiple correct answers, but there are already rules on cluing and on ensuring that each item has just one correct answer (i.e., is not multi-keyed). So, what does this rule add? Moreover, are those really inevitable results of overlapping answer options?

Any item aimed at identifying a set or range (e.g., which characters…, what are the symptoms of…, for what values of x…) would be made far easier—perhaps too easy—if those sets/ranges could not overlap. I can imagine an argument that these kinda turn into complex multiple choice (type K) items, and that was already addressed in Rule 9. So, that might be a better place to address that concern. But Haladyna et al. do not mention that concern in either article or either book. And overlapping ranges are simply not amenable to multiple select or multiple true-false item types. So, this issue doesn’t seem to create a need for this rule.

I simply cannot follow the logic suggesting that overlapping answer options would clue the correct answer option. If the answer options are:

a. Something

b. Some subset of A

c. Something else

d. Something else else

Does that suggest that the answer must be b? Must be a? Cannot be a or b? There is a general idea that answer options should all be the same in some way, all different in that way, or come in pairs (i.e., when there are four answer options) in that way. The idea is that no single answer option should just jump out at test takers. But Haladyna et al. do not share this conventional wisdom in their rules. To be fair, I’ve never been quite sure about this wisdom. But would this set of answer options clue anything?

a. Something

b. Subset of A

c. Something else

d. Subset of B

I think not.

Which leaves the question of multi-keyed items. But we already know that multi-keyed items are bad (i.e., Rule 19). Is there something wrong with overlapping answer options if they are not multi-keyed? I keep looking and I cannot find anything other than obscurity. That is, complex multiple choice items (type K) can be needlessly confusing. So, try to avoid that. But there are also times—particularly with math items—when attention to precision is part of the targeted cognition. Precision in thinking and in communication is valuable in every content area, but math focuses on it more than most others. Should there really be a ban on items that lean into this skill?

I would note that this is not one of those rules that says “avoid.” Now, one might interpret such rules as being less than complete bans, suggesting something less strict. This rule, however, does not even leave that arguable wiggle room.

This rule seems redundant when it is not actually an obstacle to getting at important content. At best, it is useless.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #21: Logical/numerical order

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Place choices in logical or numerical order.

I have always wondered what this means. Of course, when the answer options are all numbers, it is clear. But what if the answer options are points on a graph? What if they are names? What if they are phrases or sentences? Should they be ordered by length? Alphabetically? Does it matter? (Yeah, one—but only one—of their books says that answer options “should be presented in order of length, short to long,” but…ummm…why!? Because it is prettier? Huh?)

Is there always a “logical” order? What would it even mean for an order to be “logical?” What if two people disagree about which order is more “logical”?

I hate this rule because the use of the word “logical” suggests that there is a single right answer. Logic should not yield multiple answers. I mean, imagine that robot putting its hands to its head and repeating “Does not compute. Does not compute,” until its head explodes. There are important issues that are not matters of logic.

Moreover, this rule kinda seems to go against the previous rule about varying the location of correct answer options. If the incorrect answer options are all based on authentic test taker mistakes (i.e., Rule 30), and the correct answer’s location should vary, does that really leave much room to put the answers in a “logical or numerical” order? How should an item developer square these differing rules? Are some of them more important than others? For example, are the most important rules earlier on this list? That is, are these rules presented in that sort of logical order?

We do not think that the Haladyna Rules are nearly as useful as they are depicted to be. Over and over again, they beg the actual question, hiding behind simplistic or trite “guidelines” that duck the real issues. They beg the question (in the original meaning of the phrase) by failing to offer useful guidelines or rules for item developers and so very many of them beg the question (in the more recent meaning of the phrase) by not actually addressing the meat of the issue they pretend to address.

And last, why doesn’t this rule include “chronological”? If it says “numerical,” it could easily also say “chronological.” Could it be that Haladyna et al. are only thinking of math exams? That would be crazy, right?

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #20: Vary location of right answer

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Vary the location of the right answer according to the number of choices.

Yes. Totally. I mean, I think this could be written more clearly. I don’t really understand what “according to the number of choices” adds to this rule, but sure. Fine. I think that replacing that phrase with “randomly” might be better.

But “randomly” isn’t actually quite right. In our work, we have found that putting the correct answer option earlier in the list might lower the cognitive complexity of an item. That is, if a test taker finds a truly good candidate early, they might not have to work through all the other answer options; they might be able to rule them out more quickly as being inferior to that earlier option. The hunt for the right answer might be cognitively more complex if they have to work harder to eliminate more answer options before they find a good one to go with.

Of course, if the correct answer option is always last or always later, that will reward guessing strategies—which is bad. The location of the correct answer option should be distributed evenly across an entire form, just to fight that kind of construct-irrelevant strategy. We do not expect the careful work of picking just the right items whose cognitive complexity should be increased this way, though we might dream.
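
For what it’s worth, here is a minimal sketch of the kind of balancing I have in mind, assuming a simple form in which every item has the same number of answer options; the function name and the seed are mine, purely for illustration.

```python
import random

def assign_key_positions(num_items, num_options=4, seed=None):
    """Return a key position (0-based) for each item on a form, using every
    position as evenly as possible and shuffling the order across the form."""
    rng = random.Random(seed)
    # Balanced pool: each position appears roughly num_items / num_options times.
    pool = [i % num_options for i in range(num_items)]
    rng.shuffle(pool)
    return pool

# Hypothetical 40-item form with four answer options per item.
positions = assign_key_positions(40, 4, seed=1)
print({p: positions.count(p) for p in range(4)})  # each position holds the key 10 times
```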

You see, even this seemingly simple rule might not be so simple. But Haladyna and colleagues clearly do not dive sufficiently into the contents of items, or the cognition that items elicit, to recognize that. Instead, they look at this most quantifiable and testable of ideas (i.e., where the key is placed) and revel in how easily quantified it is.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #19: One right answer

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Make sure that only one of these choices is the right answer.

Oh, I want to hate on this rule so much. It is so dumb. Obviously multiple choice items—the items that dominate large scale standardized tests—should have just one correct response. (And we are talking about the simple multiple choice item type, rather than multiple select (e.g., pick the correct two of the five answer options), matching, multiple true-false, etc.)

Unfortunately, I have seen too many items come in from item writers or arrive at external review with multiple correct answers. The technical jargon for this is “double keyed” or “triple keyed.” Occasionally, there is an item in which every answer option is actually a correct response: a quad-keyed item! Not good—though amazing.

Now, multi-keyed items are usually not the fault of the answer options. More often, the problem is in the stem and stimulus, I think. That is, the question can reasonably be interpreted in a number of ways, leading to different correct answer options. This sort of ambiguity can also be found in answer options, though I suspect that that is less common. I know of no studies of mid-process multi-keyed items that answer that question definitively.

This might be a good spot to get at a deep problem with this list. It is generally written in the second person, giving orders to item developers. My regular co-author and I far prefer a list that describes the traits of high quality items. That is, let’s all be clear about the goal. Let’s focus on what effective items look like.

Then, we can develop processes and procedures for how to achieve those goals. If we are going to address the actions of item developers, let’s try to provide actually helpful advice. In this case, how might item developers make sure that only one of the answer options is correct? As it is, this list pretends to offer advice on what to do, but instead it is usually kinda getting at item quality.

With RTD (Rigorous Test Development), we have approaches and techniques to accomplish this. We have a Pillar Practice that we call Radical Empathy. We have a rigorous procedure, built on Radical Empathy, that we call Item Alignment Examination. In short, test developers need to work through items from the perspective of a range of test takers, not just as themselves or as one mythologized typical test taker. RTD likely needs to develop more procedures just for catching multi-keyed items. This is hard work. Item development is incredibly challenging work.

These Haladyna lists simply do not recognize that. That is probably the most offensive thing about them. They lay out seemingly simple rules that barely scratch the surface of what it means to develop a high quality, valid item (i.e., one that elicits evidence of the targeted cognition for the range of typical test takers), and because of these lists’ absolute dominance in the literature, they evangelize the idea that item development is fairly simple.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #18: Write as many plausible distractors as you can

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Develop as many effective choices as you can, but research suggests three is adequate.

This rule is ridiculous. This is the rule that shows that these authors have no serious experience as item developers. They do not recognize that item developers simply do not have time to develop more (effective) distractors than they have to, and they appear to have no clue as to how difficult it is to write plausible/effective distractors. In fact, item developers should develop extra ideas for distractors, if they are available, because few will turn out to actually be plausible. (Moreover, the technical and contractual requirements of large scale standardized assessment generally set how many distractors are required.)

That is really the key point. They have no idea what it takes to develop a good distractor. They think that quantity is really a driving issue here.

And yet, they actually undermine the whole first half of the rule with the second half of the rule. If three is adequate, then why develop more, folks? Do they have that little respect for the time of professional content developers? The 2002 article claims that it is primarily aimed at classroom teachers, though also useful for large scale assessment development. Do they have that little respect for teachers’ time? Why waste time on developing even more distractors, especially considering how difficult it is?

The thing is, they acknowledge in their 2002 article that developing additional distractors can be challenging. “The effort of developing that fourth option (the third plausible distractor) is probably not worth it.” So, why do they suggest it? Why do they say, “as many…as you can”? Why do they say, “We support the current guideline”?

In fact, they mention that this is actually a quite well-researched question. There are countless studies on the optimal number of distractors. There are countless studies on how effective distractors are (i.e., how many test takers select them). It is a standard part of test development to review how attractive each distractor was in field testing. And they summarize much of this literature by saying, “Overall, the modal number of effective distractors per item was one.” We have a shortage of effective distractors, even as items usually include three or more distractors. Perhaps the reason why so many studies show that two distractors are sufficient is the low quality of the second or third distractor. That is, it’s not a question of how many there are, but rather of how effective they are. Perhaps quality matters more than quantity.
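
To make that review concrete, here is a minimal sketch with made-up response data and a made-up 5% flag threshold; it simply tallies how often each option was selected in a field test.

```python
from collections import Counter

def option_selection_rates(responses):
    """Return the share of test takers selecting each answer option."""
    counts = Counter(responses)
    total = len(responses)
    return {option: counts[option] / total for option in sorted(counts)}

# Hypothetical field-test responses for one four-option item keyed "b".
responses = list("b" * 61 + "a" * 24 + "c" * 11 + "d" * 4)
for option, rate in option_selection_rates(responses).items():
    # A distractor that draws almost no test takers is doing little work; the 5%
    # floor here is only an illustrative rule of thumb, not a published standard.
    flag = "  <-- rarely selected" if option != "b" and rate < 0.05 else ""
    print(f"{option}: {rate:.0%}{flag}")
```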

Now, how many of the 14 rules that focus on distractors are about how to write effective distractors? How many really focus on how to gather evidence that test takers lack sufficient proficiency with the targeted cognition? Well, not enough. This one focuses on quantity, while merely waving a hand at effectiveness.

(And we can, for now, ignore issues with the literature’s idea of effectiveness of distractors, which seem to have rather little to do with the quality of the evidence they provide or the validity they contribute to items.)

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #17: Use positive, no negatives

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the stem: Word the stem positively, avoid negatives such as NOT or EXCEPT. If negative words are used, use the word cautiously and always ensure that the word appears capitalized and boldface.

This rule seems pretty good, on the surface. It seems intuitive, but does it actually matter? (And we can ignore the oddity that “cautiously” only applies after the decision to use such a word has already been made, right?)

The 2002 article lays out the evidence, and the evidence does not support their contention. Roughly two-thirds of their sources support the rule, but roughly one-fifth explicitly argue against it. The empirical studies that Haladyna et al. cite do not support this rule. In fact, they cite a study by one of themselves (i.e., Downing) that found this rule makes no difference to item difficulty or item discrimination.

So, if it does not show up in the item statistics, then why push for this rule? “Under most circumstances, we suggest that a stem should be worded positively.” This lack of reasoning epitomizes the empty center of their whole endeavor. They endorse some received wisdom, but do nothing to explain why. Recall that in 1989, they called their list “a complete and authoritative set of guidelines for writing multiple-choice items”—in the paper’s abstract! While they did not repeat that claim in 2002, they also did not disclaim any of the rules they report from the literature.

So, why avoid negatives? I can think of a reason: stressed and/or hurried test takers might miss that key word (e.g., “not” or “never”) and therefore misunderstand what is being asked of them. This could lead test takers to provide an unsuccessful response, even though they could have provided a successful response if the stem were clearer. (Of course, there is no good reason to include a distractor that is based on test takers missing a negating word in the stem.)

Yes, clarity is essential. Rule 14 (Ensure that the directions in the stem are very clear) is their best rule.

So, if we suppose that the chance of skipping or missing that key negative word is the reason to avoid negative phrasing, is there something that could be done about that? For example, what if such words are bolded and underlined (i.e., something I usually oppose because it can look like garish overkill)? Might that draw sufficient attention to those words to ensure that they are not skipped? And if it would, why avoid negatively worded questions? What reasoning might be left?

It is curious that their 2004 and 2013 books omit mention of the studies cited in the 2002 article that suggest that negative words in the stem do not make a difference in how items function. It is almost as though they eventually realized that their argument is so weak that they are better off omitting the whole truth that they know. But that could only be done in bad faith, so we know that that cannot be the case. Right?

Last, they acknowledge in their 2002 article that another scholar found that the impact of negative stems varied based upon the type of cognition that was targeted. For the life of me, I cannot figure out why they would mention this without explaining more. It would be useful to know when this advice might actually make a difference. In our own experience (mine and my closest colleagues’), we have seen attempts to target cognition that really does call for a negative stem, but the broad acceptance of this rule has made it impossible to get such items through item development processes.

In my view, the stem should be clear and hopefully succinct, but never at the expense of getting at the targeted cognition. If a negative stem does not hurt clarity, succinctness or alignment, I do not see a problem.

So, I would suggest bolding and underlining those negative words. But don’t use all caps—that’s just too much.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #16: Avoid window dressing

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the stem: Avoid window dressing (excessive verbiage).

If you haven’t read my analysis of Rule 13 (Minimize the amount of reading in each item), please go back and read that. It applies here. But there is more, following a fantastic meme.

Yes, excessive verbiage is bad. After all, that’s what “excessive” means. So, this rule is somewhat tautological. I think that item developers should not make bad items. But that is not helpful advice.

The question is what counts as excessive. At this point, it is not surprising that the 2002 article makes no effort to explain this. Their 2004 book really offers no meaningful explanation for its version, “Make the stem as brief as possible.” Their 2013 book combines this rule with Rule 13, and does say quite a bit more. But it is still not quite helpful.

Their example (2013, p. 98) is, “Which of the following represents the best position the vocational counselor can take in view of the very definite possibility of his being in error in his interpretations and prognoses?” Yes, that is clearly excessively wordy, but it is practically a straw man argument. Has anyone ever suggested that such a stem might be appropriate, or that such a question would be well written in any circumstance?

Stems should be clear. They should include all the information needed for the test taker to understand what is being asked of them. Extra adverbs, adjectives and degree modifiers should not be included (e.g., “very” and “definite” in the example above). Filler words and phrases that do not contribute meaning or information should not be included. Phrases and words that can be replaced with simpler, more common and shorter equivalents without a loss of meaning should be so replaced (e.g., replacing “represents the” with “is” and “in view of” with “given” in the example above).
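
As a rough illustration only, here is a sketch of that kind of mechanical trimming pass. The replacement table is hypothetical, and I render the “represents the” swap as “is the” so the example sentence stays grammatical; a human still has to re-check meaning.

```python
# Hypothetical wordy-phrase table; real editing still requires human judgment.
REPLACEMENTS = {
    "represents the": "is the",
    "in view of": "given",
    "very definite possibility": "possibility",
}

def trim_verbiage(stem):
    """Apply simple phrase substitutions to a stem; the result must be re-checked."""
    for wordy, simpler in REPLACEMENTS.items():
        stem = stem.replace(wordy, simpler)
    return stem

stem = ("Which of the following represents the best position the vocational counselor "
        "can take in view of the very definite possibility of his being in error "
        "in his interpretations and prognoses?")
print(trim_verbiage(stem))
```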

My usual co-author offers, “Ensure you use as much verbiage as needed to make the task clear, no more and no less.” This emphasizes that clarity is the guiding principle. Of course, it highlights the reality that once one has Rule 14 (Ensure that the directions in the stem are very clear), the verbiage rule really does not add very much, perhaps not anything at all.

If the explanation of this rule included how to recognize excessive verbiage, the rule would not seem tautological. I understand why a simply stated rule might require further explanation to really be understood, but the articles quite often do not do that, and they quite rarely do it well.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #15: Central idea in stem

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the stem: Include the central idea in the stem instead of the choices.

100% of their 2002 sources mention this idea. 100%. Haladyna et al. present other rules that a minority of their sources even mention, but this shows that it is possible for a rule to be agreed upon at a very high level. Seven of their rules are supported by more than three-quarters of their sources, but only this one is supported by 100%.

Dare I disagree? They have found no empirical support for this rule, but that is their standard, not mine. I think this is a good rule.

Stems should usually be questions, and these questions should be understandable without having to read the answer options. The central idea should be in the stem.

When items have open stems, the central idea should also be in the stem. That is, when an item offers part of a sentence as the stem and requires the test taker to select the response that correctly completes it, the central idea should be in the stem, not in the answer options—or at the very least, it should be clear from the stem alone what the item is getting at. Test takers should not have to review the answer options to understand what the item is asking of them.

When an item offers a fill-in-the-blank stem, that stem should contain enough information for it to be clear what the item is about. Again, the answer options should not be necessary to understand what the item is getting at.

This is particularly a challenge with a multiple fill-in-the-blank item (i.e., such an item with multiple blanks), even when there are drop-down menus embedded in the presentation of the stem. So much should not be blanked out of the sentence that it becomes its own puzzle just to understand what the item is getting at.

This is a form of clarity in the question, not just in the directions (Rule 14). I approve.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #14: Clear directions

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the stem: Ensure that the directions in the stem are very clear.

Yes. Very good rule. No notes.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #13: Minimize reading

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Style concerns: Minimize the amount of reading in each item.

This rule is mentioned by one-third of their 2002 sources. But this rule also seems rather redundant with Rule 16 (Avoid window dressing (excessive verbiage)). Obviously, the irony of repeating a rule about being mindful of unnecessary verbiage is a joke, right?

So, is reading time a problem? Let’s put aside “excessive verbiage,” because that is Rule 16. This is about reading time.

Yes, reading load should be managed. Whether or not students face formal time limits, students run out of stamina. Reading load matters. But many standardized test items are based upon reading passages that are included in the test. They have to be included because there is no centralized control over the texts that students read in this country and because we often want the texts to be new to the test takers (i.e., so they must lean exclusively on their own reading skills to understand them). But these passages are always far shorter than what we expect these test takers to be able to understand on their own. Certainly, this is true on ELA (English Language Arts) exams, but it is also true on science exams and social studies exams. When mathematics is actually applied in the real world, it is done in embedded problems and situations, not just as bare arithmetic or algebra. These excerpts and passages are already shorter than what we expect students to be able to handle in authentic circumstances.

Minimizing reading time surely does allow for more items and improve reliability, as their 2013 book says. But minimizing reading time, as the 2002 article suggests, often simply comes at the expense of the targeted cognition. Sure, if the item is just a basic recall item, keep it short. But if it is a test of reading skills or problem-solving skills (which often call for recognizing extraneous information), minimizing reading time undermines content and construct validity. It looks to a rather quantifiable measure of items (or item sets) and declares that that is more important than the actual purpose or goal of the assessment or the item.

To be fair, their 2004 book does say, “As brief as possible without compromising the content and cognitive demand we require.” But their 2013 book says that minimizing reading time can improve both reliability and validity, which shows a rather poor understanding of validity, in my view. Yes, the 2013 book does acknowledge the occasional importance of extraneous information, but it says nothing about the importance of passage length. And let’s be clear: this rule is not aimed at just the stem, just the answer options or just those two. This is about the whole item—which of course includes the stimulus!

Now, if this rule said, “Minimize the amount of reading in each item, without compromising the test content,” it would be better. It would almost be good, provided of course that we could all rely on everyone taking that caveat seriously. But the 2002 list—the only version that includes all the rules in a handy one-page format—does not say that. And nowhere in that article do they offer that caveat. None of the other versions offer that critical caveat as part of the rule, though its inclusion would not make this the longest rule. There are at least half a dozen longer rules, including rules made up of multiple sentences.

So, this could be a good rule, but not as it is presented. As presented, it too often suggests undermining validity.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #12: Correct grammar

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Style concerns: Use correct grammar, punctuation, capitalization, and spelling.

I will try to put aside the common mistake of referring to word choice, punctuation, spelling and other conventions of language as “grammar.” Grammar is about syntax. The folks who say they care so much about “grammar” are exactly the folks who get offended by such misuse of language when others do it. This rule ought to be labeled “Language conventions,” not “correct grammar.” But I will put that aside, because although I was an English teacher, I am not of that sort. (I am just bothered by the hypocrisy of these people who look down their noses at the language use of others.)

So, what about evaluating this rule as a rule? Yeah. It’s a good rule. It is something that we ought to all be able to agree on. Heck, I would make it the first rule. It’s not challenging or controversial. It does not really need to be explained.

Buried in the middle of the list? Meh. Not great. Things that we can all agree on should go at the beginning of the list, with more challenging ideas coming later.

But, with my own view of language use—grounded as it is in what I have learned from actual scholarly linguists (i.e., those who study actual grammar and syntax) and the beauty of literature—there really isn’t a “correct grammar.” Rather, there is a preferred style, usually one of formal English, though usually one that falls short of the formality of academic writing. This rule would be much improved if it spoke of “formal grammar” rather than “correct grammar.”

But how would I know? I’m just an English teacher who studied linguistics in college.

(And yeah, this rule is mentioned by barely half of their 2002 sources. Not really a consensus to endorse.)

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #11: Edit and proof

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Style concerns: Edit and proof items.

We’re just gonna ignore that they left it at “proof” instead of “proofread,” right? I shouldn’t be that petty, right?

Because this rule is one of the style concerns, we know that this is not about substantive editing. It is not in the content section, nor in the stem or choices sections. However, the 2002 article does not provide any explanation. Their 2004 book revises this rule to “Proofread each item,” but the 2013 book goes back to “Edit and proof items.”

That 2013 book addresses style guides at length, and this is…somewhat important. Consistency, while sometimes the hobgoblin of little minds, is something that will be looked for by many people. Others within testing organizations and/or sponsors, in addition to teachers and other members of the public, can notice and pick on inconsistency in style. So, yes, items should be compliant with the relevant style guides. Such style guides ought to end discussion and debate on how something should be presented or how a word is spelled (e.g., email, e-mail, Email, E-Mail, emails, etc.).
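
Here is a minimal sketch of how a style-guide entry like that could be checked mechanically; the preferred form and the variant list are hypothetical, not drawn from any actual style guide.

```python
import re

# Hypothetical style-guide entry: one preferred form plus the variants to flag.
PREFERRED = "email"
VARIANTS = re.compile(r"\b(e-mail|E-Mail|Email)\b")

def flag_style_violations(item_text):
    """Return any non-preferred spellings found in an item's text."""
    return VARIANTS.findall(item_text)

stem = "Which statement about E-Mail etiquette is most accurate?"
hits = flag_style_violations(stem)
if hits:
    print(f"Replace {hits} with '{PREFERRED}' per the style guide.")
```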

However, when most people think about proofreading, they are thinking about…spelling, grammar, punctuation and word choice (i.e., often formality of register). Yes, items should be edited for style and these concerns. And yes, proofreading is important. But here’s the thing: this rule is not a description of anything about the items. This is the only rule that is only about test developers’ actions. And we know that this rule is not about “grammar, punctuation, capitalization and spelling” because that is the next rule.

Note that this rule does not actually say anything about style guides in any version of the Haladyna rules. One of the explanations (2013) mentions the importance of style guides, but even that version does not mention them in the rule itself. This is entirely about process, and not about product. This is about all published writing and is in no way particular to items.

It is probably dumb to include this in a set of item writing rules or guidelines, and it certainly is particularly dumb to include it in this kind of list of rules or guidelines. In fact, only one-third of their 2002 sources even mention it because it probably should go without saying!

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #10: Format items vertically

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Formatting concerns: Format the item vertically instead of horizontally.

This one is just dumb. The New York Regents exams violate it all the time. Fewer than half of Haladyna et al.’s 2002 sources even mention this dumb rule, and nearly one quarter of them explicitly disagree.

In our own research, over half of respondents said that this is irrelevant, though the vast majority of the rest agreed that it is a good thing—albeit at the lowest level of value (i.e., Useful, as opposed to Important or Very Important).

There certainly is no consensus on this, and Haladyna et al. write, “We have no research evidence to argue that horizontal formatting might affect student performance. Nonetheless, we side with the authors who format their items vertically.” This is not a good basis for including a rule on a list that is supposed to be grounded in the consensus of the literature. It makes clear that this list is little more than a collection of their own opinions masquerading as research findings.

And yet, their 2013 book calls this an “important” (p. 95) item writing guideline. Nowhere do they cite any evidence for this, though they hypothesize that vertical formatting may be less confusing specifically for anxious test takers…without a milligram of support for this contention.

Yeah, “important.” Totally.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]

Fisking the Haladyna Rules #9: Use best item format

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Formatting concerns: Use the question, completion, and best answer versions of the conventional MC, the alternate choice, true-false (TF), multiple true-false (MTF), matching, and the context-dependent item and item set formats, but AVOID the complex MC (Type K) format.

Though Haladyna et al. put this in the short formatting concerns section of their list, this rule is not about formatting. This rule is about the very structure of the item. True-false items are quite different from a classic three- or four-option multiple choice item. A matching item (i.e., matching each of 3-8 choices in column A with the appropriate option(s) in column B) is entirely different from either of the others. This is not merely “formatting”; these are all item types.

Context-dependent items and item sets are not merely a matter of formatting, either. They are items linked to a common stimulus or common topic. But they can each be of any item type.

So, this rule says that it is OK to use different item types? Oh, OK. It is OK to have item sets? Oh, OK.

What is this rule really saying? All that it is really saying is: do not use complex MC (Type K) items. Those are the ones that ask a question, list some possible answers and then give a list of combinations of answers to select from. For example,

Which of these rules are actually decent rules?

I. Keep vocabulary simple for the group of students being tested.

II. Avoid trick items.

III. Minimize the amount of reading in each item.

IV. Place choices in logical or numerical order.

a) I only

b) I and III, only

c) II and IV only

d) II, III and IV only

e) I, II, III and IV

Yes, we grew up with this item type. Yes, this item type is needlessly confusing. But the rule should be something like “Replace complex MC (Type K) items with multiple true-false or multiple select items.” Unfortunately, 80% of the rule is about other things, and the part of their rule that starts to get at this is buried at the end. Moreover, the rule itself does not say what to do about this problem, whereas our offered replacement is direct and helpful.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

  • Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

]