FISKING THE HALADYNA RULES #3: Use novel material

[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Content: Use novel material to test higher level learning. Paraphrase textbook language or language used during instruction when used in a test item to avoid testing for simply recall.

There is something in this rule that I like, but it falls apart quickly. If the rule needs a second sentence at all, it certainly should not be that one.

Novelty in items is incredibly important, but it should not be limited to testing “higher level learning.” Honestly, what is higher level learning? What are they talking about? Later in the article they say problem solving and critical thinking, but that leaves us wondering why they didn’t just say that in the rule.

Also in the text, they cite this particular rule as an example of a rule without an empirical basis, but one that they advocate anyway. That might be fine, but check out their reasoning.

We might argue that many of these unresearched guidelines were common sense that did not justify research. For example, consider the guideline: “Use novel material to test higher level learning.” Most educators place a great value on teaching and testing higher level thinking. Thus, this item-writing guideline may be a consensus value among testing specialists without the need for research.

They offer that because educators agree on the goal, this must be the right method. They do not even try to explain why or how novelty is so important for “higher level learning.” As an English teacher, I would read that lack of explanation as a lack of investment in thinking, which is so often the case with this list.

In fact, novelty is important for a wide range of cognition, and the issue is not solved simply by paraphrasing. If the example in an item is the same as one used in a teacher’s lecture or some class activity, the test taker might simply recall the answer given to them by their teacher, rather than generate it themselves as the test assumes they would. If a reading passage is taken from a work read for class, they might recall their teacher’s (or fellow students’) explanation or analysis, rather than generate their own. For example, if you want to test whether they know about the dynamics of the Romeo & Juliet balcony scene, by all means use that excerpt. But if you want to know whether they can read Shakespeare’s language and understand it, you need to present something that test takers have not already had explained to them. Note that this is not just about critical thinking; it could simply be a matter of understanding the plain language.

Items need to be sufficiently novel that test takers cannot simply rely on their memory of how that example was already explained to them, but not so novel that they require significant new learning to make sense of. That is a careful balance, and it is made all the more difficult because the formal curriculum can vary so much from district to district, and even where the same formal curriculum exists, the enacted curriculum and lesson plans can vary enormously. It takes real knowledge of how content is generally taught to find the appropriate level of novelty.

Taking derivatives (i.e., differential calculus) likely counts as problem solving in Haladyna et al.’s view. But even simple items should not ask about x-squared (x^2). That simply is not novel enough. Asking about x^8 or x^197 is no more difficult for someone who understands, and yet the answer is not going to be simply recalled. However, I think that such a task does not rise to the level of critical thinking or problem solving. It is clearly what Webb’s Depth of Knowledge (DOK) and our own revised Depth of Knowledge (rDOK) would classify at the lowest level of cognitive complexity. And yet, the need for novelty applies just the same.
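To make the calculus example concrete, here is the arithmetic involved (a minimal sketch in LaTeX, assuming only the standard power rule; the exponents are just the ones mentioned above):

% Power rule: the general fact the item asks the test taker to apply
\frac{d}{dx}\,x^n = n\,x^{n-1}

% x^2 is the canonical lecture example; its derivative, 2x, is likely
% simply remembered from class
\frac{d}{dx}\,x^2 = 2x

% x^197 requires the same single application of the rule, but the
% answer cannot have been memorized from class
\frac{d}{dx}\,x^{197} = 197\,x^{196}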

Yes, novelty is a very important idea in item development. But no, this rule does not get at it accurately. It is affirmatively damaging because it suggests that novelty is not important outside of “higher level learning” and that paraphrasing is a sufficient mitigation.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item-writing advice only makes that far less likely to happen.]

Haladyna Lists and Explanations

  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.

  • Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

  • Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

  • Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.

  • Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.