Why We Don't Love ECD Evidence Statements

RTD is inspired by ECD, and we love the idea of thinking about item and test results as evidence, along with so much of what that idea implies. And yet, we do not love ECD’s structure of evidence statements (i.e., descriptions of what evidence of the targeted cognition might look like).

The biggest problem that we see in ECD is that it calls for evidence but does not offer any theory of evidence. Hence, RTD had to develop its own Small Theory of Evidence: the quality of the evidence produced by an item or test is inversely proportional to its tendency to produce or support Type I and/or Type II errors. That is, assessments and their items should not support false positive inferences or false negative inferences.
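
Stated a bit more formally, and only as an illustrative gloss of the Small Theory of Evidence rather than a formal model (in particular, treating the two error tendencies as simply additive is our simplification, not part of RTD):

    evidence quality ∝ 1 / (tendency to support false positive inferences + tendency to support false negative inferences)

The point of the gloss is simply that anything that makes an item more likely to mislead, in either direction, weakens the evidence it produces.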

Unfortunately, evidence statements—often inspired by ECD—do not account for the quality of the evidence they describe. Yes, such traits or qualities in test takers’ work products could be evidence of proficiency with the targeted cognition, but is it actually strong evidence in this case? Or is it instead evidence of some other cognition? For example, is it evidence that the test taker recognized that they could plug the answer options back into the equation to see which one worked (i.e., back solving), rather than evidence that they solved the equation using the targeted cognition?
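
For readers who want the back-solving example made concrete, here is a minimal sketch in Python; the item (solve 2x + 3 = 11 for x) and its answer options are hypothetical, ours purely for illustration:

    # Hypothetical multiple-choice item, for illustration only: "Solve 2x + 3 = 11 for x."
    # Targeted cognition: solving the equation algebraically.
    # Back solving: trying each answer option until one satisfies the equation.
    options = {"A": 3, "B": 4, "C": 5, "D": 6}

    for label, x in options.items():
        if 2 * x + 3 == 11:  # no algebra required, just substitution and arithmetic
            print(f"Back-solved answer: {label} (x = {x})")
            break

A correct response reached this way is indistinguishable, in the scored work product, from one reached by actually solving the equation, which is exactly why an evidence statement alone cannot tell the two apart.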

Evidence is often merely suggestive rather than proof in itself. Evidence is often ambiguous, and even ambiguous evidence can be useful, to a limited degree. Because evidence is rarely proof, it usually needs corroboration to disentangle the ambiguities it carries. This is the continuum of evidence quality.

However, evidence statements do not acknowledge this ambiguity and are often confused with descriptions of proof of proficiency with the targeted cognition. Then, they understandably supplant the targeted cognition as assessment targets. Once that happens, Campbell’s Law kicks in: the evidence statement proxy replaces the underlying construct, and item developers target the proxy in whatever way is most convenient and efficient.

Efficiently targeting a proxy can improve reliability, but it comes at the expense of validity because the most efficient route to a proxy can be one that does not go through the actual construct. That is, the efficiency requirements of larger-scale standardized tests sharpen item development’s aim at the wrong target, seriously degrading the validity of the inferences and decisions based upon such an assessment.

Evidence statements can help to identify potential evidence in a large volume of test taker work product, but that process then requires some other structure or procedure to evaluate that potential evidence for its actual quality. Alas, ECD does not offer that second structure, and test developers’ drive for efficiency can ride evidence statements to rather questionable levels of validity. Retrofitting the evidence statement structure to address this problem (i.e., what we call robust evidence statements) is cumbersome, and likely beyond any practical use.

Thus, if evidence statements make it possible to increase reliability at the expense of validity, test developers need a structure that focuses on validity: on producing evidence of the targeted cognition. This is where RTD item logic comes in.