Unidimensionality and Fairness Dimensions

Unidimensionality is a simplifying assumption, giving non-experts something that they think they can interpret, regardless of the fact that this kind of simplification will likely strike real experts as utterly uninterpretable. Its impact on fairness is quite similar.

If a test is unidimensional, then items that do not measure what the other items measure are bad items and should be excluded. This is the basis for differential item functioning (DIF) analysis: if some items work differently than the other items for some defined subgroup of test takers, they are flagged as suspect.
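
To make that concrete, here is a minimal sketch in Python of the widely used Mantel-Haenszel DIF statistic, not the procedure of any particular testing program. The function name and interface are my own assumptions, and the snippet leaves out the sparse-cell corrections and significance tests that operational DIF analyses add. It simply matches test takers on total score, pools a 2x2 table from each score stratum, and reports the result on the ETS delta scale.

```python
import numpy as np

def mantel_haenszel_dif(item, responses, group):
    """Rough Mantel-Haenszel DIF index for one dichotomous item.

    responses : (n_examinees, n_items) array of 0/1 item scores
    group     : boolean array, True = focal group, False = reference group
    item      : column index of the studied item

    Examinees are matched on total score (the usual unidimensional
    matching criterion). Returns MH D-DIF on the ETS delta scale;
    by convention, negative values mean the item is harder for focal
    group members than for matched reference group members.
    """
    total = responses.sum(axis=1)             # matching variable: total score
    y = responses[:, item]                    # studied item (1 = correct)
    num = den = 0.0
    for score in np.unique(total):
        stratum = total == score              # one 2x2 table per score level
        ref, foc = stratum & ~group, stratum & group
        n = stratum.sum()
        if ref.sum() == 0 or foc.sum() == 0:
            continue                          # stratum contains only one group
        a = y[ref].sum()                      # reference correct
        b = ref.sum() - a                     # reference incorrect
        c = y[foc].sum()                      # focal correct
        d = foc.sum() - c                     # focal incorrect
        num += a * d / n
        den += b * c / n
    alpha = num / den                         # common odds ratio across strata
    return -2.35 * np.log(alpha)              # ETS delta-scale MH D-DIF
```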

But if the construct a test is supposed to be measuring is not truly unidimensional, DIF is not going to work. In that situation, it rests on false assumptions. The very fact that DIF works at all to flag problematic items is simply a product of the demands of unidimensional models being put ahead of the demands of the content and the construct definition.
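
A quick simulation can illustrate this, under parameters I have assumed purely for illustration (group sizes, a half standard deviation difference on a secondary dimension, Rasch-style items, a 20/5 item split), not any real test data. Two genuine dimensions of the construct are simulated; the groups differ only on the secondary one, and most items load on the primary one. Matching on total score (reusing the mantel_haenszel_dif sketch above), the secondary-dimension items come back flagged, not because they are flawed but because the unidimensional matching assumption treats the construct's second dimension as construct-irrelevant.

```python
import numpy as np

# assumes mantel_haenszel_dif from the sketch above is in scope
rng = np.random.default_rng(0)
n = 5000
focal = np.arange(n) >= n // 2                 # second half is the focal group

# Two genuine dimensions of the construct; the groups differ only on the
# second one (assumed 0.5 SD lower mean for the focal group).
theta1 = rng.normal(0.0, 1.0, n)
theta2 = rng.normal(np.where(focal, -0.5, 0.0), 1.0)

def rasch_items(theta, n_items):
    """Simulate 0/1 responses to Rasch-style items driven by one dimension."""
    difficulty = rng.normal(0.0, 1.0, n_items)
    p = 1.0 / (1.0 + np.exp(difficulty[None, :] - theta[:, None]))
    return (rng.random((len(theta), n_items)) < p).astype(int)

# 20 items measure dimension 1, 5 items measure dimension 2.
responses = np.hstack([rasch_items(theta1, 20), rasch_items(theta2, 5)])

# Matching on total score, the dimension-2 items (columns 20-24) show
# markedly more negative MH D-DIF than the dimension-1 items, i.e. they
# get "flagged," even though they validly measure a real part of a
# multidimensional construct.
for i in range(responses.shape[1]):
    print(f"item {i:2d}  MH D-DIF = {mantel_haenszel_dif(i, responses, focal):+.2f}")
```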

Therefore, one problem with depending on unidimensional psychometric models is that it allows so many people to think that DIF is the most important tool for catching fairness issues in items (and therefore in tests). It distorts the construct and thereby alters the potential meanings of fairness. And of course, DIF analysis is further limited to examining only those dimensions of diversity that are tracked for all test takers.

In fact, test takers’ success with individual items and tests is the product of many, many dimensions, qualities, and traits. These interact in a variety of ways. For example, Kristen Huff just told me a story about her own childhood experience that substantiates an untracked dimension that Marjorie and I think about a lot. We think that urbanicity is a big deal, and it is something different from simple geographic region. Kristen said that she had no experience with city blocks, growing up. When they came up on a test, she could only make sense of the item because she had watched Sesame Street.

In fact, this authority of unidimensional psychometric models leads to attenuation of any signal that tests could measure, focusing them on some muddled middle of compensatory KSAs (many from outside the domain model) that might not be evenly distributed across all subgroups in a testing population. Thus, lower scoring members of one subgroup might possess some of those compensatory KSAs to a greater degree than others. And frankly, the unexamined assumptions that content developers make about these additional KSAs are likely a product of their own backgrounds and experiences. They unwittingly give test takers with backgrounds and experiences more similar to their own an advantage.

While this is not directly a product of the insistence on unidimensionality, it might follow inevitably in a test development workflow that is so dependent upon that assumption. Appropriate examination of the many dimensions within the content and across the test taking population is a sort of habit of mind, a professional habit, but not one encouraged by psychometrics’ appreciation of the robustness and mathematical elegance of item response theory.

Insisting that test developers think more carefully about dimensionality, putting in the time and effort to recognize the complexity of test takers’ cognitive paths in response to items, is an important part of Rigorous Test Development Practice. We apply such tools as radical empathy to infuse fairness considerations throughout the content development process, because the psychometric desire for simplifying unidimensionality will only shift people away from respect for the real variety of dimensions of diversity among the test taking population. During content development, we consider as many different dimensions of diversity as might be germane to the content, the items, and the test population, rather than trying to narrow them down to a generic list of tracked test taker traits.