One of my little pet peeves is when athletes say that they need to be more consistent at something that they have long been consistently mediocre or bad at. I agree that they need to be better, but I don’t think that the problem has been consistency. Heck, a basketball player going from a 23% 3-point shooter to a 35% 3-point shooter has not even become more consistent, even though that would constitute a rather large improvement.
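To put rough numbers on that claim, here is a minimal sketch. It assumes each shot is an independent make-or-miss event with a fixed probability, which is a simplification and not a claim about real shooters. The function name `make_spread` is made up for illustration. Under that assumption, the shot-to-shot spread is the standard deviation sqrt(p(1 - p)), which grows as p moves from 0 toward 50%.

```python
import math


def make_spread(p: float) -> float:
    """Standard deviation of one make-or-miss shot with make probability p.

    Assumes shots are independent with a fixed probability (a simplification).
    """
    return math.sqrt(p * (1.0 - p))


# Compare the two shooters from the paragraph above.
for pct in (0.23, 0.35):
    print(f"{pct:.0%} shooter: shot-to-shot spread = {make_spread(pct):.3f}")
```

On this simple model the 35% shooter's results vary more from shot to shot than the 23% shooter's do, so the better shooter is, in the statistical sense, slightly less consistent.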
Words have meanings, and while I love metaphorical language, when words with rather precise meanings are expanded, our ability to express precise things is diminished. I find that frustrating—perhaps because I was raised by a lawyer and perhaps because I was such a math and science kid growing up.
But the fact is that words can have different meanings in different contexts. This is certainly true when words have technical meanings in expert fields and also have lay meanings for the general public.
Reliability is one of those words, and it is a very important technical word in the field of educational and psychological testing. And yet, it is also a middle-school-level word that refers to trustworthiness.
In everyday use, a reliable person is someone you can trust to be there and to do the right thing. It is not just consistency, but also usefulness and worthiness.
But the statistical term, as used in many technical fields, merely means consistency. Something can be consistently off by the same amount, and that would be reliable. Statistical reliability is only about consistency, regardless of appropriateness, precision, or actual accuracy. Under this definition, a car that only (but always) starts up when it is over 90 degrees outside is a reliable car. Moreover, it would be more reliable if it only started up when the temperature was over 100 degrees, and most reliable if it never started up at all. After all, that would be perfectly consistent: consistently useless.
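The idea of being "consistently off by the same amount" can be sketched with a few made-up numbers. Everything here is hypothetical: the true value, the two sets of readings, and the helper name `summarize` are invented for illustration and do not come from any real instrument or test. The sketch separates bias (how far the average reading is from the truth) from spread (how much the readings disagree with each other), and only the second is what the statistical sense of reliability cares about.

```python
import statistics

TRUE_VALUE = 100.0  # hypothetical true quantity being measured

# Hypothetical repeated readings from two made-up instruments.
consistent_but_wrong = [110.0, 110.1, 109.9, 110.0, 110.0]
noisy_but_centered = [92.0, 108.0, 97.0, 104.0, 99.0]


def summarize(readings):
    """Return (bias, spread) for a list of repeated readings.

    bias   = mean reading minus the true value (accuracy)
    spread = sample standard deviation of the readings (consistency)
    """
    bias = statistics.mean(readings) - TRUE_VALUE
    spread = statistics.stdev(readings)
    return bias, spread


for name, readings in [
    ("consistent but wrong", consistent_but_wrong),
    ("noisy but centered", noisy_but_centered),
]:
    bias, spread = summarize(readings)
    print(f"{name}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

The first instrument is off by about 10 every time yet has almost no spread, so it counts as the more reliable one in the statistical sense, even though the second is right on average.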
This is a particularly important gap in meaning in my field because when psychometricians insist on maximizing reliability, that sounds really good to a lay audience that does not appreciate the difference between the everyday term and the statistical term. Psychometricians want consistency, even if it means consistently measuring the wrong thing or leaving out the most important stuff because it hurts the consistency of the test. They say they are increasing reliability, and they are not lying. Heck, I am sympathetic to their use of the term, as it is also what I tend to think the term means. That math kid started taking advanced Probability and Statistics courses nearly 40 years ago. That meaning is what I have in my head when basketball players talk about being a more consistent shooter.
I think a lot about the important technical problems with how psychometricians’ focus on that statistical idea of reliability leads to worse tests, but perhaps the bigger problem is how their different meaning of reliability misinforms policy makers and the broader public about what they are even trying to do. The broader public and policy makers are the real audience for our tests, and we should be mindful of how they hear what we are saying.