What is the Point of Teacher Evaluation? Seriously, What is the Purpose?

March 8, 2016

Recently, Rick Hess wrote about the pointlessness of new teacher evaluation systems, claiming – in his headline – that they "Don't Make a Difference." But I think Rick might be missing the point. (He is basing this on the research of Matt Kraft and Allison Gilmour, which I have downloaded but not yet read. They might be missing the point, too, but I can't be sure yet.)

The point. What is the point or the purpose of teacher evaluation? Well, I can think of a few possibilities, but the bottom line of each and every one of them would have to be improving outcomes for children. How might teacher evaluation systems do that?

1) Identify struggling teachers for removal.
2) Identify struggling teachers for targeted intervention to improve their effectiveness.
3) Intimidate struggling teachers to remove themselves.
4) Provide a structure or framework for struggling teachers and those who support them to think about teaching, so that they can better improve their effectiveness.
5) Provide a structure or framework for all teachers and those who support them to think about teaching, so that they can better improve their effectiveness.

Those are five basic mechanisms by which a formal teacher evaluation system may improve aggregate teacher effectiveness, and thereby improve outcomes for children.

But I think there are more than that, because mechanisms #1 and #2 are ambiguous about who must know the identity of the struggling teachers. If it is just the teachers themselves, then mechanisms #1 and #2 are the equivalent of #3 and #4, but there remain multiple possibilities. Perhaps struggling teachers should be identified to their local supervisors? Perhaps they should be identified to their local peers? Perhaps they should be identified to their district offices? Or to the public, or to the state or the feds?

Each of those implies a somewhat different mechanism. Peers might support a struggling teacher in other ways than a supervisor would, and systems by which peers decide on the removal of ineffective teachers – usually called Peer Assistance and Review (PAR) – do exist in some places. Certainly, targeted assistance and termination procedures are quite different if they are based on local supervisors rather than the district office – and likely different again if based on state officials or the feds.

So, it's pretty complicated.

But here's where I think Rick (and likely Matt and Allison) is making his big mistake: many of these mechanisms do not require accurate reporting of teacher effectiveness. Many of these mechanisms are not undermined by fudging the public or official record to the advantage of the struggling teacher.

So long as a teacher, his/her supervisor and/or his/her peers know that this teacher is struggling, the mechanisms based on improving his/her effectiveness can still work.

So long as a teacher knows that s/he is struggling, s/he can still leave. So long as a supervisor knows that a teacher is struggling, s/he can still pressure the teacher to leave. So long as a supervisor knows that an untenured teacher is struggling, s/he can fire that teacher without citing ineffectiveness as the reason.

Let me say this again: Teacher evaluation programs do not have to accurately record which teachers are struggling/ineffective to improve aggregate teacher performance and/or outcomes for children. They do not.

But what does require accurate recording of teacher ineffectiveness?

State or federal intervention in handling struggling teachers.
Humiliation of struggling teachers by public shaming.

Now, Rick doesn't believe that the feds can effectively intervene in this kind of delicate problem, and his logic there applies to most states, as well. So, where does that leave us? Either one of the goals of teacher evaluation systems is the humiliation of teachers (individually or collectively), or Rick is simply wrong that crazy-high rates of teachers reported as effective (95%+) are a sign that the systems are not working.

I think that Rick is simply wrong.

**********************************************

This actually takes us to a common problem with our education policy. For a variety of reasons – some better than others – we want an unprecedented amount of transparency in our efforts to improve schools. I don't know of any other field that calls for the public to know how individual workers are evaluated -- either individually or in the aggregate. Nor is franchise or branch office performance made public.

Sure, we all know how a sports team did each game, but no one expects the internal ratings of each member's performance to be reported to the public. Politicians do not release the performance evaluations of their staffs, neither individually nor in the aggregate. Researchers do not publicly release their evaluations of their students or their teams, neither individually nor in the aggregate. Think tanks do not release evaluations of their members, contributors or staffs.

So, why is it that we need to know how many teachers were deemed effective? It is not because we cannot improve outcomes for students without releasing these numbers publicly. I am not happy with the only reason I can think of.

Our Electoral Primary Process as a Measurement Problem

February 29, 2016

There is a lot to complain about with regard to our electoral primary process. My wife's favorite complaint -- other than the weird undemocratic nature of caucuses and the unfairness of the same two states going first every cycle -- is that our votes "never matter." We've never lived in an early primary state, and have rarely even voted as early as Super Tuesday.

Does this mean that our votes have not mattered? Does it mean that our votes have mattered less than those of Super Tuesday primary voters? Well, if we think of the primary process as a measurement problem, I think that the answer is, "No." In this post, let me lay out what that means. Next time, I will explore this view for lessons about measurement in education.

The Construct Being Measured

The goal of the primary process is to select a candidate for the party's nomination for the presidency. The goal is not to find out who is the third most popular candidate. The goal is not to figure out who has the best chance of winning in the general election -- though perhaps it should be. The goal is to figure out who the party's supporters (i.e., voters) want to be the party's nominee.

Measurement Challenges

Every (interesting) measurement problem poses its own set of challenges. I see three major challenges with this one.

1) Voters and potential voters may lack information about the candidates.

2) Candidates may lack the resources they need to inform voters.

3) Voters and potential voters are not distributed homogeneously around the country.

The Key Assumption

There remains a key question that we must answer, because what we assume to be the answer will inform our solution and how satisfied we are with it.

Do the primaries reveal a relatively stable preference of the group, or do the primaries themselves shape and influence the development of a shifting preference?

Nate Silver's original work in 2008 on the Democratic primaries was based on a single insight -- one that I and others noticed as well. While Clinton's and Obama's overall shares of the vote varied considerably from state to state, their support within demographic groups was remarkably stable from state to state. Thus, given just a small set of results, he was able to extrapolate future results quite accurately, just based on the demographic profiles of the states.

This strongly suggests that the underlying trait (i.e., the preference of the group) is stable, and the fluctuations are essentially due to differences in the composition of each sample (i.e., the demographics of each state).
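To make that idea concrete, here is a minimal Python sketch. It is not Silver's actual model, and every group name, support rate and state composition in it is a made-up number for illustration. It simply assumes that each demographic group supports a candidate at the same rate everywhere, so a state's overall result is the composition-weighted average of those rates.

```python
# Hypothetical, made-up numbers: not real polling or exit-poll data.
# Assumed stable support for candidate A within each demographic group.
GROUP_SUPPORT = {"group_x": 0.70, "group_y": 0.40, "group_z": 0.55}


def predicted_share(composition, group_support=GROUP_SUPPORT):
    """Predict a state's overall vote share for candidate A.

    composition maps each group to its share of the state's electorate
    (the shares must sum to 1).
    """
    total = sum(composition.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("composition shares must sum to 1")
    # Weighted average: each group's stable support, weighted by its size.
    return sum(share * group_support[group]
               for group, share in composition.items())


# Two hypothetical states with different demographic mixes.
state_1 = {"group_x": 0.50, "group_y": 0.30, "group_z": 0.20}
state_2 = {"group_x": 0.10, "group_y": 0.70, "group_z": 0.20}

share_1 = predicted_share(state_1)  # about 0.58
share_2 = predicted_share(state_2)  # about 0.46
```

The group preferences are identical in both states, yet candidate A wins the first and loses the second. Nobody changed their mind; only the sample changed.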

What About the Narrative?

This stable underlying preference goes against how we have long thought about the primary process. We have believed that there is a narrative there, with earlier results having a causal relationship to later results. A candidate's early wins or losses lead to -- result in -- later wins and losses. We have believed that candidates rise and fall because the race is changing.

But I do not think this is true. I think that the data suggest otherwise. Instead, I think that this is a measurement problem. We have some limited tools to access the underlying trait, so we have adopted a system to address those weaknesses.

How the Primary Process Addresses the Challenges

The third challenge -- the heterogeneity in the distribution of voters -- is the easiest to address: we take multiple measurements. We have primaries (or caucuses) in every state. This gives us multiple readings.

Our primary process also addresses the first two challenges. We begin with a small handful of small states. We give candidates enough time to inform voters in these states, without requiring them to raise massive sums of money. We give these voters time to learn about all of these candidates, and a clear deadline for when they each need to decide. The first two challenges are easier to address with smaller-population states and a lot of time.

This year, we learned in the early states that Martin O'Malley just was not going to do well. That is, with just a couple of measurements, we saw enough to know that he didn't have it. We saw the same with quite a few GOP candidates, and learned where our questions really lay.

Think of this as being an adaptive test -- the kind of test that adjusts which questions you get next based on your performance on earlier questions, narrowing in on the limits of your knowledge, skills and abilities.
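For readers who have not met adaptive testing, here is a deliberately simplified Python sketch of the idea. Real adaptive tests typically rely on item response theory and allow for lucky guesses and careless errors; this toy version assumes, purely for illustration, that a test-taker answers every question at or below their limit correctly and misses every harder one. The function name and the 1-to-100 difficulty scale are hypothetical.

```python
def adaptive_test(answers_correctly, lowest=1, highest=100):
    """Narrow in on the hardest difficulty level a test-taker can handle.

    answers_correctly(difficulty) -> bool is a hypothetical stand-in for
    administering one question at that difficulty. Assumes that anyone who
    can answer a question can also answer every easier one, and that the
    test-taker can handle at least `lowest`.
    """
    asked = []
    low, high = lowest, highest
    while low < high:
        # The next question depends on the answers to the earlier ones.
        mid = (low + high + 1) // 2
        asked.append(mid)
        if answers_correctly(mid):
            low = mid        # the limit is at least this high
        else:
            high = mid - 1   # the limit is below this level
    return low, asked


# Hypothetical test-taker whose true limit is difficulty 37.
estimate, questions = adaptive_test(lambda difficulty: difficulty <= 37)
```

Seven questions are enough to pin down one level out of a hundred, because each answer rules out about half of the remaining possibilities. The early primaries play a similar role: a few early readings rule out most of the field, so later contests can focus on the questions that are still open.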

Voters in later contests have fewer candidates to choose from. The candidates have fewer rivals with whom to compete for media attention, for money and for voters' attention. Challenges #1 and #2 are not as great with a smaller field.

But wait...!

But does the order of the primaries really not matter?

Let me ask you this: In hindsight, do you think that any of the candidates who have not made it to Super Tuesday could have won with a different ordering of the primaries?

Well, putting a candidate's home state earlier would produce a better result for that candidate, but would that change anything? Wouldn't any good showing just be chalked up to home-court advantage? How well would a candidate have to do in his/her home state to overcome that? Right now, people are saying that if Marco Rubio and Ted Cruz do not win their home states, they are just done. The bare minimum for them has to be a win there. Their home states raise the stakes for them, rather than presenting easy victories. Victories there will not convince anyone of anything.

What about a demographically different state? Well, if O'Malley couldn't get 15% in either Iowa or New Hampshire, does anyone really think he could actually win elsewhere?

In hindsight, can anyone name a single candidate (in any year) who could really have won their party's nomination, but for the ordering of the primaries?

Identifying and Answering the Real Question

The early primaries (and caucuses) narrow the question, winnowing the two fields. We learn whether the race is close, and the process allows voters to focus on the truly viable candidates.

Do the Democrats want Bernie Sanders or Hillary Clinton? Do the Republicans want Donald Trump, Marco Rubio or Ted Cruz? With just four measurements, we've already been able to focus much more tightly than before.

Super Tuesday will be about those questions. And if we do not get definitive answers, we will have more measurements (with more tightly focused questions) to learn more. When we eventually do get definitive answers, we play out the string, let the eventual nominee rack up wins, and build up a delegate majority.

But, but, but...

No, many voters in later states will not get to vote for their absolute favorite candidate while thinking that s/he has a chance to win the nomination. On the other hand, every one of them will get a chance to vote for the eventual winner, perhaps to vote for the last rival to that winner, or at least to signal their disapproval of the eventual winner. If the race stays close, they will be able to decide whether to choose between the last viable candidates or to cast a more symbolic vote -- perhaps through a write-in -- for their old favorite.

You see, the system is not designed to give voters a chance to vote for whomever they want -- though they can write in anyone they want. Rather, it is designed to select one nominee who best reflects the preferences of the party. As an exercise in measurement, it actually works pretty well.

Next time, I will explore how these ideas might be applied in educational measurement.

Coming in 2016

January 4, 2016

The AleDev blog, More Thoughtful, will launch later in 2016.

Complex Variety: Assessment Development, Education and Occasional Other Topics