Recent research has shown that it may be more difficult to observe the strongest teaching described in observation rubrics in schools and classrooms with lower levels of achievement and the symptoms that often accompany low achievement, like high rates of poverty, more behavioral issues and more transient populations. That makes sense. However, I completely disagree with the policy conclusion that observation scores need to be adjusted.
Yes, it is likely that there is some systemic bias within teacher observation practices. The possibility also exists that teachers in low performing schools have a more difficult job. This makes sense although there are plenty of examples where this doesn't play out in practice, largely due to the variance of individual observers or even the cultural expectations of an entire district. Score adjustments are not the solution.
Introducing a score adjustment explicitly lowers the expectations for low-income, minority children. The adjustment described is not actually based on prior testing history; it is based on demographic information, which correlates with achievement but does not determine achievement. This is the opposite of what we aspire to as an education system, which is to achieve a system where who your parents are, which zip code you were born in and the color of your skin do not determine how well you perform and how you are treated. What is being communicated is that black and brown children can't have classrooms like white children. If we not only believe that but systematize that belief into how we observe classrooms, there is no reason to believe teachers and students will do anything other than meet those lower expectations. That returns us to where we are already: students meeting and exceeding expectations only to learn they are woefully unprepared to compete.
There are biases in all measures. Knowing what those biases are, or are likely to be, is an important part of understanding data. If we aspire to measure teachers perfectly, then measurement will continue to improve, but if we hold an expectation requiring perfect, unbiased measurement, then we will be paralyzed into measuring nothing and making no decisions. The proper comparison for any measure is not perfection, where any measure will come up short, but the measure that came before. In that respect, value-added metrics are head and shoulders above the degrees and experience of the past. Rigorous observation is in a different universe than the non-differentiating exercise in paperwork of a decade ago. This is called progress, and it is progress not just for students but for teachers, who finally have access to information about their performance that doesn't rely entirely on something they can't change (their age), something that is both costly and does not have a track record of helping them improve (degrees) and pure politics (whatever we called human capital decision making when there was very little information to use to make decisions).
If we do believe that it is harder to teach effectively in schools where it has traditionally been considered more difficult to teach then a better policy conclusion is to pay the teachers in those schools more than the teachers in other schools. Those schools need better teachers to break even. The teachers in those schools are likely performing better than they appear. And it starts to equalize the incentives in terms of recruitment.
If we do believe there is bias in the observers and it has nothing to do with how difficult it is to teach effectively then we need to do a better job of training and supporting observers. Rubrics are flexible and good judgment is necessary. There are many ways to arrive at an effective lesson where students learn what they're supposed to learn. In my experience many observers focus so heavily on the teacher action and so heavily on the easily observable that they sometimes miss the outcome, the learning, that should drive scoring on an effective rubric. Over time this will likely improve as rising leaders, who have more experience with observation and are more likely to have taught while engaging with teacher evaluation, take over leadership roles with a deeper understanding of instruction and how to interpret rubrics.
And if we believe that observation needs to provide a more systemic leveling of the playing field that protects teachers from all the quirks and biases we can think of, well that has been invented already and it is called value-added. Value-added is the answer to fairness. Teachers with students all over the achievement map do well on value-added metrics. By using students' full testing history and multiple years of data the metrics are stable. By taking into account error they address different class sizes and contexts. By applying the same process to all assessments and all teachers it takes out the variance of the individual observer. If we want the closest thing to perfection that we have, it is value-added and we already have it. If we want observation to serve in a similar capacity as value-added then we need to layer on a similar amount of protective complexity and invest even more heavily in norming and training for observers. That means being okay with a major increase in resources, both time and money, and also being okay with a level of complexity that is difficult to communicate and undermines buy-in.
Overall this feels like missing the forest for the trees. Yes, almost undoubtedly observations are covered in bias, just as all measures are. But what we currently have is a major step forward and likely even stronger qualitative feedback systems than what most of us have ever experienced in non-teaching jobs. The idea isn't to provide perfect fairness to teachers, although we strive for that, or even to help teachers improve, although that is central to the whole strategy. The idea is to help all students learn. Lowering expectations for the students who need us the most is not a way to do that.