For more than two decades, school systems in the US and around the world have introduced new accountability and incentive systems for public school educators that rely on the test scores of students as performance signals for educators. There is now a large body of empirical evidence on the effects of these assessment-based accountability and performance pay systems.
Existing systems often suffer from two design flaws: They rely on performance targets or standards that are vulnerable to manipulation, and they use exams that can be easily compared to past formats in ways that encourage coaching or teaching to the test. (References removed.)
In Barlevy and Neal (forthcoming), we describe an assessment-based incentive system for educators built around educator performance metrics that are invariant to the scaling of student assessments. These metrics are also relative-performance metrics: they express educator performance not relative to a statistical target but relative to the performance of other educators who form an appropriate comparison set.
We call this scheme Pay for Percentile. It is built around a performance metric called the Percentile Performance Index (PPI).
We show that it may be possible to elicit effective teaching by paying teams of educators performance bonuses that are proportional to PPI metrics. Further, our basic result holds in the presence of instructional spillovers, peer effects, and heterogeneity in rates of student learning within classrooms.
The following algorithm describes how one might calculate PPI scores for teams of teachers that work together to teach one class, eg fifth-grade maths, in the same school.
- Step One: Consider all students in a large school district or state who are taking the same class, eg fifth-grade maths. Place each student in a comparison set with students who are similar in terms of their expected achievement given their past academic performance, their demographic characteristics, and the characteristics of other students in their school or classroom.
- Step Two: At the end of the year, when the fifth-grade maths assessment results are reported, rank all students in each comparison set based on their end-of-year scores, and assign each student a percentile equal to the fraction of students in her comparison set who performed the same or worse.
- Step Three: Over all the students in a given school who are taking a particular subject, eg fifth-grade maths, form the average of their percentile scores. This average is the PPI score for the team of fifth-grade maths teachers at this school. It reflects how often students in a given course in a given school perform as well as or better than comparable students elsewhere.
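The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from Barlevy and Neal: the function names, the record layout, and the toy scores are all hypothetical. Step One is taken as already done (each record carries a comparison-set label), and each student is counted in her own comparison set when the percentile is formed.

```python
from collections import defaultdict


def percentile_within_set(scores):
    """Fraction of the comparison set scoring the same or worse (Step Two).

    Assumption: the student herself counts as a member of her comparison set.
    """
    n = len(scores)
    return [sum(1 for other in scores if other <= s) / n for s in scores]


def ppi_by_school(records):
    """Return {school: PPI} from (school, comparison_set, score) records.

    Assumption: Step One is already done, so every record arrives with a
    hypothetical comparison-set label attached.
    """
    # Group students by comparison set.
    by_set = defaultdict(list)
    for school, comparison_set, score in records:
        by_set[comparison_set].append((school, score))

    # Step Two: percentile of each student within her comparison set.
    school_percentiles = defaultdict(list)
    for members in by_set.values():
        pcts = percentile_within_set([score for _, score in members])
        for (school, _), pct in zip(members, pcts):
            school_percentiles[school].append(pct)

    # Step Three: average the percentiles of each school's students.
    return {s: sum(p) / len(p) for s, p in school_percentiles.items()}


# Toy data (invented for illustration): two schools, two comparison sets.
records = [
    ("A", "low", 50), ("B", "low", 40), ("A", "low", 45), ("B", "low", 60),
    ("A", "high", 80), ("B", "high", 90),
]
ppi = ppi_by_school(records)

# Rescaling the assessment with any increasing transformation leaves
# every rank, and therefore every PPI score, unchanged.
rescaled = [(school, cset, score ** 3) for school, cset, score in records]
ppi_rescaled = ppi_by_school(rescaled)
```

Because only ranks within comparison sets enter the calculation, `ppi_rescaled` matches `ppi`, which is the sense in which the metric is invariant to the scaling of the assessment.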
A large literature in economics explores how properly seeded contests can be used to create incentive systems. Pay for Percentile generalises these results to a setting where workers (educators) produce many different outputs (achievement growth for many students) simultaneously by allocating time among several different tasks (lecturing, tutoring, lesson planning, etc). Because all contests are properly seeded, teachers respond by allocating efficient effort to all tasks that foster achievement for the set of students in a given class.
Pay for Percentile uses seeded competition to create performance metrics. PPI scores implicitly summarise the outcomes of many different simultaneous contests among students, and every contest that one fifth-grade maths team wins is a contest that another team lost. By construction, PPI scores do not tell policymakers how often students in a given school or classroom reached some pre-determined achievement target. Rather, PPI scores tell policymakers how often students in a given class or school outperformed students in other schools that began the year as their academic peers.
Because every contest between matched students in different schools must have one winner and one loser by construction, this approach also eliminates the Lake Wobegon effects that plague many accountability and performance pay systems. Neal (2011) argues that often, in target-based systems that permit the possibility that all educators can be judged satisfactory, almost all educators are deemed satisfactory whether they deserve to be or not.
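A small numerical check makes the fixed-sum property concrete. The sketch below uses invented scores and hypothetical helper names, and assumes no ties within the comparison set: when every score rises by the same amount, the average percentile in the set does not move, so uniformly higher reported scores cannot lift every team's PPI at once.

```python
def percentiles(scores):
    """Fraction of the comparison set scoring the same or worse."""
    n = len(scores)
    return [sum(1 for other in scores if other <= s) / n for s in scores]


def mean(values):
    return sum(values) / len(values)


# Invented end-of-year scores for one comparison set (no ties assumed).
baseline = [52.0, 61.0, 47.0, 70.0, 58.0, 66.0]
# The same students after every score is inflated by 20 points.
inflated = [s + 20.0 for s in baseline]

mean_baseline = mean(percentiles(baseline))
mean_inflated = mean(percentiles(inflated))

# With n distinct scores the percentiles are 1/n, 2/n, ..., n/n,
# so their mean is (n + 1) / (2 * n) whatever the scores are.
expected_mean = (len(baseline) + 1) / (2 * len(baseline))
```

Here `mean_baseline`, `mean_inflated`, and `expected_mean` all agree, so any gain in one student's percentile must be offset by losses among her matched peers.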
As we note above, since Pay for Percentile involves assessments that avoid repeated items and predictable formats, these assessments will not provide much information about secular trends in student achievement. However, if policymakers use a separate no-stakes assessment system to measure student achievement, they eliminate incentives for educators to engage in the "teaching to the test" behaviours that often inflate reported achievement trends derived from high-stakes testing systems.
This approach may seem bizarre to many in the education testing and policy community. To many, it seems intuitive that, if educators should be held accountable for what their students learn, education officials should create measures of student achievement and educator performance using a single assessment system. However, the job of placing student assessment results on modern psychometric scales does not need to be and should not be part of the process of building accountability and incentive systems for educators. Whenever policymakers insist that these tasks be intertwined, they are only guaranteeing that education officials will perform both tasks poorly.
- Barlevy, Gadi and Derek Neal (2011), “Pay for Percentile”, American Economic Review, forthcoming.