In the last decade, the states and the federal government have begun using student scores on assessment tests to evaluate public school performance. For teachers and administrators, the stakes are high. In California, teachers in schools with large increases in test scores may be eligible for merit pay increases of as much as $25,000. In other states, years of abysmal test results have resulted in entire school staffs being required to reapply for their jobs. Such high stakes give teachers and other school officials a growing incentive to cheat on school accountability tests.
In "Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating" (NBER Working Paper No. 9413) and "Catching Cheating Teachers: The Results of an Unusual Experiment in Implementing Theory" (NBER Working Paper No. 9414), co-authors Brian Jacob and Steven Levitt use Iowa Test scores from third- through seventh-grade students in the Chicago public schools to develop and test a statistical technique for identifying likely cases of teachers or administrators who cheat by systematically altering student test forms. Their results suggest that such cheating occurred in 3 to 5 percent of the elementary classrooms in their sample, and that relatively small changes in incentives affect the amount of cheating. The authors conclude that school accountability programs based on testing would be well advised to institute safeguards against teacher cheating.
The authors' cheating algorithm relies on the fact that students in cheating classrooms will likely "experience unusually large test score gains in the year of the cheating, followed by unusually small gains or even declines in the following year." Just as important, answers within a cheating classroom will also display unusual patterns, such as identical blocks of answers for many students, or cases in which students answer difficult questions correctly but get easier ones wrong.
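The two signals described above can be illustrated with a toy sketch. This is not the authors' algorithm: the function names, the thresholds, the block length, and the way the two signals are combined are all assumptions chosen for illustration, and the sketch does not correct for the fact that strong students legitimately share correct answers.

```python
from collections import Counter


def score_fluctuation(prev_score, curr_score, next_score):
    """Gain in the current year minus the gain in the following year.

    Inputs are classroom mean scores (e.g. in grade equivalents). A large
    value means a big jump this year followed by a small gain or a decline.
    """
    return (curr_score - prev_score) - (next_score - curr_score)


def largest_shared_block_share(answer_strings, block_len=5):
    """Largest share of students giving the same run of answers at the
    same item positions. block_len is a hypothetical tuning parameter."""
    n_students = len(answer_strings)
    n_items = min(len(s) for s in answer_strings)
    best = 0.0
    for start in range(n_items - block_len + 1):
        # Count how many students share each answer block at this position.
        counts = Counter(s[start:start + block_len] for s in answer_strings)
        most_common_count = counts.most_common(1)[0][1]
        best = max(best, most_common_count / n_students)
    return best


def is_suspicious(prev_score, curr_score, next_score, answer_strings,
                  fluct_threshold=1.0, block_threshold=0.5, block_len=5):
    """Flag a classroom only when BOTH signals are present.

    The thresholds are illustrative assumptions, not values from the papers.
    """
    fluct = score_fluctuation(prev_score, curr_score, next_score)
    share = largest_shared_block_share(answer_strings, block_len)
    return fluct >= fluct_threshold and share >= block_threshold


# Made-up answer sheets: every student shares the block at items 3 to 7.
shared_block_class = ["ABCDABCDAB", "BACDABCDCA", "CCCDABCDBB", "DACDABCDAD"]
# Made-up answer sheets with no shared block of five answers.
ordinary_class = ["ABCDABCDAB", "BADCBADCBA", "CCAABBDDCC", "DABCDABCDA"]

print(is_suspicious(3.0, 4.8, 5.0, shared_block_class))
print(is_suspicious(3.0, 4.8, 5.0, ordinary_class))
```

Requiring both signals mirrors the distinction the text draws between classrooms with large gains and suspicious answer patterns and those with large gains but normal answer patterns.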
In Spring 2002, the Chicago Public Schools invited Jacob and Levitt to identify classrooms suspected of cheating so that they could be included in its regular quality control retest program. The 117 classrooms chosen for retesting fell into three groups. The first, and largest, consisted of classrooms that the algorithm identified as having unusual test score gains and highly suspicious answer patterns. The second, the "good teachers," had large test score gains but normal answer patterns. The third was a randomly chosen control group.
On the closely monitored retest, the classrooms that the algorithm identified as likely cases of cheating had score declines of more than a full grade equivalent. In reading, the "good teacher" classes actually registered small increases on the retest, and the randomly selected group had a slight decline. Chicago Public Schools is investigating the 29 cases with suspicious answer patterns and the greatest test-retest score declines. The authors caution that their method catches only the most obvious of the many ways to cheat on high-stakes tests, and they urge careful consideration of the tradeoffs between the "real benefits of high-stakes testing and the real costs associated with behavioral distortions aimed at artificially gaming the standard."
-- Linda Gorman