Children Left Behind
Some of you might be tired of hearing about mortgage markets, financial crises, how the Fed should react, etc., so here's something different:
New research suggests that the US incentive system built around test scores almost guarantees that academically disadvantaged children do not benefit and may actually be harmed. Policy-makers should take incentive design issues more seriously or follow the lead of many private sector firms and look for other ways to monitor and motivate teachers.
Here's the write-up of the research:
Many U.S. Children are Left Behind by Design, by Derek Neal and Diane Whitmore Schanzenbach, Vox EU: Roughly two decades ago, education policy makers in the United States began to rely more heavily on standardized test scores as performance metrics for teachers and schools. During the late 1980s and through the 1990s, many states adopted test-based accountability systems that spelled out rewards and sanctions for teachers and principals as a function of the performance of their students on standardized tests, and when the federal government adopted the No Child Left Behind Act (NCLB) of 2001, test-based accountability became a nation-wide policy. The proponents of this development cite the need to bring "business practices" into public schools and the need to make schools "data driven" in ways that mirror the practices of private-sector companies.
It is ironic that, during this same period, economists who study the design of jobs and the structure of incentives within private firms began to take more seriously the task of explaining why firms rarely attach performance pay to objective measures of output. Most incentive pay takes the form of raises, promotions, or bonuses related to subjective evaluations of a broad range of qualitative and quantitative information, and Holmstrom and Milgrom (1991) argue that, in many instances, firms find it optimal to pay workers a fixed base wage and monitor their allocation of effort among tasks even if the firm has access to "good" performance measures in a statistical sense. Their key insight is that jobs may involve multiple tasks, and as a result, incentive pay based on any given performance measure can easily lead to undesirable distortions in the amount of effort allocated to various tasks, even if the performance measure is highly correlated with total output. They discuss "teaching to the test" in response to test-based accountability systems as an example of such a distortion, and in recent years, a significant literature has explored the extent to which test-based accountability systems actually create increases in subject mastery or only increases in measured performance on a specific type of exam.
In recent work,[1] we explore a different effect of test-based accountability systems on the allocation of teacher effort, and we find evidence consistent with the hypothesis that test-based accountability systems not only shape decisions of teachers concerning what to teach but also whom to teach. We show that even though advocates of NCLB offered it as a remedy for disadvantaged children who receive poor service from their public schools, the design of NCLB almost guarantees that the most academically disadvantaged children will not benefit from its implementation and may actually be harmed.
Under NCLB, schools are judged against a standard called Adequate Yearly Progress (AYP). NCLB requires all students in certain grade levels to take standardized tests, and each state education agency selects or constructs its own tests and defines the scores required for proficiency in various subjects. The law requires states to set yearly targets concerning the fraction of students in a school that must be proficient, and these AYP targets must increase at regular intervals, ultimately reaching 100% by the year 2014. The current AYP targets for proficiency rates are far below 100% in most states, and the 2014 goal of universal proficiency appears to be one of the least credible parts of the legislation in terms of the expectations of teachers and principals. Thus, the most expedient strategy for many schools is to devote extra attention to students whose recent test results place them near the proficiency standards in their state.
NCLB provides almost no incentive to devote extra attention to students who are far below grade level. Existing work on educational production functions suggests that it is somewhere between extremely costly and impossible to bring a child up several grade levels in a short period of time, and educators know that many of their students will soon be in another school, either because of geographic mobility or the natural progressions from elementary school to middle school or middle school to high school. Further, NCLB provides weak incentives to devote extra resources to the most advanced students because they are going to be proficient regardless.
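The incentive described in the two paragraphs above can be illustrated with a stylized calculation. This is a minimal sketch under assumptions of our own, not the authors' model: suppose a student's end-of-year score equals a baseline expectation, plus any score gain from extra attention, plus normally distributed noise, and the school gets credit only when the score clears a proficiency cutoff. The cutoff, noise level, effort gain and the four baseline scores are all hypothetical numbers.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gain_in_pass_probability(baseline: float, cutoff: float,
                             extra_effort: float, noise_sd: float) -> float:
    """Increase in P(score >= cutoff) when extra attention raises the expected score.

    Hypothetical model: end-of-year score = baseline + effort + N(0, noise_sd**2).
    """
    before = 1.0 - normal_cdf((cutoff - baseline) / noise_sd)
    after = 1.0 - normal_cdf((cutoff - baseline - extra_effort) / noise_sd)
    return after - before

CUTOFF = 200.0    # hypothetical proficiency cutoff
NOISE_SD = 10.0   # hypothetical test-score noise
EXTRA = 5.0       # hypothetical score gain from extra attention

baselines = {"far below": 150.0, "just below": 195.0,
             "just above": 205.0, "far above": 250.0}
gains = {name: gain_in_pass_probability(b, CUTOFF, EXTRA, NOISE_SD)
         for name, b in baselines.items()}

# Rank student types by how much extra attention raises their chance of being proficient.
for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10}: +{g:.3f} in P(proficient)")
```

Under these assumptions the payoff to extra attention is largest for students just below the cutoff and essentially zero at either extreme, which is the "bubble student" pattern; the exact magnitudes depend entirely on the made-up parameters.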
To measure the impact of NCLB on student test scores, we compare the achievement of fifth graders in Chicago who took exams following NCLB with the achievement of comparable fifth graders who took exams just prior to NCLB. For both cohorts, we are able to condition on a set of third-grade scores from tests administered prior to NCLB. We find noteworthy increases in reading and math scores among students in the middle of the achievement distribution, but we also find that the least academically advantaged students in Chicago scored the same or worse following NCLB than one would have expected given the pre-NCLB relationships between third and fifth grade scores. We find weak evidence of systematic gains from accountability among high achievers. While our empirical results provide strong circumstantial evidence that teachers respond to proficiency count systems by shifting attention to students near the proficiency standard, a growing ethnographic literature provides more direct evidence. Teachers report that school principals explicitly tell them to concentrate their efforts on the so-called “bubble” students near the proficiency threshold, and some schools use rather sophisticated software to identify groups of students whose proficiency status is most likely to improve given extra instruction.
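The cohort comparison in the paragraph above can be sketched as follows: fit the pre-NCLB relationship between third- and fifth-grade scores, predict each post-NCLB student's fifth-grade score from his or her third-grade score, and average the prediction errors within bands of prior achievement. This is a simplified illustration that assumes a straight-line relationship; the band cutoffs and all scores are made up only to exercise the code, they do not reproduce the paper's data or estimates, and the paper's actual specification may differ.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def mean_residual_by_band(pre, post, low_cut, high_cut):
    """Average (actual - predicted) fifth-grade score for post-period students, by third-grade band.

    pre, post: lists of (third_grade_score, fifth_grade_score) pairs.
    Predictions use only the pre-period relationship.
    """
    a, b = fit_line([t for t, _ in pre], [f for _, f in pre])
    bands = {"low": [], "middle": [], "high": []}
    for third, fifth in post:
        residual = fifth - (a + b * third)
        band = "low" if third < low_cut else ("high" if third >= high_cut else "middle")
        bands[band].append(residual)
    return {band: mean(vals) for band, vals in bands.items() if vals}

# Made-up scores, for illustration only.
pre_cohort = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
post_cohort = [(40, 49), (45, 54), (55, 68), (60, 73), (75, 85), (80, 91)]
result = mean_residual_by_band(pre_cohort, post_cohort, low_cut=50, high_cut=70)
print(result)
```

A positive average residual in a band means post-period students there scored above what the pre-period relationship predicted; a negative one means they scored below it.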
Our results highlight the breadth of concerns that education policy makers must confront when designing accountability systems built around test scores. Accountability proponents responded to previous concerns about schools tailoring instruction to specific assessments by arguing that policy makers simply need to develop tests that are "worth teaching to," but our results show that, even with perfect assessments, the task of mapping a matrix of scores for a set of students into an overall performance index for the teachers is a daunting design challenge. The current NCLB system shifts effort to students near their state's proficiency level for rather transparent reasons. However, it is difficult to imagine how one could design a set of exams, scales measuring exam performance, and rules for aggregating scores into a performance index that would not provide incentives ex post to allocate relatively more attention to some types of students than others.
In addition, test-based accountability systems are likely to shape the equilibrium assignment of teachers to students. Because states make AYP calculations based on the level of student performance rather than improvements in student performance, teachers in schools that educate primarily advantaged students in homogeneous communities often face relatively little pressure under NCLB while schools that educate large numbers of academically disadvantaged students face the constant threat of failure even if their students are performing well given their baseline skill levels at school entry. There is only suggestive evidence at this point, but it seems quite likely that this feature of NCLB adversely affects the willingness of teachers to teach in disadvantaged schools. NCLB includes the placement of highly qualified teachers in every classroom as an explicit objective, and the AYP system does create pressure to hire and retain teachers based on performance. Nonetheless, AYP's reliance on levels of student performance and not improvements in student performance reduces the number of quality teachers who are willing to work in disadvantaged schools.
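A toy comparison makes the levels-versus-growth point concrete. The two schools, their fall and spring scores, the cutoff and the target share are hypothetical, and the rule "pass if the share of proficient students meets the target" is a simplified stand-in for AYP, which in practice involves state-specific targets and subgroup rules.

```python
from statistics import mean

def share_proficient(scores, cutoff):
    """Level-based measure: fraction of students at or above the proficiency cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)

def mean_growth(fall, spring):
    """Growth-based measure: average within-year gain for the same students."""
    return mean(s - f for f, s in zip(fall, spring))

CUTOFF = 200   # hypothetical proficiency cutoff
TARGET = 0.6   # hypothetical required share of proficient students

# Hypothetical (fall, spring) scores for five students in each school.
schools = {
    "advantaged": ([215, 220, 230, 240, 190], [218, 222, 233, 242, 192]),
    "disadvantaged": ([150, 160, 170, 185, 195], [170, 180, 190, 199, 212]),
}

results = {}
for name, (fall, spring) in schools.items():
    level = share_proficient(spring, CUTOFF)
    growth = mean_growth(fall, spring)
    results[name] = (level >= TARGET, level, growth)
    print(f"{name}: {level:.0%} proficient, meets target: {level >= TARGET}, "
          f"mean growth: {growth:.1f} points")
```

Under a levels rule the school with high-scoring entrants passes despite little growth, while the school whose students gain far more still fails.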
Recently, the federal Department of Education allowed a few states to calculate AYP based, in part, on growth in student achievement during a school year rather than levels only. This is a step in the right direction, but without careful design work, these systems may simply create a different set of unintended effort distortions among teachers. Knowledge does not come on a natural scale, and given any particular scale, gains of a given size may be easier to achieve at some points on the scale than others. Thus, something as apparently pedestrian as the scaling of exams could have significant and unintended consequences for the allocation of teacher effort to different types of students if states do not carefully design value-added versions of AYP.
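The scaling point can be illustrated with a hypothetical reporting scale. Assume, purely for illustration, that latent knowledge is mapped to reported scores by a logistic curve running from 100 to 300 points; this curve is our assumption, not any state's actual scale. One unit of latent learning then moves the reported score by very different amounts depending on where a student starts.

```python
import math

def reported_score(knowledge: float) -> float:
    """Hypothetical reporting scale: logistic map from latent knowledge to 100-300 points."""
    return 100.0 + 200.0 / (1.0 + math.exp(-knowledge))

def scale_gain(start: float, delta: float) -> float:
    """Reported-score gain for the same latent learning gain `delta`, starting at `start`."""
    return reported_score(start + delta) - reported_score(start)

for start in (-3.0, 0.0, 3.0):
    print(f"start {start:+.0f}: +{scale_gain(start, 1.0):.1f} scale points")
```

With this particular curve the same learning gain earns the most scale points in the middle of the distribution, so a growth measure built on it would again reward concentrating effort there; a different curve could shift the advantage elsewhere.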
Because teachers are charged with fostering knowledge, character, and other things that are hard to measure, it is not obvious that incentive systems built around objective performance measures are even desirable strategies for monitoring teachers. Test-based accountability systems have nonetheless enjoyed strong support because school principals and others who monitor the performance of teachers in public schools are seen as agents of large bureaucracies that, especially in cities, have a long record of disappointing results. Nonetheless, our empirical results and the insights gained from research on the economics of organizations suggest that policy makers must tackle difficult design questions in order to construct accountability systems that deliver quality instruction for all students regardless of their aptitude and prior achievement. Policy makers should either take these design issues more seriously or follow the lead of many private sector firms and look for other ways to monitor and motivate teachers.
[1] “Left Behind by Design: Proficiency Counts and Test-based Accountability” NBER Working Paper No. 13293. The discussion of these findings is organized using results in Holmstrom and Milgrom, "Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership and Job Design," Journal of Law, Economics and Organization, vol. 7, January 1991, pp. 24-52.
Posted by Mark Thoma on Tuesday, August 28, 2007 at 04:32 PM in Economics, Policy

Another case of a 'moral hazard' that has been well understood for years and largely ignored by policymakers because it didn't suit the version of the world they desired. Standardized tests were never appropriate instruments for measuring teacher or school performance because (1) their curriculum is unpublished and not known in detail by either teachers or students, (2) they are ranking tools and must exclude items that everyone knows and should know in order to improve score dispersion, and (3) they strongly encourage teachers to emphasize beating the test rather than core subject matter.
I always liked the scouting model of performance assessment myself: Students accumulate 'merit badges' w/ fully known performance criteria and, when enough have been accumulated in various preidentified core categories, they advance in rank; no grade level or calendar driven instruction. That's what standards-based education was supposed to be before it got mucked up by crypto-jocks who couldn't understand anything unless it was a contest w/ identifiable winners and losers.
Posted by: RW | Aug 28, 2007 at 05:35 PM
Is it sound business practice to desert such a large number? Yet we do, and the price is high indeed. The same goes for nutrition: we see the malnourished child grow into a weakened adult requiring more health care throughout his or her lifetime.
Posted by: ken melvin | Aug 28, 2007 at 05:48 PM
I grew up in an area with heavy use of racial tracking. There was much racism in the school system. The focus on testing has been helpful with this problem: it has spotlighted the achievement issues among minority children and focused attention and resources on that.
In a better world, metrics could be discarded. In this one, they serve to focus attention where it needs to be focused.
Testing is very much in the interests of minority inner city youth and NOT in the interest of their school districts. It is not particularly helpful in good school districts or among advantaged children. Given whom it helps, it seems likely to me it will fall by the wayside, and that would be unfortunate.
Posted by: megan adams | Aug 28, 2007 at 06:05 PM
megan: Very important point. I may view standardized tests as invalid in the strict sense, since by their very purpose and design they do not, indeed cannot, measure what school accountability proponents wish them to measure. But the fact that they may be part of a regime that forces school systems to pay attention to those traditionally discriminated against identifies a policy baby that should not be thrown out with the bathwater. That ties into the point I believe ken melvin was making as well: what society can we build if it includes mechanisms that exclude the least of us? If we truly value the individual, then an individual must matter ...or nothing does.
Posted by: RW | Aug 28, 2007 at 06:30 PM
Alfie Kohn and Frank Smith have much to say about education and they need to be listened to.
Kohn would be especially interesting to economist/business types as he uses a lot of research from the business arena.
For example, while most would agree that punishment is not a great motivator, do we also realize that praise can be detrimental as well? Or that giving rewards reduces our desire to do a good job?
Kohn
Frank Smith is a great one to read for understanding the testing frenzy. There was a time before standardized testing, incredibly enough. Thomas Jefferson seemed to be able to reach a high level of erudition without ever filling in a bubble sheet.
Why not also take a look at that great humanitarian and father of modern education, John Amos Comenius (who was asked to be Harvard's first president)?
Comenius
Posted by: Elvis | Aug 28, 2007 at 06:57 PM
RW:
"I always liked the scouting model of performance assessment ..."
Same here! Thanks for making this point. I also like the self-paced aspect of it. Putting children in narrow age cohorts is very limiting and probably aggravates anti-social tendencies in adolescence particularly.
John Taylor Gatto is very thought provoking on the topic of the educational system generally:
http://www.amazon.com/Different-Kind-Teacher-American-Schooling/dp/1893163210
Posted by: STS | Aug 28, 2007 at 08:30 PM
Learn how to read,
Teach how to read,
Reflect on what you have read,
Rinse and repeat.
We do not need tests to make intelligent people, only to learn how to control them.
Posted by: Trainwreck | Aug 28, 2007 at 09:07 PM
I have to respectfully disagree with Megan Adams, though I am sympathetic to her point that minority children were often ignored. In my educational experience, testing serves to stigmatize the disadvantaged, and the resources that follow are minimal. This, in my experience, is the main problem with No Child Left Behind: plenty of penalties, few additional resources. Instead, what happens with low test scores, especially in "mixed" districts, is that middle-class and white families tend to withdraw into "charter" schools. (These represent a well-intentioned reform that, in the part of Southern California where I live, has tended to segregate the schools even more.) Given my personal experience, which includes my work as a teacher and the experience of my children, I would like to ask Megan Adams, again, with all due respect, if she is confident of a well-documented, consistent pattern of low test scores focusing good, reforming attention--especially including adequate additional resources--on minority school children.
Posted by: jac | Aug 28, 2007 at 09:18 PM
I can understand the complaints about 'reading' or 'English'-based standardized tests, as the questions and paragraphs to read are placed within a context of white, upper-class culture. That much was apparent in the 1980s, and some of it has been replaced with purportedly more neutral 'scientific' and/or 'anthropological' narratives. A move toward cultural neutrality, perhaps.
But, the math sections? The logic sections? Are we going to argue that geometry or charts and graphs are culturally insensitive?
The argument that certain children are disadvantaged because they do poorly on these exams misses the point of standardized exams. We, as a society, must identify exceptional students, to move them along, and get them away from more remedial students who hold them back. As a society, we need thought leaders and it is a strategic advantage to cultivate them early.
Remedial students, unfortunately, take up too much energy for the teaching profession, and, are bad investments for the society financing this education. Instead of lamenting 'no child left behind', or any other 'program' which seeks to better, or standardize, or identify educational deficiencies, why not locate the exact root of the real problem...the parents who raise these children.
Posted by: Icarus | Aug 28, 2007 at 11:05 PM
RW - re merit badges.
With you on that one (Ivan Illich rises from the grave). Emphasize the positive: what kids CAN do. But unfortunately the purpose of schools these days is to rank kids, not to teach them.
It is one of my pet peeves about management science. You can't manage what you can't measure, so we manage what we can measure and forget about what we were really trying to achieve. That is happening all over the place.
Posted by: reason | Aug 29, 2007 at 12:21 AM
Although it has a sort of off-color tone, Icarus' comment does raise a good point.
I moved around a lot as a kid and attended schools in some dramatically different socio-economic levels, and I can't say I ever saw a school that wasn't completely representative of the community surrounding it. There were, however, some obvious differences in what schools could offer kids in terms of extracurriculars, which teach the sorts of things that are entirely ignored in standardized tests.
But as flawed as NCLB is, could there be a successful alternative that could achieve the same goal? Is it even within the ability of a free society to ensure that all children receive equivalent education? Can the government do any more than to make sure that the opportunity is there for those that want it? It seems like NCLB is trying to give out handicaps when what's really needed is a level playing field.
Posted by: Dan | Aug 29, 2007 at 01:01 AM
Reason...
Exactly what are we trying to achieve?
Rankings and measurement are quite critical to identifying excellence. How would you suggest we do it otherwise? Shall we all sit in a circle, feel good about ourselves and our efforts, and just vibe together until we agree on who's the "a" student?
Posted by: Icarus | Aug 29, 2007 at 01:03 AM
Rankings and measurement are quite critical to identifying excellence.
Is "identifying excellence" the main purpose of school? I would think instilling and cultivating a desire for excellence would be important.
As for "how to do it otherwise", you might start looking at the Big Picture Schools: a group of charter schools that started in Rhode Island and has since expanded. They seem to be doing a great job without any tests--or a standard curriculum!
I also assume most homeschooling parents aren't about to start ranking their children to find out which is more excellent.
Competitive testing gives the impression that students need to be coaxed to learn. Frank Smith, Alfie Kohn, and Comenius all think that as humans we are natural learners--wanting to make sense of the world from the get-go.
Posted by: elvis | Aug 29, 2007 at 04:38 AM
When individual performance measurement is found to be a bad idea, group performance measurement can be a good alternative.
In education, especially public K-12, the process builds each year on what has been accomplished in previous years. Not all students are created equal, but lawyers have decided that they must be treated as such. Some parents teach their children to respect the teacher, behave, and do their homework. Others disrespect teachers as a group, refuse to have their children take their meds, and treat soccer practice, dance lessons and video games as priorities. And taxpayers, school boards and school administrators rationalize a minimal and flat pay scale and wonder why all but the most dedicated and the untalented teachers leave education for private-sector jobs. And how about drugs and gangs? This is a very messy environment for individual performance measurement and monetary reward systems. Unfortunately, all the public discussion focuses on the classroom teacher, with little consideration of the effect that school boards, administrators, parents and the students themselves have on the success of public schools.
There is no silver bullet here, nor an easy answer. I am not even sure we have a problem. But I do know that this simple-minded focus on teachers is not helpful to improving the quality of public education. Performance measurement and standardized tests seem a reasonably good idea to drive improvement, but the focus of the measurements should be at the school and district levels, not the classroom.
Posted by: Hokie | Aug 29, 2007 at 05:28 AM
Icarus
Exactly what are we trying to achieve?
Yes that is exactly where we disagree. Got it in one.
But surely you understood the bit about merit badges, i.e. Certificates of Competence. They are much more precise than merit rankings which say nothing about what you can do.
In other words education, not credentialism.
Posted by: reason | Aug 29, 2007 at 05:35 AM
NCLB was designed to discredit teachers' unions and public education in general. This was a step on the road toward school vouchers and other programs to support government funding of private schools.
The tests, as has been widely discussed, don't measure anything meaningful, but they do set an impossible-to-meet goal. When the poor schools fail as planned, it becomes permissible to sweep out the old team and bring in the new.
There are, of course, poor schools, but more importantly these schools are in poor neighborhoods. The students suffer from a poor living environment and the schools suffer from a lack of money. How are teachers supposed to overcome all this by dealing with kids for 1/6 of the hours of the year? They can't and the NCLB promoters knew this from the start.
The program was designed to fail. Now, is anyone really interested in improving education? Those in the best school districts (yes, public schools) are doing quite well. They get into good colleges and have successful careers. Why should they or their parents devote more resources to the underprivileged? We live in the YOYO (you're on your own) society. The sociologists can go back to their studies, but if they don't understand the game that's being played, all the evidence in the world isn't going to change policy.
Once again it's all about the rich coming out on top and to hell with everyone else.
Posted by: robertdfeinman | Aug 29, 2007 at 07:43 AM
There is much that is indictable in our approach to education but what tends to bother me most is how much of that approach is tacit. For example, subject-area content aside, standardized achievement tests also measure a fairly specific set of competencies -- secondary or tertiary literacies such as abstractive (keyword) reading, contextual interpretation and filling in, speed reading, etc. -- necessary to successfully complete the test in the allotted time; not coincidentally these literacies are also quite helpful in successfully pursuing a college career (but not necessarily other careers).
I am not speaking theoretically here. I encountered this issue directly as someone responsible for educating college students from Appalachia who in most cases were also the first members of their family (or clan) to attend college; most displayed a high degree of competency in other respects and on various performance measures but consistently struggled to achieve high scores on standardized tests, not infrequently taking a required exam more than once to gain the minimal, passing score. Such exams are clearly also performance measures with their own 'merit badge' but their curriculum is at least partially masked or tacit; i.e., an appropriate 'certificate of competency' would acknowledge both the content area and literacy accomplishments revealed in the test whereas an alternative performance measure would have the student perform a task designed to measure the target competency more directly with fewer confounding variables.
I developed training to help strengthen the necessary test-taking literacies in my students but had to do some primary research myself because -- quite astonishingly (and frankly grotesquely given how high-stakes these tests frequently are) -- there was virtually nothing in the publicly available research literature that specifically investigated and detailed the literacies involved in standardized test-taking. The tacit assumption seemed to be that these were tests of content-area or some specific analytic competency only, that 'cultural bias' had been appropriately accounted for and that, at most, a few test-taking skills would take care of the rest assuming the student possessed the requisite subject-area ability. I spent an entire afternoon at a university library poring through dust-covered monographs from STS, and what I found could basically be boiled down to that; amazing.
Reforms such as NCLB only compound these kinds of problems. Appropriate measures of human performance require powerful theories, really serious work and long-term commitment; wishful thinking, sound bites and a short attention span just don't have the legs.
Posted by: RW | Aug 29, 2007 at 08:42 AM
"I would like to ask Megan Adams, again, with all due respect, if she is confident of a well-documented, consistent pattern of low test scores focusing good, reforming attention--especially including adequate additional resources--on minority school children."
Well in San Francisco all 'reforming attention' on minority children takes as part of its context these low test scores.
I think the problem with not having test scores is that often well-meaning people persuade themselves that there is no problem. They persuade themselves that the children are doing fairly well by standards that are (conveniently) impossible to empirically measure. This then undermines efforts to fund programs targeted at this population.
A more specific example in San Francisco was an effort by the former superintendent (who was run out of town on a rail for political reasons, much to my dismay). She started 'dream schools' for poor minority youth which followed the KIPP school program in many ways: a much, much longer school year in hours, with half-day Saturday classes, school from 8 to 5 Monday through Friday, and one month off in summer. An article in the New York Times Magazine within the last year or so discussed this program and how and why it worked for this population.
The point is that the low scores motivated the search for a solution.
A lack of scores is really a lack of transparency. We need to know where the failures are to deal with them.
Posted by: megan adams | Aug 29, 2007 at 09:11 AM
Dan's comment inspires me to cite the much-studied Finnish example. Finland has excelled in the PISA studies by concentrating on giving specific, targeted help to students with problems. This does not mean expecting equal outcomes, but it does try to place a floor below which students (so long as they have some capability) will not fall.
Posted by: reason | Aug 30, 2007 at 01:13 AM