
Why Is There a Disconnect Between Research and Practice and What Can Be Done About It?

What characteristics of teacher candidates predict whether they’ll do well in the classroom? Do elementary school students benefit from accelerated math coursework? What does educational research tell us about the effects of homework?


These are questions that I’ve heard over the past few years from educators who are interested in using research to inform practice, such as the attendees of researchED conferences. These questions suggest a demand for evidence-based policies and practice among educators. And yet, while the past twenty years have witnessed an explosion in federally funded education research and research products, data indicate that many educators are not aware of federal research resources intended to support evidence use in education, such as the Regional Educational Laboratories or the What Works Clearinghouse.

Despite a considerable federal investment in both education research and structures to support educators’ use of evidence, educators may be unaware of evidence that could be used to improve policy and practice. What might be behind this disconnect, and what can be done about it? While the recently released Institute of Education Sciences (IES) priorities focus on increasing research dissemination and use, their emphasis is mainly on producing and disseminating research: the supply side.

Little Kids, Big Progress: New York Times’ Head Start Coverage

It’s not often that early childhood stories make the front page of the New York Times. But this week, the paper featured an article by Jason DeParle about Head Start, a federal early childhood program that serves nearly 900,000 low-income children, and how the quality of the program has improved over the past several years.

DeParle’s article is a great example of journalism that moves past the common (and relatively useless) question of “does Head Start work?” and goes deeper into exploring how the program has improved its practices, including changes related to coaching, teacher preparation and quality, use of data, and the Designation Renewal System (all of which Bellwether has studied and written about previously). This type of reporting contributes to a more productive conversation about how to create high-quality early learning opportunities for all children that can inform changes to early childhood programs beyond Head Start.


As DeParle points out and the data clearly show, while there is wide variation among individual programs, overall the quality of teaching in Head Start is improving. But while this trend is undoubtedly positive, it raises some questions: What effect will these changes ultimately have on children’s academic and life outcomes? And how can Head Start programs improve their content and design to serve children even better?

Next month, Bellwether will release a suite of publications that tries to answer those questions. We identified five Head Start programs that have evidence of better-than-average impact on student learning outcomes and thoroughly examined these programs’ practices to understand how they contributed to their strong performance. I visited each program, conducted in-depth interviews with program leadership and staff, reviewed program documents and data, hosted focus groups with teachers and coaches, and observed classroom quality using the Classroom Assessment Scoring System, CLASS (the measure of teaching quality on which DeParle notes Head Start classrooms nationally have shown large quality improvements). By better understanding the factors that drive quality among grantees and identifying effective practices, we hope to help other programs replicate these exemplars’ results and advance an equity agenda.

As the New York Times front page recently declared, Head Start’s progress offers a ray of hope in a dysfunctional federal political landscape. But there is still room for progress. Looking at what high-performing programs do well can help extend the reach and impact of recent changes to produce even stronger outcomes for young children and their families.

All I Want for Christmas Is for People to Stop Using the Phrase “Education Reform”

In a recent opinion piece at The 74, Robin Lake casts off the label of education reformer, arguing that “to imply that they are some monolithic group of reformers is ridiculous.” I agree, not so much because education reform has lost its meaning but because it never had a single definition in the first place. At the heart of reform is an acknowledgement that the educational system isn’t serving all kids well, but agreeing that the system could be improved is far from agreeing on how to get there.

To some people, “education reform” is about holding schools and districts accountable for student outcomes, which can be viewed either as a means of ensuring that society doesn’t overlook subgroups of students or as a top-down approach that fails to account for vast differences in school and community resources. To others, education reform is shorthand for increasing school choice, or requiring students to meet specific academic standards to be promoted or graduate from high school, or revising school discipline practices that disproportionately impact students of color. Each of these ideas has supporters and detractors, and I suspect that many people who are comfortable with one type of reform vehemently disagree with another.

To take a specific example, consider teacher evaluation reform. One challenge in debating this particular reform is that there are multiple ways teacher evaluation could change outcomes: one is by providing feedback and targeted support to educators; another is by identifying and removing low-performing teachers. So even when “education reform” types favor a policy, they might have very different views on the mechanisms through which that policy achieves its goals. In the case of teacher evaluation reform, the dueling mechanisms created trade-offs in evaluation design, as described by my Bellwether colleagues here. (As they note, in redesigning evaluation systems, states tended to focus on the reliability and validity of high-stakes measures and the need for professional development plans for low-performing teachers, devoting less attention to building the capacity of school leaders to provide meaningful feedback to all teachers.)

I personally agree with those who argue that teacher evaluation should be used to improve teacher practice, and I have written previously about what that might look like and about the research on evaluation’s role in developing teachers. In a more nuanced conversation, we might acknowledge that there are numerous outcomes we care about, and that even if a given policy or practice is effective at achieving one outcome — say, higher student achievement — it might have unintended consequences on other outcomes, such as school climate or teacher retention.

Instead of broadly labeling people as “education reformers,” we need to clearly define the type of reform we’re discussing, as well as the specific mechanisms through which that reform achieves its intended goals. Doing so provides the basis for laying out the pros and cons of not just the overall idea, but of the policy details that transform an idea into action. Such specificity may help us avoid the straw man arguments that have so often characterized education policy debates.

How an East Coast/West Coast Hip Hop Rivalry Helped Us Find Evaluation’s Middle Ground

Everyone loves a good rivalry. The Hatfields vs. the McCoys. Aaron Burr vs. Alexander Hamilton. Taylor Swift vs. Katy Perry.

As evaluators, we’re partial to Tupac vs. Biggie. For the better part of three decades, these rappers from opposing coasts have remained in the public eye, recently reemerging with the release of a television series about their unsolved murders. Interestingly, their conflict about artistry and record labels mirrors a conflict within evaluation’s own ranks around a controversial question:

Can advocacy be evaluated?


On the East Coast, Harvard’s Julia Coffman acknowledges that evaluating advocacy can be challenging, thanks to the unique, difficult-to-measure goals that often accompany these efforts. Nevertheless, she contends, these challenges can be mitigated by the use of structured tools. By using a logic model to map activities, strategies, and outcomes, advocates can understand their efforts more deeply, make adjustments when needed, and, overall, reflect upon the advocacy process. This logic model, she claims, can then become the basis of an evaluation, and data collected on the model’s components can be used to evaluate whether the advocacy is effectively achieving its intended impact.

In contrast to the East Coast’s structured take, West Coast academics refer to advocacy as an “elusive craft.” In the Stanford Social Innovation Review, Steven Teles and Mark Schmitt note the ambiguous pace, trajectory, and impact related to the work of changing hearts and minds. Advocacy, they claim, isn’t a linear engagement, and it can’t be pinned down. Logic models, they claim, are “at best, loose guides,” and can even hold advocates back from adapting to the constantly changing landscape of their work. Instead of evaluating an organization’s success in achieving a planned course of action, Teles and Schmitt argue that advocates themselves should be evaluated on their ability to strategize and respond to fluctuating conditions.

Unsurprisingly, the “East Coast” couldn’t stand for this disrespect when the “West Coast” published their work. In the comment section of Teles and Schmitt’s article, the “East Coast” Coffman throws down that “the essay does not cite the wealth of existing work on this topic,” clearly referring to her own work. Teles and Schmitt push back, implying that existing evaluation tools are too complex and inaccessible and “somewhat limited in their acknowledgement of politics.” Them’s fighting words: the rivalry was born.

As that rivalry has festered, organizations in the education sector have been building their advocacy efforts, and their need for evidence about impact is a practical necessity, not an academic exercise. Advocacy organizations have limited resources and rely on funders interested in evidence-based results. Organizations also want data to fuel their own momentum toward achieving large-scale impact, so they need to understand which approaches work best, and why.  

A case in point: In 2015, The Collaborative for Student Success, a national nonprofit committed to high standards for all students, approached Bellwether with a hunch that the teacher advocates in their Teacher Champions fellowship were making a difference, but the Collaborative lacked the data to back this up.

Teacher Champions, with support from the Collaborative, were participating in key education policy conversations playing out in their states. For example, in states with hot debates about the value of high learning standards, several Teacher Champions created “Bring Your Legislator to School” programs, inviting local and state policymakers into their classrooms and into teacher planning meetings to see how high-quality, standards-driven instruction created engaging learning opportunities and facilitated collaborative planning.

But neither the Collaborative nor the teachers knew exactly how to capture the impact of this work. With Teacher Champions tailoring their advocacy efforts across 17 states, the fellowship required flexible tools that could be adapted to the varied contexts and approaches. Essentially, they needed an East Coast/West Coast compromise inspired by Tupac and Biggie and anchored by Coffman, Teles, and Schmitt.