Summer Slide, Public Policy, and the Replication and Reproducibility Problem
By Mario and Andrew
Is summer learning loss really as big a problem as parents have been led to believe?
Paul von Hippel, Professor of Public Affairs at the University of Texas at Austin, recently challenged this broadly accepted education phenomenon on the grounds that most studies of summer learning, including the most famous, an examination of Baltimore City students, do not reproduce (that is, they do not yield consistent results when reanalyzed using the same input data, computational steps, methods, code, and conditions of analysis). Note that reproducibility is distinct from replication, which means obtaining consistent results across studies aimed at answering the same scientific question with new, independently collected data. von Hippel concluded that the measure of learning that informed these findings, standardized tests, was fundamentally unreliable:
“Many of us—parents, teachers, politicians, even most researchers—take standardized test scores at face value; we interpret scores as though they reflected children’s skills neutrally, like a mirror. But in the 1980s, some scores could give a misleading reflection, like a fun-house mirror. Scores from the 1980s got children in more or less the right order, with more-advanced students ahead of less-advanced kids. But they distorted the distances between children, making some gaps look larger or smaller than they were.”
von Hippel argues that we really don’t know that much about summer learning loss, mostly because we can’t reproduce key studies that are based on outdated test-scoring methods.
It’s intuitive that summer learning interventions work well for kids who are behind, since they give those students a chance to catch up to their peers. But the value of preemptive interventions, those meant to keep on-track learners from sliding backward, is unknown; they may not help close achievement gaps if summer isn’t producing those gaps in the first place.
This has significant public policy implications.
Could the collective resources we allocate to preventing summer slide be spent more effectively? Targeted more strategically? Is this something we should really be concerned with? Or is the counterargument true: that our growing focus on metrics and external validation is leaving our children with precious little time to play and simply be kids for the summer? Are we doing more harm than good?
What’s additionally compelling, and slightly disturbing, about von Hippel’s work is that it arrives amid a much broader problem with research replication and reproducibility (the media has dubbed it a crisis). From economics and psychology to international business science (see the results of the Nature poll of 1,500 research scientists below), long-established and influential findings often fail to replicate as they should (see: the marshmallow test).
Replication is the basis of the scientific method: if we control the variables, we should be able to attribute outcomes to specific causes by varying the inputs one at a time. If that premise no longer holds, it becomes far more difficult to craft sound public policy.
We are particularly interested in medicine, where research advances have increased our life expectancy by 13 years since the 1990s. Recently, though, the validity of a number of medical studies has been questioned.
Amgen, a US biotech company, attempted to replicate 53 high-impact cancer research studies and was reportedly able to replicate only six.
Bayer, a German pharmaceutical company, reported that it was able to replicate only 24 of 67 studies.
John Ioannidis, MD, Professor of Medicine and Statistics at Stanford University and a strong voice in the replication debate, showed that of the 45 most influential clinical studies, only 44% were successfully replicated.
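For a rough sense of scale, the short sketch below tallies the replication rates implied by the figures cited above. It is a back-of-the-envelope calculation only; the counts are those reported in the text, not independently verified.

```python
# Back-of-the-envelope replication rates for the studies cited above.
# The counts are the figures reported in the text, not independently verified.

reported = {
    "Amgen (preclinical cancer studies)": (6, 53),     # (replicated, attempted)
    "Bayer (in-house validation projects)": (24, 67),  # (replicated, attempted)
}

for source, (replicated, attempted) in reported.items():
    print(f"{source}: {replicated}/{attempted} = {replicated / attempted:.0%} replicated")

# Ioannidis reported a rate directly: 44% of the 45 most-cited clinical studies.
print("Ioannidis (highly cited clinical studies): 44% of 45 replicated (as reported)")
```

Even taking the reported counts at face value, the implied replication rates range from roughly one in ten to well under half.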
John Arnold recently highlighted the same phenomenon (a good conversation follows the tweet).
Disagreements persist over both what is driving these problems (e.g., p-hacking) and the extent to which they are actually something we should be concerned about.
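P-hacking is easier to see in a toy simulation than to define in the abstract. The sketch below is a hypothetical illustration, not drawn from any of the studies discussed here: it generates 20 outcome measures that are pure noise, compares each against a control group, and counts how often at least one comparison clears the conventional p < 0.05 bar by chance alone.

```python
# Toy illustration of p-hacking: test 20 "outcomes" that are pure noise
# against a control group and count how often at least one comparison
# looks "significant" at p < 0.05 purely by chance.
import random
from statistics import mean, stdev
from math import sqrt, erf

def approx_pvalue(a, b):
    """Crude two-sample z-approximation of a p-value (illustrative only)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = abs(mean(a) - mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

random.seed(0)
experiments, false_positives = 1000, 0
for _ in range(experiments):
    control = [random.gauss(0, 1) for _ in range(30)]
    # 20 outcome measures, none of which truly differs from control
    pvals = [approx_pvalue([random.gauss(0, 1) for _ in range(30)], control)
             for _ in range(20)]
    if min(pvals) < 0.05:
        false_positives += 1

print(f"At least one 'significant' result in {false_positives / experiments:.0%} of experiments")
```

In runs like this, a large share of the all-noise experiments produce at least one “significant” result, which is the heart of the concern: flexibility in what gets tested and reported can manufacture findings that will not replicate.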
Maybe there is something more fundamental happening here.
Something related to this unique moment in history and its evolving cultural norms, what some refer to as a “post-truth” era. There is a constant battle to shape public perception and the interpretation of data, one that relies less on hard facts and more on how quickly data can be amplified. In basic science and academic medicine, there is pressure to generate and publish data in scientific journals as quickly as possible, in the hope not only of owning revolutionary findings, and the economics of commercializing them, but also of shaping the debate about what those findings mean. That pressure may have contributed to a number of falsified publications and to studies tainted by external interests.
How can we build stronger validity into our research processes? Some argue that we should replicate and reproduce results until the observed phenomenon is validated. We spend hundreds of billions of dollars on scientific research each year, often with little meaningful progress to show for it. American taxpayers, for instance, rightly ask why we have not yet cured cancer or AIDS. It certainly is not for lack of trying or lack of funding. Yet something needs to change. Ideas abound.
We should also recognize that we are operating in a society where much of what our existing infrastructure allows us to study has already been discovered. Until we build new tools, invent new technologies, and develop new learning methodologies, we will continue to operate close to our current level of knowledge. The truth will continue to be heavily influenced by clicks, popularity, and a race-to-be-first mentality. While we work toward greater human and machine abilities to confirm core truths, perhaps it makes sense to engage our hearts (not at the expense of our heads, but in equal proportion). In some instances, taking moral action may be our best course.