The importance of empirical evidence in choosing a social impact career
By Paul Healy and Phil Dearing
At Second Day, we try to home in on organizations that are both impactful and offer meaningful work for our job seekers, but deciding where to apply and ultimately where to work is entirely up to you.
This blog post, an interview with Paul Healy, J.D. candidate at Yale Law with an MSc in Economics for Development, is for those of you who are empirically minded and looking for optimal impact. Paul gives us some general principles for prioritizing different job opportunities in the social impact space, as well as some challenging and meaningful questions to ask potential employers.
Hi Paul, can you let our readers know briefly who you are and why the topic of social impact effectiveness is important to you?
Sure! I did a B.A. at Georgetown in Classics and Economics, where I became friends with the wonderful founders of Second Day. I then worked at McKinsey for two years in Washington, D.C. I think I got lucky in consulting, in the sense that I got to work on a few big public problems during my first year (public transportation, public debt, and private infrastructure investing) that helped me realize I want to spend my career thinking about economic development in some way. So, I went to do a master’s degree in development economics to gain the skills I would need to think about these problems with more formality and structure. And I’m currently in my first year of law school, where I plan to continue thinking about development. I think the disciplines of economics and law can be really complementary in understanding and advancing human welfare.
We spent a lot of time in my master’s program learning about and implementing empirical methods in economics, so I hope I can provide some simple advice on how to think about empirical evaluation and apply it to one’s career choices.
So, why is it important to think empirically about the effectiveness of social interventions or organizations?
There are two facts that highlight the importance of taking effectiveness seriously. First, most social interventions seem to be ineffective—by this I mean they do not produce better outcomes than an alternative of “nothing.” Second, even among social interventions with proven impact, there can be orders-of-magnitude differences in cost-effectiveness: some organizations achieve impact only at great cost, while others achieve it far more efficiently (for just one example in the education sector, see page 10 of this paper). If we take these two facts together, we can see right away that in allocating our labor to a socially minded organization, we run the risk of not doing much good at all, or of doing much less good than we could with a more accurate perspective on effectiveness.
I saw the economist Esther Duflo give a lecture at Yale recently, in which she similarly acknowledged that “it is a sad truth of our profession, as development economists, that most social interventions we look at are not effective.” Yet, she followed up by saying that she loves her job because of the upside: she gets to amplify the solutions that do work. I think we should all take a similar lens to our searches for socially beneficial jobs: thinking critically about impact can often be a real downer, but when we find something that is effective, we have uncovered an incredible chance to dedicate our time to something that we know improves human welfare, and I think this is pretty awesome!
Now, for most of us, empirical effectiveness is just one of many criteria to take into account—we want jobs that build our own skills and make us feel good on a daily basis. This is why most people I talk to are not interested in “earning to give,” meaning working in finance and donating 90% of one’s earnings to empirically effective charities. So, you should be open with yourself about what priorities you are balancing in your job search, and where effectiveness falls on that list.
What exactly do you mean by “effectiveness”?
To better define effectiveness in social impact, it is often easier to start by homing in on the ways that organizations can be ineffective. I think social organizations can be ineffective in two ways. First, interventions can be conceptually ineffective: even if a particular action fully achieves its aims, it will not improve human welfare by much at all. For instance, some argue that philanthropic donations to Ivy League universities or to the Make-a-Wish foundation might fit this description.
Second, interventions can be empirically ineffective: the intervention may have a lofty, high-impact mission, but when put to the test, does not outperform an alternative of “nothing.” Just a few examples in this category include (i) microfinance (long praised as a ladder out of poverty, but evidence suggests it does not reduce poverty—article here and a helpful video here with Rachel Glennerster, Chief Economist at DFID), (ii) TOMs shoes (World Bank researchers found insignificant effects on foot health), and even (iii) workplace wellness programs in the U.S. (this New York Times article also explains very well the importance of randomized evaluations).
How do we know if an intervention is effective or not? What are the optimal methods of evaluation?
As a heuristic, three words will suffice: randomized controlled trials (RCTs). Let me try to explain what I mean. If we do X, and observe in the world that outcome Y improves, we want to try to attribute causality to intervention X—otherwise, how would we know whether it’s worth investing our time and resources in X? In our attempts to identify the causality of X on Y, we must always be wary of confounding factors (i.e., things other than X that might have been responsible for the increase in Y). RCTs—studies in which an intervention is administered to a randomly selected subset of a population but not to the rest—eliminate confounding by design: randomization ensures that, in expectation, the treated and untreated groups differ only in the intervention itself. Unfortunately, two of the most common ways that non-profits report their impact show the potential dangers of confounding factors:
A before-and-after comparison: Unfortunately, lots of things change over time. So comparing the same group of treated individuals before and after treatment doesn’t provide much information about effectiveness. Just imagine offering a job placement service on the eve of the Great Recession versus just after the recession.
A comparison to “similar” individuals outside of the intervention: For example, a senior executive laments America’s ineffective job training programs because she had learned that, in a particular Midwestern town, individuals enrolled in a job placement service had worse employment outcomes than those not enrolled. Here is the flaw in this executive’s thinking: in the real world, there is usually a reason why some people end up receiving an intervention while others don’t. People who can find employment without the help of a placement service probably wouldn’t seek out job placement services. Note that the reverse can just as easily occur: imagine a non-profit that aims to reduce recidivism advertising that its participants are much less likely to end up back in prison than the national average. Most likely, people who are particularly committed to becoming law-abiding citizens and avoiding recidivism choose to seek out such an organization, so this observation might hold regardless of the organization’s effectiveness. Even when individuals appear to be “similar” on the surface, it’s nearly impossible to verify that they are in fact similar on such intangible qualities as motivation, commitment, or interest.
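To make the selection problem concrete, here is a minimal simulation of the job placement scenario above (all numbers are hypothetical, not drawn from any real program): motivation independently raises employment and also drives who enrolls, so a naive enrolled-vs.-not-enrolled comparison is badly biased, while randomizing enrollment recovers the true effect.

```python
import random

random.seed(0)

# Hypothetical numbers for illustration only.
# A job placement program with a true effect of +5 percentage points
# on the probability of employment. "Motivation" independently raises
# employment AND makes people less likely to enroll in the program.

TRUE_EFFECT = 0.05
N = 200_000  # people per simulated study

def employed(motivated, enrolled):
    """Draw one person's employment outcome."""
    p = 0.40 + (0.30 if motivated else 0.0) + (TRUE_EFFECT if enrolled else 0.0)
    return random.random() < p

def mean(xs):
    return sum(xs) / len(xs)

def naive_study():
    """Self-selection: motivated people rarely enroll (20% vs. 80%)."""
    enrolled, not_enrolled = [], []
    for _ in range(N):
        motivated = random.random() < 0.5
        enrolls = random.random() < (0.2 if motivated else 0.8)
        (enrolled if enrolls else not_enrolled).append(employed(motivated, enrolls))
    return mean(enrolled) - mean(not_enrolled)

def rct():
    """A coin flip, not motivation, decides who gets the program."""
    treated, control = [], []
    for _ in range(N):
        motivated = random.random() < 0.5
        enrolls = random.random() < 0.5
        (treated if enrolls else control).append(employed(motivated, enrolls))
    return mean(treated) - mean(control)

print(f"naive comparison: {naive_study():+.3f}")  # around -0.13: program looks harmful
print(f"RCT estimate:     {rct():+.3f}")          # around +0.05: the true effect
```

Even though the simulated program genuinely helps, the naive comparison makes it look harmful, which is precisely the mistake the executive in the example makes.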
Are RCTs really the end-all-be-all, only way to prove an organization’s effectiveness?
As I mentioned above, RCTs are a good heuristic. And over the last couple of decades, there has been an RCT revolution in the worlds of social policy and economics (for more on this, see the books Poor Economics and Randomistas). But all heuristics oversimplify reality. There are indeed many valid non-experimental methods—often referred to as “quasi-experimental methods” or “natural experiments”—in which researchers find some variation in the real world and put forth an argument that this variation occurred as if it were a randomized experiment (for a great, nuanced discussion of these methods, see this paper). And, on the other hand, there are many critiques of the RCT revolution that have merit:
Ethics: Some argue that it’s unethical to conduct experiments on human subjects in which we withhold potentially life-changing solutions from people in poverty or experiencing other kinds of suffering, or expose vulnerable populations to potential harm (see chapter 4 of this paper).
Macro solutions: It’s hard to test out macro-level solutions (i.e., at the level of national or international institutions) with an RCT, and we wouldn’t want to blind ourselves to making progress on the big questions involving democracy or governance reform. I personally tend to think that even these big-picture solutions usually can be empirically validated through smaller interventions that we can assess with causality, even if not through an RCT (for instance, here is a great non-experimental assessment of e-procurement, which is one tangible way countries can “make government more transparent”).
Scalability: Another critique—at the cutting edge of development economics—is that an intervention that proves effective in a small-scale RCT may work very differently when scaled up and administered across a larger region.
But conducting RCTs is just not feasible for many small organizations. So doesn’t your view unfairly create a bias toward large organizations that can afford to carry out such studies?
I hope not! I think there is a difference between ineffective and not-yet-proven-to-be-effective. For instance, many early-stage organizations do not have the resources to conduct a randomized trial, but one could productively spend time working for such an organization and think of it as “social impact R&D,” provided that the organization is committed to taking its impact seriously once it reaches a large enough scale.
An even deeper concern in this vein is generalizability (also referred to as external validity). It would be enormously wasteful to conduct randomized experiments on literally every social intervention ever carried out. On this issue, I highly recommend this piece by Rachel Glennerster.
Why do you think this problem of discerning effectiveness exists in the social sector? Why isn’t it the case that the most effective organizations expand while ineffective ones contract or drop out of the social sector?
When seeking private sector employment, it’s generally easy to see whether a company is successful. But improvements in human welfare are much harder to verify than profitability. And more importantly, the incentives for growth and compensation in the social and public sectors are not currently tied to impact. Social and public sector organizations—almost definitionally—provide services to those who cannot pay full price for them. So, social sector organizations grow by amassing donations and growing their operations. And donors—both wealthy philanthropists and regular people trying to be generous around the holidays—have not historically used an empirical lens to allocate their resources. I think Angus Deaton, Nobel economist, summed up this phenomenon well when he wrote in his book The Great Escape that “the need to do something tends to trump the need to understand what needs to be done. And without data, anyone who does anything is free to claim success.”
Okay, this is all interesting, but how can regular people—who might not have the time to keep up with academic research—ensure that they’re making accurate assessments about impact? Can you give some practical advice for users of Second Day who are considering different job opportunities?
First, you can check organizations’ websites, as you would to prepare for a typical job interview. Most organizations have an “impact” tab on their website, where they might link to any studies or reports they’ve created. As you browse organizations’ websites, check whether the organization reports its “outputs,” “outcomes,” or “impact.” Only the last of these three speaks directly to effectiveness. For instance, in an educational context, a listing of “students served” would communicate output, while “% of our students who graduated” would communicate an outcome. And best of all, a link to a randomized study conducted by the organization—or by other organizations with similar interventions—would communicate impact.
Next, just ask! During an informational chat or even an interview, you can ask your potential employer how they measure their own impact. You might want to use the magic words “randomized controlled trial” and see how the employer reacts. The best possible scenario—albeit very rare—would be if the organization has participated in an RCT first-hand. Most organizations don’t have the time or money to conduct such studies, so the next best scenario would be if (i) the organization has rooted its operations in a mechanism that’s been tested in several RCTs, ideally in contexts similar to its own, and (ii) it has an actionable plan to conduct a rigorous test once it reaches sufficient scale.