THE DEVELOPING WORLD
"Counterfactual" aid evaluation: When what you see may not be what you're getting
Spring 2010
Development aid specialists have long argued about how development assistance should be evaluated so that both host governments and aid donors can have a clearer picture of what works and what doesn’t. Martin Prowse and Lídia Cabral debate two contrasting visions of how to make aid more effective.
How effective is development assistance? It’s a question that has come under close scrutiny in recent years, and in 2005 the Paris Declaration set five key “aid effectiveness” principles. The first is that local “ownership” should be strengthened by ensuring that the government of a country receiving aid sets the agenda. The second is that aid donors should align their thinking and programmes with the host government’s policies and management systems. The third is improved harmonisation of different aid donors’ development programmes through closer cooperation and agreed divisions of labour. The fourth is better evaluation of development results, and the fifth is mutual accountability, so that aid recipients and donors are equally accountable.
Progress on implementing these important principles was assessed in September 2008 in Accra, Ghana, at the third High Level Forum on Aid Effectiveness. The Paris/Accra agenda suggests that aid effectiveness can be improved by changing the processes that make up an aid management system, not least by introducing better donor coordination mechanisms and harmonising strategies and policies. It also suggests that progress in improving aid effectiveness can be judged mainly by measuring the degree of compliance with these changes in processes and systems.
A very different approach to measuring the effectiveness of aid has been taken by advocates of “counterfactual evaluation”, who focus not on process issues but on the degree to which aid improves the well-being of the poor. These evaluations set out to answer what they call the counterfactual question: would beneficiaries’ well-being have changed even if the intervention had not taken place?
This sort of evaluation is often done by randomly selecting participants from a wider population, and then randomly assigning participants to a “treatment” group (which receives an aid intervention) and a “control” group (which does not). Because assignment is random, participants in the control group are, on average, similar to those in the treatment group, so any significant difference between the two groups is said to be attributable to the aid intervention.
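The assignment-and-comparison logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure of any particular evaluation: the household identifiers, the simulated outcome scores and the `ASSUMED_EFFECT` value are all invented for the example, and a real study would also test whether the difference is statistically significant.

```python
import random
import statistics


def assign_groups(participants, seed=0):
    """Randomly split participants into a treatment and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treatment_outcomes, control_outcomes):
    """Simplest impact estimate: mean treated outcome minus mean control outcome."""
    return statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)


# Hypothetical sample of 200 households, identified by number.
households = list(range(200))
treatment, control = assign_groups(households, seed=1)

# Simulated well-being scores (invented data): every household draws from the
# same distribution, and treated households receive an assumed boost.
ASSUMED_EFFECT = 5.0
rng = random.Random(42)
treated_ids = set(treatment)
outcomes = {
    h: rng.gauss(100, 10) + (ASSUMED_EFFECT if h in treated_ids else 0.0)
    for h in households
}

estimate = difference_in_means(
    [outcomes[h] for h in treatment],
    [outcomes[h] for h in control],
)
print(f"Estimated effect: {estimate:.2f} (assumed true effect: {ASSUMED_EFFECT})")
```

Because the two groups differ only by chance and by the intervention, the estimate should land near the assumed effect, with some sampling noise.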
Advocates of this approach, the so-called “randomistas”, argue that weak aid evaluations have contributed to a lack of consensus around the simplest of questions: “what works?” Believing their approach to aid evaluation is a superior model, they have gone so far as to intimate that all aid should be based on randomised experiments. Not surprisingly, this trend has sparked a variety of reactions, six of which are outlined below.
The first concerns policy horizons. Randomised experiments often need time to ensure that aid interventions have become fully embedded before the final survey is conducted. This may conflict with the shorter time horizons of governments and donors who often want evidence produced quickly to fit in with budgetary, legislative or political windows.
A second response focuses on moral and ethical concerns about using a control group. In other words, should we intentionally withhold an intervention from potential beneficiaries as part of an experiment? Advocates of randomised experiments suggest that it is easy to avoid unethical evaluations. For example, it is common that the entire eligible population is not reached by a project immediately, either because of budget constraints, or because the intervention is being rolled out over a period of time. In the latter case, those to receive the intervention later can be the comparison group for the first participants. Moreover, randomistas argue that what is really unethical is to go on spending billions of dollars on ineffective interventions.
The third criticism of randomised experiments concerns context. Will successful interventions in one region or country have the same effect in a different region or country, or through a different institutional structure? A fourth critique focuses on the design of experiments; for example, do intervention and control groups stay separate, or is there some direct or indirect leakage between the two groups?
The fifth point looks at wider political dimensions of aid relationships. The case for randomised experiments on efficiency grounds may ignore political currents that are a critical element of development assistance. Aid donors and recipients often have (undeclared) strategic interests that need to be factored in when assessing the impact of aid. The sixth and last critique highlighted here concerns the scale and reach of evaluations. The argument here is that while randomised experiments may be well-suited to small-scale development projects, they are not appropriate for evaluating larger aid-funded operations and broad policy changes. Investments in large-scale infrastructure, and also major policy shifts like public sector reforms, are not at all suitable because of the difficulties of establishing counterfactual data. This is important because, as the Paris agenda shows, the grain of aid flows has been moving towards direct budget support and other forms of programmatic aid that focus on broad governance and institutional issues, which may not be amenable to a counterfactual design.
At first glance, then, it appears that two major trends in improving the effectiveness of aid are moving in opposite directions. Yet there may be more synergies between these two trends than initially meet the eye. Three areas merit greater attention. First, the Paris/Accra agenda promotes programmatic forms of support, such as general and sectoral budget support. And although establishing a broad “with versus without” counterfactual approach is difficult, it may be possible to do so within an over-arching evaluation made up of different elements.
Second, it may also be possible to use a before-and-after counterfactual design (a so-called interrupted time series design) when enough longitudinal data are available in the form of quarterly or monthly figures. This sort of analysis requires analysts to be aware of any other important policy shifts that may have been implemented at the same time, as well as any changes in administrative procedures or even in how data were captured.
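As a rough illustration of the before-and-after idea, the sketch below fits a straight-line trend to the periods before an intervention, projects that trend forward as the counterfactual, and averages the gap between what was observed and what the trend predicted. The monthly figures are invented, the function names are hypothetical, and the simplification is deliberate: a fuller interrupted time series analysis would also model changes in slope, seasonality and autocorrelation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def its_level_change(series, start):
    """Average gap between observed post-intervention values and the
    projected pre-intervention trend.

    series: outcome per period (e.g. monthly figures), oldest first.
    start:  index of the first post-intervention period.
    """
    slope, intercept = fit_line(list(range(start)), series[:start])
    gaps = [
        series[t] - (intercept + slope * t)
        for t in range(start, len(series))
    ]
    return sum(gaps) / len(gaps)


# Invented monthly indicator: a steady upward trend for 12 months, then the
# same trend shifted up by 4 units once the (hypothetical) programme begins.
pre = [50 + 0.5 * t for t in range(12)]
post = [50 + 0.5 * t + 4 for t in range(12, 24)]
level_change = its_level_change(pre + post, start=12)
print(f"Estimated level change: {level_change:.2f}")
```

The whole gap is attributed to the intervention here, which holds only if nothing else changed at the same time; a concurrent policy shift or a change in data capture would be absorbed into the same number.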
Third, counterfactual evaluations could also help governments decide which interventions are most effective within a sector that receives substantial financial support. Reconstructing a control group by matching intervention and control units according to their recorded characteristics, for instance, offers plenty of potential. Other statistical impact evaluation approaches that don’t use a counterfactual design can also serve this purpose.
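One simple way to reconstruct a control group of this kind is nearest-neighbour matching: each unit that received the intervention is paired with the most similar unit that did not, and the outcome gaps are averaged. The sketch below assumes invented household records with hypothetical fields (`size`, `land`, `outcome`); it is not drawn from any real evaluation, and in practice characteristics measured on different scales would need to be standardised before distances are compared.

```python
def nearest_match(unit, pool, keys):
    """Return the unit in `pool` closest to `unit` on the listed
    characteristics (squared Euclidean distance, unscaled)."""
    def distance(other):
        return sum((unit[k] - other[k]) ** 2 for k in keys)
    return min(pool, key=distance)


def matched_effect(treated, untreated, keys, outcome="outcome"):
    """Average outcome gap between each treated unit and its nearest
    untreated match. Matching is with replacement: one untreated unit
    may serve as the comparison for several treated units."""
    gaps = [
        t[outcome] - nearest_match(t, untreated, keys)[outcome]
        for t in treated
    ]
    return sum(gaps) / len(gaps)


# Invented records: household size, land held (hectares), outcome score.
treated = [
    {"size": 4, "land": 1.0, "outcome": 120},
    {"size": 6, "land": 2.5, "outcome": 150},
]
untreated = [
    {"size": 4, "land": 1.1, "outcome": 110},
    {"size": 6, "land": 2.4, "outcome": 145},
    {"size": 2, "land": 0.5, "outcome": 90},
]

effect = matched_effect(treated, untreated, keys=["size", "land"])
print(f"Matched estimate of effect: {effect}")
```

Matching can only balance the characteristics that were recorded. If treated and untreated units differ in ways the data do not capture, the estimate will be biased, which is one reason randomised designs are preferred where they are feasible.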
The debate on measuring the effectiveness of aid is far from over. Counterfactual evaluations have certainly raised interest in better evidence and more formalised and rigorous evaluation techniques. But so far there has been limited progress in assessing the new aid orthodoxy, with its main focus on processes and systems. And although counterfactual designs are good at telling us what works, most are not very good at telling us why it works. In other words, counterfactual evaluations that rely solely on quantitative methods may be unable to tell us very much about how or why success occurs: they often can’t tell us about key transmission mechanisms such as cultural values, or local norms or practices associated with the intervention in question. So complementing counterfactual evaluations with qualitative forms of research like focus groups, semi-structured interviews or simply participant observation may give evaluators a clearer idea of whether success in one intervention can be replicated elsewhere. In fact, combining quantitative and qualitative research within counterfactual evaluations looks like a good way of giving governments and donors a clearer picture of what works and what doesn’t.