Clients, particularly new clients, often come to us with a seemingly simple question. It typically goes something like this: “Do the kids understand and retain what we are trying to teach them?” “Are the residents happy with the program?” “Are patients’ viral counts lower?” “Are couples getting along better?” Typically, they want help developing a survey or a set of focus group or interview questions to help them find out whether they are getting the results they want. In nearly every case, they need this kind of outcome data for funders, either those supporting the current program or those the development department is seeking to cultivate. While getting a legitimate answer can sometimes be technically complex (reliability and validity issues, complex sampling or response rate issues, statistical significance concerns), the answers themselves are simple: the program is either achieving its outcomes or it isn’t.
In our experience, though, social programs rarely just work or don’t work. Instead, they work for some participants but not for others, or under some circumstances but not others. Take this example. A job shadowing program is offered at several sites within a human services agency. The goal of the program is to introduce at-risk youth to the world of work. The program hires an evaluator who develops a survey designed to assess work attitudes and values and administers it to students at each of the sites before the program starts and then again after it ends. Analysis of the data shows a small, but statistically significant, improvement in attitudes. The development team is happy: they can tell potential funders that results are statistically significant (if you click through to the article, be sure to read to the second paragraph on effect size). But the program people are concerned: all that time and effort, and the improvement in work attitudes is so small? Somewhat demoralized, they sally forth.
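For readers who like to see the mechanics, here is a minimal sketch of the kind of pre/post comparison described above. It assumes a hypothetical dataset with one row per participant and made-up column names ("pre_score", "post_score"); the point is simply that a result can be statistically significant while the effect size stays modest.

```python
# Sketch of a pre/post comparison on work-attitude survey scores.
# Column names and data are illustrative, not from the original program.
import pandas as pd
from scipy import stats


def pre_post_summary(df: pd.DataFrame) -> dict:
    """Paired t-test plus a simple effect size for pre/post scores."""
    diff = df["post_score"] - df["pre_score"]
    t_stat, p_value = stats.ttest_rel(df["post_score"], df["pre_score"])
    cohens_d = diff.mean() / diff.std(ddof=1)  # paired-samples Cohen's d
    return {
        "mean_change": round(diff.mean(), 3),
        "t": round(t_stat, 3),
        "p": round(p_value, 4),
        "effect_size_d": round(cohens_d, 3),
    }


# Tiny made-up example just to show the call.
scores = pd.DataFrame({
    "pre_score": [3.1, 2.8, 3.5, 3.0, 2.9, 3.2],
    "post_score": [3.3, 2.9, 3.6, 3.2, 3.0, 3.3],
})
print(pre_post_summary(scores))
```

With a large enough sample, the p-value can clear the significance bar even when the effect size ("effect_size_d" above, or the raw mean change) is too small to feel meaningful to program staff.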
What can we do to make this study more impactful? The short answer: slice and dice the data to see what factors drive outcomes. For example, do results differ by program site? Do some sites do better than others in improving participants’ attitudes? If so, what is it about those sites that accounts for the better outcomes? More experienced staff, more appropriate mentors, better prepared participants? What about participant factors overall? If there are no differences across the sites, could participant factors drive differences? For example, do boys do better than girls? Do older teens do better than younger ones? And what about the organizations that host the participants for the job shadowing? Do certain organizations, say companies A, B, and C, do better than companies D, E, and F? Is there anything that distinguishes them? For example, do small, locally based companies do better than large national firms? What about the mentor/mentee relationship? Could it be that outcomes are generally better when male mentees are paired with male mentors and female mentees with female mentors? Be careful, though: it can get more complicated still. The gender-matching effect (or any factor driving outcomes) might hold for the males but not the females.
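The slicing itself is not complicated once the data are in one table. Below is a rough sketch of how those breakouts might look, again using hypothetical column names (site, mentee_gender, mentor_gender) rather than anything from the actual program; crossing mentee and mentor gender is what surfaces the interaction just described.

```python
# Sketch of the "slice and dice" step: average pre-to-post change
# broken out by one or more grouping columns. Column names are illustrative.
import pandas as pd


def change_by_group(df: pd.DataFrame, group_cols) -> pd.DataFrame:
    """Average pre-to-post change and group size for each subgroup."""
    out = df.assign(change=df["post_score"] - df["pre_score"])
    return (
        out.groupby(group_cols)["change"]
        .agg(["mean", "count"])
        .rename(columns={"mean": "avg_change", "count": "n"})
    )


# Do some sites outperform others?
#   change_by_group(df, "site")
#
# Do participant factors matter on their own?
#   change_by_group(df, "mentee_gender")
#
# Does gender matching help, and does it help both boys and girls?
#   change_by_group(df, ["mentee_gender", "mentor_gender"])
```

Keep an eye on the group sizes: a subgroup with only a handful of participants can show a dramatic "effect" that is really just noise.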
The key here is to understand that looking only at the total outcome of a program limits your ability to use evaluation data for program improvement. Take another look at the examples above. If outcomes are better with small, locally based job sites, it may be a good idea to focus on recruiting those kinds of organizations in the future. If outcomes are better for girls, you might look at ways to improve how you work with boys. If older youth do better, you might decide to open up more spaces for these kids and decrease the number of slots available for younger ones. We believe strongly that the first goal of any evaluation project ought to be program improvement. Conducting this kind of driver analysis is what makes that possible. In the coming weeks, we’ll be posting more examples of how this kind of analysis works. Please stay tuned.