Evaluation Guide for Social Science Researchers

Becoming an evaluator

If you’ve been asked to do an evaluation but you have not done one before, and you were not taught how to do it in your social science degree, this article is for you. (Congratulations, you are “falling into evaluation” – check out Chris Lysy’s illustrated blog post).

In evaluation, you simply use qualitative and quantitative research methods to assess whether something works and how well. If you want a book, Patton’s “Utilisation-Focused Evaluation” is a great start. Alternatively, find previous evaluations from the organisation, or look up evaluations published by reputable agencies and consultancies and learn by example.

There are dozens of approaches and theories out there, but don’t let this distract you. Most of the time, your non-academic clients or employers will not care about theory. All they want is an answer to “how well does this work?” supported by evidence.

Here are some things I have learned:

1. Do not overcomplicate

Unlike in academia, there are no brownie points for the novelty and originality of your methods and approach.

90% of my evaluations use a combination of surveys, interviews and operational data analysis. Many of the innovative approaches beloved by theorists are time and resource-intensive for the evaluator and the participants. I learned that people often do not want to co-design, photo-journal, or participate in full-day sessions because it is too much of a burden on top of an already busy job or life. There are cases where the time- and resource-intensive approaches are the right choice, but before you use them, ensure that there is sufficient support from the organisation and the participants.

Choose the right level of robustness

Many academics transitioning to market research and evaluation struggle with the difference in what is considered sufficient methodological robustness. For example, the New Zealand Attitudes and Values Study has a 172-page document describing its sampling. They use the electoral roll, contact people by post, try to reach the same person several times, use multiple sample frames, and so on, to ensure the sample is as representative as possible. This is the academic gold standard.

Outside of academic research, it is common to use internet panels and quota sampling. These panels are opt-in – people choose to subscribe to them, mostly for the rewards and prizes (and a few out of professional curiosity). Quota sampling means you have limits on how many people from each demographic get into the sample. If the sample is 1,000 with a 50/50 male/female quota, the first 500 women who click the link will be allowed to fill out the survey, but the 501st and subsequent women won’t (“we already have enough respondents with your profile”).

From a statistical point of view, such sampling does not allow you to claim representativeness. It is not a probabilistic sample. Nonetheless, quota sampling from large internet panels works well enough for many topics. It is much faster and cheaper than real probabilistic sampling, and the market research industry uses it as the standard.
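
To make the mechanics concrete, here is a minimal sketch of how quota screening works, using the 50/50 example above. The quota plan, function name and in-memory counters are mine for illustration, not taken from any real panel platform.

```python
# Minimal sketch of quota screening for an online panel survey.
# The quota plan and function are illustrative, not from a real panel platform.

QUOTAS = {"female": 500, "male": 500}   # targets for a sample of 1,000
counts = {group: 0 for group in QUOTAS}

def screen_respondent(gender: str) -> bool:
    """Return True if the respondent may enter the survey, False if their quota is full."""
    if gender not in QUOTAS:
        return False                    # profile not covered by the quota plan
    if counts[gender] >= QUOTAS[gender]:
        return False                    # "we already have enough respondents with your profile"
    counts[gender] += 1
    return True

# The 501st woman to click the link is screened out.
for _ in range(500):
    screen_respondent("female")
print(screen_respondent("female"))      # False: quota reached
```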

Expect similar differences in standards when using other methods. Be flexible enough to accept what “good enough” looks like for each research project, and do not stubbornly stick to academic standards when they would only add costs without adding much value.

2. Do not take stakeholder input at face value

In social research, there is an assumption that participants are honest when providing the researcher with information. This is true much of the time. It helps that, for the majority of social research, absolutely nothing happens as a result.

[Image: an abstract of a typical social sciences paper (Werbner 2013)]

When a programme is evaluated, stakeholders or beneficiaries may be invested in avoiding changes. Sometimes it’s a simple desire to keep their jobs or benefits. Sometimes it is political. I know of a programme that does not work. However, the ministry running the programme does not have anything else that would address the problem, and the politicians do not want to be seen as abandoning the affected communities and doing nothing for them. The PR value is deemed worth the continued expense.

Sometimes, people have misguided theories about what causes the issue. Sometimes there is an honest gap between perception and reality, like when staff told me that a facility was not much used in the evening, but data from cameras counting how many people enter and exit showed otherwise. Those evening users were just sitting quietly in hidden nooks and crannies, not creating the sort of buzz that staff associated with high use.

When you are given operational or administrative data, it may not be as high quality as the datasets you encounter in academic research. You won’t believe how often a number turns out to be a loose estimate made by someone months after the event, or how often different people count something differently because there was never a clear guideline on what qualifies to be counted. If you are given already collected data, interrogate how it was obtained and check it for consistency.
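
If it helps, here is a rough sketch of the kind of consistency checks I mean, written in Python with pandas. The file and column names (site, month, visits) are hypothetical; adapt them to whatever dataset you are handed.

```python
# A rough sketch of basic consistency checks on supplied administrative data.
# Column names ("site", "month", "visits") are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("operational_data.csv")  # hypothetical file

# 1. Duplicates: the same site/month reported more than once suggests double counting.
duplicates = df[df.duplicated(subset=["site", "month"], keep=False)]

# 2. Gaps: site-months with no record at all may mean "not collected", not "zero".
expected = pd.MultiIndex.from_product(
    [df["site"].unique(), df["month"].unique()], names=["site", "month"]
)
missing = expected.difference(df.set_index(["site", "month"]).index)

# 3. Implausible values: negative counts or sudden spikes are worth querying
#    with whoever recorded the numbers.
negative = df[df["visits"] < 0]
spikes = df[df["visits"] > df["visits"].median() * 10]

print(f"{len(duplicates)} duplicated rows, {len(missing)} missing site-months,")
print(f"{len(negative)} negative counts, {len(spikes)} suspiciously large values")
```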

3. Be realistic with your recommendations

A common mistake of new graduates without non-academic research experience is recommending unrealistic solutions. This can happen if you simply pass on all the wishes and ideas of participants without a filter.

For example, in one evaluation a common complaint from staff and users was insufficient parking near the service centre, which, combined with suboptimal public transport options, made people less likely to visit and use the service. Several participants floated the idea of adding an underground carpark under the building. It didn’t take much to find out that for this building, adding an underground carpark would be prohibitively expensive. I would be laughed out of the room if I recommended it. There were other, more feasible options to address the issue.

Cost may be one obvious reason why a solution won’t work. Less obvious reasons include stakeholder resistance, people being unable or unwilling to do what it takes, or the unavailability of critical partners or support. I’ve seen recommendations that assume plenty of “community partners” are out there with the capacity and willingness to enter collaborative arrangements, without verifying that they exist.

Then there’s political feasibility. Depending on the country and context, some solutions that look perfectly reasonable are unacceptable. For example, New Zealand does not have a unique identifier for its citizens. Such an identifier would improve the efficiency of many government services, and many countries have one, but for historical reasons that would take too long to explain, it is politically unacceptable to consider.

If you want to include unrealistic ideas, signal that you understand what they are by naming that section “blue-sky thinking”, “provocations” or similar.

4. Communicate a lot from the start

In academic social sciences, it is common to work alone or with a small group of researchers who only show their work to the rest of the world once it is in a peer-reviewed, finished state. Participants can sign up to receive research results, but often so much time passes between data collection and final publication that no one remembers to inform them.

In evaluation, the best practice is to share data and analysis early and often, with everyone who is involved or interested. Sometimes this is formalised as “sense-making” sessions where the researcher shows the (anonymised) data to participants and stakeholders to get their interpretations and input. The minimum is “no surprises” – there should be nothing in the final report that the main audiences didn’t see coming.

Apart from transparency during the research process, ideally you will engage with stakeholders enough to understand their values and priorities. What is desirable or undesirable, normal or uncommon, differs across ethnic, socioeconomic, professional and other groups. For example, one participant group I worked with was from a cultural background that values multigenerational families staying close and supporting each other. From the perspective of housing statistics and definitions of overcrowding, they lived in overcrowded dwellings. From their perspective, it was not a bad thing, quite the opposite, and many of them could have afforded to live separately from their extended family but chose not to. The problem, if anything, was the lack of housing that could comfortably accommodate multigenerational, extended families.

A common faux pas is a single- or multi-choice survey question with a limited set of options, none of which reflects a common response in the group being surveyed. In the example above, it would be a question about family structure that only offers different combinations of parents and children.

So talk to everyone a lot – but be respectful of people’s time and energy and do not exploit their free labour. For me, it is acceptable not to offer anything extra when the participants are staff or volunteers of the organisation that commissioned the evaluation, the organisation asked them to support it, and they do so within their work hours. Anyone whose time and expertise you use outside their job or volunteer role should be paid, or at least reimbursed. The exception is when participants believe they have so much to gain from the evaluation that you have more willing contributors than you can handle, but this rarely happens.

5. Distill the findings

In academia, people spend a lot of time describing how they arrived at the findings. In evaluation, everyone is interested in the findings and recommendations first. If your audience was included and updated before and during the evaluation, they know about the methodology. So start with the findings and recommendations, then present supporting evidence, and hide methodological and technical notes in the appendices for those few who will be interested. If you want to know more about report writing, I wrote another article on it.

An evaluator’s main audience is usually the decision-makers, and in larger organisations these will often be busy managers and executives who do not want the level of detail that academics are taught to include. Learn to summarise, not just in a reductive way but by identifying underlying themes and drivers. One example I use in my courses: in an evaluation of a cultural event, some people complained about a tent being too far from the stage to see what was happening, others about insufficient parking space, others about the lack of culturally appropriate food options, and others about the lack of guidance about what to do and not to do. The underlying theme is “make sure everyone is included and supported to fully participate”. Applied, this solves the above problems as well as other unidentified or future problems linked to the same cause. If, instead of offering this recommendation with a couple of examples, you go into the operational details of everything that happened, you may lose the interest of your executive audience.

Apart from summarising at the level of underlying drivers, matrices are your friend. Classify your recommendations, findings and other lists along two axes. The standard for recommendations is cost vs. impact (or effort vs. value), which identifies the quick wins. Influence vs. interest is common for stakeholder mapping, probability vs. impact for risk, service customisation vs. customer participation for service processes, and so on. You can also get creative with any two variables that matter and create axes for low and high values or opposites. See David A. Field’s page for examples.
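
As an illustration, here is a small sketch that sorts recommendations into cost-vs-impact quadrants. The recommendations, scores and cut-off values are made up for the example; in practice you would agree the scoring with your stakeholders.

```python
# A small sketch of sorting recommendations into a cost-vs-impact matrix.
# The items, scores (1-5) and cut-offs below are invented for illustration.

recommendations = [
    {"text": "Improve evening signage", "cost": 1, "impact": 4},
    {"text": "Build an underground carpark", "cost": 5, "impact": 3},
    {"text": "Extend opening hours", "cost": 3, "impact": 4},
    {"text": "Reprint brochures", "cost": 1, "impact": 1},
]

def quadrant(item, cost_cutoff=3, impact_cutoff=3):
    """Place an item in one of four quadrants; low cost + high impact = quick win."""
    low_cost = item["cost"] < cost_cutoff
    high_impact = item["impact"] >= impact_cutoff
    if low_cost and high_impact:
        return "Quick win"
    if high_impact:
        return "Major project"
    if low_cost:
        return "Fill-in"
    return "Question whether to do at all"

for rec in recommendations:
    print(f"{quadrant(rec):<30} {rec['text']}")
```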