Designing or reviewing measures has been the most common request I have received in my evaluation roles. Here is my unofficial checklist.
1. When creating a new measure, especially one that requires new resources to support data collection / evaluation, we should ask:
- what is the bad thing that could happen if we don’t have this measure?
- what is an example of a bad decision we might make without that measure?
- what is an example of a risk or concerning trend that we might not notice if we do not have the measure?
For useful measures, at least one of these answers is obvious and examples are easy to find. If people struggle to answer the questions above, the measure probably does not add value.
2. Avoid measures that are binary milestones, e.g. “report published”, “programme implemented”. Measures should ideally capture how well or how much work has been done, rather than simply state that it has been done. Instead, you can measure engagement with the report, or progress against a broader work programme.
3. Where measures often fail is in defining what should be measured and how. For example, if someone proposes the measure “Number of initiatives that promote social connectedness and address social isolation”, we should ask the following:
- Does it measure one thing? (Here, no: promoting connectedness and addressing social isolation are different things; pick one or split the measure in two.)
- Do we know how to count? In the example above, what characteristics does an initiative need to have to address social isolation? If we leave this undefined, we risk people counting every activity that involves human interaction, which will likely mean 100% of activities. A good definition should make it easy to think of examples of initiatives that do not meet it.
- What is one unit? In the example above, what is an initiative? Some things can easily be divided or merged. I saw a case where the organisation wanted to support a specific type of initiative and the message from leadership was that more is better. Staff divided existing initiatives meeting the definition into the tiniest possible pieces: there were “Tuesday Fingerpainting” and “Thursday Fingerpainting”, counted as two, even though the activity and participants were the same. If there is any possibility of something like this happening, you need an unambiguous definition of one unit; a sketch of what such a counting rule might look like follows below.
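One way to force clarity is to write the counting rule down explicitly, even as pseudo-code. Below is a minimal sketch, assuming hypothetical fields (activity, provider, participant group) that are illustrative only, not a prescribed data model. The point is that once the rule says recurring sessions of the same activity count once, “Tuesday Fingerpainting” and “Thursday Fingerpainting” collapse to a single initiative.

```python
# Hypothetical records reported by staff; the field names are illustrative only.
reported = [
    {"name": "Tuesday Fingerpainting",  "activity": "fingerpainting", "provider": "Hub A", "group": "under 5s"},
    {"name": "Thursday Fingerpainting", "activity": "fingerpainting", "provider": "Hub A", "group": "under 5s"},
    {"name": "Seniors Walking Club",    "activity": "walking group",  "provider": "Hub B", "group": "over 65s"},
]

# Counting rule: one initiative = one (activity, provider, participant group) combination,
# regardless of how many weekly sessions it runs.
initiatives = {(r["activity"], r["provider"], r["group"]) for r in reported}

print(f"Reported rows: {len(reported)}, initiatives counted: {len(initiatives)}")
# -> Reported rows: 3, initiatives counted: 2
```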
4. Does good quality data exist? How much will it cost to collect? This question is especially important for measures that rely on how someone feels (“Youth feel empowered to participate in local politics”, “Whānau/families feel supported”, etc.), because the only way to find out how people feel is to ask them. Someone will need to prepare and send out a survey. Emailing a link to an online survey requires the least effort and expense but tends to have low response rates. Will this still be a useful and meaningful measure if we get a 40%, 25% or 10% response rate? Have staff and volunteers been consulted if the data collection imposes extra work on them? For example, if asked to report the number of participants at events, will people fill out the form from memory many months after the event because it is the last thing on their to-do list, resulting in inaccurate data?
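To make the response rate question concrete, here is a rough back-of-envelope check. It is a minimal sketch that assumes a simple random sample from a hypothetical programme of 500 participants and a 95% confidence level; the numbers are illustrative, not from any real survey, and it deliberately ignores non-response bias, which is often the bigger problem.

```python
import math

def margin_of_error(population: int, responses: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from a survey,
    with a finite population correction. Assumes simple random sampling."""
    se = math.sqrt(p * (1 - p) / responses)
    fpc = math.sqrt((population - responses) / (population - 1))
    return z * se * fpc

# Hypothetical programme: survey link emailed to 500 participants.
population = 500
for rate in (0.40, 0.25, 0.10):
    responses = int(population * rate)
    moe = margin_of_error(population, responses)
    print(f"{rate:.0%} response rate -> {responses} responses, ±{moe:.1%} margin of error")
```

On these assumptions a 10% response rate leaves roughly a ±13% margin of error on a 500-person programme, before even considering who chose not to respond; that bias cannot be calculated away and is usually the stronger argument for investing in better data collection.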