Don’t average categorical or ordinal data!


A surprisingly common mistake in the analysis and interpretation of M&E data is to treat categorical or ordinal data as though it were quantitative data (i.e. interval or ratio data).  This is simply wrong.  It is bad numeracy!

First some background…

Categorical Data

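Categorical data (also called ‘nominal data’) assigns each response to a named, ‘exclusive’ category: a response either belongs to the category or it does not, and the categories have no inherent order.  For example, consider a survey question: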

What pets do you own: a) dog; b) cat; c) goldfish; d) none

These are categorical responses.  You either do or do not own a cat. 

Ordinal Data

Ordinal data is the same as categorical data, except that the order of the responses matters.  Consider a survey question:

How do you feel today: a) lousy; b) ok; c) great; d) never better

These are ordinal responses because most people would agree that they represent a progression of feelings.  BUT the size of the difference between the states is debatable.  My ‘gap’ between ‘lousy’ and ‘ok’ may be different to your gap between ‘lousy’ and ‘ok’.  Furthermore, how I interpret that gap today may be different to how I interpret it tomorrow!  But I will always agree that ‘lousy’ is worse than ‘ok’.

The Common Error

Now to the common error…

For the sake of convenience, people often ‘code’ the responses in surveys with numbers, such as: 1) lousy; 2) ok; 3) great; 4) never better.  Then the temptation arises to start manipulating the data quantitatively.  For example, someone might average all of the responses to the question “how do you feel today”, obtain the number ‘3.4’, and conclude that the ‘average’ feeling is “great”.
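One way to see why such an average is not meaningful is that the numeric codes are arbitrary.  The sketch below uses Python, with invented responses and two hypothetical codings (coding_a and coding_b) that both preserve the order of the responses.  The ‘average’ feeling changes depending on which coding is used, even though the responses themselves are identical.

```python
from statistics import mean

# Hypothetical responses to "How do you feel today?" (invented for illustration)
responses = ["lousy", "ok", "great", "never better", "never better"]

# Two codings that both respect the order lousy < ok < great < never better.
# Neither is more "correct" than the other; the numbers are only labels.
coding_a = {"lousy": 1, "ok": 2, "great": 3, "never better": 4}
coding_b = {"lousy": 1, "ok": 2, "great": 3, "never better": 10}

mean_a = mean([coding_a[r] for r in responses])
mean_b = mean([coding_b[r] for r in responses])

# Under coding_a the "average" falls between "ok" (2) and "great" (3);
# under coding_b it falls between "great" (3) and "never better" (10).
print(mean_a, mean_b)
```

The order of the responses is the only information the codes carry, so any result that shifts when the spacing of the codes changes is an artefact of the coding rather than a finding about the respondents.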

The way to check that your analysis is sensible is to substitute letters (or the original response labels) for the numbers and see if you can still perform the same analysis and derive the same conclusion.  For instance:

(A + B) / 2 = ??

Or

(‘Cat’ + ‘Dog’) / 2 = ??

Or

‘lousy’ + ‘ok’ = ??
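The same check can be tried in code.  The following is a minimal sketch in Python (the variable names are only illustrative): it replaces the numeric codes with the original labels and attempts the same ‘average’.

```python
# Use the original labels instead of the numeric codes.
a, b = "lousy", "ok"

# Python reads "+" on text as concatenation, which simply glues the labels
# together; it is not addition in any numeric sense.
joined = a + b

# Dividing the result by 2 is undefined for text, so the "average" fails.
try:
    average = (a + b) / 2
    result = "computed"
except TypeError:
    result = "not computable"

print(joined, result)
```

If the calculation cannot be carried out on the labels, then carrying it out on the codes only hides the problem.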

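None of these expressions has a meaningful answer, and coding the responses as numbers does not change that.  Useful analysis of categorical and ordinal data instead includes: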

  • Mode: what is the most frequent response (e.g. more people answered ‘dog’ than any other response)
  • Frequency: the percentage of respondents that chose ‘goldfish’; or the distribution of frequencies for all responses
  • Crosstabulation: e.g. ‘of those people that said they own a “dog”, what proportion said that they felt “never better”?’
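The three analyses listed above can be sketched in a few lines of Python using only the standard library.  The survey records here are invented for illustration, and the sketch assumes each respondent gave exactly one answer to each question.

```python
from collections import Counter

# Hypothetical survey records (invented for illustration): (pet, feeling)
records = [
    ("dog", "never better"),
    ("dog", "great"),
    ("cat", "ok"),
    ("dog", "never better"),
    ("goldfish", "lousy"),
    ("none", "ok"),
]

pet_counts = Counter(pet for pet, _ in records)

# Mode: the most frequent response
mode_pet, mode_count = pet_counts.most_common(1)[0]

# Frequency: the percentage of respondents that chose each response
pet_percent = {pet: 100 * n / len(records) for pet, n in pet_counts.items()}

# Crosstabulation: counts for every (pet, feeling) combination
crosstab = Counter(records)

# Of those who said they own a dog, what proportion felt "never better"?
dog_never_better = crosstab[("dog", "never better")] / pet_counts["dog"]

print(mode_pet, pet_percent["goldfish"], dog_never_better)
```

Every step counts responses rather than doing arithmetic on the codes, so the results stay the same however the responses are numbered.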
