I’m often asked to recommend a good place to learn, or brush up on, the basics of statistics used in survey research.
It’s a difficult question, but I do have a couple of favourites.
The problem is that there are layers of understanding. Let’s take confidence intervals as an example.
Layer 1 – gist
It’s quite easy to understand them in simple terms, something like “the range within which we are 95% sure the true figure for the population would have fallen if we had spoken to everyone”.
This saves us from a completely naïve view of research.
Layer 2 – use
If you do a bit of reading and playing around with a calculator or Excel, you can soon figure out how to calculate confidence intervals correctly. You’ve learned that the 95% confidence interval for a mean is:
mean ± 1.96 × (standard deviation ÷ √n)
So now you can use confidence intervals with your own analysis.
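If you'd rather not use Excel, here is a rough sketch of the same calculation in Python. It assumes the usual large-sample formula (the mean plus or minus 1.96 standard errors). The function name and the satisfaction scores are made up purely for illustration.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for a mean (large-sample formula)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1)
    margin = z * sd / math.sqrt(n)         # z times the standard error
    return mean - margin, mean + margin


# Hypothetical satisfaction scores out of 10, invented for this example
scores = [7, 8, 9, 6, 10, 8, 7, 9, 8, 8]
low, high = mean_confidence_interval(scores)
print(f"mean = {statistics.mean(scores):.2f}, 95% CI = {low:.2f} to {high:.2f}")
```

With a sample as small as this one, a t-distribution multiplier would be more appropriate than 1.96, which is exactly the sort of assumption the next layer is about.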
Layer 3 – context
Getting to the next layer of the onion is much more difficult: it means understanding the assumptions we have made, the conclusions we can safely draw, and the theory on which they’re based.
It’s worth investing the time.
One really good book is PDQ Statistics, which is a slim volume aimed at the intelligent layperson. It has a very practical bent, but also respects its reader enough to explain the basis on which ideas such as confidence intervals rest.
It has a clear explanation, for instance, of why statistical tests can only tell you the probability of getting the result you got, given a hypothesis, rather than the probability that your hypothesis is true.
A more specialist book is Statistical Rules of Thumb. It’s aimed at practitioners, notably statistical consultants, as a reference text, and it’s extremely comprehensive.
It was from this book that I learned one of my favourite statistical tricks – the Rule of Threes. To quote the book:
Given no observed events in n trials, a 95% upper bound on the rate of occurrence is 3/n
This is fantastically useful.
Imagine you speak to 50 customers and none of them had a problem during their experience. Does this mean that you never create problems? Of course not. But how prevalent are they?
This trick lets us put a 95% upper bound on the rate of problems, in this instance 3/50 = 6%.
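To see where the rule comes from: if the true problem rate is p, the chance of seeing no problems in n independent conversations is (1 − p)^n. Setting that equal to 5% and solving gives p = 1 − 0.05^(1/n), which is roughly −ln(0.05)/n, or about 3/n. The sketch below compares the shortcut with that exact figure. The function names are mine, and it assumes each customer is an independent trial.

```python
def rule_of_three_upper_bound(n):
    """Shortcut 95% upper bound on the rate when 0 events are seen in n trials."""
    return 3 / n


def exact_upper_bound(n, confidence=0.95):
    """Solve (1 - p)**n = 1 - confidence for p, assuming independent trials."""
    return 1 - (1 - confidence) ** (1 / n)


n = 50  # customers spoken to, none of whom reported a problem
approx = rule_of_three_upper_bound(n)
exact = exact_upper_bound(n)
print(f"Rule of Threes: {approx:.1%}, exact: {exact:.1%}")
```

For 50 customers the exact bound comes out a little under 6%, so the shortcut is slightly conservative and easily good enough for mental arithmetic.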
This is a really good example of the kind of conclusion that is only possible with a deep understanding of statistics.
Good statistical analysis is not theoretical navel-gazing; it helps us learn broad, concrete truths about our customers.