Tag Archives: statistics

Hans Rosling: a great data storyteller

I was sad to hear, yesterday, that Hans Rosling had passed away.

For anyone interested in telling stories with data, he was an inspiration and an example.

His videos use a lively combination of data, innovative visualisation, and passionate argument. This is one of my favourites: 200 years that changed the world.

The mission he, Ola, and Anna set themselves at Gapminder was to combat ignorance with data; to discover where knowledge gaps exist, and to attack them with fact. He tended to underestimate the importance of his own charm and storytelling skill in engaging the audience not just with the data, but with its significance.

In a world in which news feels increasingly negative, dominated by assertion and prejudice over fact, Hans Rosling was a tremendous force for good. He made us see and acknowledge the progress that has been and is being made.

We could all do with being a bit more like Hans Rosling.



From drivers to design thinking

Driver analysis is great, isn’t it? It reduces the long list of items on your questionnaire to a few key drivers of satisfaction or NPS. A nice simple conclusion: “these are the things we need to invest in if we want to improve”.

But what if it’s not clear how to improve?

Often the key drivers turn out to be big-picture, broad-brush items: things like “value” or “being treated as a valued customer”, which are more or less proxies for overall satisfaction. They are difficult to action.

Looking beyond key drivers, there’s a lot of insight to be gained by looking at how all your items relate to each other, as well as to overall satisfaction and NPS. Those correlations, best studied as either a correlogram (one option below) or network diagram (top right), can tell you a lot without requiring many assumptions about the data.
In particular, examining the links between specific items can support a design thinking approach to improving the customer experience based on a more detailed understanding of how your customers see the experiences you create.
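
As a rough illustration of the raw material behind a correlogram or network diagram, here is a minimal Python sketch. The item names and scores are made up for the example, the 0.5 cut-off is an arbitrary assumption, and plain Pearson correlation is only one reasonable choice for rating-scale data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_links(responses, threshold=0.5):
    """List (item_a, item_b, r) for pairs with |r| >= threshold, strongest first."""
    links = []
    for a, b in combinations(responses, 2):
        r = pearson(responses[a], responses[b])
        if abs(r) >= threshold:
            links.append((a, b, round(r, 2)))
    return sorted(links, key=lambda link: -abs(link[2]))


# Hypothetical scores from eight respondents (illustrative only).
responses = {
    "overall": [9, 7, 8, 4, 6, 10, 3, 8],
    "value": [8, 7, 8, 5, 6, 9, 3, 7],
    "speed": [6, 9, 5, 4, 8, 7, 5, 6],
    "staff": [9, 6, 8, 3, 5, 10, 4, 9],
}

for item_a, item_b, r in item_links(responses):
    print(f"{item_a} <-> {item_b}: r = {r}")
```

Each printed pair is a candidate edge in a network diagram; the full set of pairwise values is what a correlogram displays.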

Your experiences have a lot of moving parts—don’t you think you ought to know how they mesh together?


Telling the story with data


This is a diagram from my course about data presentation and infographics.

I use it as a starting point to discuss the skills you need to do the job well, summarised as “telling a compelling story with integrity”.

The idea of the diagram is that too much or too little of any of the three axes tends to be a bad thing.

For instance, too heavy on the “statistician” axis might mean that your charts are accurate and robust, but impenetrable to many people. Too light on the same axis, and you might be committing basic analytical mistakes (perhaps ignoring random measurement error).

It’s a rare person who embodies all of those skills to a truly expert level, which is one reason the best infographics often involve a team of people.


Finding your audience

It isn’t necessarily a case of shooting for the middle of the triangle. There’s a zone of acceptable variation around the middle in which competent and engaging data storytelling happens.

What’s appropriate for a scientific publication is not appropriate for your board, or for frontline staff. It’s all about getting the balance right for your audience.

Obvious? Yes, but it’s worth thinking about what it means in practice. Which “rules” of data storytelling are unbreakable, and which need to be tailored according to your audience?


How much do we know about what works?

Stephen Few takes a dim view of infographics, which he sees as prioritising shallow gimmicks over effective visual communication. David McCandless has been on the receiving end of severe critiques.

Few also points out that more work needs to be done to test which graphic forms are most effective, rather than relying on opinion. I agree – we can’t begin to pretend we’re working in a serious field until we approach these questions scientifically.

Robert Kosara has published interesting work showing that pie charts, much derided by experts, are more effective than we thought.

But is communication our only aim? Not always.


Telling the story

The science of which data graphics work most effectively is only part of the equation. The best graphic in the world is wasted if no one looks at it.

Let’s go back to the idea of storytelling.

What makes a story? Dave Trott, in one of his excellent blog posts, quotes Steven Pressfield’s simple version. A story consists of Hook, Build, and Payoff.

If we apply that to data storytelling I think it makes it easier for us to choose our place in the triangle.

  • Hook: we need to capture the attention of our audience, with something relevant and/or fascinating. This is where McCandless excels.
  • Build: there should be enough depth to reward engagement with the data.
  • Payoff: there’s got to be a reason for looking. What am I going to do differently as a result of spending time with this data?






Are you measuring importance right?

One of the universal assumptions about customer experience research is that the topics on your questionnaire are not equally important.

It’s pretty obvious, really.

That means that when we’re planning what to improve, we should prioritise areas which are more important to customers.

Again, pretty obvious.

But how do we know what’s important? That’s where it starts to get tricky, and where we can get derailed into holy wars about which method is best. Stated importance? Key Driver Analysis (or “derived importance”)? Relative importance analysis? MaxDiff?

An interesting article in IJMR pointed out that these decisions are often made, not on the evidence, but according to the preferences of whoever the main decision maker is for a particular project.

Different methods will suggest different priorities, so personal preference doesn’t seem like a good way to choose.

The way out of this dilemma is to stop treating “importance” as a single idea that can be measured in different ways. It isn’t. Stated importance, derived importance and MaxDiff are all measuring subtly different things.

The best decisions come from looking at both stated and derived importance, using the combination to understand how customers see the world, and addressing the customer experience in the appropriate way:


  • High stated, low derived – a given. Minimise dissatisfaction, but don’t try to compete here.
  • Low stated, high derived – a potential differentiator. If your performance is par on the givens, you may get credit for being better than your competitors here.
  • High stated, high derived – a driver. This is where the bulk of your priorities will sit. Vital, but often “big picture” items that are difficult to action.

That’s a much more rounded view than choosing a single “best” measure to prioritise, and more accurately reflects how customers think about their experience.
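
To make the combination concrete, here is a minimal Python sketch of sorting items into those categories. The item names, scores, and cut-offs are all hypothetical, and where you draw the line between “high” and “low” is a judgement call rather than anything a formula will settle for you.

```python
def classify_item(stated, derived, stated_cutoff, derived_cutoff):
    """Place one questionnaire item in a stated/derived importance quadrant."""
    high_stated = stated >= stated_cutoff
    high_derived = derived >= derived_cutoff
    if high_stated and high_derived:
        return "driver"
    if high_stated:
        return "given"
    if high_derived:
        return "potential differentiator"
    # Low stated, low derived is not one of the three categories listed above.
    return "low on both"


# Illustrative numbers only: stated importance on a 1-10 scale,
# derived importance as a correlation with overall satisfaction.
items = {
    "safety": (9.4, 0.15),
    "friendly staff": (6.1, 0.62),
    "value": (9.0, 0.71),
}

for name, (stated, derived) in items.items():
    label = classify_item(stated, derived, stated_cutoff=8.0, derived_cutoff=0.5)
    print(f"{name}: {label}")
```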


Getting started with statistics

I’m often asked to recommend a good place to learn, or brush up on, the basics of statistics used in survey research.

It’s a difficult question, but I do have a couple of favourites.

The problem is that there are layers of understanding. Let’s take confidence intervals as an example.

Layer 1 – gist

It’s quite easy to understand them in simple terms, something like “the range within which we are 95% sure the true figure for the population would have fallen if we had spoken to everyone”.

This saves us from a completely naïve view of research.

Layer 2 – use

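If you do a bit of reading and playing around with a calculator or Excel, you can soon figure out how to calculate confidence intervals correctly. You’ve learned that the 95% confidence interval for a mean is:

$$\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size.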

So now you can use confidence intervals with your own analysis.
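
For anyone who would rather script it than use Excel, here is a minimal Python sketch of that calculation: the sample mean plus or minus 1.96 standard errors. It assumes a simple random sample that is large enough for the normal approximation; with small samples you would swap 1.96 for the appropriate t value. The scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(scores, z=1.96):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    n = len(scores)
    centre = mean(scores)
    # Standard error of the mean: sample standard deviation over root n.
    margin = z * stdev(scores) / sqrt(n)
    return centre - margin, centre + margin


# Hypothetical satisfaction scores out of 10.
scores = [8, 9, 7, 10, 6, 8, 9, 7, 8, 5, 9, 8]
low, high = mean_confidence_interval(scores)
print(f"mean = {mean(scores):.2f}, 95% CI = {low:.2f} to {high:.2f}")
```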

Layer 3 – context

Getting to the next layer of the onion is much more difficult: understanding the assumptions we have made, the conclusions we can safely draw, and the theory on which they’re based.

It’s worth investing the time.

One really good book is PDQ Statistics, which is a slim volume aimed at the intelligent layperson. It has a very practical bent, but also respects its reader enough to explain the basis on which ideas such as confidence intervals rest.

It has a clear explanation, for instance, of why statistical tests can only tell you the probability of getting a result like yours if a hypothesis is true, rather than the probability that the hypothesis is true.

A more specialist book is Statistical Rules of Thumb. It’s aimed at practitioners, notably statistical consultants, as a reference text; and it’s extremely comprehensive.

It was from this book that I learned one of my favourite statistical tricks – the Rule of Threes. To quote the book:

Given no observed events in n trials, a 95% upper bound on the rate of occurrence is 3/n

This is fantastically useful.

Imagine you speak to 50 customers and none of them had a problem during their experience. Does this mean that you never create problems? Of course not. But how prevalent are they?

This trick lets us put a 95% upper bound on the rate of problems, in this instance at 3/50 = 6%.
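
Where does the 3 come from? If the true problem rate is p, the chance of seeing no problems in n independent customers is (1 − p)^n. Setting that equal to 5% and solving for p gives the exact bound, and because ln(0.05) is roughly −3 the answer is close to 3/n. Here is a minimal Python sketch comparing the two; the function names are mine, and it assumes independent observations.

```python
def rule_of_three(n):
    """Approximate 95% upper bound on an event rate after 0 events in n trials."""
    return 3 / n


def exact_upper_bound(n, confidence=0.95):
    """Exact bound: the rate p at which (1 - p) ** n equals 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)


n = 50
print(f"rule of threes: {rule_of_three(n):.1%}")
print(f"exact bound:    {exact_upper_bound(n):.1%}")
```

For 50 customers the exact figure comes out slightly below the rule-of-thumb value, so the shortcut errs on the cautious side.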

This is a really good example of the kind of conclusion that is only possible with a deep understanding of statistics.

Good statistical analysis is not theoretical navel-gazing; it helps us learn broad concrete truths about our customers.


How segmentation can damage your customer experience

Big organisations often seem to spend most of their time running segmentation projects.

Projects that will unlock deep insights into the motivation and behaviour of customers, drive up sales, and deliver exceptional, differentiated, customer service.

All of which would be wonderful, if it wasn’t the third big segmentation project in five years. This one won’t be any better than the last two, and secretly you know it.

Customer experience segmentation almost never works. Why? Because businesses assume that marketing segmentation and CX segmentation are the same thing.

What makes a good marketing segmentation?

Let’s start with what doesn’t make a good segmentation—lazy stereotypes. As Mark Ritson points out, that includes silly generalisations based on gender, age, or even generation.

“Clearly millennials as a generational cohort do exist – they are the two billion people on the planet born between 1981 and 2000. But the idea that this giant army all want similar stuff or think in similar ways is clearly horseshit.”

Mark Ritson

Good marketing segments are those which reliably predict which messages will resonate and who is most likely to respond, allowing businesses to target the right customers with the right messages.

Segmentation, in practice, is usually built on statistical tools such as cluster analysis or archetypal analysis, which brings us to our next question…

What makes a good statistical segmentation?

When we segment, we look for a way to group customers together that maximises the differences between groups while minimising the differences within groups. The customers in a group are not identical, but they should be similar to each other and dissimilar from people in other groups.
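
One simple way to put a number on that idea, for a single measure, is the share of total variation that sits between segments rather than within them. The Python sketch below is my own illustration, not the criterion any particular clustering tool uses, and the segment names and scores are invented.

```python
from statistics import mean


def separation(groups):
    """Share of total variation lying between groups (0 = none, 1 = complete)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Variation of group means around the overall mean, weighted by group size.
    between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Variation of individuals around their own group mean.
    within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return between / (between + within)


# Hypothetical scores on one attitude question for two pairs of segments.
well_separated = {"segment A": [2, 3, 2, 3], "segment B": [8, 9, 8, 9]}
overlapping = {"segment A": [4, 5, 6], "segment B": [5, 6, 7]}

print(f"well separated: {separation(well_separated):.2f}")
print(f"overlapping:    {separation(overlapping):.2f}")
```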


If you can find groups with big differences and little overlap, that’s brilliant. If not, even small differences in average responsiveness can be useful for targeted marketing (particularly in old-school direct mail).

Why? Because the cost of making the wrong judgement (i.e. not targeting someone who would have responded) is only the missed opportunity; it doesn’t do any harm. Marketing segments can be useful, even if they’re not very good. The same isn’t true for customer experience.

What makes a good CX segmentation?

Bad CX segments have the potential to harm your customer experience.

If you can genuinely find segments which are clearly separated, that’s great. More often, in reality, segments are barely differentiated, with a lot of overlap. Unlike choosing whether or not to send someone a piece of direct mail, making the wrong judgement about which customer experience you offer can have serious negative consequences.

Rather than tailoring the experience, bad segments make customers feel that they have been slotted into clumsy, stereotypical boxes.


So should you give up on trying to segment customers? Not at all. But stick to a few rules for safe CX segmentation:

  • Segments should increase choice, not diminish it
  • Segments which reinforce stereotypes are usually toxic
  • Segments should clarify which needs exist, but…
  • Segments should not be boxes to put people in

p. values are bad for your health

A few months ago you may have seen a flurry of stories about the slimming benefits of chocolate.

It turned out to be a hoax, well documented here.

The key point is that, although it was a deliberate hoax, the methodology and statistics used were not unrepresentative of those used in real nutrition “studies”.

They used a randomised controlled trial, and the chocolate-eating group did lose weight significantly faster (as measured by the all-important p. value) than the control group.

So what’s the problem? To understand that, we need to understand what a p. value tells us.

Statistical significance means a small chance of being wrong

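In simple terms, the p. value is the chance that we would have seen a difference at least as large as the one we found if there was no real difference between the control group and the treatment group. We set a threshold for it to control how sure we want to be about a difference we have found. By convention that threshold is 0.05, or 5%.

In other words, when a result is called significant, there was less than a 5% chance of seeing scores like these if there was no real difference between the two groups.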

So far, so good.

The chance of being wrong accumulates

The problem is that the 5% chance compounds with every measure we look at. In this instance, the “researchers” measured a total of 18 things (weight, cholesterol, sleep quality,…).

It is tempting to add the chances up (5% x 18 = 90%), but that overstates it. The chance that none of 18 independent tests throws up a false positive is 0.95^18, or about 40%, so the chance of at least one mistake is about 60%.

In other words, there is roughly a 60% chance of seeing a significant difference on at least one of these 18 measures, even if there was no real difference between the control group and the treatment group.

Robust research corrects for this problem by controlling the familywise error rate (for example with a Bonferroni or Holm correction) or the false discovery rate (for example with the Benjamini-Hochberg procedure).
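
To put numbers on this, here is a minimal Python sketch of the chance of at least one false positive across several tests, and of the simplest correction (Bonferroni, which divides the threshold by the number of tests). It assumes the tests are independent; real measures such as weight and cholesterol are usually correlated, so treat the result as a rough guide. The function names are mine.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


alpha, n_tests = 0.05, 18
uncorrected = familywise_error_rate(alpha, n_tests)
per_test = bonferroni_threshold(alpha, n_tests)
corrected = familywise_error_rate(per_test, n_tests)

print(f"uncorrected chance of a false positive: {uncorrected:.1%}")
print(f"Bonferroni per-test threshold:          {per_test:.4f}")
print(f"corrected chance of a false positive:   {corrected:.1%}")
```

The price of the correction is that each individual test has to clear a much stricter threshold, which is exactly what a fishing expedition is trying to avoid.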

Are you fooling yourself?

Statistical significance testing is an immensely powerful tool, but it is very dangerous when used for “fishing expeditions”, dredging through hundreds of comparisons to turn up ones that are significant.

The answer is to be clear about whether your analysis is testing or generating an idea. If it’s the latter, then you need to test that theory with fresh data before having much confidence in it.
