Category Archives: analysis

From drivers to design thinking

Driver analysis is great, isn’t it? It reduces the long list of items on your questionnaire to a few key drivers of satisfaction or NPS. A nice simple conclusion: “these are the things we need to invest in if we want to improve”.

But what if it’s not clear how to improve?

Often the key drivers turn out to be big-picture, broad-brush items: things like “value” or “being treated as a valued customer”, which are more or less proxies for overall satisfaction. That makes them difficult to action.

Looking beyond key drivers, there’s a lot of insight to be gained by looking at how all your items relate to each other, as well as to overall satisfaction and NPS. Those correlations, best studied as either a correlogram (one option below) or network diagram (top right) can tell you a lot, without requiring much in the way of assumptions about the data.
In particular, examining the links between specific items can support a design thinking approach to improving the customer experience based on a more detailed understanding of how your customers see the experiences you create.
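
As a rough illustration of the raw material behind a correlogram or network diagram, here is a minimal Python sketch that computes every pairwise correlation between items. The item names and scores are invented for the example, and plain Pearson correlation is an assumption on my part; your own data and preferred measure may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical 1-10 scores from six respondents (illustrative, not real data).
scores = {
    "overall": [9, 7, 4, 8, 3, 6],
    "value": [8, 7, 5, 9, 3, 6],
    "staff helpfulness": [9, 6, 4, 7, 4, 5],
    "website": [5, 8, 6, 4, 7, 5],
}

# One correlation per pair of items, including item-to-item links,
# not just each item against the overall score.
links = {
    (a, b): pearson(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

# Strongest links first; in a network diagram these would be the thickest edges.
for (a, b), r in sorted(links.items(), key=lambda kv: -abs(kv[1])):
    print(f"{a} <-> {b}: r = {r:+.2f}")
```

The point of keeping all the pairs, rather than only the correlations with overall satisfaction, is that the item-to-item links are where the design thinking starts.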

Your experiences have a lot of moving parts—don’t you think you ought to know how they mesh together?


Telling the story with data


This is a diagram from my course about data presentation and infographics.

I use it as a starting point to discuss the skills you need to do the job well, summarised as “telling a compelling story with integrity”.

The idea of the diagram is that too much or too little of any of the three axes tends to be a bad thing.

For instance, too heavy on the “statistician” axis might mean that your charts are accurate and robust, but impenetrable to many people. Too light on the same axis, and you might be committing basic analytical mistakes (perhaps ignoring random measurement error).

It’s a rare person who embodies all of those skills to a truly expert level, which is one reason the best infographics often involve a team of people.


Finding your audience

It isn’t necessarily a case of shooting for the middle of the triangle. There’s a zone of acceptable variation around the middle in which competent and engaging data storytelling happens.

What’s appropriate for a scientific publication is not appropriate for your board, or for frontline staff. It’s all about getting the balance right for your audience.

Obvious? Yes, but it’s worth thinking about what it means in practice. Which “rules” of data storytelling are unbreakable, and which need to be tailored according to your audience?


How much do we know about what works?

Stephen Few takes a dim view of infographics, which he sees as prioritising shallow gimmicks over effective visual communication. David McCandless has been on the receiving end of severe critiques.

Few also points out that more work needs to be done to test which graphic forms are most effective, rather than relying on opinion. I agree – we can’t begin to pretend we’re working in a serious field until we approach these questions scientifically.

Robert Kosara has published interesting work showing that pie charts, much derided by experts, are more effective than we thought.

But is communication our only aim? Not always.


Telling the story

The science of which data graphics work most effectively is only part of the equation. The best graphic in the world is wasted if no one looks at it.

Let’s go back to the idea of storytelling.

What makes a story? Dave Trott, in one of his excellent blog posts, quotes Steven Pressfield’s simple version. A story consists of Hook, Build, and Payoff.

If we apply that to data storytelling I think it makes it easier for us to choose our place in the triangle.

  • Hook: we need to capture the attention of our audience, with something relevant and/or fascinating. This is where McCandless excels.
  • Build: there should be enough depth to reward engagement with the data.
  • Payoff: there’s got to be a reason for looking. What am I going to do differently as a result of spending time with this data?






Are you measuring importance right?

One of the universal assumptions about customer experience research is that the topics on your questionnaire are not equally important.

It’s pretty obvious, really.

That means that when we’re planning what to improve, we should prioritise areas which are more important to customers.

Again, pretty obvious.

But how do we know what’s important? That’s where it starts to get tricky, and where we can get derailed into holy wars about which method is best. Stated importance? Key Driver Analysis (or “derived importance”)? Relative importance analysis? MaxDiff?

An interesting article in IJMR pointed out that these decisions are often made, not on the evidence, but according to the preferences of whoever the main decision maker is for a particular project.

Different methods will suggest different priorities, so personal preference doesn’t seem like a good way to choose.

The way out of this dilemma is to stop treating “importance” as a single idea that can be measured in different ways. It isn’t. Stated importance, derived importance and MaxDiff are all measuring subtly different things.

The best decisions come from looking at both stated and derived importance, using the combination to understand how customers see the world, and addressing the customer experience in the appropriate way:


  • High stated, low derived – a given. Minimise dissatisfaction, but don’t try to compete here.
  • Low stated, high derived – a potential differentiator. If your performance is par on the givens, you may get credit for being better than your competitors here.
  • High stated, high derived – a driver. This is where the bulk of your priorities will sit. Vital, but often “big picture” items that are difficult to action.

That’s a much more rounded view than choosing a single “best” measure to prioritise, and more accurately reflects how customers think about their experience.
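
The combination of stated and derived importance can be sketched as a simple classification. This is a minimal Python illustration, not a definitive method: the item names, scores, and cut-off values are all hypothetical, and the "low priority" label for items that are low on both measures is my own addition, since the list above does not name that quadrant.

```python
def classify(stated, derived, stated_cut, derived_cut):
    """Place an item in a quadrant from its stated and derived importance."""
    high_stated = stated >= stated_cut
    high_derived = derived >= derived_cut
    if high_stated and high_derived:
        return "driver"
    if high_stated:
        return "given"
    if high_derived:
        return "potential differentiator"
    # Assumed label: this quadrant is not named in the original list.
    return "low priority"


# Hypothetical items: (stated importance out of 10, correlation with overall).
items = {
    "value": (9.1, 0.78),
    "safety": (9.4, 0.21),
    "friendliness of staff": (7.2, 0.66),
    "car park": (6.8, 0.15),
}

# Hypothetical cut-offs; in practice you might use the median of each measure.
STATED_CUT, DERIVED_CUT = 8.0, 0.5

quadrants = {
    name: classify(stated, derived, STATED_CUT, DERIVED_CUT)
    for name, (stated, derived) in items.items()
}

for name, quadrant in quadrants.items():
    print(f"{name}: {quadrant}")
```

Where you draw the cut-offs is a judgment call, so it is worth checking that the story does not change much if you move them a little.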


Getting started with statistics

I’m often asked to recommend a good place to learn, or brush up on, the basics of statistics used in survey research.

It’s a difficult question, but I do have a couple of favourites.

The problem is that there are layers of understanding. Let’s take confidence intervals as an example.

Layer 1 – gist

It’s quite easy to understand them in simple terms, something like “the range within which we are 95% sure the true figure for the population would have fallen if we had spoken to everyone”.

This saves us from a completely naïve view of research.

Layer 2 – use

If you do a bit of reading and playing around with a calculator or Excel, you can soon figure out how to calculate confidence intervals correctly. You’ve learned that the 95% confidence interval for a mean is:

mean ± 1.96 × (standard deviation ÷ √n)

So now you can use confidence intervals with your own analysis.
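
For anyone who would rather do this in code than in Excel, here is a minimal Python sketch of the usual normal-approximation calculation: the mean plus or minus 1.96 standard errors. The scores are made up, and the function name is my own. With a sample this small you would normally swap 1.96 for the appropriate t value, which is exactly the kind of thing Layer 3 is about.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval_95(scores):
    """Normal-approximation 95% confidence interval for a mean."""
    m = mean(scores)
    # Standard error = sample standard deviation / square root of sample size.
    margin = 1.96 * stdev(scores) / sqrt(len(scores))
    return m - margin, m + margin


# Hypothetical satisfaction scores out of 10 (illustrative only).
scores = [8, 9, 7, 10, 6, 8, 9, 7, 8, 8]

low, high = confidence_interval_95(scores)
print(f"mean = {mean(scores):.2f}, 95% CI = {low:.2f} to {high:.2f}")
```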

Layer 3 – context

Getting to the next layer of the onion is much more difficult: understanding the assumptions we have made, the conclusions we can safely draw, and the theory on which they’re based.

It’s worth investing the time.

One really good book is PDQ Statistics, which is a slim volume aimed at the intelligent layperson. It has a very practical bent, but also respects its reader enough to explain the basis on which ideas such as confidence intervals rest.

It has a clear explanation, for instance, of why statistical tests can only tell you the probability of getting the result you have given a hypothesis, rather than the probability of your hypothesis.

A more specialist book is Statistical Rules of Thumb. It’s aimed at practitioners, notably statistical consultants, as a reference text; and it’s extremely comprehensive.

It was from this book that I learned one of my favourite statistical tricks – the Rule of Threes. To quote the book:

Given no observed events in n trials, a 95% upper bound on the rate of occurrence is 3/n

This is fantastically useful.

Imagine you speak to 50 customers and none of them had a problem during their experience. Does this mean that you never create problems? Of course not. But how prevalent are they?

This trick lets us put a 95% upper bound on the rate of problems, in this instance at 3/50 = 6%
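
If you want to see how good the approximation is, here is a small Python sketch comparing the Rule of Threes with the exact bound. The exact version solves (1 - p)^n = 0.05 for p, assuming independent trials; the function names are my own.

```python
def rule_of_three(n):
    """Approximate 95% upper bound on the event rate after zero events in n trials."""
    return 3 / n


def exact_upper_bound(n, confidence=0.95):
    """Exact bound: the rate p at which zero events in n independent
    trials would happen only (1 - confidence) of the time."""
    return 1 - (1 - confidence) ** (1 / n)


n = 50  # customers spoken to, none of whom reported a problem
print(f"Rule of Threes: {rule_of_three(n):.1%}")
print(f"Exact bound:    {exact_upper_bound(n):.1%}")
```

The rule always errs slightly on the cautious side, which is what you want from a rule of thumb.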

This is a really good example of the kind of conclusion that is only possible with a deep understanding of statistics.

Good statistical analysis is not theoretical navel-gazing. It helps us learn broad concrete truths about our customers.


p-values are bad for your health

A few months ago you may have seen a flurry of stories about the slimming benefits of chocolate.

It turned out to be a hoax, well documented here.

The key point is that, although it was a deliberate hoax, the methodology and statistics used were not unrepresentative of those used in real nutrition “studies”.

They used a randomised controlled trial, and the chocolate-eating group did lose weight significantly faster (as measured by the all-important p-value) than the control group.

So what’s the problem? To understand that, we need to understand what a p-value tells us.

Statistical significance means a small chance of being wrong

In simple terms we set a significance threshold for the p-value to control how sure we want to be about a difference we have found. By convention we set it to 0.05, or 5%.

In other words, we call a result significant when there is less than a 5% chance that we would have seen a difference this large if there was no real difference between the control group and the treatment group.

So far, so good.

The chance of being wrong accumulates

The problem is that the 5% chance accumulates with every measure we look at. In this instance, the “researchers” measured a total of 18 things (weight, cholesterol, sleep quality,…).

Simply adding 5% for each measure gives 5% x 18 = 90%, but that is only an upper bound. If the 18 measures are independent, the chance of at least one false positive is 1 - 0.95^18, or about 60%.

In other words, there is roughly a 60% chance of seeing a “significant” difference on at least one of these 18 measures, even if there was no real difference between the control group and the treatment group.

Robust research corrects for this problem by controlling the familywise error rate (for example with a Bonferroni correction) or the false discovery rate (for example with the Benjamini-Hochberg procedure).
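
To put numbers on this, here is a minimal Python sketch. It assumes the tests are independent, in which case the chance of at least one false positive across m tests is 1 - (1 - alpha)^m; adding 5% per test gives an upper bound on that figure. It also shows the Bonferroni correction, the simplest way of controlling the familywise error rate. The function names are my own.

```python
def familywise_error_rate(alpha, tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** tests


def bonferroni_threshold(alpha, tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / tests


ALPHA, MEASURES = 0.05, 18

uncorrected = familywise_error_rate(ALPHA, MEASURES)
corrected = familywise_error_rate(bonferroni_threshold(ALPHA, MEASURES), MEASURES)

print(f"Uncorrected: {uncorrected:.1%} chance of at least one false positive")
print(f"Per-test threshold after Bonferroni: {bonferroni_threshold(ALPHA, MEASURES):.4f}")
print(f"Corrected:   {corrected:.1%} chance of at least one false positive")
```

Bonferroni is deliberately conservative. If you are screening a large number of comparisons, a false discovery rate approach usually costs you less statistical power.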

Are you fooling yourself?

Statistical significance testing is an immensely powerful tool, but it is very dangerous when used for “fishing expeditions”, dredging through hundreds of comparisons to turn up ones that are significant.

The answer is to be clear about whether your analysis is testing or generating an idea. If it’s the latter, then you need to test that theory with fresh data before having much confidence in it.


On metadata

Metadata, information that describes data, is immensely important in big data analysis. Sometimes more important, or at least more accessible, than the data it describes.

A few years ago a very funny blog post did the rounds. Purporting to be the report of an 18th century British intelligence agent about the power of “Social Networke Analysis“, it shows how Paul Revere could have been identified as a key intelligence target simply by looking at which radicals belonged to which organisations.

You can read the full post here: Using metadata to find Paul Revere.

That’s just using a table of people who shared memberships. Imagine how much more powerful these techniques can be when applied to social media networks, or by looking at telephone and email records (note – not the content of the calls, just who is talking to who).
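
The core of the technique is surprisingly small. Here is a minimal Python sketch that turns a table of memberships into a table of who is linked to whom. The people and groups are entirely invented, and counting shared memberships is only the simplest possible measure of connection.

```python
from collections import Counter
from itertools import combinations

# Hypothetical membership table: person -> organisations they belong to.
memberships = {
    "Alice": {"Debating Club", "Book Circle", "Rowing Club"},
    "Bob": {"Debating Club"},
    "Carol": {"Book Circle", "Rowing Club"},
    "Dan": {"Rowing Club"},
    "Erin": {"Chess Club"},
}

# Count shared memberships for every pair of people.
shared = Counter()
for a, b in combinations(sorted(memberships), 2):
    common = len(memberships[a] & memberships[b])
    if common:
        shared[(a, b)] = common

# Count how many other people each person is linked to.
connections = Counter()
for a, b in shared:
    connections[a] += 1
    connections[b] += 1

best_connected = connections.most_common(1)[0][0]
print("Shared memberships:", dict(shared))
print("Best connected:", best_connected)
```

Nothing in this sketch looks at what anyone said or did, only at which lists their names appear on.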

This is traffic analysis. A high-profile recent example hit the news when an FBI investigation found that CIA director David Petraeus was having an affair. How? The couple had exchanged draft emails in an anonymous Gmail account, but Petraeus’ mistress had logged into the account from hotel Wi-Fi networks. Cross-referencing guest lists and IP records was enough to expose the affair.

Knowing who you talk to tells us a lot about you. So much, in fact, that these techniques can accurately predict (for example) sexual orientation. As the abstract of the research puts it:

Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information.

What does this all mean for Customer Experience? Like much within the world of big data, it gives us powerful tools to use to help us understand customers. We can infer, sometimes very accurately, what sort of people customers are just by who they associate with.

It can also be very dangerous. We have a moral, and sometimes a legal, duty not to be creepy. We are not the FBI, and our customers are mostly not terrorists or philandering public figures.

If we use these tools, we must make sure we do so transparently, and that it’s for our customers’ benefit as well as ours.


On predictive analytics

At a recent conference one of our clients, Roger Binks from RSA, spoke about their use of predictive analytics to anticipate and prevent complaints.

Digging through the data they found that 70% of customers who had to phone in more than twice during a home insurance claim went on to make a complaint. 80% of storm and flood claimants who had not been contacted for over 6 weeks complained.

In other words, RSA knew that a chunk of customers was likely to complain before they actually did, which meant they could pre-empt the complaint by picking up the phone and calling the customer.

That saved the business money by replacing irate inbound calls with much more positive outbound calls, and it made customers more satisfied.

And the data had been there all along…they just needed someone to sift through it and make the connections.

That’s the true power of predictive analytics.

Not fancy statistical techniques. Simple analysis, applied to data you probably already have, which delivers instant benefits for your staff, your customers, and your business.
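
To show just how simple "simple analysis" can be, here is a minimal Python sketch that compares complaint rates between two groups of claims. The records, field names, and the more-than-two-calls cut-off are hypothetical stand-ins for illustration; they are not RSA's data or method.

```python
# Hypothetical claim records (illustrative only).
claims = [
    {"inbound_calls": 0, "complained": False},
    {"inbound_calls": 1, "complained": False},
    {"inbound_calls": 1, "complained": False},
    {"inbound_calls": 2, "complained": False},
    {"inbound_calls": 2, "complained": True},
    {"inbound_calls": 0, "complained": False},
    {"inbound_calls": 3, "complained": True},
    {"inbound_calls": 4, "complained": True},
    {"inbound_calls": 3, "complained": False},
    {"inbound_calls": 5, "complained": True},
]


def complaint_rate(records):
    """Share of the given claims that ended in a complaint."""
    return sum(r["complained"] for r in records) / len(records)


# Split on a simple, observable warning sign.
frequent = [c for c in claims if c["inbound_calls"] > 2]
others = [c for c in claims if c["inbound_calls"] <= 2]

print(f"More than two calls: {complaint_rate(frequent):.0%} complained")
print(f"Two calls or fewer:  {complaint_rate(others):.0%} complained")
```

Once a gap like this shows up in historical data, the same condition can be used to flag open claims for a proactive call.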


On big data

I’ve been thinking about the future, and specifically about big data.

It seems to me that the phrase “big data” is mostly misapplied. Datasets large enough to merit the name are vanishingly rare for most businesses (governments and search engines are another matter).

More interesting for most of us are the specific techniques that are at play. Behind the jargon are some really useful subjects:

  • Predicting the future (“predictive analytics”)
  • Computers understanding the world (“machine learning”)
  • Data about data (“metadata”)

All three of these have untapped potential, which I’ll cover in the next few posts.

Before that there is one crucial point to make.

These techniques are useless unless you (a) have the data, (b) have access to the data, and (c) understand how it was obtained.

In my experience (a) is true much less often than companies believe. “Surely we must know…” is something you hear surprisingly often.

And (b) is even trickier. “It’s in the system, but the MI team can’t pull it out in the format we need.”

Finally, (c) is almost unheard of. “There’s a column here that says ‘product’…but half of the entries are missing. And some of them are in Sanskrit.”

Let’s not kid ourselves about big data before we get borderline competent at ordinary-sized data.
