Control Charts: A ‘how to’ guide

A key component of Deming’s ‘System of Profound Knowledge’ concerns the measurement of performance (of a system): the ‘Theory of Variation’.

I’ve noticed over the years that, whilst the foundational points around variation can be well understood, the use of control charts within operational practice can be ‘absolutely butchered’ (technical term 🙂 ).

 

This caused me to write a ‘how to’ guide a while back, for me and my colleagues.

I recently ‘dusted it down’ and tidied it up into a version 2.0 in order that I can share it more widely, for anyone who can find value within.

I attach it as a pdf document for anyone interested:

Control charts – a how to guide V2.0

It doesn’t replace the excellent writings of Donald Wheeler…though it hopefully makes you curious to ‘pull’ his writings towards you.

It doesn’t tell you what to measure…because it couldn’t!

It doesn’t ‘do it for you’…but, hopefully, it does give you enough so that you can experiment with doing it for yourself.

…and it can’t beat working alongside someone who knows what they are doing, and can act as your coach.

 

Note: If you do end up using/ sharing this guide then I’d be grateful if you could add a simple comment at the bottom of this page so that I am aware of this. Not because I’m going to invoice you (I’m not!)…but because I would find this knowledge useful (#feedback).

You might tell me: what you thought of it (warts and all), where you might use it, whether you have shared it with others (and whether they appreciated this or not!)… and if it has improved your measurement practices.

Thanks, Steve

Counts, categories and computations

This post sits squarely within the ‘measurement’ section of this blog – a topic dear to me, given the vagaries of measurements that we are subjected to or are required to produce in our working lives1.

The catalyst for writing it was revisiting a ‘Donald Wheeler’ chapter2 and reminding myself of some ‘daft work assignments’ that I was around years ago.

I’ll start with an ordinary looking table that (let’s say) represents3 the feedback received by a presenter (Bob), after running a 1 hour session at a multi-day conference.

I’ve deliberately used a rather harmless-looking subject (i.e. feedback to a presenter) so that I can cover some general points…which can then be applied more widely.

Bob’s presentation table

So, let’s walk through this table.

Conference attendees were asked to evaluate Bob’s session against five perfectly reasonable questions, using a five-point rating scale (from ‘Poor’ through to ‘Excellent’). The body of the table (in blue) tells us the percentage of evaluators that awarded each rating per question (and, as you would expect, the ratings given for each question sum to 100%).

Nice, obvious, easy….but that table is sure hard to read. It’s just a blur of boring numbers.

Mmm, we’d better add some statistics4 (numbers in red)…to make it more, ahem, useful.

Pseudo-Average

So the first ‘analysis’ usually added is the ‘average score per question’. i.e. we can see that there is variation in how people score…and we feel the need to boil this down into the score that a (mythical) ‘average respondent’ gave.

To do this, we assume a numerical weighting for each rating (e.g. a ‘poor’ scores a 0…all the way up to an ‘excellent’ scoring a 4) and then use our trusty spreadsheet to crunch out an average. Looking at the table, Bob scored an overall 1.35 on the quality of her pre-session material, which is somewhere between ‘average’ (a score of 1) and ‘good’ (a score of 2).

…and it is at this point that we should pause to reflect on the type of data that we are dealing with.

“While numbers may be used to denote an ordering among categories, such numbers do not possess the property of distance. The term for numbers used in this way is ordinal data.” (Wheeler)

There is a natural order between poor, average, good, very good and excellent…however there is no guarantee that the distance between ‘excellent’ and ‘very good’ is the same as the distance from ‘good’ to ‘average’ (and so on)…yet by assigning numbers to categories we make distances between categories appear the same5.

If you compute an average of ordinal data then you have a pseudo-average.

“Pseudo averages are very convenient, but they are essentially an arbitrary scoring system which is used with ordinal data. They have limited meaning, and should not be over interpreted.”
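To see just how arbitrary this is, here is a minimal Python sketch. The percentages for the two questions are made up by me (they are not the figures from Bob’s table), and the second weighting scheme (`poor_penalised`) is simply another assumption that is no more or less defensible than the usual 0 to 4 spacing.

```python
def pseudo_average(percentages, weights):
    """Weighted mean of the category weights, using the % of evaluators per category."""
    if sum(percentages) != 100:
        raise ValueError("percentages for a question should sum to 100")
    return sum(p * w for p, w in zip(percentages, weights)) / 100

# Hypothetical responses (% per category, from 'Poor' through to 'Excellent')
# for two evaluation questions.
question_a = [20, 40, 25, 15, 0]
question_b = [40, 10, 30, 10, 10]

# Two equally arbitrary ways of turning ordered categories into numbers.
evenly_spaced = [0, 1, 2, 3, 4]   # the usual spreadsheet assumption
poor_penalised = [0, 3, 4, 5, 6]  # 'Poor' treated as much worse than the rest

for name, weights in [("evenly spaced", evenly_spaced), ("poor penalised", poor_penalised)]:
    a = pseudo_average(question_a, weights)
    b = pseudo_average(question_b, weights)
    print(f"{name}: A = {a:.2f}, B = {b:.2f}, higher = {'A' if a > b else 'B'}")
```

With the even spacing, question B comes out ahead (1.40 against 1.35). Change the spacing and question A comes out ahead (2.95 against 2.60). The responses haven’t changed at all…only my choice of numbers.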

Total Average

Okay, so going back to our table of Bob’s feedback: we’ve averaged each row (our pseudo-averages)…so our next nifty piece of analysis will be to average each column, to (supposedly) find out how Bob did in general…and we get our total average line. This shows that Bob mainly scored, on average, in the ‘good’ and ‘very good’ categories.

But what on earth does this mean? Combining scores for different variables (e.g. the five different evaluation questions in this case) is daft. They have no meaningful relationship between themselves.

It’s like saying “I’ve got 3 bikes and 10 fingers….so that’s an average of 6.5”. Yes, that’s what the calculator will say…but so what?!

“The total average line (i.e. computing an average from different variables) is essentially a triumph of computation over common sense. It should be deleted from the summary.”

Global Pseudo-Average

And so to our last piece of clever analysis…that table of numbers is quite hard to deal with. Is there one number that tells us ‘the answer’?

Well, yes, we could create a global pseudo average, which would be to compute a pseudo average from the total average line. Excellent, we could calculate a one-number summary for each presenter at our conference…and then we could compare them…we could even create a (fun!) league table 🙂

Oh, bugger, our Bob only got a 2.4. That doesn’t seem very good.

To compute a global pseudo-average would be to cross-pollinate the misleading pseudo-average with the nonsensical total average line and arrive in computation purgatory.

The wider point

Let’s move away from Bob’s presentation skills.

Who’s seen pseudo-averages, total average lines and global pseudo-averages ‘used in anger’ (i.e. with material decisions being made) on ordinal data?

A classic example would be within software selection exercises, to (purportedly) compare competing vendors in a robust, objective and transparent manner.

  • In terms of pseudo averages, we get situations where 10 ‘nice to have’ features end up supposedly equaling 1 ‘essential’ function;
  • In terms of total average lines, we get variables like software functionality, support levels and vendor financial strength all combined together (which is akin to my bikes and fingers);
  • …and at the very end, the ‘decider’ between selecting Vendor A or B might go down to which one has been lucky enough to garner a slightly superior global pseudo-average. “Hey, Vendor B wins because they got 6.85”

The above example refers to software but could be imagined across all selection exercises (recruitment, suppliers,….).

Ordinal data is used and abused regularly. The aim of this short post is just to remind (or educate) people (including myself) of the pitfalls.

Side note: as a rule-of-thumb, my ‘bullshit-ohmmeter’ usually starts to crackle into life (much like a Geiger counter) whenever I see weightings applied to categories…

In summary

Before ‘playing with numbers’, the first thing we should do is think about what we are dealing with.

“In order to avoid a ‘triumph of computation over common sense’ it is important for you to think about the nature of your data…

…a spreadsheet programme doesn’t have any inhibitions about computing the average for a set of telephone numbers.”

Addendum: ‘Back to school’ on data types

This quick table gives a summary of the traditional (though not exhaustive) method of categorising numerical data:

Data types table

Footnotes

1. It’s not just our working lives: We are constantly fed ‘numbers’ by central and local government, the media, and the private sector through marketing and advertisement.

2. Wheeler’s excellent book called ‘Making Sense of Data: SPC for the service sector’. All quotes above (in blue) are from this book.

3. ‘Represents’: If you are wondering, these are not real numbers. I’ve mocked it up so that you can hopefully see the points within.

4. Statistic: “a fact in the form of a number that shows information about something” (Cambridge Dictionary).

We should note, however, that just because we’ve been able to perform a calculation on a set of numbers doesn’t make it useful.

5. Distance: A nice example to show the lack of the quality of distance within ordinal data is to think of a race: Let’s say that, after over 2 hours of grueling racing, two marathon runners A and B sprint over the line in a photo finish, whilst runner C crawls over the line some 15 minutes later…and yet they stand on the podium in order of 1st, 2nd and 3rd. However, you can’t comprehend what happened from viewing the podium.

Bob’s presentation table 2

6. Visualising the data: So how might we look at the evaluation of Bob’s session?

How about visually…so that we can easily see what is going on and take meaningful action. How about this set of bar graphs?

There’s no computational madness, just the raw data presented in such a way as to see the patterns within:

  • The pre-session material needs working on, as does the closure of the session;
  • However, all is not lost. People clearly found the content very useful;
  • …Bob just needs to make some obvious improvements. She could seek help from people with expertise in these areas.

Note: There is nothing to be learned within an overall score of ‘2.4’…and plenty of mischief.

 

 

 

 

How good is that one number?

This post is a promised follow-up to the recent ‘Not Particularly Surprising’ post on Net Promoter Score.

I’ll break it into two parts:

  • Relevance; and
  • Reliability

Part 1 – Relevance

A number of posts already written have explained that:

Donald Wheeler, in his superb book ‘Understanding Variation’, nicely sets out Dr Walter Shewhart’s1 ‘Rule One for the Presentation of Data’:

“Data should always be presented in such a way that preserves the evidence in the data…”

Or, in Wheeler’s words “Data cannot be divorced from their context without the danger of distortion…[and if context is stripped out] are effectively rendered meaningless.”

And so to a key point: The Net Promoter Score (NPS) metric does a most excellent job of stripping out meaning from within. Here’s a reminder from my previous post that, when asking the ‘score us from 0 – 10’ question about “would you recommend us to a friend”:

NPS scale

  • A respondent scoring a 9 or 10 is labelled as a ‘Promoter’;
  • A scorer of 0 to 6 is labelled as a ‘Detractor’; and
  • A 7 or 8 is labelled as being ‘Passive’.

….so this means that:

  • A catastrophic response of 0 gets the same recognition as a casual 6. Wow, I bet two such polar-opposite ‘Detractors’ have got very different stories of what happened to them!

and yet

  • a concrete boundary is placed between responses of 6 and 7 (and between 8 and 9). Such an ‘on the boundary’ responder may have vaguely pondered which box to tick and metaphorically (or even literally) ‘tossed a coin’ to decide.

Now, you might say “yeah, but Reichheld’s broad-brush NPS metric will do” so I’ve mocked up three (deliberately) extreme comparison cases to illustrate the stripping out of meaning:

First, imagine that I’ve surveyed 100 subjects with my NPS question and that 50 ‘helpful’ people have provided responses. Further, instead of providing management with just a number, I’m furnishing them with a bar chart of the results.

Comparison pair 1: ‘Terrifying vs. Tardy’

Below are two quite different potential ‘NPS question’ response charts. I would describe the first set of results as terrifying, whilst the second is merely tardy.

Chart 1 Terrifying vs Tardy

Both sets of results have the same % of Detractors (below the red line) and Promoters (above the green line)…and so are assigned the same NPS score (which, in this case would be -100). This comparison illustrates the significant dumbing down of data by lumping responses of 0 – 6 into the one category.
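For those who like to see the sums, here is a small Python sketch of the calculation (% of Promoters minus % of Detractors). The two sets of 50 responses are hypothetical ones that I have made up to mimic the flavour of the charts; they are not the data behind them.

```python
from collections import Counter

def nps(scores):
    """Net Promoter Score: % Promoters (9-10) minus % Detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical responses from 50 people (not the data behind the charts).
terrifying = [0] * 30 + [1] * 15 + [2] * 5
tardy = [5] * 20 + [6] * 30

for label, scores in [("terrifying", terrifying), ("tardy", tardy)]:
    distribution = dict(sorted(Counter(scores).items()))
    print(f"{label}: NPS = {nps(scores):+.0f}, responses per score = {distribution}")
```

Both sets come out at -100. The only place that the difference survives is in the distribution of responses, which is exactly what the single number throws away.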

I’d want to clearly see the variation within the responses i.e. such as the bar charts shown, rather than have it stripped out for the sake of a ‘simple number’.

You might respond with “but we do have that data….we just provide Senior Management with the single NPS figure”….and that would be the problem! I don’t want Senior Management making blinkered decisions2, using a single number.

I’m reminded of a rather good Inspector Guilfoyle poster that fits perfectly with having the data but deliberately not using it.

Comparison pair 2: ‘Polarised vs. Contented’

Below are two more NPS response charts for comparison….and, again, they both derive the same NPS score (-12 in this case) …and yet they tell quite different stories:

Chart 2 Polarised vs Contented

The first set of data uncovers that the organisation is having a polarising effect on its customers – some absolutely love ‘em …whilst many others are really not impressed.

The second set shows quite a warm picture of contentedness.

Whilst the NPS scores may be the same, the diagnosis is unlikely to be. Another example where seeing the variation within the data is key.

Comparison pair 3: ‘No Contest vs. No Show’

And here’s my penultimate pair of comparison charts:

Chart 3 No contest vs No show

Yep, you’ve guessed it – the two sets of response data have the same NPS scores (+30).

The difference this time is that, whilst the first chart reflects 50 respondents (out of the 100 surveyed), only 10 people responded in the second chart.

You might think “what’s the problem, the NPS of +30 was retained – so we keep our KPI-inspired bonus!” …but do you think the surveys are comparable? Why might so many people not have responded? Is this likely to be a good sign?  Can you honestly compare those NPS numbers? (perhaps see ‘What have the Romans ever done for us?!’)

….which leads me nicely onto the second part of this post:

Part 2 – Reliability

A 2012 article co-authored by Fred Reichheld (creator of NPS) identifies many issues that are highly relevant to compiling that one number:

  • Frequency: that NPS surveys should be frequently performed (e.g. weekly), rather than, say, a quarterly exercise.

The article doesn’t, however, refer to the essential need to always present the results over time, or whether/ how such ‘over time’ charts should (and should not) be interpreted.


  • Consistency: that the survey method should be kept constant because two different methods could produce wildly different scores.

The authors comment that “the consistency principle applies even to seemingly trivial variations in methodologies”, giving an example of the difference between a face-to-face method at the culmination of a restaurant meal (deriving an NPS of +40) and a follow-up email method (NPS of -39).


  • Response rate: that the higher the response rate, then the greater the accuracy – which I think we can all understand. Just reference comparison 3 above.

But the article goes on to say that “what counts most, of course, is high response rates from your core or target customers – those who are most profitable…” In choosing these words, the authors demonstrate the goal of profitability, rather than customer purpose. If you want to understand the significance of this then please read ‘Oxygen isn’t what life is about’.

I’d suggest that there will be huge value in studying those customers that aren’t your current status quo.


  • Freedom from bias: that many types of bias can affect survey data.

The authors are clearly right to worry about the non-trivial issue of bias. They go on to talk about some key issues such as ‘confidentiality bias’, ‘responder bias’ and the whopper of employees ‘gaming the system’ (which they unhelpfully label as unethical behaviour, rather than pondering the system-causing motivations – see ‘Worse than useless’).


  • Granularity: that of breaking results down to regions, plants/ departments, stores/branches…enabling “individuals and small teams…to be held responsible for results”.

Owch….and we’d be back at that risk of bias again, with employees playing survival games. There is nothing within the article that recognises what a system is, why this is of fundamental importance, and hence why supreme care would be needed with using such granular NPS feedback. You could cause a great deal of harm.

Wow, that’s a few reliability issues to consider and, as a result, there’s a whole NPS industry being created within organisational customer/ marketing teams3…which is diverting valuable resources from people working together to properly study, measure and improve the customer value stream(s) ‘in operation’, towards each and every customer’s purpose.

Reichheld’s article ends with what it calls “The key”: the advice to “validate [your derived NPS number] with behaviours”, by which he explains that “you must regularly validate the link between individual customers’ scores and those customers’ behaviours over time.”

I find this closing advice amusing, because I see it being completely the wrong way around.

Rather than getting so obsessed with the ‘science’ of compiling frequent, consistent, high response, unbiased and granular Net Promoter Scores, we should be working really hard to:

“use Operational measures to manage, and [lagging4] measures to keep the score.” [John Seddon]

…and so to my last set of comparison charts:

Chart 4 Dont just stand there do something

Let’s say that the first chart corresponds to last month’s NPS survey results and the second is this month. Oh sh1t, we’ve dropped by 14 whole points. Quick, don’t just stand there, do something!

But wait…before you run off with action plan in hand, has anything actually changed?

Who knows? It’s just a binary comparison – even if it is dressed up as a fancy bar chart.
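A rough way to get a feel for this is to simulate it. The Python sketch below surveys 50 people a month from the same, unchanging (and entirely made-up) customer base, so any movement in the monthly score is pure noise. The `likelihood` weights, the sample size and the seed are all my own assumptions, chosen only for illustration.

```python
import random

def nps(scores):
    """Net Promoter Score: % Promoters (9-10) minus % Detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

def simulate_months(months=12, respondents=50, seed=1):
    """Survey the SAME hypothetical customer base each month and score each sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    possible_scores = list(range(11))  # 0 to 10
    # Assumed, unchanging likelihood of each score from 0 to 10.
    likelihood = [2, 2, 3, 3, 5, 7, 8, 15, 15, 20, 20]
    return [
        nps(rng.choices(possible_scores, weights=likelihood, k=respondents))
        for _ in range(months)
    ]

monthly = simulate_months()
print([round(score) for score in monthly])
print("largest month-to-month swing:",
      round(max(abs(b - a) for a, b in zip(monthly, monthly[1:]))))
```

Nothing about these imaginary customers changes from month to month, and yet the score moves around. A two-point comparison can’t tell you whether a drop is this sort of noise or something worth acting on.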

To summarise:

  • Net Promoter Score (NPS) has been defined as a customer loyalty metric;
  • There may be interesting data within customer surveys, subject to a heavy caveat around how such data is collected, presented and interpreted;
  • NPS doesn’t explain ‘why’ and any accompanying qualitative survey data is limited, potentially distorting and easily put to bad use;
  • Far better data (for meaningful and sustainable improvement) is to be found from:
    • studying a system in operation (at the points of demand arriving into the system, and by following units of demand through to their customer satisfaction); and
    • using operational capability measures (see ‘Capability what?’) to understand and experiment;
  • If we properly study and redesign an organisational system, then we can expect a healthy leap in the NPS metric – this is the simple operation of cause and effect;

  • NPS is not a system of management.

Footnotes

1. Dr Walter Shewhart (1891 – 1967) was the ‘father’ of statistical quality control. Deming was heavily influenced by Shewhart’s work and they collaborated together.

2. Blinkered decisions, like setting KPI targets and paying out incentives for ‘hitting it’.

3. I should add that, EVEN IF the (now rather large) NPS team succeeds in creating a ‘reliable’ NPS machine, we should still expect common cause variation within the results over time. Such variation is not a bad thing. Misunderstanding it and tampering would be.

4. Seddon’s original quote is “use operational measures to manage, and financial measures to keep the score” but his ‘keeping the score’ meaning (as demonstrated in other pieces that he has written) can be widened to cover lagging/ outcome/ results measures in general…which would include NPS.

Seddon’s quote mirrors Deming’s ‘Management by Results’ criticism (as explained in the previous post).

’80 in 20’…erm, can we change that?!

This is a bit of a ‘back to basics’ post, inspired by refreshing my memory from reading a superb book. It’s long…but hopefully interesting 🙂

Some years back I was working with a most excellent colleague, who managed a busy contact centre operation. Let’s call her Bob. She was absolutely committed to doing the best she could, for her staff and her customers.

Bob came to me one day for some help: Things weren’t going well, she had a meeting with senior management coming up and she was going to ask them to approve a radical thing – to change, by which I mean relax, their current call handling target.

I didn’t know too much about contact centres back then…so I started by asking some dumb questions. And it went something like this:

Me: “What’s this ‘80 in 20’ measure about?”

Bob: “It’s our main ‘Key Performance Indicator’ (KPI), called ‘Grade of Service’ (or GOS for short) and it means that we aim to pick up 80% of all incoming calls within 20 seconds of the customer calling.”

Me: “Oh…and where do these figures come from?”

Bob: “It’s an industry recognised KPI. All ‘up to date’ contact centres use it to measure how they are doing and ‘80 in 20’ is Best Practice.”

Me: “…what ‘industry body’ and where did they get these figures?”

Bob: “The [insert name of a] ‘Contact Centre Association’…and I’ve got no idea where the figures come from.”

Me: “So, we have a target of picking up a customer’s call within an arbitrary 20 seconds…and we have an arbitrary target on meeting this target 80% of the time? …so it’s a target on a target?”

Bob: “Yes…I suppose it is…but we are having a real tough time at the moment and we hardly ever achieve it.”

Me: “Okay…but why do you want to ask senior management to ‘relax’ this target-on-a-target? What will this achieve?”

Bob: “Because we publish our GOS results against target for all our contact centre team leaders to see…and frankly there’s not much they can do about it…and this is really demoralising. If I could just get senior management to relax it to, say, 70% in 30 seconds then my staff could see that they at least achieve it sometimes.”

…and that’s how my discussion with Bob started.


I have just finished reading Donald Wheeler’s superb book ‘Understanding Variation – the key to managing chaos’ and my work with Bob1 all those years ago came flooding back to me…and so I thought I’d revisit it, and jot down the key points within. Here goes…

Confusing ‘Voice of the Customer’ and ‘Voice of the Process’

I’ll start with clarifying the difference between the customer and the process. In the words of Donald Wheeler:

“The ‘voice of the customer’ defines what you want from a system.

The ‘voice of the process’ defines what you will get from a system.”

The difference in words is subtle, but in meaning is profound.

In Bob’s case, she has determined that customers want the phone to be picked up within 20 seconds2. However, this wishful thinking (a target) is completely outside the system. Bob could set the customer specification (target) at anything, but this has got nothing to do with what the process can, and will predictably3, achieve.

What we really want to see is what the system (‘handling4 customer calls’) is achieving over time.

A target is digital (on/off) – either ‘a pat on the back’ or ‘not good enough!’

“A natural consequence of this specification [target] approach…is the suddenness with which you can change from a state of bliss to a state of torment. As long as you are ‘doing okay’ there is no reason to worry, so sit back, relax, and let things take care of themselves. However, when you are in trouble, ‘don’t just stand there – do something!’ …This ‘on-again, off again’ approach is completely antithetical to continual improvement.” (Wheeler)

Unfortunately, Bob is constantly the wrong side of the (current) specification and therefore has the unwavering torment of ‘don’t just stand there – do something!’

But do what? And how would Bob know if whatever they try is actually an improvement or not? Using a target is such a blunt (and inappropriate) tool. Future results:

  • might ‘beat target’ (gaining a ‘pat on the back’) and yet simply be noise5; or
  • might still be lower than target (receiving another ‘kick’) and yet contain an important signal.

Bob cannot see the true effects of any experimentation on her system whilst relying on her current industry best practice ‘Grade of Service’ KPI. She does not have a method to separate out potential signals from probable noise.

Thinking that a target can change things for the better

“When people are pressured to meet a target value, there are three ways they can proceed:

  1. They can work to improve the system;
  2. They can distort the system; or
  3. They can distort the data.”

(Wheeler, referencing Brian Joiner)

What can a call agent do to ‘hit’ that target? Well, not much really. They can’t influence the number of calls coming in or what those customers want or need. They CAN, however, try to ‘get off the phone’ so as to get to the next call. Mmm, that’s not going to help the (customer-defined) purpose…and is likely to create failure demand, complaints and re-work…and make things worse.

What can the contact centre management (from team leaders and upwards to Bob) do to ‘hit’ that target? They could try to improve the system* (which, whilst being the right thing to do, is also the hardest) OR they could simply ask for the target to be relaxed. If they aren’t allowed to do either, then they might begin to ‘play games’ with the data…and hide what is actually happening.

* To improve the system, Bob needs contextual data presented such that it uncovers what is happening in the system…which will enable her to listen to the process, see signals, ask relevant questions, understand root cause, experiment and improve. She, and her team, cannot do this at present using her hugely limiting KPI.

In short, the target is doing no good…and probably some (and perhaps a lot of) harm.

It’s perhaps worth reflecting that “Bad measures = bad behaviours = bad service” (Vanguard)

What’s dafter than a target? A target on a target!

Why? Well, because it removes us from the contextual data, stripping out the necessary understanding of variation within and thus further hiding the ‘voice of the process’.

It’s worth noting that, in Bob’s ‘20 seconds to answer’ target world:

  • A call answered in 3 seconds is worth the same as one answered in 19 seconds; and, worse
  • A call answered in 21 seconds is treated the same as one answered in, say, 480 seconds….and beyond…perhaps even an hour!
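Here is a small Python sketch of those two points. The wait times are hypothetical (ten calls on each of two made-up days), and `grade_of_service` is just my name for the ‘% answered within the threshold’ calculation.

```python
def grade_of_service(wait_seconds, threshold=20):
    """% of calls answered within `threshold` seconds."""
    answered_in_time = sum(1 for wait in wait_seconds if wait <= threshold)
    return 100 * answered_in_time / len(wait_seconds)

# Hypothetical wait times (in seconds) for ten calls on each of two days.
day_a = [3, 4, 5, 6, 7, 8, 9, 21, 22, 23]
day_b = [19, 19, 19, 19, 19, 19, 19, 480, 900, 3600]

for label, waits in [("day A", day_a), ("day B", day_b)]:
    print(f"{label}: GOS = {grade_of_service(waits):.0f}%, longest wait = {max(waits)}s")
```

Both days report a GOS of 70%. On day A nobody waited more than 23 seconds; on day B one poor soul waited an hour. The KPI cannot tell them apart.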

Note: I’ve added an addendum at the end of this post with a specific ‘target on a target’ example (hospital wait times). I hope that it is of use to demonstrate that using a ‘target on a target’ is to hide the important data underneath it.

“Setting goals [targets] on meeting goals is an act of desperation.” (Wheeler)

Worse still, a ‘target on a target’ can fool us into thinking that we are looking at something useful. After all, I can still graph it…so it must be good…mustn’t it?

Here’s a control chart of Bob’s ‘Grade of Service’ (GOS) KPI:

I-MR

You might look at it and think “Wow, that looks professional with all that I-MR control charty stuff! I thought you said that we’d be foolish to use this ’80 in 20’ target on a target?”

You can see that Bob’s contact centre never met the ’80 in 20’ target-on-a-target6 (and, with the current system, isn’t likely to)…and you can perhaps see why she wants to ‘relax’ it to ’70 in 30’….but we can’t see what really happens.

What’s the variation in wait times? (times of day, days of week etc.)

Do some people get answered in 5 seconds? Is it common for some people to wait for 200 seconds? (basically, what’s actually going on?!)

Is the variation predictable? Are there any patterns within?

Are those months really so comparable?…are any games being played?!

Okay, so I’ve shot at what Bob has before her…but what advice can I offer to help?

Does Bob need to change her ’80 in 20’ KPI?  Yes, she does….but not by relaxing the target.

‘The right data, measured right’ (‘what’, and ‘how’)

At its very simplest, Bob’s measures need to help her (and her people) understand and improve the system.

To do this, they need to see:

WHAT matters to the customer? | …which could be uncovered by:
“Don’t make me queue” | Volume of calls, time taken to answer, abandonment rate.
“I want you to deal with me at my first point of contact” | % of calls resolved at first point of contact (i.e. didn’t need to be passed on).
“Don’t put me on hold unnecessarily” | % of calls put on hold (including reason types and frequencies).
“I want to deal with the right person (i.e. with the necessary knowledge, expertise, and authority)” | % of calls passed on (including reason types and frequencies).
“I want you to action what you have promised, when I need it…and to do so first time.” | Failure demand, either chasing up or complaining (including reason types and frequencies).

Now, Bob (and most contact centres) might reply “We already measure some of that stuff!”

Yes, I expect you do.

What also matters is HOW you measure it.  Measures should be:

  • shown over time, in chronological order (i.e. in control charts, to show variation), with control limits (to separate out signals from noise);

  • updated regularly (i.e. at meaningful intervals) and shown visually (on the floor, at the gemba), providing feedback to those working in the system;

  • presented/ displayed together, as a set of measures, to show the system and its interactions, rather than a ‘Grade of Service’ KPI on a dashboard;

  • monitored and analysed to identify signals, and consider the effect of each experimental change towards the customer purpose;

  • devoid of a target! The right measures, measured right will do just fine.

Why are control charts so important? Wheeler writes that:

“Instead of attempting to attach a meaning to each and every specific value of the time series, the process behaviour [i.e. control] chart concentrates on the behaviour of the underlying process.”

Why do we need to see a set of measures together? Simon Guilfoyle uses the excellent analogy of an aeroplane cockpit – you need to see the full set of relevant system measures to understand what is happening (speed, altitude, direction, fuel level…). There isn’t ‘One metric that matters’ and it is madness to attempt to find one.

Looking at Bob’s proposed set of capability measures (the table above), you can probably imagine why you’d want to see them all together, so as to spot any unintended consequences to changes you are experimenting with.

I.e. if one measure appears to be improving, is another one apparently worsening? Remember – it’s a system with components!

To summarise:

If I am responsible for a process (a system) then I want to:

  • see the actual voice of the process;
  • get behind (and then drop) any numerical target;
  • split the noise from any signals within;
  • understand if the system is ‘in control’ (i.e. stable, predictable) or not; and
  • spot, and investigate any special causes7

and, perhaps more important, I want to:

  • understand what is causing the demand coming into the system (rather than simply treating all demand as work to be done);
  • involve all of the people in their process, through the use of visual management (done in the right way); and then
  • experiment towards improving it…safe in the knowledge that our measures will tell us whether we should adopt, adapt or abandon each proposed change.

Bob and I continued to have some great conversations 🙂


I said that I would add an addendum on the subject of ‘a target on a target’…and here it is:

Addendum: An example to illustrate the point

I’ll borrow two diagrams8 from a really interesting piece of analysis on NHS hospitals (i.e. in the UK) and their Accident and Emergency (A&E) wait times.

The first chart is of Alder Hey Children’s hospital. It shows a nice curve of the time it takes for patients to be discharged:

Alder Hey

The second chart is of Croydon University Hospital. Same type of chart, but their data tells a vastly different story!

Croydon

Q1: Do you think that an activity target has been set on the A&E system and, if so, where do you think it has been set?

I’d bet (heavily) that there is an A&E ‘time to discharge’ target, set from management above, of 4 hours (i.e. 240 minutes). It’s sort of evident from the first graph…but ‘smacks you between the eyes’ in the second.

Two further questions for you to ponder9: Looking at the charts for these two hospitals…


Q2: Which one has a smooth, relatively under control A&E system, and which do you think might be engaged in ‘playing (survival) games’ to meet the target?

I’d say that Alder Hey is doing rather well, whilst Croydon is (likely) engaged in all sorts of tricks to ship patients somewhere (anywhere!) ‘before the 4 hour buzzer’ – with a likely knock-on effect to patient experiences and outcomes.


Q3: Which one looks better on a ‘% of patients that met the 4 hour target’ league table? (i.e. a target on a target)

It is typical for health services to set an A&E ‘target on a target’ of, say, ‘95% discharged from A&E within 4 hours’10. This is just like Bob’s ‘80% in 20 seconds’.

Sadly, Croydon will sit higher up this league table (i.e. appear better) than Alder Hey!

If you don’t understand why, have a closer look at the two charts. Look specifically at the volume of patients being discharged after the 240 min. mark. Alder Hey has some, but Croydon has virtually none.
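To make that arithmetic concrete, here’s a minimal Python sketch. The discharge times are invented by me purely for illustration (they are not the Alder Hey or Croydon figures), and the function name `percent_within_target` is my own. It simply shows that squeezing the tail in just before the 240 minute mark pushes the ‘% within target’ figure up, even though nothing about the underlying system has got any better.

```python
# Hypothetical discharge times in minutes, invented purely for illustration.
# These are NOT the real Alder Hey or Croydon figures.
TARGET_MINUTES = 240

# A smooth spread of times with a natural tail beyond the target.
smooth_hospital = [45, 70, 95, 110, 130, 150, 170, 190, 210, 225, 250, 270, 300, 330]

# The same early times, but the tail has been squeezed in just before the target.
spiky_hospital = [45, 70, 95, 110, 130, 150, 170, 190, 210, 225, 232, 236, 238, 239]


def percent_within_target(times, target=TARGET_MINUTES):
    """Return the percentage of times at or under the target."""
    within = sum(1 for t in times if t <= target)
    return 100.0 * within / len(times)


def count_just_before_target(times, target=TARGET_MINUTES, window=10):
    """Count the times that fall in the last `window` minutes before the target."""
    return sum(1 for t in times if target - window <= t <= target)


smooth_pct = percent_within_target(smooth_hospital)
spiky_pct = percent_within_target(spiky_hospital)

print(f"Smooth hospital: {smooth_pct:.1f}% within target, "
      f"{count_just_before_target(smooth_hospital)} in the last 10 minutes")
print(f"Spiky hospital:  {spiky_pct:.1f}% within target, "
      f"{count_just_before_target(spiky_hospital)} in the last 10 minutes")
```

The league table only ever sees the single percentage. It is the shape of the distribution (here, the bunching in the last 10 minutes) that hints at what is really going on.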

Footnotes

1. Just in case you hadn’t worked it out, she (or he) wasn’t called Bob!

2. Customer Target: Setting aside that the customer target shouldn’t (and indeed can’t) be used to improve the ‘handling calls’ system, I have two problems with the 20 second ‘customer specification’.

a. An industry figure vs. reality: rather than assuming that a generic industry figure of 20 seconds is what Bob’s customers want, I asked Bob to provide me with her call abandonment data.

I then graphed a histogram of the time (in seconds) that each customer abandoned their call and the corresponding volume of such calls. This provided us with evidence as to what exactly was happening within Bob’s system…which leads me on to:

b. An average customer vs. variety: There’s no such thing as ‘an average customer’ and we should resist thinking in this way. Some people were abandoning after a couple of seconds, others did so after waiting for two minutes. We can see that there is plenty of customer variety within – we should be thinking about how we can absorb that variety rather than meet some non-existent average.
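If you’d like to try the abandonment histogram from point (a) for yourself, here’s a rough Python sketch. The abandonment times are made up by me (they are not Bob’s data), and the 10 second bin width is an arbitrary choice. It groups each abandoned call into a time bin and prints a crude text histogram, which is enough to see the variety rather than an average.

```python
from collections import Counter

# Hypothetical seconds-until-abandonment for individual calls.
# Invented for illustration; this is not Bob's data.
abandon_seconds = [2, 3, 5, 8, 14, 19, 22, 27, 31, 44, 58, 63, 75, 90, 118, 120]

BIN_WIDTH = 10  # seconds per histogram bar (an arbitrary choice)


def histogram(values, bin_width=BIN_WIDTH):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter((v // bin_width) * bin_width for v in values)


counts = histogram(abandon_seconds)

# Print one row per non-empty bin, with one '#' per abandoned call.
for lower in sorted(counts):
    upper = lower + BIN_WIDTH - 1
    print(f"{lower:>3}-{upper:<3}s | {'#' * counts[lower]}")
```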

3. Predictably, assuming that it is stable and there is no change made to the process.

4. Handling: I specifically wrote ‘handling’ and not ‘answering’. Customers don’t just want their call answered – they want their need to be met. To properly understand a system we must first set out its purpose from the customer’s perspective, and then use an appropriate set of measures that reveal the capability of the system against this customer purpose. ‘Answering calls’ may be necessary, but it’s not sufficient.

5. Noise vs. Signal: I’m assuming in this post that you understand the difference between noise and signals. If you don’t (or would like a refresh) then an earlier (foundational) post on variation might assist: The Spice of Life

6. A clarification in respect of the example ‘I’ control chart: The Upper Control Limit (UCL) red line (at 80.55%) does not represent/ is not the 80% target. It just happens to be the case that the calculated UCL for Bob’s data works out to be nearly the same as the arbitrary target – this is an (unfortunate) fluke. A target line does not belong on a control chart!
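For anyone wondering where a calculated limit comes from, here’s a minimal sketch of the usual ‘I’ (XmR) chart arithmetic as described in Wheeler’s writings: the centre line is the mean, and the limits sit 2.66 times the average moving range either side of it. The weekly percentages below are invented by me (they are not Bob’s data) and the function name `xmr_limits` is my own, so treat this as an illustration rather than the exact calculation behind the chart above.

```python
def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an 'I' (XmR) chart.

    Uses the conventional scaling factor of 2.66 applied to the average
    moving range (the average absolute difference between successive points).
    """
    if len(values) < 2:
        raise ValueError("need at least two points to calculate a moving range")
    centre = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_moving_range = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * average_moving_range
    return centre - spread, centre, centre + spread


# Hypothetical weekly '% of calls answered in 20 seconds' figures (invented).
weekly_pct = [62.0, 58.0, 66.0, 60.0, 64.0, 59.0, 65.0, 62.0]

lcl, centre, ucl = xmr_limits(weekly_pct)
print(f"LCL = {lcl:.2f}, centre = {centre:.2f}, UCL = {ucl:.2f}")
```

Notice that no target appears anywhere in the calculation: the limits come purely from the data, i.e. from the voice of the process.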

7. Special Cause tests: The most obvious signal on a control chart can be seen when a point appears outside the upper or lower control limits. There are, however, other types of signals indicating that something special has occurred. These include ‘trends’, ‘shifts’, and ‘hugging’. Here’s a useful diagram (sourced from here):

special causes
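As a rough illustration of three of these tests, here’s a Python sketch. Please note my assumptions: the run lengths (8 points on one side of the centre line for a ‘shift’, 6 points each higher or each lower than the one before for a ‘trend’) are commonly quoted conventions and may differ from those in the diagram; ‘hugging’ isn’t covered; and the function names and example data are my own inventions.

```python
def outside_limits(values, lower, upper):
    """Indexes of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def shifts(values, centre, run_length=8):
    """Indexes at which `run_length` or more consecutive points
    have sat on the same side of the centre line."""
    flagged, run, last_side = [], 0, 0
    for i, v in enumerate(values):
        side = (v > centre) - (v < centre)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        if run >= run_length:
            flagged.append(i)
    return flagged


def trends(values, run_length=6):
    """Indexes at which `run_length` or more consecutive points
    have each been higher (or each lower) than the point before."""
    flagged, run, last_direction = [], 1, 0
    for i in range(1, len(values)):
        direction = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if direction != 0 and direction == last_direction:
            run += 1
        else:
            run = 2 if direction != 0 else 1
        last_direction = direction
        if run >= run_length:
            flagged.append(i)
    return flagged


# Hypothetical data with an assumed centre line of 50 and limits of 40 and 60.
values = [50, 51, 49, 50, 70, 52, 53, 54, 55, 56, 57, 51]

print("Outside limits at:", outside_limits(values, lower=40, upper=60))
print("Shift signalled at:", shifts(values, centre=50))
print("Trend signalled at:", trends(values))
```

A flagged point isn’t a verdict. It’s a prompt to go and investigate what happened in the work at that time.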

8. Hospital charts: The full set of charts (covering 144 NHS hospitals for the period 2012-13) is here. I’ve obviously chosen hospitals at both extremes to best illustrate the point.

I can’t remember where I first came across these hospital charts – which annoys me!…so if it was via a post on your blog – I’m sorry for my crap referencing/ recognition of your efforts 🙂 

9. Here’s a 4th and final question to ponder: If, after pondering those two questions, you still think that a ‘target on a target’ makes sense then how do you cope with someone not always meeting it? Do you set them a target…to motivate them?

How about a target for the ‘target on a target’???

  • A 95% target of achieving an ‘80% of calls answered in 20 seconds’ target
  • A 90% target of achieving a ‘95% of patients discharged within 4 hours’ target
  • ….

…and, if you are okay with this…but they don’t always meet it then how about setting them a target…where does the madness end?!

We are simply ‘playing with numbers’, moving ever further from reality and usefulness.

10. Hospital ‘Emergency department’ League tables:

Here’s a New Zealand ‘Emergency departments’ league table, ranking district health boards against each other (Source).

Notice that it shows:

  • A ‘target on a target’ (95% within 6 hrs)
  • A single quarter’s outcome
  • A binary comparison ‘with last quarter’
  • A (competitive) ranking

All of which are, ahem, ‘problematic’ (that’s me being polite 🙂 ).

You can’t actually see how each district is performing (whether stable, getting better…or worse)

…and you certainly can’t see whether games are being played.