Musing On Our Metrics
By Ric Kosiba, Sharpen Technologies
We were talking to (cue name drop) the fantastic analyst Sheila McGee-Smith the other day, and we got onto the topic of metrics. She made a great point: many of our metrics and concepts are derived from ideas that are many years old, ideas that spread across the industry and that may not be as useful today as they once were.
The History of Service Level
When I was young and working in my first contact center, I got the opportunity to hang out with some of the pioneers of the call center industry — the engineers at AT&T. I had all sorts of questions about metrics, and why they were chosen, and what they actually measured. I was told this fun story about service level and the eighty-percent-within-twenty-second rule that has been the contact center standard for so many years.
It seems there was an engineer who was tasked to develop a standard report for all of AT&T’s automatic call distributors. Every customer on their call center system would get the very same report, printed on rented AT&T printers, on the ubiquitous green and white striped printer paper. And he had to decide what every single one of their customers would see in his report.
Given that there was no real precedent, and given that in the days before answering machines everyone would run to the phone to pick up a call as fast as possible, the answer times we were accustomed to were pretty short. So, he started paying attention to the calls he made and the number of seconds it would take before he became impatient when the call wasn’t picked up — his personal irritation threshold. It was around four rings, or 20 seconds, and he coded that into his standard report on the AT&T system. That became the rule for the entire fledgling industry, the acceptable amount of time to make someone wait when they called your company, all because we all used the very same report. Let’s call the AT&T engineer Carl. For years, we all staffed to hit Carl’s personal preference. Good for Carl.
When the ability to change the right-hand side of the service level equation was created in our updated platforms, few of us used it, for the standard was eighty-percent-within-twenty-seconds. It’s what everyone did.
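For anyone who has never written it down, the arithmetic behind that equation is simple. Here is a rough sketch in Python (ignoring abandoned calls, which complicate the real-world definition); the "right-hand side" is just the seconds threshold that Carl hard-coded:

```python
def service_level(answer_times_s: list[float], threshold_s: float = 20.0) -> float:
    # Percentage of answered calls picked up within the threshold.
    # A simplified sketch: abandons and other real-world wrinkles are ignored.
    answered_in_time = sum(1 for t in answer_times_s if t <= threshold_s)
    return 100.0 * answered_in_time / len(answer_times_s)

# "80/20" simply means service_level(times, threshold_s=20.0) >= 80.0
```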
It wasn’t until people started questioning the status quo (and the significant cost of an 80/20 service level) that companies began to abandon the old metric, probably around the early 2010s.
At Sharpen, I’ve had the luxury of having the freedom to look at customer data and re-define our customers’ metrics based on their specific business questions. This has given me and my team the ability to challenge the status quo on some of our metrics and to make up new metrics. Here are some musings on metrics.
Service Standards
I’ve been working with a very smart data scientist, Lucas Schaller, who does wonders in turning contact center data into meaningful visuals. Figure 1 shows a fun graph he developed to answer an oft-asked question: does wait time affect CSAT survey results? We are simply plotting the percentage of low scores against wait time. Looking at this graph (CSAT scores on a scale of 1 to 5), for this specific customer, it seems that wait time might affect CSAT scores. The longer the wait, the more likely you will get a low CSAT score (outside of the interesting first bucket).
Figure 1. Percentage of calls rated 2 or less on CSAT survey
Now, to be clear, we may not be measuring what customers feel at all. Instead, we may be measuring the feelings of our agents. Higher wait times imply that agent occupancy is nearing 100% and our agents may be burning out and changing their behaviors, either being short with customers, rushing customers, or putting customers on hold, to give themselves a break. They may be providing customers with a poorer customer experience. Or just maybe the customers were fed up with their wait!
Figure 2 is a similar view, but instead of looking at low CSAT scores, we’ll look at high CSAT scores. We are measuring the percentage of CSAT scores of 5 (out of 5) plotted against customer wait time.
Figure 2. Percentage of CSAT scores of 5 by wait time
Again, this is interesting in that great service at this company seems to be consistent until the customer waits 20 minutes or more. Is a 20-minute wait an inflection point on CSAT for this company?
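For the curious, the rollups behind Figures 1 and 2 are straightforward to reproduce. Here is a rough Python sketch; the column names (wait_seconds, csat) are hypothetical stand-ins for whatever your reporting data actually calls them:

```python
import pandas as pd

def csat_by_wait(calls: pd.DataFrame) -> pd.DataFrame:
    # Bucket each call's wait time, then compute the share of low and top
    # CSAT scores per bucket. Bucket edges here are illustrative.
    buckets = [0, 30, 60, 120, 300, 600, 1200, float("inf")]
    labels = ["<30s", "30-60s", "1-2m", "2-5m", "5-10m", "10-20m", "20m+"]
    binned = calls.assign(
        wait_bucket=pd.cut(calls["wait_seconds"], bins=buckets, labels=labels, right=False)
    )
    grouped = binned.groupby("wait_bucket", observed=True)["csat"]
    return pd.DataFrame({
        "pct_low_csat": grouped.apply(lambda s: (s <= 2).mean() * 100),  # Figure 1
        "pct_top_csat": grouped.apply(lambda s: (s == 5).mean() * 100),  # Figure 2
        "calls": grouped.size(),
    })
```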
So, this got us thinking about “bands” of service standards. Certainly, we could measure customer patience (the normal way to look at setting a service goal), or we could look at CSAT scores, like Lucas did here, or even use a gut feel, as the AT&T guy did years ago to measure “good service.” But are there other “bands” of service we should watch? Here is what we white-boarded:
- A maximum band related to the point at which customers’ CSAT is affected (time to CSAT degradation, like in Figures 1 and 2).
- A maximum band at which customers’ patience is affected (time to abandon).
- A minimum band at which customers do not notice a service difference, and if we provide service better than that we are overstaffed (no change in CSAT or abandons). This is interesting because I have never thought of service this way: Should we have a minimum service level, say 95% > 12 seconds or some such? If service is less than 12 seconds, the customer does not notice it, and we are spending money that has no benefit to the organization or the customer. So why should we staff higher? We could call it a burning money minimum threshold.
- A maximum service band related to our brand. There are some premium brands that simply must answer the phone quickly.
- Is there a minimum service band related to our brand? Meaning, we are expected to be busy, our customers know our service is highly complicated and technical, so they expect to wait to be picked up? I dunno.
These bands can be used to trigger actions, like if someone waits past our maximum service band, we could add them to the surprise and delight list (for routing next time) and send them a coupon or a text message. It would be nice to have automation bots perform these checks and actions.
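If you wanted a bot to run that check after every interaction, the logic is not complicated. Here is a rough sketch; the band values and action names are purely illustrative placeholders, not an existing feature:

```python
from dataclasses import dataclass

@dataclass
class ServiceBands:
    burn_money_floor_s: int = 12      # below this, customers don't notice faster service
    csat_degradation_s: int = 1200    # wait at which CSAT starts to slip (Figures 1 and 2)

def check_bands(wait_seconds: int, bands: ServiceBands) -> str:
    # The returned action names are placeholders for whatever the bot actually does.
    if wait_seconds > bands.csat_degradation_s:
        return "add_to_surprise_and_delight_list"   # priority routing next time, coupon, text
    if wait_seconds < bands.burn_money_floor_s:
        return "below_burn_money_floor"             # faster than anyone notices; overstaffed
    return "within_bands"
```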
I’ve even heard of a scenario where someone had a peculiar routing rule: once a customer has waited long past a threshold of terrible service, a company might move them lower in the queue, knowing that person’s experience is already awful, but the more recent callers still can be saved. Sounded weird, but is this logic sound?
Active Contact Resolution (ACR) and Forecasting Volumes
Here is a brain teaser: what relationship does Active Contact Resolution (ACR), which we all should be measuring, have with our volume forecast? If you recall from our previous On Target articles, we are very bullish on ACR as a metric, and our ability to improve it. Improving ACR has, as a cool side effect, the ability to reduce future calls, most often significantly.
While all the forecasting tools — even the “AI” ones — use history as a guide, knowing the efficiency of your operation is still key to forecasting AHT and volumes correctly. In this case, any change in ACR directly impacts volume forecasts positively, and maybe AHT adversely. Meaning as you improve ACR, you need to lower your call volume forecasts, but might need to slightly increase your AHT forecasts. But ACR is unknown to your “AI” or your forecasting algorithms, and you may need to adjust your forecasts as you improve ACR.
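To make the arithmetic concrete, here is a back-of-the-envelope sketch. It leans on a simplifying assumption (every unresolved contact eventually comes back, so each customer issue generates roughly 1/ACR contacts), which is cruder than what a real forecaster would do, but it shows the direction and rough size of the adjustment:

```python
def adjust_volume_forecast(historical_forecast: float,
                           historical_acr: float,
                           expected_acr: float) -> float:
    # Under the assumption above, each issue generates about 1 / ACR contacts,
    # so rescale the history-based forecast by the change in that ratio.
    return historical_forecast * (historical_acr / expected_acr)

# Example: 10,000 forecasted calls at 70% ACR drop to about 8,750 if ACR reaches 80%.
print(adjust_volume_forecast(10_000, 0.70, 0.80))
```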
Level of Effort
Service level tells companies how long their customers wait to be answered — but is there a better metric, one that more directly captures customer frustration? One of our smart customers, Aaron Feinberg, had this question: how hard are we to do business with? How much effort does a customer have to go through to get their issues handled? We brainstormed together to come up with some analytics around the question, and Figure 3 tries to answer it from a time-on-the-phone-with-us perspective.
Figure 3. By unique phone call, the amount of time spent speaking to an agent (in minutes)
On this graph, we simply roll up how much time the customer spent with us before they stopped calling. This view of the data presents an opportunity to learn — we can listen to these specific calls, and figure out what our agents could do to help answer their questions better. Do we have a process issue? A training issue? What’s fun is that, with a combination of sleuthing and incenting agents, Aaron has improved this picture by improving ACR by a whopping 16%! Reach out if you would like to chat about how he did it.
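The rollup behind Figure 3 is easy to reproduce on your own data. Here is a rough sketch, with hypothetical column names (customer_id, talk_minutes) standing in for your own schema:

```python
import pandas as pd

def level_of_effort(calls: pd.DataFrame) -> pd.DataFrame:
    # Total minutes each customer spent talking to agents, across all of
    # their calls, sorted so the long tail of high-effort customers is on top.
    return (calls.groupby("customer_id")["talk_minutes"]
                 .agg(total_minutes="sum", calls="count")
                 .sort_values("total_minutes", ascending=False))
```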
ACR-Weighted AHT
When we created the ACR metric, we worried about the effect on handle times from improving ACR. Would AHT increase? Seems logical that agents might spend more time with each customer. And sure enough, when we checked, it did, albeit only slightly for a few of our customers’ operations. But looking at level of effort and ACR together showed an interesting picture.
In Figure 4 we are plotting the average handle time, over time, for contacts that have achieved ACR against those that have not. You can see that calls without ACR have a significantly lower AHT than calls with ACR. On average, putting more calls into the ACR-achieved category will likely raise handle times, which might raise a flag, at least until the overall customer experience is examined. The top right line represents the total time customers spent with us when ACR is not achieved, and it is significant; it is a customer experience problem.
Figure 4. The impact of ACR on customer level of effort
While the agent who does not achieve ACR, who does not handle the customer’s issues, will show higher productivity per call, they are clearly not achieving the company’s mission, and are adding to the load that another agent must then clean up. And the level of effort of the customer goes way up. Should we be measuring ACR-weighted AHT?
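Here is one possible definition, offered as a sketch rather than a standard, since the right formula is an open question: charge all handle time, resolved or not, against only the contacts that actually resolved the customer's issue.

```python
import pandas as pd

def acr_weighted_aht(calls: pd.DataFrame) -> float:
    # Assumed (hypothetical) columns: handle_seconds, acr_achieved (bool).
    # All handle time is divided by resolved contacts only, so unresolved
    # calls inflate the metric instead of hiding inside a lower AHT.
    total_handle_seconds = calls["handle_seconds"].sum()
    resolved_contacts = calls["acr_achieved"].sum()
    return total_handle_seconds / resolved_contacts  # handle time per resolved issue
```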
Personal Occupancy: Busyness
When the industry moved to an at-home workforce, we looked for ways to help management control an invisible workforce. One question we heard frequently was: how do I measure who is working hard? Occupancy is a great measure of the overall work level, and it helps us know whether the WFM team is providing a steady work environment. It is not a measure for an individual agent, however, since agents don’t control occupancy at all. So how do we know who is working hard? Adherence is a decent measure, but it does not get us there.
So, we started looking into how to capture agents’ productivity while working from home. One way was to measure their personal occupancy—their pace of work. Including after-call work, which they control to a large degree, we can measure their personal time between calls, their pace. We called it “Busyness” but would welcome a better name!
Figure 5. The time between calls distribution for a specific agent
Figure 5 shows the distribution of time between calls for a specific agent. What was interesting is that there was a significant difference in busyness from agent to agent in most of the data sets we looked at, implying that agents certainly manage their own pace. You can average this measure, look at it by hour across the workweek, and develop outlier graphs. The goal is to find patterns and outliers in busyness and counsel appropriately.
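For reference, the measure behind Figure 5 is just the gap between one contact's wrap-up and the next contact's start, so after-call work counts as working time rather than idle time. Here is a rough sketch, assuming a per-agent call log with datetime columns (start and wrap_end are hypothetical names):

```python
import pandas as pd

def time_between_calls(agent_calls: pd.DataFrame) -> pd.Series:
    # Gap between one contact's wrap-up end and the next contact's start.
    # The resulting distribution (in seconds) is what Figure 5 plots.
    ordered = agent_calls.sort_values("wrap_end")
    gaps = ordered["start"].shift(-1) - ordered["wrap_end"]
    return gaps.dropna().dt.total_seconds()
```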
Busyness does not capture the whole productivity picture, by a long shot — ACR, adherence, AHT, ACR-weighted AHT, and other metrics help paint the more complete picture.
We are in an interesting time, with new challenges and tools that change our operations significantly. The old metrics should be challenged — they just may not be telling the story we think they are.
Ric Kosiba is a charter member of SWPP and is the Chief Data Scientist at Sharpen Technologies. He can be reached at rkosiba@sharpencx.com or (410) 562-1217.
Sharpen Technologies builds the Agent First contact center platform, designed from the ground up with the agent’s experience in mind. A happy agent makes happy customers!