
AI for Revolutionizing Customer Care Routing System at Wix




Call centers, like most queueing systems, are traditionally optimized to minimize customers' waiting time. Nevertheless, anyone who has ever waited in line for service knows that if there is something worse than waiting in line, it is waiting in vain just to be escalated to higher-tier experts.


In reality, these two objectives of serving fast and serving effectively often collide, especially when the service requires a high level of expertise. Moreover, besides these two goals, additional aspects can also tilt the routing decision balance, such as fairness and workers’ workload.


In this post, I will share one of the most innovative AI projects in Wix customer care — Expert Smart Routing. It is a data-driven, end-to-end Reinforcement Learning (RL) system that completely redesigns the way customers get served, resulting in a significant improvement in overall customer satisfaction.



Introduction: The Customer Care Landscape at Wix


Wix, the leading site creation and business management platform, serves hundreds of millions of customers in nearly 200 countries worldwide. Thousands of Wix experts serve clients in 10 languages via three possible contact channels — callbacks, live chats and posts (emails). They cover dozens of distinct topics, such as billing and payments, site setup, business management, SEO, etc. Naturally, user intent also varies, from deep technical troubleshooting and “how to” advice to advanced tips for a more successful business.


Such a heterogeneous Customer Care (CC) system requires a smart routing mechanism, matching each user's service request with an adequate expert who is available and eligible for the channel, speaks the relevant language, and is trained in that specific domain. The traditional routing mechanism optimizes waiting time within the hard constraints of topic expertise.


Image 1: The three dimensions along which the customer care organization is divided. Image by author.

Problem Definition


We define the routing task as a recommendation problem, aiming to return the top K expert candidates for a given ticket and system state. Each expert is qualified to work on one or more channels, and on one or more topics.


The goal of every service system is to maximize customer satisfaction while minimizing costs, or alternatively, to maximize efficiency under a given cost. This satisfaction is influenced not just by waiting time. Equally important are first-interaction resolution, the expert's knowledge and skills, and their empathy, attentiveness, and authority [1 - see references at the end of the post].


Surprisingly, the perceived waiting time, shaped by the waiting experience (e.g., callback vs. online waiting) and the waiting estimates communicated to the user [2, 3], is more important than the actual waiting duration. Interestingly, the relation between user satisfaction and waiting time is asymmetric: waiting longer than expected leads to a minor decrease in satisfaction, whereas waiting shorter than expected substantially increases satisfaction [3].


On top of considering waiting time and/or server efficiency, recent papers also mix in abandonment rate (quantity served/lost), service quality or resolution rate [6, 9]. Here, our objective is to maximize a reward composed of five different KPIs:


  1. Total waiting time: The time the user waits until the interaction begins. Naturally, we seek to minimize it.

  2. Churn/abandon rate: When users do not respond at contact time and passively leave the queue. While it correlates directly with waiting time, it also has a long-term effect on user adherence and on the perceived availability of the service.

  3. Ticket resolution rate: Whether the issue was solved during the interaction, or was transferred to another expert. While some transfers are unavoidable (for example, product bugs detected by the user), others can be prevented. Most transfers depend on the expert's knowledge and level of training.

  4. Tier matching: As in most technical service teams, experts are divided into tiers of expertise. High tiers are best reserved for more complex tickets, such as escalations that others cannot solve. The higher the tier, the smaller its share of the population, creating a need for proper resource allocation.

  5. Experts' occupancy: Balance the work across service agents, promote fairness, and avoid “greedy” assignments that overload high-quality experts while leaving others idle.
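To make the five KPIs concrete, here is a minimal sketch of how they might be combined into a single scalar reward. The field names, scoring, and weights are all hypothetical illustrations, not Wix's actual reward definition, which is calibrated with the product team.

```python
from dataclasses import dataclass

@dataclass
class RoutingOutcome:
    wait_minutes: float      # time until the interaction began
    abandoned: bool          # user left the queue before contact
    resolved: bool           # solved without a transfer
    tier_gap: int            # assigned tier minus required tier (0 = match)
    occupancy_delta: float   # expert load vs. team average, in [-1, 1]

# Hypothetical per-KPI weights; the real ones are a product decision.
WEIGHTS = {"wait": -0.02, "abandon": -1.0, "resolve": 1.0,
           "tier": -0.3, "occupancy": -0.5}

def reward(o: RoutingOutcome) -> float:
    """Combine the five KPIs into one scalar: higher is better."""
    return (WEIGHTS["wait"] * o.wait_minutes
            + WEIGHTS["abandon"] * float(o.abandoned)
            + WEIGHTS["resolve"] * float(o.resolved)
            + WEIGHTS["tier"] * abs(o.tier_gap)
            + WEIGHTS["occupancy"] * abs(o.occupancy_delta))
```

For example, a ticket answered after 10 minutes, resolved on first contact by an exact-tier expert, would score the wait penalty plus the resolution bonus under these toy weights.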



Method


The routing problem we encountered, with a heterogeneous population of agents, has been studied in a variety of ways over the years: linear programming [9, 13], queueing-theory-based methods [10, 11], dynamic programming [12], and more recently supervised learning [5, 6, 7, 9].


Recently, Reinforcement Learning (RL) was also researched in the routing context, mostly for vehicles [14] or cloud communication [15], and even for call routing [16]. Adopting the RL mindset, we treat the routing problem as a routing game, in which we route users’ tickets (service requests) to CC experts. 



Why Reinforcement Learning?


The decision to go with RL will be elaborated further in another post. Briefly, the decision relies on three main reasons:


  1. Prediction dependency: Since each prediction affects the following state, RL-based models can learn to optimize a few steps ahead.

  2. Multiple objectives: The target is a collection of stochastic outcomes with no ground truth. This setting is a great fit for the RL framework.

  3. Learning from a simulator & online tuning: Historical data is derived from the current routing policy, which does not allow us to explore different actions; hence, a simulator is required. Online learning is required to keep the model up to date with frequent product changes and trends.



Why Not Reinforcement Learning?


Two main challenges of RL compared to supervised machine learning or other approaches:


  1. Interpretability: RL models, especially deep learning based, are harder to explain. Here, as the system directly interacts with humans, explainability and transparency are vital for a successful deployment.

  2. Complexity: RL training is known to be challenging and unstable, especially when the number of possible actions is high. Here, the model's action space is composed of hundreds of experts at any given time, a difficult setting to tackle.



The Simulator — Major Asset for Both Product & Research


Regardless of the modeling approach, the beating heart of the system is its simulator, used both for evaluation and for model training. Since conducting an A/B test is not possible, the simulator is fundamental to system evaluation, as it is the sole gatekeeper before production.


For training, it serves as a proxy to the real world, allowing the model to fully explore actions and states, without being chained to biased historical data. In addition, the simulator may serve as a playground for future product changes and improvements.


Furthermore, it has become such a powerful tool that it is also utilized for optimizing preceding processes in the CC funnel, such as workforce allocation and scheduling.

Image 2 demonstrates the dual evaluation executed for each simulator version:


  • Simulation evaluation: Approximating the simulator's error relative to real life by comparing, on a validation set, the simulated results of the current routing policy against the KPIs observed in production during that period.

  • Policy evaluation: Simulating and comparing the current policy against another policy on the same test scenarios or dataset.


The more effort is put into the simulator, the more real-life processes can be modeled and the higher the accuracy that can be achieved. That said, every parameter or process added also requires validation and maintenance. We highly recommend conducting a thorough analysis to estimate the impact of each piece, and prioritizing accordingly.
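The "simulation evaluation" step can be sketched as a simple per-KPI comparison between what the simulator predicts under the current policy and what production actually observed over the same period. The KPI names and the relative-error metric below are illustrative assumptions, not the exact metrics used.

```python
def simulation_error(simulated: dict, production: dict) -> dict:
    """Relative error per KPI between simulated and observed values."""
    return {kpi: abs(simulated[kpi] - production[kpi]) / abs(production[kpi])
            for kpi in production}

# Toy numbers: simulator output vs. KPIs observed in production.
sim = {"avg_wait_min": 7.4, "abandon_rate": 0.051, "resolution_rate": 0.83}
prod = {"avg_wait_min": 7.0, "abandon_rate": 0.050, "resolution_rate": 0.85}

errors = simulation_error(sim, prod)
```

A version of the simulator is accepted only if such errors stay within agreed tolerances; the same harness then compares candidate policies on identical scenarios.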


Image 2: The dual evaluation executed for each simulator version: simulation evaluation and policy evaluation. Image by author.


Modeling CC Routing System for RL


For those who are complete strangers to Reinforcement Learning, I recommend a brief read on the excellent Reinforcement Learning 101 tutorial by Shweta Bhatt [4]. 


Briefly, the RL architecture involves an agent that outputs an action for a given state at each time t. The action is processed by the environment (either a physical system or a simulation of it), which returns the reward resulting from this action, along with its new state. Interestingly, the reward may be reported with a delay and in a noisy manner.


For the common example of chess, the agent predicts the next move (action) to recommend, based on the given state of the board. While the new state is returned once the action is executed by the game engine, the true reward of the action is quite hard to estimate, especially right after it is performed.
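The agent–environment loop just described can be written down generically. The sketch below is a toy stand-in (random policy, made-up dynamics), not Wix's simulator API; it only shows the shape of the loop: observe state, act, receive reward and next state.

```python
import random

class Environment:
    """Toy environment standing in for the routing simulator."""

    def reset(self) -> dict:
        return {"queue_len": 3}                      # initial state

    def step(self, action: int):
        state = {"queue_len": random.randint(0, 5)}  # next state
        reward = -state["queue_len"]                 # e.g., penalize backlog
        done = state["queue_len"] == 0               # episode ends when empty
        return state, reward, done

def random_agent(state: dict, n_actions: int = 3) -> int:
    return random.randrange(n_actions)               # placeholder policy

env = Environment()
state = env.reset()
total = 0.0
for _ in range(10):                                  # one short episode
    action = random_agent(state)
    state, reward, done = env.step(action)
    total += reward
    if done:
        break
```

A trained agent would replace `random_agent` with a policy that maximizes the cumulative reward over such episodes.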


Image 3: The RL paradigm and loop. Image by author.

In order to formulate the problem into the RL framework, we need to define its three core elements.



The State


The state provides a complete description of the system, its entities and their attributes. Since the state is the full input from which the RL agent predicts its next action, it contains everything meaningful to the routing decision: the experts, the queues, the time, etc. (Image 4). The exact representation is up to you; we recommend making it as rich, yet as lean, as possible.


Image 4: Demonstration of the system state. Image by author.
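As a rough illustration of such a state, one could group the entities into a few typed records. All field names below are hypothetical; the actual state representation is richer and is Wix's internal design.

```python
from dataclasses import dataclass, field

@dataclass
class Expert:
    expert_id: str
    channels: set          # e.g., {"callback", "chat"}
    topics: set            # e.g., {"billing", "seo"}
    tier: int
    occupancy: float       # fraction of the shift already spent on tickets

@dataclass
class Ticket:
    ticket_id: str
    channel: str
    language: str
    topic: str
    waiting_minutes: float

@dataclass
class SystemState:
    timestamp: float
    ticket: Ticket                                       # the request to route
    experts: list = field(default_factory=list)          # eligible experts now
    queue_lengths: dict = field(default_factory=dict)    # per-channel backlog
```

The agent receives one `SystemState` per routing decision; keeping it lean means pruning attributes that analysis shows do not affect the decision.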

Action Space


Defines the set of possible actions the model can take. It is extremely hard to train an RL model on a large action space, as each action must be observed in many states to learn its state-value relationship, a difficulty known as the curse of dimensionality. A possible mitigation is to narrow the action space, deciding only which channel [8, 16] or which group of experts [9] to route the ticket to.


For our use case, we defined the action space as the pool of experts who can be assigned to the service request. In practice, this defines a dynamic action space, composed of hundreds of possible actions, changing both in identity and in size. This variety raised the need for an approach that is agnostic to the number, identity and order of experts.
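One common way to be agnostic to the number, identity, and order of candidates is to score each (ticket, expert) pair with a shared function and rank the results; the pool can then grow, shrink, or reorder freely. The scoring heuristic below is a hypothetical illustration, not the trained model.

```python
def score(ticket_topic: str, expert: dict) -> float:
    """Shared per-expert score: prefer topic fit, then idleness."""
    topic_fit = 1.0 if ticket_topic in expert["topics"] else 0.0
    return topic_fit - 0.5 * expert["occupancy"]

def top_k(ticket_topic: str, experts: list, k: int = 3) -> list:
    """Rank the current pool, whatever its size or order, and take K."""
    ranked = sorted(experts, key=lambda e: score(ticket_topic, e), reverse=True)
    return [e["id"] for e in ranked[:k]]

experts = [
    {"id": "a", "topics": {"billing"}, "occupancy": 0.9},
    {"id": "b", "topics": {"billing"}, "occupancy": 0.2},
    {"id": "c", "topics": {"seo"},     "occupancy": 0.1},
]
```

Because each expert is scored independently by the same function, adding or removing experts requires no retraining of the ranking mechanism itself.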



The Reward


Each of the five previously mentioned KPIs is associated with a score for every possible outcome. These scoring rules compose the reward — which is the foundation of the RL solution. It associates each possible routing decision with its value, serving as a beacon for training models to differentiate between “good” and “bad”.


The KPIs can be divided into three main forces, creating a triangular trade-off: (1) fast time to service, (2) high quality of service, and (3) fairly distributed workload across experts. For example, one could assign all tickets to the best experts' queues, but then both waiting time and work balance would be severely harmed.


Interestingly, in Wix’s case, each channel has its own spot within this triangle, reflecting different objectives, demands and priorities. Therefore, we coupled each channel with its own reward. Analogously, we can imagine each channel as a different mode of the same game, as in games in which one can play “last man standing” or “capture the flag”. We illustrate the RL paradigm for our use case in Image 5:


Image 5: Demonstration of the RL system loop. Image by author.
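Coupling each channel with its own reward can be as simple as keeping a weight vector per channel over the same KPI terms. The weights below are invented for illustration only, merely encoding the intuition from the text that callbacks tolerate longer waits than chats.

```python
# Hypothetical per-channel weight vectors over the same five KPIs.
CHANNEL_WEIGHTS = {
    "callback": {"wait": -0.01, "abandon": -1.0, "resolve": 1.2,
                 "tier": -0.4, "occupancy": -0.5},
    "chat":     {"wait": -0.05, "abandon": -1.2, "resolve": 1.0,
                 "tier": -0.3, "occupancy": -0.5},
}

def channel_reward(channel: str, kpis: dict) -> float:
    """Score an outcome under the reward of its own channel ('game mode')."""
    w = CHANNEL_WEIGHTS[channel]
    return sum(w[name] * value for name, value in kpis.items())
```

The same outcome thus earns different rewards per channel, steering each channel's policy toward its own spot in the trade-off triangle.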

Research & Development Process


The first building block of this project is the system design. It elaborates on how the routing system should work, its product flows, edge cases, API protocols, etc. It also covers the definition of the reward weights - our (and the model's) compass.


Second is building the infrastructure to facilitate RL model predictions, manage events, maintain an async state, etc. It also includes a toggle that allows us to quickly fall back to the current system in case we have to.


Third, extensive data collection takes place. It covers everything we need to parametrize the system to mimic production flows and entities: shift creation, service duration and ticket resolution, experts' breaks, ticket churn, and much more. Naturally, many of these phenomena require an ad-hoc deep analysis to unveil their underlying mechanics and distributions.
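As one small example of this parametrization step, a phenomenon like service duration might be fitted to a distribution and then sampled inside the simulator. The log-normal choice and the toy sample below are assumptions for illustration; the actual per-phenomenon analysis determines the real distributions.

```python
import math
import random
import statistics

# Toy sample of observed handle times in minutes.
observed_minutes = [4.2, 7.5, 6.1, 12.0, 5.3, 9.8, 6.7, 8.4]

# Fit a log-normal by estimating mean/stdev of the log-durations.
logs = [math.log(x) for x in observed_minutes]
mu, sigma = statistics.mean(logs), statistics.stdev(logs)

def sample_service_duration(rng: random.Random) -> float:
    """Draw one simulated service duration (minutes)."""
    return rng.lognormvariate(mu, sigma)

rng = random.Random(0)
draws = [sample_service_duration(rng) for _ in range(1000)]
```

Each such fitted parameter then becomes part of the simulator, and, as noted above, something that must itself be validated and maintained.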


In parallel to the data collection, simulation programming kicks off: implementing the flows and integrating the data and estimated parameters. The output of this step is the simulator's approximation error relative to real-life scenarios based on the current CC system.


This phase served as a major checkpoint in the process, requiring a few quick iterations of going back and revisiting earlier steps' assumptions and quality. Once the simulator is validated, evaluated and frozen, we can proceed to the calibration phase and eventually to the RL model training. In practice, prior to the RL model, we first developed a simpler, highly interpretable model that allowed us to bring value faster, examine project assumptions and gain experience with real data. Both the RL and the “alpha” version will be covered more extensively in a future post.


AI for Revolutionizing Customer Care Expert Routing at Wix
Image 6: The project research and development process. Image by author.


One Challenge to Share: Embracing a New Mindset


As the current system (and the mainstream industry) is waiting-time oriented, the proposed solution requires re-alignment across the CC organization. Operationally, it also demands collaboration and human adherence; otherwise, it may put the entire project in jeopardy.


Easing this transition starts with maintaining direct communication with the product team and investing in mutual education. On one hand, sharing DS knowledge in AI and RL with the team helped achieve a collaborative environment, more transparency and fewer misconceptions. On the other hand, the care team nurtured domain expertise among the data scientists, providing a better understanding of the organization's pain points and concerns. Sharing these points of view proved highly effective, whether for brainstorming, designing the solution, handling edge cases or leveraging deliverables to bring value to other projects.


One step that was particularly valuable in embracing the project across the CC organization was a scenario-based survey, done in real time with the CC management forum. Its purpose was to raise dilemmas and lead a discussion, exposing the weaknesses of the current methodology and setting expectations for the new approach. It also helped verify that the reward principles are aligned with the management majority.


Lastly, this drastic change called for a smooth and gradual deployment strategy that would not damage harmony or customer experience. Therefore, we started with the callbacks channel - the highest in impact (traffic), yet flexible in terms of waiting time. Each time a new piece of the project was introduced, we collected feedback from both users and CC experts, which was translated into action items for the backend, the experts' UI, the simulator or the policy itself. For callbacks, we were able to fully deploy the solution after a single short test.



Results


It took us only one test in production before going fully live to reach a much better equilibrium than we had before. We traded longer waiting times for more focus on resolution rate and tier matching. We observed significant growth in customer satisfaction, reinforcing our hypothesis that satisfaction is influenced by more than just waiting time.


Fueled by this success, we will soon deploy a second model, this time for the chats channel (with a different reward), alongside additional models that utilize either the simulator or the model's reward estimation for better efficiency throughout the CC funnel.



Summary


In this post I shared the journey we experienced in developing and deploying our Reinforcement Learning based solution for customer care routing. We demonstrated how complex CC systems face competing objectives — time to serve, quality of service and workload balance — and hypothesized that focusing on waiting time alone results in sub-optimal overall customer satisfaction.


Encouraged by the improvement in customer satisfaction, we are about to launch additional solutions that expand this innovative project. Soon, we will publish another post with a deep dive into the modeling journey of this project. So log in to Wix.com, contact us, and experience our brand new routing system. Every new ticket is another step toward getting better.



References

[1] Muldme, Anne, et al. "A survey on customer satisfaction in national electronic ID user support." 2018 International Conference on eDemocracy & eGovernment (ICEDEG). IEEE, 2018.

[2] Chicu, Dorina, et al. "Exploring the influence of the human factor on customer satisfaction in call centres." BRQ Business Research Quarterly 22.2 (2019): 83-95.

[3] Caruelle, Delphine, Line Lervik-Olsen, and Anders Gustafsson. "The clock is ticking—Or is it? Customer satisfaction response to waiting shorter vs. longer than expected during a service encounter." Journal of Retailing 99.2 (2023): 247-264.

[5] Ali, Abbas Raza. "Intelligent call routing: Optimizing contact center throughput." Proceedings of the eleventh international workshop on multimedia data mining. 2011.

[6] Kleinerman, Akiva, Ariel Rosenfeld, and Hanan Rosemarin. "Machine-learning based routing of callers in an Israeli mental health hotline." Israel Journal of Health Policy Research 11.1 (2022): 25.

[7] Ilk, Noyan, Guangzhi Shang, and Paulo Goes. "Improving customer routing in contact centers: An automated triage design based on text analytics." Journal of Operations Management 66.5 (2020): 553-577.

[8] De Andrade, Rodrigo Caporali, Paul T. Grogan, and Somayeh Moazeni. "Simulation Assessment of Data-Driven Channel Allocation and Contact Routing in Customer Support Systems." IEEE Open Journal of Systems Engineering 1 (2023): 50-59.

[9] Mehrotra, Vijay, et al. "Routing to manage resolution and waiting time in call centers with heterogeneous servers." Manufacturing & Service Operations Management 14.1 (2012): 66-81.

[10] Armony, Mor, and Constantinos Maglaras. "On customer contact centers with a call-back option: Customer decisions, routing rules, and system design." Operations Research 52.2 (2004): 271-292.

[11] Ahghari, Mahvareh, and Bariş Balcioĝlu. "Benefits of cross-training in a skill-based routing contact center with priority queues and impatient customers." IIE Transactions 41.6 (2009): 524-536.

[12] Legros, Benjamin, and Oualid Jouini. "On the scheduling of operations in a chat contact center." European Journal of Operational Research 274.1 (2019): 303-316.

[13] Tezcan, Tolga, and Jiheng Zhang. "Routing and staffing in customer service chat systems with impatient customers." Operations Research 62.4 (2014): 943-956.

[14] Zang, Xinshi, et al. "Metalight: Value-based meta-reinforcement learning for traffic signal control." Proceedings of the AAAI conference on artificial intelligence. Vol. 34. No. 01. 2020.

[15] Sanabria, Pablo, et al. "Connection-Aware Heuristics for Scheduling and Distributing Jobs under Dynamic Dew Computing Environments." Applied Sciences 14.8 (2024): 3206.

[16] Liu, Zining, et al. "Which channel to ask my question?: Personalized customer service request stream routing using deep reinforcement learning." IEEE Access 7 (2019): 107744-107756.




This post was written by Ofir Magdaci





