How Methodius became Anna: the experience of developing and launching voice message classifiers. Part 3

Purpose of the series

Let me remind you that in the first and second posts we built a model for classifying technical support calls and learned how to bring it to production without stepping on every rake. We concluded that before building complex models, you need to understand how complete and accurate your data is. The second conclusion was: understand your user, and launching the service becomes much easier.

In this article we will talk about the second case that the voice robot Anna helped us solve.

Case No. 2. Task and data

After we understood how people reason and took our lumps introducing the first voice classifier, we were inspired to solve another problem.

The issue.

34% of calls from the sales department are transferred to the technical support service. We wanted to reduce the number of transfers between departments. First, how did it work before? When a call arrives at the company's call center, the system checks whether the number is known (whether it exists in our CRM). If the number is known, the caller is already our client and the call is sent to technical support; if the number is unfamiliar, the call is routed to the sales department.
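The old rule fits in a few lines. This is only a minimal sketch: the function name `route_by_number`, the set `known_numbers` standing in for the CRM lookup, and the sample phone numbers are assumptions made for illustration, not the production code.

```python
def route_by_number(caller_number, known_numbers):
    """Legacy rule: known number -> technical support, unknown -> sales."""
    # known_numbers stands in for a CRM lookup (hypothetical).
    if caller_number in known_numbers:
        return "support"
    return "sales"


# Hypothetical CRM contents, for illustration only.
crm_numbers = {"+70000000001", "+70000000002"}

print(route_by_number("+70000000001", crm_numbers))  # existing client
print(route_by_number("+70000000099", crm_numbers))  # unfamiliar number
```

The rule looks only at the number and never at what the caller actually says.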

Such a check does not solve the problem. The sales department still transferred a third of its calls to technical support, because not all customer numbers are known to us. Almost everyone has two SIM cards. Or the call about an existing connection comes not from the person who left their contact details but from a relative: the question is technical, yet the number is unfamiliar to the company.

Thus, we needed a system that automatically distributes calls between technical support and the sales department based on what the caller says. The diagram below schematically shows the call processing algorithm.

The data were roughly the same as in the first case. Recognized phrases from calls received by the sales department were labeled according to whether the call was transferred to technical support. In this way, we wanted to separate technical questions from purchase and connection questions.

Case Solution

We trained various models and got the following quality.

Algorithm	Class	f-score
Logreg	sale	0.78
Logreg	support	0.69
Random forest	sale	0.75
Random forest	support	0.62
SVM	sale	0.71
SVM	support	0.62
XGBoost	sale	0.61
XGBoost	support	0.57
CNN	sale	0.76
CNN	support	0.63

As the table shows, the quality is poor. Sales calls must be identified with the highest possible quality, because the loyalty of future customers depends on it. Sending a person who wants to buy our services to technical support is unacceptable.

Difficulties of the solution. Relabeling

To improve classification quality, we decided to check whether the classes can be separated by the vocabulary used in them, and ran an analysis.
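For reference, a per-class f-score combines precision and recall computed for one class at a time. The sketch below assumes the table reports the usual F1 (the harmonic mean of precision and recall); the article does not say which variant was used, and the labels here are made up for illustration.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No correct predictions for this class: score is zero by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, not the article's data.
y_true = ["sale", "sale", "sale", "support", "support", "sale"]
y_pred = ["sale", "support", "sale", "support", "sale", "sale"]

for label in ("sale", "support"):
    print(label, round(f1_per_class(y_true, y_pred, label), 2))
```

Reporting the score per class, rather than as one average, makes it visible when one class is recognized much worse than the other.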

Table of frequently used words before relabeling

As you can see, most of the words are common to both classes. We expected all the technical words to be in the tech support class, but even the word “reboot” appeared in the “sale” class. We started looking for the reasons. It turned out that sales operators often advised on simple technical issues themselves, without transferring the call to technical support, and this resulted in incorrect labels.

We relabeled the dataset and again extracted the top words for each class.

Table of frequently used words after relabeling

It got better: all the “technical” words are now in the “tech support” class, and the words that accompany a sale are in the “sale” class. This showed in the classification quality.
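A vocabulary check of this kind can be sketched with a simple word count per class. The phrases below are invented examples, and the helper names `top_words` and `shared_top_words` are assumptions for illustration; the article does not describe the exact tooling used.

```python
from collections import Counter


def top_words(phrases, n):
    """Return the n most frequent words across the given phrases."""
    counts = Counter(
        word for phrase in phrases for word in phrase.lower().split()
    )
    return [word for word, _ in counts.most_common(n)]


def shared_top_words(phrases_by_class, n):
    """Words that appear in the top-n list of every class."""
    tops = [set(top_words(phrases, n)) for phrases in phrases_by_class.values()]
    return set.intersection(*tops)


# Invented phrases, for illustration only.
phrases_by_class = {
    "sale": [
        "want to connect internet",
        "internet stopped working need to reboot router",
    ],
    "support": [
        "internet is not working",
        "reboot router did not help",
    ],
}

print(sorted(shared_top_words(phrases_by_class, 20)))
```

A large overlap between the top words of the two classes is a hint that the labels, rather than the models, may be the problem.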

Algorithm	Class	f-score before	f-score after
Logreg	sale	0.78	0.94
Logreg	support	0.69	0.87
Random forest	sale	0.75	0.92
Random forest	support	0.62	0.82
SVM	sale	0.71	0.93
SVM	support	0.62	0.86
XGBoost	sale	0.61	0.91
XGBoost	support	0.57	0.78
CNN	sale	0.76	0.93
CNN	support	0.63	0.86

Case No. 2. Conclusion

What is the conclusion of this article? Understand the business process that you influence. Yes, one could say that it is important to understand the data, because that is what led us to relabel. But if we had studied the call handling process beforehand, we would have found out immediately that sales operators are technically savvy and do not always transfer the call to tech support. So using the presence of a transfer as the label was not quite the right decision. The conclusion: understanding business processes is much more useful than mastering complex algorithms and solving small technical problems.

Results of a series of articles

We have implemented a system that understands the subject of the subscriber’s question and routes the call. We determine what the caller's question is about; if the question is technical, we select a technical support operator who knows the topic. If the question is about getting connected, we transfer the call to the sales department.

Why do we need all this? What have we achieved? First, we reduced the number of transfers between departments. The graph shows test days on January 19 and 20, and from February 7 the classifier has been running permanently.

Second, we managed to build a system in which talking to the robot is comfortable. The latest audio examples in the second article are proof of this.
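The routing logic described here can be sketched as follows. Everything in this block is an assumption made for illustration: `toy_classify` and `toy_topic` are keyword stand-ins for the trained classifiers, and the topic names and operator identifiers are hypothetical.

```python
def route_call(phrase, classify, topic_of, operators_by_topic):
    """Return (department, operator) for a recognized phrase."""
    department = classify(phrase)
    if department == "support":
        # Pick an operator who handles the detected topic, if any.
        operators = operators_by_topic.get(topic_of(phrase), [])
        return "support", operators[0] if operators else None
    return "sales", None


def toy_classify(phrase):
    """Keyword stand-in for the trained sale/support classifier."""
    technical = {"reboot", "router", "broken"}
    return "support" if technical & set(phrase.lower().split()) else "sales"


def toy_topic(phrase):
    """Keyword stand-in for the support topic classifier."""
    return "internet" if "router" in phrase.lower().split() else "other"


# Hypothetical operator skills table.
operators_by_topic = {"internet": ["operator_1"], "other": ["operator_2"]}

print(route_call("my router does not work", toy_classify, toy_topic, operators_by_topic))
print(route_call("i want to connect", toy_classify, toy_topic, operators_by_topic))
```

Passing the classifiers in as arguments keeps the routing rule separate from the models, so either one can be replaced or retrained without touching the other.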

Conclusions of the three posts

Understand your data and labeling
Understand system users
Understand the business process before changing it
Learn how to quickly test and respond to results

The last conclusion appeared after we realized how much time passed between setting the task and actually launching the system. I wish everyone a shorter hypothesis testing cycle and a faster path to production.

What's next? Our plans

We plan to understand not only the client's first phrase but also the following ones, in order to maintain a conversation and keep “simple” calls from reaching an operator.
