How Methodius became Anna: the experience of developing and launching voice message classifiers. Part 3

Objective Series



As a reminder, in the first and second posts we built a model for classifying technical support calls and learned how to take it to production without stepping on every rake along the way. We concluded that before building complex models, you need to understand how complete and accurate your data is. The second conclusion was this: understand your user, and launching the service becomes much easier.



In this article we will talk about the second case that Anna, our voice robot, helped us solve.



Case No. 2. Task and data



After we understood how people think and took our lumps introducing the first voice classifier, we were inspired to tackle another problem.



The issue.



34% of calls from the sales department are transferred to technical support. We wanted to reduce the number of transfers between departments. First, how did it work before? When a call arrives at the company's call center, the system checks whether the number is known (that is, whether it exists in our CRM). If the number is known, the caller is already our client and the call is sent to technical support; if the number is unfamiliar, the call is routed to the sales department.
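
The original routing rule can be sketched in a few lines. This is a minimal illustration, not the production code: the function name `route_by_number`, the queue names, and the sample numbers are all hypothetical.

```python
def route_by_number(caller_number: str, crm_numbers: set) -> str:
    """Legacy rule: known numbers go to support, unknown numbers go to sales."""
    # A number found in the CRM is assumed to belong to an existing client.
    if caller_number in crm_numbers:
        return "support"
    # Everything else is treated as a potential new customer.
    return "sales"


# Hypothetical CRM contents, for illustration only.
crm_numbers = {"+70000000001", "+70000000002"}

known = route_by_number("+70000000001", crm_numbers)
unknown = route_by_number("+70000000099", crm_numbers)
```

Note that an existing client calling from a number that is not in the CRM always ends up in the sales queue under this rule.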



image



Such a check does not solve the problem: the sales department still transferred a third of its calls to technical support, because not every customer number is known to us. Most of us have at least two SIM cards. Or the person calling about an existing connection is not the one who left their contact details but a relative; the question is technical, yet the number is unfamiliar to the company.



Thus, we needed a system that automatically distributes calls between technical support and the sales department based on what the caller says. The diagram below shows the call processing algorithm.



image



The data were approximately the same as in the first case. Recognized phrases from calls received by the sales department were labeled according to whether the call was then transferred to technical support. In this way, we wanted to separate technical issues from purchase / connection issues.
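
The labeling rule described above amounts to deriving the class from the transfer flag. A minimal sketch, assuming each call record is a dict with the hypothetical fields `text` and `transferred_to_support`:

```python
def label_call(record: dict) -> str:
    """Derive a class label from whether the sales operator transferred the call."""
    return "support" if record["transferred_to_support"] else "sale"


# Made-up records, for illustration only.
calls = [
    {"text": "i want to connect the internet", "transferred_to_support": False},
    {"text": "the internet does not work", "transferred_to_support": True},
]

labeled = [(call["text"], label_call(call)) for call in calls]
```

Such a label is a proxy: it records what the operator did with the call, not what the caller actually asked about.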



Case Solution



We trained various models and got the following quality.

Algorithm       Class     F-score
Logreg          sale      0.78
Logreg          support   0.69
Random forest   sale      0.75
Random forest   support   0.62
SVM             sale      0.71
SVM             support   0.62
XGBoost         sale      0.61
XGBoost         support   0.57
CNN             sale      0.76
CNN             support   0.63
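
For reference, a per-class F-score is the harmonic mean of precision and recall, computed with one class treated as positive at a time. The sketch below assumes the standard F1 definition (the article does not say which F-beta was used) and runs on made-up labels, not on the real data.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for a single class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, for illustration only.
y_true = ["sale", "sale", "sale", "support", "support", "support"]
y_pred = ["sale", "sale", "support", "support", "sale", "sale"]

scores = {cls: f1_for_class(y_true, y_pred, cls) for cls in ("sale", "support")}
```

Reporting the score per class rather than as one average makes it visible when one class, here "support", is recognized noticeably worse than the other.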


As can be seen from the table, the quality is poor. We need to identify sales calls as accurately as possible, because the loyalty of future customers depends on it. Sending a person who wants to buy our services to technical support is categorically unacceptable.



Difficulties with the solution. Relabeling



To improve classification quality, we decided to check whether the classes can be separated by the vocabulary used in them, and ran an analysis.



Table of frequently used words before relabeling
image



As you can see, most of the words are common to both classes. We expected all the technical words to be in the tech support class, but it turned out that even the word “reboot” appeared in the “sale” class. We started looking into the reasons. It turned out that sales operators often advised callers on simple technical issues themselves, without transferring them to technical support, and this produced incorrect labels.
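
A vocabulary check like this one can be done with simple word counts per class. The following is a minimal sketch on made-up phrases; the tokenization (lowercase, split on whitespace) is an assumption, and the original analysis may have used different preprocessing.

```python
from collections import Counter, defaultdict


def top_words_by_class(samples, n=3):
    """Return the n most frequent words for each class label."""
    counts = defaultdict(Counter)
    for text, label in samples:
        counts[label].update(text.lower().split())
    return {label: [w for w, _ in c.most_common(n)] for label, c in counts.items()}


def shared_words(top_words):
    """Words that appear in the top list of every class."""
    word_sets = [set(words) for words in top_words.values()]
    return set.intersection(*word_sets) if word_sets else set()


# Made-up labeled phrases, for illustration only.
samples = [
    ("internet does not work", "support"),
    ("reboot the router", "support"),
    ("reboot did not help internet", "support"),
    ("connect internet", "sale"),
    ("internet reboot", "sale"),
    ("reboot internet tariff", "sale"),
]

top = top_words_by_class(samples, n=3)
common = shared_words(top)  # words that do not help separate the classes
```

A large overlap between the top lists suggests either that the classes are hard to separate by vocabulary or, as in this case, that the labels themselves are noisy.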



We relabeled the dataset and again extracted the top words for each class.

Table of frequently used words after relabeling
image



It got better: all the “technical” words are now in the “tech support” class, and the words that accompany a sale are in the “sales” class. This showed in the classification quality.

Algorithm       Class     F-score before   F-score after
Logreg          sale      0.78             0.94
Logreg          support   0.69             0.87
Random forest   sale      0.75             0.92
Random forest   support   0.62             0.82
SVM             sale      0.71             0.93
SVM             support   0.62             0.86
XGBoost         sale      0.61             0.91
XGBoost         support   0.57             0.78
CNN             sale      0.76             0.93
CNN             support   0.63             0.86





Case No. 2. Conclusion.



What is the conclusion of this article? Understand the business process that you are affecting. Yes, one could say that the important thing is to understand the data, since that is what led us to relabel. But if we had first studied how calls are actually handled, we would have found out right away that sales operators are technically savvy and do not always transfer a call to tech support. So using the presence of a transfer as the label was not quite the right decision. The conclusion: understanding business processes is much more useful than mastering complex algorithms and solving small technical problems.



Results of a series of articles



We have implemented a system that understands the subject of the subscriber’s question and routes the call. We determine what the caller's question is about; if it is technical, we select a technical support operator who knows the topic. If the question is about getting connected, we transfer the call to the sales department.
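
The resulting routing logic can be summarized in a short sketch. The names here are hypothetical: `classify` stands in for the trained classifier, and `operators_by_topic` for whatever mapping the call center uses to find a support operator for a given topic.

```python
def route_call(phrase, classify, operators_by_topic, default_operator="support_general"):
    """Route a call based on the caller's recognized phrase.

    classify(phrase) is assumed to return (label, topic), where label is
    "sale" or "support" and topic is a support topic or None.
    """
    label, topic = classify(phrase)
    if label == "sale":
        return "sales"
    # Technical question: pick an operator who knows the topic, if there is one.
    return operators_by_topic.get(topic, default_operator)


# Toy stand-in for the real model, for illustration only.
def toy_classify(phrase):
    if "connect" in phrase:
        return "sale", None
    return "support", ("router" if "router" in phrase else None)


operators_by_topic = {"router": "support_router_team"}

sales_route = route_call("i want to connect", toy_classify, operators_by_topic)
router_route = route_call("my router is blinking", toy_classify, operators_by_topic)
other_route = route_call("nothing works", toy_classify, operators_by_topic)
```

Passing the classifier in as a function keeps the routing rule independent of which model is behind it.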







Why do we need all this? What did we achieve? First, we reduced the number of transfers between departments. The graph shows that January 19 and 20 were test days, and that from February 7 the classifier has been running on an ongoing basis.



image



Second, we managed to build a system in which talking to the robot is comfortable. The latest audio examples in the second article are proof of this.



Conclusions of the three posts



  1. Understand your data and its labeling
  2. Understand the users of the system
  3. Understand the business process before changing it
  4. Learn to test quickly and respond to the results


The last conclusion appeared after we realized how much time had passed between setting the task and actually launching the system. I wish everyone a shorter hypothesis-testing cycle and a faster path to production.



What's next? Our plans



We plan to understand not only the client's first phrase but also the ones that follow, in order to hold a conversation and keep “simple” calls from reaching an operator.



image



image



image






