Gestalt testing: a new approach to mailing list optimization based on Bayesian theory and machine learning





Multivariate (A/B/N, or split) testing is the most popular way to test email campaigns. It has proven effective, but it has drawbacks, chiefly that the test and the main send are separated in time.



At DashaMail we decided to change this and developed a different approach to testing campaigns, one that tests and optimizes a send at the same time. It uses Bayesian theory, neural networks, and machine learning, and as a result it raises email open rates by 20% on average.



Background



Testing is one of the tools for improving the effectiveness of email campaigns. Many factors affect open rates and audience engagement, including the subject line, the sender's name, the send time, and so on.



Not long ago, at one of our brainstorming sessions, we concluded that the machine learning algorithms so popular today could change how campaigns are tested and, in particular, improve open rates and engagement. Familiar split testing is far from perfect, and there is real room for improvement.



A/B/N tests are the main way to test hypotheses in email marketing. The main difficulty is that their results can only be analyzed after the fact. This makes the whole process long and labor-intensive: first you send out several variants of the email, then you study the results, adjust the test parameters, and send again. There can be many such iterations.



But what if you could test and optimize at the same time? That idea gave rise to the Gestalt testing tool in DashaMail.



Bayesian approach: test and optimize on the fly



Subscribers' responses to different variants of a message can vary greatly depending on when they receive them. A variant that wins a multivariate test may no longer be as effective by the time the main campaign is sent.



To avoid this problem and take all the important parameters of a send into account in real time, we used the Bayesian approach to decision making and statistical estimation. Yes, we at DashaMail are very fond of math and probability theory.



Bayes vs. A/B/N tests



On the one hand, A/B/N tests are simple; on the other, their accuracy can be very doubtful. Everything seems straightforward: if we need to test, say, the effectiveness of emails with different designs, then with two variants we can send one to one half of the subscriber base and the other to the second half, and then analyze the results.



But you need to know the minimum number of recipients that must see each variant for the results to be statistically significant. If it is enough to allocate only 20% of the subscriber base to the test, we can then send the most effective version of the email to the remaining 80% and get the best result. But there is no guarantee that simply allocating two groups of 10% each will give the correct answer. If one version of the email uses more red, the 10% group that receives it may happen to contain a disproportionate number of people who dislike that color, even though the same version might have won if more people had taken part in the test. This brings us to Type I and Type II errors (false positives and false negatives); there are plenty of articles about them on Habr. Each of these errors has its own probability of occurring.
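To make the sample-size question concrete, here is a minimal sketch of the textbook normal-approximation formula for comparing two open rates. It is not part of the DashaMail tool; the function name, the 5% significance level, the 80% power, and the 20% and 24% open rates are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def min_group_size(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate recipients needed per group for a two-sided
    two-proportion z-test (standard normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_beta = NormalDist().inv_cdf(power)           # controls Type II error
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = p_variant - p_base
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Assumed example: detect a lift from a 20% to a 24% open rate.
print(min_group_size(0.20, 0.24))
# A smaller expected lift needs a much larger group.
print(min_group_size(0.20, 0.22))
```

With these assumed numbers the answer comes out at well over a thousand recipients per group, and halving the expected lift roughly quadruples it. That is why a fixed "10% per variant" split can be either wasteful or too small, depending on the size of the list.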



In the end, this testing method does not eliminate the uncertainty: the test gives no definite answer to the question "Which is better?" The work has been done, but things are no clearer.



The alternative is the so-called Bayesian multi-armed bandit. This method not only tests the hypotheses but also answers the question of which variant is most likely to be the most effective. Importantly, the estimates are updated dynamically, and the sample size for each hypothesis (how much traffic, that is, how many emails, should go to a given variant) is likewise determined in real time.



Imagine that we have come to a casino full of "one-armed bandit" slot machines. We have a limited amount of money, and our time is not infinite either. We need to find the "promising" machine as quickly as possible and at minimal cost. This is the multi-armed bandit problem. There are many ways to solve it; one of them is based on Thompson sampling and Bayes' theorem, and it is described in detail in this article on Habr.



For email campaigns, this works as follows. While testing two or more hypotheses (email variants), we do not want to send too many emails with clearly losing parameters (in A/B tests you have to send equal shares). At the same time, we still want to keep an eye on those variants, because there is a chance that they will start to perform better over time (they may simply have been unlucky at first) and even become the leaders, in which case more traffic will go to them.
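The mechanics described above can be sketched with Thompson sampling over Beta posteriors. This is an illustration only, not DashaMail's actual implementation: the uniform Beta(1, 1) prior, the function names, and the open and send counts are assumptions.

```python
import random


def thompson_pick(stats, rng):
    """Choose a variant for the next email.

    stats maps variant name -> (opens, sends). Each variant's open rate
    gets a Beta posterior (assumed uniform Beta(1, 1) prior); we draw one
    sample per variant and send the variant with the highest draw.
    """
    draws = {
        name: rng.betavariate(1 + opens, 1 + sends - opens)
        for name, (opens, sends) in stats.items()
    }
    return max(draws, key=draws.get)


def traffic_share(stats, rounds=10_000, seed=0):
    """Estimate what share of traffic each variant would receive."""
    rng = random.Random(seed)
    counts = {name: 0 for name in stats}
    for _ in range(rounds):
        counts[thompson_pick(stats, rng)] += 1
    return {name: count / rounds for name, count in counts.items()}


# Hypothetical results so far: (opens, sends) per subject-line variant.
stats = {"A": (30, 200), "B": (45, 200), "C": (20, 200)}
print(traffic_share(stats))
```

The current leader receives most of the traffic, but the weaker variants are still chosen occasionally, so they can recover if their results improve. In a live send the counts in `stats` would be updated after every batch.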



This theory formed the basis of a new tool called Gestalt testing.







The main difference from traditional A/B testing is this: although most emails go out with the winning variant, the other variants always keep a chance, because if subscribers' behavior changes, you need to react in time and send the variant that best fits the new situation.



Gestalt testing also makes it possible to use emotional marketing in campaigns by creating emails with different emotional tones. It works like this: the email marketer sets a base subject line and then chooses the emotions in which to rephrase it. There can be up to ten variants (fear, gratitude, and so on).







The neural network rephrases the subject line in the chosen emotional tones and offers the results for review. The email marketer can then edit them as they see fit.



An example of emotions and their corresponding subject lines, along with the open rates for each:







Once the campaign is launched, the system starts sending emails in batches, and each batch contains all the proposed variants. The whole send takes about 10 hours, with a batch going out every half hour. As you can see, the tool is not suitable for short-term promotions that need to be sent out quickly. It is better suited to medium-term promotions or content campaigns. Statistics are available for each variant, so you can see right away what works better.
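The timing above implies about 20 batches per campaign. Here is a minimal sketch of such a schedule; the function name, the start time, the even split of recipients across batches, and the choice to send the first batch at the start time are assumptions, since the article only gives the total duration and the interval.

```python
from datetime import datetime, timedelta


def batch_schedule(n_recipients, start, total_hours=10, interval_minutes=30):
    """Split a send into equal batches spaced interval_minutes apart.

    Returns a list of (send_time, batch_size) tuples. Leftover recipients
    from the integer division are spread over the first batches.
    """
    n_batches = total_hours * 60 // interval_minutes
    base, extra = divmod(n_recipients, n_batches)
    schedule = []
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        send_time = start + timedelta(minutes=i * interval_minutes)
        schedule.append((send_time, size))
    return schedule


# Assumed example: a 10,000-address list, sending starts at 09:00.
for send_time, size in batch_schedule(10_000, datetime(2024, 1, 1, 9, 0))[:3]:
    print(send_time.strftime("%H:%M"), size)
```

Within each batch, the split between variants would come from the current estimates rather than from equal shares.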



In the example below, the leader by opens and clicks is the variant whose subject line the neural network rewrote with the emotion "love": "You are the most beautiful in the office! -30% for office models from our selection." However, it also has the highest unsubscribe rate of all the variants. This may mean that the content of the email was weaker than its subject line, or that we managed to catch the attention of a previously dormant segment of subscribers.







Because a send with Gestalt testing is spread out over time, the send time is tested automatically as well. Moreover, the service remembers which emotion and which time of day each individual subscriber responds to best, and it adapts to this in subsequent sends that use this feature. As a result, Gestalt testing becomes more effective over time.







Why it works



The idea behind the new testing tool is that recipients respond better to personalized, emotionally colored messages than to dry text.



At the same time, Gestalt testing applies machine learning methods to all the subject-line variants. The most successful variant is used most actively during the test, but the other variants still receive a little traffic. This makes it possible to track subscribers' behavior patterns over time: it often happens that a subject line that performed well at one point later loses badly to the other variants. If the system detects such a change in the pattern, the send is optimized on the fly to keep efficiency at its maximum.
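The article does not say how the system detects a pattern change. One common technique, shown here purely as an assumption, is to discount old evidence in each variant's Beta posterior so that recent batches weigh more. The function names, the discount factor of 0.9, and the batch results are hypothetical.

```python
def discounted_update(alpha, beta, opens, sends, gamma=0.9):
    """Shrink old evidence toward the Beta(1, 1) prior, then add a new batch.

    gamma=1.0 is the ordinary Bayesian update (nothing is forgotten);
    smaller values forget older batches faster.
    """
    alpha = 1 + gamma * (alpha - 1) + opens
    beta = 1 + gamma * (beta - 1) + (sends - opens)
    return alpha, beta


def estimated_open_rate(batches, gamma):
    """Posterior mean open rate after processing batches in order."""
    alpha, beta = 1.0, 1.0
    for opens, sends in batches:
        alpha, beta = discounted_update(alpha, beta, opens, sends, gamma)
    return alpha / (alpha + beta)


# Hypothetical variant: 30% opens for ten batches, then only 5%.
batches = [(30, 100)] * 10 + [(5, 100)] * 10
print("no discounting:  ", round(estimated_open_rate(batches, 1.0), 3))
print("with discounting:", round(estimated_open_rate(batches, 0.9), 3))
```

Without discounting, the early success keeps the estimate far above the recent 5%; with discounting, the estimate moves much closer to the variant's current performance, so traffic shifts to the other variants sooner.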



Behavior patterns are analyzed for each subscriber. Based on a recipient's open history, an individual send time is chosen for them. Time patterns can change too: for example, a person's working day may start and end at different hours, so the opportunity to check personal email comes at a different time. The Gestalt feature adjusts to such changes automatically.
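As a rough illustration of choosing an individual send time, the sketch below picks the hour at which a subscriber most often opened their most recent emails. This is an assumed simplification, not the service's actual model; the function name, the 20-open window, and the default hour are hypothetical.

```python
from collections import Counter


def best_send_hour(open_hours, default_hour=10, recent=20):
    """Return the most frequent hour (0-23) among the latest opens.

    open_hours is the subscriber's open history, oldest first. Looking
    only at the last `recent` opens lets the choice follow a change in
    habits. Subscribers with no history get default_hour.
    """
    if not open_hours:
        return default_hour
    counts = Counter(open_hours[-recent:])
    return counts.most_common(1)[0][0]


# Hypothetical subscriber who used to read email at 9:00
# and has recently switched to 19:00.
history = [9] * 15 + [19] * 20
print(best_send_hour(history))
```

Restricting the count to recent opens is what lets the choice follow a changed schedule; a production system would more likely weight opens by recency and combine them with the emotion data.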



An important point: Gestalt testing requires a certain amount of data, otherwise it is hard to keep it highly effective. That is why it is available only for lists of 10,000 addresses or more.



Conclusion: what results you can count on



It all sounds logical, but what results can you actually expect from the proposed testing tool? Let's look at an example. This is what the report on a send that used the Gestalt feature looks like: it includes the final open rate (OR), the result relative to the base subject line, and a comparison with the figures that a conventional multivariate test would have achieved with the same distribution of emails across subject lines.







According to statistics from DashaMail clients, campaigns with Gestalt tests see an average increase in open rate of 20%. The feature becomes more effective over time, as the system learns and remembers at what time and to which emotion each subscriber responds best, and as a result it can raise the open rate (OR) of a campaign to 1.5–2 times that of the base subject line.



You may be wondering what the term “gestalt” has to do with all this. No, we were not “closing our gestalt”; we set out to build a tool for experimenting with the form of an email. Translated from German, “Gestalt” means “form”. So, by experimenting with form, you can arrive at the ideal campaign.



To keep up with current trends in email marketing in Russia and to receive useful life hacks and our materials, subscribe to the DashaMail Facebook page and read our blog.


