Bayes' theorem is covered in detail in a separate article. It is a great piece of work, but it runs to 15,000 words. This translation of an article by Kalid Azad briefly explains the very essence of the theorem.
The results of research and testing are not events in themselves. There is a method for diagnosing cancer, and there is the event itself: the presence of the disease. An algorithm checks whether a letter contains spam, but the event (spam actually arriving in the mailbox) must be considered separately from the result of the algorithm's work.
There are errors in test results. Our research methods often detect what is not there (a false positive) and fail to detect what is there (a false negative).
Tests give us the probabilities of outcomes, not the events themselves. We look at test results on their own far too often and fail to account for the method's errors.
False positive results distort the picture. Suppose you are trying to detect a very rare phenomenon (1 case in 1,000,000). Even if your method is accurate, a positive result will most likely turn out to be a false positive.
It is more convenient to work with natural numbers. It is better to say "100 out of 10,000" than "1%". With this approach there are fewer errors, especially when multiplying. Suppose we need to keep working with that 1%. Reasoning in percentages is clumsy: "in 80% of the cases out of that 1% we got a positive outcome." The same information is perceived much more easily as: "in 80 cases out of 100 we observed a positive outcome."
Even in science, any fact is just the result of applying a method. From a philosophical point of view, a scientific experiment is just a test with a probable error. There is a method that detects a chemical substance or some phenomenon, and there is the event itself: the presence of that phenomenon. Our test methods can give false results, and any equipment has its inherent error.
Bayes' theorem turns test results into probabilities of events.
If we know the probability of an event and the probabilities of false-positive and false-negative results, we can correct for measurement errors.
The theorem relates the probability of an event to the probability of a certain outcome. We can relate Pr(A|X), the probability of event A given outcome X, to Pr(X|A), the probability of outcome X given event A.
Understanding the method
The article referenced at the beginning of this essay deals with a diagnostic method (the mammogram) for detecting breast cancer. Let's consider this method in detail.
1% of all women have breast cancer (and, accordingly, 99% do not)
80% of mammograms detect the disease when it really is there (and, accordingly, 20% miss it)
9.6% of mammograms detect cancer when there is none (and, accordingly, 90.4% correctly return a negative result)
Now let's create the following table:
                                 Sick (1%)    Not sick (99%)
Positive result of the method      80%            9.6%
Negative result of the method      20%           90.4%
How to work with this data?
1% of women suffer from breast cancer
if the patient really has the disease, look at the first column: there is an 80% chance that the method gave a correct (positive) result, and a 20% chance that the test result is wrong (a false negative)
if the patient is actually healthy, look at the second column: there is a 9.6% chance that the method gives an incorrect positive result, and a 90.4% chance that it correctly reports that the patient is healthy.
How accurate is the method?
Now let's analyze a positive test result. What is the probability that the person is really sick: 80%? 90%? 1%?
Let's think:
We have a positive result. Let us go through all the possible outcomes: the result obtained can be either a true positive or a false positive.
The probability of a true positive result is the probability of being sick multiplied by the probability that the test actually detected the disease: 1% × 80% = 0.008.
The probability of a false positive result is the probability of not being sick multiplied by the probability that the method wrongly detected the disease: 99% × 9.6% = 0.09504.
Now the table looks like this:
                               Sick (1%)                           Not sick (99%)
Positive result of the method  True positive: 1% × 80% = 0.008     False positive: 99% × 9.6% = 0.09504
Negative result of the method  False negative: 1% × 20% = 0.002    True negative: 99% × 90.4% = 0.89496
What is the probability that a person is really sick if the mammogram is positive? The probability of an event is the ratio of the number of outcomes favorable to the event to the total number of all possible outcomes.
event probability = event outcomes / all possible outcomes
The probability of a true positive result is 0.008. The probability of a positive result is the probability of a true positive plus the probability of a false positive:
0.008 + 0.09504 = 0.10304
So, the probability of disease given a positive test result is 0.008 / 0.10304 = 0.0776, i.e. about 7.8%.
That is, a positive mammogram only means that the probability of having the disease is 7.8%, not 80% (the latter is merely the estimated accuracy of the method). At first this result seems incomprehensible and strange, but consider: the method gives a false positive in 9.6% of cases (which is quite a lot), so the sample will contain many false positives. With a rare disease, most positive results will be false positives.
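The arithmetic above is easy to verify in code. A minimal Python sketch of the computation (the variable names are my own, chosen for this example):

```python
# Given rates from the article
p_sick = 0.01             # 1% of women have the disease
p_pos_if_sick = 0.80      # 80% of mammograms detect an existing disease
p_pos_if_healthy = 0.096  # 9.6% false positive rate

# Joint probabilities (the cells of the table above)
true_positive = p_sick * p_pos_if_sick            # 0.008
false_positive = (1 - p_sick) * p_pos_if_healthy  # 0.09504

# Probability of any positive result, then the corrected probability
p_positive = true_positive + false_positive       # 0.10304
print(true_positive / p_positive)                 # ~0.0776, i.e. about 7.8%
```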
Let's take another quick look at the table and try to grasp the meaning of the theorem intuitively. If we have 100 people, only one of them has the disease (1%). For that person, the method gives a positive result with 80% probability. Of the remaining 99 people, roughly 10% will get positive results, which gives us, roughly speaking, 10 false positives per 100 people. If we look at all the positive results, only 1 out of 11 is true. So, given a positive result, the probability of the disease is about 1/11.
Above we computed this probability as 7.8%, i.e. the number is actually closer to 1/13; but with this simple reasoning we managed to find a rough estimate without a calculator.
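The same estimate can be written out with natural numbers, as recommended earlier. A small sketch (the population size of 10,000 is arbitrary, chosen so that the counts stay whole):

```python
population = 10_000
sick = population // 100       # 1% -> 100 women
healthy = population - sick    # 9,900 women

true_positives = sick * 80 // 100        # 80 women
false_positives = healthy * 96 // 1000   # 950 women (9.6%, rounded down)

# The share of positive results that are genuine
print(true_positives / (true_positives + false_positives))  # ~0.078, about 1 in 13
```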
Bayes' theorem
Now let us describe our line of reasoning with a formula, which is called Bayes' theorem. This theorem lets you correct the results of a study for the distortion introduced by false positives:
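Pr(A|X) = Pr(X|A) · Pr(A) / (Pr(X|A) · Pr(A) + Pr(X|not A) · Pr(not A))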
Pr(A|X) = the probability of the disease (A) given a positive result (X). This is exactly what we want to know: the probability of the event given a positive outcome. In our example it equals 7.8%.
Pr(X|A) = the probability of a positive result (X) when the patient is really sick (A). In our case, this is the true positive rate: 80%.
Pr(A) = the probability of being sick (1%)
Pr(not A) = the probability of not being sick (99%)
Pr(X|not A) = the probability of a positive result when there is no disease. This is the false positive rate: 9.6%.
We can conclude: to get the probability of an event, the probability of a true positive outcome must be divided by the probability of all positive outcomes. Now we can simplify the equation:
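Pr(A|X) = Pr(X|A) · Pr(A) / Pr(X)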
Pr(X) is the normalization constant. It has served us well: without it, a positive test outcome would give us an 80% probability of the event.
Pr(X) is the probability of any positive result, whether a true positive when testing sick people (1% of the population) or a false positive when testing healthy people (99% of the population).
In our example, Pr (X) is a rather large number, because the probability of false-positive results is high.
Dividing by Pr(X) is what produces the result of 7.8%, which at first glance seems counterintuitive.
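The whole theorem fits into a few lines of code. A minimal Python sketch (the function name and parameter names are my own):

```python
def bayes(p_event, p_pos_given_event, p_pos_given_no_event):
    """Probability of the event given a positive test result."""
    true_positive = p_event * p_pos_given_event
    false_positive = (1 - p_event) * p_pos_given_no_event
    p_positive = true_positive + false_positive  # Pr(X), the normalization constant
    return true_positive / p_positive

# Mammogram example: prior 1%, true positive rate 80%, false positive rate 9.6%
print(bayes(0.01, 0.80, 0.096))  # ~0.0776
```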
The meaning of the theorem
We run tests to find out the true state of affairs. If our tests were perfect and accurate, test probabilities and event probabilities would coincide: all positive results would be truly positive, and all negative results truly negative. But we live in the real world, and in our world tests give wrong results. Bayes' theorem takes the distorted results into account, corrects the errors, reconstructs the population, and finds the probability of a true positive result.
Spam filter
Bayes' theorem is successfully used in spam filters.
We have:
the event A: the letter is spam
the test result: the presence of certain words in the letter
The filter takes the test results (the presence of certain words in the letter) into account and predicts whether the letter is spam. Everyone understands that, for example, the word "Viagra" occurs more often in spam than in regular mail.
A spam filter based on a simple blacklist of words has flaws: it often gives false positives.
A spam filter based on Bayes' theorem takes a balanced and sensible approach: it works with probabilities. When we analyze the words in a letter, we can calculate the probability that the letter is spam rather than making a hard "yes/no" decision. If the probability that a letter is spam is 99%, then it really is spam.
Over time, the filter trains on a larger sample and updates the probabilities. For example, advanced filters based on Bayes' theorem check many words in a row and use them as data.
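As an illustration of the idea (not of any real filter's implementation), here is a minimal Python sketch of a word-based Bayesian spam score; the word list, the per-word probabilities, and the prior are made-up numbers for the example:

```python
# Hypothetical per-word statistics learned from a labeled mail sample:
# (probability of the word in spam, probability of the word in regular mail)
word_stats = {
    "viagra":  (0.50, 0.001),
    "meeting": (0.02, 0.10),
    "free":    (0.30, 0.05),
}

P_SPAM = 0.40  # assumed prior share of spam in the mail stream

def spam_probability(words):
    """Apply Bayes' theorem, naively treating the words as independent."""
    p_spam, p_ham = P_SPAM, 1 - P_SPAM
    for word in words:
        if word in word_stats:
            p_word_spam, p_word_ham = word_stats[word]
            p_spam *= p_word_spam
            p_ham *= p_word_ham
    # Normalize: spam evidence divided by all possible explanations
    return p_spam / (p_spam + p_ham)

print(spam_probability(["free", "viagra"]))  # close to 1 -> almost surely spam
print(spam_probability(["meeting"]))         # low -> probably a regular letter
```

Just as with the mammogram, the filter does not trust a single "positive" word on its own: it weighs how often the evidence appears in spam against how often it appears in regular mail, and normalizes by all the ways the evidence could arise.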