Synopsis on “Machine Learning”. Probability theory. Bayes formula





Probability theory. Bayes formula



Let some experiment be conducted.



$\omega_1, \ldots, \omega_N$ — elementary events (elementary outcomes of the experiment).

$\Omega = \{\omega_i\}_{i=1}^{N}$ — the space of elementary events (the set of all possible elementary outcomes of the experiment).



Definition 1:



A system $\Sigma$ of subsets of $\Omega$ is called a sigma-algebra if the following properties hold:



  1. $\Omega \in \Sigma$;
  2. $A \in \Sigma \Rightarrow \overline{A} \in \Sigma$;
  3. $A_1, A_2, \ldots \in \Sigma \Rightarrow \bigcup\limits_{i=1}^{\infty} A_i \in \Sigma$.


From properties 1 and 2 of Definition 1 it follows that $\emptyset \in \Sigma$. From properties 2 and 3 of Definition 1 it follows that $\bigcap\limits_{i=1}^{\infty} A_i \in \Sigma$, because $A_i \in \Sigma \overset{\text{prop. 2}}{\Rightarrow} \overline{A_i} \in \Sigma \overset{\text{prop. 3}}{\Rightarrow} \bigcup\limits_{i=1}^{\infty} \overline{A_i} \in \Sigma \overset{\text{prop. 2}}{\Rightarrow} \overline{\bigcup\limits_{i=1}^{\infty} \overline{A_i}} \in \Sigma$, and by De Morgan's law $\overline{\bigcup\limits_{i=1}^{\infty} \overline{A_i}} = \bigcap\limits_{i=1}^{\infty} A_i$.
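The De Morgan step above can be checked on a small finite space. The sketch below is purely illustrative; the toy space and the two sets are made up:

```python
# Finite sanity check of the De Morgan step: the intersection of two sets
# equals the complement of the union of their complements.
omega = set(range(10))   # toy space of elementary events (made-up)
a1 = {0, 1, 2, 3, 4}
a2 = {2, 3, 4, 5, 6}

intersection = a1 & a2
de_morgan = omega - ((omega - a1) | (omega - a2))

print(intersection == de_morgan)  # → True
```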



Definition 2:



A function $P \colon \Sigma \to \mathbb{R}$ is called a probability (a probability measure) if:

  1. $\forall A \in \Sigma \quad P(A) \geqslant 0$;
  2. $P(\Omega) = 1$;
  3. for any pairwise disjoint $A_1, A_2, \ldots \in \Sigma \quad P\left(\bigcup\limits_{i=1}^{\infty} A_i\right) = \sum\limits_{i=1}^{\infty} P(A_i)$.

Probability Properties:



  1. $P(A) \leqslant 1$;
  2. $P(A) = 1 - P(\overline{A})$;
  3. $P(\emptyset) = 0$;
  4. $A \subseteq B \Rightarrow P(A) \leqslant P(B)$;
  5. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$;
  6. $\forall \{A_i\}_{i=1}^{N} \quad P\left(\bigcup\limits_{i=1}^{N} A_i\right) = \sum\limits_{i=1}^{N} P(A_i) - \sum\limits_{i<j} P(A_i \cap A_j) + \sum\limits_{i<j<k} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{N-1} P(A_1 \cap A_2 \cap \ldots \cap A_N)$;
  7. $\forall \{A_i\}_{i=1}^{\infty} \colon \left(A_{i+1} \subseteq A_i, \ \bigcap\limits_{i=1}^{\infty} A_i = \emptyset\right) \quad \lim\limits_{i \to \infty} P(A_i) = 0$.
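Property 6 (inclusion–exclusion) can be verified numerically on a small discrete space. The helper names `prob` and `inclusion_exclusion` and all the numbers below are illustrative assumptions, not part of the original text:

```python
from itertools import combinations

def prob(event, p):
    """P(event) for a discrete distribution p: outcome -> probability."""
    return sum(p[w] for w in event)

def inclusion_exclusion(events, p):
    """Right-hand side of property 6 for an arbitrary list of events."""
    total = 0.0
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)
        for subset in combinations(events, k):
            common = frozenset.intersection(*map(frozenset, subset))
            total += sign * prob(common, p)
    return total

# Uniform toy distribution on 6 outcomes (made-up numbers).
p = {w: 1 / 6 for w in range(6)}
events = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
union = set().union(*events)
print(abs(prob(union, p) - inclusion_exclusion(events, p)) < 1e-12)  # → True
```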


Definition 3:



$(\Omega, \Sigma, P)$ — a probability space.



Definition 4:



$\forall A, B \in \Sigma \colon P(B) > 0$

$\qquad P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ — the conditional probability of event $A$ given event $B$.
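For a discrete distribution, Definition 4 translates directly into code. A minimal sketch with a hypothetical `cond_prob` helper and a fair-die example of my own choosing:

```python
def cond_prob(a, b, p):
    """P(A | B) = P(A ∩ B) / P(B) for a discrete distribution p."""
    pb = sum(p[w] for w in b)
    if pb == 0:
        raise ValueError("P(B) must be positive")
    pab = sum(p[w] for w in set(a) & set(b))
    return pab / pb

# Fair die (toy example): A = "even number", B = "greater than 3".
p = {w: 1 / 6 for w in range(1, 7)}
print(cond_prob({2, 4, 6}, {4, 5, 6}, p))  # ≈ 0.667
```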



Definition 5:



Let $\{A_i\}_{i=1}^{N}$, where $\forall i \in \overline{1,N} \ \ A_i \in \Sigma$, satisfy $\forall i \neq j \ \ A_i \cap A_j = \emptyset$ and $\bigcup\limits_{i=1}^{N} A_i = \Omega$. Then $\{A_i\}_{i=1}^{N}$ is called a partition of the space of elementary events.



Theorem 1 (total probability formula):



$\{A_i\}_{i=1}^{N}$ — a partition of the space of elementary events, $\forall i \in \overline{1,N} \ \ P(A_i) > 0$.

Then $\forall B \in \Sigma \quad P(B) = \sum\limits_{i=1}^{N} P(B \mid A_i) P(A_i)$.
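Theorem 1 for a finite partition can be sketched in a few lines. The factory scenario and all the numbers are hypothetical:

```python
def total_probability(p_b_given_a, p_a):
    """P(B) = sum_i P(B | A_i) * P(A_i) over a partition {A_i}."""
    assert abs(sum(p_a) - 1.0) < 1e-12, "partition probabilities must sum to 1"
    return sum(pb * pa for pb, pa in zip(p_b_given_a, p_a))

# Hypothetical numbers: three factories make 50%, 30%, 20% of all parts,
# with defect rates 1%, 2%, 3%; B = "a randomly chosen part is defective".
print(total_probability([0.01, 0.02, 0.03], [0.5, 0.3, 0.2]))  # ≈ 0.017
```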



Theorem 2 (Bayes formula):



$\{A_i\}_{i=1}^{N}$ — a partition of the space of elementary events, $\forall i \in \overline{1,N} \ \ P(A_i) > 0$.

Then $\forall B \in \Sigma \colon P(B) > 0 \quad P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum\limits_{j=1}^{N} P(B \mid A_j) P(A_j)} = \dfrac{P(B \mid A_i) P(A_i)}{P(B)}$.
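The Bayes formula over a finite partition, as a sketch; the `bayes_posterior` name and the input numbers are my own assumptions for illustration:

```python
def bayes_posterior(p_b_given_a, p_a):
    """Posterior P(A_i | B) for each element of a partition {A_i},
    given likelihoods P(B | A_i) and priors P(A_i)."""
    joint = [pb * pa for pb, pa in zip(p_b_given_a, p_a)]
    p_b = sum(joint)                    # denominator: total probability of B
    return [j / p_b for j in joint]

# Hypothetical likelihoods P(B | A_i) and priors P(A_i).
post = bayes_posterior([0.01, 0.02, 0.03], [0.5, 0.3, 0.2])
print([round(x, 3) for x in post])  # → [0.294, 0.353, 0.353]
```

Note that the posteriors always sum to 1: dividing by $P(B)$ renormalizes the joint probabilities.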



Using the Bayes formula, we can update the prior probabilities $P(A_i)$ in light of an observed event $B$ (through the likelihoods $P(B \mid A_i)$), obtaining posterior probabilities $P(A_i \mid B)$ — and with them a whole new picture of the situation.



An example:



Suppose there is a test, applied to one person at a time, that determines whether that person is infected with the “X” virus. We say the test succeeds if it delivers the correct verdict for a given person. It is known that this test succeeds with probability 0.95, and 0.05 is the probability of an error of either kind: an error of the first kind (false positive, i.e. the test returns a positive verdict but the person is healthy) and an error of the second kind (false negative, i.e. the test returns a negative verdict but the person is sick). For clarity, a positive verdict means the test “says” the person is infected with the virus. It is also known that 1% of the population is infected with this virus. Suppose some person receives a positive test verdict. How likely is it that they are really sick?



Denote: $t$ — the test result ($t = 1$ for a positive verdict), $d$ — the presence of the virus ($d = 1$ if the person is infected). Then by the total probability formula:





$P(t=1) = P(t=1 \mid d=1) P(d=1) + P(t=1 \mid d=0) P(d=0)$.





By the Bayes formula:





$P(d=1 \mid t=1) = \dfrac{P(t=1 \mid d=1) P(d=1)}{P(t=1 \mid d=1) P(d=1) + P(t=1 \mid d=0) P(d=0)} = \dfrac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16$.





It turns out that the probability of being infected with the “X” virus, given a positive test verdict, is only about 0.16. Why such a result? A priori, a person is infected with the “X” virus with probability just 0.01, while the test fails with probability 0.05. So when only 1% of the population is infected, the 0.05 error probability has a decisive impact: most positive verdicts come from the much larger healthy group, which drags down the probability that a person with a positive result is really sick.
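The example's arithmetic can be reproduced directly; the variable names are my own shorthand:

```python
# The virus-test example computed via the Bayes formula.
p_d = 0.01    # prior: 1% of the population is infected, P(d=1)
sens = 0.95   # P(t=1 | d=1), probability of a true positive
fpr = 0.05    # P(t=1 | d=0), probability of a false positive

p_t1 = sens * p_d + fpr * (1 - p_d)   # total probability of a positive verdict
posterior = sens * p_d / p_t1         # P(d=1 | t=1)
print(round(posterior, 2))  # → 0.16
```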


