Synopsis on “Machine Learning”. Probability theory. Bayes formula





Probability theory. Bayes formula



Let some experiment be conducted.



$\omega_1, \ldots, \omega_N$ — elementary events (elementary outcomes of the experiment).

$\Omega = \{\omega_i\}_{i=1}^{N}$ — the space of elementary events (the set of all possible elementary outcomes of the experiment).



Definition 1:



A system $\Sigma$ of subsets of $\Omega$ is called a sigma-algebra if the following properties hold:



  1. $\Omega \in \Sigma$;
  2. $A \in \Sigma \Rightarrow \overline{A} \in \Sigma$;
  3. $A_1, A_2, \ldots \in \Sigma \Rightarrow \bigcup\limits_{i=1}^{\infty} A_i \in \Sigma$.


From properties 1 and 2 of Definition 1 it follows that $\emptyset \in \Sigma$. From properties 2 and 3 of Definition 1 it follows that $\bigcap\limits_{i=1}^{\infty} A_i \in \Sigma$, because $A_i \in \Sigma \overset{\text{prop. 2}}{\Rightarrow} \overline{A_i} \in \Sigma \overset{\text{prop. 3}}{\Rightarrow} \bigcup\limits_{i=1}^{\infty} \overline{A_i} \in \Sigma \overset{\text{prop. 2}}{\Rightarrow} \overline{\bigcup\limits_{i=1}^{\infty} \overline{A_i}} \in \Sigma$, and by De Morgan's law $\overline{\bigcup\limits_{i=1}^{\infty} \overline{A_i}} = \bigcap\limits_{i=1}^{\infty} A_i$.
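The De Morgan step above can be checked on a small finite space. The sketch below is purely illustrative; the toy space and the two sets are made up:

```python
# Finite sanity check of the De Morgan step: the intersection of two sets
# equals the complement of the union of their complements.
omega = set(range(10))   # toy space of elementary events (made-up)
a1 = {0, 1, 2, 3, 4}
a2 = {2, 3, 4, 5, 6}

intersection = a1 & a2
de_morgan = omega - ((omega - a1) | (omega - a2))

print(intersection == de_morgan)  # → True
```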



Definition 2:



A function $P \colon \Sigma \to \mathbb{R}$ is called a probability (a probability measure) if:

  1. $\forall A \in \Sigma \quad P(A) \geqslant 0$;
  2. $P(\Omega) = 1$;
  3. for any pairwise disjoint $A_1, A_2, \ldots \in \Sigma \quad P\left(\bigcup\limits_{i=1}^{\infty} A_i\right) = \sum\limits_{i=1}^{\infty} P(A_i)$.

Probability Properties:



  1. $P(A) \leqslant 1$;
  2. $P(A) = 1 - P(\overline{A})$;
  3. $P(\emptyset) = 0$;
  4. $A \subseteq B \Rightarrow P(A) \leqslant P(B)$;
  5. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$;
  6. $\forall \{A_i\}_{i=1}^{N} \quad P\left(\bigcup\limits_{i=1}^{N} A_i\right) = \sum\limits_{i=1}^{N} P(A_i) - \sum\limits_{i<j} P(A_i \cap A_j) + \sum\limits_{i<j<k} P(A_i \cap A_j \cap A_k) - \ldots + (-1)^{N-1} P(A_1 \cap A_2 \cap \ldots \cap A_N)$;
  7. $\forall \{A_i\}_{i=1}^{\infty} \colon \left(A_{i+1} \subseteq A_i, \ \bigcap\limits_{i=1}^{\infty} A_i = \emptyset\right) \quad \lim\limits_{i \to \infty} P(A_i) = 0$.
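Property 6 (inclusion–exclusion) can be verified numerically on a small discrete space. The helper names `prob` and `inclusion_exclusion` and all the numbers below are illustrative assumptions, not part of the original text:

```python
from itertools import combinations

def prob(event, p):
    """P(event) for a discrete distribution p: outcome -> probability."""
    return sum(p[w] for w in event)

def inclusion_exclusion(events, p):
    """Right-hand side of property 6 for an arbitrary list of events."""
    total = 0.0
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)
        for subset in combinations(events, k):
            common = frozenset.intersection(*map(frozenset, subset))
            total += sign * prob(common, p)
    return total

# Uniform toy distribution on 6 outcomes (made-up numbers).
p = {w: 1 / 6 for w in range(6)}
events = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
union = set().union(*events)
print(abs(prob(union, p) - inclusion_exclusion(events, p)) < 1e-12)  # → True
```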


Definition 3:



$(\Omega, \Sigma, P)$ — a probability space.



Definition 4:



$\forall A, B \in \Sigma \colon P(B) > 0$

$\qquad P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ — the conditional probability of event $A$ given event $B$.
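For a discrete distribution, Definition 4 translates directly into code. A minimal sketch with a hypothetical `cond_prob` helper and a fair-die example of my own choosing:

```python
def cond_prob(a, b, p):
    """P(A | B) = P(A ∩ B) / P(B) for a discrete distribution p."""
    pb = sum(p[w] for w in b)
    if pb == 0:
        raise ValueError("P(B) must be positive")
    pab = sum(p[w] for w in set(a) & set(b))
    return pab / pb

# Fair die (toy example): A = "even number", B = "greater than 3".
p = {w: 1 / 6 for w in range(1, 7)}
print(cond_prob({2, 4, 6}, {4, 5, 6}, p))  # ≈ 0.667
```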



Definition 5:



Let $\{A_i\}_{i=1}^{N}$, where $\forall i \in \overline{1,N} \ \ A_i \in \Sigma$, satisfy $\forall i \neq j \ \ A_i \cap A_j = \emptyset$ and $\bigcup\limits_{i=1}^{N} A_i = \Omega$. Then $\{A_i\}_{i=1}^{N}$ is called a partition of the space of elementary events.



Theorem 1 (total probability formula):



$\{A_i\}_{i=1}^{N}$ — a partition of the space of elementary events, $\forall i \in \overline{1,N} \ \ P(A_i) > 0$.

Then $\forall B \in \Sigma \quad P(B) = \sum\limits_{i=1}^{N} P(B \mid A_i) P(A_i)$.
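Theorem 1 for a finite partition can be sketched in a few lines. The factory scenario and all the numbers are hypothetical:

```python
def total_probability(p_b_given_a, p_a):
    """P(B) = sum_i P(B | A_i) * P(A_i) over a partition {A_i}."""
    assert abs(sum(p_a) - 1.0) < 1e-12, "partition probabilities must sum to 1"
    return sum(pb * pa for pb, pa in zip(p_b_given_a, p_a))

# Hypothetical numbers: three factories make 50%, 30%, 20% of all parts,
# with defect rates 1%, 2%, 3%; B = "a randomly chosen part is defective".
print(total_probability([0.01, 0.02, 0.03], [0.5, 0.3, 0.2]))  # ≈ 0.017
```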



Theorem 2 (Bayes formula):



$\{A_i\}_{i=1}^{N}$ — a partition of the space of elementary events, $\forall i \in \overline{1,N} \ \ P(A_i) > 0$.

Then $\forall B \in \Sigma \colon P(B) > 0 \quad P(A_i \mid B) = \dfrac{P(B \mid A_i) P(A_i)}{\sum\limits_{j=1}^{N} P(B \mid A_j) P(A_j)} = \dfrac{P(B \mid A_i) P(A_i)}{P(B)}$.
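The Bayes formula over a finite partition, as a sketch; the `bayes_posterior` name and the input numbers are my own assumptions for illustration:

```python
def bayes_posterior(p_b_given_a, p_a):
    """Posterior P(A_i | B) for each element of a partition {A_i},
    given likelihoods P(B | A_i) and priors P(A_i)."""
    joint = [pb * pa for pb, pa in zip(p_b_given_a, p_a)]
    p_b = sum(joint)                    # denominator: total probability of B
    return [j / p_b for j in joint]

# Hypothetical likelihoods P(B | A_i) and priors P(A_i).
post = bayes_posterior([0.01, 0.02, 0.03], [0.5, 0.3, 0.2])
print([round(x, 3) for x in post])  # → [0.294, 0.353, 0.353]
```

Note that the posteriors always sum to 1: dividing by $P(B)$ renormalizes the joint probabilities.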



Using the Bayes formula, we can update the prior probabilities $P(A_i)$ in light of an observed event $B$ (through the likelihoods $P(B \mid A_i)$), obtaining posterior probabilities $P(A_i \mid B)$ — and with them a whole new picture of the situation.



An example:



Suppose there is a test, applied to one person at a time, that determines whether that person is infected with the “X” virus. We say the test succeeds if it delivers the correct verdict for a given person. It is known that this test succeeds with probability 0.95, and 0.05 is the probability of an error of either kind: an error of the first kind (false positive, i.e. the test returns a positive verdict but the person is healthy) and an error of the second kind (false negative, i.e. the test returns a negative verdict but the person is sick). For clarity, a positive verdict means the test “says” the person is infected with the virus. It is also known that 1% of the population is infected with this virus. Suppose some person receives a positive test verdict. How likely is it that they are really sick?



Denote: $t$ — the test result ($t = 1$ for a positive verdict), $d$ — the presence of the virus ($d = 1$ if the person is infected). Then by the total probability formula:





$P(t=1) = P(t=1 \mid d=1) P(d=1) + P(t=1 \mid d=0) P(d=0)$.





By the Bayes formula:





$P(d=1 \mid t=1) = \dfrac{P(t=1 \mid d=1) P(d=1)}{P(t=1 \mid d=1) P(d=1) + P(t=1 \mid d=0) P(d=0)} = \dfrac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16$.





It turns out that the probability of being infected with the “X” virus, given a positive test verdict, is only about 0.16. Why such a result? A priori, a person is infected with the “X” virus with probability just 0.01, while the test fails with probability 0.05. So when only 1% of the population is infected, the 0.05 error probability has a decisive impact: most positive verdicts come from the much larger healthy group, which drags down the probability that a person with a positive result is really sick.
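The example's arithmetic can be reproduced directly; the variable names are my own shorthand:

```python
# The virus-test example computed via the Bayes formula.
p_d = 0.01    # prior: 1% of the population is infected, P(d=1)
sens = 0.95   # P(t=1 | d=1), probability of a true positive
fpr = 0.05    # P(t=1 | d=0), probability of a false positive

p_t1 = sens * p_d + fpr * (1 - p_d)   # total probability of a positive verdict
posterior = sens * p_d / p_t1         # P(d=1 | t=1)
print(round(posterior, 2))  # → 0.16
```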


