Random variable taking values in the set c -algebra of subsets called any measurable function , i.e the condition is satisfied \ xi ^ {- 1} (A) = \ {\ omega \ in \ Omega \ space \ colon \ space \ xi (w) \ in A \} \ in \ Sigma .
Definition 2:
The sample space is the space of all possible values of the observation or sample along with -algebra of measurable subsets of this space.
Designation: .
Defined on probability space random variables spawn in space probabilistic measures P_ \ xi \ {C \} = P \ {\ xi \ in C \}, P_ \ eta \ {C \} = P \ {\ eta \ in C \}, \ ldots On a sample space, not one probability measure is determined, but a finite or infinite family of probability measures.
In problems of mathematical statistics, a family of probability measures is known.\ {P_ \ theta, \ space \ theta \ in \ Theta \} defined in the sample space, and it is required to determine from the sample which of the probability measures of this family corresponds to the sample.
Definition 3:
A statistical model is an aggregate consisting of a sample space and a family of probability measures defined on it.
Designation: where \ mathscr {P} = \ {P_ \ theta, \ space \ theta \ in \ Theta \} .
Let and - selective space.
Sampling can be considered as a combination real numbers. We assign to each element of the sample a probability equal to .
Let
Definition 4:
An empirical distribution constructed from sample X is a probability measure :
I.e - the ratio of the number of sample elements that belong , to the total number of sample items: .
Definition 5:
Selective moment order called
- sample mean .
Definition 6:
Selective central moment of order defined by equality
- sample variance .
In machine learning, many tasks are to learn how to select a parameter from the data available. which best describes this data. In mathematical statistics, the maximum likelihood method is often used to solve a similar problem.
In real life, the error distribution often has a normal distribution. For some justification, we state the central limit theorem .
Estimates for mathematical expectation and variance were obtained.
If you look closely at the formula
we can conclude that the function assumes its maximum value when is minimal. In machine learning problems, the least squares method is often used, in which the sum of the squared deviations of the predicted values from the true ones is minimized.
Bibliography:
Lecture notes on mathematical statistics, author unknown;
“Deep learning. Immersion in the world of neural networks ”, S. Nikulenko, A. Kadurin, E. Arkhangelskaya, PETER, 2018.