What patterns do neural networks find?

In this post I want to talk about the patterns that neural networks can find. Many beginner guides focus on the technique of writing code for neural networks, while questions of "logic" (what can neural networks actually do? Which architectures are better suited to which tasks, and why?) often remain on the sidelines. I hope this post will help beginners better understand the capabilities of neural networks. To do this, we will look at how they cope with several model tasks. The sample code is written in Python using the Keras library.



Task 1. Let's start with something simple: we construct a neural network that approximates the sine function.



import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense

def get_X_y(n):
    # n random points of the sine on [0, pi]
    X = np.random.uniform(0, np.pi, n)
    y = np.sin(X)
    return X, y

n = 40
X, y = get_X_y(n)
print("X shape:", X.shape)

# a small fully connected network; the sigmoid output suits us here,
# since sin(x) lies in [0, 1] on the chosen interval
model = Sequential()
model.add(Dense(6, input_dim=1, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
model.fit(X, y, epochs=1000, batch_size=4)

X_test = np.linspace(start=0, stop=np.pi, num=500)
print("X test shape:", X_test.shape)
y_test = model.predict(X_test)

# plot the network's output on a dense grid
font = {'weight': 'bold', 'size': 25}
matplotlib.rc('font', **font)
axes = plt.gca()
axes.set_ylim(0, 1)
plt.plot(X_test, y_test, c='green', marker='o', markersize=5)
plt.title("Sine approximated by neural network")
plt.yticks(np.arange(0, 1, 0.1))
plt.grid()
plt.show()





We get the following chart:







As you can see, the neural network successfully coped with the task of approximating a simple function.
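To make "successfully coped" concrete, we can measure how far the network's output deviates from the true sine on the test grid. A quick sanity check, meant to be run right after the script above (it reuses its X_test and y_test):

# maximum and mean absolute deviation from the true sine on the test grid
errors = np.abs(y_test.ravel() - np.sin(X_test))
print("max error:", errors.max(), "mean error:", errors.mean())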



Task 2. Let's see how the neural network copes with a more difficult task. We will feed in x values uniformly distributed on the interval [0, 1], while y will be set randomly: for x < 0.6, y will be a random variable taking the value 0 with probability 0.75 and the value 1 with probability 0.25 (that is, a binomial random variable with p = 0.25). For x > 0.6, y will be a random variable taking the value 0 with probability 0.3 and the value 1 with probability 0.7. As the loss function we take the mean squared error.



import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense

def get_X_y(n):
    # x is uniform on [0, 1]; y is 0/1 with p = 0.25 for x < 0.6 and p = 0.7 otherwise
    X = np.random.uniform(0, 1, n)
    y0 = np.random.binomial(size=n, n=1, p=0.25)
    y1 = np.random.binomial(size=n, n=1, p=0.7)
    y = np.where(X < 0.6, y0, y1)
    return X, y

# network dimensions
n_inputs = 1
n_hidden1 = 100
n_hidden2 = 50
n_outputs = 1

n = 2000
X, y = get_X_y(n)
print("X shape:", X.shape)

model = Sequential()
model.add(Dense(n_hidden1, input_dim=1, activation='relu'))
model.add(Dense(n_hidden2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=200, batch_size=100)

X_test = np.linspace(start=0, stop=1, num=100)
print("X test shape:", X_test.shape)
y_test = model.predict(X_test)

font = {'weight': 'bold', 'size': 25}
matplotlib.rc('font', **font)
axes = plt.gca()
axes.set_ylim(0, 1)
plt.plot(X_test, y_test, c='green', marker='o', markersize=5)
plt.title("Binomial distribution approximated by neural network")
plt.yticks(np.arange(0, 1, 0.1))
plt.grid()
plt.show()





We get the following graph of the function approximated by the neural network:







As you can see, the neural network approximated the mathematical expectation of our random variable y. So neural networks can (in principle) approximate the average values of random variables as functions of the parameters. For example, we could expect them to solve the following problem: people with incomes up to $1,000 are on average unhappy, while people with incomes above $1,000 are on average happy; we must learn to predict the "level of happiness" as a function of income. The neural network would be able to find the dependence of the average level of happiness on income, despite the fact that among people at any income level there are both happy and unhappy ones.
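One way to verify this visually is to overlay the theoretical conditional expectation E[y|x] (0.25 for x < 0.6 and 0.7 for x > 0.6, from the data definition above) on the same plot. A minimal sketch, assuming the script above has just run and its X_test and y_test are still in scope:

# theoretical conditional expectation E[y|x] of the generated data
expectation = np.where(X_test < 0.6, 0.25, 0.7)
plt.plot(X_test, expectation, c='red', label='E[y|x]')
plt.plot(X_test, y_test, c='green', label='network')
plt.legend()
plt.show()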



Task 3. Now we turn to the prediction of sequences. We will consider sequences of 0s and 1s defined by the following rule: the first 10 terms are 0 or 1 with equal probability, and the eleventh term equals 1 if the previous term is 0, and is 0 or 1 with equal probability if the previous term is 1. We will generate such sequences of length 11 (10 terms as input and the last one to predict) and train a recurrent neural network on them. After training, we will check how it copes with prediction on new sequences (also of length 11).



import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def get_X_y(m, n):
    # m input sequences of n equiprobable 0/1 terms; the target (eleventh term)
    # is 1 if the last input term is 0, and equiprobably 0 or 1 otherwise
    X = np.random.binomial(size=(m, n), n=1, p=0.5)
    y0 = np.ones(m)
    y1 = np.random.binomial(size=m, n=1, p=0.5)
    y = np.where(X[:, n-1] == 0, y0, y1)
    X = np.reshape(X, (X.shape[0], X.shape[1], 1))
    return X, y

model = Sequential()
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')

X_train, y_train = get_X_y(1000, 10)
model.fit(X_train, y_train, epochs=20, batch_size=32)

# predict on fresh sequences and show the last input term next to the forecast
m_test = 12
n_test = 10
X_test, y_test = get_X_y(m_test, n_test)
y_predicted = model.predict(X_test)
for i in range(m_test):
    print("x_last=", X_test[i, n_test-1, 0], "y_predicted=", y_predicted[i, 0])





Let's see what forecasts our neural network gives on the test sequences (your results will differ, since randomness is present both in the choice of sequences and in the training of the neural network).



Sequence number    Penultimate member of the sequence    Predicted value
1                  0                                     0.96
2                  0                                     0.95
3                  0                                     0.97
4                  0                                     0.96
5                  0                                     0.96
6                  1                                     0.45
7                  0                                     0.94
8                  1                                     0.50
9                  0                                     0.96
10                 1                                     0.42
11                 1                                     0.44
12                 0                                     0.92




As you can see, if the penultimate member of the sequence is 0, the neural network predicts a value close to 1, and if it is 1, a value close to 0.5. This is close to the optimal forecast. A similar example from "life" could look like this: "if I go to the cinema today, then tomorrow I will have lunch at a restaurant; if I go to the theater today, then tomorrow I will have lunch wherever". As we have seen, a neural network can catch patterns of this type, predicting a trip to a restaurant from a trip to the movies (and predicting "something in between" from a trip to the theater).
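For reference, the optimal forecast under mean squared error is the conditional expectation of the eleventh term: 1 when the last input term is 0, and 0.5 when it is 1. A minimal sketch of this baseline, reusing X_test, y_predicted, m_test, and n_test from the script above:

# optimal MSE forecast: E[y | last term] = 1 if the last term is 0, else 0.5
optimal = np.where(X_test[:, n_test-1, 0] == 0, 1.0, 0.5)
for i in range(m_test):
    print("optimal=", optimal[i], "network=", y_predicted[i, 0])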



Task 4. Let's complicate the task for the neural network. Let everything be as in the previous example, except that the eleventh member of the sequence is determined not by the previous term but by the second term of the sequence (by the same rule). We will not give the full code here, since it barely differs from the previous one (a sketch of the one changed piece follows below). My experiment showed that the neural network still finds the pattern, but it needs more time (I had to train for 100 epochs instead of 20).
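For orientation, here is a minimal sketch of the only piece that changes, the data generator: the target now depends on X[:, 1] (the second term) instead of the last one. This is my reconstruction of the rule described above; the rest of the script stays as in Task 3:

def get_X_y(m, n):
    # the target is 1 if the SECOND term is 0, and equiprobably 0 or 1 otherwise
    X = np.random.binomial(size=(m, n), n=1, p=0.5)
    y0 = np.ones(m)
    y1 = np.random.binomial(size=m, n=1, p=0.5)
    y = np.where(X[:, 1] == 0, y0, y1)
    X = np.reshape(X, (X.shape[0], X.shape[1], 1))
    return X, y

# training then needs more epochs, e.g.:
# model.fit(X_train, y_train, epochs=100, batch_size=32)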

Thus, neural networks can (again, in principle) catch fairly long-term dependencies: in our "life example", they could catch a pattern like "I go to a restaurant today if I was at the movies a week ago".



Task 5. Let's see how the neural network uses the available information for forecasting.

To do this, we will train it on sequences of length 4. In total, there will be 3 different, equally probable sequences:



0, 0, 1, 1

0, 1, 0, 1

0, 1, 1, 0








Thus, after the initial combination 0, 0 we always see two ones; after the combination 0, 1 we are equally likely to see 0 or 1, but we will then know the last number for sure. We will now ask our neural network to return whole sequences by setting return_sequences=True. As the target sequences, we take our input sequences shifted one step to the left and padded with a zero on the right. Now we can guess what should happen: at the first step, the neural network will output a number close to 2/3 (since with probability 2/3 the second term is 1); then, for the combination 0, 0 it will output two numbers close to one, while for 0, 1 it will first output a number close to 0.5 and then a number close to 0 or 1, depending on whether we got the sequence 0, 1, 0 or 0, 1, 1. At the end, the neural network will always output a number close to 0. A check with the following code shows that our assumptions are correct.



import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
import random

def get_X_y(n):
    # pick one of the three equiprobable sequences for each sample
    X = np.zeros((n, 4))
    z = np.array([random.randint(0, 2) for i in range(n)])
    X[z == 0, :] = [0, 0, 1, 1]
    X[z == 1, :] = [0, 1, 0, 1]
    X[z == 2, :] = [0, 1, 1, 0]
    # the target sequence is the input shifted one step left, padded with a zero
    y = np.zeros((n, 4))
    y[:, :3] = X[:, 1:]
    X = np.reshape(X, (X.shape[0], X.shape[1], 1))
    y = np.reshape(y, (y.shape[0], y.shape[1], 1))
    return X, y

# return_sequences=True makes the LSTM emit a prediction at every step
model = Sequential()
model.add(LSTM(units=20, return_sequences=True))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')

X_train, y_train = get_X_y(1000)
model.fit(X_train, y_train, epochs=100, batch_size=32)

X_test = np.zeros((3, 4))
X_test[0, :] = [0, 0, 1, 1]
X_test[1, :] = [0, 1, 0, 1]
X_test[2, :] = [0, 1, 1, 0]
X_test = np.reshape(X_test, (3, 4, 1))
y_predicted = model.predict(X_test)
print(y_predicted)







From this example we see that the neural network can dynamically change its forecast as information arrives. We would do the same when trying to predict a sequence: when the available information allows us to estimate the probabilities of the outcomes at the next step, we predict based on that information; and when new information arrives at the next step, we change the forecast accordingly.
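To compare the network's output against the reasoning above, we can print the theoretically optimal step-by-step predictions next to it. A small sketch; the values here are the hand-computed conditional expectations from the argument above, not the library's output:

# optimal step-by-step predictions derived above:
# after 0     -> 2/3 (the second term is 1 in two of the three sequences)
# after 0,0   -> 1;  after 0,1   -> 1/2
# after 0,0,1 -> 1;  after 0,1,0 -> 1;  after 0,1,1 -> 0
# the padded fourth target is always 0
optimal = np.array([[2/3, 1.0, 1.0, 0.0],
                    [2/3, 0.5, 1.0, 0.0],
                    [2/3, 0.5, 0.0, 0.0]])
print(optimal)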

So, if we see someone approaching us out of the darkness, we say "this is a person, we can't tell more than that"; when we begin to make out long hair in the dark, we say "this is probably a woman". But if after that we make out that the person has a mustache, we say that this is probably a man (albeit one with long hair). A neural network, as we have seen, acts similarly, using all of the information currently available for its forecast.



So, we have looked at simple examples of how neural networks work and what patterns they can find. In general, we saw that neural networks often behave quite "reasonably", making predictions close to those a person would make. Although, it should be noted, to catch even simple patterns they need much more data than people do.


