Hands-free, but not for the phone. An obedient home for when you run out of hands

Hello, Community!



I felt like getting a microcontroller out again and building something useful. The goal took shape almost immediately, because something in the apartment had been bothering me.



As you know, a computer desk doubles as a dinner table, so that you can watch Drobyshevsky or read Geektimes / Green Cat / etc. over dinner. But there is a problem: I usually leave the kitchen with both hands full, and come back the same way, because cups pile up in threes. The kitchen light (a triple switch: kitchen / bath / toilet) gets turned on and off with a shoulder, a nose, or a little finger. That is inconvenient whichever way you do it, and the switch cannot be moved lower. So the task was to control it remotely somehow.



All kinds of presence and passage sensors were ruled out immediately: the accuracy is not good enough, and the owner cannot control the light at will. The solution was sound control, by voice. I should say right away that I did not plan to do speech recognition; it is not needed here. A light that turns on with a clap was described in Radio magazine back in the 80s, but I did not want to do that. The result is a kind of hands-free for when the hands are busy. Details follow.



Hardware part



I had on hand an SEM0007M-32A board with an ATmega32, a crystal and peripherals, plus an assortment of electronic parts.



image






I also had a microphone and an op-amp. For the output there is a transistor in a SOT32 package on the board, plus a relay rated for 7 A. Everything is crammed into a business card box; the relay is wired in parallel with the switch, and the microphone is hidden under the outlet. The circuit is trivial, so I did not even draw it. Only one analog input and one digital output of the MCU are used. The SEM board is overkill, but let it be for now.



image




The board and loose wires. I later redid it more neatly.



image




The switch itself. The microphone is out of sight under the dismantled outlet.



Choosing the algorithm



Purpose: the sensor must respond to a word, for example "light!", with a minimum of code.

Task: detect the command word against a background of possible noise, knocks, and clicks of the switch itself. Simple amplitude analysis is therefore not enough, while spectral analysis showed that the word contains too many harmonics and that they vary from one utterance to the next. So a simple solution with acceptable noise immunity was needed. One could build several time-frequency filters and compare against a sample word, but there is no need to get into recognition. I decided to detect only the presence of a vowel sound, for example the sound "E".



image




The sound "E". There are many harmonics, which makes the analysis difficult.



image




The sound "A". The spectrum looks cleaner, with one dominant frequency.



Software part



To find the spectral components of a signal, you can use digital filters. There is a good program on the Internet for designing digital FIR and IIR filters and calculating their coefficients; it shows everything clearly and generates the C code automatically.



But I gave up on digital filters
For acceptable filtering (order 4 or higher), there were many coefficients, and floating-point ones at that. Something like this, plus all the filter arithmetic in float:



float ACoef[NCoef+1] = {
    0.00000347268864059354,
    0.00000000000000000000,
    -0.00001389075456237415,
    0.00000000000000000000,
    0.00002083613184356122,
    0.00000000000000000000,
    -0.00001389075456237415,
    0.00000000000000000000,
    0.00000347268864059354
};

float BCoef[NCoef+1] = {
    1.00000000000000000000,
    -7.09708794063733790000,
    22.77454294680684300000,
    -43.03321836036351300000,
    52.29813665034108500000,
    -41.84199842886619100000,
    21.53121088556695300000,
    -6.52398963450972500000,
    0.89383378261285684000
};
      
      





The microcontroller could probably have handled it, but tuning would be a problem: moving the band edges of the filter is not easy, because it means a new set of coefficients.
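To give a feel for the per-sample cost, here is a minimal sketch of a direct-form IIR step of the kind such filter generators typically emit. The names `iir_state_t` and `iir_step` are hypothetical, and the sketch assumes the feedback array is normalized so that its first element is 1, as in the `BCoef` array above.

```c
#include <assert.h>

#define NCOEF 8  /* filter order, matching the 9-element coefficient arrays */

typedef struct {
    float x[NCOEF + 1];  /* input history, x[0] is the newest sample */
    float y[NCOEF + 1];  /* output history, y[0] is the newest output */
} iir_state_t;

/* One filter step: y[0] = sum(a[n]*x[n]) - sum(b[n]*y[n], n >= 1).
   Assumes b[0] == 1. Costs 2*NCOEF + 1 float multiplications per sample. */
static float iir_step(iir_state_t *s, const float *a, const float *b, float in)
{
    assert(s && a && b);

    /* Shift both histories by one sample. */
    for (int n = NCOEF; n > 0; n--) {
        s->x[n] = s->x[n - 1];
        s->y[n] = s->y[n - 1];
    }
    s->x[0] = in;

    float acc = a[0] * s->x[0];
    for (int n = 1; n <= NCOEF; n++) {
        acc += a[n] * s->x[n] - b[n] * s->y[n];
    }
    s->y[0] = acc;
    return acc;
}
```

For order 8 that is 17 floating-point multiplications per sample, and the ATmega32 has no hardware floating point, so each one is emulated in software.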


After some searching, I settled on a single-frequency Fourier transform computed on the fly. That is, the classical discrete Fourier transform, updated as each signal sample arrives at the sampling frequency (1600 Hz). There is no sweep over frequencies: there is only one, so it is easy to change via RS-232 during tuning. In the end, the analysis was done at a frequency of 128 Hz.



Because of the short blocks and the rectangular window, the frequency resolution is low, which gives selective sensitivity in the range of 114...140 Hz, and this is exactly the band-pass filter I wanted.
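As a rough check of that band, assuming the 135-sample block length and the 1600 Hz sampling frequency given in this article, the main lobe of a rectangular window reaches its first nulls one bin width away from the center frequency:

```latex
\Delta f = \frac{f_s}{N} = \frac{1600\ \text{Hz}}{135} \approx 11.9\ \text{Hz},
\qquad
f \pm \Delta f = 128 \pm 11.9\ \text{Hz} \approx 116 \ldots 140\ \text{Hz}
```

This null-to-null estimate is close to the quoted range; where the detector actually stops responding also depends on the threshold.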



First, we need to find where the voice command starts. To do this, the zero level of the signal is tracked by exponential smoothing with a smoothing constant of 1/64. The code is below.



The signal is centered on this average. To estimate the sound intensity, the absolute values of the signal are also averaged, with a constant of 1/16, to filter out the high-frequency ripple from individual half-waves (this is analogous to RMS, but easier to calculate). When this level rises above the threshold, the voice command has started, and the sequential analysis of 5 blocks of 135 samples (84.3 ms) each begins.

Part of the timer code for signal processing (timer frequency 1600 Hz):



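// Timer 0 output compare interrupt service routine
interrupt [TIM0_COMP] void timer0_comp_isr(void)
{
    a = adc_data[0] << 2;                // scale the ADC reading by 4
    a0 = (a0*63 + a + 63) >> 6;          // smoothed "0" level; decays to 10% in about 150 samples
    ae = (int)(a - a0);                  // deviation from the zero level
    a = ae;                              // from here on, a is the centered signal
    ae = abs(ae);
    if (ae < 32) {                       // ignore low-level noise
        ae = 0;
    };
    d = (int)((15 * (long int) d + ae + 15) >> 4);  // smoothed level; decays to 10% in about 35 samples
    if (d > 100) {                       // level is above the threshold
        if (snd == 0) { Yz = 0; snd++; } // first sample above the threshold: reset Yz, mark sound started
        PORTB.1 = 1;                     // drive PORTB.1 high
    };
    .....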
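The same onset detection can be sketched in portable C so that it can be tried on a PC. This is only an illustration: the names `onset_t` and `onset_update` are hypothetical, the constants 32 and 100 are the ones from the ISR above, and division replaces the shifts (the results are identical for the non-negative values involved).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t zero;   /* smoothed "zero" level, smoothing constant 1/64 */
    int32_t level;  /* smoothed |signal|, smoothing constant 1/16 */
} onset_t;

/* Feed one scaled ADC sample (assumed non-negative).
   Stores the centered sample in *centered and returns 1 while
   the smoothed level is above the threshold of 100. */
static int onset_update(onset_t *s, int32_t sample, int32_t *centered)
{
    assert(s && centered);

    s->zero = (s->zero * 63 + sample + 63) / 64;
    int32_t ac = sample - s->zero;
    *centered = ac;

    int32_t mag = ac < 0 ? -ac : ac;
    if (mag < 32) {
        mag = 0;  /* gate out low-level noise */
    }
    s->level = (s->level * 15 + mag + 15) / 16;
    return s->level > 100;
}
```

Starting from `onset_t s = { 2048, 0 }`, a quiet input that stays within 20 counts of 2048 never trips the detector, while a swing of 400 counts trips it within a few samples.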
      
      





The figure below shows the signal, signal level, threshold and 5 blocks.



image






Interference protection



The signal is divided into blocks for protection against pulses: a click or a knock. A pulse, as is well known, has a flat spectrum, so in any frequency band it gives a non-zero result, quite possibly above the threshold. But the pulse, broad as its spectrum is, is short in time. It will no longer be present in the next block, so the level in the frequency band there will be below the threshold. Along with this advantage, short blocks give low frequency resolution, so a signal whose frequency deviates somewhat still falls into the selected spectral line.



Frequency conversion



In each block, a single-frequency Fourier transform is performed, that is, a transform for one frequency f.



As usual, to speed up the calculations, the sin and cos functions are tabulated and scaled to -127..+127. The index ps into the array si[] is computed from the argument of sin(2 * π * f * t / T), wrapping around within one period, of course. The index pc for cos(2 * π * f * t / T) is simply shifted 12 positions forward in the same si[] array.



The result Y, the level of the spectral line, is obtained as the sum of the absolute values of the real and imaginary parts accumulated over one block.
Strictly speaking, it should be the square root of the sum of squares, but that is a nightmare for an 8-bit MCU.



In the same timer:
a_si = (long int) (a * si[ps]) >> 4;  // a*sin
a_co = (long int) (a * si[pc]) >> 4;  // a*cos
Ysi = Ysi + a_si;
Yco = Yco + a_co;
Y = (labs(Ysi) + labs(Yco)) >> 7;     // divide by 128 so that Y fits in an int for the RS-232 log
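For experimenting off the chip, the single-frequency transform can be sketched as a self-contained block. Several things here are assumptions rather than the author's code: the table length of 48 (chosen so that a 12-entry shift is exactly a quarter period), the names `bin_t`, `bin_feed`, `bin_level` and `tone_block_level`, and the scaling, which is applied once at the end instead of per sample. With a 48-entry table and a step of 4 entries per sample, the detected frequency is 4 * 1600 / 48, about 133 Hz, not the 128 Hz used in the article.

```c
#include <stdint.h>

#define TAB_LEN   48    /* assumed table length: 12 entries = quarter period */
#define COS_SHIFT 12    /* cos(x) = sin(x + quarter period) */
#define BLOCK_LEN 135   /* samples per block, as in the article */

/* round(127 * sin(2*pi*k/48)), k = 0..47 */
static const int8_t si[TAB_LEN] = {
       0,   17,   33,   49,   64,   77,   90,  101,  110,  117,  123,  126,
     127,  126,  123,  117,  110,  101,   90,   77,   64,   49,   33,   17,
       0,  -17,  -33,  -49,  -64,  -77,  -90, -101, -110, -117, -123, -126,
    -127, -126, -123, -117, -110, -101,  -90,  -77,  -64,  -49,  -33,  -17
};

typedef struct {
    int32_t ysi;   /* running sum of a*sin */
    int32_t yco;   /* running sum of a*cos */
    uint8_t ps;    /* current table index for sin */
    uint8_t step;  /* table entries per sample: f = step * fs / TAB_LEN */
} bin_t;

static void bin_reset(bin_t *b, uint8_t step)
{
    b->ysi = 0;
    b->yco = 0;
    b->ps = 0;
    b->step = step;
}

/* Accumulate one centered sample into the sin and cos sums. */
static void bin_feed(bin_t *b, int32_t a)
{
    uint8_t pc = (uint8_t)((b->ps + COS_SHIFT) % TAB_LEN);
    b->ysi += a * si[b->ps];
    b->yco += a * si[pc];
    b->ps = (uint8_t)((b->ps + b->step) % TAB_LEN);
}

/* |Re| + |Im| approximation of the magnitude, scaled down by 2^11
   (the per-sample >> 4 and the final >> 7 combined). */
static int32_t bin_level(const bin_t *b)
{
    int32_t s = b->ysi < 0 ? -b->ysi : b->ysi;
    int32_t c = b->yco < 0 ? -b->yco : b->yco;
    return (s + c) >> 11;
}

/* Test helper: level of one block for a synthetic tone taken from the same table. */
static int32_t tone_block_level(uint8_t det_step, uint8_t tone_step)
{
    bin_t b;
    uint8_t p = 0;
    bin_reset(&b, det_step);
    for (int n = 0; n < BLOCK_LEN; n++) {
        bin_feed(&b, 8 * si[p]);
        p = (uint8_t)((p + tone_step) % TAB_LEN);
    }
    return bin_level(&b);
}
```

Calling `tone_block_level(4, 4)` feeds a tone at the detector frequency, and `tone_block_level(4, 12)` feeds a tone at three times that frequency; the first result comes out far larger than the second, which is the selectivity the block threshold relies on.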
      
      





At the end of each block, Y is compared with the threshold, and the number of blocks that exceed it (the triggered blocks) is counted. Experiments showed that the minimum number of triggered blocks should be 3 out of 5.



image




An example of the spectral intensity by block for a voice command. The command was accepted.



Three or more triggered blocks are interpreted as a correctly received command. The signal at the digital output of the MCU is then inverted, turning the light on or off. Since the entire analysis takes place inside the blocks, there is no delay after the last block.
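The 3-out-of-5 vote and the output toggle can be sketched as follows. The function names are hypothetical, and the block levels in the usage note are made-up illustrative numbers, not measurements.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 5  /* blocks analyzed per command */
#define MIN_HITS   3  /* triggered blocks needed to accept the command */

/* Returns 1 if at least MIN_HITS of the block levels exceed the threshold. */
static int command_accepted(const int32_t y[NUM_BLOCKS], int32_t threshold)
{
    assert(y);
    int hits = 0;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (y[i] > threshold) {
            hits++;
        }
    }
    return hits >= MIN_HITS;
}

/* Returns the new light state: inverted on an accepted command, unchanged otherwise. */
static int apply_command(int light_on, const int32_t y[NUM_BLOCKS], int32_t threshold)
{
    return command_accepted(y, threshold) ? !light_on : light_on;
}
```

With a threshold of 400, a sustained vowel such as `{500, 900, 800, 300, 100}` triggers three blocks and toggles the light, while a click such as `{2000, 50, 20, 10, 5}` triggers only the first block and is ignored.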



The computation takes about 1600 clock cycles and the timer fires every 9000 clock cycles, so the MCU load is low and there is room for further experiments with recognition. Alternatively, a finished device could be made smaller and built on a weaker MCU.
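Putting numbers on that, and assuming the 9000-cycle timer period corresponds exactly to the 1600 Hz sampling frequency:

```latex
\text{load} \approx \frac{1600}{9000} \approx 18\%,
\qquad
f_{\text{clk}} \approx 9000 \times 1600\ \text{Hz} = 14.4\ \text{MHz}
```

So roughly four fifths of the processor time remains free.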



The algorithms were verified by logging the relevant variables via RS-232 to a program written in Visual Basic. The frequency f and the thresholds are stored in the EEPROM.



As a result, the sensor turned out to be very convenient. It responds to words with an "A" in them, for example "Waaau", "Taaam", "Laait", "Yaaaau", "Yao-Yao", at the volume of normal conversation. It stubbornly refuses to hear the word "Shine". It ignores clicks, slamming doors, footsteps, and running water. Now I can walk around with my hands full of cups and plates)).


