How a vulnerability in Yandex.Station inspired my project: musical data transfer

Last week, I described how Yandex.Stations are activated by sound. It turned out that the WiFi password is transmitted in clear text. I wondered why activation was done this way rather than by some well-established method.



In the end, I concluded that showmanship matters in this process. But what would happen if we made a data transfer protocol focused entirely on the impression it makes on the user? This is how the Octave project was born: melodic data transfer.







Below, I describe how the prototype was built and link to the demo, where you can hear how any message sounds :)



Summary of the previous article



I recorded the sound that activates the Station, looked at its spectrogram (a sliding Fourier transform), and worked out how the signal is structured and where the WiFi password sits in the clear.







A hex string is transmitted, with each character 0 - F assigned a frequency from 1 kHz to 4.6 kHz in steps of 240 Hz. I wondered why activation was done this way rather than via Bluetooth, as with Chinese robot vacuum cleaners, for example, and concluded that in this case showiness matters more than security or speed.
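The mapping described above can be written down in a few lines. This is a minimal sketch, assuming the frequencies rise in order from 0 to F (the numbers 1 kHz and 240 Hz are from the article; the function names are mine and hypothetical):

```python
BASE_HZ = 1000  # frequency of hex symbol 0 (from the article)
STEP_HZ = 240   # spacing between neighbouring symbols (from the article)


def hex_symbol_to_freq(symbol):
    """Map one hex character (0-9, a-f, A-F) to its tone frequency in Hz.

    Assumption: symbol value k uses BASE_HZ + k * STEP_HZ.
    """
    return BASE_HZ + STEP_HZ * int(symbol, 16)


def hex_string_to_freqs(hex_string):
    """Turn a hex string into the sequence of tone frequencies."""
    return [hex_symbol_to_freq(ch) for ch in hex_string]


print(hex_string_to_freqs("0f3a"))  # [1000, 4600, 1720, 3400]
```

The last symbol, F, lands on 1000 + 15 * 240 = 4600 Hz, which matches the upper bound of the range.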



Inspiration



Indeed! After all, a communication protocol is always a compromise between range, speed and reliability. But what if all these characteristics fade into the background, and the decisive factor is the impression made on the user?



I liked the Yandex developers' idea, as simple as a hammer: pick 16 frequencies, one for each hex symbol. I also had a signal receiver left over from the previous study, so I decided to build on this idea rather than invent everything from scratch.



Two improvements



Remove the phase discontinuity



First, when I analyzed the Station's activation signal, I was bothered by noise across all frequencies at the moments when the symbol switches. It shows up as vertical bars in the spectrogram:







At these moments, clicks are audible. The cause is a phase discontinuity between symbols: the duration of one symbol does not hold a whole number of periods of the tone, so at the moment the frequency switches, the signal value jumps abruptly. Roughly like this:
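To put numbers on the discontinuity described above, here is a small sketch. It assumes a symbol duration of 40 ms and that every symbol's sine wave restarts from zero phase; both are hypothetical, since the real signal parameters are not given here.

```python
import math

SYMBOL_S = 0.04  # assumed symbol duration in seconds (hypothetical)


def cycles_in_symbol(freq_hz, symbol_s=SYMBOL_S):
    """Number of periods of the tone that fit into one symbol."""
    return freq_hz * symbol_s


def boundary_jump(freq_hz, symbol_s=SYMBOL_S):
    """Value the sine has reached when the symbol ends.

    If the next symbol restarts at phase zero (value 0), this is the
    size of the step in the waveform at the boundary.
    """
    return math.sin(2 * math.pi * freq_hz * symbol_s)


for freq in (1000, 1240):
    print(freq, cycles_in_symbol(freq), round(boundary_jump(freq), 3))
```

With these assumed numbers, 1000 Hz fits exactly 40 periods and ends near zero, while 1240 Hz fits 49.6 periods and is cut off far from zero, which is the kind of step that is heard as a click.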







Radio engineering has various methods for avoiding this effect. I decided to smoothly fade the amplitude down at the moment the frequency switches and then smoothly bring it back up; it sounds softer. It looks like this:







Perhaps the clicks were not a bug but a feature that gave a more “futuristic” sound, but I like it better without them :)
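The fade-out and fade-in described in this section could be sketched as follows. The article does not say which envelope shape was used, so the raised-cosine ramp, the sample rate and the function name here are all assumptions.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz (hypothetical)


def faded_tone(freq_hz, duration_s, fade_s, sample_rate=SAMPLE_RATE):
    """Generate one symbol whose amplitude ramps up at the start and
    down at the end, so neighbouring symbols meet at zero amplitude."""
    n = round(duration_s * sample_rate)
    fade_n = round(fade_s * sample_rate)
    samples = []
    for i in range(n):
        # Distance (in samples) to the nearest edge of the symbol.
        edge = min(i, n - 1 - i)
        if edge < fade_n:
            # Raised-cosine ramp: 0 at the edge, 1 after fade_n samples.
            gain = 0.5 * (1 - math.cos(math.pi * edge / fade_n))
        else:
            gain = 1.0
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return samples


tone = faded_tone(1240, duration_s=0.04, fade_s=0.005)
print(len(tone), tone[0], tone[-1])
```

Because every symbol now starts and ends at zero amplitude, the phase at which the sine is cut off no longer produces a step in the waveform.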



Add music



We transmit data through sound. Why not use the frequencies of musical notes for this? I tried different options and in the end chose 16 notes, starting from C (Do) of the first octave.







Higher notes are less comfortable to listen to, and lower notes are transmitted worse because of the frequency response of speakers and microphones. The frequencies of low notes are also closer to each other, which makes reception harder.
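The point about spacing can be checked numerically. The sketch below assumes equal temperament with A = 440 Hz, takes "C of the first octave" to be middle C (about 261.63 Hz), and uses 16 consecutive semitones; the article does not say which 16 notes were actually chosen, so the note set is purely illustrative.

```python
A4_HZ = 440.0           # standard concert pitch
SEMITONES_C4_TO_A4 = 9  # middle C lies 9 semitones below A4


def note_freq(semitones_above_c4):
    """Equal-temperament frequency of a note counted in semitones from C4."""
    return A4_HZ * 2 ** ((semitones_above_c4 - SEMITONES_C4_TO_A4) / 12)


# Assumption: 16 consecutive semitones, one per hex symbol 0..F.
NOTE_FREQS = [note_freq(n) for n in range(16)]

# Distance in Hz between each note and the next one up.
gaps = [hi - lo for lo, hi in zip(NOTE_FREQS, NOTE_FREQS[1:])]

print(round(NOTE_FREQS[0], 2), round(NOTE_FREQS[-1], 2))
print(round(gaps[0], 2), round(gaps[-1], 2))
```

Each semitone multiplies the frequency by the same factor, so the gap in Hz grows with pitch. Shifting the whole set down an octave would halve every gap, which is why lower notes are harder for a receiver to tell apart.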



The result is a kind of musical frequency modulation. Let's call it "Krup-modulation" :)



Launching



How does it sound? So that you can try it right in the browser, I rewrote the Krup-modulation transmitter from Python to JS and made a simple interface.



I take this opportunity to say hello:





I use UTF-8, which means Cyrillic characters and even emoji can be transmitted too. Messages containing them are longer, since each such character takes more than one byte.





It sounds a little less pleasant than Latin text, since the UTF-8 encoding of every Cyrillic character starts with nearly the same leading byte (0xD0 or 0xD1), so the same notes keep repeating. But it is still interesting :)
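The effect of UTF-8 on message length and on repeated notes is easy to see in a short sketch. The helper names are mine and hypothetical; `str.encode` and `bytes.hex` are standard Python.

```python
def text_to_hex_symbols(text):
    """Encode text as UTF-8 and return it as a string of hex symbols,
    two symbols (two notes) per byte."""
    return text.encode("utf-8").hex()


def leading_bytes(text):
    """Set of first bytes of each character's UTF-8 encoding."""
    return {ch.encode("utf-8")[0] for ch in text}


print(text_to_hex_symbols("hi"))      # 6869
print(text_to_hex_symbols("привет"))  # d0bfd180d0b8d0b2d0b5d182
print(sorted(hex(b) for b in leading_bytes("привет")))  # ['0xd0', '0xd1']
```

Two Latin letters take 4 symbols, while six Cyrillic letters take 24, and every other byte in the Cyrillic message is d0 or d1.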



You can try any phrase here. (The link is repeated at the end of the article.)



But what about the receiver?



Of course, it’s fun to listen to sounds generated from text, but it can only be called data transmission if the signal is received, demodulated and decoded.



I made a prototype receiver in Python as a proof of concept. Here's how it works:





As you can see, the data is transferred as if note by note! Of course, this is nowhere near production-ready: there is no synchronization, no error-correcting coding and no integrity check. But if the community shows interest and suggests a couple of practical uses, I can implement all of that and wrap it in a proper library :)
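The article does not show the receiver's internals, so the following is only one simple way the receive, demodulate and decode steps could look, not the prototype's actual code. It correlates one symbol's worth of samples against each candidate note (a single-frequency Fourier sum) and picks the strongest. The sample rate, the 100 ms symbol length, the 16-semitone note set and the note-to-hex mapping are all assumptions, and symbol boundaries are taken as known because there is no synchronization.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz
SYMBOL_S = 0.1       # assumed symbol duration in seconds

# Assumption: note k (k semitones above middle C) carries hex symbol k.
NOTE_FREQS = [440.0 * 2 ** ((n - 9) / 12) for n in range(16)]


def make_tone(freq_hz, duration_s=SYMBOL_S, sample_rate=SAMPLE_RATE):
    """Test signal: a clean sine wave standing in for a recorded symbol."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def tone_power(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Signal power at one frequency (a single Fourier sum)."""
    re = im = 0.0
    for i, s in enumerate(samples):
        angle = 2 * math.pi * freq_hz * i / sample_rate
        re += s * math.cos(angle)
        im += s * math.sin(angle)
    return re * re + im * im


def detect_symbol(samples, freqs=NOTE_FREQS, sample_rate=SAMPLE_RATE):
    """Demodulate one symbol: index of the strongest candidate note."""
    powers = [tone_power(samples, f, sample_rate) for f in freqs]
    return powers.index(max(powers))


def symbols_to_text(symbols):
    """Decode hex symbol values (0..15) back into UTF-8 text."""
    hex_string = "".join(format(s, "x") for s in symbols)
    return bytes.fromhex(hex_string).decode("utf-8")


received = [detect_symbol(make_tone(NOTE_FREQS[s])) for s in (6, 8, 6, 9)]
print(symbols_to_text(received))  # hi
```

A real recording would also need symbol synchronization and tolerance to noise, which is exactly the functionality listed above as missing.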



Reinventing the wheel?
I know that data transmission through sound is well developed. There are libraries for it, including ones that work with ultrasound, DTMF is still widespread, and people have even emulated tone commands by whistling. But I have not seen projects that use musical notes for data transfer. Please write in the comments if you know of something like that.



Summing up



It was an interesting project for a couple of evenings, with a rather spectacular result. Such data transfer could be used, for example, as a “sound QR code”: to pass an account from a phone to a website, and so on.



Alternatively, it could be used to create ringtones for brands. Here, for example, is how "habr" sounds.



All the current work is available on GitHub, so you can try developing the project yourself.



Here again is the link to the demo that runs in the browser.



Thank you for reading! I hope you found it interesting.



Good luck!


