Sound generation on AVR microcontrollers using the wavetable method, with polyphony support

AVR microcontrollers are fairly cheap and widespread; almost every embedded developer starts with them. And among hobbyists, Arduino rules the roost, and its heart is usually an ATmega328p. Surely many have wondered: how can you make them produce sound?



If you look at existing projects, they fall into several types:



  1. Square-wave generators. They generate the signal with PWM or by toggling pins in interrupts. Either way, the result is a very characteristic squeaky sound.
  2. Using external hardware such as an MP3 decoder.
  3. Using PWM to output 8-bit (sometimes 16-bit) sound in PCM or ADPCM format. Since the microcontroller's memory is clearly not enough for this, an SD card is usually used.
  4. Using PWM to generate sound from wave tables, MIDI-style.


The last type interested me the most, because it requires almost no additional hardware. I present my take on it to the community. First, a small demo:







Details below.



So, the hardware:





That's about it.



A simple RC circuit with a speaker is connected to the microcontroller's output. The output is 8-bit sound at a 31250 Hz sampling rate. At an 8 MHz clock, up to 5 tone channels plus one noise channel for percussion can be generated. This takes almost all of the CPU time, but once the buffer is filled, the processor can do something useful besides sound:





This example fits entirely into ATmega8 memory; 5 channels + noise are processed at an 8 MHz clock, and there is still a little time left over for the display animation.



With this example I also wanted to show that the library can be used not only for a musical greeting card, but also to add sound to existing projects, for example for notifications. Even with just one sound channel, notifications can be much more interesting than a simple beeper.



And now the details ...



Wave tables or wavetables



The math is extremely simple. There is a periodic tone function, for example tone(t) = sin(2 * Pi * freq * t).



There is also a function for how the volume of the fundamental tone changes over time, for example volume(t) = e^(-t).



In the simplest case, the sound of an instrument is the product of these functions: instrument(t) = tone(t) * volume(t):



On the chart, it all looks something like this:







Next, we take all the instruments sounding at a given moment and sum them with some volume coefficients (pseudo-code):



    for (i = 0; i < CHANNELS; i++) {
        value += channels[i].tone(t) * channels[i].volume(t) * channels[i].volume;
    }
      
      





It only remains to choose the volumes so that there is no overflow. And that's almost all.
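As a host-side illustration (plain C, not the library's code; the struct and names here are my own), mixing several channels without overflowing a 16-bit accumulator might look like this:

```c
#include <stdint.h>

#define CHANNELS 3

// Hypothetical per-channel state (names are mine, not the library's):
// an 8-bit signed tone sample and an 8-bit channel volume.
typedef struct {
    int8_t  tone;    // current wave table sample, -128..127
    uint8_t volume;  // channel volume, 0..255
} mixChannel;

// Sum the tone * volume products. Each product fits in 16 bits;
// dividing each product by the channel count keeps the sum inside int16_t too.
int16_t mixSample(const mixChannel* ch) {
    int16_t value = 0;
    for (uint8_t i = 0; i < CHANNELS; i++) {
        value += (int16_t)((int16_t)ch[i].tone * ch[i].volume / CHANNELS);
    }
    return value;
}
```

Even at full volume on every channel the accumulator stays inside the 16-bit range, which is the whole point of choosing the coefficients carefully.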



The noise channel works in much the same way, but instead of a tone function it uses a pseudo-random sequence generator.
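The article does not show the library's actual generator; a typical cheap choice on AVR is a 16-bit Galois LFSR. A sketch, with the tap mask as an assumption:

```c
#include <stdint.h>

// 16-bit Galois LFSR; 0xB400 is a classic maximal-length tap mask.
// (The tap choice is an assumption; the library may use a different generator.)
static uint16_t lfsr = 0xACE1u;  // any nonzero seed

int8_t noiseSample(void) {
    uint8_t lsb = lfsr & 1u;
    lfsr >>= 1;
    if (lsb) {
        lfsr ^= 0xB400u;
    }
    // Use the low byte as a signed noise sample
    return (int8_t)(lfsr & 0xFF);
}
```

Two shifts, a test, and an XOR per sample: far cheaper than any "real" random number generator, and quite noisy enough for percussion.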



Percussion is a mix of the noise channel and a low-frequency wave, at about 50-70 Hz.

Of course, high sound quality is hard to achieve this way. But we only have 8 kilobytes for everything; I hope that can be forgiven.



What can be squeezed out of 8 bits



Initially, I targeted the ATmega8. Without an external crystal it runs at 8 MHz and has an 8-bit PWM, which gives a base sampling rate of 8000000 / 256 = 31250 Hz. One timer outputs the sound via PWM and, on overflow, raises an interrupt that passes the next value to the PWM generator. So we have 256 clock cycles to compute a sample value and do everything else: interrupt overhead, updating the sound channel parameters, tracking when to play the next note, and so on.



For optimization, we will actively use the following tricks:





First, we divide time into 4-millisecond intervals (I called them ticks). At a 31250 Hz sampling rate, that is 125 samples per tick. Whatever must change with every sample is computed every sample; everything else is computed once per tick or even less often. For example, within one tick the volume of an instrument is constant: instrument(t) = tone(t) * currentVolume; and currentVolume itself is recalculated once per tick, taking into account volume(t) and the chosen channel volume.



The tick duration of 4 ms was chosen from simple 8-bit limits: an eight-bit sample counter allows sampling rates up to 64 kHz (256 samples per 4 ms), and an eight-bit tick counter can measure time up to about 1 second.
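These limits are easy to check with a couple of compile-time constants (a host-side sketch; the macro names are mine, not necessarily the library's):

```c
#include <stdint.h>

#define SAMPLE_RATE 31250u  /* Hz: 8 MHz / 256 PWM steps */
#define TICK_MS     4u      /* chosen tick duration */

/* Samples per tick at the base sampling rate: fits in 8 bits */
#define SAMPLES_PER_TICK (SAMPLE_RATE * TICK_MS / 1000u)

/* Highest sampling rate an 8-bit per-tick sample counter allows:
   256 samples per 4 ms tick */
#define MAX_SAMPLE_RATE (256u * 1000u / TICK_MS)

/* Longest time an 8-bit tick counter can measure: 256 ticks of 4 ms */
#define MAX_TICK_SPAN_MS (256u * TICK_MS)
```

SAMPLES_PER_TICK comes out to 125, MAX_SAMPLE_RATE to 64000 Hz, and MAX_TICK_SPAN_MS to 1024 ms, matching the limits above.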



Some code



The channel itself is described by this structure:



    typedef struct {
        // Info about wave
        const int8_t* waveForm;        // Wave table array
        uint16_t waveSample;           // High byte is an index in waveForm array
        uint16_t waveStep;             // Frequency, how waveSample is changed in time

        // Info about volume envelope
        const uint8_t* volumeForm;     // Array of volume change in time
        uint8_t volumeFormLength;      // Length of volumeForm
        uint8_t volumeTicksPerSample;  // How many ticks should pass before index of volumeForm is changed
        uint8_t volumeTicksCounter;    // Counter for volumeTicksPerSample

        // Info about volume
        uint8_t currentVolume;         // Precalculated volume for current tick
        uint8_t instrumentVolume;      // Volume of channel
    } waveChannel;
      
      





The data here is conditionally divided into 3 parts:



  1. Information about the waveform, phase, and frequency.



    waveForm: describes the tone(t) function: a pointer to a 256-byte array. It defines the tone, the sound of the instrument.



    waveSample: high byte indicates the current index of the waveForm array.



    waveStep: sets the frequency: the value by which waveSample is incremented when computing the next sample.



    Each sample is computed something like this:



     int8_t tone = channelData.waveForm[channelData.waveSample >> 8];
     channelData.waveSample += channelData.waveStep;
     return tone * channelData.currentVolume;
          
          





  2. Volume envelope information. It defines the function of volume change over time. Since the volume does not change that often, it can be recalculated less frequently, once per tick. This is done like this:



     if ((channel->volumeTicksCounter--) == 0 && channel->volumeFormLength > 0) {
         channel->volumeTicksCounter = channel->volumeTicksPerSample;
         channel->volumeFormLength--;
         channel->volumeForm++;
     }
     channel->currentVolume = (*channel->volumeForm * channel->instrumentVolume) >> 8;
          
          





  3. Channel volume: the chosen volume of the channel and the precalculated current volume.



Note: the waveform is eight-bit, the volume is also eight-bit, and their product is 16-bit. With a slight loss in performance, this can be used to make the sound (almost) 16-bit.
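It is also worth showing where waveStep comes from. One full pass through the 256-entry table corresponds to 65536 phase-accumulator units, so a tone of freq Hz needs freq * 65536 / sampleRate units per sample. A host-side sketch (waveStepForFreq is my name, not a library function):

```c
#include <stdint.h>

#define SAMPLE_RATE 31250u  /* Hz */

/* One wave period = 65536 phase units (the high byte indexes the
   256-entry table), so a tone of freqHz advances the accumulator by
   freqHz * 65536 / SAMPLE_RATE units per sample. */
uint16_t waveStepForFreq(uint32_t freqHz) {
    return (uint16_t)(freqHz * 65536u / SAMPLE_RATE);
}
```

For A4 (440 Hz) this gives waveStep = 922; the rounding error is a small fraction of a semitone, inaudible in practice.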



When fighting for performance, I had to resort to some black magic.



Example No. 1. How channel volumes are recalculated:



     if ((tickSampleCounter--) == 0) {
         tickSampleCounter = SAMPLES_PER_TICK - 1;  // a new tick begins
     }
     // volume recalculation should not be done so often, so spread the
     // channels across different samples of the tick
     if (tickSampleCounter < CHANNELS_SIZE) {
         recalculateVolume(channels[tickSampleCounter]);
     }
      
      





Thus, every channel recalculates its volume once per tick, but not all at the same time.



Example No. 2. Keeping channel information in a static structure is cheaper than in an array. Without going into the implementation details of wavechannel.h, I will say that this file is included into the code several times (as many times as there are channels) with different preprocessor defines. Each inclusion creates new global variables and a new channel calculation function, which is then inlined into the main code:



     #if CHANNELS_SIZE >= 1
         val += channel0NextSample();
     #endif
     #if CHANNELS_SIZE >= 2
         val += channel1NextSample();
     #endif
     …
      
      





Example No. 3. If we start playing the next note a little later, nobody will notice. Imagine the situation: we kept the processor busy with something, and during that time the buffer almost emptied. We start filling it, and suddenly it turns out a new beat is due: we need to update the current notes, read from the array what comes next, and so on. If we don't make it in time, there will be characteristic stuttering. It is much better to fill the buffer a bit with old data first, and only then update the state of the channels.



     while (samplesToWrite > 4) {       // while there is room in the buffer
         fillBuffer(SAMPLES_PER_TICK);  // first fill it with the current channel state
         updateMusicData();             // only then update notes and channels
     }
      
      





Strictly speaking, the buffer should be topped up once more after the loop, but since almost everything here is inlined, that noticeably inflates the code size.



Music



An eight-bit tick counter is used. When it reaches zero, a new beat begins: the counter is reloaded with the beat duration (in ticks), and a little later the array of music commands is checked.



Music data is stored in an array of bytes. It is written something like this:



     const uint8_t demoSample[] PROGMEM = {
         DATA_TEMPO(160),           // Set beats per minute
         DATA_INSTRUMENT(0, 1),     // Assign instrument 1 (see setSample) to channel 0
         DATA_INSTRUMENT(1, 1),     // Assign instrument 1 (see setSample) to channel 1
         DATA_VOLUME(0, 128),       // Set volume 128 to channel 0
         DATA_VOLUME(1, 128),       // Set volume 128 to channel 1
         DATA_PLAY(0, NOTE_A4, 1),  // Play note A4 on channel 0 and wait 1 beat
         DATA_PLAY(1, NOTE_A3, 1),  // Play note A3 on channel 1 and wait 1 beat
         DATA_WAIT(63),             // Wait 63 beats
         DATA_END()                 // End of data stream
     };
      
      





Everything that starts with DATA_ is a preprocessor macro that expands its parameters into the required number of data bytes.



For example, the DATA_PLAY command expands into 2 bytes which store: a command marker (1 bit), the pause before the next command (3 bits), the channel number on which to play the note (4 bits), and the note itself (8 bits). The main limitation is that this command cannot encode long pauses: 7 beats at most. If you need more, use the DATA_WAIT command (up to 63 beats). Unfortunately, I could not find out whether a macro can expand into a different number of array bytes depending on its parameter. I don't even know how to emit a warning from one. Maybe someone can tell me.
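The exact bit layout is the library's own; purely as an illustration, a hypothetical macro packing a marker bit, a 3-bit pause, a 4-bit channel, and an 8-bit note into two array bytes could look like this (field order is my assumption):

```c
#include <stdint.h>

/* Hypothetical layout, not the library's real encoding:
   byte 0 = 1 marker bit | 3 bits pause | 4 bits channel; byte 1 = note. */
#define DATA_PLAY_SKETCH(channel, note, pause) \
    (uint8_t)(0x80u | (((pause) & 0x07u) << 4) | ((channel) & 0x0Fu)), \
    (uint8_t)(note)
```

Because the macro expands into a comma-separated byte pair, it can be used directly inside an array initializer, just like the real DATA_ macros.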



Usage



There are several examples for different microcontrollers in the demos directory. In short, here is a snippet from the readme; I really have nothing to add:



     #include "../../microsound/devices/atmega8timer1.h"
     #include "../../microsound/micromusic.h"

     // Make some settings
     #define CHANNELS_SIZE 5
     #define SAMPLES_SIZE 16
     #define USE_NOISE_CHANNEL

     initMusic();                // Init music data and sound control
     sei();                      // Enable interrupts, silence sound should be generated
     setSample(0, instrument1);  // Use instrument1 as sample 0
     setSample(1, instrument2);
     // Init all other instruments…

     playMusic(mySong);          // Start playing music at pointer mySong
     while (!isMusicStopped) {
         fillMusicBuffer();      // Fill music buffer in loop
         // Do some other stuff
     }
      
      





If you want to do something else besides music, you can increase the buffer size via BUFFER_SIZE. The size must be a power of two, but unfortunately at 256 a performance degradation occurs. I have not figured out why yet.



To increase performance, you can raise the clock frequency with an external crystal, reduce the number of channels, or reduce the sampling rate. With the last trick you can add linear interpolation, which partly compensates for the drop in sound quality.
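Linear interpolation between adjacent wavetable entries fits naturally on top of the 16-bit phase accumulator: the high byte selects the entry, and the low byte becomes the blend factor. A host-side sketch with assumed names (the library's real code differs):

```c
#include <stdint.h>

/* Interpolate between two adjacent wave table entries using the low
   (fractional) byte of the 16-bit phase accumulator as the blend factor. */
int8_t interpolatedSample(const int8_t* waveForm, uint16_t waveSample) {
    uint8_t index = (uint8_t)(waveSample >> 8);   /* integer part: table index */
    uint8_t frac  = (uint8_t)(waveSample & 0xFF); /* fractional part, 0..255 */
    int a = waveForm[index];
    int b = waveForm[(uint8_t)(index + 1)];       /* wraps around the 256-entry table */
    return (int8_t)(a + (b - a) * frac / 256);
}
```

The cost is one extra table read, a subtraction, and a multiply per sample, which is exactly why it only pays off once the sampling rate has been lowered.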



Using any delay is not recommended, because it wastes CPU time. Instead, a replacement is implemented in the microsound/delay.h file which, in addition to pausing, keeps filling the buffer. It may not work very accurately for short pauses, but for long pauses it is more or less sane.



Making your own music



If you write the commands by hand, you need to be able to listen to the result. Flashing the microcontroller after every change is inconvenient, especially when there is an alternative.



There is a rather amusing service, wavepot.com: an online JavaScript editor in which you define the sound signal as a function of time, and that signal is sent to the sound card. The simplest example:



     function dsp(t) {
         return 0.1 * Math.sin(2 * Math.PI * t * 440);
     }
      
      





I ported the engine to JavaScript; it is in demos/wavepot.js. Paste the contents of the file into the wavepot.com editor and experiment away. Write your data into the soundData array, listen, and don't forget to save.



The simulate8bits variable deserves a separate mention. As the name suggests, it simulates eight-bit sound. If the drums seem to buzz, and noise appears in decaying instruments at low volume, that is it: the distortion of eight-bit sound. You can disable the option and listen to the difference. The problem is much less noticeable when there is no silence in the music.
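What simulate8bits does can be reproduced in a few lines (the original is JavaScript; here in C for consistency, with my own function name): quantize a signal in [-1, 1) down to 256 levels and back, and the rounding error is the "grit" you hear:

```c
#include <stdint.h>

/* Quantize a sample in [-1.0, 1.0) to 8 bits and back.
   Quiet signals use only a few of the 256 levels, which is exactly why
   fading instruments and drums sound gritty in 8-bit mode. */
double quantize8(double sample) {
    int8_t q = (int8_t)(sample * 128.0);
    return q / 128.0;
}
```

A sample smaller than one quantization step (1/128 of full scale) simply disappears, which is why the artifact is most audible near silence.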



Connection



In a simple version, the circuit looks like this:



      +5V
       ^      MCU
       |  +-------+
       +--+VCC    |    R1
          |    Pin+---/\/\--+-----> OUT
       +--+GND    |         |
       |  +-------+        === C1
       |                    |
      ---                  ---
      Grnd                 Grnd
      
      





The output pin depends on the microcontroller. Resistor R1 and capacitor C1 should be chosen based on the load, the amplifier (if any), and so on. I am not an electronics engineer and will not give formulas; they are easy to google along with online calculators.



I use R1 = 130 Ohm and C1 = 0.33 μF, and connect ordinary Chinese headphones to the output.



What was that about 16-bit sound?



As I said above, when we multiply two eight-bit numbers (the waveform sample and the volume), we get a 16-bit number. Instead of rounding it down to eight bits, we can output both bytes on 2 PWM channels. If these 2 channels are mixed in a 1:256 proportion, we get 16-bit sound. The difference from eight-bit is especially easy to hear on smoothly fading sounds, and on drums at moments when only one instrument is playing.
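The split itself is trivial (a sketch; splitSample is my name): the sample's high byte drives the PWM pin behind R1, the low byte drives the pin behind R2, and the resistor network sums them with 256:1 weighting:

```c
#include <stdint.h>

/* Split an unsigned 16-bit sample across two 8-bit PWM outputs.
   The analog network weights PinH : PinL as 256 : 1 (R2 = 256 * R1),
   so the sum of the two filtered outputs restores the 16-bit value. */
void splitSample(uint16_t sample, uint8_t* pwmHigh, uint8_t* pwmLow) {
    *pwmHigh = (uint8_t)(sample >> 8);   /* coarse: steps of 1/256 of full scale */
    *pwmLow  = (uint8_t)(sample & 0xFF); /* fine: fills in the remaining 8 bits */
}
```

In other words, the accuracy of the whole scheme rests entirely on how precisely the analog side reproduces that 256:1 ratio, which is what the resistor-matching discussion below is about.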



16-bit output connection:



      +5V
       ^      MCU
       |  +-------+
       +--+VCC    |    R1
          |   PinH+---/\/\--+-----> OUT
          |       |         |
          |       |    R2   |
          |   PinL+---/\/\--+
       +--+GND    |         |
       |  +-------+        === C1
       |                    |
      ---                  ---
      Grnd                 Grnd
      
      





It is important to mix the 2 outputs correctly: the resistance of R2 must be 256 times that of R1, and the more accurately, the better. Unfortunately, even resistors with 1% tolerance do not give the required accuracy. Still, even with a rough selection of resistors, the distortion can be noticeably attenuated.



Unfortunately, with 16-bit sound, performance drops, and 5 channels + noise can no longer be processed in the allotted 256 clock cycles.



Is it possible on the Arduino?



Yes, you can. I only have a Chinese Nano clone with an ATmega328p, and it works there. Most likely other ATmega328p Arduinos will work too. The ATmega168 seems to have the same timer control registers, so it will most likely work unchanged. Other microcontrollers need to be checked; you may need to add a driver.



There is a sketch in demos/arduino328p, but for it to open properly in the Arduino IDE, you need to copy it to the project root.



The example generates 16-bit sound and uses outputs D9 and D10. To simplify things, you can limit yourself to 8-bit sound and use only output D9.



Since almost all Arduinos run at 16 MHz, you can, if desired, increase the number of channels to 8.



What about ATtiny?



The ATtiny has no hardware multiplication. The software multiplication the compiler substitutes is wildly slow and should not be used. Even with optimized assembler inserts, performance drops by a factor of 2 compared to the ATmega. It would seem there is no point in using the ATtiny at all, but...



Some ATtiny chips have a frequency multiplier, a PLL. And that means such microcontrollers offer 2 interesting features:



  1. The PWM generator can be clocked at 64 MHz, which gives a PWM frequency of 250 kHz, much better than the 31250 Hz at 8 MHz or 62500 Hz with a 16 MHz crystal on any ATmega.
  2. The same frequency multiplier lets the chip itself be clocked at 16 MHz without an external crystal.


Hence the conclusion: some ATtiny chips are well suited to sound generation. They manage to process the same 5 instruments + noise channel, but at 16 MHz, and they need no external crystal.



The downside is that the frequency cannot be raised any further, and the calculations take up almost all the time. To free up resources, you can reduce the number of channels or the sampling rate.



Another minus is the need to use two timers at once: one for PWM, the other for the interrupt. That usually exhausts the available timers.



Of the PLL-equipped microcontrollers I know, I can mention the ATtiny85/45/25 (8 pins), ATtiny861/461/261 (20 pins), and ATtiny26 (20 pins).



As for memory, the difference from the ATmega is small. 8 KB fits several instruments and melodies just fine. 4 KB fits 1-2 instruments and 1-2 tunes. Squeezing into 2 kilobytes is hard, but if you really want to, it can be done: un-inline the methods, disable features such as per-channel volume control, reduce the sampling rate and the number of channels. An exercise for the enthusiast, in general, but there is a working example on the ATtiny26.



Problems



There are problems. The biggest one is computation speed. The code is written entirely in C with small assembler inserts for multiplication on the ATtiny. Optimization is left to the compiler, and it sometimes behaves strangely: small changes that seemingly should not affect anything can noticeably reduce performance, and switching from -Os to -O3 does not always help. One such example is the use of a 256-byte buffer. Particularly unpleasant is that there is no guarantee that newer compiler versions will not regress on the same code.



Another problem is that note release before the next note is not implemented at all. That is, when one note replaces another on a channel, the old sound is cut off abruptly, and sometimes a small click is heard. I would like to find a way to get rid of this without losing performance, but so far no luck.



There are no commands for smoothly increasing or decreasing the volume. This is especially critical for short notification tones, which need a quick fade at the end so the sound does not break off sharply. The problem can partly be worked around by writing a series of commands that lower the volume manually with short pauses.



The chosen approach is, in principle, unable to produce naturalistic instrument sounds. For a more natural sound, the instrument sounds need to be split into attack-sustain-release phases, using at least the first two with durations much longer than one oscillation period. But then an instrument needs much more data. There was an idea to use shorter wave tables, for example 32 bytes instead of 256, but without interpolation the sound quality drops sharply, and with interpolation performance drops. And 8-bit sampling is clearly not enough for music anyway, although that can be worked around.



The buffer size is limited to 256 samples. That corresponds to roughly 8 milliseconds, which is the maximum contiguous period of time that can be given to other tasks. Even then, their execution is still periodically suspended by interrupts.



The standard-delay replacement does not work very accurately for short pauses.



I am sure that this is not a complete list.


