Assembler code generator library for AVR microcontrollers. Part 1











Part 1. First acquaintance



Good afternoon, dear Habr readers. I would like to present yet another project (one of the great many out there) for programming the popular AVR family of microcontrollers.







I could spend a lot of text explaining why this was needed, but instead let's just look at examples of how it differs from other solutions. Explanations and comparisons with existing programming systems will come up, where necessary, as we work through the examples. The library is still being finalized, so the implementation of some functions may not look fully optimal. Some of the tasks that this version leaves to the programmer are also expected to be optimized or automated later.







So let's get started. I want to make clear right away that this material is by no means a complete description. It is only a demonstration of some features of the library, meant to help readers judge how interesting this approach may be to them.







We will not deviate from established practice and will begin with a classic example, a kind of "Hello world" for microcontrollers: blinking an LED connected to one of the processor pins. Let's open Microsoft Visual Studio (any release will do) and create a C# console application. For those who do not know: the Community Edition, which is sufficient for this work, is free.







The program text is as follows:







Source Code Example 1
    using NanoRTOSLib;
    using System;

    namespace ConsoleApp
    {
        class Program
        {
            static void Main(string[] args)
            {
                var m = new Mega328();               // target chip
                m.PortB[0].Mode = ePinMode.OUT;      // bit 0 of port B as output
                m.PortB.Activate();

                // Infinite loop whose body toggles bit 0 of port B
                m.LOOP(m.TempL, (r, l) => m.GO(l), (r) => { m.PortB[0].Toggle(); });

                Console.WriteLine(AVRASM.Text(m));   // print the generated assembler
            }
        }
    }





Of course, for this to work you need the library that I am presenting.

After compiling and running the program, we will see the following result in the console output.







Compilation Result of Example 1
    #include "common.inc"

    RESET:
        ldi r16, high(RAMEND)
        out SPH,r16
        ldi r16, low(RAMEND)
        out SPL,r16
        outi DDRB,0x1
    L0000:
        in TempL,PORTB
        ldi TempH,1
        eor TempL,TempH
        out PORTB,TempL
        xjmp L0000
    .DSEG





If you copy the result into any environment that can work with AVR assembler and connect the Common.inc macro library (the macro library is also one of the components of the presented programming system and works in conjunction with NanoRTOSLib), the program can be assembled and checked on an emulator or a real chip to make sure everything works.







Let's look at the source code in more detail. First, we assign the type of chip being used to the variable m. Next, we set bit 0 of port B to digital output mode and activate the port. The next line looks a little strange, but its meaning is simple: we say that we want an infinite loop whose body inverts bit 0 of port B. The last line of the program renders the result of everything written above as assembler code. Everything is simple and compact, and the result is practically no different from what one could write by hand in assembler.

The output code raises only two questions: why initialize the stack if we never use it, and what is xjmp? The answer to the first question also explains why assembler is output rather than a ready-made HEX file: output in assembler form lets the programmer analyze and optimize the program further, picking out and modifying any code fragments he does not like. The stack initialization was left in if only because very few programs can be written without using the stack. If you do not like it, feel free to remove it; that is exactly what the assembler output is for. As for xjmp, it is an example of using macros to make the output assembler more readable. Specifically, xjmp stands in for jmp and rjmp and expands to the appropriate one depending on the length of the jump.
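To make the xjmp idea concrete, here is a minimal C# sketch of the decision such a macro has to make. It is not the macro from Common.inc, whose text is not shown in this article; the class and method names are made up for illustration. The only fact it relies on is that the AVR rjmp instruction encodes a signed 12-bit word offset relative to the instruction that follows it.

```csharp
using System;

static class JumpChooser
{
    // rjmp reach: signed 12-bit offset, counted in 16-bit program words.
    const int MinOffset = -2048;
    const int MaxOffset = 2047;

    // Hypothetical helper: both addresses are word addresses in program memory.
    public static string Choose(int jumpAddress, int targetAddress)
    {
        // The offset is measured from the instruction after the jump.
        int offset = targetAddress - (jumpAddress + 1);
        return offset >= MinOffset && offset <= MaxOffset ? "rjmp" : "jmp";
    }

    static void Main()
    {
        Console.WriteLine(Choose(0x0010, 0x000A)); // short backward jump: rjmp
        Console.WriteLine(Choose(0x0000, 0x1000)); // 4095 words ahead: jmp
    }
}
```

In the blink example the jump back to L0000 spans only a few words, so under this rule it would fit into the short form.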







If we flash this program onto a chip, we will of course not see the LED blinking, even though the pin state does change. It simply happens too fast for the eye to notice. So let's consider the next program, in which we keep blinking the LED, but slowly enough to be seen. A delay of 0.5 seconds is quite suitable as an example: not too fast and not too slow. We could build the delay out of many nested loops with NOPs, but we will skip that step, since it adds nothing to the description of the library's capabilities, and go straight to using the available hardware. We change our application as follows.







Source Code Example 2
    using NanoRTOSLib;
    using System;

    namespace ConsoleApp
    {
        class Program
        {
            static void Main(string[] args)
            {
                var m = new Mega328();
                m.PortB[0].Mode = ePinMode.OUT;
                m.PortB.Activate();

                // Watchdog timer: toggle the pin on every 500 ms timeout
                m.WDT.Clock = eWDTClock.WDT500ms;
                m.WDT.OnTimeout = () => m.PortB[0].Toggle();
                m.WDT.Activate();
                m.EnableInterrupt();

                // Empty main loop: a label and an unconditional jump to it
                var loop = AVRASM.NewLabel();
                m.GO(loop);

                Console.WriteLine(AVRASM.Text(m));
            }
        }
    }





Obviously, the program is similar to the previous one, so we will only look at what has changed. First, in this example we used the WDT (watchdog timer). For long delays that do not require high accuracy, this is the best option. All that is needed is to set the required period by choosing the divider through the WDT.Clock property and to define the actions to perform when the event fires by assigning code to the WDT.OnTimeout property. Since we need interrupts to work, they must be enabled with the EnableInterrupt command. The main loop, however, can be replaced with a dummy, since we do not plan to do anything in it. So we declare and place a label and make an unconditional jump to it, which gives us an empty loop. If you like LOOP better, feel free to use it; the result will not change.
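For orientation, the set of watchdog periods can be worked out with a few lines of C#. This is a sketch that rests on two things I am taking from the ATmega328 datasheet rather than from the library: the watchdog runs from its own oscillator of nominally 128 kHz, and prescaler setting n (0 to 9) gives a timeout of 2048 * 2^n oscillator cycles. The class and method names are made up for illustration.

```csharp
using System;

static class WatchdogMath
{
    const int OscillatorHz = 128000; // nominal watchdog oscillator frequency

    // Nominal timeout in milliseconds for prescaler setting 0..9.
    public static double TimeoutMs(int setting)
    {
        if (setting < 0 || setting > 9)
            throw new ArgumentOutOfRangeException(nameof(setting));
        return (2048L << setting) * 1000.0 / OscillatorHz;
    }

    static void Main()
    {
        for (int setting = 0; setting <= 9; setting++)
            Console.WriteLine($"{setting}: {TimeoutMs(setting)} ms");
    }
}
```

Setting 5 comes out at a nominal 512 ms, which is the "0.5 s" step. Each step doubles the previous one, so only ten periods are available.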

Finally, let's look at the resulting code.







Compilation Result of Example 2
    #include "common.inc"

        jmp RESET
        reti                ; IRQ0 Handler
        nop
        reti                ; IRQ1 Handler
        nop
        reti                ; PC_INT0 Handler
        nop
        reti                ; PC_INT1 Handler
        nop
        reti                ; PC_INT2 Handler
        nop
        jmp WDT             ; Watchdog Timer Handler
    RESET:
        ldi r16, high(RAMEND)
        out SPH,r16
        ldi r16, low(RAMEND)
        out SPL,r16
        outi DDRB,0x1
        ldi TempL, (1<<WDCE) | (1<<WDE)
        sts WDTCSR,TempL
        ldi TempL, 0x42
        sts WDTCSR,TempL
        sei
    L0000:
        xjmp L0000
    WDT:
        push r17
        push r16
        in r16,SREG
        push r16
        in TempL,PORTB
        ldi TempH,1
        eor TempL,TempH
        out PORTB,TempL
        pop r16
        out SREG,r16
        pop r16
        pop r17
        reti
    .DSEG





Those who are familiar with this processor will undoubtedly ask where the rest of the interrupt vectors have gone. The logic here is simple: if code is not used, it is not needed. The interrupt table therefore ends at the last used vector.

Although the program does its job perfectly well, the most demanding readers may not like the fact that the set of possible delays is limited and the step between them is too coarse. So let's consider another way, and at the same time see how work with timers is organized in the library. The Mega328 chip that we use as our sample has three of them: two 8-bit timers and one 16-bit timer. The architects tried hard to pack as many capabilities as possible into these timers, so configuring them takes quite a few settings.
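The truncation rule can be sketched in a few lines of C#. This is not the library's generator, only an illustration of the rule as it appears in the listings: unused slots are filled with reti and nop, used slots get a jmp, and nothing is emitted past the last used vector. The vector names are copied from the listing above; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VectorTable
{
    // names[i] describes the vector that follows RESET at position i;
    // handlers maps a position to the label of its handler.
    public static List<string> Build(IReadOnlyList<string> names,
                                     IReadOnlyDictionary<int, string> handlers)
    {
        var lines = new List<string>();
        if (handlers.Count == 0) return lines;   // no interrupts used: no table at all

        int last = handlers.Keys.Max();          // the table stops at this vector
        lines.Add("jmp RESET");
        for (int i = 0; i <= last; i++)
        {
            if (handlers.TryGetValue(i, out string label))
            {
                lines.Add($"jmp {label} ;{names[i]} Handler");
            }
            else
            {
                lines.Add($"reti ;{names[i]} Handler");
                lines.Add("nop");                // pad the slot to two words
            }
        }
        return lines;
    }

    static void Main()
    {
        var names = new[] { "IRQ0", "IRQ1", "PC_INT0", "PC_INT1", "PC_INT2", "Watchdog Timer" };
        var used = new Dictionary<int, string> { { 5, "WDT" } };
        foreach (var line in Build(names, used)) Console.WriteLine(line);
    }
}
```

With no handlers registered the sketch emits nothing, which matches the first example, where the output starts directly at RESET.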







First, let's work out which counter to use for our delay of 0.5 seconds. If we take a chip clock frequency of 16 MHz, then even with the maximum peripheral divider the delay does not fit into an 8-bit counter. So we will not complicate things and will use the only 16-bit counter available to us, Timer1.
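The estimate is easy to check. The following C# sketch computes the longest interval a counter of a given width can time before it wraps, assuming a maximum divider of 1024; the class and method names are made up for illustration.

```csharp
using System;

static class CounterFit
{
    // Longest interval, in seconds, before a counter of the given width wraps.
    public static double MaxDelaySeconds(long fclk, int prescaler, int counterBits)
        => (double)(1L << counterBits) * prescaler / fclk;

    static void Main()
    {
        const long fclk = 16_000_000;
        Console.WriteLine(MaxDelaySeconds(fclk, 1024, 8) >= 0.5);  // False
        Console.WriteLine(MaxDelaySeconds(fclk, 1024, 16) >= 0.5); // True
    }
}
```

At 16 MHz an 8-bit counter with a divider of 1024 wraps after roughly 16 ms, while a 16-bit counter lasts a little over 4 seconds.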







As a result, the program takes the following form:







Source Code Example 3
    using NanoRTOSLib;
    using System;

    namespace ConsoleApp
    {
        class Program
        {
            static void Main(string[] args)
            {
                var m = new Mega328();
                m.FCLK = 16000000;       // processor clock frequency, Hz
                m.CKDIV8 = false;        // clock-divide-by-8 fuse

                var bit1 = m.PortB[0];
                bit1.Mode = ePinMode.OUT;
                m.PortB.Activate();

                // Timer1 in CTC mode: compare match A every 0.5 s
                m.Timer1.Mode = eWaveFormMode.CTC_OCRA;
                m.Timer1.Clock = eTimerClockSource.CLK256;
                m.Timer1.OCRA = (ushort)((0.5 * m.FCLK) / 256);
                m.Timer1.OnCompareA = () => bit1.Toggle();
                m.Timer1.Activate();
                m.EnableInterrupt();

                // Empty main loop
                m.LOOP(m.TempH, (r, l) => m.GO(l), (r) => { });

                Console.WriteLine(AVRASM.Text(m));
            }
        }
    }





Since we use the main oscillator as the clock source for our timer, a correct delay calculation requires the processor clock frequency, the divider setting, and the state of the peripheral clock fuse. The bulk of the program is setting the timer to the desired mode. A divider of 256, rather than the maximum, is chosen deliberately: with a divider of 1024, the 500 ms period we want corresponds to a fractional number of timer ticks.
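The choice of divider can be verified with integer arithmetic. The sketch below assumes the usual Timer1 divider options of 1, 8, 64, 256 and 1024 and looks for those that give a whole number of ticks fitting into 16 bits. The class and method names are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;

static class PrescalerPicker
{
    static readonly int[] Timer1Prescalers = { 1, 8, 64, 256, 1024 };

    // Dividers for which delay * fclk / divider is a whole number that fits in 16 bits.
    public static List<int> ExactChoices(long fclk, long delayMs)
    {
        var result = new List<int>();
        foreach (int p in Timer1Prescalers)
        {
            long numerator = fclk * delayMs;              // clock cycles, scaled by 1000
            long denominator = 1000L * p;
            if (numerator % denominator != 0) continue;   // fractional tick count
            if (numerator / denominator <= ushort.MaxValue) result.Add(p);
        }
        return result;
    }

    // Tick count for a given divider, as assigned to OCRA in the example.
    public static ushort CompareValue(long fclk, long delayMs, int prescaler)
        => (ushort)(fclk * delayMs / (1000L * prescaler));

    static void Main()
    {
        Console.WriteLine(string.Join(",", ExactChoices(16_000_000, 500)));   // 256
        Console.WriteLine(CompareValue(16_000_000, 500, 256).ToString("X")); // 7A12
    }
}
```

For 16 MHz and 500 ms, dividers 1, 8 and 64 overflow 16 bits, 1024 gives 7812.5 ticks, and only 256 gives a whole value, 31250.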







The resulting assembler code of our program will look like this:







Compilation Result of Example 3
    #include "common.inc"

        jmp RESET
        reti                ; IRQ0 Handler
        nop
        reti                ; IRQ1 Handler
        nop
        reti                ; PC_INT0 Handler
        nop
        reti                ; PC_INT1 Handler
        nop
        reti                ; PC_INT2 Handler
        nop
        reti                ; Watchdog Timer Handler
        nop
        reti                ; Timer2 Compare A Handler
        nop
        reti                ; Timer2 Compare B Handler
        nop
        reti                ; Timer2 Overflow Handler
        nop
        reti                ; Timer1 Capture Handler
        nop
        jmp TIM1_COMPA      ; Timer1 Compare A Handler
    RESET:
        ldi r16, high(RAMEND)
        out SPH,r16
        ldi r16, low(RAMEND)
        out SPL,r16
        outi DDRB,0x1
        outiw OCR1A,0x7A12
        outi TCCR1A,0
        outi TCCR1B,0xC
        outi TCCR1C,0x0
        outi TIMSK1,0x2
        outi DDRB,0x1
        sei
    L0000:
        xjmp L0000
    TIM1_COMPA:
        push r17
        push r16
        in r16,SREG
        push r16
        in TempL,PORTB
        ldi TempH,1
        eor TempL,TempH
        out PORTB,TempL
        pop r16
        out SREG,r16
        pop r16
        pop r17
        reti
    .DSEG





There is not much left to comment on here: we initialize the devices, configure the interrupts, and enjoy the result.
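For readers who want to relate the register constants in the listing to the timer settings, here is a small C# sketch that rebuilds them from bit positions. The bit positions (WGM12 = 3, CS12 = 2, OCIE1A = 1) are my reading of the ATmega328 datasheet, not something taken from the library, and the class name is made up.

```csharp
using System;

static class Timer1Registers
{
    // Bit positions assumed from the ATmega328 datasheet.
    const int WGM12 = 3;   // TCCR1B: CTC mode with OCR1A as the top value
    const int CS12 = 2;    // TCCR1B: clock select, CS12 alone means clk/256
    const int OCIE1A = 1;  // TIMSK1: enable the compare match A interrupt

    public static byte Tccr1b() => (byte)((1 << WGM12) | (1 << CS12));
    public static byte Timsk1() => (byte)(1 << OCIE1A);

    static void Main()
    {
        Console.WriteLine(Tccr1b().ToString("X")); // C
        Console.WriteLine(Timsk1().ToString("X")); // 2
    }
}
```

Both values agree with the outi TCCR1B,0xC and outi TIMSK1,0x2 lines in the listing.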







Working through interrupts is the easiest way to create programs that run in real time. Unfortunately, it is not always possible to switch between parallel tasks using interrupt handlers alone. The limitation is the ban on nested interrupt handling: until the current handler exits, the processor does not respond to any other interrupt, so events can be lost if a handler runs for too long.







One solution is to separate the code that registers events from the code that processes them. The library's Parallel multithreaded processing core is organized so that, when an event occurs, the interrupt handler only registers the event and, if necessary, performs the minimum data capture required, while all processing is done in the main thread. The kernel checks the flags one after another for unprocessed events and, when it finds one, switches to the corresponding task.
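The split between registering and processing can be illustrated with a small desktop simulation in C#. This is only a sketch of the general pattern, not the Parallel class: MiniKernel, Raise and Poll are names invented here, and clearing the flag before the task runs is my assumption about a reasonable order.

```csharp
using System;
using System.Collections.Generic;

class MiniKernel
{
    private readonly List<Action> _tasks = new List<Action>();
    private uint _pending; // one bit per signal

    public int AddSignal(Action task)
    {
        if (_tasks.Count >= 32) throw new InvalidOperationException("at most 32 signals");
        _tasks.Add(task);
        return _tasks.Count - 1;
    }

    // What an interrupt handler would do: record the event and return at once.
    public void Raise(int signal) => _pending |= 1u << signal;

    // One pass of the main loop: run the task of every pending signal.
    public int Poll()
    {
        int executed = 0;
        for (int i = 0; i < _tasks.Count; i++)
        {
            uint mask = 1u << i;
            if ((_pending & mask) == 0) continue;
            _pending &= ~mask; // clear first, so a new event during the task is not lost
            _tasks[i]();
            executed++;
        }
        return executed;
    }
}

static class KernelDemo
{
    static void Main()
    {
        var kernel = new MiniKernel();
        kernel.AddSignal(() => Console.WriteLine("timer task"));
        int spi = kernel.AddSignal(() => Console.WriteLine("spi task"));

        kernel.Raise(spi); // simulated interrupt
        kernel.Poll();     // prints "spi task"
        kernel.Poll();     // prints nothing: the flag was cleared
    }
}
```

The handler side stays a single bit operation, so it returns quickly no matter how long the task itself takes.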







This approach simplifies the design of systems with several asynchronous tasks: each task can be considered in isolation, without worrying about how resources are switched between tasks. As an example, consider the implementation of two independent tasks, each of which toggles its own output with its own delay.







Source Code Example 4
    using NanoRTOSLib;
    using System;

    namespace ConsoleApp
    {
        class Program
        {
            static void Main(string[] args)
            {
                var m = new Mega328();
                m.FCLK = 16000000;
                m.CKDIV8 = false;

                m.PortB.Direction(0x07);     // bits 0..2 of port B as outputs
                var bit1 = m.PortB[1];
                var bit2 = m.PortB[2];
                m.PortB.Activate();

                // Task kernel: up to 4 threads, 64 bytes of storage per thread
                var tasks = new Parallel(m, 4);
                tasks.Heap = new StaticHeap(tasks, 64);

                var t1 = tasks.CreateTask((tsk) =>
                {
                    var loop = AVRASM.NewLabel();
                    bit1.Toggle();
                    tsk.Delay(32);
                    tsk.TaskContinue(loop);
                }, "Task1");

                var t2 = tasks.CreateTask((tsk) =>
                {
                    var loop = AVRASM.NewLabel();
                    bit2.Toggle();
                    tsk.Delay(48);
                    tsk.TaskContinue(loop);
                }, "Task2");

                // Both tasks are driven by the always-active signal
                var ca = tasks.ContinuousActivate(tasks.AlwaysOn, t1);
                tasks.ActivateNext(ca, tasks.AlwaysOn, t2);
                ca.Dispose();

                m.EnableInterrupt();
                tasks.Loop();

                Console.WriteLine(AVRASM.Text(m));
            }
        }
    }





In this example, we configure bits 1 and 2 of port B as outputs and switch each of them between 0 and 1, with a period of 32 ms for bit 1 and 48 ms for bit 2. A separate task is responsible for each output.

The first thing to note is the creation of an instance of Parallel. This class is the core of task management. In its constructor we specify the maximum number of simultaneously running threads. Next comes the allocation of memory for storing thread data. The StaticHeap class used in the example allocates a fixed number of bytes for each thread. For our problem this is acceptable, and fixed allocation, compared to dynamic allocation, simplifies the algorithms and makes the code more compact and faster.

Further on in the code we describe the set of tasks that will run under the control of the kernel. Note the asynchronous Delay function, which we use to form a delay. Its peculiarity is that calling it stores the required delay in the thread settings and transfers control to the kernel. After the set interval has elapsed, the kernel returns control to the task at the command following Delay. Another feature is that the last command of a task programs what the thread does when the task completes. In our case, both tasks are configured to run in an infinite loop, returning control to the kernel at the end of each pass. If necessary, completing a task can instead free the thread or switch it to another task.
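The behavior of Delay, where the task hands control back and is resumed at the next command once the interval has passed, can be imitated on a desktop with C# iterators. The sketch below is an analogy only and does not use the library: each yield return plays the role of tsk.Delay, a counter stands in for the pin toggle, and all names are made up.

```csharp
using System;
using System.Collections.Generic;

static class DelayDemo
{
    // A task yields the number of ticks it wants to sleep.
    static IEnumerator<int> Blink(int index, int period, int[] toggles)
    {
        while (true)
        {
            toggles[index]++;     // stands in for bit.Toggle()
            yield return period;  // stands in for tsk.Delay(period)
        }
    }

    // Runs two tasks for the given number of ticks and returns the toggle counts.
    public static int[] Run(int totalTicks)
    {
        var toggles = new int[2];
        var tasks = new[] { Blink(0, 32, toggles), Blink(1, 48, toggles) };
        var wakeAt = new int[tasks.Length];

        for (int now = 0; now < totalTicks; now++)
            for (int i = 0; i < tasks.Length; i++)
                if (now >= wakeAt[i] && tasks[i].MoveNext())
                    wakeAt[i] = now + tasks[i].Current; // resume after the delay

        return toggles;
    }

    static void Main()
    {
        int[] result = Run(96);
        Console.WriteLine($"{result[0]} {result[1]}"); // 3 2
    }
}
```

Over 96 ticks the first task runs at ticks 0, 32 and 64 and the second at ticks 0 and 48, and neither task needs to know that the other exists.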







A task is invoked when the signal assigned to its thread becomes active. A signal can be activated both from software and from hardware, by interrupts from peripheral devices. Invoking the task resets the signal. The exception is the predefined AlwaysOn signal, which is always active. This makes it possible to create tasks that receive control on every polling cycle. The Loop function starts the main execution loop. Unfortunately, the output code with Parallel is already significantly larger than in the previous examples (approximately 600 instructions) and cannot be quoted in full in the article.







And for dessert, something closer to a live project: a digital thermometer. Everything is simple, as always. There is a digital sensor with an SPI interface, a 4-digit 7-segment indicator, and several processing threads to keep things interesting. In one thread we run the dynamic indication cycle, in another we generate the events that start a temperature reading cycle, and in the third we read the value received from the sensor and convert it from binary to BCD and then into segment codes for the dynamic indication buffer.







The program itself is as follows.







Source Code Example 5
    using NanoRTOSLib;
    using System;

    namespace ConsoleApp
    {
        class Program
        {
            static void Main(string[] args)
            {
                var m = new Mega328();
                m.FCLK = 16000000;
                m.CKDIV8 = false;

                // 7-segment indicator with its segments on port C
                var led7s = new Led_7();
                led7s.SegPort = m.PortC;
                led7s.Activate();
                m.PortD.Direction(0xFF);
                m.PortD.Activate();

                // TC77 sensor on SPI, chip select on bit 0 of port B
                m.PortB[0].Mode = ePinMode.OUT;
                var tc77 = new TC77();
                tc77.CS = m.PortB[0];
                tc77.Port = m.SPI;

                m.Timer0.Clock = eTimerClockSource.CLK64;
                m.Timer0.Mode = eWaveFormMode.Normal;

                var reader = m.DREG("Temperature");
                var bcdRes = m.DREG("digits");
                var tmp = m.BYTE();
                var bcd = new BCD(reader, bcdRes);
                m.subroutines.Add(bcd);

                var os = new Parallel(m, 4);
                os.Heap = new StaticHeap(os, 64);
                var tmrSig = os.AddSignal(m.Timer0.OVF_Handler);
                var spiSig = os.AddSignal(m.SPI.Handler, () =>
                {
                    m.SPI.Read(m.TempL);
                    m.TempL.MStore(tmp);
                });

                // Starts a temperature reading cycle
                var actuator = os.CreateTask((tsk) =>
                {
                    var loop = AVRASM.NewLabel();
                    tc77.ReadTemperatureAsync();
                    tsk.Delay(16);
                    tsk.TaskContinue(loop);
                }, "actuator");

                // Reads the sensor value and converts it to BCD
                var treader = os.CreateTask((tsk) =>
                {
                    var loop = AVRASM.NewLabel();
                    tc77.ReadTemperatureCallback(os, reader, tmp);
                    reader >>= 7;
                    m.CALL(bcd);
                    tsk.TaskContinue(loop);
                }, "reader");

                // Dynamic indication: one digit per step
                var display = os.CreateTask((tsk) =>
                {
                    var loop = AVRASM.NewLabel();

                    m.PortD.Write(0xFE);
                    m.TempQL.Load(bcdRes.Low);
                    m.TempQL &= 0x0F;
                    led7s.Show(m.TempQL);
                    os.AWAIT();

                    m.PortD.Write(0xFD);
                    m.TempQL.Load(bcdRes.Low);
                    m.TempQL >>= 4;
                    led7s.Show(m.TempQL);
                    os.AWAIT();

                    m.PortD.Write(0xFB);
                    m.TempQL.Load(bcdRes.High);
                    m.TempQL &= 0x0F;
                    led7s.Show(m.TempQL);
                    os.AWAIT();

                    m.PortD.Write(0xF7);
                    m.TempQL.Load(bcdRes.High);
                    m.TempQL >>= 4;
                    led7s.Show(m.TempQL);
                    os.AWAIT();

                    tsk.TaskContinue(loop);
                }, "display");

                var ct = os.ContinuousActivate(os.AlwaysOn, actuator);
                os.ActivateNext(ct, spiSig, treader);
                os.ActivateNext(ct, tmrSig, display);

                tc77.Activate();
                m.Timer0.Activate();
                m.EnableInterrupt();
                os.Loop();

                Console.WriteLine(AVRASM.Text(m));
            }
        }
    }
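The data path of the reader and display tasks can be followed on a desktop with plain C#. This sketch does not use the library. It assumes, from my reading of the TC77 datasheet, that the sensor returns a 16-bit word with the temperature in the top 13 bits at 0.0625 °C per step, so that shifting right by 7 leaves whole degrees. It also covers only values from 0 to 9999, and the names are made up for illustration.

```csharp
using System;

static class ThermoMath
{
    // Drop the 3 low status bits and the 4 fractional bits of the raw word.
    public static int WholeDegrees(short raw) => raw >> 7;

    // Pack a value 0..9999 into four BCD digits, lowest digit in the lowest nibble.
    public static ushort ToPackedBcd(int value)
    {
        if (value < 0 || value > 9999)
            throw new ArgumentOutOfRangeException(nameof(value));
        ushort bcd = 0;
        for (int shift = 0; shift < 16; shift += 4)
        {
            bcd |= (ushort)((value % 10) << shift);
            value /= 10;
        }
        return bcd;
    }

    // Split the packed value into digits in the order the display task shows them:
    // low nibble of the low byte first, high nibble of the high byte last.
    public static int[] Digits(ushort bcd) => new[]
    {
        bcd & 0x0F, (bcd >> 4) & 0x0F, (bcd >> 8) & 0x0F, (bcd >> 12) & 0x0F
    };

    static void Main()
    {
        short raw = 0x0C87;                  // 25 °C under the stated assumption
        int degrees = WholeDegrees(raw);
        ushort bcd = ToPackedBcd(degrees);
        Console.WriteLine($"{degrees} {bcd:X4} {string.Join(",", Digits(bcd))}"); // 25 0025 5,2,0,0
    }
}
```

The masks and shifts in Digits mirror the `&= 0x0F` and `>>= 4` operations applied to bcdRes.Low and bcdRes.High in the display task.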
      
      





Clearly, this is not a finished project but only a technology demo meant to show the capabilities of the NanoRTOS library. Still, fewer than 100 lines of source and less than 1 KB of output code is a good result for a working application.







If there is interest in this project, I plan to cover the principles and features of programming with this library in more detail in future articles.







