Porting an OS to Aarch64

Aarch64 is a 64-bit architecture from ARM (sometimes called arm64). In this article I will tell you how it differs from the “regular” (32-bit) ARM and how difficult it is to port your system to it.

This article is not a detailed guide but an overview of the system modules that will have to be redone and of how much the architecture as a whole differs from ordinary 32-bit ARM, all based on my personal experience porting Embox to this architecture. To port a specific system you will have to work through the documentation one way or another; at the end of the article I have left links to some documents that may be useful.

In fact, there are more differences than similarities, and Aarch64 is more of a new architecture than a 64-bit extension of the familiar ARM. Aarch64's closest predecessor is Aarch32 (an extension of the usual 32-bit ARM), but since I have no experience with it, I will not write about it :)

From here on, when I write about the "old" or "previous" ARM, I mean 32-bit ARM (with the ARM instruction set).

I will briefly go through the list of changes compared to 32-bit ARM, and then I will analyze them in more detail.

General-purpose registers have become twice as wide (now 64 bits each), and their number has doubled (there are now 32 instead of 16).
The concept of coprocessor registers is gone; system registers are now accessed simply by name, for example msr vbar_el1, x0 (versus the former mcr p15, 0, %0, c1, c1, 2).
A new MMU model (it is unrelated to the old one, so the code has to be written from scratch).
Previously there were two privilege levels: user (the USR processor mode) and system (the SYS, IRQ, FIQ, ABT, ... modes). Now things are both simpler and more complicated: there are 4 levels.
AdvSIMD has replaced NEON; floating-point operations are also done through it.

Now more on the points.

Registers and instruction set

General-purpose registers are r0-r30, and you can access them as 64-bit (x0-x30) or as 32-bit (w0-w30, access to the lower 32 bits).

The instruction set for Aarch64 is called A64. The instructions are described in the A64 Instructions reference (see the useful links at the end). The basic arithmetic and some other assembly commands remained the same:

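  mov w0, w1         /* copy the contents of w1 into w0 */
  add x0, x1, 13     /* x0 = x1 + 13 */
  b label            /* jump to "label" */
  bl label           /* jump to "label", saving the return address in x30 */
  ldr x3, [x1, 0]    /* load x3 from the address held in x1 */
  str x3, [x0, 0]    /* store x3 at the address held in x0 */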

Now a little about the differences:

A special "zero" register rzr/xzr/wzr, which always reads as zero (you can also write to it, but the result of the calculation will not be stored anywhere).

  subs xzr, x1, x2   /* compare x1 and x2 and set the NZCV flags; the difference itself is discarded */

You cannot push many registers onto the stack at once (as with stmfd sp!, {r0-r3}); you have to do it in pairs:

  stp x0, x1, [sp, -16]!   /* move sp down by 16 bytes, then store x0 and x1 there */
  stp x2, x3, [sp, -16]!   /* same for x2 and x3 */

The PC register (program counter, a pointer to the instruction currently being executed) is no longer a general-purpose register (it used to be R15), so it cannot be accessed with ordinary commands (mov, ldr), only through ret, bl and so on.
Program status is no longer held in CPSR (that register simply does not exist); it is split between DAIF (the interrupt masks for IRQ, FIQ, etc.; A, I and F are the same bits as in CPSR), NZCV (the Negative, Zero, Carry and oVerflow bits; all of a sudden, the same NZCV as in CPSR) and the System Control Register (SCTLR, which enables caching and the MMU, selects endianness and so on).

It seems that these commands are enough to write a simple bootloader that can hand control over to platform-independent code :)
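The DAIF and NZCV registers mentioned above are plain bit fields, so their layout can be sketched in a few lines of C. This is only an illustration: the bit positions follow the architecture manual, while the helper names irq_masked and nzcv_after_cmp are hypothetical and model on the host what "mrs x0, daif" and "subs xzr, a, b" would give on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as seen through "mrs x0, daif" and "mrs x0, nzcv". */
#define DAIF_D (1u << 9)   /* debug exceptions masked */
#define DAIF_A (1u << 8)   /* SError (asynchronous abort) masked */
#define DAIF_I (1u << 7)   /* IRQ masked */
#define DAIF_F (1u << 6)   /* FIQ masked */

#define NZCV_N (1u << 31)
#define NZCV_Z (1u << 30)
#define NZCV_C (1u << 29)
#define NZCV_V (1u << 28)

static_assert(DAIF_I == 0x80, "I is the same bit 7 that CPSR used");

/* Hypothetical helper: 1 if IRQs are masked in the given DAIF value. */
static int irq_masked(uint64_t daif)
{
    return (daif & DAIF_I) != 0;
}

/* Hypothetical helper: host-side model of the flags set by
 * "subs xzr, a, b" (the difference itself is thrown away). */
static uint32_t nzcv_after_cmp(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    uint32_t flags = 0;

    if (r >> 63)
        flags |= NZCV_N;            /* result is negative */
    if (r == 0)
        flags |= NZCV_Z;            /* operands are equal */
    if (a >= b)
        flags |= NZCV_C;            /* carry set means "no borrow" */
    if (((a ^ b) & (a ^ r)) >> 63)
        flags |= NZCV_V;            /* signed overflow */
    return flags;
}
```

For example, nzcv_after_cmp(5, 5) sets Z and C, which is exactly the condition that b.eq tests after a comparison.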

Execution modes and switching between them

Execution modes are described well in Fundamentals of ARMv8-A; here I will briefly recount the essence of that document.

Aarch64 has 4 privilege levels (Exception levels, hereinafter abbreviated EL).

EL3 - Secure Monitor (it is assumed that firmware is running at this level)
EL2 - Hypervisor
EL1 - OS
EL0 - Applications

On a 64-bit OS, you can run both 32-bit and 64-bit applications; on a 32-bit OS, only 32-bit applications can be run.

Transitions between ELs are made either by taking an exception (system calls, interrupts, memory access errors) or by returning from an exception (eret).

Each EL has its own SP, and each EL from EL1 up also has its own SPSR and ELR (i.e. these are "banked" registers).

Many system registers are also split by EL: for example, the MMU context register ttbr0 exists as ttbr0_el1, ttbr0_el2 and so on, and you need to access the register that belongs to the corresponding EL. The same applies to SCTLR, SPSR, ELR ... (DAIF and NZCV, by contrast, exist in a single copy without an EL suffix).
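To make the EL mechanics a little more concrete, here is a small C sketch of two encodings that startup code usually needs: the level reported by the CurrentEL system register, and the mode field written to SPSR_ELx before eret. The helper names current_el and spsr_mode are hypothetical; the bit layout is taken from the architecture manual, and only 64-bit targets are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* "mrs x0, CurrentEL" returns the current level in bits [3:2]. */
static unsigned current_el(uint64_t currentel_reg)
{
    return (unsigned)((currentel_reg >> 2) & 0x3);
}

/* Mode field SPSR_ELx.M[3:0] for returning to a 64-bit level:
 * bits [3:2] hold the target EL, bit 0 selects the stack pointer
 * (1 = SP_ELx, the "h" modes; 0 = SP_EL0, the "t" modes). */
static uint32_t spsr_mode(unsigned target_el, int use_sp_elx)
{
    assert(target_el <= 3);
    assert(target_el != 0 || !use_sp_elx);  /* EL0 only has SP_EL0 */
    return (uint32_t)(target_el << 2) | (use_sp_elx ? 1u : 0u);
}
```

For instance, code entered at EL2 that wants to continue at EL1 on the EL1 stack would place spsr_mode(1, 1), i.e. 0x5, into spsr_el2 (usually together with the DAIF mask bits), the entry address into elr_el2, and then execute eret.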

MMU

Armv8-A supports the ARMv8.2 LPA MMU; more about it can be found in chapter D5 of the ARM Architecture Reference Manual for Armv8, Armv8-A.

In short, this MMU supports 4KiB pages (4 levels of translation tables), 16KiB pages (4 levels) and 64KiB pages (3 levels). At the intermediate levels you can specify a memory block instead: the entry then points not to the next-level table but to a whole region of memory of the size that the next-level table would have covered. I have an older article about virtual memory where you can read about tables, translation levels and so on.
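The level arithmetic can be sketched in C as follows. The sketch assumes the 4KiB granule with 48-bit virtual addresses (9 index bits per level above a 12-bit page offset); the function names table_index and entry_span are hypothetical and are not taken from Embox.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u  /* 4 KiB pages */
#define BITS_PER_LEVEL 9u   /* 512 eight-byte entries per 4 KiB table */

/* Index into the translation table at `level` (0..3) for address `va`. */
static unsigned table_index(uint64_t va, unsigned level)
{
    assert(level <= 3);
    unsigned shift = PAGE_SHIFT + BITS_PER_LEVEL * (3u - level);
    return (unsigned)((va >> shift) & 0x1FFu);
}

/* Bytes covered by a single entry at `level`. With this granule,
 * a block entry at level 1 or 2 maps that whole span at once. */
static uint64_t entry_span(unsigned level)
{
    assert(level <= 3);
    return 1ull << (PAGE_SHIFT + BITS_PER_LEVEL * (3u - level));
}
```

Under these assumptions a level 2 block covers 2MiB and a level 1 block covers 1GiB, which is why an early 1-to-1 mapping of all memory can get by with very few tables.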

Among the smaller changes: domains were dropped, but flags such as the dirty bit were added.

In general, apart from "blocks" in place of intermediate translation tables, I noticed no major conceptual changes; an MMU is an MMU.

Advanced SIMD

AdvSIMD differs significantly from the old NEON, both in floating-point and in vector (SIMD) operations. For example, D0 used to consist of S0 and S1, and Q0 of D0 and D1; now this is no longer so: Q0 corresponds to D0 and S0, Q1 to D1 and S1, and so on. At the same time, VFP/SIMD support is mandatory: the calling convention no longer provides a software way of passing floating-point parameters (what used to be called the "soft float ABI"; in GCC, the flag -mfloat-abi=softfp), so you have to implement hardware floating-point support.

Previously there were 16 registers of 128 bits each; now there are 32 registers of 128 bits each.
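The difference in register aliasing is easier to see as a mapping from a narrow register name to its place in the 128-bit register file. The C sketch below is only a model for illustration; struct reg_slot and the four functions are hypothetical names.

```c
#include <assert.h>

/* Where a narrow register view lives inside the 128-bit register file. */
struct reg_slot {
    unsigned wide_index;   /* index of the 128-bit register (Qn or Vn) */
    unsigned byte_offset;  /* offset of the view inside that register */
};

/* 32-bit ARM: S(2k) and S(2k+1) pack into Dk, D(2k) and D(2k+1) into Qk. */
static struct reg_slot arm32_s(unsigned n)
{
    assert(n < 32);
    return (struct reg_slot){ n / 4, (n % 4) * 4 };
}

static struct reg_slot arm32_d(unsigned n)
{
    assert(n < 32);
    return (struct reg_slot){ n / 2, (n % 2) * 8 };
}

/* Aarch64: Sn and Dn are simply the low 32 and 64 bits of Vn. */
static struct reg_slot a64_s(unsigned n)
{
    assert(n < 32);
    return (struct reg_slot){ n, 0 };
}

static struct reg_slot a64_d(unsigned n)
{
    assert(n < 32);
    return (struct reg_slot){ n, 0 };
}
```

So on the old ARM, writing S1 changes bytes 4..7 of Q0, while on Aarch64 S1 is the low part of V1 and does not overlap V0 at all; code that relied on the old packing has to be rewritten.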

You can read more about NEON in this article; a list of available commands for Aarch64 can be found here.

Basic operations with floating point registers:

  fadd s0, s1, s2   /* s0 = s1 + s2 */
  fmul d0, d1, d2   /* d0 = d1 * d2 */

Basic SIMD operations:

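  /* Old ARM: NEON */
  /* q0 = q1 + q2, each a vector of 4 32-bit integers */
  vadd.s32 q0, q1, q2

  /* Aarch64: AdvSIMD */
  /* v0 = v1 + v2, each a vector of 4 32-bit integers */
  add v0.4s, v1.4s, v2.4s

  /* add up the elements of v1 (2 64-bit values) and write the sum to d1 */
  addp d1, v1.2d

  /* fill a vector of 4 elements with 0 */
  movi v1.4s, 0x0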

Platforms

QEMU

QEMU supports Aarch64. One of the available platforms is virt; to make it run in 64-bit mode you must additionally pass -cpu cortex-a53, something like this:

  qemu-system-aarch64 -M virt -cpu cortex-a53 -kernel ./embox -m 1024 -nographic # ./embox is the ELF file

Conveniently, this platform uses a lot of peripherals for which Embox already had drivers, for example the PL011 for the console, the ARM Generic Interrupt Controller, etc. Of course, these devices have different register base addresses and different interrupt numbers, but the main thing is that the driver code works unchanged on the new architecture. When the system starts, control is passed to the image at EL1.

i.MX8

This piece of hardware, the i.MX8MQ Nitrogen8M, is the reason the port to Aarch64 was started.

Unlike QEMU, u-boot passes control to the image at EL2 and, moreover, for some reason enables the MMU (all memory is mapped 1 to 1), which creates some additional problems during initialization.

Embox already supported i.MX6, and some of the peripherals in i.MX8 are the same, for example UART and Ethernet, which also worked (I had to fix a couple of places that were hard-wired to 32-bit addresses). On the other hand, the interrupt controller there is different: ARM GICv3, which differs quite a lot from the first version.

Conclusion

At the moment, Aarch64 support in Embox is not complete, but the minimal functionality is already there: interrupts, MMU, input/output via UART. Much remains to be done, but the first steps turned out to be easier than they seemed at the start. There is much less documentation and fewer articles than for ARM, but there is more than enough information to figure everything out.

In general, if you have experience with ARM, porting to Aarch64 is a feasible task. Although, as usual, you can stumble on some little things :)

You can download the project from our repository and poke at it in QEMU. If you have any questions, write in the comments, on the mailing list, or in the Telegram chat (there is also a channel).

Useful links

A64 Instructions
Fundamentals of ARMv8-A
ARM Architecture Reference Manual for Armv8, Armv8-A
Aarch64 ABI (calling convention)
Migrating code from ARM to ARM64 - a small presentation with recommendations for writing portable code

PS

On August 24-25 we will be speaking at TechTrain: come listen to our talks and visit our stand, where we will answer your questions :)
