🍳 👨🏾‍💼 ⛅️ By order of Embedded developers: looking for bugs in Amazon FreeRTOS 👨🏽‍🏭 👿 🍉

Everyone who programs microcontrollers probably knows about FreeRTOS, or at least heard about this operating system. Amazon guys decided to expand the capabilities of this operating system for working with AWS Internet of Things services - this is how Amazon FreeRTOS appeared. We, the developers of the PVS-Studio code analyzer, were asked to check these projects in the mail and in the comments below the articles. Well, you asked - we did. What came of this - read on.

A bit about projects

To begin, I’ll tell you a little about the “dad” of the checked project - FreeRTOS (you can find the source code here ). As Wikipedia says, FreeRTOS is a multi-tasking real-time operating system for embedded systems.

It was written in the good old C, which is not surprising - this operating system should work in conditions typical of microcontrollers: low computing power, a small amount of RAM, and the like. The C language allows you to work with resources at a low level and has high performance, so it is the best suited for developing such an OS.

Now back to Amazon, which does not sit still and develops in various promising areas. For example, Amazon is developing the Amazon Lumberyard AAA game engine, which we also tested .

One such area is the Internet of Things (Internet of Things, IoT). To develop in this area, Amazon decided to write their own operating system - and they took the FreeRTOS kernel as a basis.

The resulting system - Amazon FreeRTOS - is positioned as "providing the ability to securely connect to Amazon Web Services, such as AWS IoT Core or AWS IoT Greengrass." The source code for this project is stored on Github.

In this article, we will examine whether there are errors in FreeRTOS, as well as how secure the operating system from Amazon is in terms of static code analysis.

How was the check

The code was checked using the automatic error search tool: PVS-Studio static code analyzer. It is able to detect errors in programs written in C, C ++, C # and Java.

Before starting the analysis, it is necessary to assemble the project - so I will be sure that I have all the necessary dependencies and everything is in order with the project. There are several ways to verify a project — for example, using a compilation monitoring system. I did the analysis using the plug-in for Visual Studio - it’s good that in the repositories of both projects there is a set of project files that make it easy to build under Windows.

All that was needed was to collect the projects to make sure that there was everything necessary for verification. Next, I started the analysis and voila! - in front of me is a ready-made analyzer report.

Third-party libraries included in these projects may also contain errors, and they, of course, can also affect the operation of the program. However, I excluded them from the analysis for the sake of purity of narration.

So, the projects are analyzed, reports are received, interesting errors are written out. It's time to move on to their analysis!

What hides FreeRTOS

Initially, I expected to write two separate articles: one for each operating system. I already rubbed my hands, preparing to write a good article about FreeRTOS. Anticipating the detection of at least a couple of juicy errors (like CWE-457 ), I eagerly looked at the analyzer's few warnings, and ... and nothing. I did not find any interesting errors.

Many warnings that the analyzer issued were not relevant for FreeRTOS. For example, such warnings were 64-bit shortcomings like casting size_t to uint32_t . This is due to the fact that FreeRTOS is designed to work on devices with a pointer size of no more than 32 bits.

I carefully checked all the V1027 warnings related to casts between pointers to unrelated structures. If reducible structures have the same alignment, then such a cast is not an error. And I did not find a single dangerous cast!

All other suspicious places were either associated with the coding style, or were equipped with a comment explaining why this is done here and why this is not a mistake.

In general, I want to contact the developers of FreeRTOS. You guys are really great! We have never met such clean and high-quality projects like yours. And I was very pleased to read clean, tidy, and well-documented code. Hats off to you.

Although I could not find interesting errors that day, I understood that I would not stop there. I was going home with the firm conviction that something interesting would be found in the Amazon version 100%, and that tomorrow I would definitely collect enough errors for the article. As you might have guessed, I was right.

What hides Amazon FreeRTOS

Amazon’s version of the system turned out to be ... to put it mildly, a little worse. The legacy of FreeRTOS has remained just as clean, but the new revisions turned out to be quite interesting.

In some places, the logic of the program was violated, somewhere incorrectly worked with pointers. In some places the code could lead to undefined behavior, but somewhere the programmer simply did not know about the error pattern that he made. I even found some serious potential vulnerabilities.

Something I delayed with the introduction. Let's start parsing the bugs!

Program logic violation

Let's start with the problem areas, which clearly indicate that the program does not run exactly as the programmer expected. The first such place will be suspicious work with an array:

/** * @brief Pool of request and associated response buffers, * handles, and configurations. */ static _requestPool_t _requestPool = { 0 }; .... static int _scheduleAsyncRequest(int reqIndex, uint32_t currentRange) { .... /* Set the user private data to use in the asynchronous callback context. */ _requestPool.pRequestDatas[reqIndex].pConnHandle = &_connHandle; _requestPool.pRequestDatas[reqIndex].pConnConfig = &_connConfig; _requestPool.pRequestDatas[reqIndex].reqNum = reqIndex; _requestPool.pRequestDatas[reqIndex].currRange = currentRange; _requestPool.pRequestDatas[reqIndex].currDownloaded = 0; _requestPool.pRequestDatas[reqIndex].numReqBytes = numReqBytes; .... _requestPool.pRequestDatas->scheduled = true; .... }

PVS-Studio issued two warnings for this piece of code:

V619 The array '_requestPool.pRequestDatas' is being utilized as a pointer to single object. iot_demo_https_s3_download_async.c 973
V574 The '_requestPool.pRequestDatas' pointer is used simultaneously as an array and as a pointer to single object. Check lines: 931, 973. iot_demo_https_s3_download_async.c 973

Just in case, let me remind you: the name of the array is a pointer to its first element. That is, if _requestPool.pRequestDatas is an array of structures, then _requestPool.pRequestDatas [i] .scheduled is access to the scheduled member of the i- th structure of the array. And if you write _requestPool.pRequestDatas-> scheduled , this will mean access to the scheduled member of the first structure of the array.

This is what happens in the code snippet above. The last line always changes the value only for a member of the first structure of the array. By itself, this call is already suspicious, but the situation here is even more obvious: throughout the whole body of the function, the _requestPool.pRequestDatas array is accessed by index, and only at the end of the indexing operation were forgotten to be applied.

As I understand it, the last line should look like this:

 _requestPool.pRequestDatas[reqIndex].scheduled = true;

The following error lies in a small function, so I will give it in its entirety:

 /* Return true if the string " pcString" is found * inside the token pxTok in JSON file pcJson. */ static BaseType_t prvGGDJsoneq( const char * pcJson, const jsmntok_t * const pxTok, const char * pcString ) { uint32_t ulStringSize = ( uint32_t ) pxTok->end - ( uint32_t ) pxTok->start; BaseType_t xStatus = pdFALSE; if( pxTok->type == JSMN_STRING ) { if( ( uint32_t ) strlen( pcString ) == ulStringSize ) { if( ( int16_t ) strncmp( &pcJson[ pxTok->start ], // <= pcString, ulStringSize ) == 0 ) { xStatus = pdTRUE; } } } return xStatus; }

PVS-Studio Warning: V642 [CWE-197] Saving the 'strncmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. aws_greengrass_discovery.c 637

Let's take a look at the strncmp function definition:

 int strncmp( const char *lhs, const char *rhs, size_t count );

In the example, a result of type int , whose size is 32 bits, is converted to a variable of type int16_t . With such a "narrowing" conversion, the most significant bits of the return value will be lost. For example, if the strncmp function returns 0x00010000 , then the one will be lost during the conversion, and the condition will be satisfied.

In fact, it is strange to see such a cast in a condition. Why even do it if you can compare normal int with zero? On the other hand, if the programmer consciously wanted the function to sometimes return true , even if it shouldn’t, then why is this tricky behavior not described by the comment? But then this is already a bookmark. In general, I am inclined to believe that this is a mistake. What do you think?

Undefined behavior and pointers

Now there will be a rather large example. It conceals the potential dereferencing of a null pointer:

 static void _networkReceiveCallback(....) { IotHttpsReturnCode_t status = IOT_HTTPS_OK; _httpsResponse_t* pCurrentHttpsResponse = NULL; IotLink_t* pQItem = NULL; .... /* Get the response from the response queue. */ IotMutex_Lock(&(pHttpsConnection->connectionMutex)); pQItem = IotDeQueue_PeekHead(&(pHttpsConnection->respQ)); IotMutex_Unlock(&(pHttpsConnection->connectionMutex)); /* If the receive callback is invoked * and there is no response expected, * then this a violation of the HTTP/1.1 protocol. */ if (pQItem == NULL) { IotLogError(....); fatalDisconnect = true; status = IOT_HTTPS_NETWORK_ERROR; goto iotCleanup; } .... iotCleanup : /* Report errors back to the application. */ if (status != IOT_HTTPS_OK) { if ( pCurrentHttpsResponse->isAsync && pCurrentHttpsResponse->pCallbacks->errorCallback) { pCurrentHttpsResponse->pCallbacks->errorCallback(....); } pCurrentHttpsResponse->syncStatus = status; } .... }

PVS-Studio Warning: V522 [CWE-690] There might be dereferencing of a potential null pointer 'pCurrentHttpsResponse'. iot_https_client.c 1184

Problem dereferences are at the very bottom if . Let's see what happens here.

At the beginning of the function, the pCurrentHttpsResponse and pQItem variables are initialized to NULL , and the status variable is initialized to IOT_HTTPS_OK , which means that everything goes smoothly.

Next, pQItem is assigned the value returned from the IotDeQueue_PeekHead function, which returns a pointer to the beginning of a doubly connected queue.

What happens if the queue is empty? In this case, the IotDeQueue_PeekHead function will return NULL :

 static inline IotLink_t* IotDeQueue_PeekHead (const IotDeQueue_t* const pQueue) { return IotListDouble_PeekHead(pQueue); } .... static inline IotLink_t* IotListDouble_PeekHead (const IotListDouble_t* const pList) /* @[declare_linear_containers_list_double_peekhead] */ { IotLink_t* pHead = NULL; if (pList != NULL) { if (IotListDouble_IsEmpty(pList) == false) { pHead = pList->pNext; } } return pHead; }

Next, the condition pQItem == NULL is satisfied, and the control flow proceeds via goto to the lower part of the function body. By this time, the pCurrentHttpsResponse pointer will remain null, and status will no longer be equal to IOT_HTTPS_OK . As a result, we will fall into that very branch if , and ... broads! The consequences of this dereferencing you yourself know.

Okay This was a slightly ornate example. Now I bring to your attention a very simple and understandable potential dereference:

 int PKI_mbedTLSSignatureToPkcs11Signature (uint8_t * pxSignaturePKCS, uint8_t * pxMbedSignature ) { int xReturn = 0; uint8_t * pxNextLength; /* The 4th byte contains the length of the R component */ uint8_t ucSigComponentLength = pxMbedSignature[ 3 ]; // <= if( ( pxSignaturePKCS == NULL ) || ( pxMbedSignature == NULL ) ) { xReturn = FAILURE; } .... }

PVS-Studio Warning: V595 [CWE-476] The 'pxMbedSignature' pointer was utilized before it was verified against nullptr. Check lines: 52, 54. iot_pki_utils.c 52

This function gets two pointers to uint8_t . Both pointers are checked for NULL , which is good practice - such situations must be worked out immediately.

But this is bad luck: by the time pxMbedSignature is checked, it will be dereferenced literally one line above. Ta-daa!

Another example of speculative code:

 CK_RV vAppendSHA256AlgorithmIdentifierSequence ( uint8_t * x32ByteHashedMessage, uint8_t * x51ByteHashOidBuffer ) { CK_RV xResult = CKR_OK; uint8_t xOidSequence[] = pkcs11STUFF_APPENDED_TO_RSA_SIG; if( ( x32ByteHashedMessage == NULL ) || ( x51ByteHashOidBuffer == NULL ) ) { xResult = CKR_ARGUMENTS_BAD; } memcpy( x51ByteHashOidBuffer, xOidSequence, sizeof( xOidSequence ) ); memcpy( &x51ByteHashOidBuffer[ sizeof( xOidSequence ) ], x32ByteHashedMessage, 32 ); return xResult; }

PVS-Studio warnings:

V1004 [CWE-628] The 'x51ByteHashOidBuffer' pointer was used unsafely after it was verified against nullptr. Check lines: 275, 280. iot_pkcs11.c 280
V1004 [CWE-628] The 'x32ByteHashedMessage' pointer was used unsafely after it was verified against nullptr. Check lines: 275, 281. iot_pkcs11.c 281

The analyzer warns that function parameters that are pointers are used insecurely after they have been tested for NULL . Indeed, the arguments are checked, but if any of them turns out to be NULL , no action is taken, except for writing to xResult . This piece of code seems to say: “Yeah, that means the arguments turned out to be bad. We’ll write it down now, while you continue, continue. ”

Bottom line: NULL will be passed to memcpy . What can come of this? Where will the values be copied and which ones? In fact, guessing about it is not worth it, because the standard clearly states that such a call leads to undefined behavior (see paragraph 1).

The analyzer report still contains examples of incorrect operation with pointers that I found in Amazon FreeRTOS, but I think that the examples given are enough to show you the capabilities of PVS-Studio in detecting such errors. Consider something new.

TRUE! = 1

Several errors that I found were related to one pattern, which, unfortunately, is often forgotten.

The fact is that the bool type (from C ++) is different from the BOOL type (commonly used in C). The first can only contain true or false . The second is a typedef of some integer type ( int , long , etc.). For him, “false” is the value 0 , and “true” is any value other than zero.

Since there is no built-in Boolean type in C, these constants are defined for convenience:

 #define FALSE 0 #define TRUE 1

Now consider an example:

 int mbedtls_hardware_poll(void* data, unsigned char* output, size_t len, size_t* olen) { int lStatus = MBEDTLS_ERR_ENTROPY_SOURCE_FAILED; HCRYPTPROV hProv = 0; /* Unferenced parameter. */ (void)data; /* * This is port-specific for the Windows simulator, * so just use Crypto API. */ if (TRUE == CryptAcquireContextA( &hProv, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT)) { if (TRUE == CryptGenRandom(hProv, len, output)) { lStatus = 0; *olen = len; } CryptReleaseContext(hProv, 0); } return lStatus; }

PVS-Studio warnings:

V676 [CWE-253] It is incorrect to compare the variable of BOOL type with TRUE. aws_entropy_hardware_poll.c 48
V676 [CWE-253] It is incorrect to compare the variable of BOOL type with TRUE. Correct expression is: 'FALSE! = CryptGenRandom (hProv, len, output)'. aws_entropy_hardware_poll.c 51

Found a mistake? And it is :) Functions CryptAcquireContextA and CryptGenRandom are standard functions from the wincrypt.h header. If successful, they return a nonzero value. I emphasize - non - zero . So, theoretically, this can be any value other than zero: 1 , 314 , 42 , 420 .

Apparently, the programmer who wrote the function from the example did not think about it, and as a result, the obtained values are compared with unity.

With what probability is the condition TRUE == CryptGenRandom (....) not fulfilled? It is hard to say. Maybe CryptGenRandom returns a unit more often than other values, and maybe it always returns only one. We cannot know for sure: the implementation of this cryptographic function is hidden from the eyes of mortal programmers :)

It is important to remember that such comparisons are potentially dangerous. And instead of:

 if (TRUE == GetBOOL())

use a safer option:

 if (FALSE != GetBOOL())

Optimization Issues

Several analyzer warnings have been associated with slow-running constructs. For example:

 int _IotHttpsDemo_GetS3ObjectFileSize(....) { .... pFileSizeStr = strstr(contentRangeValStr, "/"); .... }

PVS-Studio Warning: V817 It is more efficient to seek '/' character rather than a string. iot_demo_https_common.c 205

Briefly and clearly, is not it? The strstr function here is used to search for only one character, which is passed to the parameter as a string (it is enclosed in double quotes).

This place can be potentially optimized by replacing strstr with strchr :

 int _IotHttpsDemo_GetS3ObjectFileSize(....) { .... pFileSizeStr = strchr(contentRangeValStr, '/'); .... }

Then the search will work a little faster. A trifle, but nice.

Such optimizations are, of course, good, and the analyzer has found another place that can be optimized much more noticeably:

 void vRunOTAUpdateDemo(void) { .... for (; ; ) { .... xConnectInfo.cleanSession = true; xConnectInfo.clientIdentifierLength = (uint16_t)strlen(clientcredentialIOT_THING_NAME); xConnectInfo.pClientIdentifier = clientcredentialIOT_THING_NAME; .... } }

Warning PVS-Studio: V814 Decreased performance. The 'strlen' function was called multiple times inside the body of a loop. aws_iot_ota_update_demo.c 235

Hmmm ... Inside the loop, at each iteration, strlen is called, which each time calculates the length of the same line. Not the most efficient operation :)

Let's take a look at the definition of clientcredentialIOT_THING_NAME :

 /* * @brief Host name. * * @todo Set this to the unique name of your IoT Thing. */ #define clientcredentialIOT_THING_NAME ""

The user is prompted to enter the name of his device here. By default, it is empty, and in this case, everything is fine. But what if the user wants to enter some long and beautiful name there? For example, I would love to call my brainchild " The Passionate And Sophisticated Coffee Machine BarBarista-N061E The Ultimate Edition ". Can you imagine what my surprise would be if after that my beautiful coffee machine began to work a little slower? Mess!

To fix the error, strlen should be taken out of the loop body. After all, the name of the device does not change while the program is running. Ehhh, here would be constexpr from C ++ ...

Okay, okay, I admit: here I was a little thickened. As my colleague Andrei Karpov noted, modern compilers know what strlen is and he personally observed how they simply use a constant in binary code, if they understand that the length of a string cannot change. So there is a high probability that in the build mode of the release version, instead of the real calculation of the string length, a previously calculated value will simply be used. However, this does not always work, so writing such code is not a good practice.

A few words about MISRA

PVS-Studio analyzer has a large set of rules allowing you to check your code for compliance with MISRA C and MISRA C ++ standards. What are these standards?

MISRA is the coding standard for highly responsive embedded systems. It contains a set of strict rules and guidelines for writing code and setting up the development process. There are quite a few of these rules, and they are aimed not only at eliminating serious errors, but also at various “code smells”, as well as at writing maximum clear and readable code.

Thus, following the MISRA standard not only helps to avoid errors and vulnerabilities, but also significantly - significantly! - reduce the likelihood of their occurrence in an existing code.

MISRA is used in the aerospace, medical, automotive and military industries - where human lives depend on the quality of the embedded software.

Apparently, Amazon FreeRTOS developers are aware of this standard, and for the most part follow it. That's right: if you are writing a broad-based OS for embedded systems, then you must think about security.

However, I found quite a few violations of the MISRA standard. I will not give here examples of rules like “do not use union” or “a function should have only one return at the end of the body” - unfortunately, they are not spectacular, like most MISRA rules. I’d better give you examples of violations that could potentially lead to serious consequences.

Let's start with the macros:

 #define FreeRTOS_ms_to_tick(ms) ( ( ms * configTICK_RATE_HZ + 500 ) / 1000 )

 #define SOCKETS_htonl( ulIn ) ( ( uint32_t ) \ ( ( ( ulIn & 0xFF ) << 24 ) | ( ( ulIn & 0xFF00 ) << 8 ) \ | ( ( ulIn & 0xFF0000 ) >> 8 ) | ( ( ulIn & 0xFF000000 ) >> 24 ) ) )

 #define LEFT_ROTATE( x, c ) ( ( x << c ) | ( x >> ( 32 - c ) ) )

PVS-Studio warnings:

V2546 [MISRA C 20.7] The macro and its parameters should be enclosed in parentheses. Consider inspecting the 'ms' parameter of the 'FreeRTOS_ms_to_tick' macro. FreeRTOS_IP.h 201
V2546 [MISRA C 20.7] The macro and its parameters should be enclosed in parentheses. Consider inspecting the 'ulIn' parameter of the 'SOCKETS_htonl' macro. iot_secure_sockets.h 512
V2546 [MISRA C 20.7] The macro and its parameters should be enclosed in parentheses. Consider inspecting parameters 'x', 'c' of the 'LEFT_ROTATE' macro. iot_device_metrics.c 90

Yes, that’s exactly what you thought. The parameters of these macros are not wrapped in brackets. If someone accidentally writes something like

 val = LEFT_ROTATE(A[i] | 1, B);

then such a “call” macro will open in:

 val = ( ( A[i] | 1 << B ) | ( A[i] | 1 >> ( 32 - B ) ) );

Remember the priorities of operations? First, a bitwise shift is performed, and only after it is a bitwise “or”. Therefore, the logic of the program will be violated. A simpler example: what happens if the expression " x + y " is passed to the FreeRTOS_ms_to_tick macro? One of the main goals of MISRA is to prevent the occurrence of such situations.

Someone may object: “if you have programmers who do not know about this, then no standard will save you!” And I will not agree with this. Programmers are also people, and no matter how experienced a person is, he too can get tired and make a mistake at the end of the working day. This is one of the reasons MISRA strongly recommends the use of automated analysis tools to verify that a project complies with the standard.

I turn to Amazon FreeRTOS developers: PVS-Studio found 12 more unsafe macros, so you are more careful there with them :)

Another interesting MISRA violation:

 /** * @brief Callback for an asynchronous request to notify * that the response is complete. * * @param[in] 0pPrivData - User private data configured * with the HTTPS Client library request configuration. * @param[in] respHandle - Identifier for the current response finished. * @param[in] rc - Return code from the HTTPS Client Library * signaling a possible error. * @param[in] status - The HTTP response status. */ static void _responseCompleteCallback(void* pPrivData, IotHttpsResponseHandle_t respHandle, IotHttpsReturnCode_t rc, uint16_t status) { bool* pUploadSuccess = (bool*)pPrivData; /* When the remote server response with 200 OK, the file was successfully uploaded. */ if (status == IOT_HTTPS_STATUS_OK) { *pUploadSuccess = true; } else { *pUploadSuccess = false; } /* Post to the semaphore that the upload is finished. */ IotSemaphore_Post(&(_uploadFinishedSem)); }

Can you find the error yourself?

PVS-Studio Warning: V2537 [MISRA C 2.7] Functions should not have unused parameters. Consider inspecting the parameter: 'rc'. iot_demo_https_s3_upload_async.c 234

Take a closer look: the rc parameter is not used anywhere in the function body. Moreover, in the commentary on the function it is explicitly written that this parameter is a return code of another function, and that it can signal an error. Then why this parameter is not processed in any way? There is clearly something wrong here.

However, even without such comments, unused parameters often indicate broken program logic. Otherwise, why are they needed in the function signature?

Here I have given a small function that is well suited for an example in the article. In addition to her, I found 10 more unused parameters. Many of them are used in larger functions, and finding them is not the easiest thing.

It is suspicious that they were not found before. Indeed, compilers can easily detect such cases.

Conclusion

These were not all the problem areas found by the analyzer, but the article was already quite large. I hope that thanks to it, Amazon FreeRTOS developers will be able to fix some shortcomings, and maybe even want to try PVS-Studio on their own. So it will be possible to study the warnings more thoroughly, and indeed - to work with a convenient interface is much easier than looking at a text report.

Thank you for reading our articles! See you in the next issue: D

PS It just so happened that this article was published on October 31. Therefore, I wish everyone a happy Halloween!

If you want to share this article with an English-speaking audience, then please use the link to the translation: George Gribkov. On request of Embedded Developers: Detecting Errors in Amazon FreeRTOS .

By order of Embedded developers: looking for bugs in Amazon FreeRTOS

A bit about projects

How was the check

What hides FreeRTOS

What hides Amazon FreeRTOS

Program logic violation

Undefined behavior and pointers

TRUE! = 1

Optimization Issues

A few words about MISRA

Conclusion

More articles: