CMake: the case when the project is unforgivable the quality of its code





Picture 1






CMake is a cross-platform automation system for building projects. This system is much older than the PVS-Studio static code analyzer, while no one has yet tried to apply it to the code and review errors. It turns out that there are many mistakes. CMake's audience is huge. On it, new projects begin and old ones are transferred. It’s scary to imagine how many programmers might have this or that error.



Introduction



CMake (from the English cross-platform make) is a cross-platform automation system for building software from source code. CMake does not directly build, but only generates assembly control files from CMakeLists.txt files. The first release of the program took place in 2000. For comparison, the PVS-Studio static analyzer appeared only in 2008. Then it was focused on finding errors when porting programs from 32-bit systems to 64-bit ones, and in 2010 the first set of general-purpose diagnostics ( V501 - V545 ) appeared. By the way, there are several warnings from this first set on the CMake code.



Unforgivable Mistakes



V1040 Possible typo in the spelling of a pre-defined macro name. The '__MINGW32_' macro is similar to '__MINGW32__'. winapi.h 4112



/* from winternl.h */ #if !defined(__UNICODE_STRING_DEFINED) && defined(__MINGW32_) #define __UNICODE_STRING_DEFINED #endif
      
      





V1040 diagnostics have only recently been implemented. At the time of publication of the article, most likely, there will not be a release with it, but with the help of this diagnostics we have already managed to find a tough mistake.



Here they made a typo in the name __MINGW32_ . In the end, one underscore is missing. If you search by code with this name, you can make sure that the project actually uses the version with two underscores on both sides:







Picture 3








V531 It is odd that a sizeof () operator is multiplied by sizeof (). cmGlobalVisualStudioGenerator.cxx 558



 bool IsVisualStudioMacrosFileRegistered(const std::string& macrosFile, const std::string& regKeyBase, std::string& nextAvailableSubKeyName) { .... if (ERROR_SUCCESS == result) { wchar_t subkeyname[256]; // <= DWORD cch_subkeyname = sizeof(subkeyname) * sizeof(subkeyname[0]); // <= wchar_t keyclass[256]; DWORD cch_keyclass = sizeof(keyclass) * sizeof(keyclass[0]); FILETIME lastWriteTime; lastWriteTime.dwHighDateTime = 0; lastWriteTime.dwLowDateTime = 0; while (ERROR_SUCCESS == RegEnumKeyExW(hkey, index, subkeyname, &cch_subkeyname, 0, keyclass, &cch_keyclass, &lastWriteTime)) { .... } .... }
      
      





When the array is statically declared, the sizeof operator will calculate its size in bytes, taking into account both the number of elements and the size of the elements. When calculating the value of the variable cch_subkeyname, the programmer did not take this into account and received a value 4 times larger than planned. Let us explain where it is "4 times."



The array and its incorrect size are passed to the RegEnumKeyExW function:



 LSTATUS RegEnumKeyExW( HKEY hKey, DWORD dwIndex, LPWSTR lpName, // <= subkeyname LPDWORD lpcchName, // <= cch_subkeyname LPDWORD lpReserved, LPWSTR lpClass, LPDWORD lpcchClass, PFILETIME lpftLastWriteTime );
      
      





The pointer lpcchName must point to a variable containing the size of the buffer specified in characters: "A pointer to a variable that specifies the size of the buffer specified by the lpClass parameter, in characters". The size of the subkeyname array is 512 bytes and it is capable of storing 256 characters of the wchar_t type (in Windows, wchar_t is 2 bytes). This value is 256 and should be passed to the function. Instead, 512 is multiplied by 2 again to get 1024.



I think how to fix the error is now clear. Instead of multiplication, use division:



 DWORD cch_subkeyname = sizeof(subkeyname) / sizeof(subkeyname[0]);
      
      





By the way, exactly the same error occurs when calculating the value of the cch_keyclass variable.



The described error can potentially lead to a buffer overflow. It is necessary to fix all such places:





V595 The 'this-> BuildFileStream' pointer was utilized before it was verified against nullptr. Check lines: 133, 134. cmMakefileTargetGenerator.cxx 133



 void cmMakefileTargetGenerator::CreateRuleFile() { .... this->BuildFileStream->SetCopyIfDifferent(true); if (!this->BuildFileStream) { return; } .... }
      
      





The this-> BuildFileStream pointer is dereferenced just before the validation check. Was this really causing any problems? Below is another example of such a place. It is made directly under the carbon paper. But in fact, there are a lot of V595 warnings and most of them are not so obvious. From experience, I can say that correcting warnings of this diagnosis is the longest.





V614 Uninitialized pointer 'str' used. cmVSSetupHelper.h 80



 class SmartBSTR { public: SmartBSTR() { str = NULL; } SmartBSTR(const SmartBSTR& src) { if (src.str != NULL) { str = ::SysAllocStringByteLen((char*)str, ::SysStringByteLen(str)); } else { str = ::SysAllocStringByteLen(NULL, 0); } } .... private: BSTR str; };
      
      





The analyzer detected the use of the uninitialized pointer str . And this arose because of the usual typo. When calling the SysAllocStringByteLen function , you had to use the src.str pointer.



V557 Array overrun is possible. The value of 'lensymbol' index could reach 28. archive_read_support_format_rar.c 2749



 static int64_t expand(struct archive_read *a, int64_t end) { .... if ((lensymbol = read_next_symbol(a, &rar->lengthcode)) < 0) goto bad_data; if (lensymbol > (int)(sizeof(lengthbases)/sizeof(lengthbases[0]))) goto bad_data; if (lensymbol > (int)(sizeof(lengthbits)/sizeof(lengthbits[0]))) goto bad_data; len = lengthbases[lensymbol] + 2; if (lengthbits[lensymbol] > 0) { if (!rar_br_read_ahead(a, br, lengthbits[lensymbol])) goto truncated_data; len += rar_br_bits(br, lengthbits[lensymbol]); rar_br_consume(br, lengthbits[lensymbol]); } .... }
      
      





Several problems were found in this piece of code. When accessing the arrays of lengthbases and lengthbits, it is possible to go beyond the boundary of the array, because above the code, the developers wrote the operator '>' instead of '> ='. Such a check began to skip one invalid value. We are faced with a classic error pattern called Off-by-one Error .



The whole list of places to access arrays by invalid index:





Memory leak



V773 The function was exited without releasing the 'testRun' pointer. A memory leak is possible. cmCTestMultiProcessHandler.cxx 193



 void cmCTestMultiProcessHandler::FinishTestProcess(cmCTestRunTest* runner, bool started) { .... delete runner; if (started) { this->StartNextTests(); } } bool cmCTestMultiProcessHandler::StartTestProcess(int test) { .... cmCTestRunTest* testRun = new cmCTestRunTest(*this); // <= .... if (testRun->StartTest(this->Completed, this->Total)) { return true; // <= } this->FinishTestProcess(testRun, false); // <= return false; }
      
      





The analyzer detected a memory leak. Memory by pointer testRun is not freed if function testRun-> StartTest returns true . When another branch of code is executed, the memory using the testRun pointer is freed in the this-> FinishTestProcess function .



Resource leak



V773 The function was exited without closing the file referenced by the 'fd' handle. A resource leak is possible. rhash.c 450



 RHASH_API int rhash_file(....) { FILE* fd; rhash ctx; int res; hash_id &= RHASH_ALL_HASHES; if (hash_id == 0) { errno = EINVAL; return -1; } if ((fd = fopen(filepath, "rb")) == NULL) return -1; if ((ctx = rhash_init(hash_id)) == NULL) return -1; // <= fclose(fd); ??? res = rhash_file_update(ctx, fd); fclose(fd); rhash_final(ctx, result); rhash_free(ctx); return res; }
      
      





Strange logic in conditions



V590 Consider inspecting the '* s! =' \ 0 '&& * s ==' '' expression. The expression is excessive or contains a misprint. archive_cmdline.c 76



 static ssize_t get_argument(struct archive_string *as, const char *p) { const char *s = p; archive_string_empty(as); /* Skip beginning space characters. */ while (*s != '\0' && *s == ' ') s++; .... }
      
      





Comparing the * s character with a terminal zero is superfluous. The condition of the while loop depends only on whether the character is equal to a space or not. This is not a mistake, but an extra complication of the code.



V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. cmCTestTestHandler.cxx 899



 void cmCTestTestHandler::ComputeTestListForRerunFailed() { this->ExpandTestsToRunInformationForRerunFailed(); ListOfTests finalList; int cnt = 0; for (cmCTestTestProperties& tp : this->TestList) { cnt++; // if this test is not in our list of tests to run, then skip it. if ((!this->TestsToRun.empty() && std::find(this->TestsToRun.begin(), this->TestsToRun.end(), cnt) == this->TestsToRun.end())) { continue; } tp.Index = cnt; finalList.push_back(tp); } .... }
      
      





The analyzer warns that perhaps negation should be put out of brackets. It seems that there is no such mistake here - just extra double brackets. But, most likely, there is a logical error in this condition.



The continue statement is executed if the list of tests this-> TestsToRun is not empty and cnt is absent in it. It is logical to assume that if the test list is empty, then the same action must be performed. Most likely, the condition should be like this:



 if (this->TestsToRun.empty() || std::find(this->TestsToRun.begin(), this->TestsToRun.end(), cnt) == this->TestsToRun.end()) { continue; }
      
      





V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. cmMessageCommand.cxx 73



 bool cmMessageCommand::InitialPass(std::vector<std::string> const& args, cmExecutionStatus&) { .... } else if (*i == "DEPRECATION") { if (this->Makefile->IsOn("CMAKE_ERROR_DEPRECATED")) { fatal = true; type = MessageType::DEPRECATION_ERROR; level = cmake::LogLevel::LOG_ERROR; } else if ((!this->Makefile->IsSet("CMAKE_WARN_DEPRECATED") || this->Makefile->IsOn("CMAKE_WARN_DEPRECATED"))) { type = MessageType::DEPRECATION_WARNING; level = cmake::LogLevel::LOG_WARNING; } else { return true; } ++i; } .... }
      
      





A similar example, but here I am more confident in the presence of an error. The IsSet function ("CMAKE_WARN_DEPRECATED") verifies that the CMAKE_WARN_DEPRECATED value is set globally, and the IsOn function ("CMAKE_WARN_DEPRECATED") verifies that the value is specified in the project configuration. Most likely, the negation operator is superfluous, because in both cases, it is correct to set the same type and level values.



V728 An excessive check can be simplified. The '(A &&! B) || (! A && B) 'expression is equivalent to the' bool (A)! = Bool (B) 'expression. cmCTestRunTest.cxx 151



 bool cmCTestRunTest::EndTest(size_t completed, size_t total, bool started) { .... } else if ((success && !this->TestProperties->WillFail) || (!success && this->TestProperties->WillFail)) { this->TestResult.Status = cmCTestTestHandler::COMPLETED; outputStream << " Passed "; } .... }
      
      





Such code can be greatly simplified by rewriting the conditional expression this way:



 } else if (success != this->TestProperties->WillFail) { this->TestResult.Status = cmCTestTestHandler::COMPLETED; outputStream << " Passed "; }
      
      





Some more places you can simplify:





Miscellaneous warnings



V523 The 'then' statement is equivalent to the subsequent code fragment. archive_read_support_format_ar.c 415



 static int _ar_read_header(struct archive_read *a, struct archive_entry *entry, struct ar *ar, const char *h, size_t *unconsumed) { .... /* * "__.SYMDEF" is a BSD archive symbol table. */ if (strcmp(filename, "__.SYMDEF") == 0) { archive_entry_copy_pathname(entry, filename); /* Parse the time, owner, mode, size fields. */ return (ar_parse_common_header(ar, entry, h)); } /* * Otherwise, this is a standard entry. The filename * has already been trimmed as much as possible, based * on our current knowledge of the format. */ archive_entry_copy_pathname(entry, filename); return (ar_parse_common_header(ar, entry, h)); }
      
      





The expression in the last condition is identical to the last two lines of the function. This code can be simplified by removing the condition, or there is an error in the code and should be fixed.



V535 The variable 'i' is being used for this loop and for the outer loop. Check lines: 2220, 2241. multi.c 2241



 static CURLMcode singlesocket(struct Curl_multi *multi, struct Curl_easy *data) { .... for(i = 0; (i< MAX_SOCKSPEREASYHANDLE) && // <= (curraction & (GETSOCK_READSOCK(i) | GETSOCK_WRITESOCK(i))); i++) { unsigned int action = CURL_POLL_NONE; unsigned int prevaction = 0; unsigned int comboaction; bool sincebefore = FALSE; s = socks[i]; /* get it from the hash */ entry = sh_getentry(&multi->sockhash, s); if(curraction & GETSOCK_READSOCK(i)) action |= CURL_POLL_IN; if(curraction & GETSOCK_WRITESOCK(i)) action |= CURL_POLL_OUT; actions[i] = action; if(entry) { /* check if new for this transfer */ for(i = 0; i< data->numsocks; i++) { // <= if(s == data->sockets[i]) { prevaction = data->actions[i]; sincebefore = TRUE; break; } } } .... }
      
      





The variable i is used as a loop counter in the outer and nested loops. In this case, the counter value again starts counting from zero in the enclosed one. This may not be a mistake here, but the code is suspicious.



V519 The 'tagString' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 84, 86. cmCPackLog.cxx 86



 oid cmCPackLog::Log(int tag, const char* file, int line, const char* msg, size_t length) { .... if (tag & LOG_OUTPUT) { output = true; display = true; if (needTagString) { if (!tagString.empty()) { tagString += ","; } tagString = "VERBOSE"; } } if (tag & LOG_WARNING) { warning = true; display = true; if (needTagString) { if (!tagString.empty()) { tagString += ","; } tagString = "WARNING"; } } .... }
      
      





The tagString variable is frayed by the new value in all places. It’s hard to say what the mistake was, or why they did it. Perhaps the operators '=' and '+ =' were confused.



The whole list of such places:





V519 The 'aes-> aes_set' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 4052, 4054. archive_string.c 4054



 int archive_mstring_copy_utf8(struct archive_mstring *aes, const char *utf8) { if (utf8 == NULL) { aes->aes_set = 0; // <= } aes->aes_set = AES_SET_UTF8; // <= .... return (int)strlen(utf8); }
      
      





Forcing AES_SET_UTF8 looks suspicious. I think such a code will mislead any developer who is faced with the refinement of this place.



This code was copied to one more place:





How to find errors in a project on CMake



In this section I will tell you a little how to easily and easily check projects on CMake using PVS-Studio.



Windows / Visual Studio



For Visual Studio, you can generate the project file using the CMake GUI or the following command:



 cmake -G "Visual Studio 15 2017 Win64" ..
      
      





Next, you can open the .sln file and test the project using the plug-in for Visual Studio.



Linux / macOS



On these systems, the compile_commands.json file is used to verify the project. By the way, it can be generated in different assembly systems. In CMake, this is done like this:



 cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ..
      
      





It remains to run the analyzer in the directory with the .json file:



 pvs-studio-analyzer analyze -l /path/to/PVS-Studio.lic -o /path/to/project.log -e /path/to/exclude-path -j<N>
      
      





We also developed a module for CMake projects. Some people like to use it. The CMake module and examples of its use can be found in our repository on GitHub: pvs-studio-cmake-examples .



Conclusion



The huge audience of CMake users is a good tester of the project, but many problems could not have been prevented before the release, using such static code analysis tools as PVS-Studio .



If you liked the results of the analyzer, but your project is not written in C and C ++, I want to remind you that the analyzer also supports analysis of projects in C # and Java. You can try the analyzer on your project by going to this page.











If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. CMake: the Case when the Project's Quality is Unforgivable .



All Articles