Creating a plugin for Clang Static Analyzer to search for integer overflows







Article Author: 0x64rem







Introduction



A year and a half ago, I had the idea of implementing my own fuzzer as part of my university thesis. I began studying materials on control flow graphs, data flow graphs, symbolic execution, and so on. Then came the search for tools and experiments with various libraries (Angr, Triton, Pin, Z3). Nothing concrete came of it until this summer, when I joined the Digital Security Summer of Hack 2019 program and was offered extending the Clang Static Analyzer as a project topic. It seemed to me that this topic would help me organize my theoretical knowledge, start implementing something substantial, and get advice from experienced mentors. Below I describe how writing the plugin went and my line of thought during the month-long internship.







Clang Static Analyzer



Clang provides three interfaces for building tools on top of it: LibClang, Clang plugins, and LibTooling.









Since we are going to extend the capabilities of the Clang Static Analyzer, we implement the checker as a plugin. Plugin code can be written in C++ or Python.







For the latter, there are bindings that let you parse source code, iterate over the nodes of the resulting abstract syntax tree, access node properties, and map a node to a line of the source code. This set is sufficient for a simple checker. More details can be found in the llvm repository.







My task requires detailed analysis of the code, so I chose C++ for development. What follows is an introduction to the tool.







Clang Static Analyzer (hereinafter CSA) is a tool for static analysis of C / C++ / Objective-C code based on symbolic execution. The analyzer can be invoked through the Clang frontend by adding the -cc1 and -analyze flags to the build command, or through the separate scan-build binary. In addition to the analysis itself, CSA can generate visual HTML reports.







    # List the options of the clang frontend
    clang -cc1 --help

    # Invoking CSA, option 1
    clang++ -cc1 -x c++ -load path/to/Checker.so -analyze -analyzer-checker=test.Me -analyzer-config $BUILD_OPTIONS Checker.cpp

    # Invoking CSA, option 2
    scan-build -load-plugin path/to/Checker.so -enable-checker test.Me $BUILD_COMMAND

    # Example: running the DivideZero checker
    clang++ -cc1 -analyze -analyzer-checker=core.DivideZero -o reports div-by-zero-test.cpp
      
      











CSA has an excellent library for parsing source code by means of the AST (Abstract Syntax Tree) and the CFG (Control Flow Graph). From these structures you can extract variable declarations and their types, uses of binary and unary operators, symbolic expressions, and so on. My plugin uses the AST classes; this choice is justified below. The following list of classes used in the plugin gives a first idea of what CSA can do:









Integer Overflow Search



To start implementing a plugin, you need to choose the problem it will solve. The llvm website provides lists of potential checkers for this purpose; you can also modify existing stable or alpha checkers. While reviewing the code of the available checkers, it became clear that writing a checker from scratch is a better way to learn libclang, so I picked from the list of unimplemented ideas. In the end I chose to create a checker that detects integer overflows. Clang already has functionality aimed at this problem (enabled by flags such as -ftrapv and -fwrapv), but it is built into the compiler, its output ends up among the warnings, and people rarely look there. There is also UBSan, but sanitizers are not used by everyone and they find problems at runtime, whereas a CSA plugin works at compile time by analyzing the source code.







Next comes gathering material on the chosen vulnerability. I used to think of integer overflow as something simple and not serious. In fact, the vulnerability is interesting and can have impressive consequences.

Integer overflows are a class of vulnerabilities in which integer data in the code takes unexpected values. An overflow occurs when a value becomes larger than its type can hold, an underflow when it becomes smaller than the minimum of its type. Such errors can be introduced both by the programmer and by the compiler.







In C++, arithmetic and comparison operations convert their integer operands to a common type, usually the wider one. Such conversions happen everywhere and all the time, and they can be explicit or implicit. They follow several rules [1]:









That is, the vulnerability can be triggered by unsafe user input, incorrect arithmetic, or an incorrect type conversion introduced by the programmer or by the compiler during optimization. A time-bomb scenario is also possible: a piece of code is harmless with one compiler version, but when a new optimization algorithm is released it “explodes” and causes unexpected behavior. This has already happened in practice with the SafeInt class, ironically enough [5, 6.5.2].
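As a small illustration of how implicit conversions change values, here is a minimal sketch. The function names are hypothetical and chosen only for this example; the explicit casts spell out what the compiler would otherwise do implicitly.

```cpp
#include <cstdint>

// Operands narrower than int are promoted to int before the addition,
// so the sum itself is computed without loss.
int promoted_sum(std::uint8_t a, std::uint8_t b) {
    return a + b;
}

// Storing the result back into the narrow type truncates it modulo 256.
std::uint8_t truncated_sum(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// In a mixed signed/unsigned comparison the signed operand is converted
// to unsigned; the cast makes explicit what `a < b` does implicitly.
bool mixed_less(int a, unsigned int b) {
    return static_cast<unsigned int>(a) < b;
}
```

With these definitions, adding 200 and 100 gives 300 as an int but 44 after truncation to 8 bits, and -1 compares as not less than 1u because it is first converted to a large unsigned value.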







Integer overflows open up a wide attack surface: they can force execution down a different path (if the overflow affects a conditional statement) or cause a buffer overflow. To get a concrete picture, look at specific CVEs, their causes, and their consequences. Naturally, it is better to look for integer overflows in open source products, so that you can read not only the description but also the code.
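A typical way an integer overflow turns into a buffer overflow is a wrapped size computation. The sketch below is a hypothetical example, not code taken from any particular CVE: it assumes a 32-bit size computed as count * elem_size and shows a checked alternative.

```cpp
#include <cstdint>
#include <limits>

// Unchecked size computation: unsigned 32-bit multiplication wraps modulo 2^32,
// so a huge element count can yield a tiny allocation size.
std::uint32_t alloc_size_unchecked(std::uint32_t count, std::uint32_t elem_size) {
    return count * elem_size;
}

// Checked variant: refuses the request when the product does not fit in 32 bits.
bool alloc_size_checked(std::uint32_t count, std::uint32_t elem_size,
                        std::uint32_t &out) {
    if (elem_size != 0 &&
        count > std::numeric_limits<std::uint32_t>::max() / elem_size) {
        return false;  // the multiplication would wrap around
    }
    out = count * elem_size;
    return true;
}
```

If the unchecked result is used to allocate a buffer that is then filled with count elements, the write runs far past the end of the allocation.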









In order not to reinvent the wheel, I looked at the code that detects integer overflow in the CppCheck static analyzer. Its approach is as follows:







  1. Determine if an expression is a binary operator.
  2. If so, check to see if both arguments are of integer type.
  3. Determine the size of the types.
  4. Check by means of calculations whether the value can go beyond its maximum or minimum limits.
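The last step of this list can be sketched as follows. This is not CppCheck's code, only a minimal model of the idea under the assumption that both operands are known 32-bit signed values: the operation is evaluated in a wider type and the result is compared with the limits of the narrower one. The names Op and overflows_int32 are hypothetical.

```cpp
#include <cstdint>
#include <limits>

enum class Op { Add, Sub, Mul };

// Evaluate the operation in 64 bits, where it cannot overflow for
// 32-bit operands, and check whether the result still fits in 32 bits.
bool overflows_int32(Op op, std::int32_t lhs, std::int32_t rhs) {
    const std::int64_t a = lhs;
    const std::int64_t b = rhs;
    std::int64_t result = 0;
    switch (op) {
    case Op::Add: result = a + b; break;
    case Op::Sub: result = a - b; break;
    case Op::Mul: result = a * b; break;
    }
    return result > std::numeric_limits<std::int32_t>::max() ||
           result < std::numeric_limits<std::int32_t>::min();
}
```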

    This did not bring clarity at that stage, though. There turn out to be many different cases, which makes the information harder to systematize. The CWE list put everything in its place. In total, the site distinguishes 9 types of integer overflow:

    • 190 - Integer Overflow
    • 191 - Integer Underflow
    • 192 - Integer Coercion Error
    • 193 - Off-by-one
    • 194 - Unexpected Sign Extension
    • 195 - Signed to Unsigned Conversion Error
    • 196 - Unsigned to Signed Conversion Error
    • 197 - Numeric Truncation Error
    • 198 - Use of Incorrect Byte Ordering


Looking at the cause of each variant, we see that overflows arise from incorrect explicit or implicit casts. Since every cast appears in the abstract syntax tree, we will use the AST for the analysis. The figure below (Fig. 3) shows that any operation that causes a cast is a separate node in the tree, so by walking the tree we can check all type conversions against a table of conversions that may cause an error.







            Sign g   Sign l   Sign e   Unsign g   Unsign l   Unsign e
    Sign      +        -        +         -          -          -
    Unsign    +        -        -         -          -          +








More specifically, the algorithm is as follows: we walk over the cast nodes and look for IntegralCast (integer conversions). When a suitable node is found, we inspect its descendants in search of a binary operation or a Decl (variable declaration). In the first case, we check the signedness and bit width of the types used by the binary operation. In the second case, we compare only the declared type.
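The per-cast decision can be modeled without any Clang API. The sketch below is an assumption-laden illustration: it applies the usual value-range argument (a conversion is safe when every source value is representable in the destination) rather than reproducing the table above, and the IntType struct and preserves_value function are hypothetical names.

```cpp
// Minimal description of an integer type: signedness and width in bits.
struct IntType {
    bool is_signed;
    unsigned bits;
};

// Returns true when every value of `from` is representable in `to`,
// i.e. the integral cast cannot change the value.
bool preserves_value(IntType from, IntType to) {
    if (from.is_signed == to.is_signed) {
        return to.bits >= from.bits;   // same signedness: widening or equal size
    }
    if (from.is_signed) {
        return false;                  // signed -> unsigned loses negative values
    }
    return to.bits > from.bits;        // unsigned -> signed needs one extra bit
}
```

A checker built on this idea would report every IntegralCast for which the function returns false and then refine the result using the operands found among the node's descendants.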







Checker implementation



Let's get down to implementation. We need a skeleton for the checker, which can be a stand-alone library or be built as part of Clang. The difference in code is small. If you plan to write your own plugin, I recommend reading the short PDF “Clang Static Analyzer: A Checker Developer's Guide” first: the basics are described well there. Some of it may be out of date, because the library is updated regularly, but you will grasp the fundamentals right away.







If you want to add your checker to your own Clang build, you need to:







  1. Write the checker itself with approximately the following content:







     using namespace clang;
     using namespace ento;

     namespace {
     class SuperChecker : public Checker<check::PreStmt<BinaryOperator>> {
     public:
       // Options that can be passed to the checker.
       struct CheckerOpts {
         std::string FlagOne;
         int FlagTwo;
       };

       CheckerOpts Opts;
       // checker logic goes here
     };
     } // end anonymous namespace

     void ento::registerSuperChecker(CheckerManager &mgr) {
       auto checker = mgr.registerChecker<SuperChecker>();

       // Read the checker options supplied via -analyzer-config.
       AnalyzerOptions &AnOpts = mgr.getAnalyzerOptions();
       SuperChecker::CheckerOpts &ChOpts = checker->Opts;
       ChOpts.FlagOne = AnOpts.getCheckerStringOption("Inp1", "", checker);
       // getCheckerIntegerOption: option name, default value, checker
       ChOpts.FlagTwo = AnOpts.getCheckerIntegerOption("Inp2", 0, checker);
     }
          
          





  2. Then, in the Clang source tree, edit two files: ${llvm-source-path}/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt and ${llvm-source-path}/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td.

    In the first, just add the name of the file with your code; in the second, add a structural description:







      // Checkers.td
      def SuperChecker : Checker<"SuperChecker">,
        HelpText<"test checker">,
        Documentation<HasDocumentation>;
          
          







If anything is unclear, the Checkers.td file contains plenty of examples of how to do this.







Most likely you will not want to rebuild Clang and will choose to build the checker as a library (so / dll). In that case the checker code should look something like this:







    using namespace clang;
    using namespace ento;

    namespace {
    class SuperChecker : public Checker<check::PreStmt<BinaryOperator>> {
    public:
      // Options that can be passed to the checker.
      struct CheckerOpts {
        std::string FlagOne;
        int FlagTwo;
      };

      CheckerOpts Opts;
      // checker logic goes here
    };
    } // end anonymous namespace

    void initializationFunction(CheckerManager &mgr) {
      SuperChecker *checker = mgr.registerChecker<SuperChecker>();

      // Read the checker options supplied via -analyzer-config.
      AnalyzerOptions &AnOpts = mgr.getAnalyzerOptions();
      SuperChecker::CheckerOpts &ChOpts = checker->Opts;
      ChOpts.FlagOne = AnOpts.getCheckerStringOption("Inp1", "", checker);
      // getCheckerIntegerOption: option name, default value, checker
      ChOpts.FlagTwo = AnOpts.getCheckerIntegerOption("Inp2", 0, checker);
    }

    // Entry point that the analyzer looks up when loading the plugin.
    extern "C" void clang_registerCheckers(CheckerRegistry &registry) {
      registry.addChecker(&initializationFunction, "test.Me",
                          "SuperChecker description", "doc_link");
    }

    // Must match the version of the Clang that loads the plugin.
    extern "C" const char clang_analyzerAPIVersionString[] = "8.0.1";
      
      





Next, build your code. You can write your own build script, but if you have trouble with that (as the author did :)), you can take a slightly odd route and use the Makefile in the Clang source tree together with the make clangStaticAnalyzerCheckers command.







Next, call the checker:















To fix this, we can:









To support the reasoning that follows, it is worth mentioning that when Clang analyzes a file, all files named in #include directives are parsed as well, so the resulting AST grows. As a result, only one of the proposed options is rational for this particular task:









The rest of the internship was spent reading GenericTaintChecker.cpp and trying to adapt it to my needs. I did not manage to finish this by the end of the term, so it remains a task for refinement outside the DSec program. During development it also became clear that identifying dangerous functions is a separate task: dangerous places in a project do not always come from standard functions, so a flag was added to the checker that specifies a list of functions to be treated as “poisoned” / “marked” during taint analysis.
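To make the idea of a user-supplied list of “poisoned” functions concrete, here is a toy model of taint propagation. It is not related to the GenericTaintChecker implementation; the Assign struct, the tainted_vars function, and the straight-line program representation are all assumptions made for this sketch.

```cpp
#include <set>
#include <string>
#include <vector>

// One simplified statement: `dst = callee(srcs...)`,
// or a plain copy `dst = src` when callee is empty.
struct Assign {
    std::string dst;
    std::string callee;
    std::vector<std::string> srcs;
};

// Walks a straight-line program and returns the variables that are tainted
// at the end. A value is tainted if it comes from a poisoned function
// or is computed from an already tainted variable.
std::set<std::string> tainted_vars(const std::vector<Assign> &program,
                                   const std::set<std::string> &poisoned) {
    std::set<std::string> tainted;
    for (const Assign &stmt : program) {
        bool taint = poisoned.count(stmt.callee) > 0;
        for (const std::string &src : stmt.srcs) {
            taint = taint || tainted.count(src) > 0;
        }
        if (taint) {
            tainted.insert(stmt.dst);
        } else {
            tainted.erase(stmt.dst);  // overwritten with a clean value
        }
    }
    return tainted;
}
```

An overflow checker could then restrict its warnings to casts and arithmetic whose operands are in the tainted set, which narrows the output to values an attacker can influence.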

Additionally, a check was added to determine whether a variable is a bit field. Standard CSA facilities derive the size from the type, so for a bit field the reported size is the bit width of the underlying type rather than the number of bits given in the declaration.
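The difference between the declared type and the real width of a bit field is easy to demonstrate. The struct and function below are hypothetical and serve only as an illustration: the field is declared with a 32-bit type but holds just 3 bits.

```cpp
#include <cstdint>

struct Flags {
    std::uint32_t small : 3;   // only 3 bits wide: holds values 0..7
    std::uint32_t rest : 29;
};

// Stores the value in the 3-bit field and returns what was actually kept.
// For an unsigned bit field the value is reduced modulo 2^3.
unsigned store_in_small(unsigned value) {
    Flags f{};
    f.small = value;
    return f.small;
}
```

Judging by the type alone, any value up to 2^32 - 1 fits into the field, while in reality storing 9 leaves 1 in it; this is why the checker has to look at the declared bit width.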







What is the result?



At the moment, a simple checker has been implemented that can only warn about potential integer overflows. There is also a modified class for taint analysis, which still needs a lot of work. After that, an SMT solver is needed to determine overflows. The Z3 SMT solver is suitable for this; it was added to the Clang build in version 5.0.0 (judging by the release notes). To use the solver, Clang must be built with the option CLANG_ANALYZER_BUILD_Z3=ON, and when the CSA plugin is invoked directly, the flags -Xanalyzer -analyzer-constraints=z3 must be passed.







GitHub Results Repository







References:



  1. Howard M., LeBlanc D., Viega J. "The 24 Sins of Computer Security"
  2. How to Write a Checker in 24 Hours
  3. Clang Static Analyzer: A Checker Developer's Guide
  4. CSA checker development manual
  5. Dietz W. et al. Understanding Integer Overflow in C/C++









