Skip to content

utbot inside

Olga Naumenko edited this page May 17, 2023 · 1 revision

How UTBot Works

UTBot uses KLEE symbolic execution engine to obtain information about program branches. C/C++ projects usually consist of source files and headers, which are compiled and linked into libraries and executable files (while libraries can also be linked to one another or other executables, forming a directed acyclic graph of project linkage). Executables and libraries can be considered as the artifacts of project building. KLEE operates with LLVM IR so, to symbolically execute code from a given product, UTBot needs the information about how the project is compiled and linked in order to be able to recompile it into LLVM IR. UTBot is able to get this information from project build system via Bear tool [more info]. UTBot supports CMake and Makefile build systems.

Then, UTBot analyses given sources using clang-tooling to obtain C/C++ AST of files under test. Note that clang-tooling requires a compilation database to work. Essentially, a compilation database is a structural representation of compile commands used to obtain C/C++ object files from project sources, and it is also retrieved from the build system by UTBot. UTBot treats functions as structural units of the project and generates tests for them. In order to achieve this, UTBot creates symbolic wrappers called KLEE files for every project source file [more info], that use symbolic variables concept for symbolic execution. Symbolic variables can be treated as logical constraints for variables, which are transformed into concrete values as the result of symbolic execution. Symbolic variables are the core of symbolic execution approach to software verification, and they are used in many successful tools for software verification and test generation.

Then, UTBot uses information about project compilation and linkage to rebuild the project in LLVM bitcode [more info]. This is done to pass the linkage result (.bc file) to KLEE. If a function calls other functions from the project, UTBot can provide KLEE with their bitcode as well, or, depending on user preferences, use generated symbolic stubs for them stubs.

KLEE produces files that represent test cases for a given function, and UTBot parses and transforms them to C++ test files (for supported C/C++ features, see C syntax, C++ syntax), which present the test cases with GoogleTest framework. UTBot also provides its users with a build system for generated tests, automatically compiling and linking them with project files. Users can run tests [more info] with a single click and see the results and coverage information in VSCode Panel. UTBot uses Makefiles for test runs [more info], so UTBot tests can be used in continuous integration.

UTBotCpp pipeline

The most important things are described below in sequential order.

Tests Generation

  1. Client sends request (one of types) to server. The corresponding object aggregates requests' information.
  2. Fetcher fetches types, methods, global variable usages, array usages, includes for all files in request. It uses LibTooling for parsing source code, traversing types, etc.
  3. SourceToHeaderRewriter generates headers (.h files) based on source (.c files).
  4. FeaturesFilter filters out all methods except the ones listed in supported syntax.
  5. Synchronizer synchronizes stubs and wrappers for all files in compile_commands.json.
  6. KleeGenerator generates klee files as well as source files, also builds them in bitcode files.
  7. Linker links bitcode files together into module.
  8. Linker writes Makefiles.
  9. KleeRunner runs KLEE on module consecutively for each method from request.
  10. KleeGenerator parses output of KLEE and writes files of GoogleTest.
  11. Server sends final response to client and also transfers generated files for remote scenario.

Tests Running

When user runs the tests, server performs various operations that might depend on the compiler used by the project.

  1. Client sends request to server.
  2. TestRunner chooses coverage tool based on the compiler used by the project. It is GcovCoverageTool in case of gcc and LlvmCoverageTool in case of clang.
  3. TestRunner collects list of tuples (file path, suite name, test name) based on the information from the client.
  4. TestRunner incrementally builds tests with instrumented options (coverage and sanitizer). To achieve this Makefiles are used.
  5. TestRunner runs tests by calling make on makefiles that correspond to the given sources.
  6. CoverageAndResultsGenerator collects coverage information using previously chosen tool.
  7. Both GcovCoverageTool and LlvmCoverageTool read generated .json file and determine set of covered lines.
  8. Finally, Server sends response to the client.
Clone this wiki locally