malloc-tag

A lightweight, intrusive memory profiler for Linux systems that lets you categorize memory allocations inside C/C++ projects.

In short, this project provides a small dynamic library (.so) containing a malloc interposer plus a basic tagging facility that lets you instrument your code and attach a tag (category) to each memory allocation inside C/C++ projects. Malloc interposition (more on that later) makes it possible to intercept any memory allocation done through the standard malloc() function or the C++ new operator, even when it happens inside third-party libraries, the C++ standard library, etc.

The malloc interposer provided by this project is guaranteed to perform only O(1) counter updates before handing off to the original glibc malloc() implementation. In practice, the per-malloc() overhead imposed by this library boils down to a few conditional jumps, a pointer dereference and an integer addition.
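As a rough illustration of what "O(1) counter updates" can look like, here is a minimal sketch. It is not malloc-tag's actual code: ThreadStats, g_stats and counting_malloc are hypothetical names, and the wrapper is deliberately not called malloc so that it can be compiled into any ordinary program without interposing anything.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical per-thread counters; not malloc-tag's real data layout.
struct ThreadStats {
    bool enabled = true;              // per-thread enable/disable flag
    std::size_t num_mallocs = 0;      // number of allocations seen
    std::size_t bytes_requested = 0;  // sum of requested sizes
};

// One instance per thread, so no lock is needed on the hot path.
static thread_local ThreadStats g_stats;

// Stand-in for an interposed malloc(): constant-time bookkeeping,
// then the real work is delegated to the underlying allocator.
void* counting_malloc(std::size_t size) {
    ThreadStats* stats = &g_stats;        // pointer dereference
    if (stats->enabled) {                 // conditional jump
        stats->num_mallocs += 1;          // integer additions
        stats->bytes_requested += size;
    }
    return std::malloc(size);
}
```

The bookkeeping does not depend on the allocation size or on how many allocations came before, which is what keeps the per-call cost constant.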

In summary, this library enables minimal-overhead, per-thread memory profiling. It is designed to be integrated as an "always on" profiler rather than being enabled only for debugging purposes. The goal is to help developers answer questions like:

  • Why is your application using so much memory?
  • In which part of the code should you focus your attention to decrease the memory footprint of your application?
  • Is there a thread/part of the code allocating abnormal amounts of memory?

Additionally, this project provides Python-based tools to post-process and manipulate the JSON files produced by malloc-tag, which contain "memory usage snapshots".

High-Level Design Criteria

  • close-to-zero overhead compared to original glibc malloc/free implementations to allow "always on" behavior
  • zero memory allocations inside the malloc/free hooks themselves
  • multi-threading friendly: no global mutexes (across all application's threads)
  • fast memory accounting (happening for each malloc/free operation), slow reporting (this is expected to be a human-driven process for profiling purposes)
  • C++ aware

Technical Implementation

Overview

A graphical overview of how malloc-tag works:

overview_svg

You may be wondering: how is it possible to "intercept" every malloc() call made by "Your application" and route it to the "MallocTag library"? The answer is "ELF interposition": if the malloctag.so shared library is loaded by the dynamic linker BEFORE the glibc shared library, then the process image of "Your application" will use the malloc() defined by malloctag instead of the function with the identical signature available inside the GNU libc. See the excellent page https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic for an in-depth overview of ELF interposition. Focus on the following two statements while reading that page:

  • If a dynamic symbol is defined by multiple components, they don't conflict.
  • For a symbol lookup [...] the definition from the first component wins.

Internals

To achieve the high-level design criteria listed above, the following implementation choices have been made:

  • each per-thread memory stat tree has a limited number of levels, pre-defined at build time
  • each tree level has a "tag" or "category" which is a limited-size string (limit pre-defined at build time)
  • max tree depth (i.e. number of nested nodes) is pre-defined at build time
  • per-thread enable/disable flag
  • per-thread tree of malloc categories
  • per-thread mutex to synchronize the "collector thread" (i.e. the thread using the main "malloc_collect_stats" API) and all other application threads
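The choices above can be sketched as a fixed-capacity tree that never allocates while accounting. This is only an illustration under stated assumptions: the limits kMaxTagLen, kMaxNodes and kMaxLevels, and the types Node, ScopeTree and ScopeGuard are hypothetical and are not taken from malloc-tag's sources. The per-thread mutex and enable flag are left out for brevity.

```cpp
#include <cstddef>
#include <cstring>

// Assumed build-time limits (hypothetical values).
constexpr std::size_t kMaxTagLen = 16;  // tag buffer size, including NUL
constexpr std::size_t kMaxNodes = 32;   // nodes per tree
constexpr std::size_t kMaxLevels = 4;   // maximum nesting depth

struct Node {
    char tag[kMaxTagLen];    // limited-size category name
    int parent;              // index of the parent node, -1 for the root
    std::size_t self_bytes;  // bytes accounted directly to this node
};

// One tree per thread in a real profiler (e.g. a thread_local instance).
struct ScopeTree {
    Node nodes[kMaxNodes];   // preallocated storage: no malloc() needed
    std::size_t num_nodes = 0;
    int current = 0;         // node receiving the accounting right now
    std::size_t depth = 1;   // level of the current node (root is 1)

    explicit ScopeTree(const char* root_tag) { add_node(root_tag, -1); }

    // Enter a nested scope; returns false when a build-time limit is hit,
    // in which case allocations stay accounted to the current node.
    bool push(const char* tag) {
        for (std::size_t i = 0; i < num_nodes; ++i) {
            if (nodes[i].parent == current &&
                std::strncmp(nodes[i].tag, tag, kMaxTagLen - 1) == 0) {
                current = static_cast<int>(i);  // reuse the existing child
                ++depth;
                return true;
            }
        }
        if (num_nodes == kMaxNodes || depth == kMaxLevels) return false;
        current = add_node(tag, current);
        ++depth;
        return true;
    }

    // Leave the current scope and return to its parent.
    void pop() {
        if (nodes[current].parent >= 0) {
            current = nodes[current].parent;
            --depth;
        }
    }

    // Account an allocation against the current scope.
    void track(std::size_t bytes) { nodes[current].self_bytes += bytes; }

    int add_node(const char* tag, int parent) {
        Node& n = nodes[num_nodes];
        std::strncpy(n.tag, tag, kMaxTagLen - 1);  // truncate long tags
        n.tag[kMaxTagLen - 1] = '\0';
        n.parent = parent;
        n.self_bytes = 0;
        return static_cast<int>(num_nodes++);
    }
};

// RAII helper in the spirit of MallocTagScope: enter on construction,
// leave on destruction.
class ScopeGuard {
public:
    ScopeGuard(ScopeTree& tree, const char* tag)
        : tree_(tree), entered_(tree.push(tag)) {}
    ~ScopeGuard() { if (entered_) tree_.pop(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    ScopeTree& tree_;
    bool entered_;
};
```

Because every limit is fixed up front, entering a scope and accounting an allocation touch only preallocated memory, which is consistent with the "zero memory allocations inside the hooks" criterion.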

Example output

The malloc-tag profiler can produce output in a machine-friendly JSON format (see e.g. the minimal example JSON output) or in SVG format through the Graphviz DOT utility (see e.g. the minimal example SVG output):

minimal_example_svg

From the picture above it becomes obvious that to improve/reduce memory usage, all the allocations accounted against the "minimal" scope and all the code executing in the malloc scope "FuncC" should be improved, since they have the highest self-memory allocation, as emphasized by the darkest red shade.

Profiling a more complex example, a simple application spawning 5 secondary pthreads, produces a graph like this:

multithread_example_svg

From this picture it should be evident that all the memory allocations happen, regardless of the thread, in the malloc scope named "FuncB" (look at the self memory usage of that node and also at the number of malloc operations!).

How to use

Part 1: instrumenting the code

  1. build and install this project:
git clone https://github.com/f18m/malloc-tag.git
cd malloc-tag
make && make install
  2. add -lmalloc_tag to your C/C++ project linker flags in order to link against the malloc-tag library (see the caveat about tcmalloc below)

  3. add the malloc-tag initialization as close as possible to the entry point of your application, e.g. as the first instruction in your main(), using:

#include <malloc_tag.h>

int main() {
  MallocTagEngine::init();
  ...
}
  4. whenever you want a snapshot of the memory profiling results to be written, invoke the API that writes the results to disk:
MallocTagEngine::write_stats();

This function can be placed at the very end of your main() or at any other exit point. Alternatively, it can be hooked to a signal, e.g. SIGUSR1, so that you can write the statistics whenever you want at runtime.

  5. optional: start by adding a few instances of MallocTagScope to "tag" the parts of your application that you believe are the most memory-hungry:
MallocTagScope nestedMallocScope("someInterestingPart");
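
One possible way to hook the statistics dump to SIGUSR1, as suggested in the steps above, is sketched here. It assumes that write_stats() is not safe to call directly from a signal handler (the README does not say either way), so the handler only sets a flag and the actual dump runs later from normal code. The names g_dump_requested, on_sigusr1, install_dump_signal and dump_if_requested are hypothetical helpers and are not part of malloc-tag.

```cpp
#include <csignal>

// Flag set by the handler; volatile sig_atomic_t can be written safely
// from inside a signal handler.
static volatile std::sig_atomic_t g_dump_requested = 0;

// The handler does the bare minimum: it records that a dump was requested.
extern "C" void on_sigusr1(int) { g_dump_requested = 1; }

// Call once at startup, e.g. right after MallocTagEngine::init().
void install_dump_signal() { std::signal(SIGUSR1, on_sigusr1); }

// Call periodically from normal (non-handler) code, e.g. a main loop.
// `dump` is any callable; in an instrumented application it could be
// a lambda that invokes MallocTagEngine::write_stats().
template <typename DumpFn>
bool dump_if_requested(DumpFn dump) {
    if (!g_dump_requested) return false;
    g_dump_requested = 0;
    dump();
    return true;
}
```

With this in place, sending SIGUSR1 to the process (for example with the kill command) requests a dump, which happens the next time dump_if_requested() is called.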

Part 2: run your application

After rebuilding your application with the malloc-tag instrumentation, you can run it as you normally would. If the directory where you installed "libmalloc_tag.so.1" is not a standard path searched by the dynamic linker, add that directory to the LD_LIBRARY_PATH environment variable.

Part 3: analyze the results

Analyzing the results can range from "very simple" to "very complex". For small applications, the suggested way is to graph the results directly, using the Graphviz package and the malloc-tag mtag-json2dot CLI utility to render the .json snapshot files produced by the MallocTagEngine::write_stats() API:

pip3 install malloctag-tools
mtag-json2dot --output nice-picture.svg  <malloc-tag-snapshot-file.json>

This produces a nice-picture.svg file that you can open with any suitable viewer. The mtag-json2dot utility is the same tool used to produce the example pictures shown in the Example output section.

As the snippet above shows, the post-processing tools for malloc-tag snapshots are published on PyPI. They are written in Python so that they can be extended easily to cover more use cases and investigations than those described here. In more complex scenarios and applications you will often need to post-process the JSON file containing the memory usage snapshot; please refer to the Python tools README for more information about e.g. the mtag-postprocess tool.

TcMalloc integration

If your C/C++ project uses tcmalloc, that's fine. malloc-tag has been tested together with tcmalloc, with the caveat that -lmalloc_tag must be passed to the linker BEFORE -ltcmalloc.

As explained in the Overview section, this works thanks to ELF interposition: the malloc() implementation of malloc-tag is the one that gets called, and it in turn uses the tcmalloc malloc() to carry out the actual memory allocation.

Environment variables

| Environment variable | Description |
|---|---|
| MTAG_STATS_OUTPUT_JSON | The relative/full path to the output JSON file written by MallocTagEngine::write_stats(). If empty, no JSON output file will be produced. |
| MTAG_STATS_OUTPUT_GRAPHVIZ_DOT | The relative/full path to the output Graphviz DOT file written by MallocTagEngine::write_stats(). If empty, no DOT output file will be produced. |
| MTAG_SNAPSHOT_INTERVAL_SEC | The time interval, in seconds, between two snapshots written by MallocTagEngine::write_snapshot_if_needed(). The special value zero means "disable snapshotting". |
| MTAG_SNAPSHOT_OUTPUT_PREFIX_FILE_PATH | The filename prefix for snapshots written by MallocTagEngine::write_snapshot_if_needed(). If empty, no snapshot will be written. |
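
An application may want to inspect these variables itself, for example to log at startup which outputs are configured. The sketch below shows one way to do that; MtagEnv, env_or_empty and read_mtag_env are hypothetical helpers, and this is not how malloc-tag parses the variables internally. Treating a non-numeric interval as zero is a choice made for this sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical view of the malloc-tag environment configuration.
struct MtagEnv {
    std::string json_path;                   // MTAG_STATS_OUTPUT_JSON
    std::string dot_path;                    // MTAG_STATS_OUTPUT_GRAPHVIZ_DOT
    unsigned long snapshot_interval_sec = 0; // 0 means snapshotting disabled
    std::string snapshot_prefix;             // MTAG_SNAPSHOT_OUTPUT_PREFIX_FILE_PATH
};

// Returns the variable's value, or an empty string when it is unset.
static std::string env_or_empty(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

MtagEnv read_mtag_env() {
    MtagEnv env;
    env.json_path = env_or_empty("MTAG_STATS_OUTPUT_JSON");
    env.dot_path = env_or_empty("MTAG_STATS_OUTPUT_GRAPHVIZ_DOT");
    env.snapshot_prefix = env_or_empty("MTAG_SNAPSHOT_OUTPUT_PREFIX_FILE_PATH");
    const std::string interval = env_or_empty("MTAG_SNAPSHOT_INTERVAL_SEC");
    if (!interval.empty()) {
        // strtoul yields 0 for non-numeric input, which reads as "disabled".
        env.snapshot_interval_sec = std::strtoul(interval.c_str(), nullptr, 10);
    }
    return env;
}
```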

License

Apache 2.0 License

TODO

  • make max_tree_nodes, max_tree_levels configurable via env vars
  • remove the C++ implementation of .dot output generation and keep only the mtag-json2dot tool?

FUTURE

  • explore bpftrace and generally-speaking eBPF-based approaches to memory observability
