Added AllocMonitor facility
A general system to watch allocations/deallocations
Dr15Jones committed Aug 28, 2023
1 parent f8827c3 commit d1421ac
Showing 19 changed files with 1,563 additions and 0 deletions.
4 changes: 4 additions & 0 deletions PerfTools/AllocMonitor/BuildFile.xml
@@ -0,0 +1,4 @@
<use name="FWCore/Utilities"/>
<export>
<lib name="1"/>
</export>
76 changes: 76 additions & 0 deletions PerfTools/AllocMonitor/README.md
@@ -0,0 +1,76 @@
# PerfTools/AllocMonitor Description

## Introduction

This package works with the PerfTools/AllocMonitorPreload package to provide a general facility to watch allocations and deallocations.
This is accomplished by using LD_PRELOAD with libPerfToolsAllocMonitorPreload.so and registering a class inheriting from `AllocMonitorBase`
with `AllocMonitorRegistry`. The preloaded library installs proxies for the C and C++ allocation functions, which forward each call to the
original implementation. These proxies communicate with `AllocMonitorRegistry`, which in turn calls the methods of the registered monitors.

## Extending

To add a new monitor, one inherits from `cms::perftools::AllocMonitorBase` and overrides the `allocCalled` and
`deallocCalled` methods.

- `AllocMonitorBase::allocCalled(size_t iRequestedSize, size_t iActualSize)` : `iRequestedSize` is the number of bytes being requested by the allocation call. `iActualSize` is the actual number of bytes returned by the allocator. These can differ because of alignment constraints (e.g. asking for 1 byte when all allocations must be aligned on a particular memory boundary) or internal details of the allocator.

- `AllocMonitorBase::deallocCalled(size_t iActualSize)` : `iActualSize` is the actual size returned when the associated allocation was made. NOTE: the glibc extended interface does not provide a way to find the requested size based on the address returned from an allocation; it only provides the actual size.
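A minimal sketch of a monitor that overrides both hooks, counting calls and summing byte sizes. To keep it compilable outside CMSSW it declares its own stand-in `AllocMonitorBase` with the same two pure virtual functions as `AllocMonitorBase.h`; the name `CountingMonitor` and its accessors are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Stand-in with the same two pure virtual hooks as
// cms::perftools::AllocMonitorBase, so this compiles outside CMSSW.
class AllocMonitorBase {
public:
  virtual ~AllocMonitorBase() = default;
  virtual void allocCalled(std::size_t iRequestedSize, std::size_t iActualSize) = 0;
  virtual void deallocCalled(std::size_t iActualSize) = 0;
};

// Hypothetical monitor: counts calls and sums byte sizes. Atomics are used
// because the hooks may run concurrently on many threads.
class CountingMonitor : public AllocMonitorBase {
public:
  void allocCalled(std::size_t iRequestedSize, std::size_t iActualSize) final {
    nAllocs_.fetch_add(1, std::memory_order_relaxed);
    requested_.fetch_add(iRequestedSize, std::memory_order_relaxed);
    actual_.fetch_add(iActualSize, std::memory_order_relaxed);
  }
  void deallocCalled(std::size_t iActualSize) final {
    nDeallocs_.fetch_add(1, std::memory_order_relaxed);
    freed_.fetch_add(iActualSize, std::memory_order_relaxed);
  }

  std::size_t nAllocs() const { return nAllocs_.load(); }
  std::size_t nDeallocs() const { return nDeallocs_.load(); }
  std::size_t requested() const { return requested_.load(); }
  std::size_t actual() const { return actual_.load(); }
  std::size_t freed() const { return freed_.load(); }

private:
  std::atomic<std::size_t> nAllocs_{0};
  std::atomic<std::size_t> nDeallocs_{0};
  std::atomic<std::size_t> requested_{0};
  std::atomic<std::size_t> actual_{0};
  std::atomic<std::size_t> freed_{0};
};
```

Inside CMSSW the class would instead derive from `cms::perftools::AllocMonitorBase` and be created through the registry rather than constructed directly.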

When implementing `allocCalled` and `deallocCalled` it is perfectly fine to do allocations/deallocations. The facility
guarantees that those internal allocations will not cause any callbacks to be sent to any active monitors.
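One way to provide this guarantee is a per-thread flag that is cleared while a callback runs. The sketch below mirrors the shape of the `Guard` struct in `AllocMonitorRegistry.h`, but uses a plain `thread_local bool` in place of `isRunning()`, whose implementation is not shown here; `onAllocation`, `CallbackGuard`, and `g_log` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-thread flag: true while callbacks may be delivered.
thread_local bool tl_callbacksEnabled = true;

// RAII guard: disables callbacks for its lifetime and restores the previous
// value afterwards; running() reports whether callbacks were enabled on entry.
class CallbackGuard {
public:
  CallbackGuard() noexcept : previous_(tl_callbacksEnabled) { tl_callbacksEnabled = false; }
  ~CallbackGuard() { tl_callbacksEnabled = previous_; }
  CallbackGuard(CallbackGuard const&) = delete;
  CallbackGuard& operator=(CallbackGuard const&) = delete;

  bool running() const noexcept { return previous_; }

private:
  bool previous_;
};

std::vector<std::string> g_log;  // hypothetical sink that itself allocates

// Hypothetical hook a proxy would call on every allocation.
void onAllocation(std::size_t iSize) {
  CallbackGuard guard;
  if (!guard.running()) {
    return;  // nested call made while a callback is already active on this thread
  }
  // Work done here may allocate freely; any nested hook call sees the flag cleared.
  g_log.push_back("alloc " + std::to_string(iSize));
  onAllocation(1);  // simulate a nested allocation triggered by the callback
}
```

Because the flag is `thread_local`, suppressing callbacks on one thread does not hide allocations made concurrently on other threads.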


To add a monitor to the facility, one must access the registry by calling the static method
`cms::perftools::AllocMonitorRegistry::instance()` and then call the member function
`T* createAndRegisterMonitor(ARGS&&... iArgs)`. The function will internally create a monitor of type `T` (being careful
to not cause callbacks during the allocation) and pass the arguments `iArgs` to the constructor.

The monitor is owned by the registry and should not be deleted by any other code. If one needs to control the lifetime
of the monitor, one can call `cms::perftools::AllocMonitorRegistry::deregisterMonitor`, which removes the monitor from
the callback list and deletes it (again, without the deallocation causing any callbacks).
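The ownership rules above can be illustrated with a small stand-in. `MiniRegistry`, `MonitorBase`, and `ThresholdMonitor` are hypothetical: the sketch copies only the shape of `createAndRegisterMonitor` and `deregisterMonitor` from `AllocMonitorRegistry.h` and leaves out the thread safety and callback suppression the real registry needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct MonitorBase {
  virtual ~MonitorBase() = default;
  virtual void allocCalled(std::size_t iRequested, std::size_t iActual) = 0;
  virtual void deallocCalled(std::size_t iActual) = 0;
};

// Hypothetical single-threaded stand-in for the registry's ownership model.
class MiniRegistry {
public:
  template <typename T, typename... ARGS>
  T* createAndRegisterMonitor(ARGS&&... iArgs) {
    auto m = std::make_unique<T>(std::forward<ARGS>(iArgs)...);
    T* p = m.get();
    monitors_.push_back(std::move(m));
    return p;  // non-owning pointer; the registry keeps ownership
  }

  // Removes the monitor from the callback list; the unique_ptr deletes it.
  void deregisterMonitor(MonitorBase* iMonitor) {
    auto it = std::find_if(
        monitors_.begin(), monitors_.end(), [iMonitor](auto const& m) { return m.get() == iMonitor; });
    if (it != monitors_.end()) {
      monitors_.erase(it);
    }
  }

  void notifyAlloc(std::size_t iRequested, std::size_t iActual) {
    for (auto& m : monitors_) {
      m->allocCalled(iRequested, iActual);
    }
  }
  std::size_t size() const { return monitors_.size(); }

private:
  std::vector<std::unique_ptr<MonitorBase>> monitors_;
};

// Hypothetical monitor whose constructor takes an argument forwarded by the registry.
class ThresholdMonitor : public MonitorBase {
public:
  explicit ThresholdMonitor(std::size_t iThreshold) : threshold_(iThreshold) {}
  void allocCalled(std::size_t iRequested, std::size_t) override {
    if (iRequested >= threshold_) {
      ++nLarge_;
    }
  }
  void deallocCalled(std::size_t) override {}
  std::size_t nLarge() const { return nLarge_; }

private:
  std::size_t threshold_;
  std::size_t nLarge_ = 0;
};
```

Usage: `auto* mon = reg.createAndRegisterMonitor<ThresholdMonitor>(1024);` returns a pointer that stays valid until `reg.deregisterMonitor(mon)` is called.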

## General usage

To use the facility, one needs to use LD_PRELOAD to load in the memory proxies before the application runs, e.g.
```
LD_PRELOAD=libPerfToolsAllocMonitorPreload.so cmsRun some_config_cfg.py
```

Internally, the program needs to register a monitor with the facility. When using `cmsRun` this is most easily done
by loading a Service which sets up a monitor. If the LD_PRELOAD was not done, the facility throws an exception when
the monitor is registered.

It is also possible to use LD_PRELOAD to load another library which automatically registers a monitor even before the
program begins. See PerfTools/MaxMemoryPreload for an example.

## Services

### SimpleAllocMonitor
This service registers a monitor when the service is created (after python parsing is finished but before any modules
have been loaded into cmsRun) and reports its accumulated information when the service is destroyed (services are the
last plugins to be destroyed by cmsRun). The monitor reports
- Total amount of bytes requested by all allocation calls
- The maximum amount of _used_ (i.e. actual size) allocated memory that was in use by the job at one time.
- Number of calls made to allocation functions while the monitor was running.
- Number of calls made to deallocation functions while the monitor was running.

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.
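Tracking the maximum memory in use from many threads needs an atomic "raise if larger" update. A minimal sketch, assuming every deallocation matches a tracked allocation; `PeakTracker` is a hypothetical name, and the compare-exchange loop follows the same pattern as `MonitorAdaptor::allocCalled` in this commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical tracker of bytes currently in use and the peak of that value.
class PeakTracker {
public:
  void allocated(std::size_t iActual) {
    // fetch_add returns the previous value, so add iActual to get the new total.
    std::size_t now = inUse_.fetch_add(iActual, std::memory_order_acq_rel) + iActual;
    std::size_t peak = peak_.load(std::memory_order_relaxed);
    // Raise the peak only if it is still lower; on failure compare_exchange_weak
    // reloads `peak` with the current value and the condition is re-checked.
    while (now > peak) {
      if (peak_.compare_exchange_weak(peak, now, std::memory_order_acq_rel)) {
        break;
      }
    }
  }
  void deallocated(std::size_t iActual) { inUse_.fetch_sub(iActual, std::memory_order_acq_rel); }

  std::size_t inUse() const { return inUse_.load(); }
  std::size_t peak() const { return peak_.load(); }

private:
  std::atomic<std::size_t> inUse_{0};
  std::atomic<std::size_t> peak_{0};
};
```

The peak depends on how allocations and deallocations from different threads interleave, which is why the reported maximum varies between multi-threaded runs.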


### EventProcessingAllocMonitor
This service starts its monitor at the end of beginJob (after all modules have been loaded and set up) and reports its accumulated information at the beginning of endJob (after the event loop has finished but before any cleanup is done). This can be useful in understanding how memory is used during the event loop. The monitor reports
- Total amount of bytes requested by all allocation calls during the event loop
- The maximum amount of _used_ (i.e. actual size) allocated memory that was in use in the event loop at one time.
- The amount of _used_ memory allocated during the loop that has not yet been deallocated.
- Number of calls made to allocation functions during the event loop.
- Number of calls made to deallocation functions during the event loop.

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.
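Memory allocated before the event loop may be freed during it, which would drive a "not yet deallocated" counter below zero. A sketch of one way to handle this, assuming the same start/stop gating as the service: it skips any subtraction that would underflow, and does the check and the subtraction in a single compare-exchange so two threads cannot both pass the check. `LoopMemoryTracker` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical tracker of memory allocated while started and not yet freed.
class LoopMemoryTracker {
public:
  void start() { started_.store(true, std::memory_order_release); }
  void stop() { started_.store(false, std::memory_order_release); }

  void allocCalled(std::size_t iActual) {
    if (!started_.load(std::memory_order_acquire)) {
      return;
    }
    present_.fetch_add(iActual, std::memory_order_acq_rel);
  }

  void deallocCalled(std::size_t iActual) {
    if (!started_.load(std::memory_order_acquire)) {
      return;
    }
    std::size_t present = present_.load(std::memory_order_acquire);
    // Subtract only while the result stays non-negative; a failed
    // compare_exchange_weak reloads `present` and the condition is re-tested.
    while (present >= iActual) {
      if (present_.compare_exchange_weak(present, present - iActual, std::memory_order_acq_rel)) {
        break;
      }
    }
  }

  std::size_t present() const { return present_.load(std::memory_order_acquire); }

private:
  std::atomic<std::size_t> present_{0};
  std::atomic<bool> started_{false};
};
```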

### HistogrammingAllocMonitor
This service registers a monitor when the service is created (after python parsing is finished but before any modules
have been loaded into cmsRun) and reports its accumulated information when the service is destroyed (services are the
last plugins to be destroyed by cmsRun). The monitor histograms the values into bins of byte counts whose boundaries are
successive powers of 2, so each bin is twice as wide as the previous one. The histograms made are
- Amount of bytes requested by all allocation calls
- Amount of bytes actually used by all allocation calls
- Amount of bytes actually returned by all deallocation calls

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.
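One plausible power-of-2 binning is to use the bit width of the size as the bin index. The service's source is not shown on this page, so the exact bin edges below are an assumption; `binIndex` and `Pow2Histogram` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>

// Bin 0 holds size 0; bin i (i >= 1) holds sizes in [2^(i-1), 2^i),
// so each bin is twice as wide as the one before it.
inline std::size_t binIndex(std::size_t iSize) {
  std::size_t bin = 0;
  while (iSize != 0) {
    iSize >>= 1;
    ++bin;
  }
  return bin;
}

// Hypothetical histogram with one atomic counter per bin; the largest
// possible index equals the bit width of std::size_t.
class Pow2Histogram {
public:
  static constexpr std::size_t kNBins = sizeof(std::size_t) * CHAR_BIT + 1;

  void fill(std::size_t iSize) { counts_[binIndex(iSize)].fetch_add(1, std::memory_order_relaxed); }
  std::size_t count(std::size_t iBin) const { return counts_.at(iBin).load(std::memory_order_relaxed); }

private:
  std::array<std::atomic<std::size_t>, kNBins> counts_{};
};
```

With 64-bit sizes this needs only 65 counters, and filling costs a few shifts, which keeps the work done inside an allocation callback small.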
50 changes: 50 additions & 0 deletions PerfTools/AllocMonitor/interface/AllocMonitorBase.h
@@ -0,0 +1,50 @@
#ifndef AllocMonitor_interface_AllocMonitorBase_h
#define AllocMonitor_interface_AllocMonitorBase_h
// -*- C++ -*-
//
// Package: AllocMonitor/interface
// Class : AllocMonitorBase
//
/**\class AllocMonitorBase AllocMonitorBase.h "AllocMonitorBase.h"
Description: Base class for extensions that monitor allocations
Usage:
The class is required to be thread safe as all member functions
will be called concurrently when used in a multi-threaded program.
If allocations are done within the methods, no callbacks will be
generated as the underlying system will temporarily suspend such
calls on the thread running the method.
*/
//
// Original Author: Christopher Jones
// Created: Mon, 21 Aug 2023 14:03:34 GMT
//

// system include files
#include <stddef.h> //size_t

// user include files

// forward declarations

namespace cms::perftools {

  class AllocMonitorBase {
  public:
    AllocMonitorBase();
    virtual ~AllocMonitorBase();

    AllocMonitorBase(const AllocMonitorBase&) = delete;             // stop default
    AllocMonitorBase(AllocMonitorBase&&) = delete;                  // stop default
    AllocMonitorBase& operator=(const AllocMonitorBase&) = delete;  // stop default
    AllocMonitorBase& operator=(AllocMonitorBase&&) = delete;       // stop default

    // ---------- member functions ---------------------------
    virtual void allocCalled(size_t iRequestedSize, size_t iActualSize) = 0;
    virtual void deallocCalled(size_t iActualSize) = 0;
  };
} // namespace cms::perftools
#endif
140 changes: 140 additions & 0 deletions PerfTools/AllocMonitor/interface/AllocMonitorRegistry.h
@@ -0,0 +1,140 @@
#ifndef PerfTools_AllocMonitor_AllocMonitorRegistry_h
#define PerfTools_AllocMonitor_AllocMonitorRegistry_h
// -*- C++ -*-
//
// Package: PerfTools/AllocMonitor
// Class : AllocMonitorRegistry
//
/**\class AllocMonitorRegistry AllocMonitorRegistry.h "AllocMonitorRegistry.h"
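 Description: Registry of AllocMonitorBase instances that are notified of allocations and deallocations
 Usage:
    Obtain the singleton via instance(), call createAndRegisterMonitor<T>(...) to add a monitor,
    and call deregisterMonitor(...) to remove and delete it.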
*/
//
// Original Author: Christopher Jones
// Created: Mon, 21 Aug 2023 14:12:54 GMT
//

// system include files
#include <memory>
#include <new>      // std::align_val_t, std::nothrow_t
#include <utility>  // std::forward
#include <vector>
#include <malloc.h>
#include <stdlib.h>

// user include files
#include "AllocMonitorBase.h"

// forward declarations

namespace cms::perftools {
  class AllocTester;

  class AllocMonitorRegistry {
  public:
    ~AllocMonitorRegistry();

    AllocMonitorRegistry(AllocMonitorRegistry&&) = delete;                  // stop default
    AllocMonitorRegistry(const AllocMonitorRegistry&) = delete;             // stop default
    AllocMonitorRegistry& operator=(const AllocMonitorRegistry&) = delete;  // stop default
    AllocMonitorRegistry& operator=(AllocMonitorRegistry&&) = delete;       // stop default

    // ---------- static member functions --------------------
    static AllocMonitorRegistry& instance();

    // ---------- member functions ---------------------------
    template <typename T, typename... ARGS>
    T* createAndRegisterMonitor(ARGS&&... iArgs);
    void deregisterMonitor(AllocMonitorBase*);

  private:
    friend void* ::malloc(size_t) noexcept;
    friend void* ::calloc(size_t, size_t) noexcept;
    friend void* ::realloc(void*, size_t) noexcept;
    friend void* ::aligned_alloc(size_t, size_t) noexcept;
    friend void ::free(void*) noexcept;

    friend void* ::operator new(std::size_t size);
    friend void* ::operator new[](std::size_t size);
    friend void* ::operator new(std::size_t count, std::align_val_t al);
    friend void* ::operator new[](std::size_t count, std::align_val_t al);
    friend void* ::operator new(std::size_t count, const std::nothrow_t& tag) noexcept;
    friend void* ::operator new[](std::size_t count, const std::nothrow_t& tag) noexcept;
    friend void* ::operator new(std::size_t count, std::align_val_t al, const std::nothrow_t&) noexcept;
    friend void* ::operator new[](std::size_t count, std::align_val_t al, const std::nothrow_t&) noexcept;

    friend void ::operator delete(void* ptr) noexcept;
    friend void ::operator delete[](void* ptr) noexcept;
    friend void ::operator delete(void* ptr, std::align_val_t al) noexcept;
    friend void ::operator delete[](void* ptr, std::align_val_t al) noexcept;
    friend void ::operator delete(void* ptr, std::size_t sz) noexcept;
    friend void ::operator delete[](void* ptr, std::size_t sz) noexcept;
    friend void ::operator delete(void* ptr, std::size_t sz, std::align_val_t al) noexcept;
    friend void ::operator delete[](void* ptr, std::size_t sz, std::align_val_t al) noexcept;
    friend void ::operator delete(void* ptr, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete[](void* ptr, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete(void* ptr, std::align_val_t al, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete[](void* ptr, std::align_val_t al, const std::nothrow_t& tag) noexcept;

    friend class AllocTester;

    // ---------- private member functions -------------------
    void start();
    bool& isRunning();

    // Clears the per-thread running flag for its lifetime and restores it on exit;
    // running() reports whether callbacks were enabled when the guard was created.
    struct Guard {
      explicit Guard(bool& iOriginal) noexcept : address_(&iOriginal), original_(iOriginal) { *address_ = false; }
      ~Guard() { *address_ = original_; }

      bool running() const noexcept { return original_; }

      Guard(Guard const&) = delete;
      Guard(Guard&&) = delete;
      Guard& operator=(Guard const&) = delete;
      Guard& operator=(Guard&&) = delete;

      bool* address_;
      bool original_;
    };

    Guard makeGuard() { return Guard(isRunning()); }

    void allocCalled_(size_t, size_t);
    void deallocCalled_(size_t);

    template <typename ALLOC, typename ACT>
    auto allocCalled(size_t iRequested, ALLOC iAlloc, ACT iGetActual) {
      [[maybe_unused]] Guard g = makeGuard();
      auto a = iAlloc();
      if (g.running()) {
        allocCalled_(iRequested, iGetActual(a));
      }
      return a;
    }
    template <typename DEALLOC, typename ACT>
    void deallocCalled(DEALLOC iDealloc, ACT iGetActual) {
      [[maybe_unused]] Guard g = makeGuard();
      if (g.running()) {
        deallocCalled_(iGetActual());
      }
      iDealloc();
    }

    AllocMonitorRegistry();

    // ---------- member data --------------------------------
    std::vector<std::unique_ptr<AllocMonitorBase>> monitors_;
  };

  template <typename T, typename... ARGS>
  T* AllocMonitorRegistry::createAndRegisterMonitor(ARGS&&... iArgs) {
    [[maybe_unused]] Guard guard = makeGuard();
    start();

    auto m = std::make_unique<T>(std::forward<ARGS>(iArgs)...);
    auto p = m.get();
    monitors_.push_back(std::move(m));
    return p;
  }
} // namespace cms::perftools
#endif
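The private `allocCalled`/`deallocCalled` templates above take the real allocation as a callable plus a second callable that recovers the actual size. A minimal sketch of how a proxy might use that shape, assuming glibc's `malloc_usable_size` supplies the actual size (consistent with the README's note about the glibc extended interface, but the preload library's source is not shown here). All names are hypothetical, and the sketch calls `std::malloc`/`std::free` directly instead of interposing on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <malloc.h>  // glibc extension: malloc_usable_size

// Hypothetical per-thread "callbacks enabled" flag.
thread_local bool tl_running = true;

// Single-threaded totals for the sketch.
std::size_t g_requested = 0;
std::size_t g_actual = 0;
std::size_t g_freed = 0;

// RAII helper: clears the flag for its lifetime and restores it afterwards.
struct SuppressCallbacks {
  bool wasRunning = tl_running;
  SuppressCallbacks() { tl_running = false; }
  ~SuppressCallbacks() { tl_running = wasRunning; }
  SuppressCallbacks(SuppressCallbacks const&) = delete;
  SuppressCallbacks& operator=(SuppressCallbacks const&) = delete;
};

// Same shape as AllocMonitorRegistry::allocCalled: do the real allocation,
// then report sizes only if callbacks were enabled on entry.
template <typename ALLOC, typename ACT>
auto trackedAlloc(std::size_t iRequested, ALLOC iAlloc, ACT iGetActual) {
  SuppressCallbacks guard;
  auto a = iAlloc();
  if (guard.wasRunning) {
    g_requested += iRequested;
    g_actual += iGetActual(a);
  }
  return a;
}

// The actual size must be read before the memory is released.
template <typename DEALLOC, typename ACT>
void trackedDealloc(DEALLOC iDealloc, ACT iGetActual) {
  SuppressCallbacks guard;
  if (guard.wasRunning) {
    g_freed += iGetActual();
  }
  iDealloc();
}

std::size_t usableSize(void* iPtr) { return iPtr ? malloc_usable_size(iPtr) : std::size_t{0}; }

// Hypothetical proxy bodies; a real preload library would define malloc/free themselves.
void* proxyMalloc(std::size_t iSize) {
  return trackedAlloc(
      iSize, [iSize]() { return std::malloc(iSize); }, [](void* p) { return usableSize(p); });
}

void proxyFree(void* iPtr) {
  trackedDealloc([iPtr]() { std::free(iPtr); }, [iPtr]() { return usableSize(iPtr); });
}
```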
3 changes: 3 additions & 0 deletions PerfTools/AllocMonitor/plugins/BuildFile.xml
@@ -0,0 +1,3 @@
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ServiceRegistry"/>
<use name="PerfTools/AllocMonitor"/>
98 changes: 98 additions & 0 deletions PerfTools/AllocMonitor/plugins/EventProcessingAllocMonitor.cc
@@ -0,0 +1,98 @@
// -*- C++ -*-
//
// Package: PerfTools/AllocMonitor
// Class : EventProcessingAllocMonitor
//
// Implementation:
//     Counts allocations made between postBeginJob and preEndJob and reports them via edm::LogSystem
//
// Original Author: Christopher Jones
// Created: Mon, 21 Aug 2023 20:31:57 GMT
//

// system include files
#include <atomic>

// user include files
#include "PerfTools/AllocMonitor/interface/AllocMonitorBase.h"
#include "PerfTools/AllocMonitor/interface/AllocMonitorRegistry.h"
#include "FWCore/ServiceRegistry/interface/ServiceRegistry.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ServiceRegistry/interface/ServiceMaker.h"

namespace {
  class MonitorAdaptor : public cms::perftools::AllocMonitorBase {
  public:
    void performanceReport() {
      started_.store(false, std::memory_order_release);

      auto finalRequested = requested_.load(std::memory_order_acquire);
      auto maxActual = maxActual_.load(std::memory_order_acquire);
      auto present = presentActual_.load(std::memory_order_acquire);
      auto allocs = nAllocations_.load(std::memory_order_acquire);
      auto deallocs = nDeallocations_.load(std::memory_order_acquire);

      edm::LogSystem("EventProcessingAllocMonitor")
          << "Event Processing Memory Report"
          << "\n total memory requested: " << finalRequested << "\n max memory used: " << maxActual
          << "\n total memory not deallocated: " << present << "\n # allocation calls: " << allocs
          << "\n # deallocation calls: " << deallocs;
    }

    void start() { started_.store(true, std::memory_order_release); }

  private:
    void allocCalled(size_t iRequested, size_t iActual) final {
      if (not started_.load(std::memory_order_acquire)) {
        return;
      }
      nAllocations_.fetch_add(1, std::memory_order_acq_rel);
      requested_.fetch_add(iRequested, std::memory_order_acq_rel);

      //returns previous value
      auto a = presentActual_.fetch_add(iActual, std::memory_order_acq_rel);
      a += iActual;

      auto max = maxActual_.load(std::memory_order_relaxed);
      while (a > max) {
        if (maxActual_.compare_exchange_strong(max, a, std::memory_order_acq_rel)) {
          break;
        }
      }
    }
    void deallocCalled(size_t iActual) final {
      if (not started_.load(std::memory_order_acquire)) {
        return;
      }
      nDeallocations_.fetch_add(1, std::memory_order_acq_rel);
      //memory allocated before start() may be freed here, so only subtract if the result
      // stays non-negative; checking and subtracting in one compare-exchange keeps two
      // threads from both passing the check and underflowing the counter
      auto present = presentActual_.load(std::memory_order_acquire);
      while (present >= iActual) {
        if (presentActual_.compare_exchange_strong(present, present - iActual, std::memory_order_acq_rel)) {
          break;
        }
      }
    }

    std::atomic<size_t> requested_ = 0;
    std::atomic<size_t> presentActual_ = 0;
    std::atomic<size_t> maxActual_ = 0;
    std::atomic<size_t> nAllocations_ = 0;
    std::atomic<size_t> nDeallocations_ = 0;

    std::atomic<bool> started_ = false;
  };

}  // namespace

class EventProcessingAllocMonitor {
public:
  EventProcessingAllocMonitor(edm::ParameterSet const& iPS, edm::ActivityRegistry& iAR) {
    auto adaptor = cms::perftools::AllocMonitorRegistry::instance().createAndRegisterMonitor<MonitorAdaptor>();
    iAR.postBeginJobSignal_.connect([adaptor]() { adaptor->start(); });
    iAR.preEndJobSignal_.connect([adaptor]() {
      adaptor->performanceReport();
      cms::perftools::AllocMonitorRegistry::instance().deregisterMonitor(adaptor);
    });
  }
};

DEFINE_FWK_SERVICE(EventProcessingAllocMonitor);