Commit
A general system to watch allocations/deallocations
Showing 19 changed files with 1,563 additions and 0 deletions.
@@ -0,0 +1,4 @@
```xml
<use name="FWCore/Utilities"/>
<export>
  <lib name="1"/>
</export>
```
@@ -0,0 +1,76 @@
# PerfTools/AllocMonitor Description

## Introduction

This package works with the PerfTools/AllocMonitorPreload package to provide a general facility to watch allocations and deallocations.
This is accomplished by using LD_PRELOAD with libPerfToolsAllocMonitorPreload.so and registering a class inheriting from `AllocMonitorBase`
with `AllocMonitorRegistry`. The preloaded library installs proxies for the C and C++ allocation functions (which forward the calls to the
original implementations used by the job). These proxies communicate with `AllocMonitorRegistry`, which in turn calls the methods of the registered monitors.

## Extending

To add a new monitor, one inherits from `cms::perftools::AllocMonitorBase` and overrides the `allocCalled` and
`deallocCalled` methods.

- `AllocMonitorBase::allocCalled(size_t iRequestedSize, size_t iActualSize)` : `iRequestedSize` is the number of bytes requested by the allocation call. `iActualSize` is the actual number of bytes returned by the allocator. These can differ because of alignment constraints (e.g. asking for 1 byte when all allocations must be aligned on a particular memory boundary) or internal details of the allocator.

- `AllocMonitorBase::deallocCalled(size_t iActualSize)` : `iActualSize` is the actual size returned when the associated allocation was made. NOTE: the glibc extended interface does not provide a way to find the requested size based on the address returned from an allocation; it only provides the actual size.

When implementing `allocCalled` and `deallocCalled` it is perfectly fine to do allocations/deallocations. The facility
guarantees that those internal allocations will not cause any callbacks to be sent to any active monitors.
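As a rough illustration of the two overrides, here is a minimal sketch of a monitor that compares requested and actual sizes. To keep it compilable on its own it declares a local stand-in for `AllocMonitorBase` in a hypothetical `sketch` namespace (real code would include the CMSSW header instead), and `OverheadMonitor` is a made-up name. Atomics are used because, per the base-class documentation, the callbacks may be invoked concurrently.

```cpp
// Sketch of a custom monitor. AllocMonitorBase below is a local stand-in that
// mirrors the interface of cms::perftools::AllocMonitorBase so this compiles
// on its own; real code would include the CMSSW header instead.
#include <atomic>
#include <cstddef>

namespace sketch {  // hypothetical namespace, not part of CMSSW

  class AllocMonitorBase {
  public:
    virtual ~AllocMonitorBase() = default;
    virtual void allocCalled(std::size_t iRequestedSize, std::size_t iActualSize) = 0;
    virtual void deallocCalled(std::size_t iActualSize) = 0;
  };

  // Hypothetical monitor: measures how many extra bytes the allocator hands
  // out beyond what was requested (alignment padding, size classes, ...).
  class OverheadMonitor : public AllocMonitorBase {
  public:
    void allocCalled(std::size_t iRequestedSize, std::size_t iActualSize) final {
      // Callbacks can arrive concurrently from many threads, hence atomics.
      requested_.fetch_add(iRequestedSize, std::memory_order_relaxed);
      actual_.fetch_add(iActualSize, std::memory_order_relaxed);
    }

    void deallocCalled(std::size_t) final {
      // Only the actual size is known here; this monitor ignores frees.
    }

    std::size_t requested() const { return requested_.load(); }
    std::size_t actual() const { return actual_.load(); }
    // Meant to be read after callbacks have stopped: the two totals are loaded
    // separately and are not a consistent snapshot while calls are in flight.
    std::size_t overhead() const { return actual() - requested(); }

  private:
    std::atomic<std::size_t> requested_{0};
    std::atomic<std::size_t> actual_{0};
  };

}  // namespace sketch
```

With illustrative sizes, calling `allocCalled(1, 24)` and `allocCalled(20, 24)` gives `requested() == 21`, `actual() == 48` and `overhead() == 27`; real actual sizes depend on the allocator.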

To add a monitor to the facility, one must access the registry by calling the static method
`cms::perftools::AllocMonitorRegistry::instance()` and then call the member function
`T* createAndRegisterMonitor(ARGS&&... iArgs)`. The function internally creates a monitor of type `T` (being careful
not to cause callbacks during the allocation), passes the arguments `iArgs` to its constructor, and returns a non-owning pointer to it.

The monitor is owned by the registry and must not be deleted by any other code. If one needs to control the lifetime
of the monitor, one can call `cms::perftools::AllocMonitorRegistry::deregisterMonitor` to have the monitor removed from
the callback list and deleted (again, without the deallocation causing any callbacks).
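The ownership rules can be modelled with a small toy registry. This is a sketch, not the CMSSW implementation: `ToyRegistry`, `ThresholdMonitor` and the `sketch` namespace are hypothetical, and the toy has no allocation proxies, no callback suppression and no locking. It only illustrates perfect-forwarding construction, registry-held ownership, and removal via `deregisterMonitor`.

```cpp
// Toy model of the ownership rules: the registry constructs the monitor,
// keeps the only owning pointer, and hands back a raw observer pointer.
// NOT the CMSSW implementation (no proxies, no callback suppression, no locks).
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace

  class AllocMonitorBase {
  public:
    virtual ~AllocMonitorBase() = default;
    virtual void allocCalled(std::size_t iRequestedSize, std::size_t iActualSize) = 0;
    virtual void deallocCalled(std::size_t iActualSize) = 0;
  };

  class ToyRegistry {
  public:
    template <typename T, typename... ARGS>
    T* createAndRegisterMonitor(ARGS&&... iArgs) {
      // Forward the arguments to T's constructor; the registry owns the result.
      auto m = std::make_unique<T>(std::forward<ARGS>(iArgs)...);
      T* p = m.get();
      monitors_.push_back(std::move(m));
      return p;
    }

    // Removes the monitor from the callback list and destroys it.
    void deregisterMonitor(AllocMonitorBase* iMonitor) {
      auto it = std::find_if(
          monitors_.begin(), monitors_.end(), [iMonitor](auto const& m) { return m.get() == iMonitor; });
      if (it != monitors_.end()) {
        monitors_.erase(it);
      }
    }

    // Forwards one allocation event to every registered monitor.
    void allocCalled(std::size_t iRequested, std::size_t iActual) {
      for (auto& m : monitors_) {
        m->allocCalled(iRequested, iActual);
      }
    }

    std::size_t size() const { return monitors_.size(); }

  private:
    std::vector<std::unique_ptr<AllocMonitorBase>> monitors_;
  };

  // Hypothetical monitor whose constructor takes an argument.
  class ThresholdMonitor : public AllocMonitorBase {
  public:
    explicit ThresholdMonitor(std::size_t iThreshold) : threshold_(iThreshold) {}
    void allocCalled(std::size_t iRequested, std::size_t) final {
      if (iRequested >= threshold_) {
        ++nLarge_;
      }
    }
    void deallocCalled(std::size_t) final {}
    std::size_t nLarge() const { return nLarge_; }

  private:
    std::size_t threshold_;
    std::size_t nLarge_ = 0;  // plain counter: fine for this single-threaded toy only
  };

}  // namespace sketch
```

After `deregisterMonitor(mon)` the pointer `mon` dangles, which is why the real registry warns against deleting monitors from any other code.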

## General usage

To use the facility, one needs to use LD_PRELOAD to load the memory proxies before the application runs, e.g.
```
LD_PRELOAD=libPerfToolsAllocMonitorPreload.so cmsRun some_config_cfg.py
```

Internally, the program needs to register a monitor with the facility. When using `cmsRun` this is most easily done
by loading a Service which sets up a monitor. If the LD_PRELOAD is omitted, the facility will throw an exception
when the monitor is registered.

It is also possible to use LD_PRELOAD to load another library which automatically registers a monitor even before the program
begins. See PerfTools/MaxMemoryPreload for an example.

## Services

### SimpleAllocMonitor
This service registers a monitor when the service is created (after Python parsing is finished but before any modules
have been loaded into cmsRun) and reports its accumulated information when the service is destroyed (services are the
last plugins to be destroyed by cmsRun). The monitor reports
- Total amount of bytes requested by all allocation calls
- The maximum amount of _used_ (i.e. actual size) allocated memory that was in use by the job at one time.
- Number of calls made to allocation functions while the monitor was running.
- Number of calls made to deallocation functions while the monitor was running.

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job, because the interleaving of allocations and deallocations across threads differs between runs.
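The "maximum amount of used memory" in such a report can be tracked without locks. The sketch below shows one way to do it, assuming (hypothetically) that every deallocated block was allocated while the monitor was active; `PeakTracker` and `simulate` are illustrative names, not SimpleAllocMonitor's actual code, and in real use the class would inherit from `AllocMonitorBase`. It also shows why the peak varies between multi-threaded runs: it depends on how the threads' calls interleave.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

namespace sketch {  // hypothetical namespace

  class PeakTracker {
  public:
    void allocCalled(std::size_t iRequested, std::size_t iActual) {
      nAllocations_.fetch_add(1, std::memory_order_relaxed);
      requested_.fetch_add(iRequested, std::memory_order_relaxed);
      // fetch_add returns the previous value; add iActual to get the new total.
      const std::size_t now = inUse_.fetch_add(iActual, std::memory_order_relaxed) + iActual;
      // Lock-free "atomic max": on failure compare_exchange_weak reloads max,
      // and the loop stops once the stored value is already >= now.
      std::size_t max = max_.load(std::memory_order_relaxed);
      while (now > max && !max_.compare_exchange_weak(max, now, std::memory_order_relaxed)) {
      }
    }

    void deallocCalled(std::size_t iActual) {
      nDeallocations_.fetch_add(1, std::memory_order_relaxed);
      // Assumes the block was allocated while this tracker was active.
      inUse_.fetch_sub(iActual, std::memory_order_relaxed);
    }

    std::size_t requested() const { return requested_.load(); }
    std::size_t inUse() const { return inUse_.load(); }
    std::size_t peak() const { return max_.load(); }
    std::size_t nAllocations() const { return nAllocations_.load(); }
    std::size_t nDeallocations() const { return nDeallocations_.load(); }

  private:
    std::atomic<std::size_t> requested_{0};
    std::atomic<std::size_t> inUse_{0};
    std::atomic<std::size_t> max_{0};
    std::atomic<std::size_t> nAllocations_{0};
    std::atomic<std::size_t> nDeallocations_{0};
  };

  // Hypothetical driver standing in for concurrent callbacks: each of nThreads
  // threads repeatedly allocates (64 bytes requested, 100 actual) and frees.
  inline void simulate(PeakTracker& t, int nThreads, int nIter) {
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i) {
      threads.emplace_back([&t, nIter] {
        for (int j = 0; j < nIter; ++j) {
          t.allocCalled(64, 100);
          t.deallocCalled(100);
        }
      });
    }
    for (auto& th : threads) {
      th.join();
    }
  }

}  // namespace sketch
```

With 4 threads each holding at most one 100-byte block, the reported peak can land anywhere from 100 to 400 bytes depending on scheduling, while the counts and the final in-use value are deterministic.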

### EventProcessingAllocMonitor
This service starts monitoring at the end of beginJob (after all modules have been loaded and set up) and reports its accumulated information at the beginning of endJob (after the event loop has finished but before any cleanup is done). This can be useful in understanding how memory is being used during the event loop. The monitor reports
- Total amount of bytes requested by all allocation calls during the event loop
- The maximum amount of _used_ (i.e. actual size) allocated memory that was in use in the event loop at one time.
- The amount of _used_ memory allocated during the loop that has not yet been reclaimed by a deallocation call.
- Number of calls made to allocation functions during the event loop.
- Number of calls made to deallocation functions during the event loop.

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.

### HistogrammingAllocMonitor
This service registers a monitor when the service is created (after Python parsing is finished but before any modules
have been loaded into cmsRun) and reports its accumulated information when the service is destroyed (services are the
last plugins to be destroyed by cmsRun). The monitor histograms the values into bins of number of bytes, where the
upper edge of each bin is twice that of the previous bin. The histograms made are
- Amount of bytes requested by all allocation calls
- Amount of bytes actually used by all allocation calls
- Amount of bytes actually returned by all deallocation calls

This service is multi-thread safe. Note that when run multi-threaded the maximum reported value will vary from job to job.
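Power-of-two binning amounts to taking the bit width of the size. The exact bin edges used by HistogrammingAllocMonitor are not shown here, so the layout below (bin 0 for 0 bytes, bin k for sizes in [2^(k-1), 2^k)) is an assumption, and `binFor` and `SizeHistogram` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

namespace sketch {  // hypothetical namespace

  // Bin 0 holds 0 bytes; bin k (k >= 1) holds sizes in [2^(k-1), 2^k).
  // This is the bit width of n (same result as C++20 std::bit_width).
  inline unsigned binFor(std::size_t n) {
    unsigned bin = 0;
    while (n != 0) {
      n >>= 1;
      ++bin;
    }
    return bin;
  }

  // One bin per possible bit width of size_t, plus the zero bin.
  constexpr std::size_t kNBins = sizeof(std::size_t) * 8 + 1;

  // Thread-safe histogram: concurrent fills only touch independent atomics.
  class SizeHistogram {
  public:
    void fill(std::size_t n) { bins_[binFor(n)].fetch_add(1, std::memory_order_relaxed); }
    std::size_t count(unsigned bin) const { return bins_[bin].load(std::memory_order_relaxed); }

  private:
    std::array<std::atomic<std::size_t>, kNBins> bins_{};
  };

}  // namespace sketch
```

For example, sizes 5 and 7 both fall in bin 3 ([4, 8)) while 8 starts bin 4, so a handful of bins covers every size from one byte up to the full address space.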
@@ -0,0 +1,50 @@
```cpp
#ifndef AllocMonitor_interface_AllocMonitorBase_h
#define AllocMonitor_interface_AllocMonitorBase_h
// -*- C++ -*-
//
// Package:     AllocMonitor/interface
// Class  :     AllocMonitorBase
//
/**\class AllocMonitorBase AllocMonitorBase.h "AllocMonitorBase.h"
 Description: Base class for extensions that monitor allocations
 Usage:
    The class is required to be thread safe as all member functions
 will be called concurrently when used in a multi-threaded program.
    If allocations are done within the methods, no callbacks will be
 generated as the underlying system will temporarily suspend such
 calls on the thread running the method.
*/
//
// Original Author:  Christopher Jones
//         Created:  Mon, 21 Aug 2023 14:03:34 GMT
//

// system include files
#include <stddef.h>  //size_t

// user include files

// forward declarations

namespace cms::perftools {

  class AllocMonitorBase {
  public:
    AllocMonitorBase();
    virtual ~AllocMonitorBase();

    AllocMonitorBase(const AllocMonitorBase&) = delete;             // stop default
    AllocMonitorBase(AllocMonitorBase&&) = delete;                  // stop default
    AllocMonitorBase& operator=(const AllocMonitorBase&) = delete;  // stop default
    AllocMonitorBase& operator=(AllocMonitorBase&&) = delete;       // stop default

    // ---------- member functions ---------------------------
    virtual void allocCalled(size_t iRequestedSize, size_t iActualSize) = 0;
    virtual void deallocCalled(size_t iActualSize) = 0;
  };
}  // namespace cms::perftools
#endif
```
PerfTools/AllocMonitor/interface/AllocMonitorRegistry.h (140 additions, 0 deletions)
@@ -0,0 +1,140 @@
```cpp
#ifndef PerfTools_AllocMonitor_AllocMonitorRegistry_h
#define PerfTools_AllocMonitor_AllocMonitorRegistry_h
// -*- C++ -*-
//
// Package:     PerfTools/AllocMonitor
// Class  :     AllocMonitorRegistry
//
/**\class AllocMonitorRegistry AllocMonitorRegistry.h "AllocMonitorRegistry.h"
 Description: Singleton that owns the registered monitors and forwards allocation/deallocation callbacks to them
 Usage:
    Call AllocMonitorRegistry::instance().createAndRegisterMonitor<T>(args...) to add a monitor;
    the registry owns it until deregisterMonitor is called.
*/
//
// Original Author:  Christopher Jones
//         Created:  Mon, 21 Aug 2023 14:12:54 GMT
//

// system include files
#include <cstddef>  // std::size_t
#include <memory>
#include <new>      // std::align_val_t, std::nothrow_t
#include <utility>  // std::forward, std::move
#include <vector>
#include <malloc.h>
#include <stdlib.h>

// user include files
#include "AllocMonitorBase.h"

// forward declarations

namespace cms::perftools {
  class AllocTester;

  class AllocMonitorRegistry {
  public:
    ~AllocMonitorRegistry();

    AllocMonitorRegistry(AllocMonitorRegistry&&) = delete;                  // stop default
    AllocMonitorRegistry(const AllocMonitorRegistry&) = delete;             // stop default
    AllocMonitorRegistry& operator=(const AllocMonitorRegistry&) = delete;  // stop default
    AllocMonitorRegistry& operator=(AllocMonitorRegistry&&) = delete;       // stop default

    // ---------- static member functions --------------------
    static AllocMonitorRegistry& instance();

    // ---------- member functions ---------------------------
    template <typename T, typename... ARGS>
    T* createAndRegisterMonitor(ARGS&&... iArgs);
    void deregisterMonitor(AllocMonitorBase*);

  private:
    friend void* ::malloc(size_t) noexcept;
    friend void* ::calloc(size_t, size_t) noexcept;
    friend void* ::realloc(void*, size_t) noexcept;
    friend void* ::aligned_alloc(size_t, size_t) noexcept;
    friend void ::free(void*) noexcept;

    friend void* ::operator new(std::size_t size);
    friend void* ::operator new[](std::size_t size);
    friend void* ::operator new(std::size_t count, std::align_val_t al);
    friend void* ::operator new[](std::size_t count, std::align_val_t al);
    friend void* ::operator new(std::size_t count, const std::nothrow_t& tag) noexcept;
    friend void* ::operator new[](std::size_t count, const std::nothrow_t& tag) noexcept;
    friend void* ::operator new(std::size_t count, std::align_val_t al, const std::nothrow_t&) noexcept;
    friend void* ::operator new[](std::size_t count, std::align_val_t al, const std::nothrow_t&) noexcept;

    friend void ::operator delete(void* ptr) noexcept;
    friend void ::operator delete[](void* ptr) noexcept;
    friend void ::operator delete(void* ptr, std::align_val_t al) noexcept;
    friend void ::operator delete[](void* ptr, std::align_val_t al) noexcept;
    friend void ::operator delete(void* ptr, std::size_t sz) noexcept;
    friend void ::operator delete[](void* ptr, std::size_t sz) noexcept;
    friend void ::operator delete(void* ptr, std::size_t sz, std::align_val_t al) noexcept;
    friend void ::operator delete[](void* ptr, std::size_t sz, std::align_val_t al) noexcept;
    friend void ::operator delete(void* ptr, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete[](void* ptr, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete(void* ptr, std::align_val_t al, const std::nothrow_t& tag) noexcept;
    friend void ::operator delete[](void* ptr, std::align_val_t al, const std::nothrow_t& tag) noexcept;

    friend class AllocTester;

    // ---------- private member functions -------------------
    void start();
    bool& isRunning();

    // RAII helper: turns off callback reporting for its scope and restores
    // the previous state on destruction.
    struct Guard {
      explicit Guard(bool& iOriginal) noexcept : address_(&iOriginal), original_(iOriginal) { *address_ = false; }
      ~Guard() { *address_ = original_; }

      bool running() const noexcept { return original_; }

      Guard(Guard const&) = delete;
      Guard(Guard&&) = delete;
      Guard& operator=(Guard const&) = delete;
      Guard& operator=(Guard&&) = delete;

      bool* address_;
      bool original_;
    };

    Guard makeGuard() { return Guard(isRunning()); }

    void allocCalled_(size_t, size_t);
    void deallocCalled_(size_t);

    template <typename ALLOC, typename ACT>
    auto allocCalled(size_t iRequested, ALLOC iAlloc, ACT iGetActual) {
      [[maybe_unused]] Guard g = makeGuard();
      auto a = iAlloc();
      if (g.running()) {
        allocCalled_(iRequested, iGetActual(a));
      }
      return a;
    }
    template <typename DEALLOC, typename ACT>
    void deallocCalled(DEALLOC iDealloc, ACT iGetActual) {
      [[maybe_unused]] Guard g = makeGuard();
      if (g.running()) {
        deallocCalled_(iGetActual());
      }
      iDealloc();
    }

    AllocMonitorRegistry();

    // ---------- member data --------------------------------
    std::vector<std::unique_ptr<AllocMonitorBase>> monitors_;
  };

  template <typename T, typename... ARGS>
  T* AllocMonitorRegistry::createAndRegisterMonitor(ARGS&&... iArgs) {
    [[maybe_unused]] Guard guard = makeGuard();
    start();

    auto m = std::make_unique<T>(std::forward<ARGS>(iArgs)...);
    auto p = m.get();
    monitors_.push_back(std::move(m));
    return p;
  }
}  // namespace cms::perftools
#endif
```
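The `Guard`/`isRunning()` pair in the header above implements callback suppression: the guard records whether reporting was on, switches it off for the duration of the call, and restores it afterwards, so any allocation made while a monitor runs sees the flag off and is not reported. The sketch below reproduces that pattern in isolation. It assumes the flag is `thread_local` (consistent with the base-class note that calls are suspended "on the thread running the method", but the registry's implementation file is not shown here), and `onAlloc` and `reported` are hypothetical stand-ins for a proxied allocation function and a monitor.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical namespace

  // Per-thread "reporting enabled" flag. Assumed thread_local so that
  // suspending callbacks on one thread leaves the other threads unaffected.
  inline bool& isRunning() {
    thread_local bool running = true;
    return running;
  }

  // Same shape as AllocMonitorRegistry::Guard: remember the old value, switch
  // reporting off for the current scope, restore the old value on exit.
  struct Guard {
    explicit Guard(bool& iFlag) noexcept : address_(&iFlag), original_(iFlag) { *address_ = false; }
    ~Guard() { *address_ = original_; }
    Guard(Guard const&) = delete;
    Guard& operator=(Guard const&) = delete;

    bool running() const noexcept { return original_; }

    bool* address_;
    bool original_;
  };

  // Hypothetical monitor storage: records every reported allocation size.
  inline std::vector<std::size_t>& reported() {
    static std::vector<std::size_t> sizes;
    return sizes;
  }

  // Hypothetical stand-in for a proxied allocation function.
  inline void onAlloc(std::size_t iSize) {
    Guard g(isRunning());
    if (g.running()) {
      reported().push_back(iSize);
      // Simulate an allocation made while the monitor runs: the flag is now
      // false on this thread, so the nested call reports nothing.
      onAlloc(iSize + 1);
    }
  }

}  // namespace sketch
```

Calling `onAlloc(16)` records only `16`; the nested call is silenced, and the flag is back to `true` once the outer guard is destroyed.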
@@ -0,0 +1,3 @@
```xml
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ServiceRegistry"/>
<use name="PerfTools/AllocMonitor"/>
```
PerfTools/AllocMonitor/plugins/EventProcessingAllocMonitor.cc (98 additions, 0 deletions)
@@ -0,0 +1,98 @@
```cpp
// -*- C++ -*-
//
// Package:     PerfTools/AllocMonitor
// Class  :     EventProcessingAllocMonitor
//
// Implementation:
//     A MonitorAdaptor is registered when the service is constructed; it only
//     records statistics between the postBeginJob and preEndJob signals, then
//     reports and is deregistered.
//
// Original Author:  Christopher Jones
//         Created:  Mon, 21 Aug 2023 20:31:57 GMT
//

// system include files
#include <atomic>

// user include files
#include "PerfTools/AllocMonitor/interface/AllocMonitorBase.h"
#include "PerfTools/AllocMonitor/interface/AllocMonitorRegistry.h"
#include "FWCore/ServiceRegistry/interface/ServiceRegistry.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ServiceRegistry/interface/ServiceMaker.h"

namespace {
  class MonitorAdaptor : public cms::perftools::AllocMonitorBase {
  public:
    void performanceReport() {
      started_.store(false, std::memory_order_release);

      auto finalRequested = requested_.load(std::memory_order_acquire);
      auto maxActual = maxActual_.load(std::memory_order_acquire);
      auto present = presentActual_.load(std::memory_order_acquire);
      auto allocs = nAllocations_.load(std::memory_order_acquire);
      auto deallocs = nDeallocations_.load(std::memory_order_acquire);

      edm::LogSystem("EventProcessingAllocMonitor")
          << "Event Processing Memory Report"
          << "\n total memory requested: " << finalRequested << "\n max memory used: " << maxActual
          << "\n total memory not deallocated: " << present << "\n # allocations calls: " << allocs
          << "\n # deallocations calls: " << deallocs;
    }

    void start() { started_.store(true, std::memory_order_release); }

  private:
    void allocCalled(size_t iRequested, size_t iActual) final {
      if (not started_.load(std::memory_order_acquire)) {
        return;
      }
      nAllocations_.fetch_add(1, std::memory_order_acq_rel);
      requested_.fetch_add(iRequested, std::memory_order_acq_rel);

      //returns previous value
      auto a = presentActual_.fetch_add(iActual, std::memory_order_acq_rel);
      a += iActual;

      auto max = maxActual_.load(std::memory_order_relaxed);
      while (a > max) {
        if (maxActual_.compare_exchange_strong(max, a, std::memory_order_acq_rel)) {
          break;
        }
      }
    }
    void deallocCalled(size_t iActual) final {
      if (not started_.load(std::memory_order_acquire)) {
        return;
      }
      nDeallocations_.fetch_add(1, std::memory_order_acq_rel);
      // Memory allocated before monitoring started can be freed during the loop,
      // so only subtract when the running total can absorb it. The compare-exchange
      // loop makes the check and the subtraction one atomic step; a separate load
      // followed by fetch_sub would let two threads both pass the check and underflow.
      auto present = presentActual_.load(std::memory_order_acquire);
      while (present >= iActual) {
        if (presentActual_.compare_exchange_strong(present, present - iActual, std::memory_order_acq_rel)) {
          break;
        }
      }
    }

    std::atomic<size_t> requested_ = 0;
    std::atomic<size_t> presentActual_ = 0;
    std::atomic<size_t> maxActual_ = 0;
    std::atomic<size_t> nAllocations_ = 0;
    std::atomic<size_t> nDeallocations_ = 0;

    std::atomic<bool> started_ = false;
  };

}  // namespace

class EventProcessingAllocMonitor {
public:
  EventProcessingAllocMonitor(edm::ParameterSet const& iPS, edm::ActivityRegistry& iAR) {
    auto adaptor = cms::perftools::AllocMonitorRegistry::instance().createAndRegisterMonitor<MonitorAdaptor>();
    iAR.postBeginJobSignal_.connect([adaptor]() { adaptor->start(); });
    iAR.preEndJobSignal_.connect([adaptor]() {
      adaptor->performanceReport();
      cms::perftools::AllocMonitorRegistry::instance().deregisterMonitor(adaptor);
    });
  }
};

DEFINE_FWK_SERVICE(EventProcessingAllocMonitor);
```