Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Portable Data Formats for Pixel Track Reconstruction #40465

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CUDADataFormats/SiPixelCluster/BuildFile.xml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
<use name="rootcore"/>
<use name="CUDADataFormats/Common"/>
<use name="DataFormats/Common"/>
<use name="DataFormats/SoATemplate" source_only="1"/>
<use name="HeterogeneousCore/CUDAUtilities"/>

<export>
Expand Down
60 changes: 22 additions & 38 deletions CUDADataFormats/SiPixelCluster/interface/SiPixelClustersCUDA.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,16 +5,34 @@
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"

#include "DataFormats/SoATemplate/interface/SoALayout.h"
nothingface0 marked this conversation as resolved.
Show resolved Hide resolved
#include "CUDADataFormats/Common/interface/PortableDeviceCollection.h"

#include <cuda_runtime.h>

class SiPixelClustersCUDA {
GENERATE_SOA_LAYOUT(SiPixelClustersCUDALayout,
SOA_COLUMN(uint32_t, moduleStart),
SOA_COLUMN(uint32_t, clusInModule),
SOA_COLUMN(uint32_t, moduleId),
SOA_COLUMN(uint32_t, clusModuleStart))

using SiPixelClustersCUDASoA = SiPixelClustersCUDALayout<>;
using SiPixelClustersCUDASOAView = SiPixelClustersCUDALayout<>::View;
using SiPixelClustersCUDASOAConstView = SiPixelClustersCUDALayout<>::ConstView;

// TODO: The class is created via inheritance of the PortableDeviceCollection.
// This is generally discouraged, and should be done via composition, i.e.,
// by adding a public class attribute like:
// cms::cuda::Portabledevicecollection<SiPixelClustersCUDALayout<>> collection;
// See: https://github.com/cms-sw/cmssw/pull/40465#discussion_r1067364306
class SiPixelClustersCUDA : public cms::cuda::PortableDeviceCollection<SiPixelClustersCUDALayout<>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use the PortableCollection via composition rather than inheritance (e.g. their relationship looks more like "has a" than "is a").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea has been passed down to from you to us via @AdrianoDee. Should it be addressed in this PR? As you can see, it will not be a trivial change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code would be intended for longer term, I'd insist on addressing it in this PR. But given the temporal nature of these classes (being a stepping stone towards Alpaka-based implementation), and that all these data formats are transient, maybe the inheritance approach could be tolerated. I'd anyway suggest to add a comment for each of these classes that they should be using composition instead, to be done for the Alpaka versions in the near future, in case anyone would use these as an example in the mean time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of inheriting from a PortableCollection is that it provides direct access to the "scalars" and "columns" of the underlying SoA.
e.g. in RecoLocalTracker/SiPixelRecHits/plugins/PixelRecHitGPUKernel.cu:

    TrackingRecHitSoADevice<TrackerTraits> hits_d(
        nHits, clusters_d.offsetBPIX2(), cpeParams, clusters_d->clusModuleStart(), stream);

clusters_d->clusModuleStart() is valid because SiPixelClustersCUDA inherits the ConstView const* operator->() const operator from PortableDeviceCollection.

This was done for convenience, especially in the migration of code from the old approach, that was using e.g. clusters_d.clusModuleStart().

If we decide against inheritance, then SiPixelClustersCUDA would need to replicate most of the PortableCollection data members: view/const_view, buffer, etc. or provide access to the PortableCollection itself, resulting in code like clusters_d.collection()->clusModuleStart() or clusters_d.collection().const_view().clusModuleStart() -- which IMHO seems a bit too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the inheritance makes some things easy. Nevertheless, a general guideline for class design is to prefer composition over inheritance when the latter is not necessary. There are lot's of material about that around the internet, e.g. here is the take of the C++ Core Guidelines https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rh-domain.

I personally see Portable*Collection conceptually similar to std::vector, and inheriting from std::vector is generally a bad idea.

In principle the SiPixelClustersCUDA could define its own operator->() function that would delegate the call to the View.

Although one could also ask if the Portable*Collection should give access to the columns, or only to the buffer and View.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the discussion about inheritance vs composition revolves around the use of virtual functions, pure virtual functions, and virtual destructors.
But those caveats do not apply here: we do not have virtual functions, and the derived types are the only ones that are going to be used.

Although one could also ask if the Portable*Collection should give access to the columns, or only to the buffer and View.

Technically, the Portable*Collection does give access only to the buffer and view.
Its * and -> operators return a View& or View*, and access to the columns is done by that.

Copy link
Contributor

@fwyzard fwyzard Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle the SiPixelClustersCUDA could define its own operator->() function that would delegate the call to the View.

Yes, I agree that this should be simple enough.

Though I think it would have to define also the view/const_view and buffer/const_buffer methods - i.e. the whole interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, SiPixelClustersCUDA is a decorator of the Portable*Collection: it leaves the whole interface available to the user, while just adding extra features. I believe inheritance is appropriate. We would have to write trivial (and transparent) wrappers otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the discussion about inheritance vs composition revolves around the use of virtual functions, pure virtual functions, and virtual destructors.
But those caveats do not apply here: we do not have virtual functions, and the derived types are the only ones that are going to be used.

They kind of do, but in a different way. A class hierarchy whose members behave differently by overloading virtual functions is clearly a use case that necessitates inheritance. For the case where there are no virtual functions (or other reasons that would necessitate inheritance), the C++ core guidelines recommend

Do not use inheritance when simply having a data member will do.

For example, public inheritance allows code like below, even if it behaves incorrectly (and thus it adds cost for code review to spot such uses)

std::unique_ptr<cms::cuda::PortableDeviceCollection<SiPixelClustersCUDALayout<>> foo = std::make_unique<SiPixelClustersCUDA>();

Another example of potentially unexpected behavior could be code like

// overload for generic PortableDeviceCollection
template <typename T>
auto copy(cms::cuda::PortableDevicecollection<T> const& collDev, cudaStream_t stream) {
  cms::cuda::PortableHostCollection<T> collHost{collDev->metadata().size(), stream};
  cudaCheck(cudaMemcpyAsync(collHost.buffer().get(), collDev.buffer().get(), collDev.bufferSize(), cudaMemcpyDefault, stream);
  return collHost;
}
// oops, forgot to overload SiPixelClustersCUDA

SiPixelClustersCUDA objDevice;
auto objHost = copy(objDevice, stream);

Shortcuts in a limited set of code that is fully in one's control is kind-of cheap to tolerate (if the structure turns out to be a bad choice, changing the code should be fairly straightforward). But this case looks like we would be setting a pattern that could eventually spread to tens to hundreds of classes. I think we should be careful in such a situation, and keep in mind that users tend to find all kinds of loopholes in library design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a different question, though: there is no version that inherits from PortableHostCollection.
Is that because (in the CUDA code) we never copied this collection as-is back to the host ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we go back directly to legacy formats through the hits.

public:
SiPixelClustersCUDA() = default;
explicit SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream);
~SiPixelClustersCUDA() = default;

SiPixelClustersCUDA(const SiPixelClustersCUDA &) = delete;
SiPixelClustersCUDA &operator=(const SiPixelClustersCUDA &) = delete;
explicit SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream)
: PortableDeviceCollection<SiPixelClustersCUDALayout<>>(maxModules + 1, stream) {}

SiPixelClustersCUDA(SiPixelClustersCUDA &&) = default;
SiPixelClustersCUDA &operator=(SiPixelClustersCUDA &&) = default;

Expand All @@ -26,41 +44,7 @@ class SiPixelClustersCUDA {
uint32_t nClusters() const { return nClusters_h; }
int32_t offsetBPIX2() const { return offsetBPIX2_h; }

uint32_t *moduleStart() { return moduleStart_d.get(); }
uint32_t *clusInModule() { return clusInModule_d.get(); }
uint32_t *moduleId() { return moduleId_d.get(); }
uint32_t *clusModuleStart() { return clusModuleStart_d.get(); }

uint32_t const *moduleStart() const { return moduleStart_d.get(); }
uint32_t const *clusInModule() const { return clusInModule_d.get(); }
uint32_t const *moduleId() const { return moduleId_d.get(); }
uint32_t const *clusModuleStart() const { return clusModuleStart_d.get(); }

class SiPixelClustersCUDASOAView {
public:
__device__ __forceinline__ uint32_t moduleStart(int i) const { return __ldg(moduleStart_ + i); }
__device__ __forceinline__ uint32_t clusInModule(int i) const { return __ldg(clusInModule_ + i); }
__device__ __forceinline__ uint32_t moduleId(int i) const { return __ldg(moduleId_ + i); }
__device__ __forceinline__ uint32_t clusModuleStart(int i) const { return __ldg(clusModuleStart_ + i); }

uint32_t const *moduleStart_;
uint32_t const *clusInModule_;
uint32_t const *moduleId_;
uint32_t const *clusModuleStart_;
};

SiPixelClustersCUDASOAView const *view() const { return view_d.get(); }

private:
cms::cuda::device::unique_ptr<uint32_t[]> moduleStart_d; // index of the first pixel of each module
cms::cuda::device::unique_ptr<uint32_t[]> clusInModule_d; // number of clusters found in each module
cms::cuda::device::unique_ptr<uint32_t[]> moduleId_d; // module id of each module

// originally from rechits
cms::cuda::device::unique_ptr<uint32_t[]> clusModuleStart_d; // index of the first cluster of each module

cms::cuda::device::unique_ptr<SiPixelClustersCUDASOAView> view_d; // "me" pointer

uint32_t nClusters_h = 0;
int32_t offsetBPIX2_h = 0;
};
Expand Down
19 changes: 0 additions & 19 deletions CUDADataFormats/SiPixelCluster/src/SiPixelClustersCUDA.cc

This file was deleted.

1 change: 1 addition & 0 deletions CUDADataFormats/SiPixelDigi/BuildFile.xml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
<use name="CUDADataFormats/Common"/>
<use name="DataFormats/Common"/>
<use name="DataFormats/SiPixelRawData"/>
<use name="DataFormats/SoATemplate" source_only="1"/>
<use name="HeterogeneousCore/CUDAUtilities"/>

<export>
Expand Down
39 changes: 22 additions & 17 deletions CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDA.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,32 @@
#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"
#include "CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDASOAView.h"

class SiPixelDigisCUDA {
#include "CUDADataFormats/Common/interface/PortableDeviceCollection.h"
#include "DataFormats/SoATemplate/interface/SoALayout.h"

GENERATE_SOA_LAYOUT(SiPixelDigisSoALayout,
SOA_COLUMN(int32_t, clus),
SOA_COLUMN(uint32_t, pdigi),
SOA_COLUMN(uint32_t, rawIdArr),
SOA_COLUMN(uint16_t, adc),
SOA_COLUMN(uint16_t, xx),
SOA_COLUMN(uint16_t, yy),
SOA_COLUMN(uint16_t, moduleId))

using SiPixelDigisCUDASOA = SiPixelDigisSoALayout<>;
using SiPixelDigisCUDASOAView = SiPixelDigisCUDASOA::View;
using SiPixelDigisCUDASOAConstView = SiPixelDigisCUDASOA::ConstView;

// TODO: The class is created via inheritance of the PortableDeviceCollection.
// This is generally discouraged, and should be done via composition.
// See: https://github.com/cms-sw/cmssw/pull/40465#discussion_r1067364306
class SiPixelDigisCUDA : public cms::cuda::PortableDeviceCollection<SiPixelDigisSoALayout<>> {
public:
using StoreType = uint16_t;
SiPixelDigisCUDA() = default;
explicit SiPixelDigisCUDA(size_t maxFedWords, cudaStream_t stream);
explicit SiPixelDigisCUDA(size_t maxFedWords, cudaStream_t stream)
: PortableDeviceCollection<SiPixelDigisSoALayout<>>(maxFedWords + 1, stream) {}
~SiPixelDigisCUDA() = default;

SiPixelDigisCUDA(const SiPixelDigisCUDA &) = delete;
SiPixelDigisCUDA &operator=(const SiPixelDigisCUDA &) = delete;
SiPixelDigisCUDA(SiPixelDigisCUDA &&) = default;
SiPixelDigisCUDA &operator=(SiPixelDigisCUDA &&) = default;

Expand All @@ -28,17 +43,7 @@ class SiPixelDigisCUDA {
uint32_t nModules() const { return nModules_h; }
uint32_t nDigis() const { return nDigis_h; }

cms::cuda::host::unique_ptr<StoreType[]> copyAllToHostAsync(cudaStream_t stream) const;

SiPixelDigisCUDASOAView view() { return m_view; }
SiPixelDigisCUDASOAView const view() const { return m_view; }

private:
// These are consumed by downstream device code
cms::cuda::device::unique_ptr<StoreType[]> m_store;

SiPixelDigisCUDASOAView m_view;

uint32_t nModules_h = 0;
uint32_t nDigis_h = 0;
};
Expand Down
112 changes: 0 additions & 112 deletions CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDASOAView.h

This file was deleted.

29 changes: 0 additions & 29 deletions CUDADataFormats/SiPixelDigi/src/SiPixelDigisCUDA.cc

This file was deleted.

1 change: 1 addition & 0 deletions CUDADataFormats/Track/BuildFile.xml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
<use name="rootcore"/>
<use name="CUDADataFormats/Common"/>
<use name="DataFormats/Common"/>
<use name="DataFormats/SoATemplate" source_only="1"/>
<use name="HeterogeneousCore/CUDAUtilities"/>
<use name="eigen"/>
<export>
Expand Down
50 changes: 50 additions & 0 deletions CUDADataFormats/Track/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Track CUDA Data Formats

`CUDADataFormat`s meant to be used on Host (CPU) or Device (CUDA GPU) for
storing information about `Track`s created during the Pixel-local Reconstruction
chain. It stores data in an SoA manner. It combines the data contained in the
deprecated `TrackSoAHeterogeneousT` and `TrajectoryStateSoAT` classes.

The host format is inheriting from `CUDADataFormats/Common/interface/PortableHostCollection.h`,
while the device format is inheriting from `CUDADataFormats/Common/interface/PortableDeviceCollection.h`

Both formats use the same SoA Layout (`TrackSoAHeterogeneousLayout`) which is generated
via the `GENERATE_SOA_LAYOUT` macro in the `PixelTrackUtilities.h` file.

## Notes

-`hitIndices` and `detIndices`, instances of `HitContainer`, have been added into the
layout as `SOA_SCALAR`s, meaning that they manage their own data independently from the SoA
`Layout`. This could be improved in the future, if `HitContainer` (aka a `OneToManyAssoc` of fixed size)
is replaced, but there don't seem to be any conflicts in including it in the `Layout` like this.
- Host and Device classes should **not** be created via inheritance, as they're done here,
but via composition. See [this discussion](https://github.com/cms-sw/cmssw/pull/40465#discussion_r1066039309).

## TrackSoAHeterogeneousHost

The version of the data format to be used for storing `Track` data on the CPU.
Instances of this class are to be used for:

- Having a place to copy data to host from device, via `cudaMemcpy`, or
- Running host-side algorithms using data stored in an SoA manner.

## TrackSoAHeterogeneousDevice

The version of the data format to be used for storing `Track` data on the GPU.

Instances of `TrackSoAHeterogeneousDevice` are to be created on host and be
used on device only. To do so, the instance's `view()` method is to be called
to pass a `View` to any kernel launched. Accessing data from the `view()` is not
possible on the host side.

## Utilities

`PixelTrackUtilities.h` contains a collection of methods which were originally
defined as class methods inside either `TrackSoAHeterogeneousT` and `TrajectoryStateSoAT`
which have been adapted to operate on `View` instances, so that they are callable
from within `__global__` kernels, on both CPU and CPU.

## Use case

See `test/TrackSoAHeterogeneous_test.cpp` for a simple example of instantiation,
processing and copying from device to host.
11 changes: 0 additions & 11 deletions CUDADataFormats/Track/interface/PixelTrackHeterogeneous.h

This file was deleted.

Loading