-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add user data collections w/ basic types #213
Conversation
- works for int/float w/ ROOT
- can now use range based loops on collections
include/podio/UserDataCollection.h
Outdated
typedef int intData ; | ||
typedef long longData ; | ||
typedef float floatData ; | ||
typedef double doubleData ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedef int intData ; | |
typedef long longData ; | |
typedef float floatData ; | |
typedef double doubleData ; |
These seem to be unnecessary and cause the root problems in the CI (don't ask me why though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Thomas. This seems to do the trick (at least on my centos7 test system) However I do not understand why and more importantly how this can even work, as the branch for the user collection is defined (by the ROOTReader) as vector<doubleData>
and doubleData
is a non existing type w/o these typedefs !???
include/podio/UserDataCollection.h
Outdated
&m_refCollections, // only need to store the ObjectIDs of the referenced objects | ||
&m_vecmem_info } ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&m_refCollections, // only need to store the ObjectIDs of the referenced objects | |
&m_vecmem_info } ; | |
nullptr, | |
nullptr } ; |
Would allow it to not have to keep some unnecessary members for these parts in the class. This should be easily possible now, but I am not entirely sure whether everything that expects this already has the necessary nullptr check in place already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I had in initially - but it crashes in the SIOWriter as the SIOBlocks have:
//---- read ref collections -----
auto* refCols = collBuffers.references;
for( auto* refC : *refCols ){
unsigned size{0};
device.data( size ) ;
refC->resize(size) ;
podio::handlePODDataSIO( device , &((*refC)[0]), size ) ;
}
Should add a check for null pointer here ...
include/podio/UserDataCollection.h
Outdated
class UserDataTypes{ | ||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to put all of this into a function instead of a singleton? In principle everything that is needed is a "map" of typeids to easier to read type names and that could also just be constructed statically (or potentially even static constexpr
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. As this is still WIP I wanted to see how far we can take this approach and wanted to have a potentially extendible map (w/ adding a registerTypeName()
method or similar...
include/podio/UserDataCollection.h
Outdated
const std::map< const std::type_index, const std::string >_typeMap = | ||
{ | ||
{ std::type_index( typeid(int) ), "podio::int" }, | ||
{ std::type_index( typeid(long) ), "podio::long" }, | ||
{ std::type_index( typeid(float) ), "podio::float" }, | ||
{ std::type_index( typeid(double) ),"podio::double"} | ||
} ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How restrictive do we want to be with this map? In principle I think it is a good idea to not be overly permissive, but I also think that a few more fundamental types might be useful (even if the listed ones probably cover > 90% of the use cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open for discussion - this is just an initial set that probably would have to be extended by at least some short
and unsigned
, maybe fixed_size
types. This depends if we can make the mechanism extendible, i.e. such that users can add additional types if needed or not. For SIO this should work easily - with ROOT I am not so sure due to the discionary, etc...
include/podio/UserDataCollection.h
Outdated
|
||
|
||
/// Templated base class for storing std::vectors of basic types as user data in PODIO | ||
template <typename BasicType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This template should be constrained to only accept those types that we actually support via the mapping above. Currently it is possible to do something like
auto& coll = store.create<podio::UserDataCollection<unsigned>>("unsignedColl");
writer.registerForWrite("unsignedColl");
which compiles fine but crashes at some point at run-time. It should be possible to catch this at compile time already either via some static_assert
or enable_if
trickery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agreed, if we decide to only provide a fixed set of types - if not, then of course it cannot be done and we leave failure to runtime...
include/podio/UserDataSIOBlock.h
Outdated
|
||
virtual void read(sio::read_device& device, sio::version_type /*version*/) override { | ||
auto collBuffers = _col->getBuffers(); | ||
if (not _col->isSubsetCollection()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check could be removed for the UserDataCollection
, since it will always return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
|
||
auto& usrInts = store.get<podio::UserDataCollection<int> >("userInts"); | ||
|
||
#if 0 //access via vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these left over from testing, or are they actually required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of: testing - and as example how the underlying vector could be accessed, if needed. Can also remove eventually.
All this is still very much WIP ...
- now ROOT branches are of type vector<basic_type> and can be read w/ TBrowser correctly
src/ROOTWriter.cc
Outdated
std::string valTypeName = coll->getValueTypeName() ; | ||
|
||
auto collClassName = "vector<" + valTypeName + "Data>"; | ||
|
||
auto it = valTypeName.find("podio::User_") ; | ||
if(it == 0 ){ | ||
valTypeName = valTypeName.substr( 12 , 1024 ) ; // rm 'podio_user_' prefix | ||
collClassName = "vector<" + valTypeName +">"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider adding a getDataTypeName
function (or similar) to the podio::CollectionBase
? It would make some of the string juggling easier to do for general use, and give us a customization point where these user classes could be handled more nicely. As far as I can tell the major use cases for getValueTypeName
currently are:
- Keys for the map inside the
SIOBlockFactory
- Creating the
vector<XYZData>
for the root writer and reader
So maybe it could even be a getDataVectorTypeName
alongside the getValueTypeName
(?)
Hi Frank, I think the access to the underlying vector here is very nice! In my opinion the available data types are fine; they can always be expanded if necessary. Thomas' point for a compile time check is still a good one though. |
- simplify rader's and writer's handling of type names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As already discussed during the meeting yesterday, I think this is a very nice addition and I like the implementation. I have left one comment that is not only about implementation details regarding the return value of getDataTypeName
. I am not sure what others think there.
From a technical point of view I would then simply try to push as many failures to compile time rather than runtime, since we have a closed set of types that we want to support with this (the BUILTIN_TYPES
sans std::string
and the ALLOWED_FIXED_WIDTH_TYPES
).
/// fully qualified type name of stored POD elements - with namespace | ||
std::string getDataTypeName() const override { return std::string("{{ (class | string ).strip(':')+"Data" }}"); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this will always be used like this
const auto bufferClassName = "std::vector<" + coll->getDataTypeName() + ">";
Should we simply make this return the full thing (and potentially give it a name that indicates that)? Or do we foresee any potential use cases where the Data
name alone would be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a sense the fact that a vector<> is used here is an implementation detail of the ROOTReader/Writer and other future implementations (HDF5 ?) might make a different use of the underlying POD data type.
This is why I thought it would be more generic this way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To implement a compile-time constraint on the usable types it would be possible to do something along the lines of the following: Define a std::tuple
containing all supported types.
using SupportedUserDataTypes = std::tuple<
int,
float,
long,
double
>;
Around that it is then rather straight forward to implement the necessary enable_if
machinery to make compilation fail in case of unsupported types
namespace detail {
/**
* Helper function to check whether a type T is in a std::tuple<Ts...>
*/
template<typename T, typename ...Ts>
constexpr bool inTuple(std::tuple<Ts...>) {
return (std::is_same_v<T, Ts> || ...);
}
/**
* Compile time helper function to check whether the given type is in the list
* of supported types
*/
template<typename T>
constexpr bool isSupported() {
return inTuple<T>(SupportedUserDataTypes{});
}
}
/**
* Alias template to be used to enable template specializations only for the types listed in the
* SupportedUserDataTypes list
*/
template<typename T>
using EnableIfSupportedUserType = std::enable_if_t<detail::isSupported<T>()>;
Usage:
template<typename T
typename = EnableIfSupportedUserType<T>>
// class or function that should only be available if T is in SupportedUserDataTypes
include/podio/UserDataCollection.h
Outdated
template <typename BasicType> | ||
std::string userDataTypeName() { throw std::runtime_error( std::string(" unsupported type for UserDataCollection: ") | ||
+ typeid(BasicType).name() ) ; } | ||
|
||
template <> std::string userDataTypeName<int>() {return "int" ; } | ||
template <> std::string userDataTypeName<float>() {return "float" ; } | ||
template <> std::string userDataTypeName<long>() {return "long" ; } | ||
template <> std::string userDataTypeName<double>(){return "double" ; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be made into a
template<typename BasicType,
typename = EnableIfSupportedUserType<BasicType>>
constexpr const char* userDataTypeName();
template<> constexpr const char* userDataTypeName<int>() { return "int"; }
// ... etc.
without any changes necessary to the rest of the code. To ease the adding of additional types one could introduce a small macro, but still with that we would have to spell out the list of supported times a second time (after the initial std::tuple
).
#define ADD_USER_TYPE(type) template<> constexpr const char* userDataTypeName<type>( return #type; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, I just have a few comments that are all more or less about the same thing: Trying to cut down the list of supported types to the minimum while keeping all the currently present functionality.
However, they can also be addressed later IMHO.
include/podio/UserDataCollection.h
Outdated
*/ | ||
using SupportedUserDataTypes = std::tuple< | ||
int, long, float, double, | ||
unsigned int, unsigned, unsigned long, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned int, unsigned, unsigned long, | |
unsigned, unsigned long, |
Since unsigned int == unsigned
, I think it can be removed here (which would then also make it slightly more consistent with short
, long
, etc...). However, we are listing more duplicate types here with the fixed width types in any case. Maybe we should find the smallest list necessary and put some static_assert
s in place here to make sure that the expectations are met.
include/podio/UserDataCollection.h
Outdated
PODIO_ADD_USER_TYPE(int) | ||
PODIO_ADD_USER_TYPE(long) | ||
PODIO_ADD_USER_TYPE(float) | ||
PODIO_ADD_USER_TYPE(double) | ||
PODIO_ADD_USER_TYPE(unsigned) | ||
// PODIO_ADD_USER_TYPE(unsigned int) | ||
PODIO_ADD_USER_TYPE(unsigned long) | ||
PODIO_ADD_USER_TYPE(char) | ||
PODIO_ADD_USER_TYPE(short) | ||
PODIO_ADD_USER_TYPE(long long) | ||
PODIO_ADD_USER_TYPE(unsigned long long) | ||
// PODIO_ADD_USER_TYPE(int16_t) | ||
// PODIO_ADD_USER_TYPE(int32_t) | ||
// PODIO_ADD_USER_TYPE(int64_t) | ||
PODIO_ADD_USER_TYPE(uint16_t) | ||
// PODIO_ADD_USER_TYPE(uint32_t) | ||
// PODIO_ADD_USER_TYPE(uint64_t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider putting the fixed-width integers into the definitions here, because that would (almost) always give us a name without spaces (this seems to be a requirement for sio? at least there is the sio_name
function that just replaces spaces with _
).
The only exception is unsigned long long
for which
static_assert(std::is_same_v<uint64_t, unsigned long long>);
fails, even though it seems to be 64 bits wide on most architectures. I am not entirely sure how to best handle that in case sio really does not handle spaces in its block names.
#include <vector> | ||
#include <typeindex> | ||
|
||
#define PODIO_ADD_USER_TYPE(type) template<> constexpr const char* userDataTypeName<type>(){ return #type ; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should #undef
this after we have used it to not give the impression that this is an extension point for users. (Even if a user tries to use it they will only run into a compilation error in any case)
src/SIOBlockUserData.cc
Outdated
static SIOBlockUserData<int> _defaultintCollcetionSIOBlock ; | ||
static SIOBlockUserData<long> _defaultlongCollcetionSIOBlock ; | ||
static SIOBlockUserData<float> _defaultfloatCollcetionSIOBlock ; | ||
static SIOBlockUserData<double> _defaultdoubleCollcetionSIOBlock ; | ||
static SIOBlockUserData<unsigned> _defaultunsignedCollcetionSIOBlock ; | ||
static SIOBlockUserData<unsigned int> _defaultunsignedintCollcetionSIOBlock ; | ||
static SIOBlockUserData<unsigned long> _defaultunsignedlongCollcetionSIOBlock ; | ||
static SIOBlockUserData<char> _defaultcharCollcetionSIOBlock ; | ||
static SIOBlockUserData<short> _defaultshortCollcetionSIOBlock ; | ||
static SIOBlockUserData<long long> _defaultlonglongCollcetionSIOBlock ; | ||
static SIOBlockUserData<unsigned long long> _defaultunsignedlonglongCollcetionSIOBlock ; | ||
static SIOBlockUserData<int16_t> _defaultint16_tCollcetionSIOBlock ; | ||
static SIOBlockUserData<int32_t> _defaultint32_tCollcetionSIOBlock ; | ||
static SIOBlockUserData<int64_t> _defaultint64_tCollcetionSIOBlock ; | ||
static SIOBlockUserData<uint16_t> _defaultuint16_tCollcetionSIOBlock ; | ||
static SIOBlockUserData<uint32_t> _defaultuint32_tCollcetionSIOBlock ; | ||
static SIOBlockUserData<uint64_t> _defaultuint64_tCollcetionSIOBlock ; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the macro defining the userDataTypeName
, I think we should be able to restrict ourselves to a smaller list (potentially focusing on the fixed width integer types). Since under the hood they will in any case call the the same userDataTypeName
and register as the same type in the factory, e.g.
static_assert(std::is_same_v<podio::SIOBlockUserData<unsigned>, podio::SIOBlockUserData<uint32_t>>);
works fine. So even if users specify unsigned
it should be enough if we have an uint32_t
specialization.
- leave comments on potentiall eddressing this in more generic way in PODIO
Thanks @tmadlener - added your suggestions. As discussed privately, we might need a more generic treatment of ensuring consistency and size of fundamental types in PODIO eventually ... |
include/podio/UserDataCollection.h
Outdated
using SupportedUserDataTypes = std::tuple< | ||
int, long, float, double, | ||
unsigned int, unsigned, unsigned long, | ||
char, short, long long, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you quickly clean up the non-fixed width types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually intentionally left them in, as this is what we currently support in PODIO. I think we should address the issue of 'set of supported types' in PODIO consistently in a dedicate PR (with sufficient documentation). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for a new feature we should already start restricted. We can open up easily later in case. The other way round is more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there is no restriction really due to the equivalence of the ususal named integer types and the fixed size ones. But I agree that this makes this feature much clearer and it already does what we probably want to do in all of PODIO ...
- this should be done for PODIO user data as well as discusse in EDM4hep meeting 21.09.21
Sure. Please merge after CI passed. |
BEGINRELEASENOTES
std::vector<basic_type>
ENDRELEASENOTES
This should address most needs for storing user defined data in PODIO in an efficient way without any overhead. In ROOT every user data collection is stored directly in a branch of the given type and can be accessed directly.
No relations to or from the user defined collections are foreseen. Users need to keep track if indices, e.g. by running collections parallel to existing PODIO (EDM4hep) collections.
This PR should work now as an example for possible treatment of additional user data. Open questions: