Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Smart STK Field Types #1209

Merged
merged 41 commits into from
Sep 27, 2023
Merged

Conversation

psakievich
Copy link
Contributor

@psakievich psakievich commented Sep 14, 2023

Overview

This feature is to create a SmartFieldRef<MEMSPACE, SCOPE> type that will automatically call the appropriate field modifies and syncs based on the memspace (i.e HOST or DEVICE) and the scopes (READ, WRITE and READ_WRITE).

Technical Details

The crux of this method comes from my discussions with the Kokkos team about how data is generated on device in a Kokkos::parallel_for. It turns out that the copy constructor and thus it's associated destructor are called on host for the lambda capture.

The entire lambda/functor is then essentially memcopied to device, and so the copy ctor and dtor don't actually get called on device. So my thought is we can embed the host functions for field sync and modify inside the these functions, and exploit them as part of every lambda capture.

This moves sync_to_* and modified_on_* to the closest possible usage for every field that is wrapped in one of these SmartFieldRef objects, and ensures that the sync-then-modify programming model is satisfied.

State of Tests on Device

The current tests work on CUDA, and I am working on adding more. Still need to write the host tests.

TODO (Utterly too verbose)

There are a few comments in there on things I would like to handle that this doesn't currently handle. Here is the list to look for in the code:

  • READ types are const correct to the point where a user is unable to modify field values through operator() and get(). I also want a clear error message when this is violated. Currently I am thinking partial template specialization and making non-const return types throw a message instead of trying to access the fields.
  • Determine a way to make this compatible with older style bucket loop syntax that is still used in the code
  • Add this to the FieldManager class as the main way of retrieving fields going forward.
  • Get the template parameters in practice down to 2 (as shown in the overview) and find a way to deduce stk::mesh::NgpFieldBase::value_type rather than having to specify this at each usage. Staying at <SPACE, SCOPE, T> is not a deal breaker, but it bugs me. I would love to only have to specify the SPACE and SCOPE.

Along those lines, an alternative formulation could look like:

enum class Scope{
  READ, WRITE, READ_WRITE;
};

template <typename T>
DeviceFieldRef(Scope scope, stk::mesh::NgpField<T>& field);

template <typename T>
HostFieldRef(Scope scope, stk::mesh::HostField<T>& field);

// Sample Usage
auto fieldRef = DeviceFieldRef(Scope::READ, field);
// vs the current
auto fieldRef = SmartFieldRef<DEVICE, READ, double>(field);

This would allow for automatic template deduction of the stk::mesh::NgpFieldBase::value_type, but then the only way I can think to ensure the READ scopes are appropriately const do it would be with an if statement in get and operator() which feels like a terrible idea.

Follow on PR would be to liberally apply this in the code.

@psakievich
Copy link
Contributor Author

Also I'm not really a fan of Scope as the term to indicate the Read/write contract for the object. Thinking of switching this to something else. Maybe Mode to follow python? Seems like that would be more appropriate.

Copy link
Contributor

@alanw0 alanw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like it, I think it would reduce the programmer burden for most use-cases.
There could be some corner cases where the local lifetime includes multiple GPU kernels, and perhaps you want or don't want syncs/modifies for each kernel. But those cases could be handled the old way.
I do like 'MODE' better than SCOPE, as you note.

@alanw0
Copy link
Contributor

alanw0 commented Sep 14, 2023

I'm a little worried about the potential cost of wrapping/forwarding functions like get() etc. Perhaps no runtime cost, but perhaps that increases the cost? I guess we would need to measure performance of that.

@psakievich
Copy link
Contributor Author

@alanw0 yes that is a good point. We should probably in-line them no matter what just to be safe.

@psakievich
Copy link
Contributor Author

I think this looks pretty good.
I think I would advocate making it construct from FieldBase instead of Field<T>. Of course then you would need to use a reinterpret_cast to get the stk::mesh::field_data calls to work.

@alanw0 I've looked into that. I am a little torn. Since this is c++ the T has to go somewhere. Personally I like the idea of not needing it in the creation, but I would rather do it once than in all the access points.

Is there another reason to use FieldBase? or has your opinion changes since I went from Field<T> to using the whole thing as a template param?

@alanw0
Copy link
Contributor

alanw0 commented Sep 19, 2023

Is there another reason to use FieldBase? or has your opinion changes since I went from Field<T> to using the whole thing as a template param?

I think your current changes fix it. I was thinking that FieldBase would take care of the difference between ScalarField, ScalarIntField, VectorField, etc. But your change does that also.

@psakievich psakievich marked this pull request as ready for review September 21, 2023 17:42
@psakievich
Copy link
Contributor Author

psakievich commented Sep 26, 2023

Eh I'm seeing this in the latest device build:

/scratch/psakiev/spack-manager/environments/smart/nalu-wind/unit_tests/UnitTestNgpMesh1.C(157): warning: calling a __host__ function("sierra::nalu::SmartField< ::stk::mesh::DeviceField<double,  ::stk::mesh::DefaultNgpFieldSyncDebugger> ,  ::tags::DEVICE,  ::tags::READ_WRITE, void> ::SmartField") from a __host__ __device__ function("test_ngp_mesh_field_values( ::sierra::nalu::FieldManager &, const  ::stk::mesh::BulkData &,  ::stk::mesh::Field<double,  ::stk::mesh::Cartesian3d, void, void, void, void, void, void>  *,  ::stk::mesh::Field<double,  ::stk::mesh::SimpleArrayTag, void, void, void, void, void, void>  *)::[lambda(const  ::Kokkos::Impl::CudaTeamMember &) (instance 1)]::[lambda(const  ::Kokkos::Impl::CudaTeamMember &) (instance 1)]") is not allowed

/scratch/psakiev/spack-manager/environments/smart/nalu-wind/unit_tests/UnitTestNgpMesh1.C(157): warning: calling a __host__ function("sierra::nalu::SmartField< ::stk::mesh::DeviceField<double,  ::stk::mesh::DefaultNgpFieldSyncDebugger> ,  ::tags::DEVICE,  ::tags::READ_WRITE, void> ::~SmartField") from a __host__ __device__ function("test_ngp_mesh_field_values( ::sierra::nalu::FieldManager &, const  ::stk::mesh::BulkData &,  ::stk::mesh::Field<double,  ::stk::mesh::Cartesian3d, void, void, void, void, void, void>  *,  ::stk::mesh::Field<double,  ::stk::mesh::SimpleArrayTag, void, void, void, void, void, void>  *)::[lambda(const  ::Kokkos::Impl::CudaTeamMember &) (instance 1)]::~[lambda(const  ::Kokkos::Impl::CudaTeamMember &) (instance 1)]") is not allowed

Considering refactoring and doing three separate classes for the memspace rather than SFINAE'ing the ctor and dtor to suppress this warning. I'm not really sure what is special about this case vs the base unit tests I created. Perhaps I just missed the warnings for those in the sea of outputs.

@psakievich psakievich changed the title Feature: Smart STK Field References Feature: Smart STK Field Types Sep 27, 2023
@psakievich
Copy link
Contributor Author

I've run the tests on crusher and the snl cuda dev machines. I think we are all good to go with this. @alanw0 @overfelt @rcknaus any more comments?

Copy link
Contributor

@alanw0 alanw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it looks good. It is at least ready for further testing in nalu-wind.
I'm still slightly worried about wrapping the potentially tight-loop methods like get and others, but I think this is worth putting in and see how it does.

@psakievich psakievich merged commit 73c36ff into Exawind:master Sep 27, 2023
3 checks passed
@psakievich psakievich deleted the f/smartptr branch September 27, 2023 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants