Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DSS to better leverage DataLayouts #1958

Merged
merged 1 commit into from
Sep 5, 2024
Merged

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Aug 27, 2024

This PR refactors DSS to avoid using DataLayout internals, and reduces code duplication. We can apply a similar pattern to the cuda code (wip).

This should make fixing #1910 easier.

What this PR does:

  • We pass the datalayouts themselves into the dss function, instead of the underlying array.
  • Swap the field and level order of the loops, and indexing into the datalayouts using the universal spatial index (I, J, F, V, H), which recursively indexes into the field variables via get_struct/set_struct.
  • I'm reusing the already existing dss_transform to transform the data.
  • The only unexpected thing I ran into was the need to define drop_vert_dim in Geometry.UVWVector due to a comment in dss.jl: For DSS of Covariant123Vector, the third component is treated like a scalar and is not transformed. I'm not 100% sure if what I did was correct here

The last bullet point still confuses me because it seems like N variables come in and N - M are defined. Can someone confirm that this is correct?

Also, I should note that I purposefully did not apply changes to other similar kernels, like dss_local_kernel!, because they do not loop over field variables. In all the kernels refactored in this PR, a single thread loops over field variables, so memory access wasn't significantly changed. It's not clear that the same refactoring pattern would perform as well with dss_local_kernel! because, currently, all of the field variables are processed in parallel.

@charleskawczynski charleskawczynski force-pushed the ck/refactor_dss branch 2 times, most recently from 515b92b to f859925 Compare August 29, 2024 19:34
@charleskawczynski
Copy link
Member Author

One benefit of this change, aside from deleting hundreds of lines of code, is that it also strictly and explicitly follows conversion rules (which already exist). Our previous implementation bypasses these checks since it accesses the parent arrays.

@akshaysridhar
Copy link
Member

akshaysridhar commented Aug 30, 2024

@charleskawczynski Even with the modifications to use the existing transform functions, it would be good to have the ability to convert C123 -> UVW -> apply dss op -> untransform to retain 3 component vectors (I'll look over the drop_vert_dim functions), i.e. avoid treating the last component as a scalar quantity. I'm unsure about the consequences of this PR on overall performance. Generally I'm in favour of simplifying these code blocks (especially since it's clear we have substantial duplication, e.g. of the conversion operations, between the cpu and cuda ext files in our code right now). It would be good if @sriharshakandala also chimes in here!

@charleskawczynski
Copy link
Member Author

charleskawczynski commented Sep 3, 2024

@charleskawczynski Even with the modifications to use the existing transform functions, it would be good to have the ability to convert C123 -> UVW -> apply dss op -> untransform to retain 3 component vectors (I'll look over the drop_vert_dim functions), i.e. avoid treating the last component as a scalar quantity.

So, this turned out to be needed because perimeter_data doesn't have storage for the 3-component vectors (this is why drop_vert_dim was needed in the first place). In fact, this PR may have identified a bug in the existing implementation. Actually, perimeter_data does have the expected storage, what I didn't realize was that dss_transform transforms a Covariant12Vector into a UVWVector, so the transform does not preserve the dimensionality of the vector. This very well may be correct, I just didn't expect this result. Since there are no unit tests for dss_transform and dss_untransform, I've opened #1971 to help better document their behavior.

I've reached out to others on slack for additional review.

I'm unsure about the consequences of this PR on overall performance. Generally I'm in favour of simplifying these code blocks (especially since it's clear we have substantial duplication, e.g. of the conversion operations, between the cpu and cuda ext files in our code right now). It would be good if @sriharshakandala also chimes in here!

I think we should be able to get good performance from this.

@charleskawczynski
Copy link
Member Author

Hi @sriharshakandala, I ran the GPU target pipeline, and these are the preliminary results:

main (build)

solve!:: 88.144602 seconds (20.75 M allocations: 474.717 MiB, 0.72% compilation time)
[ Info: sypd: 2.687084644368064
[ Info: wall_time_per_timestep: 91 milliseconds, 763 microseconds

This PR (build)

solve!:: 70.939605 seconds (20.45 M allocations: 472.976 MiB, 1.01% compilation time)
[ Info: sypd: 3.3397802737331332
[ Info: wall_time_per_timestep: 73 milliseconds, 829 microseconds

I don't think that this implementation is necessarily responsible for the speedup, but I also don't think that it's slowed anything down. Also, I think that we can actually make it faster by avoiding the div calls in the kernels, and just rely on tuned cartesian indexing (like the copyto! and stencil kernels).

At this point, I think I'm ready to merge if people are happy with the changes.

@charleskawczynski
Copy link
Member Author

Hi @akshaysridhar, I've looked into this a bit more, and it seems that DSS does have storage for the vertical component, so the 3-component case should work just fine.

The behavior that I didn't expect was coming from:

@inline function dss_transform(
arg::Geometry.AxisVector,
local_geometry::Geometry.LocalGeometry,
weight,
)
ax = axes(local_geometry.∂x∂ξ, 1)
axfrom = axes(arg, 1)
# TODO: make this consistent for 2D / 3D
# 2D domain axis (1,2) horizontal curl
if ax isa Geometry.UVAxis && (
axfrom isa Geometry.Covariant3Axis ||
axfrom isa Geometry.Contravariant3Axis
)
return arg * weight
end
# 2D domain axis (1,3) curl
if ax isa Geometry.UWAxis && (
axfrom isa Geometry.Covariant2Axis ||
axfrom isa Geometry.Contravariant2Axis
)
return arg * weight
end
# workaround for using a Covariant12Vector in a UW space
if ax isa Geometry.UWAxis && axfrom isa Geometry.Covariant12Axis
# return Geometry.transform(Geometry.UVWAxis(), arg, local_geometry)
u₁, v = Geometry.components(arg)
uw_vector = Geometry.project(
Geometry.UWAxis(),
Geometry.Covariant13Vector(u₁, zero(u₁)),
local_geometry,
)
u, w = Geometry.components(uw_vector)
return Geometry.UVWVector(u, v, w) * weight
end
Geometry.project(ax, arg, local_geometry) * weight
end

which transforms a Covariant12Vector into a UVWVector. So, either that method needs to be updated to preserve the dimensionality, or just keep drop_vert_dim. For now, I think it's probably easiest to just keep drop_vert_dim.

@sriharshakandala
Copy link
Member

It looks good to me overall. I am not sure if drop_dims is required. The recent PR treats vector transformations in an normal way (no special treatment for vertical velocity component). I am assuming there are no performance penalties from using the AxisTensors conversion framework, which we had when we originally put this code!

@sriharshakandala
Copy link
Member

We should also remove the comments indicating that we are treating the vertical component in a special way, since this is no longer true!

@charleskawczynski
Copy link
Member Author

Alright, moving forward with this.

Remove unused indices and methods

Fix some conversions
@charleskawczynski charleskawczynski merged commit add3b1b into main Sep 5, 2024
21 of 22 checks passed
@charleskawczynski charleskawczynski deleted the ck/refactor_dss branch September 5, 2024 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants