Refactor DSS to better leverage DataLayouts #1958

charleskawczynski · 2024-08-27T15:51:38Z

This PR refactors DSS to avoid using DataLayout internals, and reduces code duplication. We can apply a similar pattern to the cuda code (wip).

This should make fixing #1910 easier.

What this PR does:

We pass the datalayouts themselves into the dss function, instead of the underlying array.
Swap the field and level order of the loops, and indexing into the datalayouts using the universal spatial index (I, J, F, V, H), which recursively indexes into the field variables via get_struct/set_struct.
I'm reusing the already existing dss_transform to transform the data.
The only unexpected thing I ran into was the need to define drop_vert_dim in Geometry.UVWVector due to a comment in dss.jl: For DSS of Covariant123Vector, the third component is treated like a scalar and is not transformed. I'm not 100% sure if what I did was correct here

The last bullet point still confuses me because it seems like N variables come in and N - M are defined. Can someone confirm that this is correct?

Also, I should note that I purposefully did not apply changes to other similar kernels, like dss_local_kernel!, because they do not loop over field variables. In all the kernels refactored in this PR, a single thread loops over field variables, so memory access wasn't significantly changed. It's not clear that the same refactoring pattern would perform as well with dss_local_kernel! because, currently, all of the field variables are processed in parallel.

src/Topologies/dss.jl

charleskawczynski · 2024-08-29T19:38:22Z

One benefit of this change, aside from deleting hundreds of lines of code, is that it also strictly and explicitly follows conversion rules (which already exist). Our previous implementation bypasses these checks since it accesses the parent arrays.

akshaysridhar · 2024-08-30T18:20:36Z

@charleskawczynski Even with the modifications to use the existing transform functions, it would be good to have the ability to convert C123 -> UVW -> apply dss op -> untransform to retain 3 component vectors (I'll look over the drop_vert_dim functions), i.e. avoid treating the last component as a scalar quantity. I'm unsure about the consequences of this PR on overall performance. Generally I'm in favour of simplifying these code blocks (especially since it's clear we have substantial duplication, e.g. of the conversion operations, between the cpu and cuda ext files in our code right now). It would be good if @sriharshakandala also chimes in here!

charleskawczynski · 2024-09-03T16:27:20Z

@charleskawczynski Even with the modifications to use the existing transform functions, it would be good to have the ability to convert C123 -> UVW -> apply dss op -> untransform to retain 3 component vectors (I'll look over the drop_vert_dim functions), i.e. avoid treating the last component as a scalar quantity.

So, this turned out to be needed because perimeter_data doesn't have storage for the 3-component vectors (this is why drop_vert_dim was needed in the first place). In fact, this PR may have identified a bug in the existing implementation. Actually, perimeter_data does have the expected storage, what I didn't realize was that dss_transform transforms a Covariant12Vector into a UVWVector, so the transform does not preserve the dimensionality of the vector. This very well may be correct, I just didn't expect this result. Since there are no unit tests for dss_transform and dss_untransform, I've opened #1971 to help better document their behavior.

I've reached out to others on slack for additional review.

I'm unsure about the consequences of this PR on overall performance. Generally I'm in favour of simplifying these code blocks (especially since it's clear we have substantial duplication, e.g. of the conversion operations, between the cpu and cuda ext files in our code right now). It would be good if @sriharshakandala also chimes in here!

I think we should be able to get good performance from this.

charleskawczynski · 2024-09-04T14:45:04Z

Hi @sriharshakandala, I ran the GPU target pipeline, and these are the preliminary results:

main (build)

solve!:: 88.144602 seconds (20.75 M allocations: 474.717 MiB, 0.72% compilation time)
[ Info: sypd: 2.687084644368064
[ Info: wall_time_per_timestep: 91 milliseconds, 763 microseconds

This PR (build)

solve!:: 70.939605 seconds (20.45 M allocations: 472.976 MiB, 1.01% compilation time)
[ Info: sypd: 3.3397802737331332
[ Info: wall_time_per_timestep: 73 milliseconds, 829 microseconds

I don't think that this implementation is necessarily responsible for the speedup, but I also don't think that it's slowed anything down. Also, I think that we can actually make it faster by avoiding the div calls in the kernels, and just rely on tuned cartesian indexing (like the copyto! and stencil kernels).

At this point, I think I'm ready to merge if people are happy with the changes.

charleskawczynski · 2024-09-04T15:00:32Z

Hi @akshaysridhar, I've looked into this a bit more, and it seems that DSS does have storage for the vertical component, so the 3-component case should work just fine.

The behavior that I didn't expect was coming from:

ClimaCore.jl/src/Topologies/dss_transform.jl

Lines 90 to 125 in f36251c

    
           @inline function dss_transform( 
        
               arg::Geometry.AxisVector, 
        
               local_geometry::Geometry.LocalGeometry, 
        
               weight, 
        
           ) 
        
               ax = axes(local_geometry.∂x∂ξ, 1) 
        
               axfrom = axes(arg, 1) 
        
               # TODO: make this consistent for 2D / 3D 
        
               # 2D domain axis (1,2) horizontal curl 
        
               if ax isa Geometry.UVAxis && ( 
        
                   axfrom isa Geometry.Covariant3Axis || 
        
                   axfrom isa Geometry.Contravariant3Axis 
        
               ) 
        
                   return arg * weight 
        
               end 
        
               # 2D domain axis (1,3) curl 
        
               if ax isa Geometry.UWAxis && ( 
        
                   axfrom isa Geometry.Covariant2Axis || 
        
                   axfrom isa Geometry.Contravariant2Axis 
        
               ) 
        
                   return arg * weight 
        
               end 
        
               # workaround for using a Covariant12Vector in a UW space 
        
               if ax isa Geometry.UWAxis && axfrom isa Geometry.Covariant12Axis 
        
                   # return Geometry.transform(Geometry.UVWAxis(), arg, local_geometry) 
        
                   u₁, v = Geometry.components(arg) 
        
                   uw_vector = Geometry.project( 
        
                       Geometry.UWAxis(), 
        
                       Geometry.Covariant13Vector(u₁, zero(u₁)), 
        
                       local_geometry, 
        
                   ) 
        
                   u, w = Geometry.components(uw_vector) 
        
                   return Geometry.UVWVector(u, v, w) * weight 
        
               end 
        
               Geometry.project(ax, arg, local_geometry) * weight 
        
           end

which transforms a Covariant12Vector into a UVWVector. So, either that method needs to be updated to preserve the dimensionality, or just keep drop_vert_dim. For now, I think it's probably easiest to just keep drop_vert_dim.

sriharshakandala · 2024-09-05T15:03:11Z

It looks good to me overall. I am not sure if drop_dims is required. The recent PR treats vector transformations in an normal way (no special treatment for vertical velocity component). I am assuming there are no performance penalties from using the AxisTensors conversion framework, which we had when we originally put this code!

sriharshakandala · 2024-09-05T15:04:02Z

We should also remove the comments indicating that we are treating the vertical component in a special way, since this is no longer true!

charleskawczynski · 2024-09-05T15:17:15Z

Alright, moving forward with this.

Remove unused indices and methods Fix some conversions

charleskawczynski added the refactor label Aug 27, 2024

charleskawczynski force-pushed the ck/refactor_dss branch 9 times, most recently from 5e3f3ed to 82c9e28 Compare August 29, 2024 15:18

charleskawczynski commented Aug 29, 2024

View reviewed changes

src/Topologies/dss.jl Show resolved Hide resolved

charleskawczynski force-pushed the ck/refactor_dss branch 2 times, most recently from 515b92b to f859925 Compare August 29, 2024 19:34

charleskawczynski marked this pull request as ready for review August 29, 2024 19:50

charleskawczynski requested review from dennisYatunin, Sbozzolo, akshaysridhar and sriharshakandala August 29, 2024 19:51

charleskawczynski force-pushed the ck/refactor_dss branch from f859925 to a46a660 Compare September 4, 2024 13:29

charleskawczynski force-pushed the ck/refactor_dss branch from a46a660 to a2bc514 Compare September 4, 2024 19:29

charleskawczynski mentioned this pull request Sep 5, 2024

Add dss_transform unit test #1971

Merged

sriharshakandala approved these changes Sep 5, 2024

View reviewed changes

charleskawczynski force-pushed the ck/refactor_dss branch from a2bc514 to 907bb8f Compare September 5, 2024 15:17

charleskawczynski enabled auto-merge September 5, 2024 15:17

charleskawczynski disabled auto-merge September 5, 2024 15:17

Refactor DSS to better leverage DataLayouts

580394f

Remove unused indices and methods Fix some conversions

charleskawczynski force-pushed the ck/refactor_dss branch from 907bb8f to 580394f Compare September 5, 2024 15:19

charleskawczynski enabled auto-merge September 5, 2024 15:19

charleskawczynski merged commit add3b1b into main Sep 5, 2024
21 of 22 checks passed

charleskawczynski deleted the ck/refactor_dss branch September 5, 2024 16:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor DSS to better leverage DataLayouts #1958

Refactor DSS to better leverage DataLayouts #1958

charleskawczynski commented Aug 27, 2024 •

edited

Loading

charleskawczynski commented Aug 29, 2024

akshaysridhar commented Aug 30, 2024 •

edited

Loading

charleskawczynski commented Sep 3, 2024 •

edited

Loading

charleskawczynski commented Sep 4, 2024

charleskawczynski commented Sep 4, 2024

sriharshakandala commented Sep 5, 2024

sriharshakandala commented Sep 5, 2024

charleskawczynski commented Sep 5, 2024

Refactor DSS to better leverage DataLayouts #1958

Refactor DSS to better leverage DataLayouts #1958

Conversation

charleskawczynski commented Aug 27, 2024 • edited Loading

charleskawczynski commented Aug 29, 2024

akshaysridhar commented Aug 30, 2024 • edited Loading

charleskawczynski commented Sep 3, 2024 • edited Loading

charleskawczynski commented Sep 4, 2024

main (build)

This PR (build)

charleskawczynski commented Sep 4, 2024

sriharshakandala commented Sep 5, 2024

sriharshakandala commented Sep 5, 2024

charleskawczynski commented Sep 5, 2024

charleskawczynski commented Aug 27, 2024 •

edited

Loading

akshaysridhar commented Aug 30, 2024 •

edited

Loading

charleskawczynski commented Sep 3, 2024 •

edited

Loading