
Create an internal API which deploys off of JACCPreferences.backend #86

Open · wants to merge 24 commits into base: main

Conversation


@kmp5VT kmp5VT commented May 3, 2024

This PR is rewritten to use Julia traits, written in a style that replicates SimpleTraits.jl. The traits allow only one backend to be used at a time, still managed by Preferences. In this way JACC does not overload the Base.Array name but is still able to construct an array generically through an array function.
One issue with this design is that when one launches JACC, JACC.JAT = JACCArrayType{Array} regardless of the preference. Then, when one loads the proper backend to match the preference, JAT is reset to reflect the desired preference.
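A minimal sketch of the trait mechanism described above, reconstructed from the stacktraces later in this thread (the names JACCArrayType, arraytype, and JACC_BACKEND_TYPE follow those traces; the real code lives in src/helper.jl and src/JACC.jl, and details may differ):

```julia
# Hypothetical sketch: one trait method per backend, dispatched on a Val
# of the preference symbol. Backend extensions add their own methods,
# e.g. arraytype(::Val{:cuda}) = CuArray inside JACCCUDA.
struct JACCArrayType{T} end

arraytype(::Val{:threads}) = Array
arraytype(::Val{S}) where {S} =
    error("The backend $S is either not recognized or the associated package is not loaded.")

# JAT starts out as JACCArrayType{Array}; loading the extension that
# matches JACCPreferences.backend resets it to the desired array type.
JACC_BACKEND_TYPE(backend = "threads") =
    JACCArrayType{arraytype(Val(Symbol(backend)))}()
```

Because only the trait method for the loaded backend exists, any mismatch between the preference and the loaded packages fails loudly instead of silently falling back to threads.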

@kmp5VT kmp5VT marked this pull request as draft May 3, 2024 18:20

kmp5VT commented May 8, 2024

With this implementation, JACC will not work if the preference doesn't match the loaded libraries. For example:

julia> using JACC
julia> JACC.JACCPreferences.backend
  "amdgpu"
julia> JACC.zeros(3)
ERROR: The backend amdgpu is either not recognized or the associated package is not loaded.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] arraytype(::Val{:amdgpu})
   @ JACC ~/.julia/dev/JACC.jl/src/helper.jl:15
 [3] arraytype()
   @ JACC ~/.julia/dev/JACC.jl/src/helper.jl:13
 [4] zeros(::Int64)
   @ JACC ~/.julia/dev/JACC.jl/src/array.jl:3
 [5] top-level scope
   @ REPL[969]:1

Also, the code can now be precompiled without issue.


kmp5VT commented May 9, 2024

All AMD, CUDA, and CPU tests now pass. I cannot check oneAPI because I don't yet have access to an Intel GPU.

@kmp5VT kmp5VT marked this pull request as ready for review May 9, 2024 00:06
@williamfgc (Collaborator):

@kmp5VT please see my previous comment


function JACC.parallel_for(N::I, f::F, x...) where {I <: Integer, F <: Function}
using JACC: JACCArrayType
function JACC.parallel_for(::JACCArrayType{<:ROCArray}, N::Integer, f::Function, x...)
Collaborator:

We are trying to keep the API as simple as possible for domain scientists. Please see #62 for a similar discussion.

Author:

I think I am confused. The outward-facing API is exactly the same as in the current implementation; it's just these two functions:

function parallel_for(N, f::Function, x...)

function parallel_reduce(N, f::Function, x...)

Internally, JACC will create a struct based on the JACC.JACCPreferences.backend variable and will only allow users to utilize a single array backend type, just like the current implementation. The only real outward-facing change for current users is JACC.Array -> JACC.array, so as not to export and overwrite the Base.Array name, which could cause issues in other libraries.
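To make the forwarding concrete, here is a hedged sketch of how these two entry points could dispatch internally (assumed shape, based on the JACC_BACKEND_TYPE and JACCArrayType names appearing elsewhere in this thread; the actual signatures in src/JACC.jl may differ):

```julia
# The user-facing functions are unchanged; they forward to a
# backend-specific method selected by the internal array-type struct.
parallel_for(N, f::Function, x...) =
    parallel_for(JACC_BACKEND_TYPE(), N, f, x...)

parallel_reduce(N, f::Function, x...) =
    parallel_reduce(JACC_BACKEND_TYPE(), N, f, x...)

# Each backend extension then implements its own method, e.g.:
# function parallel_for(::JACCArrayType{<:ROCArray}, N::Integer, f::Function, x...)
```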

@williamfgc (Collaborator), May 9, 2024:

This PR is breaking the current API. What I am trying to understand is the value added by this approach: we are adding internal boilerplate, but the end result is similar to the current implementation, unless I am missing something. As for precompilation, we intentionally want to prevent compiling on non-supported platforms, as we had a number of reported issues in the past on our systems. Multiple dispatch is a JIT "runtime concept" we are trying to avoid altogether with weak dependencies.

@williamfgc (Collaborator), May 9, 2024:

See, for example, my comment here: v0.0.3 wouldn't even compile when JACC.jl is consumed from an application and would just hang. Adding more applications helps reveal a lot of these types of issues.

Collaborator:

> to not export and overwrite Base.Array which could cause issues in other libraries.

We are not overwriting Base.Array; JACC.Array is just an alias to existing Array types. Also, it's not clear what issues this could cause unless JACC.Array is misused outside its current scope. We can always implement a JACC.ToHost(JACC.Array)::Base.Array function to guarantee a host array.
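For illustration, a minimal version of the suggested helper could look like this (hypothetical: ToHost does not exist in JACC today, and the name and signature are only the ones proposed above):

```julia
# Copy any (possibly device-resident) array back to a host Base.Array.
ToHost(a::AbstractArray) = Base.Array(a)
```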

@kmp5VT (Author), May 9, 2024:

Can you please provide an example where this implementation breaks the current API, outside of migrating Array to an array function?

I have found some issues with the current main branch:

julia> using JACC
julia> JACC.JACCPreferences.backend
"cuda"
julia> JACC.Array
Core.Array

If you don't have the correct backend loaded, JACC will still run with "threads" because JACC.Array = Base.Array. This would be unexpected behavior. The system I have written here forces the component matching JACC.JACCPreferences.backend to be loaded for the user-facing API to work.

julia> using JACC
julia> begin
@show JACC.JACCPreferences.backend
    function seq_axpy(N, alpha, x, y)
        Threads.@threads for i in 1:N
            @inbounds x[i] += alpha * y[i]
        end
    end

    function axpy(i, alpha, x, y)
        if i <= length(x)
            @inbounds x[i] += alpha * y[i]
        end
    end

    N = 10
    # Generate random vectors x and y of length N for the interval [0, 100]
    x = round.(rand(Float32, N) * 100)
    y = round.(rand(Float32, N) * 100)
    alpha = 2.5

    x_host_JACC = Array(x)
    y_host_JACC = Array(y)
    JACC.parallel_for(N, axpy, alpha, x_host_JACC, y_host_JACC)
end
JACC.JACCPreferences.backend = "cuda"

ERROR: The backend cuda is either not recognized or the associated package is not loaded.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] arraytype(::Val{:cuda})
   @ JACC ~/.julia/dev/JACC.jl/src/helper.jl:15
 [3] JACC_BACKEND_TYPE()
   @ JACC ~/.julia/dev/JACC.jl/src/JACC.jl:14
 [4] parallel_for(::Int64, ::Function, ::Float64, ::Vector{Float32}, ::Vararg{Vector{Float32}})
   @ JACC ~/.julia/dev/JACC.jl/src/JACC.jl:17
 [5] top-level scope
   @ ~/.julia/dev/JACC.jl/test/tests_threads.jl:44

Currently, the alias to Array creates a global variable called Array and exports it, which effectively overwrites Base.Array. This can easily cause an issue outside of JACC:

julia> using JACC
julia> begin 
       a = Array{Float32}(undef, 2,3)
       fill!(a, 10)
       a .+= 1
       end
WARNING: both JACC and Base export "Array"; uses of it in module Main must be qualified
ERROR: UndefVarError: `Array` not defined
Stacktrace:
 [1] top-level scope
   @ REPL[2]:2

The system proposed here does not need to alias Array because the alias is effectively stored as a compile-time parameter inside the module.

Another issue is that if a different backend is introduced after loading JACC, the definition of JACC.Array changes:

julia> using JACC
[ Info: Precompiling JACC [0979c8fe-16a4-4796-9b82-89a9f10403ea]
[ Info: Skipping precompilation since __precompile__(false). Importing JACC [0979c8fe-16a4-4796-9b82-89a9f10403ea].
julia> using CUDA
Precompiling JACCCUDA
  ? JACC → JACCCUDA
[ Info: Precompiling JACCCUDA [2fb45ac4-0993-536e-a71a-0b5526d52098]
┌ Warning: Module JACC with build ID ffffffff-ffff-ffff-000a-3724e6101b14 is missing from the cache.
│ This may mean JACC [0979c8fe-16a4-4796-9b82-89a9f10403ea] does not support precompilation but is imported by a module that does.
└ @ Base loading.jl:1948
[ Info: Skipping precompilation since __precompile__(false). Importing JACCCUDA [2fb45ac4-0993-536e-a71a-0b5526d52098].
julia> JACC.JACCPreferences.backend
"threads"
julia> JACC.Array
CuArray
julia> using AMDGPU
┌ Warning: Device libraries are unavailable, device intrinsics will be disabled.
└ @ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/AMDGPU.jl:186
julia> JACC.Array
ROCArray

This is not possible with the proposed implementation because the forward-facing API effectively chooses the backend based solely on the JACCPreferences.backend variable.

I understand that you don't want to mix backends, and that is specifically why I designed the code to internally construct the JACCArrayType based on Symbol(JACCPreferences.backend). To enforce this, we should only document the two functions in JACC.jl and suggest using JACC.JACCPreferences.set_backend to change the backend array type.

In regards to JIT and multiple dispatch: I am still learning Julia, so I could be wrong, but I am not sure that is a concern with the design here. The function JACC_BACKEND_TYPE() is compiler-inferable and resolved at compile time:

julia> @code_warntype JACC.JACC_BACKEND_TYPE()
MethodInstance for JACC.JACC_BACKEND_TYPE()
  from JACC_BACKEND_TYPE() @ JACC ~/.julia/dev/JACC.jl/src/JACC.jl:13
Arguments
  #self#::Core.Const(JACC.JACC_BACKEND_TYPE)
Body::JACC.JACCArrayType{Array}
1 ─ %1 = JACC.JACCArrayType::Core.Const(JACC.JACCArrayType)
│   %2 = JACC.JACCPreferences.backend::Core.Const("threads")
│   %3 = JACC.Symbol(%2)::Core.Const(:threads)
│   %4 = JACC.Val(%3)::Core.Const(Val{:threads}())
│   %5 = JACC.arraytype(%4)::Core.Const(Array)
│   %6 = Core.apply_type(%1, %5)::Core.Const(JACC.JACCArrayType{Array})
│   %7 = (%6)()::Core.Const(JACC.JACCArrayType{Array}())
└──      return %7
julia> @code_typed JACC.JACC_BACKEND_TYPE()
CodeInfo(
1 ─     return $(QuoteNode(JACC.JACCArrayType{Array}()))
) => JACC.JACCArrayType{Array}

Collaborator:

> Can you please provide an example where this implementation breaks the current API outside of migrating Array to an array function?

I think this is self-explanatory.

> Currently the alias to array makes a global variable called Array and exports it which effectively overwrites Base.Array. This can easily cause an issue outside of JACC

using X is certainly not recommended due to name clashing; prefer import X. I think this is different from overriding Base.Array.

> This is not possible with the implementation proposed because the forward facing API effectively chooses the backend based solely on the JACCPreferences.backend variable.

> In regards to the JIT and multiple dispatch, I am still learning Julia so I could be wrong but I am not sure that is a concern based on the design here. The function JACC_BACKEND_TYPE() is compiler inferable and defined at compile time

Packages still need to be downloaded and precompiled. That's where we saw the reported issues on our systems, and hence why we are being very conservative, as explained in our last call, even more so with relatively new Julia versions and features and a rapidly evolving stack for AMD and Intel GPUs.

I just think the motivation and problems tackled for our systems by the current API and implementation are very different from those in this PR.

Author:

@williamfgc I do not understand your systems, as the code patterns here work on my clusters without issue and all of the unit tests in the module pass. I would love to understand how this PR breaks on your systems; if you could run the code in this PR and give me examples, that would be appreciated. I specifically tailored the design of this PR from our conversations on Zoom. Further, I connected with @pedrovalerolara to collaborate with you on code design, with the potential of integrating JACC into ITensors. I do not have specific use cases for your code and have not used any of your software previously.

@williamfgc (Collaborator), May 9, 2024:

That would be a good starting point for collaboration. Feel free to integrate JACC with ITensors and understand the trade-offs and the motivation of the current design before changing all the internals. As discussed in the call, we target DOE HPC systems, which are not simple to deploy on and use, especially with new stuff like the Julia ecosystem.


Alternative idea: can we do struct Array{T,N} end and then have Array{T,N}(...) = JACC.arraytype(){T,N}(...)? This would preserve the idea that JACC.Array can become another array type, without resorting to modifying module globals at runtime.
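A sketch of that alternative, with assumed semantics (arraytype() here stands for whatever function returns the currently selected backend array type):

```julia
# A placeholder JACC.Array whose constructors dispatch to the backend
# array type at call time, so no module global is mutated at runtime.
struct Array{T,N} end

Array{T,N}(args...) where {T,N} = arraytype(){T,N}(args...)
Array{T}(args...) where {T} = arraytype(){T}(args...)
```

One wrinkle is that the value returned by Array{T}(...) would not itself be a JACC.Array, so any code relying on isa(x, JACC.Array) would need adjusting.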

@kmp5VT kmp5VT changed the title Refactors the internals of JACC Create an internal API which deploys off of JACCPreferences.backend May 9, 2024
@jpsamaroo

As a non-maintainer of JACC, but a maintainer of packages which solve similar problems (Dagger and DaggerGPU), I'm very much in favor of this PR! I think moving away from making JACC.Array automagically change its type is a necessary direction for JACC to go in.

@williamfgc (Collaborator):

FYI @kmp5VT @jpsamaroo, we are meeting to discuss these aspects with people in the community. You should have gotten a calendar invite. Thanks!
