
Add SDDMM example #674

Merged
merged 4 commits into main from sddmm on May 14, 2024

Conversation

mtsokol
Collaborator

@mtsokol mtsokol commented May 8, 2024

Hi @hameerabbasi,

This PR adds an SDDMM (sampled dense-dense matrix multiplication) example and upgrades Finch to the latest version.

[UPDATED 14.05.2024]
On my machine, running:

python examples/sddmm_example.py

gives:

Finch
Took 8.787564675013224 s.

Numba
Took 22.904020706812542 s.

SciPy
Took 22.59452811876933 s.
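
For context, SDDMM computes C = S ⊙ (A @ B): the dense product A @ B evaluated only at the sparsity pattern of the sparse matrix S. A minimal SciPy reference sketch (illustrative only, not the PR's examples/sddmm_example.py; the sizes and density below are assumptions):

# Illustrative SDDMM reference (sizes/density assumed, not the PR's settings).
import numpy as np
import scipy.sparse as sps

rng = np.random.default_rng(0)
m, n, k = 1000, 1000, 32
s = sps.random(m, n, density=1e-4, format="coo", random_state=rng)  # sparse sample matrix S
a = rng.standard_normal((m, k))  # dense A
b = rng.standard_normal((k, n))  # dense B

# Naive reference: materializes the full dense m-by-n product, then samples it.
c = s.multiply(a @ b)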

@mtsokol mtsokol self-assigned this May 8, 2024
@mtsokol mtsokol requested a review from hameerabbasi May 8, 2024 11:41

github-actions bot commented May 8, 2024

Test Results

5 923 tests ±0    5 892 ✅ ±0    9m 24s ⏱️ +2m 33s
    1 suites ±0       31 💤 ±0
    1 files  ±0        0 ❌ ±0

Results for commit 0f52367. ± Comparison against base commit 79b9d71.

This pull request skips 1 and un-skips 1 tests.
sparse.numba_backend.tests.test_compressed ‑ test_reductions_float16[i8-None-sum-kwargs0]
sparse.numba_backend.tests.test_compressed ‑ test_reductions_float16[f8-None-sum-kwargs0]

♻️ This comment has been updated with latest results.

@mtsokol
Collaborator Author

mtsokol commented May 8, 2024

I think the density could be increased to 0.0001, so we have 100 non-zeros (more realistic?). I get the same performance either way.

@mtsokol mtsokol force-pushed the sddmm branch 2 times, most recently from 741704b to b63f7c5 on May 8, 2024 11:53
Collaborator

@hameerabbasi hameerabbasi left a comment


Two final changes, then this is ready.

examples/sddmm_example.py: 2 review threads (outdated, resolved)
@hameerabbasi
Collaborator

hameerabbasi commented May 8, 2024

I'd actually like to test the examples as well, to make sure they always work. Can we add something like the following to CI:

# test_examples.sh
for example in $(find ./examples/ -iname '*.py'); do  # quote the glob so find receives it unexpanded
  python "$example"
done

# in CI
source test_examples.sh

Alternatively (and preferably), let's move this to the benchmarks.

hameerabbasi previously approved these changes May 8, 2024
@mtsokol
Collaborator Author

mtsokol commented May 8, 2024

I added a CI stage for running it.

I can also add SDDMM to the benchmarks, but I'd prefer to keep standalone examples too - something that can be quickly shared with others and executed in a REPL, without unwrapping asv-specific benchmark code (a sketch of what that looks like is below).
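
For reference, a minimal asv suite looks roughly like this (hypothetical names, not the repo's actual benchmarks; asv times every method prefixed with time_ and runs setup beforehand):

# Hypothetical asv benchmark sketch.
import numpy as np
import scipy.sparse as sps

class SDDMMSuite:
    def setup(self):
        rng = np.random.default_rng(0)
        m, n, k = 1000, 1000, 32
        self.s = sps.random(m, n, density=1e-4, format="coo", random_state=rng)
        self.a = rng.standard_normal((m, k))
        self.b = rng.standard_normal((k, n))

    def time_sddmm_scipy(self):
        # asv reports the runtime of this body.
        self.s.multiply(self.a @ self.b)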

@mtsokol
Collaborator Author

mtsokol commented May 9, 2024

Blocked by finch-tensor/Finch.jl#534

@mtsokol
Collaborator Author

mtsokol commented May 9, 2024

Here's the debug output for the Finch lazy-mode plan:

Executing:
:(function var"##compute#410"(prgm)
      begin
          V = (((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{PlusOneVector{Int32}, PlusOneVector{Int32}}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}
          V_2 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          V_3 = ((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A0 = V::Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{PlusOneVector{Int32}, PlusOneVector{Int32}}, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}
          A0_2 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
          @finch mode = :fast begin
                  A0_2 .= 0.0
                  for i1 = _
                      for i0 = _
                          A0_2[i1, i0] = A0[i0, i1]
                      end
                  end
                  return A0_2
              end
          A2 = V_2::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A4 = V_3::Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, PyArray{Float64, 1, true, true, Float64}}}}}
          A8 = Tensor(Dense(SparseDict(Element{0.0, Float64}())))::Tensor{DenseLevel{Int64, SparseLevel{Int64, Finch.DictTable{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}
          @finch mode = :fast begin
                  A8 .= 0.0
                  for i52 = _
                      for i51 = _
                          for i50 = _
                              A8[i50, i51] << + >>= (*)(A0_2[i50, i51], (*)(A2[1, i52], A4[1, i52]))
                          end
                      end
                  end
                  return A8
              end
          return (A8,)
      end
  end)
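
For orientation, a plan like the one above comes out of the lazy pipeline on the Python side. A rough sketch, assuming finch-tensor's finch.lazy/finch.compute API (finch.multiply and finch.tensordot are assumed array-API-style entry points, and constructing a Tensor from SciPy/NumPy inputs is also an assumption; this is not the PR's verbatim example):

# Rough sketch of the lazy pipeline (API names partially assumed, see above).
import numpy as np
import scipy.sparse as sps
import finch

s = finch.Tensor(sps.random(1000, 1000, density=1e-4, format="coo"))
a = finch.Tensor(np.random.rand(1000, 32))
b = finch.Tensor(np.random.rand(32, 1000))

# lazy() defers execution: subsequent operations build an expression tree.
sl, al, bl = finch.lazy(s), finch.lazy(a), finch.lazy(b)
plan = finch.multiply(sl, finch.tensordot(al, bl, axes=(1, 0)))

# compute() hands the whole tree to the Finch compiler, which can fuse the
# sampling multiply with the contraction instead of materializing a @ b.
result = finch.compute(plan)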

@willow-ahrens
Collaborator

Let's keep working on this until we see a speedup from fusion - I believe one should be achievable here, so it's a good goal to work towards (see the sketch below for the intuition).
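
The intuition, in NumPy/SciPy terms (an illustrative sketch; Finch generates its own fused loop nest, as in the plan above): a fused SDDMM evaluates (a @ b)[i, j] only at S's stored coordinates, never materializing the dense m-by-n intermediate.

# Fused SDDMM sketch (illustrative, not Finch's generated code).
import numpy as np
import scipy.sparse as sps

def sddmm_fused(s: sps.coo_matrix, a: np.ndarray, b: np.ndarray) -> sps.coo_matrix:
    # Row-wise dot products taken only at s's nnz coordinates.
    vals = np.einsum("ij,ij->i", a[s.row], b.T[s.col]) * s.data
    return sps.coo_matrix((vals, (s.row, s.col)), shape=s.shape)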

hameerabbasi previously approved these changes May 14, 2024
@mtsokol
Collaborator Author

mtsokol commented May 14, 2024

The latest Finch version precompiles a few kernels on first use, which makes the first benchmark run time out. Let me fix it (see the warm-up sketch below).
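
A common way to keep that first-call compilation out of the measurement (a sketch; sddmm_finch is a hypothetical stand-in for the example's Finch code path):

# Warm-up sketch: exclude compilation from the timed run.
import time

sddmm_finch(s, a, b)             # warm-up call triggers (pre)compilation
start = time.perf_counter()
result = sddmm_finch(s, a, b)    # timed run hits the compiled kernel
print(f"Took {time.perf_counter() - start} s.")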

Collaborator

@hameerabbasi hameerabbasi left a comment


Thanks for all the hard work on this, @mtsokol!

@hameerabbasi hameerabbasi merged commit c12b29e into main May 14, 2024
12 checks passed
@hameerabbasi hameerabbasi deleted the sddmm branch May 14, 2024 12:10
@willow-ahrens
Collaborator

Thanks @mtsokol!
