
Xeon optimizations #2914

Closed
wants to merge 35 commits into from


sanchit-misra
Contributor

Description

This PR contains the single socket optimizations of SpMM for Xeon as mentioned in our DistGNN paper: https://arxiv.org/abs/2104.06700

Checklist


  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • To the best of my knowledge, examples get faster or equal performance and accuracy is not affected.

Changes

Provided Xeon-optimized implementations of SpMMSumCsr() and SpMMCmpCsr(). We have observed up to 4.4x speedup on the SpMM kernel with no change in accuracy.
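For readers unfamiliar with these kernels, the sketch below is a plain, unoptimized reference for what a CSR SpMM with sum or max reduction computes when each edge simply copies the source node's feature. It is not the PR's Xeon implementation and not DGL's API: the function names `spmm_sum_csr` and `spmm_max_csr`, the list-based inputs, and the choice to leave rows with no neighbors at 0 (with argmax -1) are all assumptions for illustration. The "Cmp" variant is shown as max only; min is symmetric.

```python
from typing import List, Tuple

def spmm_sum_csr(indptr: List[int], indices: List[int],
                 feat: List[List[float]]) -> List[List[float]]:
    """Hypothetical reference: out[v] = sum of feat[u] over neighbors u in CSR row v."""
    num_rows = len(indptr) - 1
    dim = len(feat[0]) if feat else 0
    out = [[0.0] * dim for _ in range(num_rows)]
    for v in range(num_rows):
        row = out[v]
        # Neighbors of row v live in indices[indptr[v]:indptr[v + 1]].
        for k in range(indptr[v], indptr[v + 1]):
            src = feat[indices[k]]
            for j in range(dim):
                row[j] += src[j]
    return out

def spmm_max_csr(indptr: List[int], indices: List[int],
                 feat: List[List[float]]) -> Tuple[List[List[float]], List[List[int]]]:
    """Hypothetical reference: element-wise max over neighbors, plus the neighbor
    that produced each maximum (-1 for rows with no neighbors, whose output stays 0)."""
    num_rows = len(indptr) - 1
    dim = len(feat[0]) if feat else 0
    out = [[0.0] * dim for _ in range(num_rows)]
    arg = [[-1] * dim for _ in range(num_rows)]
    for v in range(num_rows):
        for k in range(indptr[v], indptr[v + 1]):
            u = indices[k]
            for j in range(dim):
                # The first neighbor always wins; later ones only if strictly larger.
                if arg[v][j] == -1 or feat[u][j] > out[v][j]:
                    out[v][j] = feat[u][j]
                    arg[v][j] = u
    return out, arg
```

Every output row gathers feature rows addressed by `indices`, so the inner loops are dominated by irregular memory reads; that access pattern is what an optimized kernel would target.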

Suggestion for best performance

The optimizations achieve better performance on dense full graphs (such as Reddit) when the neighbor list of each node in the CSR matrix (the indices array) is sorted.
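A minimal sketch of the suggested pre-processing, assuming the graph is held as plain CSR arrays (`indptr`, `indices`) plus a parallel array of edge ids. The helper name `sort_csr_rows` is hypothetical and not part of DGL. It sorts each row's neighbors in ascending order and permutes the edge ids with them so every edge keeps its identity; `indptr` does not change because row lengths do not change.

```python
from typing import List, Tuple

def sort_csr_rows(indptr: List[int], indices: List[int],
                  edge_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: sort the neighbor list of every CSR row in place order,
    carrying each edge id along with its neighbor."""
    new_indices = list(indices)
    new_eids = list(edge_ids)
    for v in range(len(indptr) - 1):
        lo, hi = indptr[v], indptr[v + 1]
        # Sort (neighbor, edge id) pairs by neighbor id within row v.
        pairs = sorted(zip(indices[lo:hi], edge_ids[lo:hi]))
        for offset, (u, e) in enumerate(pairs):
            new_indices[lo + offset] = u
            new_eids[lo + offset] = e
    return new_indices, new_eids
```

Reordering neighbors does not change which features are summed, so the SpMM result is the same up to floating-point rounding from the different addition order. A plausible reason for the speedup, not stated in the PR, is that ascending neighbor ids make feature reads more sequential and cache friendly.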

@dgl-bot
Collaborator

dgl-bot commented May 15, 2021

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@jermainewang
Member

Looks like the PR conflicts with the earlier effort to add libxsmm to DGL. I will close this one. Please rebase and create a new PR. Thanks.
