Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark test case for q4_0 matrix multiplication #653

Merged
merged 10 commits into from
Apr 13, 2023

Conversation

SebastianApel
Copy link
Contributor

This is a small program I used benchmark the performance of the q4_0 matrix multiplication core.

I've used it to compare several implementations of ggml_vec_dot_q4_0.

I will reference this benchmark in another pull request for an improved AVX2 ggml_vec_dot_q4_0 implementation.

@SebastianApel SebastianApel changed the title Benchmark test case for q4_0 matrix multiplication benchmark Benchmark test case for q4_0 matrix multiplication Mar 31, 2023
@slaren
Copy link
Collaborator

slaren commented Apr 1, 2023

Looks good, but I am wondering if this should be split in two parts. On one hand we have a correctness test, which should be added to the list of tests to run in the CI with CTest, and on the other hand we have a benchmarking tool that is meant to be run manually, and should be somewhere else, maybe as one of the examples. What do you think?

@SebastianApel
Copy link
Contributor Author

SebastianApel commented Apr 1, 2023

Looks good, but I am wondering if this should be split in two parts. On one hand we have a correctness test, which should be added to the list of tests to run in the CI with CTest, and on the other hand we have a benchmarking tool that is meant to be run manually, and should be somewhere else, maybe as one of the examples. What do you think?

@slaren I've moved the benchmarking to the folder "examples/benchmark" and adjusted the Makefile. Please note that I've not changed the CMakelists.

I also agree conceptually with your suggestion around the correctness test, but I am not familiar with setting up CTests so it would be great if someone else could do that (maybe in a different PR).

Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to have a mat mul benchmark tool, but I think we need more work to make this better:

  • Add options to benchmark all data types, not just Q4_0
  • Add lightweight correctness tests to CTest / CI
  • I think no need for separate "example" and "test". A single "test" in the "./tests" folder that can be executed with different arguments should be best

I think this benchmark will eventually become part of the ggml repo.
For example, in whisper.cpp we already have similar benchmarks:

https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L4686-L4774

So, if we add it here too, we will have duplication.

If we address the points above, we can merge the benchmark tool in this repo first and later move it to ggml.

Makefile Outdated Show resolved Hide resolved
@SebastianApel
Copy link
Contributor Author

SebastianApel commented Apr 2, 2023

@slaren & @ggerganov I need you help with the questions around "examples" vs "tests".

@slaren suggested

maybe as one of the examples.

@ggerganov suggested

I think no need for separate "example" and "test". A single "test" in the "./tests" folder that can be executed with different arguments should be best

Both options are fine with me - but I cannot fulfill both wishes. Who will decide and let me know which option should be implemented in order to get this PR merged?

@SebastianApel
Copy link
Contributor Author

@ggerganov

It would be useful to have a mat mul benchmark tool, but I think we need more work to make this better:

  • Add options to benchmark all data types, not just Q4_0
  • Add lightweight correctness tests to CTest / CI

I can understand the desire to extend the benchmark tool even more and enable it to work on all data types, and also to implement lightweight correctness tests to CTest / CI.

My challenge is that I will only have very limited capacity over the next weeks, so I myself will not be able to work on your suggestions.

I see the following options:

  • O1: We leave this PR open until someone finds the time to work on the suggestions
  • O2: We merge the PR as is and open 1 or 2 "improvement" issue tickets for the suggestions
  • O3: We close the PR and when someone finds the time to work on the suggestions, we will open a new one

Do you see any additional options? If not, which option would you prefer?

@slaren
Copy link
Collaborator

slaren commented Apr 2, 2023

@SebastianApel I am sorry for the confusion, I was mostly looking for input since I wasn't sure myself what was the best way to proceed. Absolutely follow @ggerganov suggestions, this is his project after all.

@SebastianApel
Copy link
Contributor Author

SebastianApel commented Apr 2, 2023

@SebastianApel I am sorry for the confusion, I was mostly looking for input since I wasn't sure myself what was the best way to proceed. Absolutely follow @ggerganov suggestions, this is his project after all.

@slaren Thank you for the quick feedback. I can move it back, no problem. I'll wait for feedback from @ggerganov regarding #653 (comment) before doing so.

@ggerganov
Copy link
Owner

Adding this now, but I think we can do way better and later move this as a ggml test

@ggerganov ggerganov merged commit 95ea26f into ggerganov:master Apr 13, 2023
@jon-chuang
Copy link
Contributor

Quite curious, perhaps add the graph_print after to show the individual node perf timings? (Requires GGML_PERF defined)

@SebastianApel
Copy link
Contributor Author

@jon-chuang Since this PR is merged - would you be open to creating a new PR with your idea? That would make it easier to discuss your suggestion and/or code changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants