Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632

Closed
wants to merge 24 commits into from

Commits on Jun 18, 2023

  1. initial

    mqy committed Jun 18, 2023
    213f133
  2. 1b041d7
  3. bulk refactored task profile to support complete fallback; enable tune by default for ease of dev

    mqy committed Jun 18, 2023
    48016f6
  4. threading test: on GitHub, Windows can take more than 20 seconds to start 15 threads. Let's silently ignore it when we see two adjacent slow runs.

    mqy committed Jun 18, 2023
    9106232
  5. Workaround to set node->backend

    mqy committed Jun 18, 2023
    bb590f1
  6. 7c05049
  7. 21e9379
  8. tuning: support k_quants; disabled rope shapes (workaround); make cache thread-safe; fixed shape comparison

    mqy committed Jun 18, 2023
    5342dc0
  9. try to make CL run without tuning, but -ngl gets stuck with no output. Had to add task runner and profile id; many changes, see the f codes

    mqy committed Jun 18, 2023
    6b83a3e
  10. bulk refactoring of task profile and related code to run CL GPU offloading.

    * removed ggml_task_backend in favour of ggml_task_profile.runner and the newly added id and name.
    * extracted mul_mat BLAS code into ggml_compute_forward_mul_mat_blas,
      thus aligning with CUDA/CL a bit more and making it easier to fix profile and run tune.
    * rewrote task profile and updated/added some CUDA/CL code; finally made CL GPU offloading work.
    * misc minor fixes/updates to tune; the data format was changed.
    mqy committed Jun 18, 2023
    06b0082
  11. typos

    mqy committed Jun 18, 2023
    67bb367
  12. fix cuda build error

    mqy committed Jun 18, 2023
    2193ab6
  13. 0ec4dab
  14. fix warning

    mqy committed Jun 18, 2023
    5abb8ae
  15. threading: add suspend/resume APIs, so it's possible to run a thread pool at session level

    mqy committed Jun 18, 2023
    5feefb3
  16. 286c5b3
  17. 9872863
  18. fixed OP_OUT_PROD and OP_NONE

    mqy committed Jun 18, 2023
    6609c22

Commits on Jun 19, 2023

  1. tune: update readme

    mqy committed Jun 19, 2023
    65fd65e
  2. 44b831d
  3. 4d32b40
  4. cc8a375
  5. aac7f7c
  6. threading: removed feature wait_on_done to figure out causes of deadlock in Windows AVX

    mqy committed Jun 19, 2023
    08972d2