Support JIT Offline Cache for Taichi #4401
Comments
Requesting that this be extended to all backends, considering the large number of users on Macs or non-CUDA laptops.
Yes, implementing the feature for all backends is my goal.
I argue strongly against this solution. We have profiles showing that LLVM codegen takes about 30% of the entire JIT codegen time, so it would be much wiser to spend time figuring out AST->CHI IR caching first.
Two-stage caching gives a major JIT time boost to all backends. I'd argue it is also a lot cleaner to implement than one-stage caching for each individual backend, which would cause maintenance problems down the line.
Step by step. This may be a temporary solution. We don't have serialization of CHI IR now. Once CHI IR serialization is implemented, a 2-level cache may be better, especially with multiple backends. P.S. I think CHI IR serialization is very important for standardizing CHI IR. That needs a feasible, efficient, standard solution (add more adjectives to show how important I think it is), like LLVM IR, IL, Java bytecode, or Intel asm, which is not easy.
Is there a middle ground we can find? E.g., how easy is it for us to migrate the implementation from caching LLVM to caching CHI IR? If most users don't care about the internal implementation of the cache, I expect the following scenario to happen:
In addition, IMHO the complexity still comes from the cache key part (considering all the global state involved). The cached contents can be adjusted fairly easily, provided that CHI IR serialization is implemented.
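To illustrate the cache-key concern raised here, the following is a minimal sketch, not Taichi's actual implementation. It assumes the key is derived from a textual dump of the kernel's AST plus whatever global state the kernel depends on. The names `make_cache_key`, `ast_text`, and `global_state` are hypothetical and exist only for this example.

```python
import hashlib
import json


def make_cache_key(ast_text: str, global_state: dict) -> str:
    """Hash the kernel's AST text together with the global state it depends on.

    Hypothetical helper: the real key generator lives on the C++ side.
    """
    # sort_keys gives a canonical encoding, so dict ordering cannot change the key.
    payload = json.dumps({"ast": ast_text, "state": global_state}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical AST dump of a kernel that reads a global value named `scale`.
ast_text = "kernel add(x: f32) -> f32 { return x + scale }"

key_a = make_cache_key(ast_text, {"arch": "cpu", "scale": 2.0})
key_b = make_cache_key(ast_text, {"arch": "cpu", "scale": 3.0})  # state changed
key_c = make_cache_key(ast_text, {"scale": 2.0, "arch": "cpu"})  # same state, reordered
```

Here `key_a` and `key_b` differ even though the AST text is identical, because the captured state differs. Missing any piece of state that affects compilation would let two different kernels share one key, which is why the key is the hard part.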
The (new) implementation of the offline cache is transparent. All logic is on the C++ side. The frontend only sees the
Can't agree more. Because Taichi's kernels depend on global variables/states, generating a key that uniquely identifies a kernel is difficult, and it is the key to implementing kernel caching. At present, before we have a standardized, (de)serializable CHI IR, dumping, loading, and running the backend language is simpler than doing the same with CHI IR, because the backend languages have mature, standard solutions. P.S. The overhead of generating the key is something we should consider. Python -> Taichi AST -> CHI IR -> Backend lang. From left to right:
Issue: #4401 * Fix a potential bug in metal AOT * Prepare for implementing offline cache on metal Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Supported or not
P.S.
Issue: #4401 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: #4401 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: taichi-dev#4401, taichi-dev#6614 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: #7002, #4401 ### Brief Summary This PR: 1. Introduced `KernelCompilationManager` to unify implementation of the Offline Cache; 2. Used `KernelCompilationManager` re-impl JIT, Offline Cache on gfx backends (vulkan, metal, dx11, opengl); 3. Removed the `gfx::CacheManager`. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Solution
Workflow (on llvm backends)
... → Python Code → Taichi AST → Translate the AST to a string as the key → Hash it (hashed key) → Try to find the offline cache file by hashed key:
llvm::Module + offloaded_task_list
→ Cache them → Run kernel → ... → (Before exiting) Dump cache data to disk
Todo & Memo
ASTKeyGenerator to generate the key of the Taichi AST instead of IRPrinter, which will hold more information compared with IRPrinter.
IRPrinter and Expression::serialize hold more information).
IRPrinter to generate the offline-cache-key more correctly
see Support JIT Offline Cache for Taichi #4401 (comment)).
see Refactor kernel compilation #7002
- [ ] Support on dx11
- [ ] Support on dx12
- [ ] Handle hash collisions
- [ ] Allow to set/unset offline_cache per kernel (Optional)
Usage
Just set offline_cache=True. The feature is enabled by default.
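With the option on, the workflow listed under Solution runs behind the scenes. The following is a rough, standard-library-only sketch of that flow: hash the AST string, look for a cache file by hashed key, compile on a miss, and dump new entries to disk before exiting. It is an illustration under stated assumptions, not Taichi's code. `OfflineCache`, `get_or_compile`, `dump`, and `fake_compile` are hypothetical names, and JSON files stand in for the real cached `llvm::Module` and `offloaded_task_list`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class OfflineCache:
    """Toy model of the workflow; all names here are hypothetical."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.pending = {}  # entries compiled in this run, not yet on disk

    def _path(self, hashed_key):
        return self.cache_dir / f"{hashed_key}.json"

    def get_or_compile(self, ast_text, compile_fn):
        # AST string -> hashed key
        hashed_key = hashlib.sha256(ast_text.encode("utf-8")).hexdigest()
        if hashed_key in self.pending:
            return self.pending[hashed_key], "memory"
        path = self._path(hashed_key)
        if path.exists():
            # Cache file found: load it instead of compiling.
            return json.loads(path.read_text()), "disk"
        # Cache miss: compile, then keep the result for the final dump.
        compiled = compile_fn(ast_text)
        self.pending[hashed_key] = compiled
        return compiled, "compiled"

    def dump(self):
        # Called before exiting: write newly compiled entries to disk.
        for hashed_key, compiled in self.pending.items():
            self._path(hashed_key).write_text(json.dumps(compiled))
        self.pending.clear()


def fake_compile(ast_text):
    # Stand-in for real codegen; returns JSON-serializable placeholder data.
    return {"module": f"compiled<{ast_text}>", "offloaded_task_list": ["task_0"]}


cache_dir = tempfile.mkdtemp()

first_run = OfflineCache(cache_dir)
_, source_first = first_run.get_or_compile("kernel foo", fake_compile)
first_run.dump()

second_run = OfflineCache(cache_dir)  # simulates a fresh process
result, source_second = second_run.get_or_compile("kernel foo", fake_compile)
```

The first run compiles and dumps the entry. The second run, which starts with an empty in-memory state, finds the file by its hashed key and skips compilation.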
Supported backends
See #4401 (comment)
For more, see Offline Cache
Potential Bugs
offline_cache=True
(https://github.com/taichi-dev/taichi/actions/runs/6176139423/job/16764496768)