
Conversation

noemotiovon (Collaborator)

What does this PR do?

Record ne and nb information for src tensors and include them in the graph matching check. This enhances the robustness of ACL graph matching by preventing incorrect matches when src tensors share the same data address but differ in shape or stride.
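
A minimal sketch of the idea (hypothetical structure and function names, not the actual CANN backend code): when an ACL graph is captured, the matcher records each source tensor's data pointer together with its `ne` (elements per dimension) and `nb` (byte strides), and a cached graph is reused only when all three are unchanged.

```cpp
// Sketch only: hypothetical names, simplified ggml-style tensor view.
#include <cstdint>
#include <cstring>

constexpr int K_MAX_DIMS = 4; // ggml tensors have up to 4 dimensions

struct tensor_view {
    void *  data;            // data address
    int64_t ne[K_MAX_DIMS];  // number of elements per dimension (shape)
    size_t  nb[K_MAX_DIMS];  // stride in bytes per dimension
};

// Properties of one src tensor captured when the ACL graph was built.
struct src_props {
    void *  data;
    int64_t ne[K_MAX_DIMS];
    size_t  nb[K_MAX_DIMS];
};

// Record ne/nb alongside the data address at graph-capture time.
static void record_src(src_props & p, const tensor_view & t) {
    p.data = t.data;
    std::memcpy(p.ne, t.ne, sizeof(p.ne));
    std::memcpy(p.nb, t.nb, sizeof(p.nb));
}

// Graph matching: the same data address alone is not enough; the shape
// (ne) and stride (nb) must also be identical for a cache hit.
static bool src_matches(const src_props & p, const tensor_view & t) {
    return p.data == t.data &&
           std::memcmp(p.ne, t.ne, sizeof(p.ne)) == 0 &&
           std::memcmp(p.nb, t.nb, sizeof(p.nb)) == 0;
}
```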

github-actions bot added the `ggml` (changes relating to the ggml tensor library for machine learning) and `Ascend NPU` (issues specific to Ascend NPUs) labels on Sep 22, 2025
noemotiovon (Collaborator, Author) commented on Sep 22, 2025

Model Parallel Inference Test

Qwen2.5-0.5B:

......
main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
External prompt file: used built-in defaults
Model and path used:  /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd

Total prompt tokens:  17075, speed: 574.36 t/s
Total gen tokens:     13278, speed: 446.64 t/s
Total speed (AVG):           speed: 1020.99 t/s
Cache misses:             0

llama_perf_context_print:        load time =    1730.22 ms
llama_perf_context_print: prompt eval time =   11966.81 ms / 30601 tokens (    0.39 ms per token,  2557.16 tokens per second)
llama_perf_context_print:        eval time =     122.12 ms /    25 runs   (    4.88 ms per token,   204.71 tokens per second)
llama_perf_context_print:       total time =   29732.53 ms / 30626 tokens
llama_perf_context_print:    graphs reused =       1440

hipudding (Collaborator) left a comment

Thanks. This commit fixes the hidden bugs caused by ACL graph reuse.

hipudding merged commit aa4711d into ggml-org:master on Oct 9, 2025
69 checks passed
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...