
Conversation

noemotiovon (Collaborator)

What does this PR do?

Record ne and nb information for src tensors and include them in the graph matching check. This enhances the robustness of ACL graph matching by preventing incorrect matches when src tensors share the same data address but differ in shape or stride.
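
A minimal sketch of the idea (hypothetical structure and function names, not the actual CANN backend code): when an ACL graph is captured, the matcher records each source tensor's data pointer together with its `ne` (elements per dimension) and `nb` (byte strides), and a cached graph is reused only when all three are unchanged.

```cpp
// Sketch only: hypothetical names, simplified ggml-style tensor view.
#include <cstdint>
#include <cstring>

constexpr int K_MAX_DIMS = 4; // ggml tensors have up to 4 dimensions

struct tensor_view {
    void *  data;            // data address
    int64_t ne[K_MAX_DIMS];  // number of elements per dimension (shape)
    size_t  nb[K_MAX_DIMS];  // stride in bytes per dimension
};

// Properties of one src tensor captured when the ACL graph was built.
struct src_props {
    void *  data;
    int64_t ne[K_MAX_DIMS];
    size_t  nb[K_MAX_DIMS];
};

// Record ne/nb alongside the data address at graph-capture time.
static void record_src(src_props & p, const tensor_view & t) {
    p.data = t.data;
    std::memcpy(p.ne, t.ne, sizeof(p.ne));
    std::memcpy(p.nb, t.nb, sizeof(p.nb));
}

// Graph matching: the same data address alone is not enough; the shape
// (ne) and stride (nb) must also be identical for a cache hit.
static bool src_matches(const src_props & p, const tensor_view & t) {
    return p.data == t.data &&
           std::memcmp(p.ne, t.ne, sizeof(p.ne)) == 0 &&
           std::memcmp(p.nb, t.nb, sizeof(p.nb)) == 0;
}
```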

github-actions bot added the `ggml` (changes relating to the ggml tensor library for machine learning) and `Ascend NPU` (issues specific to Ascend NPUs) labels on Sep 22, 2025
noemotiovon (Collaborator, Author) commented on Sep 22, 2025

Model Parallel Inference Test

Qwen2.5-0.5B:

......
main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
External prompt file: used built-in defaults
Model and path used:  /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd

Total prompt tokens:  17075, speed: 574.36 t/s
Total gen tokens:     13278, speed: 446.64 t/s
Total speed (AVG):           speed: 1020.99 t/s
Cache misses:             0

llama_perf_context_print:        load time =    1730.22 ms
llama_perf_context_print: prompt eval time =   11966.81 ms / 30601 tokens (    0.39 ms per token,  2557.16 tokens per second)
llama_perf_context_print:        eval time =     122.12 ms /    25 runs   (    4.88 ms per token,   204.71 tokens per second)
llama_perf_context_print:       total time =   29732.53 ms / 30626 tokens
llama_perf_context_print:    graphs reused =       1440

hipudding (Collaborator) left a comment

Thanks. This commit fixes the hidden bugs caused by ACL graph reuse.

hipudding merged commit aa4711d into ggml-org:master on Oct 9, 2025
69 checks passed
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...