[CANN] Optimize RMS_NORM using cache #15419

noemotiovon · 2025-08-19T09:36:33Z

Description:

This PR introduces a cache-based optimization for the RMS_NORM operator in the CANN backend. By reusing pre-allocated zero and one float32 tensors, it reduces redundant memory allocations and improves runtime performance for RMS normalization operations.
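For context, RMS normalization over a row of length n is commonly defined as below. Reading the one-filled tensor as a unit scale factor (written here as the hypothetical symbol γ) is an assumption about how the backend uses it, not something stated in this PR; ε corresponds to the `eps` parameter shown in the operator tests.

```latex
\[
y_i \;=\; \gamma_i \cdot \frac{x_i}{\sqrt{\dfrac{1}{n}\sum_{j=1}^{n} x_j^{2} + \varepsilon}},
\qquad \gamma_i = 1 \ \text{(assumed unit scale)}
\]
```

Under that reading, γ is the same constant on every call, which is why it is a natural candidate for caching.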

What does this PR do?

- Initializes a reusable cache for zero-filled and one-filled float32 tensors.
- Expands the cache dynamically if the requested tensor size exceeds the current cache capacity.
- Modifies RMS_NORM computations to use the cache, reducing memory operations.
- Improves performance for large-scale tensor normalization in the CANN backend.
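The caching idea described above can be sketched roughly as follows. This is a minimal host-memory illustration, not the code from this PR: the class name `ConstFillCache` and its methods are hypothetical, and the real backend would hold the buffers in device memory rather than in a `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cached constant-filled float32 buffer.
// Assumption: the real implementation keeps this in device memory.
class ConstFillCache {
public:
    explicit ConstFillCache(float value) : value_(value) {}

    // Returns a pointer to at least n floats, all equal to value_.
    // The buffer is reallocated and refilled only when n exceeds
    // the current capacity; smaller requests reuse it as-is.
    const float* get(std::size_t n) {
        if (n > buf_.size()) {
            buf_.assign(n, value_);
            ++allocations_;
        }
        return buf_.data();
    }

    std::size_t capacity() const { return buf_.size(); }
    std::size_t allocations() const { return allocations_; }

private:
    float value_;
    std::vector<float> buf_;
    std::size_t allocations_ = 0;  // counts how often the cache grew
};
```

With one instance for zeros and one for ones, repeated normalization calls of the same or smaller size would trigger no further allocation; only a larger request grows the buffer.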

Why is this needed?

- Avoids frequent memory allocation and deallocation during RMS normalization.
- Reduces overhead and latency in tensor computations, especially for large models.

Performance Impact:

- Significant reduction in memory allocation overhead for RMS_NORM.
- Faster execution for large tensors due to cache reuse.

noemotiovon · 2025-08-19T09:39:34Z

Opt Test:

Backend 1/2: CANN0
  Device description: Ascend910B4
  Device memory: 30196 MB (29851 MB free)

new_pool_for_device: device 0 use vmm pool
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000001): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000001): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000100): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000100): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.100000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.100000): OK
  10844/10844 tests passed
  Backend CANN0: OK
Backend 2/2: CPU
  Skipping
2/2 backends passed
OK

Model Test:

Script

./bin/llama-cli -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd -p "Building a website can be done in 10 steps:" -ngl 32 -fa

Building a website can be done in 10 steps:
assistant
Certainly! Here’s a summary of the 10 steps to build a website:

### 1. Define Your Goals and Target Audience
- **Goals**: What do you want to achieve with your website? (e.g., sales growth, brand awareness, customer engagement)
- **Audience**: Who is your target audience? Age, gender, interests, and more

### 2. Conduct a Market Research
- **Needs Analysis**: Understand the needs and pain points of your target audience
- **Competitor Analysis**: Study what your competitors are doing to attract and retain customers

### 3. Choose a Website Build Method
- **Static Pages**: Create static HTML/CSS pages
- **Static Hosting**: Use a static website builder like WordPress, Wix, or Squarespace
- **Dynamic Pages**: Build content on the server using frameworks like Django, Ruby on Rails, or Laravel

### 4. Set Up Your Development Environment
- **Choose a Framework**: Choose a framework that suits your needs and ease of use
- **Choose a Platform**: Choose a platform to host your website (e.g., Heroku, WordPress, or Squarespace)

### 5. Design Your Website
- **Wireframing**: Create wireframes to visualize your layout and content
- **HTML and CSS**: Write and style your website using HTML and CSS
- **Responsive Design**: Ensure your website looks good on all devices

### 6. Choose a Content Management System (CMS)
- **CMS**: Select a CMS that suits your needs and ease of use
- **CMS Frameworks**: Consider frameworks like WordPress, Joomla, or Drupal

### 7. Develop Your Website
- **CMS Framework**: Write and publish your website using your chosen CMS framework
- **WordPress**: Use a plugin like WordPress.com for a free WordPress hosting account
- **Joomla**: Use a plugin like Joomla! for Joomla support
- **Drupal**: Use a plugin like Drupal to create and manage content

### 8. Implement User Interface and Navigation
- **UI/UX Design**: Create a user-friendly interface with consistent branding
- **Navigation**: Ensure users can easily find what they need on your site

### 9. Test Your Website
- **Testing**: Use tools like Google Analytics, Browser Performance Tools, and automated testing tools
- **Feedback**: Get feedback from users and make necessary changes

### 10. Launch Your Website
- **Launch**: After testing, launch your website on your chosen platform
- **Promote**: Share your website on social media and through your chosen marketing channels

### Additional Tips
- **SEO**: Optimize your website for search engines to improve its visibility
- **SEO Plugins**: Use SEO plugins like Yoast SEO or Rank Math to enhance your site’s performance
- **Security**: Secure your website with HTTPS, add CAPTCHA, and ensure your hosting provider is secure

By following these 10 steps, you can build a successful website that meets your business goals.

> 
llama_perf_sampler_print:    sampling time =     157.75 ms /   636 runs   (    0.25 ms per token,  4031.70 tokens per second)
llama_perf_context_print:        load time =    6836.93 ms
llama_perf_context_print: prompt eval time =      33.86 ms /    20 tokens (    1.69 ms per token,   590.75 tokens per second)
llama_perf_context_print:        eval time =    3965.67 ms /   615 runs   (    6.45 ms per token,   155.08 tokens per second)
llama_perf_context_print:       total time =   30899.38 ms /   635 tokens
llama_perf_context_print:    graphs reused =        612

ggml/src/ggml-cann/aclnn_ops.cpp

ggml/src/ggml-cann/common.h

Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN] Optimize RMS_NORM using cache

Signed-off-by: noemotiovon <757486878@qq.com>

* fix typo

Signed-off-by: noemotiovon <757486878@qq.com>

* fix review comment

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Ascend NPU (issues specific to Ascend NPUs) labels Aug 19, 2025

hipudding reviewed Aug 20, 2025

noemotiovon added 3 commits August 20, 2025 03:08

[CANN] Optimize RMS_NORM using cache

0344d58

Signed-off-by: noemotiovon <757486878@qq.com>

fix typo

9b0ec0e

Signed-off-by: noemotiovon <757486878@qq.com>

fix review comment

c24b995

Signed-off-by: noemotiovon <757486878@qq.com>

noemotiovon force-pushed the rms_norm_opti branch from 485fd39 to c24b995 Compare August 20, 2025 03:21

codestyle adjustment

3c87db4

Signed-off-by: noemotiovon <757486878@qq.com>

hipudding approved these changes Aug 22, 2025

hipudding merged commit a0f98dd into ggml-org:master Aug 22, 2025
48 checks passed

noemotiovon mentioned this pull request Aug 25, 2025

RMS NORM operator constant pool optimization cosdt/llama.cpp#24

Closed
