noemotiovon
Collaborator

Description:

This PR introduces a cache-based optimization for the RMS_NORM operator in the CANN backend. By reusing pre-allocated zero and one float32 tensors, it reduces redundant memory allocations and improves runtime performance for RMS normalization operations.
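For reference, RMS normalization scales each row by the reciprocal of its root mean square. The sketch below is a plain host-side C++ reference, not the CANN kernel. The function name `rms_norm_row` and the `gamma` parameter are hypothetical, and using a one-filled tensor as a unit scale is an assumption: the description above does not state how the zero and one tensors are consumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference RMS normalization over one row (illustrative only):
//   y[i] = gamma[i] * x[i] / sqrt(mean(x^2) + eps)
// `gamma` is a hypothetical per-element scale; passing all ones leaves
// the normalized values unscaled.
std::vector<float> rms_norm_row(const std::vector<float>& x,
                                const std::vector<float>& gamma,
                                float eps) {
    assert(!x.empty() && gamma.size() >= x.size());

    // Accumulate the sum of squares in double to limit rounding error.
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
    }

    const double inv_rms = 1.0 / std::sqrt(sum_sq / x.size() + eps);

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = static_cast<float>(gamma[i] * x[i] * inv_rms);
    }
    return y;
}
```

With a unit scale, the only per-call inputs are the row itself and `eps`, which is why a constant one-filled buffer is a natural candidate for reuse across calls.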

What does this PR do?

  • Initializes a reusable cache for zero-filled and one-filled float32 tensors.

  • Expands the cache dynamically if the requested tensor size exceeds the current cache capacity.

  • Modifies RMS_NORM computations to utilize the cache, reducing memory operations.

  • Improves performance for large-scale tensor normalization in the CANN backend.
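
The grow-on-demand behaviour listed above can be sketched as follows. This is a minimal host-memory stand-in, assuming a `std::vector<float>` in place of a device buffer. The class `ConstantF32Cache` and its methods are hypothetical names for illustration and are not the CANN backend's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache of a constant-filled float32 buffer (e.g. all zeros
// or all ones). Host memory stands in for device memory in this sketch.
class ConstantF32Cache {
public:
    explicit ConstantF32Cache(float value) : value_(value) {}

    // Returns a pointer to at least `n` floats, all equal to the cached
    // value. The buffer is reallocated and refilled only when `n` exceeds
    // the current capacity; smaller requests reuse the existing buffer.
    const float* get(std::size_t n) {
        if (n > buffer_.size()) {
            buffer_.assign(n, value_);
            ++allocations_;
        }
        assert(buffer_.size() >= n);
        return buffer_.data();
    }

    std::size_t capacity() const { return buffer_.size(); }

    // Number of times the buffer was (re)allocated, for inspection.
    std::size_t allocations() const { return allocations_; }

private:
    float value_;
    std::vector<float> buffer_;
    std::size_t allocations_ = 0;
};
```

In this sketch, one instance constructed with `0.0f` and another with `1.0f` would serve every normalization call, and repeated requests of the same or smaller size would not trigger a new allocation.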

Why is this needed?

  • Avoids frequent memory allocation and deallocation during RMS normalization.

  • Reduces overhead and latency in tensor computations, especially for large models.

Performance Impact:

  • Significant reduction in memory allocation overhead for RMS_NORM.

  • Faster execution for large tensors due to cache reuse.

@github-actions bot added the labels "ggml" (changes relating to the ggml tensor library for machine learning) and "Ascend NPU" (issues specific to Ascend NPUs) on Aug 19, 2025
@noemotiovon
Collaborator Author

Opt Test:

Backend 1/2: CANN0
  Device description: Ascend910B4
  Device memory: 30196 MB (29851 MB free)

new_pool_for_device: device 0 use vmm pool
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000001): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000001): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000100): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000100): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.100000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.100000): OK
  10844/10844 tests passed
  Backend CANN0: OK
Backend 2/2: CPU
  Skipping
2/2 backends passed
OK

Model Test:

Script

./bin/llama-cli -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd -p "Building a website can be done in 10 steps:" -ngl 32 -fa
Building a website can be done in 10 steps:
assistant
Certainly! Here’s a summary of the 10 steps to build a website:

### 1. Define Your Goals and Target Audience
- **Goals**: What do you want to achieve with your website? (e.g., sales growth, brand awareness, customer engagement)
- **Audience**: Who is your target audience? Age, gender, interests, and more

### 2. Conduct a Market Research
- **Needs Analysis**: Understand the needs and pain points of your target audience
- **Competitor Analysis**: Study what your competitors are doing to attract and retain customers

### 3. Choose a Website Build Method
- **Static Pages**: Create static HTML/CSS pages
- **Static Hosting**: Use a static website builder like WordPress, Wix, or Squarespace
- **Dynamic Pages**: Build content on the server using frameworks like Django, Ruby on Rails, or Laravel

### 4. Set Up Your Development Environment
- **Choose a Framework**: Choose a framework that suits your needs and ease of use
- **Choose a Platform**: Choose a platform to host your website (e.g., Heroku, WordPress, or Squarespace)

### 5. Design Your Website
- **Wireframing**: Create wireframes to visualize your layout and content
- **HTML and CSS**: Write and style your website using HTML and CSS
- **Responsive Design**: Ensure your website looks good on all devices

### 6. Choose a Content Management System (CMS)
- **CMS**: Select a CMS that suits your needs and ease of use
- **CMS Frameworks**: Consider frameworks like WordPress, Joomla, or Drupal

### 7. Develop Your Website
- **CMS Framework**: Write and publish your website using your chosen CMS framework
- **WordPress**: Use a plugin like WordPress.com for a free WordPress hosting account
- **Joomla**: Use a plugin like Joomla! for Joomla support
- **Drupal**: Use a plugin like Drupal to create and manage content

### 8. Implement User Interface and Navigation
- **UI/UX Design**: Create a user-friendly interface with consistent branding
- **Navigation**: Ensure users can easily find what they need on your site

### 9. Test Your Website
- **Testing**: Use tools like Google Analytics, Browser Performance Tools, and automated testing tools
- **Feedback**: Get feedback from users and make necessary changes

### 10. Launch Your Website
- **Launch**: After testing, launch your website on your chosen platform
- **Promote**: Share your website on social media and through your chosen marketing channels

### Additional Tips
- **SEO**: Optimize your website for search engines to improve its visibility
- **SEO Plugins**: Use SEO plugins like Yoast SEO or Rank Math to enhance your site’s performance
- **Security**: Secure your website with HTTPS, add CAPTCHA, and ensure your hosting provider is secure

By following these 10 steps, you can build a successful website that meets your business goals.

> 
llama_perf_sampler_print:    sampling time =     157.75 ms /   636 runs   (    0.25 ms per token,  4031.70 tokens per second)
llama_perf_context_print:        load time =    6836.93 ms
llama_perf_context_print: prompt eval time =      33.86 ms /    20 tokens (    1.69 ms per token,   590.75 tokens per second)
llama_perf_context_print:        eval time =    3965.67 ms /   615 runs   (    6.45 ms per token,   155.08 tokens per second)
llama_perf_context_print:       total time =   30899.38 ms /   635 tokens
llama_perf_context_print:    graphs reused =        612

Signed-off-by: noemotiovon <757486878@qq.com>
@hipudding hipudding merged commit a0f98dd into ggml-org:master Aug 22, 2025
48 checks passed
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 25, 2025
* [CANN] Optimize RMS_NORM using cache

Signed-off-by: noemotiovon <757486878@qq.com>

* fix typo

Signed-off-by: noemotiovon <757486878@qq.com>

* fix review comment

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>