On demand #1

Open: wants to merge 4 commits into main

Commits on Aug 26, 2024

  1. Enabled MoE On Demand

    This commit enables MoE on demand. On our GPUs, this cuts memory
    requirements by about half, but inference is still much slower.
    
    You can instantiate MoE on demand on this branch using Mixtral
    with the command:
    
    '''
    from vllm import LLM  # assumption: this branch is a vLLM fork
    LLM(model="/home/user/Workspace_4tb/grant/Mixtral-8x7B-v0.1", max_model_len=2048, gpu_memory_utilization=0.98, enforce_eager=True, num_gpu_blocks_override=1000)
    '''
    
    Fixes MoE On Demand
    gnpinkert committed Aug 26, 2024
    ce5f39f
  2. Enabled MoE On Demand

    This commit enables MoE on demand for DeepSeek V2.
    
    You can instantiate MoE on demand on this branch using DeepSeek-V2-Lite
    with the command:
    
    '''
    from vllm import LLM  # assumption: this branch is a vLLM fork
    LLM(model="deepseek-ai/DeepSeek-V2-Lite", max_model_len=2048, gpu_memory_utilization=0.98, enforce_eager=True, num_gpu_blocks_override=1000)
    '''
    
    Fixes MoE On Demand
    gnpinkert committed Aug 26, 2024
    7b33075
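
The two commits above describe loading MoE experts on demand to reduce memory use. The PR page does not show how this is implemented, so the following is only a minimal sketch of the general idea, assuming a fixed number of resident experts with least-recently-used eviction. `OnDemandExpertCache`, `load_expert`, and the capacity of 4 are hypothetical names and values chosen for illustration; they are not taken from this branch.

```python
from collections import OrderedDict


class OnDemandExpertCache:
    """Keeps at most `capacity` experts resident and loads the rest on first use."""

    def __init__(self, capacity, load_expert):
        self.capacity = capacity
        # Hypothetical callback: stands in for copying one expert's weights to the GPU.
        self.load_expert = load_expert
        self.resident = OrderedDict()  # expert_id -> weights, least recently used first
        self.loads = 0                 # number of on-demand loads, kept for illustration

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)     # evict the least recently used expert
        self.loads += 1
        weights = self.load_expert(expert_id)
        self.resident[expert_id] = weights
        return weights


# Toy usage: the router asks for these experts in order; only 4 fit at once.
cache = OnDemandExpertCache(capacity=4, load_expert=lambda i: f"weights-{i}")
for expert_id in [0, 1, 0, 2, 3, 4, 0]:
    cache.get(expert_id)
```

Every miss in such a scheme pays for a weight transfer, which is one plausible reason for the slowdown the first commit message mentions.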

Commits on Sep 28, 2024

  1. This commit uses the DRL predictor to predict future expert selection.

    The prefill stage is not yet complete.
    gnpinkert committed Sep 28, 2024
    07787f3
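
The commit above uses a predictor to anticipate which experts will be selected next. The DRL predictor itself is not shown on this page, so the sketch below only illustrates why such predictions can help: if the next expert is loaded ahead of time, fewer requests stall on a load. `count_misses`, `predict_next`, and the oracle predictor are hypothetical stand-ins, and the first-in-first-out eviction is an assumption made to keep the example short.

```python
def count_misses(routes, predict_next, capacity):
    """Count requests that find their expert not yet resident.

    routes: expert ids in the order the router selects them.
    predict_next: hypothetical predictor, (step, expert_id) -> expert id or None.
    capacity: how many experts can be resident at once.
    """
    resident = []  # expert ids currently loaded, oldest first
    misses = 0

    def ensure(expert_id):
        """Load expert_id if needed; return True when a load happened."""
        if expert_id in resident:
            return False
        if len(resident) >= capacity:
            resident.pop(0)  # evict the oldest resident expert
        resident.append(expert_id)
        return True

    for step, expert_id in enumerate(routes):
        if ensure(expert_id):
            misses += 1  # the request had to wait for a load
        predicted = predict_next(step, expert_id)
        if predicted is not None:
            ensure(predicted)  # prefetch; not counted as a miss
    return misses


routes = [3, 1, 3, 1, 2]

# Oracle stand-in for a trained predictor: it simply looks one step ahead.
def oracle(step, _expert_id):
    return routes[step + 1] if step + 1 < len(routes) else None

with_prefetch = count_misses(routes, oracle, capacity=2)
without_prefetch = count_misses(routes, lambda step, e: None, capacity=2)
```

A real predictor will sometimes be wrong, and a wrong prefetch can evict an expert that is about to be used, so the gain depends on prediction accuracy.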

Commits on Oct 1, 2024

  1. wip

    gnpinkert committed Oct 1, 2024
    779767b