code-llama-32k

Run code-llama with 32k tokens using Flash Attention and BetterTransformer

Basic Jupyter notebook (runs only on Nvidia GPUs, not on Mac).
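The core model-loading step is, roughly, standard Hugging Face transformers code plus a BetterTransformer conversion; BetterTransformer routes attention through PyTorch's scaled_dot_product_attention, which can dispatch to Flash Attention kernels on supported Nvidia GPUs. The sketch below is illustrative only: the model ID, dtype, and generation settings are assumptions rather than what the notebook necessarily uses, and the conversion requires the optimum package.

    # Minimal loading sketch (not the repository's notebook). The checkpoint and
    # generation settings are assumptions; `to_bettertransformer()` requires the
    # `optimum` package to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "codellama/CodeLlama-7b-Instruct-hf"  # assumed checkpoint

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )

    # BetterTransformer routes attention through PyTorch's
    # scaled_dot_product_attention, which can use Flash Attention kernels
    # on supported Nvidia GPUs.
    model = model.to_bettertransformer()

    prompt = "def fibonacci(n):"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))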

Option 1 - Google Colab:

  • Download the .ipynb notebook
  • Select a GPU
    • An A100 with 40 GB of VRAM allows for roughly a 25k context length

Option 2 - Run on a server (e.g. AWS or RunPod (affiliate link))

  • Spin up an A100 80 GB server
  • Run the notebook and select a 50,000-token context length (a rough GPU memory check is sketched below)
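The context-length figures above scale with available GPU memory. As a rough aid (not part of the notebook), the following sketch reads the GPU's memory with PyTorch and suggests a ceiling; the thresholds simply mirror the figures in this README and are assumptions, not measured limits.

    # Rough memory check (illustrative only; not from the notebook).
    import torch

    free_bytes, total_bytes = torch.cuda.mem_get_info()
    total_gb = total_bytes / 1024**3
    print(f"GPU: {torch.cuda.get_device_name(0)}, total memory: {total_gb:.0f} GB")

    # Thresholds mirror the guidance above (~40 GB -> ~25k tokens,
    # ~80 GB -> ~50k tokens); they are assumptions, not measured limits.
    if total_gb >= 80:
        suggested_context = 50_000
    elif total_gb >= 40:
        suggested_context = 25_000
    else:
        suggested_context = 8_000  # assumed conservative default for smaller GPUs
    print(f"Suggested maximum context length: {suggested_context:,} tokens")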

PRO Notebooks

  • Allows for saving and re-loading of conversations
  • Allows for uploading and analysis of documents
  • Works on Google Colab or on a server (e.g. AWS, Azure, RunPod)
  • Purchase here
