Add CodeLlama usage to the README and make sure it works (pytorch#330)
orionr authored and malfet committed Jul 17, 2024
1 parent 9209436 commit 81d09b7
Showing 3 changed files with 53 additions and 27 deletions.
72 changes: 46 additions & 26 deletions README.md
@@ -28,7 +28,7 @@ git clone https://github.com/pytorch/torchchat.git
cd torchchat
pip install -r requirements.txt
# ensure everything installed correctly. If this command works you'll see a welcome message and some details
# ensure everything installed correctly
python torchchat.py --help
```
@@ -57,13 +57,13 @@ python torchchat.py download llama3

### Chat
Designed for interactive and conversational use.
In chat mode, the LLM engages in a back-and-forth dialogue with the user. It responds to queries, participates in discussions, provides explanations, and can adapt to the flow of conversation. This mode is typically what you see in applications aimed at simulating conversational partners or providing customer support.
In chat mode, the LLM engages in a back-and-forth dialogue with the user. It responds to queries, participates in discussions, provides explanations, and can adapt to the flow of conversation.

For more information run `python torchchat.py chat --help`

**Examples**
```
# Chat with some parameters
python torchchat.py chat llama3 --tiktoken
```

### Generate
@@ -74,18 +74,24 @@ For more information run `python torchchat.py generate --help`

**Examples**
```
python torchchat.py generate llama3 --device=cpu --dtype=fp16 --tiktoken
python torchchat.py generate llama3 --dtype=fp16 --tiktoken
```

### Export
Compiles a model for different use cases
Compiles a model and saves it to run later.

For more information run `python torchchat.py export --help`

**Examples**

AOT Inductor:
```
python torchchat.py export stories15M --output-pte-path=stories15m.pte
python torchchat.py export stories15M --output-dso-path stories15M.so
```

ExecuTorch:
```
python torchchat.py export stories15M --output-pte-path stories15M.pte
```

### Browser
@@ -94,20 +100,20 @@ Run a chatbot in your browser that’s supported by the model you specify in the
**Examples**

```
python torchchat.py browser stories15M --device cpu --temperature 0 --num-samples 10
python torchchat.py browser stories15M --temperature 0 --num-samples 10
```

The terminal should print *Running on http://127.0.0.1:5000*. Click the link or go to [http://127.0.0.1:5000](http://127.0.0.1:5000) in your browser to start interacting with it.

Enter some text in the input box, then hit the enter key or click the “SEND” button. After a second or two, the text you entered will be displayed together with the generated text. Repeat to have a conversation.

### Eval
Uses lm_eval library to evaluate model accuracy on a variety of tasks. Defaults to wikitext and can be manually controlled using the tasks and limit args.l
Uses the lm_eval library to evaluate model accuracy on a variety of tasks. Defaults to wikitext and can be manually controlled using the tasks and limit args.

For more information run `python torchchat.py eval --help`

**Examples**

Eager mode:
```
python torchchat.py eval stories15M -d fp32 --limit 5
@@ -118,6 +124,7 @@ To test the perplexity for lowered or quantized model, pass it in the same way y
```
python torchchat.py eval stories15M --pte-path stories15M.pte --limit 5
```

## Models
These are the supported models
| Model | Mobile Friendly | Notes |
@@ -139,59 +146,72 @@ These are the supported models
See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
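
Models are fetched with the download command shown earlier. As an example, and assuming `download` accepts the same aliases as `chat`, the `codellama` alias added to `config/data/models.json` by this change can be used directly:

```
# Download the CodeLlama 7B Python weights via the alias from config/data/models.json
python torchchat.py download codellama
```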

**Examples**

```
#Llama3
# Llama 3 8B Instruct
python torchchat.py chat llama3 --tiktoken
```

```
#Stories
# Stories 15M
python torchchat.py chat stories15M
```

```
#CodeLama
# CodeLlama 7B for Python
python torchchat.py chat codellama
```

## Desktop Execution

### AOTI (AOT Inductor ) - PC Specific
AOT compiles models into machine code before execution, enhancing performance and predictability. It's particularly beneficial for frequently used models or those requiring quick start times. AOTI also increases security by not exposing the model at runtime. However, it may lead to larger binary sizes and lacks the runtime optimization flexibility
### AOTI (AOT Inductor)
AOT compiles models into machine code before execution, enhancing performance and predictability. It's particularly beneficial for frequently used models or those requiring quick start times. However, it may lead to larger binary sizes and lacks the runtime flexibility of eager mode.

**Examples**
The following example uses the Stories15M model.
```
# Compile
python torchchat.py export stories15M --device cpu --output-dso-path stories15M.so
python torchchat.py export stories15M --output-dso-path stories15M.so
# Execute
python torchchat.py generate --device cpu --dso-path stories15M.so --prompt "Hello my name is"
python torchchat.py generate --dso-path stories15M.so --prompt "Hello my name is"
```

NOTE: The exported model will be large. We suggest you quantize the model (explained further down) before deploying it on device.

### ExecuTorch
ExecuTorch enables you to optimize your model for execution on a mobile or embedded device
ExecuTorch enables you to optimize your model for execution on a mobile or embedded device, but can also be used on desktop for testing.

**Examples**
The following example uses the Stories15M model.
```
# Compile
python torchchat.py export stories15M --output-pte-path stories15M.pte
# Execute
python torchchat.py generate --device cpu --pte-path stories15M.pte --prompt "Hello my name is"
```
If you want to deploy and execute a model within your iOS app <do this>
If you want to deploy and execute a model within your Android app <do this>
If you want to deploy and execute a model within your edge device <do this>
If you want to experiment with our sample apps, check out our iOS and Android sample apps.

See below under Mobile Execution if you want to deploy and execute a model in your iOS or Android app.

## Quantization
Quantization focuses on reducing the precision of model parameters and computations from floating-point to lower-bit integers, such as 8-bit integers. This approach aims to minimize memory requirements, accelerate inference speeds, and decrease power consumption, making models more feasible for deployment on edge devices with limited computational resources. While quantization can potentially degrade the model's performance, the methods supported by torchchat are designed to mitigate this effect, maintaining a balance between efficiency and accuracy.
Quantization focuses on reducing the precision of model parameters and computations from floating-point to lower-bit integers, such as 8-bit and 4-bit integers. This approach aims to minimize memory requirements, accelerate inference speeds, and decrease power consumption, making models more feasible for deployment on edge devices with limited computational resources. While quantization can potentially degrade the model's performance, the methods supported by torchchat are designed to mitigate this effect, maintaining a balance between efficiency and accuracy.

TODO:
- Brief rundown on supported quant modes and torchchat.py flags (emphasis on brief).
- Recommendations for quantization modes for 7b local chat, 7b on mobile, etc.
- One line that shows the performance difference between the base model and the 4bit
- Link to Quantization.md.
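
As a rough sketch of what this looks like in practice (the `--quantize` flag and its JSON config schema are assumptions here, not confirmed by this page; see the quantization documentation below for the supported modes):

```
# Hypothetical example: 4-bit weight-only quantization applied at export time.
# The --quantize flag and config schema are assumptions; consult docs/quantization.md.
python torchchat.py export stories15M \
  --quantize '{"linear:int4": {"groupsize": 256}}' \
  --output-dso-path stories15M.so
```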

Read the [Quantization documention](docs/quantization.md) for more details.
Read the [quantization documentation](docs/quantization.md) for more details.

## Mobile Execution
**Prerequisites**

Install [ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup.html)
ExecuTorch lets you run your model on a mobile or embedded device. The exported ExecuTorch .pte model file plus runtime is all you need.

Install [ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup.html) to get started.
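
A minimal sketch of the end-to-end flow (the export command is the same as in the desktop ExecuTorch example above; the app-side steps are covered in the platform docs below):

```
# Export the model to an ExecuTorch .pte file (same command as the desktop flow)
python torchchat.py export stories15M --output-pte-path stories15M.pte
# Bundle stories15M.pte together with the ExecuTorch runtime into your iOS or
# Android app; see the iOS and Android docs linked below for those steps.
```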

[iOS Details](docs/iOS.md)
Read the [iOS documentation](docs/iOS.md) for more details on iOS.

[Android Details](docs/Android.md)
Read the [Android documentation](docs/Android.md) for more details on Android.
5 changes: 5 additions & 0 deletions config/data/models.json
@@ -9,6 +9,11 @@
"distribution_channel": "HuggingFaceSnapshot",
"distribution_path": "meta-llama/Llama-2-7b-chat-hf"
},
"meta-llama/CodeLlama-7b-Python-hf": {
"aliases": ["codellama", "codellama-7b"],
"distribution_channel": "HuggingFaceSnapshot",
"distribution_path": "meta-llama/CodeLlama-7b-Python-hf"
},
"mistralai/Mistral-7B-Instruct-v0.2": {
"aliases": ["mistral-7b", "mistral-7b-instruct"],
"distribution_channel": "HuggingFaceSnapshot",
3 changes: 2 additions & 1 deletion download.py
Expand Up @@ -82,8 +82,9 @@ def download_and_convert(
def is_model_downloaded(model: str, models_dir: Path) -> bool:
model_config = resolve_model_config(model)

# Check if the model directory exists and is not empty.
model_dir = models_dir / model_config.name
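# Note: the isdir() check must come first; os.listdir() raises FileNotFoundError
# on a missing directory, and `and` short-circuits to avoid that.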
return os.path.isdir(model_dir)
return os.path.isdir(model_dir) and len(os.listdir(model_dir)) > 0


def main(args):