
Commit

update cloud gpu notebook
enzokro committed Sep 29, 2023
1 parent 3606e75 commit 81fcf86
Showing 2 changed files with 157 additions and 39 deletions.
192 changes: 155 additions & 37 deletions nbs/03_lambda_labs.ipynb
@@ -4,33 +4,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Running Cloud GPUs\n",
"# Running LLMs in the Cloud\n",
"\n",
"> Creating our LLM environment in a cloud GPU for Fine-Tuning."
"> Creating an LLM environment for Fine-Tuning in a cloud GPU."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many companies that provide access to GPUs in the Cloud. In this Notebook we use a service called Paperspace for its mix of ease and flexibility. \n",
"There are many companies that offer GPUs in the Cloud. In this Notebook we choose Paperspace for its mix of ease and options. \n",
"\n",
"\n",
"First, let's go over some cloud providers to see different options and use-cases: \n",
"- Google Colab. \n",
"First, let's go over some cloud providers that cover different use-cases and options: \n",
"- Lambda Labs. \n",
"- Paperspace. \n",
"- Google Colab. \n",
"\n",
"There are many more options with different pricing and features, but these three cover a solid range of usability and price. \n",
"\n",
"[Lambda Labs](https://lambdalabs.com/) is one of the more popular GPU cloud providers. It has great pricing options. Unfortunately, this combo of popularity and low costs means that their GPUs are often unavailable. Making an account and launching GPUs, when one *is* open, is incredibly fast and straightfoward. \n",
"[Lambda Labs](https://lambdalabs.com/) is a very popular GPU cloud provider. It has great pricing. Unfortunately, its combo of popularity and low costs means that GPUs are often claimed and we're not guaranteed to get one. Making an account and launching GPUs, when they *are* open, is incredibly fast and straightforward. \n",
"\n",
"[Paperspace](https://www.paperspace.com/) offers cloud GPUs in two different, complementary ways. Their platform called Gradient is built around Notebooks and is tailored for ML and scientific experiments. Their CORE service has more low-level options, allowing you to build and deploy a custom VM with a GPU. Paperspace generally has better availability than Lambda Labs. \n",
"[Paperspace](https://www.paperspace.com/) offers cloud GPUs in two different, complementary ways. Their platform called Gradient is built around Notebooks and is tailored for quick ML and scientific jobs. Their CORE service, on the other hand, has more low-level options. We use it to fully customize and deploy a VM with a GPU. Paperspace tends to have better availability than Lambda Labs. \n",
"\n",
"[Colab](https://colab.google/) is an option from Google. It is built around their own flavor of Notebooks that are very similar to Jupyter's. One of Colab's most useful features is the ability to directly load any Notebook straight from your Git repos. It's a convenient and flexible option without the overhead effort of Lambda or Paperspace. \n",
"[Colab](https://colab.google/) is an option provided by Google. It is built around their own flavor of Notebooks, very similar to Jupyter's. One of Colab's most useful features is the ability to directly load a Notebook straight from any of your Git repos. It's a convenient and flexible option without the overhead of Lambda or Paperspace. \n",
"\n",
"\n",
"> Note: the cloud GPU scene changes fast. The points above about cost or availability are rules of thumb. Actual uptime and costs change all the time. \n"
"> Note: the cloud GPU scene changes fast. The points above are rules of thumb; specifics change all the time."
]
},
{
@@ -44,7 +43,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We picked Paperspace because it lets us deploy custom VMs. It's not as easy to use as Colab. But, we'll have the option to build the LLM environment for fine-tuning LLMs."
"We use Paperspace because it lets us configure and deploy fully custom VMs."
]
},
{
@@ -65,14 +64,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Create a Machine` button takes us to the page to setup and deploy the VM."
"The `Create a Machine` button on the top-left takes us to the page for setting up and deploying VMs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the first options is picking which OS the VM will run. We'll pick Ubuntu 22.04 to leverage its latest updates and improvements."
"One of the first options is picking the OS on the VM. We use Ubuntu 22.04 to leverage its latest updates and improvements."
]
},
{
@@ -86,7 +85,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we can pick which type of GPU the VM will get. Here I picked the `Quadro M4000` which is the cheapest option as of writing, costing $0.45 an hour. "
"Next we pick the GPU for the VM. The screenshot below shows the `Quadro M4000`, which is the cheapest option at $0.45 an hour. "
]
},
{
@@ -100,12 +99,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Below some back-of-the-napkin math around what it costs to run the machine for different times, since this is always good to keep this in mind."
"It's good to keep the cost of running these cloud VMs in mind, so we don't get any billing surprises. Below we do some quick math to see the costs of running this GPU:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 1,
"metadata": {},
"outputs": [
{
@@ -123,7 +122,7 @@
}
],
"source": [
"# price for an hour of Quadro M4000 use\n",
"# price of an hour for the Quadro M4000 GPU\n",
"price_per_hour = 0.45\n",
"\n",
"# leaving the machine on for a day\n",
@@ -148,37 +147,36 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Leaving it on for a full day isn't too bad. But the cost rise quickly as we leave the machine on for longer. This gets even worse with more expensive machines. Thankfully, the fine-tuning we'll be doing should fit well within a day. "
"Leaving it on for a full day isn't too bad. But the cost rises quickly the longer we leave the machine on. This gets even worse with the more expensive GPU cards. Thankfully, the fine-tuning we'll be doing should fit well within a day. "
]
},
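The cell's arithmetic generalizes to any hourly rate. A small sketch of the same back-of-the-napkin math (the `cost` helper and the multi-day rates are just for illustration; only the $0.45/hour M4000 price comes from above):

```python
# hourly price of the Quadro M4000 VM, as quoted above
price_per_hour = 0.45

def cost(hours, rate=price_per_hour):
    """Total cost of leaving a VM on for `hours` at `rate` dollars/hour."""
    return hours * rate

# one day, one week, one month of continuous uptime
print(f"1 day:   ${cost(24):.2f}")       # $10.80
print(f"1 week:  ${cost(24 * 7):.2f}")   # $75.60
print(f"30 days: ${cost(24 * 30):.2f}")  # $324.00
```

The same helper makes it easy to sanity-check pricier cards before reserving one.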
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connecting to the VM"
"### The VM's connection to the outside world"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we get to pick between two way of connecting to the VM. \n",
"\n",
"Next we get to pick how to connect to this VM from a local computer: \n",
"![](imgs/paperspace_vm_connect_opts.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case we will use the [`ssh`](https://arjunaravind.in/blog/learning-and-using-ssh/) option."
"In this case we pick the [`ssh`](https://arjunaravind.in/blog/learning-and-using-ssh/) option."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below are two good options for SSH setup tutorials. The first is this text-based [Lambda Labs SSH tutorial](https://lambdalabs.com/blog/getting-started-with-lambda-cloud-gpu-instances). The second is a video from Paperspace embedded below."
"Some of you likely have keys already in `~/.ssh`. If not, here are two good tutorials for setting up SSH. The first is this [Lambda Labs SSH tutorial](https://lambdalabs.com/blog/getting-started-with-lambda-cloud-gpu-instances). The second is a video from Paperspace embedded below."
]
},
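For the impatient, the tutorials boil down to a few commands. A hedged sketch, assuming a fresh machine: the key filename, email comment, and IP address are all placeholders, and the final `ssh` line is left commented out until the VM is actually running:

```shell
# make sure the key directory exists, then generate a new ed25519 key pair
mkdir -p ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/paperspace_key -N "" -C "you@example.com"

# print the public key; paste this into Paperspace's `SSH Keys` section
cat ~/.ssh/paperspace_key.pub

# once the VM is up, connect with the matching private key
# (replace 123.45.67.89 with the IP shown on your machine's page)
# ssh -i ~/.ssh/paperspace_key paperspace@123.45.67.89
```

Passing `-N ""` skips the passphrase for brevity; add one if the key will live on a shared machine.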
{
@@ -202,7 +200,7 @@
" "
],
"text/plain": [
"<IPython.lib.display.YouTubeVideo at 0x1105c93d0>"
"<IPython.lib.display.YouTubeVideo at 0x1065200d0>"
]
},
"execution_count": 4,
@@ -221,35 +219,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We're almost done. At the bottom of the page, Paperspace breakdowns the VM configuration and its cost. If the summary looks good, go ahead and click the `Create` button to deploy the cloud VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note: There is also an \"Advanced options\" section with more low-level options for the VM."
"Once you have your key, add it under the `SSH Keys` section of your Paperspace account before going forward:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](imgs/paperspace_cost_summary.png)"
"![](imgs/paperspace_ssh_key.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we have to start the VM before we connect to it. After following the SSH tutorials you should have your own key. Add it under the `SSH Keys` section of your Paperspace account as shown below:"
"We're almost done setting up the VM. Back at the bottom of the creation page, Paperspace summarizes the VM configuration and its cost. If the summary looks good, go ahead and click the `Create` button to deploy the cloud VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](imgs/paperspace_ssh_key.png)"
"![](imgs/paperspace_cost_summary.png)"
]
},
{
@@ -281,9 +272,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Paperspace offers a ready-to-go VM option. It's called `ML-In-A-Box`, and already includes the nvidia drivers. It's a great option if you want to skip the driver installation step. \n",
"Paperspace also offers a ready-to-go VM for Machine Learning applications. It's called `ML-In-A-Box` and includes the NVIDIA drivers. It's a great option if you want to get up and running quickly, or just want to skip the driver installation step (understandable). \n",
"\n",
"It doesn't install the latest version of the drivers, but it's a great option to get started quickly. We can also always install drivers later."
"This pre-configured VM might not have the latest version of the drivers, but we can always fix that later."
]
},
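Either way, you can confirm which driver the VM actually has from its terminal. A quick check — `nvidia-smi` ships with the driver itself, and the fallback message here is just illustrative:

```shell
# report the card and driver version if an NVIDIA driver is present
nvidia-smi --query-gpu=name,driver_version --format=csv 2>/dev/null \
  || echo "no NVIDIA driver found (install one, or use ML-In-A-Box)"
```

On a custom VM without drivers, the fallback fires; that's the cue to install them before going further.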
{
@@ -293,6 +284,133 @@
"Here is a screenshot with both types of Machines: custom and `ML-In-A-Box`:\n",
"![](imgs/paperspace_both_types.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating the `llm_base` environment in the VM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example we'll use the `ML-In-A-Box` VM. But the steps are roughly the same on a custom VM once the NVIDIA drivers are installed. \n",
"\n",
"The steps below are mostly repeated from the first lesson on creating the Environment. The two main changes: \n",
"- The requirements are pip-installed in a different order than on Mac. \n",
"- We can now install the libraries in `reqs_optim.txt` to speed up the LLMs. \n",
"\n",
"```bash\n",
"## Setting up the Environment on a VM\n",
"\n",
"# connect to the Paperspace VM\n",
"ssh paperspace@some-ip-here\n",
"\n",
"# install mamba\n",
"curl -L -O \"https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh\"\n",
"bash Miniforge3-$(uname)-$(uname -m).sh\n",
"# NOTE: after installing mamba, refresh your terminal\n",
"\n",
"# clone the course repo\n",
"git clone https://github.com/enzokro/Fractal-LLM-Course.git\n",
"\n",
"# move in to the environment folder\n",
"cd Fractal-LLM-Course/Fractal_LLM_Course/lesson_0/envs\n",
"\n",
"# create the base environment\n",
"mamba env create -f environment.yml\n",
"\n",
"# activate the environment\n",
"mamba activate llm_base\n",
"\n",
"# install the pytorch library\n",
"python -m pip install -r reqs_torch.txt\n",
"\n",
"# install the python packages, after activating the env\n",
"python -m pip install -r requirements.txt \n",
"\n",
"# now, we can also install the extra packages to speed up LLMs\n",
"python -m pip install -r reqs_optim.txt \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checking if we can use the GPU"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Drum roll... moment of truth. Can we actually use the GPU with PyTorch? Run the following Python code in a shell to find out:\n",
"\n",
"```python\n",
"## first, make sure the `llm_base` environment is active\n",
"## then, run the following python code in a shell\n",
"\n",
"# import the torch library\n",
"import torch\n",
"\n",
"# check if we can see the GPU\n",
"print(torch.cuda.is_available()) # should print \"True\"\n",
"``` \n",
"\n",
"If the above command shows `True`, we're good to go! We now have the `llm_base` environment on a cloud VM with a working GPU."
]
},
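Beyond a bare `True`/`False`, it's handy to see which card PyTorch actually found. A small sketch — the `describe_gpu` helper is ours, not part of any library, and it degrades gracefully on machines without torch or a GPU:

```python
def describe_gpu():
    """Return a short description of the GPU PyTorch can see, if any."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    name = torch.cuda.get_device_name(0)
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    return f"{name} ({mem_gb:.1f} GB)"

print(describe_gpu())
```

On the M4000 VM this should print the card's name along with roughly 8 GB of memory.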
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tying this approach to Fine-Tuning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Eventually, we'll reserve a stronger GPU (or more of the weaker ones) with more memory and resources to handle the LLM fine-tuning. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook covered how to launch and connect to a cloud GPU running on Paperspace. We then created the `llm_base` environment on the VM, mimicking our local environment. That means any notebook or command we've run on our local computer can now run in the VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Aside(?): Going over CUDA driver install"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Recording the CUDA driver install process..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Aside(??): `llm_base` on Colab"
]
}
],
"metadata": {
4 changes: 2 additions & 2 deletions nbs/index.ipynb
@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install"
"## Installation"
]
},
{
@@ -29,7 +29,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to use"
"## Usage"
]
},
{
