Time | Talk | Speaker
---|---|---
8:30AM | Welcome and Introduction | Filippo Simini, ANL
8:40AM | Transition time: splitting into groups (people new to deep learning vs. more experienced) |
8:40AM | Parallel Session, Part 1 (talk/hands-on):<br>• Main room: Introduction to deep learning<br>• Breakout room: Profiling deep learning | Bethany Lusch, ANL<br>Khalid Hossain, ANL
9:40AM | Introduction to Large Language Models (LLMs) | Huihuo Zheng, ANL
10:40AM | Break |
11:10AM | Distributed Deep Learning (talk/hands-on) | Nathan Nichols, ANL<br>Kaushik Velusamy, ANL
12:30PM | Lunch |
1:30PM | Research talk | Sandeep Madireddy, ANL
2:00PM | AI Testbed (talk/hands-on) | Sid Raskar, PNNL
3:00PM | LLM inference (talk/hands-on) | Sid Raskar, PNNL
3:50PM | Break |
4:20PM | Training LLMs at Scale (talk/hands-on) | Shilpika, ANL
5:20PM | Workflow management tools to couple simulation and AI (talk/hands-on) | Christine Simpson, ANL
6:30PM | Dinner |
At the beginning of the day, we will temporarily split into two groups. Attendees can choose between "Introduction to deep learning" and "Profiling deep learning".
The "Introduction to deep learning" session will rely on Jupyter notebooks targeted at Google's Colaboratory Platform or ALCF JupyterHub. The Colab platform gives the user a virtual machine in which to run Python code, including machine learning code. The VM comes with a preinstalled environment that includes most of what is needed for these tutorials.
The other sessions involve Python scripts executed on the Aurora and AI Testbed platforms at ALCF.
- Queue: ATPESC (`-q ATPESC`)
- Project/Allocation: ATPESC2025 (`-A ATPESC2025`)
- Shared directories:
  - Aurora: `/flare/ATPESC2025`
  - Polaris: `/eagle/projects/ATPESC2025`
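As a sketch, the queue and allocation above would be passed to the PBS scheduler on the ALCF machines; the node count, walltime, and filesystem request below are illustrative placeholders, not prescribed values:

```shell
# Illustrative interactive job request using the workshop queue and allocation.
# select (node count), walltime, and filesystems are placeholders -- adjust as
# needed (e.g. filesystems=home:eagle on Polaris).
qsub -I -q ATPESC -A ATPESC2025 -l select=1 -l walltime=01:00:00 -l filesystems=home:flare
```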
Google Colab involves running Jupyter notebooks, which you will also be using next week.
Do the following before you come to the tutorial:
- You need a Google account to use Colaboratory.
- Go to Google's Colaboratory Platform.
- You should see the Colab welcome page.
- Open the `File` menu at the top left and select `Open Notebook`, which will open a dialogue box.
- Select the `GitHub` tab in the dialogue box.
- Enter the URL for the GitHub repo, `https://github.com/argonne-lcf/ATPESC_MachineLearning`, and hit `<enter>`.
- This will show you a list of the notebooks available in the repo. When you select a notebook from this list, a copy is created for you in your Colaboratory account (all `*.ipynb` files in the Colaboratory account are stored in your Google Drive).
- To use a GPU in the notebook, select `Runtime` -> `Change Runtime Type` and select an accelerator.
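Once an accelerator is selected, a quick way to confirm the GPU is attached is to run `nvidia-smi` from a notebook cell (in Colab, the `!` prefix runs a shell command in the VM; this is a sanity check, not part of the tutorial itself):

```shell
# In a Colab notebook cell, "!" runs a shell command in the VM.
# With a GPU runtime selected, this lists the attached accelerator:
!nvidia-smi
```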
For the AI Testbed hands-on you will need a Cerebras Inference API key. Follow these instructions on your computer to set up the key:
- Visit https://cloud.cerebras.ai to sign up for an account.
- Create an API key by navigating to "API Keys" in the left nav bar.
- Set your API key as an environment variable by running the following command in your terminal: `export CEREBRAS_API_KEY="your-api-key-here"`
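As a quick sanity check (a minimal sketch, not part of the official instructions), you can confirm the variable is visible in your shell without echoing the key itself:

```shell
# Prints a confirmation message; does not print the key value.
if [ -n "$CEREBRAS_API_KEY" ]; then
    echo "CEREBRAS_API_KEY is set"
else
    echo "CEREBRAS_API_KEY is NOT set"
fi
```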
For the Training LLMs at Scale session, you will need a Weights & Biases (wandb) API key. Visit https://docs.wandb.ai/quickstart/ to sign up and get the key.
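If you prefer not to log in interactively, wandb also reads the key from the `WANDB_API_KEY` environment variable, so you can export it the same way as the Cerebras key (replace the placeholder with your real key):

```shell
# WANDB_API_KEY is the environment variable wandb checks for
# non-interactive authentication.
export WANDB_API_KEY="your-api-key-here"
```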