This document lists the high-level points that will be covered in the DSI bootcamp. Each instructor will build out the details of their own section.
- High-level questions to frame the program (not to be answered today)
- When should I use a terminal, a script, a program, a text editor, or an IDE?
- When should I use R vs. Python?
- Does this fit in memory? Do I need more than my laptop?
- It's OK to freeze and not know what to do; that means you are about to learn. Feel the burn.
- Which ML tool is right for the job?
- How do I translate a research question into a data question? How do I translate a data answer into a research answer?
- operating systems
- the shell
- Command Line Interface (CLI):
- We use the Bourne Again Shell (Bash)
- use SageMaker for a unified terminal environment
- mkdir, cd, ls, history
- installation on student laptops is homework
- GUI
- version control (Git/GitHub)
- text editor vs. word processor vs. IDE (e.g., vi vs. VS Code vs. RStudio)
- grab bag
- "#" starts a comment in Bash, R, and Python
- Reading list:
- Bryan Wright's stuff
- geek-hours/shell.html
- History
- RStudio and installation
- tour / hotkeys
- projects and working directory
- command line from inside RStudio
- Base R essential tools
- c(...), functions and how they work
- "anatomy of coding" aka syntax or grammar
- "?", "<-"
- indexing (start from 1)
- operators
- Data Frames
- details
- getting data in
- manipulating
- Simple Plots
- Tidying up
- Reading list:
- History
- Anaconda/Spyder/Jupyter
- tour and hotkeys
- working directory
- command line from inside Spyder
- Python 3 essentials
- int, float, string, list
- indexing (start at 0)
- functions, "anatomy"
- assignment operator and a couple others
- Data frames - import pandas as pd
- simple plot
- Load data into data frame
- Reading List:
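
As a rough illustration of the Python 3 essentials listed above (basic types, zero-based indexing, assignment, and the "anatomy" of a function), a minimal sketch might look like the following. The variable names and values are made up for illustration and are not part of the bootcamp materials.

```python
# Basic types, each bound with the assignment operator "="
count = 3                  # int
ratio = 0.75               # float
name = "bootcamp"          # string
scores = [90, 85, 77]      # list

# Python indexing starts at 0 (R indexing starts at 1)
first = scores[0]          # first element
last = scores[-1]          # negative indices count from the end
middle_two = scores[1:3]   # slice: index 1 up to, but not including, 3


# Anatomy of a function: "def", a name, parameters, a body, a return value
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


average = mean(scores)
print(first, last, middle_two, average)
```

Note that `scores[0]` is the first element here, whereas the same list in R would be indexed as `scores[1]`.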
This part is language-agnostic: the prompt works for both R and Python, and we give examples in both languages. The goal is to show how a data scientist operates and thinks.
- reading in data
- munging data
- plotting data
- presenting data
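
The four steps above can be sketched end to end in a few lines. This is only a minimal sketch that uses the Python standard library so it runs anywhere; in the bootcamp itself, pandas and a plotting library would more likely fill these roles. The data, the column names (`site`, `temp_c`), and the helper function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data; note the missing temperature in the third row
RAW = """site,temp_c
A,21.5
A,22.5
B,
B,18.0
C,25.0
"""


def read_rows(text):
    """Reading in data: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Munging data: drop rows with a missing value, convert text to float."""
    cleaned = []
    for row in rows:
        if row["temp_c"].strip() == "":
            continue
        cleaned.append({"site": row["site"], "temp_c": float(row["temp_c"])})
    return cleaned


def mean_by_site(rows):
    """Munging data: group by site and average the temperatures."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["site"]].append(row["temp_c"])
    return {site: mean(values) for site, values in groups.items()}


def text_bars(summary):
    """Plotting data: a crude text bar chart, one '#' per degree."""
    lines = []
    for site, value in sorted(summary.items()):
        lines.append(f"{site} {'#' * round(value)} {value:.1f}")
    return lines


rows = read_rows(RAW)
tidy = clean(rows)
summary = mean_by_site(tidy)
chart = text_bars(summary)

# Presenting data: print the chart for a human reader
print("\n".join(chart))
```

Each function maps to one bullet, which keeps the stages separate enough that any one of them could be swapped for its R or pandas equivalent.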