Tips, Tricks and Thoughts
- Clone the repo:
git clone https://github.com/bayoishola20/Data-Science-All.git
→ install.packages("xlsx") [To install this, first run sudo apt-get install r-cran-rjava]
→ install.packages("openxlsx") [The above has memory issues even with files of a few MB. Use this instead]
→ install.packages('RColorBrewer') [Color palettes for visualizations]
→ install.packages('ggthemes', dependencies = TRUE) [Theme for ggplot2]
→ install.packages('gridExtra') [For arranging multiple plots (e.g. histograms) on one page]
→ install.packages('leaflet') [For leaflet maps]
→ install.packages('tidyverse') [All tidyverse packages]
→ install.packages('stringr') [String manipulation, e.g. checking for a pattern]
→ install.packages("sbtools") [USGS package (https://owi.usgs.gov/R/training-curriculum/usgs-packages/) for accessing ScienceBase, the USGS web platform for data storage]
→ install.packages("dataRetrieval") [USGS package (https://owi.usgs.gov/R/training-curriculum/usgs-packages/) for retrieving hydrologic time-series data, such as discharge, from stream gages in a watershed]
→ To get the two packages above working, run sudo apt-get install libudunits2-dev libxml2-dev
→ install.packages("sf") [For "simple features" like shapefiles]
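→ To get the above working, run sudo apt install libgdal-dev in a terminal, then install.packages(c("proj4", "rgdal", "sf")). Note that rgdal was retired from CRAN in 2023, so you may need to drop it from that list. Check the GDAL version with gdalinfo --version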
→ List all files in a particular folder: list.files("/home/bayo/Documents/Geostatistics & Geomarketing/")
- matplotlib
- D3.js
- Spatial Analysis (leaflet, GDAL, OGR)
- Inferential statistics & Probability distributions
- Parameter estimation
- Hypothesis testing
- Statistical significance
- Correlation and regression
- A/B testing
- Maximum likelihood
- Generalized linear model
- Scikit-learn
- Supervised and unsupervised learning
- Naive Bayes
- SVM
- Decision trees
- Regression
- Clustering
- Dimensionality reduction
- Validation & evaluation of ML methods
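A minimal sketch extending the list.files() tip above: it lists a folder's contents and then filters by file extension, returning full paths. The folder name geo_demo and the file names are hypothetical; the example creates them inside R's session temporary directory so it runs anywhere. Swap in your own path in practice.

```r
# Hypothetical demo folder inside R's session temp directory;
# replace `dir` with a real path such as your own project folder.
dir <- file.path(tempdir(), "geo_demo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("wells.csv", "rivers.csv", "notes.txt"))))

# All files in the folder (names only, returned in sorted order)
all_files <- list.files(dir)

# Only .csv files, as full paths ready to pass to read.csv()
csv_files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

print(all_files)
print(basename(csv_files))
```

`pattern` is a regular expression, so the dot is escaped and `$` anchors the match to the end of the file name; add `recursive = TRUE` to search subfolders as well.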
Inspired by several online resources and personal experience. :smile:
→ https://www.statmethods.net/input/datatypes.html
→ https://rstudio.github.io/leaflet/morefeatures.html
→ https://www.datacamp.com/community/data-science-cheatsheets
P.S. Ubuntu is the OS used.