My learning notes. Learning just in time (JIT) is better than learning just in case.


Introduction

https://noklam.ml

All things data

I am a data scientist. Recently, I have found myself studying databases, data structures, and data pipelines far more than machine learning. When it comes to building a good model, I have found that writing good code to produce high-quality data often matters more than using a state-of-the-art (SOTA) model.

Delivering the model is the job of a data scientist, so every data scientist inevitably needs to be a "full-stack" data scientist to some extent.

This is a central repository for my blog posts and notes.

  • Blog: https://noklam.ml (GitHub Pages) - mostly shorter articles and notes with code
  • Blog: Medium (https://medium.com/@nokknocknok)
  • GitBook (mainly study notes. I keep notes in Markdown with Joplin and am considering syncing them to GitBook from time to time, but I haven't figured out the best way to do so yet.)

Resources

I am generally interested in tools that increase productivity; please let me know if you have any recommendations. Here is a list of software and topics that I have found useful.

Uncertainty Estimation

Uncertainty Quantification in Deep Learning

Visualization

Visualization (University of Washington)

Custom Matplotlib style for presentations (larger font size)

https://raw.githubusercontent.com/noklam/mediumnok/master/_demo/python-viz/presentation.mplstyle

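import matplotlib.pyplot as plt

# Matplotlib can load a style file from a URL as well as from a local path
my_style = 'https://raw.githubusercontent.com/noklam/mediumnok/master/_demo/python-viz/presentation.mplstyle'

# Later styles override earlier ones; the previous settings are restored on exit
with plt.style.context(['ggplot', my_style]):
    make_scatter_plot()  # your own plotting functions
    make_line_plot()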

Useful Python Tools

  • pyinstrument: a profiler for Python processes, useful for finding what to optimize.

  • TorchSnooper: a debugging tool for PyTorch (built on PySnooper) that shows tensor shapes, dtypes, and devices line by line, so there is no need to print x.shape anymore.

  • knockknock: a single line of code that sends you a notification when your 10-hour model training finally finishes. No more staring at the progress bar.

  • colorama: colored printing in the terminal (cross-platform).

  • Hypothesis: property-based testing with automatically generated inputs for unit tests.

Reviewing (suggestions for code-metric reporting/analysis libraries are welcome!)

  • coala - coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.

  • radon - Radon is a Python tool that computes various metrics from the source code

  • great_expectations - A data validation library for Python, integrated with pandas/Spark/SQL
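The Hypothesis entry above refers to property-based testing: instead of hand-picking test inputs, you state a property that should hold for every input and let a tool generate many inputs to try to break it. The sketch below is a hand-rolled, standard-library-only illustration of that idea, not Hypothesis's API; the helper names `run_property` and `random_int_list` are made up here. Hypothesis itself adds richer input strategies and automatic shrinking of failing cases to a minimal example.

```python
import random

def run_property(prop, gen, n_cases=200, seed=0):
    """Check `prop` on `n_cases` random inputs from `gen`; return the first failing input, or None."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(n_cases):
        case = gen(rng)
        if not prop(case):
            return case
    return None

def random_int_list(rng):
    # Random-length list of random integers, including negatives and duplicates
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]

def sort_is_idempotent(xs):
    # True property: sorting an already sorted list changes nothing
    return sorted(sorted(xs)) == sorted(xs)

def all_elements_distinct(xs):
    # False "property": random lists often contain duplicates
    return len(set(xs)) == len(xs)

print(run_property(sort_is_idempotent, random_int_list))     # None: no counterexample found
print(run_property(all_elements_distinct, random_int_list))  # a list containing a duplicate
```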

Syntax Highlight

  • lunr.js

A catalog of various machine learning topics.

Graph Neural Network Basics

Understanding the weird-looking $D^{-1/2} L D^{-1/2}$

  1. spectral graph theory - Why Laplacian Matrix need normalization and how come the sqrt of Degree Matrix? - Mathematics Stack Exchange
  2. What's the intuition behind a Laplacian matrix? I'm not so much interested in mathematical details or technical applications. I'm trying to grasp what a laplacian matrix actually represents, and what aspects of a graph it makes accessible. - Quora
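The links above explain why the graph Laplacian $L = D - A$ is normalized as $L_{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$. A quick numerical check helps: after normalization each diagonal entry becomes 1, each edge $(i, j)$ becomes $-1/\sqrt{d_i d_j}$, and all eigenvalues fall in $[0, 2]$, so high-degree nodes no longer dominate. Splitting $D^{-1}$ into two $D^{-1/2}$ factors keeps the matrix symmetric, which $D^{-1} L$ would not be. This is a minimal NumPy sketch assuming an undirected graph with no isolated nodes (a zero degree would cause a division by zero); the function name is just for illustration.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian D^{-1/2} L D^{-1/2} of an undirected graph with no isolated nodes."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                        # node degrees, i.e. the diagonal of D
    L = np.diag(deg) - A                       # unnormalized Laplacian L = D - A
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # D^{-1/2}
    return d_inv_sqrt @ L @ d_inv_sqrt

# Path graph 0 - 1 - 2: the middle node has degree 2, the end nodes degree 1
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L_sym = normalized_laplacian(A)
print(np.round(L_sym, 3))
print(np.round(np.linalg.eigvalsh(L_sym), 3))  # eigenvalues, all within [0, 2]
```

The unnormalized Laplacian of this graph has 2 on the middle diagonal and 1 on the ends; after normalization every diagonal entry is 1 and each edge weight is $-1/\sqrt{2}$.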

Supplementary Reading (in Chinese)

  1. Heat Diffusion
  2. How GCNs use edges to aggregate node information
  3. How to do batch training with GCNs
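One of the readings above is about how a GCN aggregates node information along edges. A minimal NumPy sketch of one layer, assuming the widely used propagation rule from Kipf and Welling's GCN, $H' = \mathrm{ReLU}(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} H W)$ with $\tilde A = A + I$, shows the same $D^{-1/2} \cdot D^{-1/2}$ normalization at work. The function name, the ReLU activation, and the identity weights in the toy example are illustrative choices; in a real model $W$ is learned.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W).

    A: (n, n) adjacency matrix, H: (n, f_in) node features, W: (f_in, f_out) weights.
    """
    A_tilde = np.asarray(A, dtype=float) + np.eye(len(A))  # self-loops keep each node's own features
    deg = A_tilde.sum(axis=1)                               # degrees of A + I, always >= 1
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_hat = d_inv_sqrt @ A_tilde @ d_inv_sqrt               # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)                   # aggregate over edges, transform, ReLU

# Path graph 0 - 1 - 2 with one-hot features and identity weights, so row i
# shows how much node i draws from each node after one layer.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.eye(3)
W = np.eye(3)
out = gcn_layer(A, H, W)
print(np.round(out, 3))
```

Node 0 receives nothing from node 2 after a single layer because they are not adjacent; stacking a second layer would let information travel two hops.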

Time Series Forecast

Motivation

While neural networks have achieved a lot of success in NLP and computer vision, traditional time series forecasting has changed relatively little. This repository aims to study the latest practical techniques for time series prediction, whether statistical methods, machine learning, or deep neural networks.

Forecasting Methods

Statistical Method

Machine Learning

Deep Neural Network

Gramian Angular Field: transform a time series into an image and apply transfer learning with a CNN
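The Gramian Angular Field turns a 1-D series into a 2-D image in three steps: min-max rescale the values to $[-1, 1]$, encode each value as an angle $\phi_i = \arccos(\tilde x_i)$, and build an $N \times N$ matrix of pairwise angle sums ($\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j)$) or differences ($\mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j)$). This is a minimal NumPy sketch of those steps, assuming a non-constant series; the function name and the `method` argument are hypothetical.

```python
import numpy as np

def gramian_angular_field(x, method="summation"):
    """Encode a 1-D series as a Gramian Angular Summation (GASF) or Difference (GADF) field."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("GAF needs a non-constant series")
    # 1. Min-max rescale to [-1, 1] so that arccos is defined
    x_scaled = np.clip((2 * x - x_max - x_min) / (x_max - x_min), -1.0, 1.0)
    # 2. Polar encoding: each value becomes an angle in [0, pi]
    phi = np.arccos(x_scaled)
    # 3. Pairwise trigonometric sums or differences give an N x N image
    if method == "summation":
        return np.cos(phi[:, None] + phi[None, :])  # GASF, symmetric
    if method == "difference":
        return np.sin(phi[:, None] - phi[None, :])  # GADF, antisymmetric
    raise ValueError("method must be 'summation' or 'difference'")

series = np.sin(np.linspace(0, 4 * np.pi, 64))
gasf = gramian_angular_field(series)
gadf = gramian_angular_field(series, method="difference")
print(gasf.shape)  # (64, 64)
```

The resulting matrix can be fed to an image CNN like any single-channel picture. If you would rather not maintain this yourself, pyts (listed below) ships a `GramianAngularField` transformer in `pyts.image`.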

Prediction Interval

Forecast accuracy matters, but so does the prediction interval, an area the machine learning community has paid less attention to.

  • Traditional statistical forecasts (ARIMA, ETS, etc.)
  • Bayesian Neural Network
  • Random Forest jackknife approximation
  • MC Dropout (keep dropout active at inference time as approximate variational inference)
  • Quantile Regression
  • VOGN (Variational Online Gauss-Newton: weight perturbation in the optimizer)
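Of the methods listed, quantile regression is the simplest to sketch. A model trained with the pinball (quantile) loss $\ell_q(y, \hat y) = \max\big(q(y - \hat y),\ (q - 1)(y - \hat y)\big)$ estimates the $q$-th quantile, so fitting one model at $q = 0.05$ and another at $q = 0.95$ brackets a 90% prediction interval. The toy below assumes synthetic normal data and a constant predictor found by grid search, purely to show that minimizing the pinball loss recovers the empirical quantile; it is not a forecasting model.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs (1 - q) per unit
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=5_000)   # synthetic data for illustration only

# Grid-search the best constant prediction for each quantile level
candidates = np.linspace(y.min(), y.max(), 2_001)
results = {}
for q in (0.05, 0.5, 0.95):
    losses = [pinball_loss(y, c, q) for c in candidates]
    best = candidates[int(np.argmin(losses))]
    results[q] = (best, np.quantile(y, q))
    print(f"q={q}: loss minimizer={best:.2f}, empirical quantile={np.quantile(y, q):.2f}")

lower, upper = results[0.05][0], results[0.95][0]
coverage = np.mean((y >= lower) & (y <= upper))
print(f"90% interval [{lower:.2f}, {upper:.2f}] covers {coverage:.1%} of the data")
```

In a real model the same loss replaces squared error; for example, scikit-learn's `GradientBoostingRegressor` accepts `loss="quantile"` together with an `alpha` quantile level.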

Python Time Series Forecasting Library

Prophet (Facebook): a tool for producing high-quality forecasts for time series data with multiple seasonalities and linear or non-linear growth. It has built-in modeling of holiday effects.

pyts: state-of-the-art algorithms for time series transformation and classification

Contribution

Feel free to send a PR or start a discussion by opening an issue.😁

powered by fastpages

fastpages allows me to blog directly from Jupyter notebooks, so I no longer have to worry about converting them into Markdown. I simply code and write.