Paper Reference: https://arxiv.org/abs/2106.01342
In initial experiments, we obtained an AUROC of 92.9% on the bank dataset. More experiments are coming soon.
Major modules implemented in the code:
- SAINT Transformer
- SAINT Intersample Transformer
- Embeddings for tabular data
- Mixup
- CutMix
- Contrastive Loss
- Denoising Loss
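
As a rough illustration of the two augmentations listed above, the sketch below applies CutMix to raw rows and Mixup to their embedded form. It is a simplified standard-library example, not the code in this repository: the function names, the `replace_prob` and `alpha` parameters, and the stand-in "embedding" (a plain float conversion) are all assumptions made for illustration.

```python
import random


def cutmix(batch, replace_prob, rng):
    """Mix each row with a randomly chosen partner row, feature by feature.

    Each feature is swapped for the partner's value with probability
    `replace_prob` (an assumed parameterisation for this sketch).
    """
    partners = list(range(len(batch)))
    rng.shuffle(partners)
    mixed = []
    for row, p in zip(batch, partners):
        partner = batch[p]
        mixed.append([partner[j] if rng.random() < replace_prob else value
                      for j, value in enumerate(row)])
    return mixed


def mixup(embeddings, alpha, rng):
    """Convex combination of each embedded row with a random partner's."""
    partners = list(range(len(embeddings)))
    rng.shuffle(partners)
    return [[alpha * a + (1 - alpha) * b for a, b in zip(emb, embeddings[p])]
            for emb, p in zip(embeddings, partners)]


rng = random.Random(0)
batch = [[1, 10.0], [2, 20.0], [3, 30.0], [4, 40.0]]

# CutMix works on the raw features.
augmented = cutmix(batch, replace_prob=0.3, rng=rng)

# Hypothetical stand-in for an embedding layer: just cast to float.
embedded = [[float(v) for v in row] for row in augmented]

# Mixup works on the embedded representation.
mixed = mixup(embedded, alpha=0.9, rng=rng)
```

In a self-supervised setup, `mixed` would be the "noisy" view that the contrastive and denoising losses compare against the clean view of the same rows.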
Dataset preparation steps:
- Add a 'cls' column to the dataset. The 'cls' column has to be the first column, as mentioned in the paper
- Apply z-transform to numerical columns
- Label encode categorical columns
- Concatenate cat and num columns, with cat columns coming first, then numerical ones
- Calculate the number of categorical columns (including 'cls' column), and numerical columns. Add to config file as 'no_cat' and 'no_num'
- Calculate the number of categories in each categorical column, as a list. Add to config file as 'cats'. The 'cls' column has 1 category
- The sample function preprocess_bank can be used to preprocess the bank dataset. It can be found in src > dataset.py
- Save the train, val and test csv files in the data folder
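
The preparation steps above can be sketched in plain Python as follows. This is a minimal standard-library illustration, not the repository's preprocess_bank function: the helper name `preprocess_rows`, the toy column names, and the use of the population standard deviation for the z-transform are assumptions. Splitting into train, val and test files is left out.

```python
from statistics import mean, pstdev


def preprocess_rows(rows, cat_cols, num_cols):
    """Return (processed_rows, config_entries) for a list of dict rows."""
    # Label-encode: map each column's sorted unique values to 0..k-1.
    encoders = {
        c: {v: i for i, v in enumerate(sorted({r[c] for r in rows}))}
        for c in cat_cols
    }

    # z-transform statistics (population std; guard against zero variance).
    stats = {}
    for c in num_cols:
        values = [float(r[c]) for r in rows]
        sigma = pstdev(values)
        stats[c] = (mean(values), sigma if sigma > 0 else 1.0)

    processed = []
    for r in rows:
        out = {"cls": 0}  # 'cls' is the first column and has one category
        for c in cat_cols:  # categorical columns come first
            out[c] = encoders[c][r[c]]
        for c in num_cols:  # numerical columns follow
            mu, sigma = stats[c]
            out[c] = (float(r[c]) - mu) / sigma
        processed.append(out)

    config_entries = {
        "no_cat": len(cat_cols) + 1,  # includes the 'cls' column
        "no_num": len(num_cols),
        "cats": [1] + [len(encoders[c]) for c in cat_cols],
    }
    return processed, config_entries


# Toy rows with made-up column names.
rows = [
    {"job": "admin", "marital": "single", "age": 30, "balance": 100.0},
    {"job": "technician", "marital": "married", "age": 40, "balance": 300.0},
    {"job": "admin", "marital": "married", "age": 50, "balance": 200.0},
]
processed, config = preprocess_rows(rows, ["job", "marital"], ["age", "balance"])
print(config)  # {'no_cat': 3, 'no_num': 2, 'cats': [1, 2, 2]}
```

The values in `config` correspond to the 'no_cat', 'no_num' and 'cats' entries that the steps above ask you to add to the config file.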
git clone https://github.com/[username]/saint.git
pip3 install -r requirements.txt
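Go to src > config.py and set the configuration values (for example 'no_cat', 'no_num' and 'cats' from the dataset steps above).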
For example, to train the saint_i model in self-supervised mode, run:
python main.py --model saint_i --experiment ssl
- Evaluate on more datasets
- Optimize the embedding layer for fast retrieval of embeddings
- Improve documentation
- Ahmed A. Elhag
- Aisha Alaagib
- Amina Rufai
- Amna Ahmed Elmustapha
- Jamal Hussein
- Mohammedelfatih Salah
- Ruba Mutasim
- Sewade Olaolu Ogun
(names in alphabetical order)