Sparse Flow (Node) paper and v2.2.0 release
Lucas Liebenwein committed Nov 16, 2022
1 parent 2e59069 commit 14b392c
Showing 211 changed files with 16,859 additions and 3,126 deletions.
48 changes: 33 additions & 15 deletions README.md
@@ -1,10 +1,9 @@
# Neural Network Pruning
[Lucas Liebenwein](https://people.csail.mit.edu/lucasl/),
[Cenk Baykal](http://www.mit.edu/~baykal/),
[Alaa Maalouf](https://www.linkedin.com/in/alaa-maalouf/),
[Igor Gilitschenski](https://www.gilitschenski.org/igor/),
[Dan Feldman](http://people.csail.mit.edu/dannyf/),
[Daniela Rus](http://danielarus.csail.mit.edu/)
# torchprune
Main contributors of this code base:
[Lucas Liebenwein](http://www.mit.edu/~lucasl/),
[Cenk Baykal](http://www.mit.edu/~baykal/).

Please check individual paper folders for authors of each paper.

<p align="center">
<img src="./misc/imgs/pruning_pipeline.png" width="100%">
@@ -15,10 +14,11 @@ This repository contains code to reproduce the results from the following
papers:
| Paper | Venue | Title & Link |
| :---: | :---: | :--- |
| **Node** | NeurIPS 2021 | [Sparse Flows: Pruning Continuous-depth Models](https://proceedings.neurips.cc/paper/2021/hash/bf1b2f4b901c21a1d8645018ea9aeb05-Abstract.html) |
| **ALDS** | NeurIPS 2021 | [Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition](https://arxiv.org/abs/2107.11442) |
| **Lost** | MLSys 2021 | [Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy](https://proceedings.mlsys.org/paper/2021/hash/2a79ea27c279e471f4d180b08d62b00a-Abstract.html) |
| **PFP** | ICLR 2020 | [Provable Filter Pruning for Efficient Neural Networks](https://openreview.net/forum?id=BJxkOlSYDH) |
| **SiPP** | arXiv | [SiPPing Neural Networks: Sensitivity-informed Provable Pruning of Neural Networks](https://arxiv.org/abs/1910.05422) |
| **SiPP** | SIAM 2022 | [SiPPing Neural Networks: Sensitivity-informed Provable Pruning of Neural Networks](https://doi.org/10.1137/20M1383239) |

### Packages
In addition, the repo contains two stand-alone python packages that
@@ -35,6 +35,7 @@ about the paper and scripts and parameter configuration to reproduce the exact
results from the paper.
| Paper | Location |
| :---: | :---: |
| **Node** | [paper/node](./paper/node) |
| **ALDS** | [paper/alds](./paper/alds) |
| **Lost** | [paper/lost](./paper/lost) |
| **PFP** | [paper/pfp](./paper/pfp) |
@@ -98,14 +99,27 @@ using the codebase.
| --- | --- |
| [src/torchprune/README.md](./src/torchprune) | more details on how to prune neural networks, how to use and set up the data sets, how to implement custom pruning methods, and how to add your own data sets and networks. |
| [src/experiment/README.md](./src/experiment) | more details on how to configure and run your own experiments, and more information on how to reproduce the results. |
| [paper/node/README.md](./paper/node) | check out for more information on the [Node](https://proceedings.neurips.cc/paper/2021/hash/bf1b2f4b901c21a1d8645018ea9aeb05-Abstract.html) paper. |
| [paper/alds/README.md](./paper/alds) | check out for more information on the [ALDS](https://arxiv.org/abs/2107.11442) paper. |
| [paper/lost/README.md](./paper/lost) | check out for more information on the [Lost](https://proceedings.mlsys.org/paper/2021/hash/2a79ea27c279e471f4d180b08d62b00a-Abstract.html) paper. |
| [paper/pfp/README.md](./paper/pfp) | check out for more information on the [PFP](https://openreview.net/forum?id=BJxkOlSYDH) paper. |
| [paper/sipp/README.md](./paper/sipp) | check out for more information on the [SiPP](https://arxiv.org/abs/1910.05422) paper. |
| [paper/sipp/README.md](./paper/sipp) | check out for more information on the [SiPP](https://doi.org/10.1137/20M1383239) paper. |

## Citations
Please cite the respective papers when using our work.

### [Sparse flows: Pruning continuous-depth models](https://proceedings.neurips.cc/paper/2021/hash/bf1b2f4b901c21a1d8645018ea9aeb05-Abstract.html)
```
@article{liebenwein2021sparse,
title={Sparse flows: Pruning continuous-depth models},
author={Liebenwein, Lucas and Hasani, Ramin and Amini, Alexander and Rus, Daniela},
journal={Advances in Neural Information Processing Systems},
volume={34},
pages={22628--22642},
year={2021}
}
```

### [Towards Determining the Optimal Layer-wise Decomposition](https://arxiv.org/abs/2107.11442)
```
@inproceedings{liebenwein2021alds,
@@ -140,12 +154,16 @@ url={https://openreview.net/forum?id=BJxkOlSYDH}
}
```

### [SiPPing Neural Networks](https://arxiv.org/abs/1910.05422)
### [SiPPing Neural Networks](https://doi.org/10.1137/20M1383239) (Weight Pruning)
```
@article{baykal2019sipping,
title={SiPPing Neural Networks: Sensitivity-informed Provable Pruning of Neural Networks},
author={Baykal, Cenk and Liebenwein, Lucas and Gilitschenski, Igor and Feldman, Dan and Rus, Daniela},
journal={arXiv preprint arXiv:1910.05422},
year={2019}
@article{baykal2022sensitivity,
title={Sensitivity-informed provable pruning of neural networks},
author={Baykal, Cenk and Liebenwein, Lucas and Gilitschenski, Igor and Feldman, Dan and Rus, Daniela},
journal={SIAM Journal on Mathematics of Data Science},
volume={4},
number={1},
pages={26--45},
year={2022},
publisher={SIAM}
}
```
Binary file added misc/imgs/node_overview.png
8 changes: 4 additions & 4 deletions misc/requirements.txt
@@ -3,10 +3,10 @@
-e ./src/experiment

# We need those with special tags unfortunately...
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.7.1+cu110
torchvision==0.8.2+cu110
torchaudio===0.7.2
-f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
torch==1.8.2+cu111
torchvision==0.9.2+cu111
torchaudio==0.8.2

# Some extra requirements for the code base
jupyter
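The pins now point at the PyTorch 1.8 LTS wheel index instead of the 1.7.1/CUDA 11.0 builds. A minimal sketch of installing them, assuming the command is run from the repository root (the `-f` wheel index and the editable package installs are already declared inside the file):

```bash
pip install -r misc/requirements.txt
```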
File renamed without changes.
File renamed without changes.
6 changes: 5 additions & 1 deletion paper/alds/script/results_viewer.py
@@ -38,7 +38,7 @@
TABLE_BOLD_THRESHOLD = 0.005

# auto-discover files from folder without "common.yaml"
FILES = glob.glob(os.path.join(FOLDER, "[!common]*.yaml"))
FILES = glob.glob(os.path.join(FOLDER, "*[!common]*.yaml"))


def key_files(item):
@@ -52,6 +52,7 @@ def key_files(item):
"resnet18",
"resnet101",
"wide_resnet50_2",
"mobilenet_v2",
"deeplabv3_resnet50",
]

@@ -127,6 +128,8 @@ def get_results(file, logger, legend_on):
elif "imagenet/prune" in file:
graphers[0]._figure.gca().set_xlim([0, 87])
graphers[0]._figure.gca().set_ylim([-87, 5])
elif "imagenet/retrain/mobilenet_v2" in file:
graphers[0]._figure.gca().set_ylim([-5, 0.5])
elif "imagenet/retrain/" in file:
graphers[0]._figure.gca().set_ylim([-3.5, 1.5])
elif "imagenet/retraincascade" in file:
@@ -317,6 +320,7 @@ def generate_table_entries(
"resnet18": "ResNet18",
"resnet101": "ResNet101",
"wide_resnet50_2": "WRN50-2",
"mobilenet_v2": "MobileNetV2",
"deeplabv3_resnet50": "DeeplabV3-ResNet50",
}

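A note on the glob change at the top of this file's diff: `[!common]` is an fnmatch character class (any single character other than `c`, `o`, `m`, or `n`), not a literal exclusion of the name `common`. The old pattern therefore skipped every YAML file whose name starts with one of those letters, whereas the new pattern only requires some character outside that set to appear before the `.yaml` suffix. A quick illustrative check with hypothetical file names:

```python
from fnmatch import fnmatch

# hypothetical result-file names
names = ["common.yaml", "resnet18.yaml", "mobilenet_v2.yaml"]

for name in names:
    old = fnmatch(name, "[!common]*.yaml")   # pattern before this commit
    new = fnmatch(name, "*[!common]*.yaml")  # pattern after this commit
    print(name, "old:", old, "new:", new)

# common.yaml old: False new: False
# resnet18.yaml old: True new: True
# mobilenet_v2.yaml old: False new: True
```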
68 changes: 68 additions & 0 deletions paper/node/README.md
@@ -0,0 +1,68 @@
# Sparse flows: Pruning continuous-depth models
[Lucas Liebenwein*](https://people.csail.mit.edu/lucasl/),
[Ramin Hasani*](http://www.raminhasani.com),
[Alexander Amini](https://www.mit.edu/~amini/),
[Daniela Rus](http://danielarus.csail.mit.edu/)

***Equal contribution**

<p align="center">
<img align="center" src="../../misc/imgs/node_overview.png" width="80%">
</p>
<!-- <br clear="left"/> -->

Continuous deep learning architectures enable learning of flexible
probabilistic models for predictive modeling as neural ordinary differential
equations (ODEs), and for generative modeling as continuous normalizing flows.
In this work, we design a framework to decipher the internal dynamics of these
continuous depth models by pruning their network architectures. Our empirical
results suggest that pruning improves generalization for neural ODEs in
generative modeling. We empirically show that the improvement is because
pruning helps avoid mode collapse and flatten the loss surface. Moreover,
pruning finds efficient neural ODE representations with up to 98% fewer
parameters compared to the original network, without loss of accuracy. We hope
our results will invigorate further research into the performance-size
trade-offs of modern continuous-depth models.

## Setup
Check out the main [README.md](../../README.md) and the respective packages for
more information on the code base.

## Overview

### Run compression experiments
The experiment configurations are located [here](./param). To reproduce the
experiments for a specific configuration, run:
```bash
python -m experiment.main param/toy/ffjord/spirals/vanilla_l4_h64.yaml
```

The pruning experiments run fully automatically and store all the results.
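The multiscale CNF configurations added in this release can be launched the same way; a sketch assuming the same relative invocation as the toy example above:

```bash
# normalizing flow (CNF) experiments added in this commit
python -m experiment.main param/cnf/mnist_multiscale.yaml
python -m experiment.main param/cnf/cifar_multiscale.yaml
```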

### Experimental evaluations

The [script](./script) folder contains the evaluation and plotting scripts
used to analyze the various experiments. Please take a look at each of them
to understand how to load the pruning experiments and how to analyze the
results.

Each plot and experiment presented in the paper can be reproduced this way.

## Citation
Please cite the following paper when using our work.

### Paper link
[Sparse flows: Pruning continuous-depth models](https://proceedings.neurips.cc/paper/2021/hash/bf1b2f4b901c21a1d8645018ea9aeb05-Abstract.html)

### Bibtex
```
@article{liebenwein2021sparse,
title={Sparse flows: Pruning continuous-depth models},
author={Liebenwein, Lucas and Hasani, Ramin and Amini, Alexander and Rus, Daniela},
journal={Advances in Neural Information Processing Systems},
volume={34},
pages={22628--22642},
year={2021}
}
```
68 changes: 68 additions & 0 deletions paper/node/param/cnf/cifar_multiscale.yaml
@@ -0,0 +1,68 @@
network:
name: "ffjord_multiscale_cifar"
dataset: "CIFAR10"
outputSize: 10

training:
transformsTrain:
- type: RandomHorizontalFlip
kwargs: {}
transformsTest: []
transformsFinal:
- type: Resize
kwargs: { size: 32 }
- type: ToTensor
kwargs: {}
- type: RandomNoise
kwargs: { "normalization": 255.0 }

loss: "NLLBitsLoss"
lossKwargs: {}

metricsTest:
- type: NLLBits
kwargs: {}
- type: Dummy
kwargs: {}

batchSize: 200 # don't change that since it's hard-coded

optimizer: "Adam"
optimizerKwargs:
lr: 1.0e-3
weight_decay: 0.0

numEpochs: 50
earlyStopEpoch: 0
enableAMP: False

lrSchedulers:
- type: MultiStepLR
stepKwargs: { milestones: [45] }
kwargs: { gamma: 0.1 }

file: "paper/node/param/directories.yaml"

retraining:
startEpoch: 0

experiments:
methods:
- "ThresNet"
- "FilterThresNet"
mode: "cascade"

numRepetitions: 1
numNets: 1

plotting:
minVal: 0.02
maxVal: 0.85

spacing:
- type: "geometric"
numIntervals: 12
maxVal: 0.80
minVal: 0.05

retrainIterations: -1
66 changes: 66 additions & 0 deletions paper/node/param/cnf/mnist_multiscale.yaml
@@ -0,0 +1,66 @@
network:
name: "ffjord_multiscale_mnist"
dataset: "MNIST"
outputSize: 10

training:
transformsTrain: []
transformsTest: []
transformsFinal:
- type: Resize
kwargs: { size: 28 }
- type: ToTensor
kwargs: {}
- type: RandomNoise
kwargs: { "normalization": 255.0 }

loss: "NLLBitsLoss"
lossKwargs: {}

metricsTest:
- type: NLLBits
kwargs: {}
- type: Dummy
kwargs: {}

batchSize: 200 # don't change that since it's hard-coded

optimizer: "Adam"
optimizerKwargs:
lr: 1.0e-3
weight_decay: 0.0

numEpochs: 50
earlyStopEpoch: 0
enableAMP: False

lrSchedulers:
- type: MultiStepLR
stepKwargs: { milestones: [45] }
kwargs: { gamma: 0.1 }

file: "paper/node/param/directories.yaml"

retraining:
startEpoch: 0

experiments:
methods:
- "ThresNet"
- "FilterThresNet"
mode: "cascade"

numRepetitions: 1
numNets: 1

plotting:
minVal: 0.02
maxVal: 0.85

spacing:
- type: "geometric"
numIntervals: 12
maxVal: 0.80
minVal: 0.05

retrainIterations: -1
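Both multiscale CNF configs pair the `RandomNoise` transform (`normalization: 255.0`) with the `NLLBitsLoss` objective, which points at the usual uniform-dequantization setup used when training normalizing flows on images and reporting bits per dimension. A purely illustrative sketch of what such a transform typically does (the actual torchprune implementation may differ):

```python
import torch


class UniformDequantizationNoise:
    """Illustrative stand-in for the RandomNoise transform.

    Assumes 8-bit images already scaled to [0, 1] by ToTensor, so one
    quantization bin has width 1 / normalization; adding uniform noise of
    that width turns discrete pixel values into a continuous density.
    """

    def __init__(self, normalization: float = 255.0):
        self.normalization = normalization

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        # spread each discrete pixel value uniformly across its bin
        return img + torch.rand_like(img) / self.normalization
```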
6 changes: 6 additions & 0 deletions paper/node/param/directories.yaml
@@ -0,0 +1,6 @@
# relative directories from where main.py was called
directories:
results: "./data/node/results"
trained_networks: null
training_data: "./data/training"
local_data: "./local"
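Together with the experiment command above, these paths determine where everything lands on disk; a quick sketch, assuming experiments are launched from the repository root:

```bash
ls ./data/node/results   # "results" directory from above
ls ./data/training       # "training_data" directory from above
ls ./local               # "local_data" directory from above
```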
