Skip to content

Latest commit

 

History

History
112 lines (95 loc) · 5 KB

experiments.md

File metadata and controls

112 lines (95 loc) · 5 KB

Reproducing figures of the paper "Learning sparse features can lead to overfitting in neural networks"

L. Petrini, F. Cagnetta, E. Vanden-Eijnden, M. Wyart. @NeurIPS2022 arXiv: 2206.12314

Experiments are run using grid.

Figure 4

Neural network learning a constant function on the sphere with small L1 regularization:

  • d = 2 (here convergence is very slow, conic gradient descent [Chizat and Bach, 2018] is employed to speed it up)
python -m grid /home/results/regressionsphere --n 32 "
grun python main.py --init_w1 unitary --w1_norm1 1 --reg l1 --pofx sphere --pte 1000 --savefreq 10000
" --l 1e-6 --d 2 --dataseed 0 1 2 3 4 --ptr 1024 512 256 128 64 32 16 8  --maxstep 1e7 --h:int 10000 --conic_gd 1 --init_w2:str '1e-10'
  • d = 3
python -m grid /home/results/regressionsphere --n 32 "
grun python main.py --init_w2 zero --init_w1 unitary --pofx sphere --pte 1000 --savefreq 10000 --w1_norm1 1 --reg l1
" --l 1e-5 --d 3  --dataseed 0 1 2 3 4 --ptr 1024 512 256 128 64 32 16 8 --h:int 10000 --maxstep 1e6
  • d = 5
python -m grid /home/results/regressionsphere --n 32 "
grun python main.py --init_w2 zero --init_w1 unitary --pofx sphere --pte 1000 --savefreq 10000 --w1_norm1 1 --reg l1
" --l 1e-4 --d 5  --dataseed 0 1 2 3 4 --ptr 1024 512 256 128 64 32 16 8 --h:int 10000 --maxstep 1e5

Kernel regression of the constant on the sphere

python -m grid /home/results/krrsphere --n 4 "
grun python main_krr.py --target norm --pofx sphere --pte 10000
" --ptr 1024 512 256 128 32 16 8 --l 0 --d 2 3 5 --dataseed 0 1 2 3 4 5 6 7 8 9

Figure G.3

Neural network learning a constant function on the sphere with the alpha-trick:

  • d = 2
python -m grid /home/results/regressionsphere_alphatrick --n 12 "
grun python main.py
" --ptr 1024 512 256 128 64 32 16 8 --l 0 --d 2 --h:int 10000 --alpha 1e-6 --maxstep 1e7 --dataseed 0 1 2 3 4
  • d = 3
python -m grid /home/results/regressionsphere_alphatrick --n 12 "
grun python main.py
" --ptr 1024 512 256 128 64 32 16 8 --l 0 --d 3 --h:int 10000 --alpha 1e-25 --maxstep 1e7 --dataseed 0 1 2 3 4
  • d = 5
python -m grid /home/results/regressionsphere_alphatrick --n 12 "
grun python main.py
" --ptr 1024 512 256 128 64 32 16 8 --l 0 --d 5 --h:int 10000 --alpha 1e-50 --maxstep 1e6 --dataseed 0 1 2 3 4

The functions for counting the number of atoms can be found in .arch.counting_atoms.py.

Figure G.1

Learning a GRF on the sphere. The teacher is an ~infinite-width (H = 1e7) FCN with |.|^a activation function, a controls the GRF smoothness \nu_t = a + 1/2.

Neural network:

  • a = 1
python -m grid /home/results/regressionsphere_absteacher --n 16 "
grun python main.py --init_w1 unitary --w1_norm1 1 --reg l1 --target teacher --pofx sphere --teacher_act abs
 --pte 10000 --savefreq 10000 --h 10000 --init_w2 zero
" --ptr 1024 512 256 128 64 32 16 8 --act_power 1 --l 1e-5 --lr 0.1 --d 5 --dataseed 0 1 2 3 4 5 --maxstep 1e7
  • a = 4
python -m grid /home/results/regressionsphere_absteacher --n 16 "
grun python main.py --init_w1 unitary --w1_norm1 1 --reg l1 --target teacher --pofx sphere --teacher_act abs
 --pte 10000 --savefreq 10000 --h 10000 --init_w2 zero
" --ptr 1024 512 256 128 64 32 16 8 --act_power 4 --l 1e-5 --lr 0.3 --d 5 --dataseed 0 1 2 3 4 5 --maxstep 1e7

Kernel regression:

python -m grid /home/results/krrsphere_absteacher --n 4 "
grun python main_krr.py --pofx sphere --target teacher --teacher_act abs --pte 10000
" --ptr 8192 4096 2048 1024 512 256 128 32 16 8 --act_power 1 6 --l 0 --d 5 --dataseed 0 1 2 3 4 5 6 7 8 9

Figure 1

Experiments regarding FCNs training on images are run using the gradient flow approximation introduced in Geiger et al. 2020, the corresponding code can be found here github.com/leonardopetrini/feature_lazy.

python -m grid /home/results/fc_on_images --n 16 "grun python main.py
 --init_kernel 0 --final_kernel 0 --delta_kernel 0 --pte 5000
  --arch fc --act softplus --L 1 --act_beta 5 --max_wall 20000 --max_dout 0.1
    --loss_beta 20 --max_dgrad 1e-4 --stop_frac .99 --h 1000 --alpha 1e-6
   " --seed_trainset 0 1 2 3 4 5 6 7 8 9 --dataset:str 'mnist' 'fashion' 'cifar10' --seed_init 0 1 2 3 4 5 6 7 8 9 --ptr 10000 5000 2500 1250 512 256 128 64 32

SVC trainings are performed using sklearn.svm.SVC.

Figure 6

Image deformations are computed following Petrini et al. 2021, the code can be found at github.com/pcsl-epfl/diffeomorphism.

stability.py contains the functions to compute predictors rotation stability,

RS = rotation_stability(predictor, x, angle=10)

and relative deformation stability:

D, G = deformation_and_noise_stability(predictor, imgs)
R = D / G