- LCA: Loss Change Allocation for Neural Network Training
- Asymptotics of Wide Networks from Feynman Diagrams
- Neural networks and physical systems with emergent collective computational abilities
- Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes
- Adversarial Robustness Through Local Lipschitzness
- Lagrangian Neural Networks
- Inherent Weight Normalization in Stochastic Neural Networks
- Neural Arithmetic Units
- Information Theory, Inference, and Learning Algorithms
- Intriguing properties of neural networks
- An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks
- Rigging the Lottery: Making All Tickets Winners
- Deep Information Propagation
- Exponential expressivity in deep neural networks through transient chaos
- Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
- Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function
- Mean Field Residual Networks: On the Edge of Chaos
- Mean Field Theory of Activation Functions in Deep Neural Networks
- Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
- On the Impact of the Activation Function on Deep Neural Networks Training
- Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
- Disentangling trainability and generalization in deep learning
- A Mean Field View of the Landscape of Two-Layers Neural Networks
- A Mean Field Theory of Batch Normalization
- Statistical field theory for neural networks
- A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
- Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods
- Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration
- Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
- Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
- Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
- On the distance between two neural networks and the stability of learning
- The large learning rate phase of deep learning: the catapult mechanism
- A Fine-Grained Spectral Perspective on Neural Networks
- The Geometry of Sign Gradient Descent
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks
- Quasi-hyperbolic momentum and Adam for deep learning
- A new regret analysis for Adam-type algorithms
- Disentangling Adaptive Gradient Methods from Learning Rates
- Stochastic Flows and Geometric Optimization on the Orthogonal Group
- Adaptive Multi-level Hyper-gradient Descent
- Regularizing activations in neural networks via distribution matching with the Wasserstein metric
- Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
- Effect of Activation Functions on the Training of Overparametrized Neural Nets
- Implicit Neural Representations with Periodic Activation Functions
- Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem
- Small nonlinearities in activation functions create bad local minima in neural networks
- Tempered Sigmoid Activations for Deep Learning with Differential Privacy
- Neural Networks Fail to Learn Periodic Functions and How to Fix It
- Making Convolutional Networks Shift-Invariant Again
- GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing
- Butterfly Transform: An Efficient FFT Based Neural Architecture Design
- ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
- Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
- Learning One Convolutional Layer with Overlapping Patches
- Batch-Shaping for Learning Conditional Channel Gated Networks
- Convolutional Networks with Adaptive Inference Graphs
- The Singular Values of Convolutional Layers
- Rendering Natural Camera Bokeh Effect with Deep Learning
- Towards Learning Convolutions from Scratch
- Feature Products Yield Efficient Networks
- Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning
- Algebra
- Contemporary Abstract Algebra
- Statistical Mechanics of Deep Learning
- Linear Algebra
- Linear Algebra Done Right
- A Simple Framework for Contrastive Learning of Visual Representations
- Self-supervised Label Augmentation via Input Transformations
- On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them
- Structured Convolutions for Efficient Neural Network Design
- Tensor Programs III: Neural Matrix Laws
- An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
- The Hardware Lottery
- Tensor Programs II: Neural Tangent Kernel for Any Architecture
- PareCO: Pareto-aware Channel Optimization for Slimmable Neural Networks
- Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators
- Hypersolvers: Toward Fast Continuous-Depth Models
- Residual Feature Distillation Network for Lightweight Image Super-Resolution
- SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness
- HyperNetworks
- Understanding the Role of Individual Units in a Deep Neural Network