Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation

We show that first-order debiasing of black-box ML estimators is optimal for estimating the average treatment effect.

Jikai Jin, Vasilis Syrgkanis
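The standard first-order debiased estimator of the average treatment effect is the augmented inverse propensity weighted (AIPW), or doubly robust, estimator. The sketch below computes it from already-fitted nuisance predictions. Several things here are assumptions rather than the paper's code: the function name `aipw_ate` and its argument layout are hypothetical, and the black-box nuisance models are assumed to be fit on held-out folds (cross-fitting). The example illustrates the general technique only.

```python
from statistics import fmean


def aipw_ate(y, t, mu0, mu1, e):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y   : observed outcomes
    t   : binary treatment indicators (0 or 1)
    mu0 : predicted outcome under control, from any black-box regressor
    mu1 : predicted outcome under treatment, from any black-box regressor
    e   : predicted propensity P(T = 1 | X), assumed strictly inside (0, 1)
    """
    scores = []
    for yi, ti, m0, m1, ei in zip(y, t, mu0, mu1, e):
        plug_in = m1 - m0
        # First-order correction: inverse-propensity-weighted residual of the observed arm.
        correction = ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        scores.append(plug_in + correction)
    return fmean(scores)


# Outcome model exact on the observed arm: corrections vanish, estimate is mean(mu1 - mu0).
print(aipw_ate([5, 2, 6, 1], [1, 0, 1, 0], [3, 2, 2, 1], [5, 4, 6, 3], [0.5] * 4))  # 2.5
```

The two edge cases show the construction. When the outcome predictions are exact on the observed arm, the correction terms vanish and the estimate reduces to the plug-in average of `mu1 - mu0`. When `mu0 = mu1 = 0`, the estimate reduces to plain inverse propensity weighting.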

Learning Causal Representations from General Environments: Identifiability and Intrinsic Ambiguity

We study the best-achievable identification guarantees and provable identification algorithms for causal representation learning when hard interventions are not available.

Jikai Jin, Vasilis Syrgkanis

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

We investigate the “grokking” phenomenon in deep learning in some simple setups and show that it is caused by a dichotomy between the implicit biases of the early and late phases of training.

Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu

Understanding Incremental Learning of Gradient Descent -- A Fine-grained Analysis of Matrix Sensing

We prove that GD applied to the matrix sensing problem has intriguing properties: with small initialization and early stopping, it follows an incremental/greedy low-rank learning procedure. This form of simplicity bias allows GD to recover the ground truth despite over-parameterization and non-convexity.

Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee
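The incremental behavior can be seen in a toy run. The sketch below applies GD with small random initialization to f(U) = 1/4 ||U U^T - M*||_F^2, where U is an over-parameterized d x d factor. As a simplifying assumption, the sensing operator is the identity, which is fully observed symmetric matrix factorization, a special case of matrix sensing. The ground truth diag(5, 1, 0), the step size, the initialization scale, and the helper `gd_small_init` are all illustrative choices, not the paper's setup.

```python
import random


def matmul(a, b):
    # Plain nested-list matrix product, to stay dependency-free.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gd_small_init(m_star, init_scale=1e-3, lr=0.05, steps=600, seed=0):
    """GD on f(U) = 1/4 * ||U U^T - M*||_F^2 with a d x d (over-parameterized) factor U.

    Returns the diagonal of U U^T recorded at every step, showing which
    eigen-directions of the (diagonal) ground truth M* have been fitted so far.
    """
    d = len(m_star)
    rng = random.Random(seed)
    u = [[init_scale * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    history = []
    for _ in range(steps):
        ut = [list(col) for col in zip(*u)]
        x = matmul(u, ut)
        resid = [[x[i][j] - m_star[i][j] for j in range(d)] for i in range(d)]
        grad = matmul(resid, u)  # gradient of f is (U U^T - M*) U
        u = [[u[i][j] - lr * grad[i][j] for j in range(d)] for i in range(d)]
        history.append([x[i][i] for i in range(d)])
    return history


h = gd_small_init([[5.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
print([round(v, 3) for v in h[60]])  # first entry near 5, second still near 0
print([round(v, 3) for v in h[-1]])  # approximately [5, 1, 0]
```

Each eigen-direction grows at a rate set by its eigenvalue, so the top direction is fitted first while the second is still close to zero. Stopping early in between yields an approximately rank-1 solution, which matches the greedy low-rank picture described in this entry.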

Minimax Optimal Kernel Operator Learning via Multilevel Training

We consider the problem of learning a linear operator between Sobolev RKHSs from noisy data. Unlike the finite-dimensional case, where regularized least squares is optimal, we prove that estimators with a certain multilevel structure are necessary (and sufficient) to achieve optimality.

Jikai Jin, Yiping Lu, Jose Blanchet, Lexing Ying

Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power

We provide theoretical evidence that the hardness of robust generalization may stem from the expressive power of deep neural networks. Even when standard generalization is easy, robust generalization provably requires the size of DNNs to be exponentially large.

Binghui Li, Jikai Jin, Han Zhong, John E. Hopcroft, Liwei Wang

Understanding Riemannian Acceleration via a Proximal Extragradient Framework

We study accelerated first-order methods on Riemannian manifolds through the lens of a proximal extragradient framework.

Jikai Jin, Suvrit Sra

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

We provide the first non-asymptotic analysis of algorithms for DRO with non-convex losses. Our algorithm incorporates momentum and an adaptive step size, and achieves superior empirical performance.

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang
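The entry does not spell out the algorithm, so the following only sketches the ingredients it names. It assumes a chi^2-penalized DRO objective, rewritten through the standard convex dual in the model parameter x and an auxiliary scalar eta: L(x, eta) = eta + lambda * mean_i psi*((loss_i(x) - eta) / lambda), with psi*(s) = max(s + 2, 0)^2 / 4 - 1. This objective is then minimized with momentum plus a normalized, and hence adaptive, step. The penalty `LAM`, the toy squared losses, the learning-rate schedule, and all helper names are hypothetical; the paper's exact formulation and algorithm may differ.

```python
import math

LAM = 10.0  # hypothetical penalty weight on the chi^2 divergence


def psi_star(s):
    # Convex conjugate of the chi^2 generator psi(t) = (t - 1)^2, t >= 0.
    return 0.25 * max(s + 2.0, 0.0) ** 2 - 1.0


def psi_star_prime(s):
    return 0.5 * max(s + 2.0, 0.0)


def dual_objective(x, eta, data, lam=LAM):
    # L(x, eta) = eta + lam * mean_i psi*((loss_i(x) - eta) / lam), loss_i(x) = (x - xi)^2 / 2
    vals = [psi_star((0.5 * (x - xi) ** 2 - eta) / lam) for xi in data]
    return eta + lam * sum(vals) / len(data)


def dual_grad(x, eta, data, lam=LAM):
    # psi*'(s_i) acts as a nonnegative weight on sample i (higher loss -> higher weight).
    w = [psi_star_prime((0.5 * (x - xi) ** 2 - eta) / lam) for xi in data]
    n = len(data)
    gx = sum(wi * (x - xi) for wi, xi in zip(w, data)) / n
    geta = 1.0 - sum(w) / n
    return gx, geta


def normalized_momentum(data, steps=5000, lr0=0.5, beta=0.9, lam=LAM):
    x, eta, mx, meta = 0.0, 0.0, 0.0, 0.0
    for t in range(steps):
        gx, geta = dual_grad(x, eta, data, lam)
        # Exponential moving average of gradients (momentum).
        mx = beta * mx + (1 - beta) * gx
        meta = beta * meta + (1 - beta) * geta
        norm = math.hypot(mx, meta)
        if norm == 0.0:
            break
        # Normalized step: its length is lr regardless of the gradient scale.
        lr = lr0 / math.sqrt(t + 1)
        x -= lr * mx / norm
        eta -= lr * meta / norm
    return x, eta


data = [0.0, 0.0, 0.0, 10.0]
x, eta = normalized_momentum(data)
print(x, eta, dual_objective(x, eta, data))
```

On the data [0, 0, 0, 10], empirical risk minimization returns the mean 2.5. The dual weights psi*'(s_i) instead upweight the high-loss point, and for lambda = 10 the minimizer of this objective lies at x of about 3.71.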

Improved Analysis of Clipping Algorithms for Non-convex Optimization

We provide an improved analysis of the convergence rates of clipping algorithms, theoretically justifying their superior performance in deep learning.

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang
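The sketch below compares plain and clipped gradient descent on the toy objective f(x) = x^4 / 4. Its curvature 3x^2 grows with the size of the gradient x^3, so no single fixed step size works everywhere. Clipping rescales the gradient g to min(1, c / |g|) * g before taking the step. The objective, step size, threshold, and the helper names `gd` and `grad_quartic` are illustrative assumptions, not the paper's experiments.

```python
def gd(grad, x0, lr, steps, clip=None):
    """(Clipped) gradient descent on a 1-D objective.

    With clip=None this is plain GD; otherwise each step uses
    x <- x - lr * min(1, clip / |g|) * g.
    """
    x = x0
    for _ in range(steps):
        g = grad(x)
        if clip is not None and abs(g) > clip:
            g = g * (clip / abs(g))  # rescale so that |g| == clip
        x -= lr * g
    return x


def grad_quartic(x):
    # f(x) = x^4 / 4 has gradient x^3 and curvature 3x^2, which grows with |x|.
    return x * x * x


plain = gd(grad_quartic, 10.0, 0.1, steps=3)
clipped = gd(grad_quartic, 10.0, 0.1, steps=200, clip=1.0)
print(plain, clipped)
```

From x0 = 10 with lr = 0.1, three plain GD steps already push |x| to the order of 1e13. The clipped run instead moves by at most lr * clip = 0.1 per step until |x| < 1, then settles toward the minimizer at 0, ending at roughly 0.2 after 200 steps.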