(2023). Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking. In arXiv preprint 2311.18817.


(2023). Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing. In ICML 2023.

(2022). Minimax Optimal Kernel Operator Learning via Multilevel Training. In ICLR 2023 (spotlight).

(2022). Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power. In NeurIPS 2022.

(2021). Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis. In NeurIPS 2021.

(2020). Improved Analysis of Clipping Algorithms for Non-convex Optimization. In NeurIPS 2020.

