(2023). Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking. arXiv preprint arXiv:2311.18817.


(2023). Understanding Incremental Learning of Gradient Descent -- A Fine-grained Analysis of Matrix Sensing. In ICML 2023.

(2022). Minimax Optimal Kernel Operator Learning via Multilevel Training. In ICLR 2023 (spotlight).

(2022). Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power. In NeurIPS 2022.

(2021). Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis. In NeurIPS 2021.

(2020). Improved Analysis of Clipping Algorithms for Non-convex Optimization. In NeurIPS 2020.
