Publications

(2025). It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation. In arXiv preprint arXiv:2507.02275.

(2025). Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning. In arXiv preprint arXiv:2506.10378.

(2025). Solving Inequality Proofs with Large Language Models. In arXiv preprint arXiv:2506.07927.

(2023). Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking. In ICLR 2024.

(2023). Understanding Incremental Learning of Gradient Descent -- A Fine-grained Analysis of Matrix Sensing. In ICML 2023.

(2022). Minimax Optimal Kernel Operator Learning via Multilevel Training. In ICLR 2023 (spotlight).

(2022). Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power. In NeurIPS 2022.

(2021). Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis. In NeurIPS 2021.

(2020). Improved Analysis of Clipping Algorithms for Non-convex Optimization. In NeurIPS 2020.
