Aonan Zhang

CV | Email | Google Scholar | LinkedIn | X


Research Scientist - Apple Foundation Model Team
I build foundation models at Apple.
My research focuses on model efficiency and data generation.

Previously at ByteDance, I worked on trillion-scale data subsampling.
At Google, I designed a sequence model for speaker diarization.
I received my Ph.D. from Columbia University, where I worked with Prof. John Paisley.
I completed my B.S. and M.S. at Tsinghua University, working with Prof. Jun Zhu.


Highlights

[2025.9] Synthetic Bootstrapped Pretraining [Arxiv]
Zitong Yang*, Aonan Zhang*, Hong Liu, Tatsunori Hashimoto, Emmanuel Candès, Chong Wang, Ruoming Pang
We generate synthetic data for pre-training by building a doc-to-doc mapping from scratch, without relying on a teacher model. This approach improves data efficiency at the scale of training a 3B model on 1T tokens.
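For intuition, here is a minimal sketch of the doc-to-doc idea under illustrative assumptions of my own (the retrieval and synthesizer choices, and all names, are not from the paper): mine related document pairs from the corpus itself, train a synthesizer on those pairs, then sample synthetic documents conditioned on real ones.

```python
# Hypothetical sketch of a doc-to-doc bootstrapping recipe; names are illustrative,
# not the paper's API. Pair related documents by similarity, train a synthesizer on
# those pairs, then sample new documents to mix into the pre-training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

corpus = [
    "Speculative decoding accelerates LLM inference with a small draft model.",
    "Draft-and-verify schemes speed up autoregressive generation.",
    "Tsinghua University is located in Beijing.",
]

# Step 1: mine (seed, target) document pairs from the corpus itself, no teacher model.
emb = TfidfVectorizer().fit_transform(corpus)
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(emb)
_, idx = nn.kneighbors(emb)                       # first neighbor is the doc itself
pairs = [(corpus[i], corpus[j]) for i, (_, j) in enumerate(idx)]

# Step 2: train a synthesizer p(target | seed) on the mined pairs (stubbed out here;
# in practice this would be a language model trained on seed -> target pairs).
def train_synthesizer(doc_pairs):
    return lambda seed: f"<synthetic document conditioned on: {seed[:40]}...>"

synthesize = train_synthesizer(pairs)

# Step 3: condition on real documents to generate synthetic ones for pre-training.
synthetic_corpus = [synthesize(doc) for doc in corpus]
print(synthetic_corpus[0])
```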

[2025.7] Apple Intelligence Foundation Language Models: Tech Report 2025 [Arxiv][Official Blog Post]
Apple Foundation Model Team
I worked on math reasoning for Apple's latest on-device and server-side foundation language models.

[2024.12] Recurrent Drafter for Fast Speculative Decoding in Large Language Models [Arxiv][PyTorch Code][TensorRT-LLM]
Yunfei Cheng, Aonan Zhang, Xuanyu Zhang, Chong Wang, Yi Wang
A simple and efficient speculative decoding algorithm that delivers significantly faster inference for large language models. The algorithm is readily deployable across multiple hardware platforms.
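For readers unfamiliar with speculative decoding, the toy greedy draft-and-verify loop below illustrates the general accept-or-correct pattern only; it is not the recurrent drafter algorithm, and the stand-in models and names are made up.

```python
# Toy greedy speculative decoding: a cheap draft model proposes k tokens, the target
# model verifies them, and the longest matching prefix is accepted. NOT ReDrafter;
# every name and model here is a made-up stand-in for illustration.
def speculative_decode(target_next, draft_next, prefix, k=4, steps=3):
    """target_next/draft_next: fn(seq) -> greedy next token. Returns the extended seq."""
    seq = list(prefix)
    for _ in range(steps):
        # 1) Draft model cheaply proposes k candidate tokens.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) Target model verifies; a real system scores all k positions in one
        #    batched forward pass instead of this Python loop.
        accepted = []
        for tok in draft:
            t = target_next(seq + accepted)
            if t == tok:
                accepted.append(tok)   # draft matches the target's choice: accept for free
            else:
                accepted.append(t)     # mismatch: take the target's token and stop
                break
        seq.extend(accepted)
    return seq

# Tiny deterministic stand-in "models" over an integer vocabulary.
target = lambda s: (sum(s) + 1) % 7
draft = lambda s: (sum(s) + 1) % 7 if len(s) % 3 else (sum(s) + 2) % 7  # sometimes wrong
print(speculative_decode(target, draft, prefix=[1, 2]))
```

In this greedy variant the output always matches what the target model alone would generate; the draft model only changes how much of each step can be verified in parallel.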

[2024.7] Apple Intelligence Foundation Language Models [Arxiv][Official Blog Post]
Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, et al.
I'm a core contributor to Apple's on-device foundation language models and scalable server models operating within Apple's Private Cloud Compute platform.

[2024.4] MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [Arxiv]
Brandon McKinzie, Zhe Gan, et al.
A family of multimodal models with up to 30B parameters that achieve SOTA pre-training metrics and competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. I contributed to the text backbone model.

[2021.10] Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data [Arxiv]
Haiying Wang, Aonan Zhang, Chong Wang
We designed a theoretically efficient data subsampling algorithm for negative-dominant datasets and applied it to a click-through rate dataset containing over 0.3 trillion instances.
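As a rough illustration (not the paper's exact estimator), the sketch below keeps every positive, keeps each negative with probability pi, fits a logistic regression on the subsample, and then shifts the intercept by log(1/pi) to return to the full-data scale; the simulated data and all choices here are my own.

```python
# Minimal sketch of negative downsampling with an intercept (log-odds) correction.
# Illustrative only: the simulation and the plain logistic fit are my assumptions,
# not the estimator from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, pi = 200_000, 5, 0.02                    # rare-event data; keep 2% of negatives
X = rng.normal(size=(n, d))
beta, beta0 = np.array([0.5, -0.3, 0.2, 0.0, 0.1]), -4.0
y = rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta + beta0)))   # positives are rare

keep = y | (rng.random(n) < pi)                # all positives, a pi-fraction of negatives
clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])

beta0_hat = clf.intercept_[0] - np.log(1.0 / pi)   # undo the sampling-induced shift
print(beta0_hat)                               # close to beta0 = -4 on the full-data scale
```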

[2019.2] Fully Supervised Speaker Diarization [Arxiv][Github][Video][Official Google AI Blog]
Aonan Zhang, Quan Wang, Zhenyao Zhu, John Paisley, Chong Wang
We introduce Unbounded Interleaved-State Recurrent Neural Networks (UIS-RNN), in which each speaker is modeled by a parameter-sharing RNN. This framework detects speaker changes on the fly through anomalies in the RNN state transitions.
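To convey only the decode-time behavior described above, here is a toy online assignment loop that uses running-mean stand-ins for the per-speaker predictors; it is not UIS-RNN (which shares one RNN across speakers and decodes with beam search), and every name and threshold is illustrative.

```python
# Toy online speaker assignment: give each new segment embedding to the existing
# speaker whose predictor explains it best, or open a new speaker when none does.
# Running means stand in for per-speaker RNN states; this is NOT the UIS-RNN model.
import numpy as np

def online_diarize(embeddings, new_speaker_threshold=1.0):
    speakers = []                                  # per-speaker state: (mean, count)
    labels = []
    for x in embeddings:
        errors = [np.linalg.norm(x - mean) for mean, _ in speakers]
        if errors and min(errors) < new_speaker_threshold:
            k = int(np.argmin(errors))             # continue an existing speaker
        else:
            speakers.append((np.zeros_like(x), 0)) # speaker change: open a new speaker
            k = len(speakers) - 1
        mean, cnt = speakers[k]
        speakers[k] = ((mean * cnt + x) / (cnt + 1), cnt + 1)
        labels.append(k)
    return labels

# Two well-separated "speakers", with one speaker change mid-sequence.
rng = np.random.default_rng(0)
segments = np.concatenate([rng.normal(0, 0.1, (5, 8)), rng.normal(3, 0.1, (5, 8))])
print(online_diarize(segments))                    # expect [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```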


Last update: Oct 3rd, 2025