Zeyuan Allen-Zhu's Home Page

Education & Work

(I've recently moved my homepage to this location, so please bear with me as I work out any bugs or issues that may arise.)

Zeyuan on the photo day of MSR AI
@ Microsoft Research, 2018

Meta / FAIR Labs (2022 – present)
- AI research scientist, in Seattle/Bellevue office
Microsoft Research Redmond (2017 – 2022)
- senior -> principal researcher
Princeton and IAS (2015 – 2017)
- postdoc (hosted by Elad Hazan and Avi Wigderson)
MIT, CSAIL (2010 – 2015)
- Sc.D. in computer science (advised by Jon Kelner and Silvio Micali)
- M.S. in computer science (advised by Silvio Micali)
Tsinghua, Department of Physics (2006 – 2010)
- B.S. in mathematics and physics (summa cum laude)
- Chi-Sun Yeh prize for physics major
NFLS (2000 – 2006)
- high school diploma with English major

Personal Information

Our 12 scaling laws (for LLM knowledge capacity) are out: https://t.co/qNTarfEb3l. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions pic.twitter.com/gF2O3LivEW
— Zeyuan Allen-Zhu (@ZeyuanAllenZhu) April 9, 2024

Research Interests

My current research focuses on investigating the physics of language models and AI in a broader sense. This involves designing experiments to elucidate the underlying fundamental principles governing how transformers/GPTs learn to accomplish diverse AI tasks. By probing into the neurons of the pre-trained transformers, my goal is to uncover and comprehend the intricate (and sometimes surprising!) physical mechanisms behind large language models. Our first paper in this series can be found on arXiv.

Before that, I worked on the mathematics of deep learning. That involved developing rigorous theoretical proofs of the learnability of neural networks, in ideal and theory-friendly settings, to explain certain mysterious phenomena observed in deep learning. In this area, our paper on ensemble / knowledge distillation received an award at ICLR'23, although I am most proud of our COLT'23 result, which provably shows why deep learning is actually deep: better than shallow learners such as layer-wise training, kernel methods, etc.

In my past life, I have also worked in machine learning, optimization theory, and theoretical computer science.

Conferences

ICML
ICLR
NeurIPS
STOC
FOCS
SODA
ICALP
SoCG
WSDM
ICDM
CIKM
EC
ITCS
WINE
POPL

Journals

Email

Service

grant review: NSF ('18 '17).
conf review: FOCS'19 (PC), ICML'19, SODA'19, NIPS'18, SODA'18, COLT'18, STOC'18, ITCS'17 (PC), NIPS'17, ICML'17, SODA'17, NIPS'16, ICML'16, STOC'16, STOC'15, WWW'15, ICALP'14, SODA'14, SODA'13, and more.
journal review: JMLR'18, MOR'18, JMLR'16, SIIMS'15, JMLR'14, JFA'14, TIST'13, JMLR'13, TC'13, and more.

Some Awards

In algorithm competitions, I was fortunate to win a few awards in my past life, including two IOI gold medals, a USACO world championship, an ACM/ICPC world-final gold medal, a Google Code Jam world runner-up finish, and a USA MCM Top Prize.

In research, I used to be supported by a Microsoft Young Fellow Award, a Simons Student Award and a Microsoft Azure Research Award.

For a full list, click here.
