Bowen Pan
panbowen0607 [at] gmail [dot] com


I am a research scientist on the Apple Foundation Model (AFM) team, where I work on multimodal model training and reinforcement learning (RL).

I completed my Ph.D. and M.Sc. at MIT CSAIL, advised by Aude Oliva. My Ph.D. thesis focuses on efficient algorithms for the training and inference of multimodal agents. Prior to that, I obtained my B.E. from Shanghai Jiao Tong University.

Publications


*: equal contribution

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Apple Foundation Model team
Bowen Pan, First Author
Scaling unified multimodal understanding and generation in a single model.
Apple Intelligence Foundation Language Models
Apple Foundation Model team
Bowen Pan, Core Contributor
Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
Camilo Fosco*, Benjamin Lahner*, Bowen Pan, Alex Andonian, Emilie Josephs, Alex Lascelles, Aude Oliva
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda
Training parameter-efficient Mixture-of-Experts (MoE) language models for higher decoding throughput.
LangNav: Language as a Perceptual Representation for Navigation
Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
An LLM agent for vision-and-language navigation that uses language as its perceptual representation.
HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
Xin Wang*, Taein Kwon*, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys
Egocentric Vision 2022/2023 Distinguished Paper Award
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva
Argoverse 2.0: Next Generation Datasets for Self-Driving Perception and Forecasting
Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays
IA-RED2: Interpretability-Aware Redundancy Reduction for Vision Transformer
Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, Aude Oliva
An interpretable run-time token pruning strategy for vision transformers.
VA-RED2: Video Adaptive Redundancy Reduction
Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
Cross-view Semantic Segmentation for Sensing Surroundings
Bowen Pan*, Jiankai Sun*, Ho Yin Tiga Leung, Alex Andonian, Bolei Zhou
A top-down semantic scene representation for surrounding-environment perception.
Oral Presentation
Recurrent Residual Module for Fast Inference in Videos
Bowen Pan, Wuwei Lin, Xiaolin Fang, Chaoqin Huang, Bolei Zhou, Cewu Lu

Misc

In my spare time, I play soccer and go to the gym.