PhD Student at MIT

pschro@mit.edu

Philip Schroeder

I am a PhD student at MIT in Electrical Engineering & Computer Science, advised by Dr. Jim Glass.

My work has focused on improving the reasoning capabilities of large language models (LLMs) and vision-language models (VLMs) in challenging embodied settings.

My early projects introduced recursive architectures for transformer decoding that improve the performance of LLMs and VLMs when interacting with external environments through text or video. This work led to first-author papers at NeurIPS 2025 (introducing ROVER) and NAACL 2025 (introducing THREAD).

In my most recent work, done during my internship at the Boston Dynamics AI Institute (now RAI Institute), we introduce SOLE-R1, a new foundation model with video-language reasoning designed to guide on-robot reinforcement learning. In our paper, we show that SOLE-R1 significantly outperforms state-of-the-art reasoning models and enables learning over 20 unseen tasks through zero-shot online RL: robots learn without access to ground-truth rewards, success indicators, demonstrations, or task-specific tuning.

News

2026/01

New preprint - SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot RL.

2025/12

NeurIPS 2025 paper - ROVER: Recursive Reasoning Over Videos with VLMs for Embodied Tasks.

2025/05

Started my summer internship at the Boston Dynamics AI Institute (now RAI Institute) with Ondrej Biza.

2025/04

Talk: MIT Embodied Intelligence Seminar - "Recursive Reasoning with LLMs and VLMs"

2024/08

NAACL 2025 paper - THREAD: Thinking Deeper with Recursive Spawning.

2023/09

Started my PhD at MIT, Cambridge, MA.

Selected First-Author Papers

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

ROVER: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks

THREAD: Thinking Deeper with Recursive Spawning

Talks

Recursive Reasoning with LLMs and VLMs

THREAD: Thinking Deeper with Recursive Spawning

