Yutong Ban   班雨桐
Postdoc Research Fellow

Distributed Robotics Lab, CSAIL MIT &
SAIIL, Mass General Hospital, Harvard Medical School

Address: 32 Vassar St, Cambridge, MA 02139
Email: yban[at]csail.mit.edu

Google Scholar | Twitter | Github

About me

Yutong Ban is currently a postdoctoral research fellow at both the Distributed Robotics Laboratory (DRL), CSAIL, MIT and the Surgical Artificial Intelligence and Innovation Laboratory (SAIIL), MGH, Harvard Medical School, working with Prof. Daniela Rus and Prof. Ozanan Meireles. Prior to that, he obtained his Ph.D. in the Perception team at INRIA Grenoble-Rhône-Alpes, France, under the supervision of Dr. Radu Horaud. His research interests include surgical AI, multi-modal perception, and probabilistic Bayesian models. His overall goal is to build machine intelligence that assists humans in scenarios such as social interactions and surgeries.

I will join Shanghai Jiao Tong University (SJTU) as an assistant professor in Fall 2023. We have a few openings for postdocs, PhD students, Master's students, and research assistants. If you are interested, please reach out with your resume to apply.


  • 07/2023: ConceptNet was accepted at IEEE TMI!
  • 05/2023: One paper was accepted at ICML 2023!
  • 03/2023: One paper was accepted at Surgical Endoscopy!
  • 01/2023: One paper was accepted at ICRA 2023!
  • 11/2022: TransCenter was accepted at IEEE TPAMI!
  • 10/2022: Served as co-chair of the Open Source Competition at ACM 2022 in Lisbon!
  • 09/2022: An open-source toolkit for automated surgical phase recognition is released!
  • 06/2022: Gerhard Buess Amazing Technologies Award at The European Association of Endoscopic Surgery Annual Congress (EAES).
  • 02/2022: Two papers were accepted at ICRA 2022.
  • 12/2021: Released the TransCenter multi-object tracking toolkit. SOTA on MOT17 and MOT20.
  • 09/2021: Please find my previous projects on my old page.


Journal Papers

Concept Graph Neural Networks for Video Understanding and Risk Mitigation
Y. Ban, J. Eckhoff, T. Ward, D. Hashimoto, O. Meireles, D. Rus, G. Rosman;
IEEE Transactions on Medical Imaging, IEEE TMI, 2023
paper | bibtex
Preprint: arXiv:2202.13402, "Concept Graph Neural Networks for Surgical Video Understanding"

TEsoNet: Knowledge Transfer from laparoscopic Sleeve Gastrectomy to the Ivor-Lewis Esophagectomy
J.A. Eckhoff*, Y. Ban*, G. Rosman, D.T. Müller, D.A. Hashimoto, D. Rus, C. Bruns, H.F. Fuchs, O. Meireles;
(* indicates equal contribution)
The abstract version received the EAES Gerhard Buess Amazing Technologies Award
Surgical Endoscopy, 2023
paper | bibtex

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking
Y. Xu*, Y. Ban*, G. Delorme, C. Gan, D. Rus, X. Alameda-Pineda;
IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE TPAMI 2022
(* indicates equal contribution)
paper | bibtex
Preprint: arXiv:2103.15145, "TransCenter: Transformers with Dense Representations for Multiple-Object Tracking"

SUPR-GAN: SUrgical PRediction GAN for Event Anticipation in Laparoscopic and Robotic Surgery
Y. Ban, G. Rosman, J. Eckhoff, T. Ward, D. Hashimoto, T. Kondo, H. Iwaki, O. Meireles, D. Rus;
IEEE Robotics and Automation Letters, RA-L & ICRA, 2022.
paper | bibtex

Artificial Intelligence Prediction of Cholecystectomy Operative Course from Automated Identification of Gallbladder Inflammation
TM Ward, DA Hashimoto, Y. Ban, G. Rosman, O. Meireles;
Surgical Endoscopy, 2022 (Best paper session in SAGES 2021)
paper | bibtex

Automated Operative Phase Identification in Peroral Endoscopic Myotomy
TM Ward, DA Hashimoto, Y. Ban, DW Rattner, H Inoue, KD Lillemoe, D. Rus, G. Rosman, O. Meireles;
Surgical Endoscopy, 2021
paper | bibtex

Computer Vision in Surgery
T.M. Ward, P. Mascagni, Y. Ban, G. Rosman, N. Padoy, O. Meireles, D.A. Hashimoto;
Elsevier Surgery, 2021.
paper | bibtex

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers
Y. Ban, X. Alameda-Pineda, L. Girin, and R. Horaud;
IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE TPAMI (2019)
paper | project | bibtex

Tracking Multiple Audio Sources with the Von Mises Distribution and Variational EM
Y. Ban, X. Alameda-Pineda, C. Evers and R. Horaud;
IEEE Signal Processing Letters, IEEE SPL (May 2019)
paper | project | bibtex

Online Localization and Tracking of Multiple Speakers in Reverberant Environments
X. Li*, Y. Ban*, L. Girin, X. Alameda-Pineda, and R. Horaud;
IEEE Journal on Selected Topics in Signal Processing, IEEE JSTSP, 2019
(* indicates equal contribution)
paper | project | Kinovis Multiple-Speaker Tracking Dataset | bibtex

Conferences and Workshops

A Deep Concept Graph Network for Interaction-Aware Trajectory Prediction
Y. Ban*, X. Li*, G. Rosman, I. Gilitschenski, O. Meireles, S. Karaman, D. Rus;
IEEE International Conference on Robotics and Automation (ICRA), 2022
(* indicates equal contribution)
paper | bibtex

Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows
Y. Ban, G. Rosman, T. Ward, D. Hashimoto, T. Kondo, H. Iwaki, O. Meireles, D. Rus;
IEEE International Conference on Robotics and Automation (ICRA), 2021
paper | bibtex

How To Train Your Deep Multi-Object Tracker
Y. Xu, A. Osep, Y. Ban, R. Horaud, L. Leal-Taixe, X. Alameda-Pineda;
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
paper | project | bibtex

Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking
Y. Ban, X. Li, X. Alameda-Pineda, L. Girin, R. Horaud;
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018, Calgary, Canada
paper | project | bibtex

Tracking a Varying Number of People with a Visually-Controlled Robotic Head
Y. Ban, X. Alameda-Pineda, F. Badeig, S. Ba, R. Horaud;
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, Vancouver, Canada
IEEE/RSJ IROS'17 Novel Technology Paper Award Finalist
paper | project | award | bibtex

Exploiting the Complementarity of Audio and Visual Data in Multi-Speaker Tracking
Y. Ban, L. Girin, X. Alameda-Pineda, R. Horaud;
IEEE ICCV Workshops 2017, Venice, Italy
paper | project | bibtex

Tracking Multiple Persons Based on a Variational Bayesian Model
Y. Ban, S. Ba, X. Alameda-Pineda, R. Horaud;
IEEE ECCV Workshops 2016, Amsterdam, Netherlands
paper | project | bibtex


Audio-Visual Multiple-Speaker Tracking for Robot Perception
Suivi Multi-Locuteurs avec des Informations Audio-Visuelles pour la Perception des Robots
pdf | bibtex