Jehanzeb Mirza

MIT, USA.

CV | LinkedIn | Google Scholar | GitHub | Email

Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at MIT CSAIL in the Spoken Language Systems Group, led by Dr. James Glass. I received my Ph.D. in Computer Science (Computer Vision) from TU Graz, Austria, where I was advised by Professor Horst Bischof; Professor Serge Belongie served as the external referee.

I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models, with a focus on improving fine-grained understanding.

I am actively looking for student collaborators in the area of multi-modal learning. Please do not hesitate to email me, even if you just want an opinion on your work! :)

Contact

  • jmirza [at] mit.edu
  • Office: 32-G442.
  • MIT, Cambridge, USA.

Education

  • Ph.D. in Computer Vision (2021 - 2024)
    TU Graz, Austria.
  • MS in Electrical Engineering and Information Technology (ETIT) (2017 - 2020)
    KIT, Germany.
  • BS in Electrical Engineering (2013 - 2017)
    NUST, Pakistan.

Recent News

08/25: 1 paper accepted at TMLR, 2025.
07/25: 1 paper accepted at COLM, 2025.
06/25: 2 papers accepted at ICCV, 2025.
04/25: Our workshops "Long Multi-Scene Video Foundations" and "MMFM" were accepted at ICCV 2025.
03/25: Talk at the EI Seminar, MIT CSAIL.
02/25: 2 papers accepted at CVPR, 2025 (workshops).
01/25: 3 papers accepted at ICLR, 2025.
12/24: Our workshop "What's Next in Multi-Modal Foundation Models" was accepted at CVPR 2025.
11/24: I joined MIT CSAIL as a Postdoctoral Researcher.
11/24: 1 paper accepted at 3DV, 2025.
09/24: 1 paper accepted at NeurIPS, 2024.
07/24: 1 paper accepted at BMVC, 2024.
07/24: 2 papers accepted at ECCV, 2024.
04/24: I successfully defended my Ph.D. thesis.
12/23: Our workshop "What's Next in Multi-Modal Foundation Models" was accepted at CVPR 2024.
10/23: Invited talk at Cohere.
10/23: Invited talk at the VIS Lab, University of Amsterdam.
09/23: 1 paper accepted at NeurIPS, 2023.
09/23: Invited talk at the Center for Robotics, ParisTech.
07/23: 1 paper accepted at ICCV, 2023.
04/23: I will be attending ICVSS 2023.
03/23: 2 papers accepted at CVPR, 2023.
02/23: Reviewing for CVPR, ICCV, and TPAMI.
03/22: 2 papers accepted at CVPR, 2022.

Experience

  • Postdoctoral Researcher - MIT (Cambridge, USA): Multi-modal learning with speech/audio, vision, and language (11.24 - Present).
  • Research Assistant - TU Graz (Graz, Austria): Self-supervised learning and vision-language understanding (01.21 - 10.24).
  • Research Scientist Intern - Sony AI (Tokyo, Japan): Multimodal vision-language understanding (05.24 - 08.24).
  • Intern - Intel (Karlsruhe, Germany): Evaluating the robustness of object detectors in adverse weather (03.19 - 08.20).

Selected Publications

  • Teaching VLMs to Localize Specific Objects from In-context Examples
    ICCV 2025
  • GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
    TMLR 2025
  • Are Vision Language Models Texture or Shape Biased and Can We Steer Them?
    ICLR 2025
    [Paper]
  • Mining your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
    ICLR 2025
    [Paper]
  • ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
    NeurIPS 2024
  • Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
    ECCV 2024
  • Towards Multimodal In-Context Learning for Vision & Language Models
    ECCVW 2024
    [Paper]
  • LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
    NeurIPS 2023
  • MATE: Masked Autoencoders are Online 3D Test-Time Learners
    *M. Jehanzeb Mirza, *Inkyu Shin, *Wei Lin, Andreas Schriebl, Kunyang Sun, Jaesung Choe, Mateusz Kozinski, Horst Possegger, In So Kweon, Kuk-Jin Yoon, Horst Bischof (*Equal Contribution)
    ICCV 2023
  • ActMAD: Activation Matching to Align Distributions for Test-Time-Training
    CVPR 2023
  • Video Test-Time Adaptation for Action Recognition
    *Wei Lin, *M. Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof (*Equal Contribution)
    CVPR 2023
    [Paper | Code]
  • The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
    CVPR 2022
    [Paper | Code]