Jehanzeb Mirza

MIT, USA.

CV | LinkedIn | Google Scholar | GitHub | Email

Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at MIT CSAIL, in the Spoken Language Systems Group led by Dr. James Glass. I received my Ph.D. in Computer Science (Computer Vision) from TU Graz, Austria, where I was advised by Professor Horst Bischof; Professor Serge Belongie served as the external referee.

I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models, with a focus on improving fine-grained understanding.

I am actively looking for student collaborators in the area of multi-modal learning. Please do not hesitate to send me an email, even if you just want an opinion on your work! :)

Contact

  • jmirza [at] mit.edu
  • Office: 32-G442.
  • MIT, Cambridge, USA.

Education

  • Ph.D. in Computer Vision (2021 - 2024)
    TU Graz, Austria.
  • MS in ETIT (2017 - 2020)
    KIT, Germany.
  • BS in EE (2013 - 2017)
    NUST, Pakistan.

Recent News

01/25: 3 papers accepted at ICLR 2025.
12/24: Our workshop "What's Next in Multi-Modal Foundation Models" was accepted at CVPR 2025.
11/24: I joined MIT CSAIL as a Postdoctoral Researcher.
11/24: 1 paper accepted at 3DV 2025.
09/24: 1 paper accepted at NeurIPS 2024.
07/24: 1 paper accepted at BMVC 2024.
07/24: 2 papers accepted at ECCV 2024.
04/24: I successfully defended my Ph.D. thesis.

Selected Publications

Are Vision Language Models Texture or Shape Biased and Can We Steer Them?
ICLR 2025
[Paper]
Mining your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
ICLR 2025
[Paper]
Teaching VLMs to Localize Specific Objects from In-context Examples
arXiv 2025
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
arXiv 2025
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
NeurIPS 2024
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
ECCV 2024
Towards Multimodal In-Context Learning for Vision & Language Models
ECCVW 2024
[Paper]
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
NeurIPS 2023
MATE: Masked Autoencoders are Online 3D Test-Time Learners
*M. Jehanzeb Mirza, *Inkyu Shin, *Wei Lin, Andreas Schriebl, Kunyang Sun, Jaesung Choe, Mateusz Kozinski, Horst Possegger, In So Kweon, Kun-Jin Yoon, Horst Bischof (*Equal Contribution)
ICCV 2023
ActMAD: Activation Matching to Align Distributions for Test-Time-Training
CVPR 2023
Video Test-Time Adaptation for Action Recognition
*Wei Lin, *M. Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof (*Equal Contribution)
CVPR 2023
[Paper | Code]
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
CVPR 2022
[Paper | Code]