Bolei Zhou

Assistant Professor at CUHK
Office: Room 717, Ho Sin-Hang Engineering Building, CUHK
Email:

CV • Google Scholar • GitHub • LinkedIn • Zhihu

About Me

I am an Assistant Professor in the Information Engineering Department of The Chinese University of Hong Kong. My new homepage is at http://bzhou.ie.cuhk.edu.hk/.
My research is on computer vision and machine learning, particularly visual scene understanding and interpretable AI systems.
My representative work includes the large-scale scene benchmarks Places Database and Places-CNN, the ADE20K Dataset, and the neural network interpretation methods Class Activation Mapping (CAM) and Network Dissection. Recently I have been investigating video scene understanding, with work including Temporal Relational Reasoning and Moments in Time.

Updates

Please visit my new webpage at http://bzhou.ie.cuhk.edu.hk/.
[2018/09/14] Temporal Relation Network is covered by MIT News as Today's Spotlight.
[2018/07/03] The videos for the CVPR'18 Tutorial on Interpretable Machine Learning are available.
[2018/05/04] I defended my Ph.D. thesis. The defense talk, titled Interpretable Representation Learning for Visual Intelligence, is available on YouTube or for download.
[2018/04/09] PyTorch implementation of scene parsing networks trained on ADE20K is released.
[2017/12/09] I will organize the Tutorial on Interpretable Machine Learning at CVPR'18.
[2017/12/03] Moments in Time Dataset with 1 million videos from 339 actions is online!
[2017/12/03] Latest work on temporal reasoning in videos. Relation is all you need.
[2017/12/02] I am invited as a panelist for the NIPS'17 Interpretable Machine Learning Symposium.

Selected Projects and Publications

	Bolei Zhou, Yiyou Sun, David Bau, and Antonio Torralba. Revisiting the Importance of Individual Units in CNNs via Ablation. arXiv:1806.02891, 2018. [arXiv]
	Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba. Temporal Relational Reasoning in Videos. European Conference on Computer Vision (ECCV), 2018 (arXiv:1711.08496). [PDF][arXiv][Webpage][Demo Video][Code][MIT News]
	Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba. Interpretable Basis Decomposition for Visual Explanation. European Conference on Computer Vision (ECCV), 2018. [PDF][Code(soon)]
	Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. Unified Perceptual Parsing for Scene Understanding. European Conference on Computer Vision (ECCV), 2018. [PDF][Code & Data]
	Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba. Single Image Intrinsic Decomposition without a Single Intrinsic Image. European Conference on Computer Vision (ECCV), 2018. [PDF(soon)]
	Yikang Li, Wanli Ouyang, Bolei Zhou, Yawen Cui, Jianping Shi, Xiaogang Wang. Factorizable Net: An Efficient Subgraph based Framework for Scene Graph Generation. European Conference on Computer Vision (ECCV), 2018. [PDF]
	Jimmy Wu, Bolei Zhou, Rebecca Russell, Vincent Kee, Syler Wagner, Mitchell Hebert, Antonio Torralba, and David M.S. Johnson. Real-Time Object Pose Estimation with Pose Interpreter Networks. IEEE International Conference on Intelligent Robots and Systems (IROS), 2018. [PDF][Code][Video]
	Bolei Zhou. Interpretable Representation Learning for Visual Intelligence. PhD thesis submitted to MIT EECS, May 17, 2018. Committee: Antonio Torralba, Aude Oliva, Bill Freeman. [PDF][Defense Talk]
	Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Vasilis Syrgkanis, Lester Mackey, and Genevieve Patterson. DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation. arXiv:1805.12323, 2018. [arXiv]
	Jimmy Wu, Diondra Peck, Scott Hsieh, Vandana Dialani, Constance D. Lehman, Bolei Zhou, Vasilis Syrgkanis, Lester Mackey, and Genevieve Patterson. Expert identification of visual primitives used by CNNs during mammogram classification. SPIE Medical Imaging, 2018. [PDF]
	Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, and Xiaogang Wang. Visual Question Generation as Dual Task of Visual Question Answering. Computer Vision and Pattern Recognition (CVPR), 2018, spotlight (arXiv:1709.07192). [arXiv][Webpage][Code]
	Bowen Pan, Wuwei Lin, Xiaolin Fang, Chaoqin Huang, Bolei Zhou, Cewu Lu. Recurrent Residual Module for Fast Inference in Videos. Computer Vision and Pattern Recognition (CVPR), 2018. [arXiv]
	Mathew Monfort, Bolei Zhou, Sarah Adel Bargal, Alex Andonian, Tom Yan, Kandan Ramakrishnan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva. Moments in Time Dataset: one million videos for event understanding. Under revision at TPAMI, arXiv:1801.03150, 2018. [Tech Report][Website][Code+Model]
	Bolei Zhou, David Bau, Aude Oliva, and Antonio Torralba. Interpreting Deep Visual Representations via Network Dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence, June 2018 (arXiv:1711.05611, 2017). (* indicates equal contribution.) [arXiv][Webpage][Code]
	Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 Million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2017. [PDF][Places2 Dataset][Challenge Page][Places365 CNN models][Demo]
	Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, and Xiaogang Wang. Scene Graph Generation from Objects, Phrases and Region Captions. International Conference on Computer Vision (ICCV), 2017. [PDF][Code]
	Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, and Antonio Torralba. Open Vocabulary Scene Parsing. International Conference on Computer Vision (ICCV), 2017 (arXiv:1703.08769). [PDF][arXiv][Webpage]
	Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso and Antonio Torralba. Scene Parsing through ADE20K Dataset. Computer Vision and Pattern Recognition (CVPR), 2017. [PDF][Dataset][Benchmark Page][Challenge Page][Toolkit&Code][Demo]
	David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network Dissection: Quantifying Interpretability of Deep Visual Representations. Computer Vision and Pattern Recognition (CVPR), 2017, oral. (* indicates equal contribution.) [PDF][arXiv][webpage][code][Talk Video]
	Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, and Xiaogang Wang. Person Search with Natural Language Description. Computer Vision and Pattern Recognition (CVPR), 2017. [PDF][Dataset]
	J. Wong, V. Kee, T. Le, S. Wagner, G. Mariottini, A. Schneider, L. Hamilton, R. Chipalkatty, M. Hebert, D. Johnson, J. Wu, B. Zhou, and A. Torralba. SegICP: Integrated Deep Semantic Segmentation and Pose Estimation. IEEE International Conference on Intelligent Robots and Systems (IROS'17), oral (arXiv:1703.01661). [PDF]
	Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso and Antonio Torralba. Semantic Understanding of Scenes through ADE20K Dataset. arXiv:1608.05442, 2016. [PDF][Dataset][Benchmark Page][Challenge Page][Toolkit&Code][Demo]
	Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning Deep Features for Discriminative Localization. Computer Vision and Pattern Recognition (CVPR), 2016 (arXiv:1512.04150). [PDF][arXiv][Project Page][Video of CNN shifting its attention]
	Donglai Wei, Bolei Zhou, Antonio Torralba, William Freeman. Understanding Intra-Class Knowledge inside CNN. arXiv:1507.02379, 2015. [PDF][Page][Code]
	Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus. Simple Baseline for Visual Question Answering. arXiv:1512.02167, 2015. [PDF][Demo][Code]
	Zi Wang, Bolei Zhou, Stefanie Jegelka. Optimization as Estimation with Gaussian Processes in Bandit Settings. Artificial Intelligence and Statistics (AISTATS'16), oral, 2016 (arXiv:1510.06423). [PDF][Project][Code]
	Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object Detectors Emerge in Deep Scene CNNs. International Conference on Learning Representations (ICLR), oral, 2015 (arXiv:1412.6856). [PDF][Project Page][More Visualization][Code]
	Bolei Zhou, Vignesh Jagadeesh, and Robinson Piramuthu. ConceptLearner: Discovering Visual Concepts from Weakly Labeled Image Collections. Computer Vision and Pattern Recognition (CVPR), 2015 (arXiv:1411.5319). [PDF][Project Page & Demo]
	Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning Deep Features for Scene Recognition using Places Database. Advances in Neural Information Processing Systems 27 (NIPS), spotlight, 2014. [PDF][Project Page][Demo]
	Bolei Zhou, Liu Liu, Aude Oliva and Antonio Torralba. Recognizing City Identity via Attribute Analysis of Geo-tagged Images. Proceedings of 13th European Conference on Computer Vision (ECCV), 2014. [PDF][Project Page]
	Liu Liu, Bolei Zhou, Jinhua Zhao, Brent D. Ryan. C-IMAGE: City Cognitive Mapping through Geo-tagged Photos. GeoJournal, Springer, 2016. [PDF]
	Bolei Zhou, Xiaoou Tang, Hepeng Zhang and Xiaogang Wang. Measuring Crowd Collectiveness. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), oral, 2013. [PDF(CVPR)][PDF(TPAMI)][Project Page]
	Bolei Zhou, Xiaoou Tang and Xiaogang Wang. Learning Collective Crowd Behaviors with Dynamic Pedestrian-Agents. International Journal of Computer Vision (IJCV), 2014. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), oral, 2012. [PDF(CVPR)][PDF(IJCV)][Project Page]
	Bolei Zhou, Xiaoou Tang and Xiaogang Wang. Coherent Filtering: Detecting Coherent Motions from Crowd Clutters. In Proceedings of 12th European Conference on Computer Vision (ECCV), 2012. [PDF] [Project Page]
	Bolei Zhou, Xiaogang Wang and Xiaoou Tang. Random Field Topic Model for Semantic Region Analysis in Crowded Scenes from Tracklets. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [PDF][Project Page]
	Go to Google Scholar for the full publication list.

Honors

Facebook Fellowship Award 2016-2018
BRC & LING Fellowship Award 2017
MIT Ho-Ching and Han-Ching Fund Award 2013
MIT Greater China Computer Science Fellowship 2013
CUHK Outstanding Thesis Award 2012
Microsoft Research Asia Fellowship 2011

Media coverage

VentureBeat: MIT CSAIL designs AI that can track objects over time.
MIT News: Helping computers fill in the gaps between video frames.
Quartz: Track AI decisions back to single neurons.
MIT News: Peering into neural networks.
TechCrunch: A fully automated way to peer inside neural networks.
MIT CSAIL News: Scene parsing and scene classification challenges.
TechCrunch and MIT News: Object detectors emerge in CNNs.

Datasets & Benchmarks

Moments in Time: 1-million video dataset for video scene understanding.
Places Challenge 2017: instance segmentation, scene parsing, and semantic boundary detection
Places Database: 10 million image database for scene recognition
Mini-Places: An educational tool for deep learning in computer vision
MIT Scene Parsing Benchmark: full scene semantic segmentation dataset
ADE20K dataset: Pixel-wise annotated dataset for semantic scene understanding

Open-source software

Semantic Segmentation in PyTorch: an efficient implementation of scene parsing networks trained on ADE20K in PyTorch.
Network Dissection: Network visualization and annotation toolkit.
CNN Visualizer: Neuron Visualization and Segmentation toolkit for deep CNNs.
Places365-CNNs: scene recognition networks on Places365 with a Docker container.
iBOWIMG: visual question answering baseline code in Torch.
CAM: algorithm package for generating class-specific saliency maps for CNNs.
GoSpark: implementation of Spark, an in-memory distributed computation framework, in Golang [Report].
gKLT tracker: algorithm package for extracting trajectories from videos with KLT features.
Collectiveness descriptor: a metric for crowd system order and the simulation of Self-Driven Particles.
Coherent filtering: algorithm package for detecting coherent motions in time-series data.
Random field topic model: C++ implementation of MRF on LDA with Gibbs sampling inference.

Professional activities

Organizer of the Tutorial on Interpretable Machine Learning for Computer Vision at CVPR'18.
Panelist for the NIPS'17 Interpretable Machine Learning Symposium.
Co-Organizer of the Joint COCO and Places Recognition Challenge Workshop at ICCV'17.
Organizer of the Places Challenge 2017 at ICCV'17.
Organizer of the Tutorial on Deep Learning for Objects and Scenes at CVPR'17.
Organizer of the 5th Scene Understanding Workshop at CVPR'17.
Organizer of the Places365 Challenge 2016 and Scene Parsing Challenge 2016 at ECCV'16.
Co-organizer of the ILSVRC'16 challenge workshop at ECCV'16.
Organizer of the Places Challenge 2015 at ICCV'15.
Conference reviewer for ICCV'17, BMVC'17, CVPR'17, ACCV'16, ECCV'16, CVPR'16, ICCV'15, CVPR'15, ECCV'14, ACCV'14.
Journal reviewer for TPAMI, IJCV, The Visual Computer, Computer Vision and Image Understanding, IEEE Trans on NNLS, IEEE Trans on IP, IEEE Trans on SMC, IEEE Trans on CSVT, PLOS ONE, Pattern Recognition.
Teaching Assistant for the MIT course Advances in Computer Vision. The course hosts a Mini-Places Scene Classification Challenge for educational purposes.
Chair of the MIT Vision Seminar.
Internships at Facebook AI Research, eBay Research Labs, Microsoft Research Asia, and Barclays Capital.

Talks

Interpreting Deep Visual Representations at Workshop on Visualization for Deep Learning, ICML'17, Sydney.
Network Dissection: Quantifying the Interpretability of Deep Visual Representations, CVPR'17, Hawaii.
Tutorial on Deep Learning for Objects and Scenes, CVPR'17, Hawaii.
Understand and Leverage the Internal Representations of CNNs at Tufts, Cornell Tech, Harvard.
Challenges in Deep Scene Understanding at ECCV'16 ILSVRC and COCO joint workshop, Oct. 2016, Amsterdam.
Object Detectors Emerge in Deep Scene CNNs at ICLR'15, May 2015, San Diego.
Learning Deep Features for Scene Recognition at NIPS'14, Dec. 2014, Montreal.
Measuring Crowd Collectiveness at CVPR'13, June 2013, Portland.
Understanding Crowd Behaviors at CVPR'12, June 2012, Rhode Island.

Personal interests

Blogs: Urban Computation, Crowd Behavior & Psychology
Books, rock climbing (5.11C, V6), juggling (recently), bass playing (former lead guitarist)