We are a research group in the School of Computer and Communication Sciences (IC) at the Swiss Federal Institute of Technology (EPFL). Our research interests lie broadly in AI and machine learning, and specifically in computer vision, multimodal learning, and embodied/active vision.
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks R. Ramachandran, A. Garjani, R. Bachmann, A. Atanov*, O.F. Kar*, A. Zamir* ICLR, 2026.
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length R. Bachmann*, J. Allardice*, D. Mizrahi*, E. Fini, O.F. Kar, E. Amirloo, A. El-Nouby, A. Zamir, A. Dehghan ICML, 2025.
Controlled Training Data Generation with Diffusion Models T. Yeo*, A. Atanov*, H. Benoit†, A. Alekseev†, R. Ray, P. E. Akhoondi, A. Zamir TMLR, 2025.
Large (Vision) Language Models are Unsupervised In-Context Learners A. Gadetsky*, A. Atanov*, Y. Jiang*, Z. Gao, G. H. Mighan, E. Amirloo, A. Zamir, M. Brbic ICLR, 2025.
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities R. Bachmann*, O.F. Kar*, D. Mizrahi*, A. Garjani, M. Gao, D. Griffiths, J. Hu, A. Dehghan, A. Zamir NeurIPS, 2024.
Solving Vision Tasks using Simple Photoreceptors instead of Cameras A. Atanov*, J. Fu*, R. Singh*, I. Yu, A. Spielberg, A. Zamir ECCV, 2024.
ViPer: Visual Personalization of Generative Models via Individual Preference Learning S. Salehi, M. Shafiei, T. Yeo, R. Bachmann, A. Zamir ECCV, 2024.
BRAVE: Broadening the visual encoding of vision-language models O.F. Kar, A. Tonioni, P. Poklukar, A. Kulshrestha, A. Zamir, F. Tombari ECCV, 2024. [ Oral ]
Unraveling the Key Components of OOD Generalization via Diversification H. Benoit*, L. Jiang*, A. Atanov*, O.F. Kar, M. Rigotti, A. Zamir ICLR, 2024.
4M: Massively Multimodal Masked Modeling D. Mizrahi*, R. Bachmann*, O.F. Kar, T. Yeo, M. Gao, A. Dehghan, A. Zamir NeurIPS, 2023. [ Spotlight ]
Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback T. Yeo, O.F. Kar, Z. Sodagar, A. Zamir ICCV, 2023.
Modality-invariant Visual Odometry for Embodied Navigation M. Memmel, R. Bachmann, A. Zamir CVPR, 2023.
Task Discovery: Finding the Tasks that Neural Networks Generalize on A. Atanov, A. Filatov, T. Yeo, A. Sohmshetty, A. Zamir NeurIPS, 2022.
PALMER: Perception-Action Loop with Memory Reorganization for Planning O. Beker, M. Mohammadi, A. Zamir NeurIPS, 2022.
MultiMAE: Multi-modal Multi-task Masked Autoencoders R. Bachmann*, D. Mizrahi*, A. Atanov, A. Zamir ECCV, 2022.
3D Common Corruptions and Data Augmentation O.F. Kar, T. Yeo, A. Atanov, A. Zamir CVPR, 2022. [ Oral ]
CLIPasso: Semantically-Aware Object Sketching Y. Vinker, E. Pajouheshgar, J. Y. Bo, R. Bachmann, A. H. Bermano, D. Cohen-Or, A. Zamir, A. Shamir Transactions on Graphics (Proceedings of SIGGRAPH), 2022. [ Best Paper Award ]
Robustness via Cross-Domain Ensembles T. Yeo*, O.F. Kar*, A. Zamir ICCV, 2021. [ Oral ]
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans A. Eftekhar*, A. Sax*, R. Bachmann, J. Malik, A. Zamir ICCV, 2021.
Robust Learning Through Cross-Task Consistency A. Zamir*, A. Sax*, T. Yeo, O. Kar, N. Cheerla, R. Suri, J. Cao, J. Malik, L. Guibas CVPR, 2020. [ Best Paper Award Nominee ]
Which Tasks Should Be Learned Together in Multi-task Learning? T. Standley, A. Zamir, D. Chen, L. Guibas, J. Malik, S. Savarese ICML, 2020.
Side-tuning: Network Adaptation via Additive Side Networks J. Zhang, A. Sax, A. Zamir, L. Guibas, J. Malik ECCV, 2020. [ Spotlight ]
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation B. Chen, A. Sax, L. Pinto, F. Lewis, I. Armeni, S. Savarese, A. Zamir, J. Malik CoRL, 2020.