Amir Zamir

About Me:

I am a tenure-track Assistant Professor of Computer Science at the Swiss Federal Institute of Technology Lausanne (EPFL). Before EPFL, I was at UC Berkeley, Stanford, and UCF, where I had the opportunity to work with Silvio Savarese, Jitendra Malik, Mubarak Shah, Rahul Sukthankar, and Leonidas Guibas.

My interests are broadly in computer vision, multi-modal learning, and machine learning. My lab's research aims to understand and develop general multi-modal and multi-task perception and models that can operate as part of an active agent in the real world. Examples of my work on this topic: 4M, FlexTok, Taskonomy, Gibson, Visual Morphology, and MultiMAE. I support Slow Science.

I was the AI/ML chief scientist of Aurora Solar (a Forbes AI 50 company, valued at $4B in 2022) from 2015 to 2022. I currently serve as a scientific advisor to Metamorphic Labs and Duranta.

Prospective students: You do not need to contact me. Please see here first.

PhD Students:
• Roman Bachmann
• Andrei Atanov
• Rishubh Singh
• Jason Toskov
• Kunal Pratap Singh
• Zhitong Gao
• Mingqiao Ye
• Muhammad Uzair Khattak
Former PhD Students:
• Oguzhan Kar (Now at Apple)
• Alexander Sasha Sax (co-advised with Jitendra Malik) (Now at Anthropic)
• Teresa Yeo (Now at Google DeepMind)
Interns/Undergraduate/Master's Students: full list here.

Current Teaching:
• CS-503: Visual Intelligence, Fall 2021, Spring 2023, 2024, 2025.
• CS-500: AI Product Management, Fall 2023, 2024, 2025.
• COM-304: Intelligent Systems, Spring 2024, 2025.
• ENG-615: Topics in Autonomous Robotics, Spring 2021, 2023, 2025.

Honors:

• Young Researcher Award 2022, ECCV/European Computer Vision Association. [ref]
• PAMI Mark Everingham Prize 2022. [ref]
• SIGGRAPH 2022 Best Paper Award, for CLIPasso. [ref]
• CVPR 2020 Best Paper Award Nomination, for X-Task Consistency. [ref]
• CVPR 2018 Best Paper Award, for Taskonomy. [ref]
• CVPR 2016 Best Student Paper Award, for structural-RNN. [ref]
• ELLIS Faculty Scholar. [ref]
• NVIDIA Pioneering Research Award 2018, for Gibson Environment. [ref]
• Stanford Inst. for Computational and Mathematical Engineering Seed Award. [ref]
• Winner of CVPR19 Habitat Embodied Agents Challenge. [ref]

Selected Projects:

Variable-Length Video Tokenization

4M: Multimodal Foundation Models

Variable-Length Image Tokenization

How far can a 1-pixel camera go?

Omnidata: Steerable Vision Datasets from 3D Scans

Robust Learning with Cross-Task Consistency

Taskonomy: Transfer Learning Mid-Level Vision for Robotics

Gibson Environment Visual Sim-to-Real Feedback Networks

Workshop/Conference Organization:

• Area Chair of NeurIPS 2026, ICML 2026, CVPR 2026, NeurIPS 2025, CVPR 2025, NeurIPS 2024, CVPR 2024, NeurIPS 2023, CVPR 2023, ICCV 2023, ECCV 2022, CVPR 2021, ICCV 2021, CVPR 2020.
• Computational Design of Diverse Morphologies and Sensors for Vision and Robotics tutorial, in CVPR 2024, co-instructor.
• Capturing, Interpreting & Visualizing Indoor Spaces (CIVILS) workshop, in CVPR 2023, co-organizer.
• Embodied AI workshop, in CVPR 2020, co-organizer.
• Computer Vision for Global Challenges workshop, in CVPR 2019, co-organizer.
• Self-Supervised Learning workshop, in ICML 2019, co-organizer.
• Visual Learning and Embodied Agents in Simulation Environments workshop, in ECCV 2018, co-organizer.
• Beyond Supervised Learning workshop, in CVPR 2018, ICCV 2017, co-organizer.
• Negative Results in Computer Vision workshop, in CVPR 2017, co-organizer.
• Geo-Spatial Computer Vision workshop, in CVPR 2016, co-organizer.
• THUMOS Action Recognition Challenge workshop, in ICCV 2013, ECCV 2014, CVPR 2015, co-organizer.
• 3DV 2016, Workshops and Tutorials Chair.
• Pre-2020 Teaching: Stanford CS331B: Representation Learning in Computer Vision Autumn 2016, Autumn 2017 (co-instructed with Silvio Savarese).

Selected Publications:

(Up-to-date list at Google Scholar)

• VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization,
Andrei Atanov*, Jesse Allardice*, Roman Bachmann, Oğuzhan Fatih Kar, R Devon Hjelm, David Griffiths, Peter Fu, Afshin Dehghan, Amir Zamir
In ICML, 2026 - [Spotlight]
[Website | Demo | Code]

• MODUS: Decoder-only Any-to-Any Modeling of Diverse Modalities,
Mingqiao Ye, Zhaochong An, Zhitong Gao, Xian Liu, Oğuzhan Fatih Kar, Jesse Allardice, Roman Bachmann, David Mizrahi, François Fleuret, Chuan Li, Amir Zadeh, Serge Belongie, Afshin Dehghan, Amir Zamir
In ICML, 2026
[Website | Demo | Code]

• (1D) Ordered Tokens Enable Efficient Test-Time Search,
Zhitong Gao, Parham Rezaei, Ali Cy, Mingqiao Ye, Nataša Jovanović, Jesse Allardice, Afshin Dehghan, Amir Zamir, Roman Bachmann, Oğuzhan Fatih Kar
In ICML, 2026
[Website | Demo | Code]

• Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality,
Kunal Pratap Singh*, Ali Garjani*, Rishubh Singh, Muhammad Uzair Khattak, Efe Tarhan, Jason Toskov, Andrei Atanov, Oğuzhan Fatih Kar, Amir Zamir
In ICLR, 2026
[Website | Code]

• How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks,
Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov*, Oğuzhan Fatih Kar*, Amir Zamir*
In ICLR, 2026
[Website | Code]

• FlexTok: Resampling Images into 1D Token Sequences of Flexible Length,
Roman Bachmann*, Jesse Allardice*, David Mizrahi*, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan
In ICML, 2025
[Website | Demo | Code]

• Large (Vision) Language Models are Unsupervised In-Context Learners,
Artyom Gadetsky*, Andrei Atanov*, Yulun Jiang*, Zhitong Gao, Ghazal Hosseini Mighan, Amir Zamir, Maria Brbić
In ICLR, 2025
[Website | Code]

• 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities,
Roman Bachmann*, Oğuzhan Fatih Kar*, David Mizrahi*, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir
In NeurIPS, 2024
[Website | Demo | Code]

• How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Morphology,
Andrei Atanov*, Jiawei Fu*, Rishubh Singh*, Isabella Yu, Andrew Spielberg, Amir Zamir,
In ECCV, 2024
[Website]

• ViPer: Visual Personalization of Generative Models via Individual Preference Learning,
Sogand Salehi, Mahdi Shafiei, Roman Bachmann, Teresa Yeo, Amir Zamir,
In ECCV, 2024
[Website | Demo]

• BRAVE: Broadening the visual encoding of vision-language models,
Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari,
In ECCV, 2024
[Website]

• Unraveling the Key Components of OOD Generalization via Diversification,
Harold Benoit*, Liangze Jiang*, Andrei Atanov*, Oğuzhan Fatih Kar, Mattia Rigotti, Amir Zamir
In ICLR, 2024
[Paper]

• 4M: Massively Multimodal Masked Modeling,
David Mizrahi*, Roman Bachmann*, Oguzhan Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir
In NeurIPS, 2023 - [Spotlight]
[Website | Demo | Code]

• Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback,
Teresa Yeo, Oğuzhan Fatih Kar, Zahra Sodagar, Amir Zamir
In ICCV, 2023
[Paper | Website]

• Modality-invariant Visual Odometry for Embodied Navigation,
Marius Memmel, Roman Bachmann, Amir Zamir
In CVPR, 2023
[Paper | Code | Website]

• MultiMAE: Multi-modal Multi-task Masked Autoencoders,
Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir
In ECCV, 2022
[Interactive Visualizations | Live Demo | Paper | Code | Website]

• Task Discovery: Finding the Tasks that Neural Networks Generalize on,
Andrei Atanov, Andrey Filatov, Teresa Yeo, Ajay Sohmshetty, Amir Zamir
In NeurIPS, 2022
[Interactive Visualizations | Paper | Code | Website]

• PALMER: Perception-Action Loop with Memory Reorganization for Planning,
Onur Beker, Mohammad Mohammadi, Amir Zamir
In NeurIPS, 2022
[Website | Paper]

• CLIPasso: Semantically-Aware Object Sketching,
Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir
In Transactions on Graphics (Proceedings of SIGGRAPH), 2022
[Best Paper Award]
[Website | Colab | Code]

• 3D Common Corruptions and Data Augmentation,
Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, Amir Zamir
In CVPR, 2022 - [Oral]
[Website | Live Demo | Paper | Code]

• Robustness via Cross-Domain Ensembles,
Teresa Yeo*, Oğuzhan Fatih Kar*, Alexander Sax, Amir Zamir
In ICCV, 2021 - [Oral]
In arXiv, 2021
[Website | Paper | Code]

• Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans,
Ainaz Eftekhar*, Alexander Sax*, Roman Bachmann, Jitendra Malik, Amir Zamir
In arXiv 2021, ICCV 2021
[Live Demo | Dataset | Code | Website | Paper]

• Robust Learning Through Cross-Task Consistency,
Amir Zamir*, Alexander Sax*, Teresa Yeo, Oğuzhan Kar, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas Guibas
In CVPR, 2020 - [Best Paper Award Nominee],[Oral]
In arXiv, 2020
[Live Demo | Visualizations | Website | Paper]

• Which Tasks Should Be Learned Together in Multi-task Learning?,
Trevor Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese
In ICML, 2020
[Website | Paper]

• Side-Tuning: Network Adaptation via Additive Side Networks,
Jeffrey Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik
In ECCV, 2020 - [Spotlight]
[Website | Paper]

• Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation,
Bryan Chen, Sasha Sax, Lerrel Pinto, Francis Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik
In Conference on Robot Learning (CoRL), 2020
[Website | Paper]

• Learning to Navigate Using Mid-level Visual Priors,
Alexander Sax, Jeffrey Zhang, Bradley Emi, Amir Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik
In Conference on Robot Learning (CoRL), 2019
[Policy Visualizations | Website | Paper]

• 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera,
Iro Armeni, Zhiyang He, JunYoung Gwak, Amir Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese,
In ICCV, 2019
[Interactive Database Visualization | Website | Paper]

• Taskonomy: Disentangling Task Transfer Learning,
Amir Zamir, Alexander Sax*, William Shen*, Leonidas Guibas, Jitendra Malik, Silvio Savarese,
In CVPR, 2018 [Best Paper Award]
In IJCAI, 2019 [Invited Paper, Sister Conference Best Papers Track]
[Transfer Learning API | Live Demo | Website | Paper]

• Gibson Env: Real-World Perception for Embodied Agents,
Amir Zamir*, Fei Xia*, Jerry He*, Alexander Sax, Jitendra Malik, Silvio Savarese,
In CVPR, 2018 - [Spotlight Oral],[NVIDIA Pioneering Research Award]
[Gibson Environments | Github | Website | Paper]

• Patent: Systems and Methods for Performing Three-Dimensional Semantic Parsing of Indoor Spaces,
Iro Armeni, Ozan Sener, Amir Zamir, Martin Fischer, Silvio Savarese,
US Patent App. 5/619,422, 2017.
[Link]

• Feedback Networks,
Amir Zamir*, Te-Lin Wu*, Lin Sun, William B. Shen, Bertram Shi, Jitendra Malik, Silvio Savarese,
In CVPR, 2017.
[PDF | Project Page]

• Generic 3D Representation via Pose Estimation and Matching,
Amir Zamir, Pulkit Agrawal, Tilman Wekel, Jitendra Malik, Silvio Savarese,
In ECCV, 2016.
[PDF | 3DRepresentation website | Dataset]

• Structural-RNN: Deep Learning on Spatio-Temporal Graphs, Ashesh Jain, Amir Zamir, Silvio Savarese, Ashutosh Saxena,
In CVPR, 2016 [Best Student Paper Award]
[PDF | Project Page]

• 3D Semantic Parsing of Large-Scale Indoor Spaces, Iro Armeni, Ozan Sener, Amir Zamir, Martin Fischer, Silvio Savarese,
In CVPR, 2016 - [Oral] (acceptance rate ~3%)
[PDF | 3D PC Parser website (Demo, Code, Data)]

• Book: Large-Scale Visual Geo-Localization,
Amir Zamir, Asaad Hakeem, Luc Van Gool, Mubarak Shah, Richard Szeliski,
Springer, 2016 [Front Matter | Cover | Springer Page]

• The THUMOS Challenge on Action Recognition for Videos "in the Wild", Haroon Idrees, Amir Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah,
In Computer Vision and Image Understanding (CVIU), 2016 [PDF | Project Page]

• Unsupervised Semantic Parsing of Video Collections, Ozan Sener, Amir Zamir, Silvio Savarese, Ashutosh Saxena,
In Proceedings of International Conference on Computer Vision (ICCV), 2015 [PDF | Project Page]

• Action Recognition by Hierarchical Mid-level Action Elements, Tian Lan, Yuke Zhu, Amir Zamir, Silvio Savarese,
In Proceedings of International Conference on Computer Vision (ICCV), 2015 [PDF | Project Page | 1 min Summary]

• DaMN - Discriminative and Mutually Nearest: Exploiting Pairwise Category Proximity for Video Action Recognition, Rui Hou, Amir Zamir, Rahul Sukthankar, Mubarak Shah,
In Proceedings of European Conference on Computer Vision (ECCV), 2014 [PDF | BibTeX | Project Page | 1 min Summary]

@inproceedings{DaMN_2014,
   Author = {Hou, R. and Roshan Zamir, A. and Sukthankar, R. and Shah, M.},
   Booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
   Title = {{DaMN -- Discriminative and Mutually Nearest}: Exploiting Pairwise Category Proximity for Video Action Recognition},
   Year = {2014}}

• GIS-Assisted Object Detection and Geospatial Localization, Shervin Ardeshir, Amir Zamir, Mubarak Shah,
In Proceedings of European Conference on Computer Vision (ECCV), 2014 [PDF | BibTeX | Project Page | 1 min Summary]

@inproceedings{GIS_Assisted_ECCV14,
   Author = {Ardeshir, S. and Roshan Zamir, A. and Shah, M.},
   Booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
   Title = {{GIS}-Assisted Object Detection and Geospatial Localization},
   Year = {2014}}

• GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor, Amir Zamir, Shervin Ardeshir, Mubarak Shah,
in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2014. [PDF | 1 min Summary | 20 min Presentation | BibTeX | Project Page]

@inproceedings{ZamirCVPR14,
   Author = {Roshan Zamir, A. and Ardeshir, S. and Shah, M.},
   Booktitle = {27th IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)},
   Title = {GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor},
   Year = {2014}}

• Video Classification using Semantic Concept Co-occurrences, Shayan Modiri, Amir Zamir, Mubarak Shah,
in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2014. [PDF | 1 min Summary | BibTeX | Project Page]

@inproceedings{GMCP_Classification,
   Author = {Modiri, S. and Roshan Zamir, A. and Shah, M.},
   Booktitle = {27th IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)},
   Title = {Video Classification using Semantic Concept Co-occurrences},
   Year = {2014}}

• Invited Book Chapter: "Action Recognition in Realistic Sports Videos", Khurram Soomro, Amir Zamir,
in Computer Vision in Sports, Springer, 2014. [PDF | BibTeX]

@incollection{ActionRecognitionSports_2014Springer,
   title={Action Recognition in Realistic Sports Videos},
   author={Soomro, Khurram and Zamir, Amir},
   booktitle={Computer Vision in Sports},
   year={2014},
   publisher={Springer}}

• Image Geo-localization Based on Multiple Nearest Neighbor Feature Matching using Generalized Graphs, Amir Zamir, Mubarak Shah,
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014 [Preprint PDF | BibTeX | Web Page]

@article{6710175,
author={Zamir, A.R. and Shah, M.},
journal={Pattern Analysis and Machine Intelligence, IEEE Transactions on},
title={Image Geo-localization Based on Multiple Nearest Neighbor Feature Matching using Generalized Graphs},
year={2014},
volume={PP},
number={99},
pages={1-1},
keywords={Generalized Minimum Clique Problem (GMCP);Generalized Minimum Spanning Tree (GMST);Geo-location;feature correspondence;feature matching;generalized graphs;image localization;multiple nearest neighbor feature matching},
doi={10.1109/TPAMI.2014.2299799},
ISSN={0162-8828},}

• Visual Business Recognition - A Multimodal Approach, Amir Zamir, Afshin Dehghan, Mubarak Shah,
In Proceedings of ACM International Conference on Multimedia (ACM MM), 2013 [PDF | Video | BibTeX | Project Page]

@inproceedings{ZamirACMMM13,
   Author = {Roshan Zamir, A. and Dehghan, A. and Shah, M.},
   Booktitle = {Proceedings of ACM International Conference on Multimedia ({ACM MM})},
   Title = {{Visual Business Recognition} - A Multimodal Approach},
   Year = {2013}}

• GMCP-Tracker: Global Multi-object Tracking using Generalized Minimum Clique Graphs, Amir Zamir, Afshin Dehghan, Mubarak Shah,
In Proceedings of European Conference on Computer Vision (ECCV), 2012 [PDF | Project Page | 20 min Presentation | BibTeX]

@inproceedings{ZamirECCV12,
   Author = {Roshan Zamir, A. and Dehghan, A. and Shah, M.},
   Booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
   Title = {{GMCP-Tracker}: Global Multi-object Tracking using Generalized Minimum Clique Graphs},
   Year = {2012}}

• City Scale Geo-spatial Trajectory Estimation of a Moving Camera, Gonzalo Vaca, Amir Zamir, Mubarak Shah,
in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2012 [PDF | BibTeX | Project Page]

@inproceedings{VacaZamir12,
   Author = {Vaca, G. and Roshan Zamir, A. and Shah, M.},
   Booktitle = {25th IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)},
   Title = {City Scale Geo-spatial Trajectory Estimation of a Moving Camera},
   Year = {2012}}

• Accurate Image Localization Based on Google Maps Street View, Amir Zamir, Mubarak Shah,
In Proceedings of European Conference on Computer Vision (ECCV), 2010 [PDF | BibTeX | Project Page]

@inproceedings{Zamir10,
   Author = {Roshan Zamir, A. and Shah, M.},
   Booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
   Title = {Accurate Image Localization Based on Google Maps Street View},
   Year = {2010}}

• Recognition of 101 human actions from videos in the wild, Khurram Soomro, Amir Zamir, Mubarak Shah,
In arXiv preprint arXiv:1212.0402, November, 2012. [PDF | BibTeX | Project Page | PDF2]

@inproceedings{UCF101,
   Author = {Soomro, K. and Roshan Zamir, A. and Shah, M.},
   Booktitle = {arXiv preprint arXiv:1212.0402},
   Title = {{UCF101}: A Dataset of 101 Human Actions Classes From Videos in The Wild},
   Year = {2012}}

• Automatic Detection and Tracking of Pedestrians in Videos with Various Crowd Densities, Afshin Dehghan, Haroon Idrees, Amir Zamir, Mubarak Shah,
In Proceedings of PED, June 2012 [PDF | BibTeX | Project Page]

@incollection{
year={2014},
isbn={978-3-319-02446-2},
booktitle={Pedestrian and Evacuation Dynamics 2012},
editor={Weidmann, Ulrich and Kirsch, Uwe and Schreckenberg, Michael},
doi={10.1007/978-3-319-02447-9_1},
title={Automatic Detection and Tracking of Pedestrians in Videos with Various Crowd Densities},
url={http://dx.doi.org/10.1007/978-3-319-02447-9_1},
publisher={Springer International Publishing},
keywords={Human detection; Tracking; Data association; Crowd density; Crowd analysis; Automatic surveillance},
author={Dehghan, Afshin and Idrees, Haroon and Zamir, Amir Roshan and Shah, Mubarak},
pages={3-19},
language={English}}

• Street View Challenge: Identification of Commercial Entities in Street View Imagery, Amir Zamir, Alexander Darino, Ryan Patrick, Mubarak Shah,
In Proceedings of ICMLA, 2011

Visual Intelligence & Learning Lab | School of Computer & Comm. Sciences | EPFL