Visual Intelligence and Learning Laboratory (VILAB)

We are a research group at the Swiss Federal Institute of Technology (EPFL)'s School of Computer and Communication Sciences. Our research focus is broadly on Computer Vision, Machine Learning, and Perception-for-Robotics.

Group Members



Highlighted Recent Projects

Robust Learning Through Cross-Task Consistency
A. Zamir*, A. Sax*, T. Yeo, O. Kar, N. Cheerla, R. Suri, J. Cao, J. Malik, L. Guibas
CVPR, 2020. [Best Paper Award Nominee]
[project website] [live demo] [code] [PDF] [slides]

Visual perception entails solving a wide set of tasks, e.g., object detection, depth estimation, etc. The predictions made for multiple tasks from the same image are not independent, and therefore, are expected to be ‘consistent’. We propose a broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency. The proposed formulation is based on inference-path invariance over a graph of arbitrary tasks. We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs. This framework also leads to an informative unsupervised quantity, called Consistency Energy, based on measuring the intrinsic consistency of the system. Consistency Energy correlates well with the supervised error (r=0.67), thus it can be employed as an unsupervised confidence metric as well as for detection of out-of-distribution inputs (ROC-AUC=0.95). The evaluations are performed on multiple datasets, including Taskonomy, Replica, CocoDoom, and ApolloScape, and they benchmark cross-task consistency versus various baselines including conventional multi-task learning, cycle consistency, and analytical consistency.

Which Tasks Should Be Learned Together in Multi-task Learning?
T. Standley, A. Zamir, D. Chen, L. Guibas, J. Malik, S. Savarese
ICML, 2020.
[project website] [slides] [code] [PDF]

Many computer vision applications require solving multiple tasks in real-time. A neural network can be trained to solve multiple tasks simultaneously using multi-task learning. This can save computation at inference time as only a single network needs to be evaluated. Unfortunately, this often leads to inferior overall performance as task objectives can compete, which consequently poses the question: which tasks should and should not be learned together in one network when employing multi-task learning? We study task cooperation and competition in several different learning settings and propose a framework for assigning tasks to a few neural networks such that cooperating tasks are computed by the same neural network, while competing tasks are computed by different networks. Our framework offers a time-accuracy trade-off and can produce better accuracy using less inference time than not only a single large multi-task neural network but also many single-task networks.

Side-tuning: Network Adaptation via Additive Side Networks
J. Zhang, A. Sax, A. Zamir, L. Guibas, J. Malik
ECCV, 2020. [Spotlight]
[project website] [code] [PDF]

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one -- due to lacking enough training data, performing lifelong learning where the system has to learn a new task while being previously trained for other tasks, or wishing to encode priors in the network via preset weights. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others.

In this paper we propose a straightforward alternative: Side-Tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network using a simple additive process. This simple method works as well as or better than existing solutions while it resolves some of the basic issues with fine-tuning, fixed features, and several other common baselines. In particular, side-tuning is less prone to overfitting when little training data is available, yields better results than using a fixed feature extractor, and does not suffer from catastrophic forgetting in lifelong learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including lifelong learning (iCIFAR, Taskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.


Prospective Members:

  • Prospective PhD Students: We are always looking for highly motivated and talented students. However, PhD admissions at EPFL are centralized and highly competitive. If you are seeking a PhD position, please apply to our PhD program.
  • Prospective Postdocs: Please contact us directly with your resume.