Rethinking Few-Shot Image Classification:
A Good Embedding Is All You Need?

Code [GitHub]
arXiv 2020 [Paper]

Teaser

Figure 1: A meta-testing episode for a 5-way, 1-shot task: the 5 support images and 1 query image are mapped to embeddings by the fixed pre-trained network; a linear model (here, logistic regression (LR)) is trained on the 5 support embeddings; the query image is then classified by this linear model.

Abstract

The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline, which learns a supervised or self-supervised representation on the meta-training set and then trains a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through self-distillation. This demonstrates that a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.

Highlights

(1) Our model learns representations by training a neural network on the entire meta-training set:
we merge all meta-training data into a single task and train a neural network to perform either ordinary classification or self-supervised learning on this combined dataset; the classification task is equivalent to the pre-training phase of many meta-learning methods (see the sketch below).

[Figure]
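To make step (1) concrete, here is a minimal PyTorch-style sketch of the pre-training stage. It is an illustration under assumptions, not the authors' exact training script: `MergedMetaTrainSet` is a hypothetical dataset class that flattens all meta-training classes into one labeled dataset, and the backbone, optimizer, and schedule (a torchvision ResNet-18, SGD, 90 epochs) are placeholders.

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import DataLoader

# Hypothetical dataset that merges all meta-training classes into one
# ordinary classification dataset of (image, base-class label) pairs.
train_set = MergedMetaTrainSet()          # assumption: user-provided dataset
num_base_classes = train_set.num_classes  # assumption: e.g. 64 base classes

# Any standard backbone works; a torchvision ResNet-18 is used purely
# for illustration here.
backbone = models.resnet18(num_classes=num_base_classes)

optimizer = torch.optim.SGD(backbone.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(train_set, batch_size=64, shuffle=True)

backbone.train()
for epoch in range(90):                      # illustrative schedule
    for images, labels in loader:
        logits = backbone(images)            # ordinary multi-class prediction
        loss = criterion(logits, labels)     # cross-entropy over base classes
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```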

(2) Meta-testing is done with a linear classifier:
after training, we keep the pre-trained network up to its penultimate layer and use it as a fixed feature extractor. During meta-testing, for each task we fit a linear classifier on the features of the support images extracted by the pre-trained network and use it to classify the query images (see the sketch below).

[Figure]
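Below is a sketch of a single meta-testing episode under these assumptions: `backbone` is the network pre-trained above, scikit-learn's logistic regression serves as the linear classifier, and `support_x`, `support_y`, `query_x` are the tensors of one 5-way task (these names are illustrative).

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import normalize

# Keep everything up to the penultimate layer as a fixed feature extractor
# (for a torchvision ResNet this simply drops the final fully connected layer).
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

@torch.no_grad()
def embed(images):
    feats = feature_extractor(images).flatten(1)   # (N, feature_dim)
    return normalize(feats.cpu().numpy())          # L2-normalized embeddings

# One 5-way 1-shot episode: fit the linear model on the 5 support embeddings,
# then classify the query embeddings with it.
clf = LogisticRegression(max_iter=1000)
clf.fit(embed(support_x), support_y.numpy())
query_pred = clf.predict(embed(query_x))
```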

(3) Self-distillation provides an additional boost:
instead of using the embedding model directly for meta-testing, we distill its knowledge into a new model with an identical architecture, trained on the same merged meta-training set (see the sketch below).

[Figure]
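A sketch of one self-distillation step, assuming `teacher` is the trained embedding model, `student` is a freshly initialized copy of the same architecture, `loader` iterates over the same merged meta-training set, and `optimizer` updates the student's parameters; the loss combines hard-label cross-entropy with a KL term on temperature-softened logits, and the weights `alpha` and `T` shown here are illustrative rather than the paper's exact values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Knowledge distillation: hard-label cross-entropy + soft-target KL."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    return alpha * hard + (1 - alpha) * soft

teacher.eval()
student.train()
for images, labels in loader:                 # same merged meta-training set
    with torch.no_grad():
        teacher_logits = teacher(images)      # soft targets from the teacher
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The distilled student then replaces the original embedding model as the feature extractor for meta-testing.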

Results

We report results of this surprisingly simple baseline on standard few-shot learning benchmarks. They suggest that many recent meta-learning algorithms are no better than simply learning a good representation through a proxy task, e.g., image classification.

[Figure]