A Survey of Image Classification With Deep Learning in the Presence of Noisy Labels

Monica Dommaraju
Published in The Startup
Nov 1, 2020


Examples of Noisy Labels. Source: https://arxiv.org/pdf/1711.00583v1.pdf

Advances in deep neural networks have brought major progress in image classification, object detection, semantic segmentation, and other tasks. However, deep networks require huge amounts of labeled data, and labeling at that scale is expensive and practically challenging. As a result, label noise has become a common problem in many datasets. In this article, I discuss numerous methods for training deep networks in the presence of label noise.

There are mainly two kinds of noise in a dataset:

  1. Feature noise corresponds to corruption in the observed features of the data.
  2. Label noise means a change of a label from its true class.

Both kinds of noise can significantly degrade classification performance, but label noise is considered the more harmful of the two. This is due to two main factors.

  1. Each sample has many features but only a single label.
  2. The importance of each feature varies, while the label always has a significant impact.

Although deep networks show some robustness to label noise, they have enough capacity to memorize noisy labels and overfit noisy data. Therefore, preventing DNNs from overfitting noisy data is very important, especially for fail-safe applications such as automated medical diagnosis systems.

The problem statement for supervised learning in the presence of noisy labels is as follows. Classical supervised learning aims to find the best estimator parameters for a given clean distribution D while iterating over samples drawn from D. In the noisy-label setup, the task is still to find the estimator that is best for D, but training can only be performed on samples from a corrupted distribution Dn. Classical risk minimization is therefore insufficient in the presence of label noise, since it would yield parameters that are optimal for Dn rather than for D. As a result, variations of classical risk minimization have been proposed in the literature, and they are evaluated in the upcoming sections.
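To make the distinction concrete, the two objectives can be written out as follows (a minimal formalization; the classifier f, parameters θ, and loss L are my notation, not the survey's):

```latex
% Clean setting: minimize the expected loss over the true distribution D
\theta^{*} = \arg\min_{\theta}\; \mathbb{E}_{(x, y) \sim D}\big[\mathcal{L}(f_{\theta}(x), y)\big]

% Noisy setting: we can only sample from the corrupted distribution D_n,
% so naive risk minimization converges to parameters that fit D_n, not D
\hat{\theta} = \arg\min_{\theta}\; \mathbb{E}_{(x, \tilde{y}) \sim D_n}\big[\mathcal{L}(f_{\theta}(x), \tilde{y})\big]
```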

The data features, the true label of the data, and the labeler characteristics are the main factors that determine label noise. Depending on which of these factors the noise depends on, it is classified into three subclasses (a synthetic noise-injection sketch follows the list).

  1. Random noise is completely random and depends on neither the instance features nor the true class.
  2. Y-dependent noise is independent of the image features but depends on the class.
  3. XY-dependent noise depends on both the image features and the class.
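To make the first two noise types concrete, here is a minimal numpy sketch that injects synthetic label noise into a set of labels. The flip rate and the transition matrix are illustrative assumptions, not values from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_random_noise(labels, num_classes, flip_rate=0.2):
    """Random noise: with probability flip_rate, replace the label with a
    class drawn uniformly at random, independent of features and true class.
    (The draw can return the original class, so the effective flip rate
    is slightly below flip_rate.)"""
    labels = labels.copy()
    flip = rng.random(len(labels)) < flip_rate
    labels[flip] = rng.integers(0, num_classes, flip.sum())
    return labels

def inject_y_dependent_noise(labels, transition_matrix):
    """Y-dependent noise: the flip probability depends only on the true
    class, via a row-stochastic matrix T with T[i, j] = P(noisy j | true i)."""
    num_classes = len(transition_matrix)
    return np.array([rng.choice(num_classes, p=transition_matrix[y]) for y in labels])

true_labels = rng.integers(0, 3, size=10)
T = np.array([[0.8, 0.2, 0.0],   # class 0 is sometimes confused with class 1
              [0.0, 1.0, 0.0],   # class 1 is never mislabeled
              [0.1, 0.1, 0.8]])
print(inject_random_noise(true_labels, num_classes=3))
print(inject_y_dependent_noise(true_labels, T))
```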

Label noise is a natural outcome of the dataset collection process and can occur in various domains, such as medical imaging, crowd-sourcing, social network tagging, financial analysis, and many more. This work focuses on various solutions to such problems, but it may be helpful to investigate the causes of label noise in order to understand the phenomenon better.

  • Firstly, we can make use of the huge amount of data available on the web and social media. But these labels come from automated systems used by search engines or from user tags, which can result in noisy labels.
  • Secondly, multiple experts can label the data, but the experts have different experience levels, which again leads to noisy labels.
  • Sometimes the data is too complicated for even experts to label correctly, as in medical imaging.
  • Label noise can also be injected deliberately, for regularization or for data poisoning.

Noise model-based methods

Noise model-based methods aim to extract the noise-free information from the dataset by neglecting or de-emphasizing the information coming from noisy samples. They perform well when prior information about the noise structure is available. Their advantage is that label-noise estimation is decoupled from classification, which lets them work with any classification algorithm.

Noisy Channel

A noisy channel is inserted between the classifier's predictions and the noisy labels during the training phase, and it is removed in the evaluation phase, since the classifier is expected to produce noise-free predictions. The general setup is shown in the figure below.

A common problem here is scalability: the size of the noise transition matrix grows quadratically with the number of classes (one entry per pair of classes), making it intractable to calculate for large label sets.

There are many ways to formulate the noisy channel (a sketch of the explicit-calculation flavor follows the list):

  • Explicit Calculation
  • Iterative Calculation
  • Complex noisy channel
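As a concrete example of the explicit-calculation flavor, here is a minimal numpy sketch of forward loss correction, where the classifier's softmax output is pushed through an assumed transition matrix T before computing cross-entropy against the noisy label. The matrix T here is hand-specified for illustration; in practice it has to be estimated, which is exactly where the scalability problem above bites.

```python
import numpy as np

def forward_corrected_cross_entropy(probs, noisy_label, T):
    """Forward correction: the classifier's clean-class probabilities are
    mixed through the noise channel T (T[i, j] = P(noisy j | true i)), and
    cross-entropy is computed against the *noisy* label. The channel is used
    only during training; at test time the raw probs are the noise-free
    predictions."""
    noisy_probs = T.T @ probs   # P(noisy = j) = sum_i T[i, j] * P(true = i)
    return -np.log(noisy_probs[noisy_label] + 1e-12)

# Toy example: 3 classes, a confident prediction for class 0, and a channel
# that corrupts class 0 into class 1 twenty percent of the time.
probs = np.array([0.9, 0.05, 0.05])
T = np.array([[0.8, 0.2, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(forward_corrected_cross_entropy(probs, noisy_label=1, T=T))
```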

Label Noise Cleansing

Correcting suspicious labels to their corresponding true class is the most direct solution to the noisy-label problem.

Cleaning the whole dataset by hand is a costly process, so some works propose sending only the suspicious labels to a human annotator in order to reduce cost.

Different approaches to label cleansing have been proposed depending on whether clean data is required (a relabeling sketch follows the list):

  • Using data with clean labels
  • Using data with both clean and noisy labels
  • Using data with just noisy labels
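As a minimal sketch of the noisy-labels-only case, one common heuristic relabels a sample when the model strongly disagrees with its given label. The 0.9 confidence threshold is an illustrative assumption.

```python
import numpy as np

def cleanse_labels(pred_probs, labels, threshold=0.9):
    """Relabel a sample to the model's predicted class when the model is
    very confident and disagrees with the current label; otherwise keep the
    label as-is. pred_probs has shape (n_samples, n_classes)."""
    predictions = pred_probs.argmax(axis=1)
    confidence = pred_probs.max(axis=1)
    suspicious = (predictions != labels) & (confidence > threshold)
    cleaned = np.where(suspicious, predictions, labels)
    return cleaned, suspicious   # suspicious flags could go to a human annotator

pred_probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.02, 0.98]])
labels = np.array([1, 1, 1])
print(cleanse_labels(pred_probs, labels))   # only the first sample is relabeled
```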

Dataset Pruning

Instead of correcting noisy labels to their true classes, we can simply remove them. This loses some information, but it also prevents the negative impact of the noise.

The risk is removing too many samples, so it is important to limit data loss by removing as few samples as possible.

Many approaches for avoiding this risk are discussed in the paper; a simple loss-based pruning sketch is given below.
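This sketch drops only the most suspicious fraction of samples, on the assumption that noisy labels tend to produce the largest losses; the 10% prune rate is an illustrative assumption.

```python
import numpy as np

def prune_dataset(features, labels, per_sample_loss, prune_fraction=0.1):
    """Drop the prune_fraction of samples with the largest loss. Keeping
    prune_fraction small limits the accidental loss of clean data."""
    n_keep = int(len(labels) * (1.0 - prune_fraction))
    keep_idx = np.argsort(per_sample_loss)[:n_keep]   # lowest-loss samples
    return features[keep_idx], labels[keep_idx]

X = np.random.randn(100, 8)
y = np.random.randint(0, 5, 100)
losses = np.random.rand(100)                 # placeholder per-sample losses
X_clean, y_clean = prune_dataset(X, y, losses)   # keeps the 90 lowest-loss samples
```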

Sample Choosing

Sample choosing methods overcome label noise by manipulating the input stream to the classifier.

If the network is guided to choose the right instances to train on, the classifier's task becomes much easier in the presence of noisy labels.

Sample choosing methods continuously monitor the base classifier and select the samples to be trained on in the next iteration.

These methods reduce the effect of noise by prioritizing low-loss samples, which are likely to be clean; hard, informative samples are given importance only in the later stages of training. Two major approaches under this group are (a small-loss selection sketch follows the list):

  • Curriculum Learning
  • Multiple Classifiers
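Both approaches commonly build on the small-loss trick, sketched minimally below; the keep ratio of 0.7 is an illustrative assumption.

```python
import numpy as np

def select_small_loss_samples(batch_losses, keep_ratio):
    """Small-loss trick used by curriculum-style and co-teaching methods:
    treat the keep_ratio fraction of samples with the smallest loss as
    probably-clean and train only on those this iteration. keep_ratio is
    typically decayed over training so that hard (and possibly noisy)
    samples are only admitted once the model is reasonably trained."""
    n_keep = int(len(batch_losses) * keep_ratio)
    return np.argsort(batch_losses)[:n_keep]

# Multiple-classifier (co-teaching) flavor: two networks exchange their
# selections, so each is updated on samples the *other* one finds clean.
losses_net_a = np.random.rand(32)
losses_net_b = np.random.rand(32)
idx_for_b = select_small_loss_samples(losses_net_a, keep_ratio=0.7)
idx_for_a = select_small_loss_samples(losses_net_b, keep_ratio=0.7)
```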

Sample Importance Weighting

Training can be made more effective by assigning weights to instances according to their estimated noisiness. This emphasizes cleaner instances, yielding better updates to the model weights.

The simplest approach, when both clean and noisy data are available, is to weight the clean data more. However, this uses the available information poorly; moreover, clean data is not always available. A sketch of a weighted loss follows.
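A minimal sketch of a weighted cross-entropy loss; the per-sample weights here are placeholders, since estimating them reliably is the hard part of these methods.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, weights):
    """Per-sample cross-entropy scaled by an estimated cleanness weight in
    [0, 1]: suspected-noisy samples contribute less to the update."""
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(weights * per_sample)

probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])
labels = np.array([0, 0, 1])           # the third label looks suspicious
weights = np.array([1.0, 0.8, 0.2])    # placeholder cleanness estimates
print(weighted_cross_entropy(probs, labels, weights))
```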

Labeler Quality Assessment

Depending on their expertise, labelers label the same data differently and may even contradict each other. This is most common for crowd-sourced data or for datasets that require a high level of expertise, such as medical imaging.

Two unknowns arise in this setup: the noisy labeler characteristics and the ground-truth labels.

  • When the noise is assumed to be y-dependent, the labeler characteristics can be modeled with a noise transition matrix (a per-labeler sketch follows the list).
  • By additionally considering image complexities, xy-dependent noise can be modeled.
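When a small subset with known ground truth is available, one simple way to model a labeler's y-dependent characteristics is an empirical per-labeler confusion matrix, sketched below on synthetic annotations (real crowd-sourcing methods estimate this jointly with the unknown true labels, e.g. via EM).

```python
import numpy as np

def labeler_confusion_matrix(true_labels, given_labels, num_classes):
    """Estimate one labeler's noise transition matrix from samples with
    known ground truth: row i is the empirical distribution of the labels
    this annotator assigns to class-i samples."""
    counts = np.zeros((num_classes, num_classes))
    for t, g in zip(true_labels, given_labels):
        counts[t, g] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)   # avoid division by zero

true_labels = np.array([0, 0, 0, 1, 1, 2])
annotator = np.array([0, 0, 1, 1, 1, 2])      # confuses class 0 with 1 sometimes
print(labeler_confusion_matrix(true_labels, annotator, num_classes=3))
```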

Noise model-free methods

Noise model-free methods aim to be inherently noise-robust without explicitly modeling the noise structure, so no prior information about the noise is required. These approaches assume that the classifier itself is not too sensitive to noise and that performance degradation is a result of overfitting, so they mainly concentrate on regularizing the network training procedure to avoid overfitting.

Robust Losses

A loss function is said to be noise-robust if the classifier learned with it reaches the same performance whether trained on noisy or noise-free data.

The algorithms in this section aim to design the loss function in such a way that performance does not decrease in the presence of label noise.

In practice, however, the performance of robust loss functions can still degrade badly as the noise gets heavier.

Since these methods avoid using prior information about the data distribution, they mostly treat noisy and clean data in the same way. A sketch comparing the standard cross-entropy with the more noise-robust mean absolute error follows.
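A classic example from the robust-loss literature is the contrast between cross-entropy, whose per-sample loss is unbounded, and mean absolute error, whose per-sample loss is bounded and therefore more tolerant of symmetric label noise. A minimal sketch:

```python
import numpy as np

def cross_entropy(probs, label):
    # Unbounded: a badly mislabeled sample can dominate the gradient.
    return -np.log(probs[label] + 1e-12)

def mean_absolute_error(probs, label, num_classes):
    # Bounded in [0, 2]: a mislabeled sample's influence is capped,
    # which is the source of MAE's robustness to symmetric noise.
    one_hot = np.eye(num_classes)[label]
    return np.abs(one_hot - probs).sum()

probs = np.array([0.98, 0.01, 0.01])   # confident prediction for class 0
print(cross_entropy(probs, label=1))                        # huge penalty if label 1 is noisy
print(mean_absolute_error(probs, label=1, num_classes=3))   # capped penalty
```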

Meta-Learning

The main purpose of meta-learning is to eliminate hand-designed components such as the loss function, the network architecture, and the optimization algorithm.

It is often described as learning how to learn: a learning paradigm that can absorb information from one task and generalize it proficiently to unseen tasks.

The main drawback of these methods is computational cost, since each outer-loop update requires a nested inner loop of gradient computations. Compared to straightforward training, they are many times slower.

Model-Agnostic Meta-Learning (MAML) treats the weight initialization as the meta-learned quantity, seeking an initialization that can be fine-tuned easily; a toy sketch follows.
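A toy sketch of the MAML idea on one-dimensional least-squares tasks, using the first-order approximation for brevity (exact MAML backpropagates through the inner update, which is what makes it expensive); the task family and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    """Gradient of the task loss 0.5 * mean((w*x - y)^2) with respect to w."""
    return np.mean((w * x - y) * x)

w_meta, inner_lr, outer_lr = 0.0, 0.1, 0.01
for step in range(1000):
    w_task = rng.uniform(-2.0, 2.0)   # sample a task: y = w_task * x
    x = rng.normal(size=16)
    y = w_task * x
    # Inner loop: one gradient step of adaptation from the shared initialization.
    w_adapted = w_meta - inner_lr * loss_grad(w_meta, x, y)
    # Outer loop: move the initialization so that one-step adaptation works well
    # (first-order MAML; the exact method differentiates through the inner step).
    w_meta -= outer_lr * loss_grad(w_adapted, x, y)

print(w_meta)   # an initialization that fine-tunes easily across tasks
```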

Regularizers

Regularizer methods treat performance degradation on noisy data as overfitting to the noise, and use regularization to prevent DNNs from overfitting the noisy labels. This assumption does not hold for more complex noise.

Techniques like dropout, weight decay, mixup, label smoothing, and adversarial training are widely used; a mixup sketch follows.
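As one concrete example, here is a minimal numpy sketch of mixup, which trains on convex combinations of example pairs and their labels, so a noisy label is only ever seen diluted by another sample's label; alpha=0.2 is a commonly used value, assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y_one_hot, alpha=0.2):
    """mixup: blend random pairs of examples and their one-hot labels with a
    Beta-distributed coefficient, softening the effect of any single label."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_one_hot + (1 - lam) * y_one_hot[perm]
    return x_mixed, y_mixed

x = rng.normal(size=(8, 32, 32, 3))       # a toy image batch
y = np.eye(10)[rng.integers(0, 10, 8)]    # one-hot labels, 10 classes
x_mix, y_mix = mixup_batch(x, y)
```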

Ensemble Methods

It is well known that bagging provides better robustness to label noise than boosting.

AdaBoost, a boosting algorithm, puts ever more weight on misclassified (and therefore often noisy) samples, which results in overfitting the noise.

However, the degree of label-noise robustness changes with the choice of boosting algorithm; BrownBoost, for example, was designed with noise tolerance in mind. A minimal bagging-versus-boosting comparison is sketched below.
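A minimal scikit-learn sketch contrasting the two on synthetically flipped labels; the 20% flip rate is an illustrative assumption, and exact scores will vary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

# Build a binary dataset and flip 20% of the training labels at random.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.2
y_tr_noisy = np.where(flip, 1 - y_tr, y_tr)

# Bagging averages independently trained models, diluting noisy samples;
# AdaBoost re-weights misclassified (often noisy) samples upward each round.
for model in (BaggingClassifier(random_state=0), AdaBoostClassifier(random_state=0)):
    model.fit(X_tr, y_tr_noisy)
    print(type(model).__name__, model.score(X_te, y_te))
```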

Conclusion

  • Label noise is the main obstacle to achieving the best results on real-world datasets.
  • Collecting datasets from the web without human supervision is an important step toward scale, but it inevitably introduces label noise.
  • Different approaches for dealing with noisy labels have been discussed in the paper. All of them have their own advantages and disadvantages.
  • Noise model-based methods depend heavily on an accurate estimate of the noise structure; they try to find that structure and train the base classifier with the estimated noise parameters. One can choose the most appropriate method based on one's needs.
  • For example, dataset pruning or label noise cleansing methods can be preferred when purifying the dataset as a preprocessing stage.
  • Noise model-free methods do not require any prior information about the noise structure, so they are easier to implement when the noise is random and overfitting is the cause of the performance degradation.
  • Robust losses or regularizers are preferred when there is no clean subset of data, since they treat all samples the same.

The overall quality of the paper

  • This paper mainly concentrates on the different types of noise model-based and noise model-free methods.
  • There are many other important fields beyond image classification where treating noisy labels matters, such as generative networks, semantic segmentation, sound classification, and many more.

Critique of the paper

While extensive research has been carried out on classical machine learning techniques, deep learning with noisy labels is definitely an understudied problem.

Future directions and suggestions

Apart from concentrating on the different types of label noise, it would be good if future research included an accurate understanding of the effects of label noise on deep networks. Alternative ways of labeling at low cost are necessary in order to avoid the noisy-label situation in the first place. Transfer learning could be performed more effectively if we were able to identify the parts of the network that are most affected by label noise.

Learning from a noisy labeled dataset when only a small amount of data is available has received very little attention. This can be a good research path given its potential in areas where data collection is expensive.
