The Approximately Correct Machine Intelligence (ACMI) Lab at Carnegie Mellon University (CMU) has published a paper on Randomly Assign, Train, and Track (RATT), an algorithm that uses noisy training data to compute an upper bound on a deep-learning model’s true error, or risk. Model developers can use RATT to estimate how well a model generalizes to new input data.
In their paper, the researchers present mathematical proofs of RATT’s guarantees and report experiments with computer vision (CV) and natural language processing (NLP) models on several datasets. The core result: when a trained model has a high error rate on randomly labeled (noisy) data but a low error rate on clean data, it can be guaranteed, with high probability, to have a low error rate on new data as well.
This research gives practitioners a way to certify the generalization of deep networks even when fresh labeled data is unavailable for testing, along with theoretical insight into the link between random label noise and generalization.
Generalization is a learned model’s ability to produce correct output for new, previously unseen input, that is, data not used during training. The generalization ability of large deep-learning models is poorly understood, particularly for models with more parameters than training samples. Such models can reach low training error even on random data, showing that they can simply memorize the training set; yet when trained on real datasets, they still generalize well to unseen data.
A model’s generalization is measured by its risk: the average error over the entire input population. Computing a theoretical upper bound on this risk is difficult, if not impossible, and while approaches for doing so exist, in many cases they yield a vacuous upper bound, one that says only that the model will do no worse than getting every answer wrong. In practice, most model developers hold out a portion of the data as a test set and evaluate the trained model on it to estimate how well it generalizes.
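As a minimal sketch of the holdout approach, the function below computes the 0-1 error on a held-out test set and adds the standard one-sided Hoeffding confidence term, giving an upper bound on the risk that holds with probability at least 1 − δ, provided the test set was never used for training. The function name `holdout_risk_bound`, the parity "classifier", and the toy data are hypothetical; the Hoeffding term is the textbook bound for a loss in [0, 1] and is not part of RATT.

```python
import math
from typing import Callable, Sequence, Tuple


def holdout_risk_bound(
    predict: Callable[[int], int],
    test_set: Sequence[Tuple[int, int]],
    delta: float = 0.05,
) -> Tuple[float, float]:
    """Return (test error, upper bound on risk) for a held-out test set.

    The bound is only valid if the test set was never used for training.
    """
    n = len(test_set)
    # Empirical 0-1 error on the held-out samples (True counts as 1).
    test_error = sum(predict(x) != y for x, y in test_set) / n
    # One-sided Hoeffding term: with probability >= 1 - delta,
    # risk <= test_error + sqrt(ln(1/delta) / (2n)).
    slack = math.sqrt(math.log(1 / delta) / (2 * n))
    # A risk bound above 1 carries no information, so cap it there.
    return test_error, min(1.0, test_error + slack)


# Toy usage: labels are the parity of x, except every 50th sample is flipped,
# and the hypothetical classifier predicts the parity of x.
test = [(x, x % 2 if x % 50 else 1 - x % 2) for x in range(1000)]
err, bound = holdout_risk_bound(lambda x: x % 2, test)
print(f"test error = {err:.3f}, risk bound = {bound:.3f}")
```

The slack term shrinks as 1/√n, which is why a larger held-out set gives a tighter bound, at the cost of data that could otherwise have been used for training.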
CMU Team’s approach
When trained on a mixture of clean and noisy data, deep-learning models exhibit an “early learning” phenomenon: according to the CMU team, the model first fits the clean data and only later memorizes the noisy data. The researchers showed that if a model trained on such a mixture has a low average training error on the clean data but a high average training error on the noisy data (approximately 50%, the chance level for two classes), then the model’s risk has a non-vacuous upper bound that is a function of these two training error averages. This bound is slightly larger than the average error on the clean data, but still lower than the average error on the noisy data.
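The shape of the bound described above can be sketched in a few lines. This is a simplified illustration assuming binary classification with 0-1 loss: the leading terms, clean training error plus (1 − 2 × error on the randomly labeled set), are assumed to mirror the paper’s binary-case result, while the finite-sample slack is an assumed Hoeffding-style stand-in, not the paper’s exact constants. The function name `ratt_style_bound` is hypothetical.

```python
import math


def ratt_style_bound(clean_error: float, noisy_error: float,
                     m_noisy: int, delta: float = 0.05) -> float:
    """Illustrative upper bound on risk from RATT-style tracked errors.

    clean_error: training error on the clean (correctly labeled) data.
    noisy_error: training error on the randomly labeled data.
    m_noisy: number of randomly labeled samples.
    """
    # Leading terms (binary case): a model that has not memorized the
    # random labels gets about half of them wrong, so 1 - 2 * noisy_error
    # is close to 0 and the bound is close to the clean error.
    leading = clean_error + 1.0 - 2.0 * noisy_error
    # Assumed Hoeffding-style slack that shrinks as 1/sqrt(m_noisy);
    # the paper's exact finite-sample term uses different constants.
    slack = math.sqrt(math.log(1 / delta) / (2 * m_noisy))
    # A risk bound above 1 carries no information, so cap it there.
    return min(1.0, leading + slack)


# Model that did not fit the random labels: bound stays near the clean error.
print(ratt_style_bound(clean_error=0.01, noisy_error=0.50, m_noisy=5000))
# Model that memorized the random labels: the bound becomes vacuous (1.0).
print(ratt_style_bound(clean_error=0.00, noisy_error=0.00, m_noisy=5000))
```

The second call shows why tracking the noisy-set error matters: once the model memorizes the random labels, the same formula can no longer say anything useful about its risk.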
To test the theory empirically, the team trained several deep-learning models. An MLP (multilayer perceptron) and a ResNet (residual neural network) were trained on the MNIST and CIFAR-10 image datasets, while an LSTM (long short-term memory) network and a BERT model were trained on the IMDb sentiment-analysis dataset.
To create noisy data, the team set aside a small portion of each dataset and randomly assigned new labels to those samples before training. The models were then trained on the clean and noisy data together, while the error on the randomly labeled set was tracked (hence the name RATT).
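The “randomly assign” and “track” steps can be sketched as follows. This is a minimal illustration, not the team’s code: the helper names `randomly_assign` and `error_rate`, the uniform choice of replacement labels, and the toy data are assumptions. Because replacement labels are drawn uniformly from all k classes, some noisy samples keep their original label by chance, so a model that ignores the noise gets roughly (k − 1)/k of them wrong, about 50% for two classes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[int, int]  # (input, label)


def randomly_assign(data: Sequence[Example], num_classes: int,
                    noisy_fraction: float, seed: int = 0
                    ) -> Tuple[List[Example], List[Example]]:
    """Split data into a clean part and a part with uniformly random labels."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    m = int(len(data) * noisy_fraction)
    noisy_idx = set(indices[:m])
    clean = [data[i] for i in range(len(data)) if i not in noisy_idx]
    # Replace each set-aside label with one drawn uniformly from all classes.
    noisy = [(data[i][0], rng.randrange(num_classes)) for i in sorted(noisy_idx)]
    return clean, noisy


def error_rate(predict: Callable[[int], int], data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label differs from the given one."""
    return sum(predict(x) != y for x, y in data) / len(data)


# Toy usage: labels are the parity of x; 10% of samples get random labels.
data = [(x, x % 2) for x in range(2000)]
clean, noisy = randomly_assign(data, num_classes=2, noisy_fraction=0.1)


def model(x: int) -> int:
    # Hypothetical "model" that learned the clean rule but not the noise.
    return x % 2


# In real training, both errors would be tracked after every epoch.
print(f"clean error = {error_rate(model, clean):.3f}")
print(f"noisy error = {error_rate(model, noisy):.3f}")  # roughly 0.5
```

A rising clean-set accuracy combined with a noisy-set error that stays near chance is the signal that the model has not yet started memorizing the random labels.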
The researchers compared the accuracy bounds predicted by their theory with the actual accuracy of a model trained only on clean data, measured with a traditional held-out test set. The predicted bound closely tracked test performance; for example, for the ResNet18 model on MNIST, the predicted lower bound on accuracy was 96.8%, compared with an actual accuracy of 98.8%.
Generalization is currently an active research topic in both industry and academia, and a good deal of work on it is ongoing. Some related links are provided below.
Other related research links: https://academic.microsoft.com/paper/3123856301/