The ImageNet database, first introduced at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2009 and now containing over 14 million labeled images, has become one of the most prominent benchmarks in computer vision. ImageNet is a static dataset, however, whereas real-world data is frequently streamed and arrives at a considerably larger scale. While researchers are constantly working to increase model accuracy on ImageNet, there has been little focus on improving resource efficiency in ImageNet supervised learning.
Researchers from DeepMind present the One Pass ImageNet (OPIN) problem, designed to study deep learning in a streaming setting with constrained data storage. The aim is to develop systems that can train a model while passing each example through the system only once.
In the OPIN problem, inputs arrive in mini-batches and never repeat. Training ends once the entire dataset has been presented. Unlike standard ImageNet evaluations, which focus on model accuracy alone, OPIN measures learning capability under constrained storage and compute.
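The one-pass protocol described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: `one_pass_batches`, `train_one_pass`, and the `update` callback are hypothetical names, and the data is assumed to fit in an in-memory sequence for simplicity.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def one_pass_batches(examples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive mini-batches so that each example appears exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(examples), batch_size):
        yield list(examples[start:start + batch_size])


def train_one_pass(
    examples: Sequence[T],
    batch_size: int,
    update: Callable[[List[T]], None],
) -> int:
    """Run one model update per mini-batch and return the number of updates.

    `update` is a hypothetical stand-in for one backpropagation step.
    Training stops as soon as the stream is exhausted; nothing is revisited.
    """
    steps = 0
    for batch in one_pass_batches(examples, batch_size):
        update(batch)
        steps += 1
    return steps
```

Passing a function that only records what it receives, such as `seen.extend`, is a quick way to confirm that no example is delivered twice.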
The researchers employ three primary metrics:
- Accuracy, which is defined as the top-1 accuracy on the test set.
- Space, which is defined as the total additional data storage required.
- Compute, which is defined as the total number of global backpropagation steps.
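As a rough illustration of how the three metrics could be recorded together, the sketch below computes top-1 accuracy and bundles it with the other two numbers. The `OpinReport` container and its field names are assumptions made for this example, and expressing space as a fraction of the dataset is one possible convention.

```python
from dataclasses import dataclass
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples whose single highest-scoring class matches the label."""
    if len(predicted) != len(labels) or len(labels) == 0:
        raise ValueError("predicted and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


@dataclass(frozen=True)
class OpinReport:
    """Hypothetical container for the three OPIN metrics."""
    accuracy: float        # top-1 accuracy on the test set
    space_fraction: float  # additional storage, as a fraction of the dataset
    compute_steps: int     # total number of backpropagation steps


# Illustrative values only, not results from the study.
report = OpinReport(
    accuracy=top1_accuracy([1, 2, 3, 4], [1, 2, 0, 4]),
    space_fraction=0.01,
    compute_steps=1000,
)
```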
The OPIN problem has four properties, according to the team:
i) The cold-start problem: The model begins with random initialization. As a result, representation learning in OPIN is difficult, especially early in training.
ii) The problem of forgetting: Each example is given to the model only once. Vanilla supervised learning is prone to forgetting early examples, even when the data is i.i.d. (independent and identically distributed).
iii) Data in a natural order: No artificial order is imposed on the data. As a result, the data can be considered i.i.d., which differs from many other continual learning benchmarks.
iv) Several objectives: Because the approaches are judged on three criteria (accuracy, space, and compute), the goal is to improve all three within a single training run.
The team used ResNet-50, a standard model for “multi-epoch” ImageNet training, in their experiments and ran trials with 1, 3, 5, and 8 replay steps and replay buffer sizes of 1%, 5%, and 10% of the dataset.
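To make the roles of buffer size and replay steps concrete, here is a minimal sketch of training with a replay buffer. It is not the researchers' implementation. Several details are assumptions made for illustration: the priority is the loss returned by a hypothetical `update` callback, sampling is proportional to priority, the lowest-priority entry is evicted when the buffer is full, and each incoming mini-batch is followed by `replay_steps` extra updates on buffered mini-batches.

```python
import random
from typing import Any, Callable, Iterable, List


class PrioritizedReplayBuffer:
    """Fixed-capacity buffer that samples entries in proportion to a priority."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: List[Any] = []
        self._priorities: List[float] = []
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, item: Any, priority: float) -> None:
        if len(self._items) >= self.capacity:
            # Illustrative eviction rule (an assumption): drop the lowest priority.
            drop = min(range(len(self._items)), key=self._priorities.__getitem__)
            self._items.pop(drop)
            self._priorities.pop(drop)
        self._items.append(item)
        # Keep weights strictly positive so sampling is always well defined.
        self._priorities.append(max(priority, 1e-8))

    def sample(self) -> Any:
        return self._rng.choices(self._items, weights=self._priorities, k=1)[0]


def train_with_replay(
    batches: Iterable[Any],
    buffer: PrioritizedReplayBuffer,
    update: Callable[[Any], float],
    replay_steps: int,
) -> int:
    """One update per new mini-batch plus `replay_steps` updates on replayed ones.

    `update` is a hypothetical callback that performs one backpropagation step
    and returns the loss. The return value is the total number of updates,
    which corresponds to the compute metric.
    """
    steps = 0
    for batch in batches:
        loss = update(batch)
        steps += 1
        buffer.add(batch, priority=loss)
        for _ in range(replay_steps):
            update(buffer.sample())
            steps += 1
    return steps
```

Under these assumptions, the total compute is the number of streamed mini-batches multiplied by one plus the number of replay steps, while the space cost is fixed by the buffer capacity.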
At the same computational cost, prioritized replay with a 10% memory size performs comparably to multi-epoch training. The multi-epoch approach, however, uses the entire dataset, which requires considerably more data storage. Even 1% data storage is a strong starting point for prioritized replay: adding 1% data storage (equal to 100 mini-batches) improves the naive One-Pass performance by 28.7%.
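The "100 mini-batches" figure can be checked with a little arithmetic. The sketch below assumes the ImageNet-1k training split of 1,281,167 images and a mini-batch size of 128. The batch size is an assumption chosen because it reproduces the figure; it is not stated in the text above.

```python
IMAGENET_TRAIN_SIZE = 1_281_167  # ImageNet-1k (ILSVRC-2012) training images
ASSUMED_BATCH_SIZE = 128         # assumption, not a value given in the article


def buffer_in_batches(dataset_size: int, fraction: float, batch_size: int) -> int:
    """Number of whole mini-batches that fit in a buffer holding `fraction` of the data."""
    buffered_examples = int(dataset_size * fraction)
    return buffered_examples // batch_size


for fraction in (0.01, 0.05, 0.10):
    n = buffer_in_batches(IMAGENET_TRAIN_SIZE, fraction, ASSUMED_BATCH_SIZE)
    print(f"{fraction:.0%} of the dataset is about {n} mini-batches")
```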
The accuracy gain from a larger buffer depends on the number of replay steps. When the buffer size is increased from 5% to 10%, model accuracy increases by 0.6% and 0.1% for replay steps 1 and 3, respectively, and by 0.9% for replay step 5. Increasing only the storage space or only the replay steps causes model accuracy to saturate quickly, whereas increasing both together could yield a considerably larger gain in accuracy.
The team expects the study to encourage other researchers to focus on improving resource efficiency in supervised learning, which will help deep learning mature and industrialize further.