Studies have shown that it is possible to adjust the pre-trained model for newly collected sensory data after deployment for on-device training. The model can learn to improve its predictions over time, learn new skills and adapt to new users by training and adapting locally at the edge. Protecting users’ privacy when dealing with sensitive data is another benefit of bringing training closer to the sensors.
However, unlike cloud-based training, on-device training on tiny edge devices presents a unique set of challenges. SRAM sizes for small IoT devices are typically small, around 256 kilobytes. Even the inference of deep learning models requires additional computing for the backward and additional memory for intermediate activation, which is not feasible with such a short memory budget. However, state-of-the-art deep training frameworks are typically built for cloud servers and have a high memory footprint, even when training a small model in batches of 1. To facilitate minimal on-device training, there is a need to jointly design the algorithm and the system.
Training frameworks for deep learning, like PyTorch, TensorFlow, JAX, MXNet, etc., aren’t designed to work with the limited hardware and memory of edge devices. While inference frameworks for deep learning at the periphery, such as TVM, TF-Lite, NCNN, etc., have low execution times, they do not provide back-propagation. Nonetheless, the current training system cannot convert theoretical savings into measurable ones.
A new study by IBM and MIT researchers investigates the possible small on-device training methods through the collaborative design of algorithms and systems. As we delve deeper into micro-on-device training, we discover two distinct hurdles: The researchers address two major issues in a model that is quantized on the edge devices, and the little memory and processing power of microcontrollers prevent full back-propagation.
They propose Quantization-Aware Scaling (QAS) to automatically scale the gradient of tensors with varying bit-precisions, stabilizing the training and achieving accuracy comparable to its floating-point equivalent while overcoming the optimization challenge. QAS does not need any hyper-parameter adjustment and may be used directly out of the box.
They also propose Sparse Update, which bypasses the gradient computation of less relevant layers and sub-tensors to lower the memory footprint of the whole backward computation. They implemented an automated approach based on contribution analysis to determine the optimal updating strategy for varying memory constraints. They also present a lightweight training system called Tiny Training Engine (TTE) to help develop new algorithms. Since TTE is code-generation based, it reduces the runtime overhead by moving the auto-differentiation to the compile time.
Sparse updates are supported, memory is conserved, and performance is improved thanks to graph pruning and reordering techniques.
According to the team, this is the first work to permit the on-device training of convolutional neural networks with a memory requirement of less than 256KB.
Their approach allows for classifier and backbone parameter updates, which results in improved transfer learning accuracy. Their findings show that the on-device-tuned model for the VWW tinyML application achieves the same accuracy as cloud training+edge deployment. It outperforms the typical tinyML requirement by nine percentage points.
The memory requirement is greatly diminished because of their system-algorithm co-design approach. When compared to cloud training frameworks, the suggested methods significantly reduce memory use by over a factor of 1000 and a factor of 100 compared to the best edge training framework can discover (MNN).
This framework saves energy and encourages practical use by decreasing the per-iteration time by more than 20 compared to dense update and vanilla system design. Their findings show that small IoT devices may make inferences, learn from experience, and acquire new skills over time.
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'On-Device Training Under 256KB Memory'. All Credit For This Research Goes To Researchers on This Project. Check out the paper, demo video, project and reference article. Please Don't Forget To Join Our ML Subreddit
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology(IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring the new advancements in technologies and their real-life application.