To meet the increasing computing demands of neural networks, AI processing requires full-stack innovation across hardware and software platforms. Using lower-precision number formats to increase computational throughput, decrease memory utilization, and reduce interconnect bandwidth is one of the most effective levers for efficiency.
Researchers believe that a standard interchange format will promote rapid development and interoperability across software and hardware platforms. The industry has therefore moved from 32-bit to 16-bit precision, and now to 8-bit precision formats, to reap these advantages. 8-bit floating point is especially advantageous for transformer networks, one of the most significant developments in AI. In this context, NVIDIA, Intel, and Arm have jointly published a paper defining an 8-bit floating point (FP8) specification. It introduces a common format intended to accelerate AI development by optimizing memory utilization, and it applies to both the training and inference stages. The FP8 specification comes in two variants, E5M2 and E4M3.
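To make the trade-off between these formats concrete, the following sketch (our illustration, not code from the paper) computes the largest finite value of an IEEE-style binary floating point format from its exponent and mantissa bit counts, assuming the standard bias of 2^(E-1)-1 and that the all-ones exponent is reserved for infinities and NaNs. That assumption holds for FP32, FP16, BF16, and FP8 E5M2; E4M3 deliberately deviates from it and reaches 448 instead.

```python
# Largest finite value of an IEEE-style binary float with E exponent bits
# and M mantissa bits, assuming bias = 2^(E-1)-1 and the all-ones
# exponent reserved for infinities/NaNs.
def max_finite(E: int, M: int) -> float:
    bias = 2 ** (E - 1) - 1
    return (2 - 2.0 ** -M) * 2.0 ** bias

print(max_finite(8, 23))  # FP32: ~3.40e38
print(max_finite(5, 10))  # FP16: 65504.0
print(max_finite(8, 7))   # BF16: ~3.39e38
print(max_finite(5, 2))   # FP8 E5M2: 57344.0
```

Note how E5M2 retains nearly the full dynamic range of FP16 with only a quarter of the bits, which is what makes it attractive for gradients.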
The names of the two encodings, E4M3 and E5M2, give the exponent (E) and mantissa (M) bit counts, following the IEEE 754 naming convention. The recommendation is to use E4M3 for weight and activation tensors and E5M2 for gradient tensors. While some networks can be trained with only E4M3 or only E5M2, others require both types (or must keep many more tensors in 16-bit formats). Inference and the forward pass of training typically use E4M3, while gradients in the backward pass use E5M2. The FP8 format follows a guiding principle of adhering to IEEE 754 conventions and deviating only where doing so significantly improves the accuracy of deep learning applications. E5M2 is, in effect, IEEE half precision with fewer mantissa bits: it keeps the IEEE 754 rules for the exponent and special values, which makes conversion between IEEE FP16 and E5M2 straightforward. E4M3, by contrast, extends its dynamic range by reclaiming most of the bit patterns that IEEE reserves for special values: rather than allowing several encodings for infinities and NaNs, it keeps a single NaN encoding and uses the remaining patterns as ordinary finite numbers.
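The difference in how the two encodings handle special values can be made concrete with a small decoder. The sketch below (our illustration, not code from the paper) interprets an 8-bit pattern under either encoding, assuming the standard bias 2^(E-1)-1 and the special-value rules described above: E5M2 reserves the all-ones exponent for infinities and NaNs as IEEE 754 does, while E4M3 treats only the all-ones exponent with an all-ones mantissa as NaN and has no infinities.

```python
import math

def decode_fp8(byte: int, exp_bits: int, man_bits: int, ieee_specials: bool) -> float:
    """Decode one FP8 byte. ieee_specials=True models E5M2, False models E4M3."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp_mask = (1 << exp_bits) - 1
    exp = (byte >> man_bits) & exp_mask
    man = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if ieee_specials and exp == exp_mask:            # E5M2: IEEE inf/NaN rules
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_mask and man == (1 << man_bits) - 1:
        return math.nan                              # E4M3: single NaN pattern, no inf
    if exp == 0:                                     # subnormal numbers
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

print(decode_fp8(0x7E, 4, 3, False))  # E4M3 max finite: 448.0
print(decode_fp8(0x7B, 5, 2, True))   # E5M2 max finite: 57344.0
print(decode_fp8(0x7C, 5, 2, True))   # E5M2: inf (reserved pattern)
print(decode_fp8(0x7F, 4, 3, False))  # E4M3: nan (the single reclaimed pattern)
```

The pattern 0x7E, which would be just below infinity in an IEEE-style layout, decodes to the finite value 448 in E4M3; that is the reclaiming of special-value encodings at work.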
To verify the effectiveness of the proposed formats, the authors conducted an experimental study covering both training and inference, comparing the results against baselines trained in either FP16 or bfloat16. Across vision and language-translation models, FP8 training matched the results of the 16-bit training sessions.
This paper introduced a new FP8 binary interchange format with two encodings, E4M3 and E5M2. By departing only minimally from IEEE 754 conventions for binary encoding of floating point values, the authors ensure that software implementations can continue to rely on IEEE FP characteristics such as the ability to compare and sort values using integer operations. The experimental study shows that, using the same model, optimizer, and training hyperparameters, a wide range of neural network models for image and language tasks can be trained in FP8 to match the accuracy achieved with 16-bit training. By using the same datatypes for training and inference, FP8 not only speeds up training and reduces the resources it needs but also simplifies 8-bit inference deployment.
This article is a research summary written by Marktechpost Staff based on the research paper 'FP8 FORMATS FOR DEEP LEARNING'. All credit for this research goes to the researchers on this project.