The introduction of Large Language Models (LLMs) has been groundbreaking for the field of Artificial Intelligence. These complex models, powered by enormous amounts of data and compute, have changed the way humans engage with technology, and a growing number of domains are being transformed as a result.
Feedforward layers are a core component of transformer models: they transform each token's representation and account for a large share of the model's parameters. As transformers have grown in recent years, their feedforward layers have come to contain tens of thousands of hidden neurons, driving up computational cost at inference. Finding strategies to accelerate feedforward layer calculations has therefore become crucial.
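To make the cost concrete, here is a minimal numpy sketch of the dense feedforward (MLP) block the paragraph describes; the dimensions are illustrative (real models use a hidden width in the tens of thousands), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 16, 64  # illustrative; production models use d_ff in the tens of thousands

w1 = rng.standard_normal((d_ff, d_model))
w2 = rng.standard_normal((d_model, d_ff))

def ffn(x):
    """A standard transformer feedforward layer: every one of the
    d_ff hidden neurons is evaluated for every input token."""
    return w2 @ np.maximum(w1 @ x, 0)  # ReLU over the full hidden width

y = ffn(rng.standard_normal(d_model))
```

Every input pays for the full hidden width, so the cost of this layer scales linearly with `d_ff`.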
In very large networks, only a small fraction of the feedforward hidden neurons is needed to determine the output for any given input. This insight has motivated modular architectures that exploit the phenomenon. Recent work in this direction has concentrated on designs that encourage feedforward layer sparsity: the feedforward layer is subdivided into distinct blocks of neurons, and a gating layer is trained to select which expert blocks to use at inference. While this approach cuts inference time, it increases training complexity and relies on noisy gating.
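The gating scheme described above can be sketched as follows. This is a hedged, minimal numpy illustration of noisy top-1 expert selection in the spirit of Mixture-of-Experts, not the exact method of any cited paper; all names and dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, n_experts, block = 16, 8, 4

gate_w = rng.standard_normal((n_experts, d_in))       # trained gating layer
expert_w = rng.standard_normal((n_experts, block, d_in))  # one small block per expert

def moe_forward(x, noise_scale=1.0):
    """Noisy top-1 gating: perturb the gate logits with Gaussian noise,
    then evaluate only the winning expert's block of neurons."""
    logits = gate_w @ x + noise_scale * rng.standard_normal(n_experts)
    k = int(np.argmax(logits))            # expert chosen for this input
    return np.maximum(expert_w[k] @ x, 0)  # only `block` hidden neurons run

out = moe_forward(rng.standard_normal(d_in))
```

Only one expert block executes per input, but the injected noise makes the routing stochastic, which is the drawback the next paragraph's approach avoids.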
As an alternative to these approaches, a team of two researchers from ETH Zurich has introduced the Fast Feedforward (FFF) architecture. FFF uses a differentiable binary tree that partitions the input space into multiple regions while simultaneously learning each region's boundaries and its associated neural block. Compared with conventional feedforward layers and earlier modularization techniques, FFF can reach a specific block of neurons in logarithmic time, so inference cost grows with the logarithm of the layer's width rather than linearly, as in earlier methods.
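The logarithmic access pattern can be illustrated with a small numpy sketch of inference-time routing through such a binary tree. This is an assumption-laden toy, not the authors' implementation: we assume each internal node holds a learned hyperplane and each leaf holds a small neuron block, with all names and sizes invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, depth = 16, 16, 3   # depth-3 tree -> 8 leaf blocks
n_nodes = 2 ** depth - 1         # internal routing nodes
n_leaves = 2 ** depth
block = 4                        # neurons per leaf block

# Hypothetical parameters: one hyperplane per internal node,
# one tiny two-layer block per leaf.
node_w = rng.standard_normal((n_nodes, d_in))
leaf_w1 = rng.standard_normal((n_leaves, block, d_in))
leaf_w2 = rng.standard_normal((n_leaves, d_out, block))

def fff_forward(x):
    """Hard (inference-time) routing: descend the binary tree in
    `depth` node evaluations, then apply only one leaf block."""
    node = 0
    for _ in range(depth):
        go_right = node_w[node] @ x > 0           # side of the hyperplane
        node = 2 * node + (2 if go_right else 1)  # child in heap layout
    leaf = node - n_nodes                         # index among the leaves
    h = np.maximum(leaf_w1[leaf] @ x, 0)          # small ReLU block
    return leaf_w2[leaf] @ h

y = fff_forward(rng.standard_normal(d_in))
```

Per input, only `depth` dot products plus one block of `block` hidden neurons are evaluated, instead of all `n_leaves * block` neurons, which is where the logarithmic-versus-linear scaling in the paragraph comes from.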
FFF has been compared to the Mixture-of-Experts (MoE) approach, which also uses expert blocks but involves noisy gating. FFF avoids this noise and achieves faster inference with reduced computational complexity. The researchers report that FFFs can be up to 220 times faster than traditional feedforward networks, a substantial improvement in computational efficiency. As an example, they highlight the use of FFFs in vision transformers, where an FFF layer retains 94.2% of prediction performance while using only 1% of the neurons, suggesting that FFFs are also well suited to vision tasks.
In conclusion, the FFF design is a promising method for improving the computational efficiency of neural networks. It greatly shortens inference time compared to standard feedforward layers and avoids the noisy gating of mixture-of-experts networks, while its noiseless conditional execution lets it retain strong prediction accuracy with only a small fraction of neurons active per input. These developments have the potential to make very large models faster and cheaper to run.
Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.