Researchers from MIT and Harvard Have Produced a Hypothesis That May Explain How a Transformer Could Be Built Using Biological Elements in the Brain

Artificial neural networks are widely used machine-learning models that can be trained to perform many different tasks. They are named for their structural resemblance to the way biological neurons in the human brain process information, which originally inspired them.

Transformers, a distinctive class of artificial intelligence architecture, have had a profound influence across machine learning and are increasingly making their way into computational neuroscience. These models achieve state-of-the-art performance on many tasks. For example, they can generate text from a prompt that reads much like human writing. Prominent AI chatbots such as ChatGPT and Bard are built on transformers.

Recently, researchers from MIT, the MIT-IBM Watson AI Lab, and Harvard Medical School formulated a hypothesis describing how a transformer could be built from biological components found in the brain. They propose that a biological network made up of neurons and astrocytes, another essential type of brain cell, might be able to carry out computations analogous to those performed by a transformer architecture.

The team carried out computational studies of the functions astrocytes perform in the brain. They also developed a mathematical framework describing how astrocytes and neurons interact. This framework serves as a blueprint for a transformer model whose computations mirror biological processes in the brain.

The researchers laid the groundwork by establishing a correspondence between the two models, employing shared weights, and then presenting the overall scenario. For completeness, they also developed an alternative, non-astrocytic way of implementing Transformers in a biological setting.

Central to their investigation is the tripartite synapse, a ubiquitous three-way connection between an astrocyte, a presynaptic neuron, and a postsynaptic neuron. The researchers argue that tripartite synapses could play a significant role in performing the normalization step of a Transformer's self-attention mechanism.
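In a standard Transformer, the normalization inside self-attention is the softmax denominator: each query's exponentiated similarity scores are divided by their sum over all keys, so the attention weights add up to 1. The sketch below is a minimal, illustrative single-head scaled dot-product attention in plain Python, assuming identity query/key/value projections and no masking. It splits out the `normalizer` only to show which quantity the hypothesis attributes to tripartite synapses; it does not model neurons or astrocytes, and it is not the researchers' neuron-astrocyte equation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries: List[Vector], keys: List[Vector],
                   values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention, written to expose normalization.

    Illustrative sketch only: assumes identity Q/K/V projections and no masking.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        # Subtracting the max keeps exp() stable and does not change the softmax.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        # Normalization step: the sum over all keys that every weight is divided by.
        # This is the quantity the hypothesis suggests tripartite synapses could handle.
        normalizer = sum(weights)
        # Softmax-weighted average of the value vectors, one output coordinate at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values)) / normalizer
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

For example, `self_attention(tokens, tokens, tokens)` with `tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` returns one output vector per token. Because the weights are divided by `normalizer`, every output is a weighted average of the value vectors, so each coordinate stays within the range spanned by the values.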

Starting from the fundamental mathematical components of a transformer, they built simple biophysical models of how astrocytes and neurons interact during communication in the brain. This work drew on an extensive review of existing literature and on insights from collaborating neuroscientists. By combining these models, they arrived at an equation for a neuron-astrocyte network that captures the self-attention mechanism of a transformer.

The researchers now plan to move from theory to experiment. Their next task is to compare their model's predictions with outcomes observed in biological experiments, a pivotal step that could either refine or challenge their hypothesis.

One intriguing idea from their research is that astrocytes may play a role in long-term memory. The idea arises because the network must store information effectively for possible future actions, which hints that astrocytes could be involved in this memory process.

Although the parallels between Transformers and the brain are captivating, it is important to recognize how differently humans and Transformers learn. Transformers require vast amounts of data and consume substantial energy during training. The human brain, by contrast, runs on a modest energy budget, comparable to that of an everyday laptop, and it does not need enormous, internet-scale training datasets to develop language skills.


Check out the Paper and Blog. All credit for this research goes to the researchers on this project.