Artificial Neural Networks have provided promising solutions to problems we once thought impossible to address. Artificial Intelligence and Deep Learning models are now widely used in almost every sector.
As real-world problems grow more complex, neural networks are growing in size as well. Many present neural networks have trillions of neural weights, also known as parameters. This increasing scale presents unique problems for the hardware and software used to develop such neural networks. Today, clusters of graphics processors require acres of space, megawatts of electricity, and dedicated staff to run models that are only a tenth of that size.
Recently, Cerebras Systems released the world’s first multi-million core AI cluster architecture. Cerebras Systems is a leading innovator in developing computer solutions for complex AI and DL applications. Their new technology handles neural networks with up to an astonishing 120 trillion parameters, approximately the number of synapses in the human brain.
They address the most fundamental challenges using a holistic systems approach for extreme scale. The proposed architecture’s foundations are built upon the second-generation Cerebras Wafer-Scale Engine (WSE-2), the largest chip ever made, the fastest AI processor, and a vital part of the CS-2 system. Purpose-built for AI work, the 7nm-based WSE-2 delivers a massive leap forward for AI computing.
Neural networks use memory differently for different components of model computation. Cerebras therefore designed a purpose-built solution for each type of memory and each type of compute that a neural network needs. Handling them separately untangles memory from compute and simplifies the overall scaling problem.
The new execution mode, named “weight streaming”, allows the model size and the training speed to scale independently and offers great flexibility. A single CS-2 system can support models of up to 120 trillion parameters. Up to 192 systems can be clustered with near-linear performance scaling to speed up training.
MemoryX : Enabling Hundred-Trillion Parameter Models
Cerebras also introduces MemoryX, a new memory extension technology that stores model weights and streams them onto the CS-2 systems as needed to calculate each layer of the network, one layer at a time. It includes both weight storage and the intelligence to accurately schedule and perform weight updates in order to avoid dependence bottlenecks. MemoryX’s architecture is flexible, allowing for configurations ranging from 4TB to 2.4PB that support models with 200 billion to 120 trillion parameters. On the backward pass, the gradients are transmitted back to MemoryX, where the weight update is executed in time to be used for the next round of training.
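The flow described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Cerebras software: the `WeightStore` class, its `stream` and `apply_gradient` methods, and the one-scalar-weight-per-layer model are all hypothetical names and simplifications chosen for this example. It shows weights being fetched one layer at a time, gradients being sent back on the backward pass, and the update being applied at the store rather than on the compute device.

```python
from dataclasses import dataclass


@dataclass
class WeightStore:
    """Hypothetical stand-in for an external weight store (not a Cerebras API)."""
    weights: list[float]          # one scalar weight per layer, to keep the toy small
    learning_rate: float = 0.1

    def stream(self, layer: int) -> float:
        # Send one layer's weight to the compute device on request.
        return self.weights[layer]

    def apply_gradient(self, layer: int, grad: float) -> None:
        # The weight update is performed at the store, not on the device.
        self.weights[layer] -= self.learning_rate * grad


def train_step(store: WeightStore, x: float, target: float) -> float:
    """One step on the toy model y = w[n-1] * ... * w[0] * x with squared-error loss."""
    n = len(store.weights)
    activations = [x]
    # Forward pass: stream the weights one layer at a time.
    for layer in range(n):
        w = store.stream(layer)
        activations.append(w * activations[-1])
    error = activations[-1] - target
    loss = 0.5 * error ** 2
    # Backward pass: stream each weight again and send its gradient back.
    upstream = error
    for layer in reversed(range(n)):
        w = store.stream(layer)                 # read before this layer is updated
        grad_w = upstream * activations[layer]  # dLoss/dw for this layer
        upstream = upstream * w                 # dLoss/d(input of this layer)
        store.apply_gradient(layer, grad_w)
    return loss


store = WeightStore(weights=[0.5, 0.5])
losses = [train_step(store, x=1.0, target=1.0) for _ in range(50)]
print(losses[0], losses[-1])
```

The compute side never holds more than one layer's weight at a time, which is the property that lets model size grow with the store rather than with the device.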
SwarmX : Providing Bigger, More Efficient Clusters
They also introduce SwarmX, an interconnect fabric technology that lets the number of CS-2 systems in a cluster grow with near-linear performance scaling for extreme-scale models. Under ideal linear scaling, 10 CS-2 systems would reach the same solution about ten times faster than a single CS-2 system.
The SwarmX fabric was created with weight streaming in mind, allowing for practical concurrent training across CS-2 systems. It employs a tree topology to facilitate modular, low-overhead scaling, broadcasting weights to all CS-2 systems and reducing the gradients they send back. As a result, the SwarmX fabric takes an active role in the training process.
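To make the broadcast-and-reduce pattern concrete, here is a minimal Python sketch. It assumes a binary tree purely for illustration (the article only says "tree topology"), and the function names `broadcast`, `tree_reduce`, and `tree_depth` are hypothetical, not part of any Cerebras interface.

```python
def broadcast(weight: float, num_systems: int) -> list[float]:
    """Give every system its own copy of the same weight."""
    return [weight] * num_systems


def tree_reduce(gradients: list[float]) -> float:
    """Sum per-system gradients pairwise, level by level (non-empty input assumed)."""
    level = list(gradients)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            # Combine two neighbours; an odd element at the end passes through.
            next_level.append(sum(level[i:i + 2]))
        level = next_level
    return level[0]


def tree_depth(num_systems: int) -> int:
    """Number of pairwise reduction levels for a binary tree over num_systems leaves."""
    depth = 0
    while num_systems > 1:
        num_systems = (num_systems + 1) // 2
        depth += 1
    return depth


copies = broadcast(0.5, 4)
total = tree_reduce([1.0, 2.0, 3.0, 4.0, 5.0])
print(copies, total, tree_depth(192))
```

With a binary tree, the number of reduction levels grows with the logarithm of the system count, which is one reason tree-shaped fabrics are a common choice for low-overhead scaling.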
A single MemoryX unit can target any number of CS-2 systems, and the SwarmX fabric scales independently of MemoryX resources. The SwarmX fabric can grow from two to 192 CS-2 systems. Since each CS-2 system contains 850,000 AI-optimized cores, this allows for clusters of up to 163 million AI-optimized cores.
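The headline figures quoted in this article can be checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text; the one assumption is that TB and PB are decimal units (10^12 and 10^15 bytes), and the helper names are invented for this example.

```python
CORES_PER_CS2 = 850_000   # AI-optimized cores per CS-2 system (from the article)
MAX_SYSTEMS = 192         # largest cluster size quoted in the article

TB = 10**12               # assumption: decimal terabyte
PB = 10**15               # assumption: decimal petabyte


def cluster_cores(num_systems: int) -> int:
    """Total AI-optimized cores in a cluster of num_systems CS-2 systems."""
    return num_systems * CORES_PER_CS2


def bytes_per_parameter(capacity_bytes: int, parameters: int) -> float:
    """MemoryX capacity divided by the parameter count it is quoted as supporting."""
    return capacity_bytes / parameters


print(cluster_cores(MAX_SYSTEMS))                      # 163200000
print(bytes_per_parameter(4 * TB, 200 * 10**9))        # 20.0
print(bytes_per_parameter(2400 * TB, 120 * 10**12))    # 20.0 (2.4 PB = 2400 TB)
```

The core count comes to 163.2 million, which the article rounds to 163 million. Both ends of the quoted MemoryX range work out to the same 20 bytes of capacity per parameter under the decimal-unit assumption.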
The Cerebras architecture not only scales capacity and performance, it also offers substantial acceleration for sparse neural networks. The AI community is working hard to develop new sparse models that can achieve the same level of accuracy while using less computing power, and such techniques are essential for reaching extreme scale in practice. Traditional designs, however, cannot accelerate these sparse networks. In contrast, the Cerebras hardware employs fine-grained dataflow scheduling so that computations are triggered only when they are needed. This can yield up to a 10X speedup from weight sparsity while also saving electricity.
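The idea of triggering work only for nonzero weights can be illustrated with a toy dot product in Python. This is a software analogy under simplifying assumptions, not a description of how the hardware schedules work; the `sparse_dot` function is hypothetical and simply counts how many multiplications are actually performed.

```python
def sparse_dot(weights: list[float], activations: list[float]) -> tuple[float, int]:
    """Dot product that skips zero weights; returns (result, multiplications done)."""
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0.0:
            continue          # a zero weight contributes nothing, so skip the work
        total += w * a
        multiplies += 1
    return total, multiplies


# 90% of the weights are zero: only 10 of 100 multiplications are needed.
weights = [0.0] * 90 + [2.0] * 10
activations = [1.0] * 100
result, work = sparse_dot(weights, activations)
print(result, work, len(weights) / work)
```

In this idealized count, 90% weight sparsity means one tenth of the multiplications. Whether that translates into a matching wall-clock speedup depends on whether the hardware can actually skip the zeros without overhead, which is the capability the article attributes to fine-grained dataflow scheduling.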
By combining weight streaming, MemoryX, and SwarmX, Cerebras aims to make building huge clusters as simple as pressing a button. Cerebras doesn’t try to mask distribution complexity with software. Instead, its fundamentally different design is meant to eliminate scaling complexity entirely. Because of the WSE-2’s scale, there’s no need to split a neural network’s layers across several CS-2 systems: even the layers of multi-trillion-parameter models can be mapped to a single CS-2 system.
With this architecture, researchers can simply compile the neural network mapping for a single CS-2 system. The Cerebras software takes care of the rest of the execution as they scale and eliminates the traditional distributed AI complications of memory partitioning, coordination, and synchronization across multiple small devices.