Intel’s Habana Labs Introduces Second-Generation AI Deep Learning Processors

This article is based on the research article 'Habana® Labs Launches Second-generation AI Deep Learning Processors'. All credit for this research goes to Intel researchers.


Today, Intel’s Habana Labs team announced two important new products: Gaudi2, the second iteration of the Gaudi deep learning training processor, and Greco, the successor to the Goya deep learning inference processor. Intel claims both are significantly faster than their predecessors and competing products.

Gaudi2 and Greco are the first new chips launched by Habana Labs since its acquisition by Intel. Both move from a 16nm to a 7nm process (manufactured by TSMC). The 10 Tensor processor cores in the original Gaudi training processor have been increased to 24 in Gaudi2, and the in-package memory capacity has tripled from 32GB (HBM2) to 96GB (HBM2E). The onboard SRAM has doubled from 24MB to 48MB. Habana claims Gaudi2 is the first and only accelerator with this much in-package memory. The processor has a TDP of 600W (versus 350W for Gaudi), although Habana says it still uses passive cooling and does not require liquid cooling.

On ResNet-50, Gaudi2 achieved 3.2x the training throughput of Gaudi, 1.9x that of an 80GB Nvidia A100, and 4.1x that of an Nvidia V100. On several other benchmarks the gap was even wider: Gaudi2 outperformed the 80GB A100 by 2.8x on BERT Phase-2 training throughput. Comparisons against the V100 and A100 matter because both are widely used in the cloud and on-premises.

Gaudi2 also includes support for the FP8 data type, enabling quicker training and improved memory use for very large models. It is now available to Habana customers in a mezzanine-card form factor and as part of the HLS-Gaudi2 server, which is designed to let clients evaluate Gaudi2. The server includes eight Gaudi2 cards and a dual-socket Intel Xeon subsystem. One thousand Gaudi2 processors have already been deployed for software optimization and for the development of the Gaudi3 processor.
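The memory benefit of FP8 is straightforward to quantify: each weight takes one byte instead of the two bytes of BF16 or the four of FP32. The sketch below is illustrative arithmetic only (it is not Habana's API, and the 10-billion-parameter model is a hypothetical example) showing why lower-precision formats matter for fitting large models into Gaudi2's 96GB of in-package memory.

```python
# Bytes needed to store one model weight in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory required to hold `num_params` weights in the given format, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

# Hypothetical 10-billion-parameter model: FP8 quarters the FP32 footprint.
params = 10_000_000_000
for dtype in ("fp32", "bf16", "fp8"):
    print(f"{dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Note that this counts only the weights; optimizer state, activations, and gradients add further memory pressure during training, which is where the tripled HBM capacity also helps.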


Greco succeeds Goya as Habana's inference processor, shrinking the highly efficient Goya design to 7nm. The on-card memory moves from DDR4 to LPDDR5, doubling the bandwidth, and the on-chip memory grows from 50MB to 128MB. Greco also switches from a dual-slot to a single-slot form factor, lowering the TDP from 200W to 75W. Thanks to the smaller form factor and lower power draw, users can double the number of accelerators in the same host system.
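The density claim follows from two constraints at once: physical slots and the host's power budget. The back-of-envelope sketch below uses the slot widths and TDPs from the article, but the 8-slot chassis and 800W accelerator power budget are assumed example numbers, not figures from the announcement.

```python
def cards_that_fit(slots: int, power_budget_w: int, slot_width: int, tdp_w: int) -> int:
    """Max accelerator cards a host can take, limited by whichever runs out
    first: physical slot space or the accelerator power budget."""
    return min(slots // slot_width, power_budget_w // tdp_w)

# Hypothetical 8-slot server with an 800W budget for accelerators:
goya_cards = cards_that_fit(8, 800, slot_width=2, tdp_w=200)   # dual-slot, 200W
greco_cards = cards_that_fit(8, 800, slot_width=1, tdp_w=75)   # single-slot, 75W
print(goya_cards, greco_cards)
```

Under these assumptions the host goes from 4 Goya cards (slot- and power-limited alike) to 8 Greco cards, matching the "double the accelerators per host" claim.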

Gaudi2 and Greco are the newest entrants in the increasingly crowded AI accelerator arms race, which includes Nvidia’s GPUs and specialized accelerators from Cerebras, Graphcore, and SambaNova. Intel’s Xe GPU family won’t match the price-performance consistency that an ASIC like Gaudi2 can provide at scale for AI, but it offers a scalable, adaptable environment.


๐Ÿ Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...