This AI Paper from Intel Presents a SYCL Implementation of Fully Fused Multi-Layer Perceptrons (MLPs) on Intel Data Center GPU Max

In the field of Artificial Intelligence (AI), Multi-Layer Perceptrons (MLPs) are the foundation for many Machine Learning (ML) tasks, including partial differential equation solving, density function representation in Neural Radiance Fields (NeRFs), and ray tracing simulation using Neural Ray Tracing.

Fully connected layers, in which every neuron is connected to every neuron in the preceding and following layers, are a defining characteristic of MLPs. In contrast to certain other topologies, every neuron’s output in an MLP is independent of the outputs of the other neurons in the same layer. This property makes MLPs well suited to fully fusing their operations, which is essential for some computational workloads.
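
As an illustration of this independence property, here is a minimal pure-Python sketch of an MLP forward pass. The function names (`dense_layer`, `mlp_forward`), the ReLU activation, and the toy weights are assumptions chosen for the example and are not taken from the paper.

```python
def relu(x):
    """Rectified linear unit, used here as an example activation."""
    return x if x > 0.0 else 0.0


def dense_layer(inputs, weights, biases):
    """One fully connected layer: every output neuron reads all inputs."""
    outputs = []
    for w_row, b in zip(weights, biases):
        # Each neuron depends only on the previous layer's outputs,
        # never on the other neurons in its own layer.
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(relu(z))
    return outputs


def mlp_forward(x, layers):
    """Apply the layers in order; `layers` is a list of (weights, biases)."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x


# Toy network (hypothetical values): 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
result = mlp_forward([2.0, 1.0], layers)
print(result)
```

Because the loop body for one neuron never reads another neuron's result from the same layer, the iterations of that loop could run in parallel, which is what makes a whole layer map cleanly onto matrix hardware.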

In recent research, a team from Intel Corporation and Ecole Polytechnique has focused on efficiently implementing narrow MLPs on Intel GPUs. Narrow MLPs have a small, fixed number of neurons per layer (the width) and a shallow depth, i.e., a small number of layers. Despite their narrow width, they are universal approximators and are significant in a wide range of applications. Their narrow width, however, limits their performance, leading to low memory bandwidth utilization and low arithmetic intensity during training and inference.

Fusing the layers into a single kernel is a popular solution to these problems, as it keeps data in faster memories such as caches, shared memory, and register files. This approach, called ‘fully-fused MLPs,’ was previously implemented in CUDA for Nvidia GPUs.
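
To see why fusing helps, consider a deliberately simplified count of the elements moved to and from global memory during inference. This is a back-of-the-envelope model and not the analysis from the paper: it assumes an unfused implementation reads and writes the full activation matrix for every layer, while a fused kernel reads the input once, keeps intermediate activations on chip, and writes only the final output. The function names and the chosen sizes are hypothetical.

```python
def unfused_traffic(batch, width, n_layers):
    """Elements moved through global memory when each layer is its own kernel."""
    per_layer = (
        batch * width      # read input activations
        + width * width    # read the layer's weight matrix
        + batch * width    # write output activations
    )
    return n_layers * per_layer


def fused_traffic(batch, width, n_layers):
    """Elements moved when all layers run inside a single fused kernel."""
    return (
        batch * width                  # read the network input once
        + n_layers * width * width     # read every weight matrix once
        + batch * width                # write the final output once
    )


batch, width, n_layers = 2**16, 64, 4  # example sizes, chosen for illustration
unfused = unfused_traffic(batch, width, n_layers)
fused = fused_traffic(batch, width, n_layers)
print(unfused, fused, unfused / fused)
```

Under these assumptions the fused kernel moves roughly a factor of `n_layers` less data when the batch is much larger than the width, since activation traffic dominates weight traffic.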

The team has shared that the goal of this study is to implement fully-fused MLPs in SYCL for Intel GPUs, with arbitrary depth and a fixed layer width of 2^i neurons, where i ranges from 4 to 7 (that is, 16 to 128 neurons). These MLPs are effective universal approximators in spite of the fixed layer width. The implementation is based on Intel’s joint matrix SYCL extensions and uses the XMX hardware in Intel’s Data Center GPU Max 1550.

This technique is especially well suited to models requiring high data throughput with batch sizes of 2^i, where i is greater than 15. Compared with a similar CUDA implementation, the SYCL version on Intel hardware performs better, particularly for MLPs of width 64. The study also indicates that this method requires fewer accesses to global memory than prior ones, which speeds up inference and raises the theoretical peak performance.

Benchmarks and applications, including Image Compression, Neural Radiance Fields (NeRFs), and Physics-Informed Machine Learning, have been tested to demonstrate the performance improvements and possible use cases. The proposed approach performs significantly better in all cases than off-the-shelf implementations such as the CUDA PyTorch version on Nvidia’s H100 GPU and Intel Extension for PyTorch (IPEX) on the same Intel GPU.

The team has summarized their primary contributions as follows.

  1. The first SYCL implementation of fully-fused Multi-Layer Perceptrons designed for Intel GPUs using XMX instructions has been introduced.
  2. The performance of the implementation has been assessed using a roofline model, which shows an increase in arithmetic intensity of up to 2.15 times compared with an existing fully-fused CUDA implementation.
  3. Four sample applications have been used to validate the higher performance: a regression benchmark, image compression, Neural Radiance Fields, and physics-informed neural networks.
  4. The implementation trains up to 1.75 times faster and runs inference up to 2.84 times faster than another fully-fused implementation. Its effectiveness across a variety of tasks and datasets has been further demonstrated by the performance improvement of up to 30 times that it delivers over off-the-shelf PyTorch implementations.
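
The roofline model mentioned in the contributions bounds attainable throughput by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The sketch below uses that standard formula with hypothetical hardware numbers (they are not the specifications of any real GPU and not figures from the paper) to show how a 2.15 times higher arithmetic intensity shifts the bound for a memory-bound kernel.

```python
def attainable_flops(peak_flops, bandwidth_bytes_per_s, intensity_flops_per_byte):
    """Roofline bound: limited by compute peak or by memory bandwidth."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flops_per_byte)


# Hypothetical device, for illustration only.
PEAK_FLOPS = 100.0e12      # 100 TFLOP/s compute peak (assumed)
BANDWIDTH = 1.0e12         # 1 TB/s memory bandwidth (assumed)

base_intensity = 10.0                       # FLOPs per byte (assumed)
improved_intensity = base_intensity * 2.15  # the reported upper-bound increase

base_bound = attainable_flops(PEAK_FLOPS, BANDWIDTH, base_intensity)
improved_bound = attainable_flops(PEAK_FLOPS, BANDWIDTH, improved_intensity)
bound_ratio = improved_bound / base_bound
print(base_bound, improved_bound, bound_ratio)
```

In this memory-bound regime the bound scales linearly with arithmetic intensity. Once the intensity is high enough for the kernel to become compute-bound, the `min` clamps the result to the compute peak and further increases in intensity no longer raise the bound.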

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
