Hidet: An Open-Source Python-based Deep Learning Compiler

Optimizing inference workloads has never been more critical in deep learning. Meet Hidet, an open-source deep learning compiler developed by a dedicated team at CentML Inc. This Python-based compiler aims to streamline the compilation process, offering end-to-end support for compiling DNN models from PyTorch and ONNX into efficient CUDA kernels, with a focus on NVIDIA GPUs.

Hidet grew out of research presented in the paper “Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.” The compiler addresses the challenge of reducing the latency of deep learning model inference, a vital aspect of efficient model serving across a variety of platforms, from cloud services to edge devices.

The development of Hidet is driven by the recognition that developing efficient tensor programs for deep learning operators is a complex task, given the intricacies of modern accelerators like NVIDIA GPUs and Google TPUs, coupled with the rapid expansion of operator types. While existing deep learning compilers, such as Apache TVM, leverage declarative scheduling primitives, Hidet takes a unique approach.

The compiler embeds the scheduling process into tensor programs, introducing dedicated mappings known as task mappings. These task mappings enable developers to define the computation assignment and ordering directly within the tensor programs, enriching the expressible optimizations by allowing fine-grained manipulations at a program-statement level. This innovative approach is referred to as the task-mapping programming paradigm.
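To make the idea of a task mapping more concrete, here is a minimal sketch in plain Python. It is not Hidet's API: the `TaskMapping` class and the helpers are illustrative assumptions, with `spatial` and `repeat` named after the basic mappings described in the Hidet paper. The sketch models a task mapping as a function from a worker id to the ordered list of task coordinates that worker executes, and shows how two mappings might be composed.

```python
from math import prod


class TaskMapping:
    """Illustrative model: maps a worker id to an ordered list of task coordinates."""

    def __init__(self, task_shape, num_workers, worker_to_tasks):
        self.task_shape = tuple(task_shape)
        self.num_workers = num_workers
        self.worker_to_tasks = worker_to_tasks

    def __call__(self, worker):
        return self.worker_to_tasks(worker)

    def __mul__(self, other):
        # Composition: the left mapping picks a coarse tile, the right mapping
        # picks a position inside that tile.
        shape = tuple(a * b for a, b in zip(self.task_shape, other.task_shape))

        def composed(worker):
            outer = self(worker // other.num_workers)
            inner = other(worker % other.num_workers)
            return [
                tuple(o * d + i for o, d, i in zip(t_outer, other.task_shape, t_inner))
                for t_outer in outer
                for t_inner in inner
            ]

        return TaskMapping(shape, self.num_workers * other.num_workers, composed)


def _unravel(index, shape):
    """Convert a flat row-major index into a coordinate tuple."""
    coords = []
    for dim in reversed(shape):
        coords.append(index % dim)
        index //= dim
    return tuple(reversed(coords))


def spatial(*shape):
    """One worker per task: worker w executes exactly one task."""
    return TaskMapping(shape, prod(shape), lambda w: [_unravel(w, shape)])


def repeat(*shape):
    """A single worker executes every task in the grid, in row-major order."""
    return TaskMapping(shape, 1, lambda w: [_unravel(i, shape) for i in range(prod(shape))])


# Four workers cover a 4x2 task grid; each worker handles two tasks in order.
mapping = repeat(2, 1) * spatial(2, 2)
for worker in range(mapping.num_workers):
    print(worker, mapping(worker))
```

In this sketch, both the assignment (which worker gets which task) and the ordering (the sequence in which a worker visits its tasks) are visible in the value returned for each worker, which is the kind of information the paradigm lets developers state directly in the tensor program.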

Additionally, Hidet introduces a post-scheduling fusion optimization, automating the fusion process after scheduling. This not only allows developers to focus on scheduling individual operators but also significantly reduces the engineering efforts required for operator fusion. The paradigm also constructs an efficient hardware-centric schedule space agnostic to program input size, thereby substantially reducing tuning time.
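One way to picture a schedule space that does not depend on input size is the plain-Python sketch below. It is an assumption-laden illustration, not Hidet's implementation: the candidate tile sizes, the 48 KiB shared-memory budget, and the function names `hardware_centric_space` and `grid_for` are all hypothetical. The point is only that the candidate list is derived from hardware limits, while the problem size affects just the number of thread blocks launched.

```python
from itertools import product


def hardware_centric_space(smem_limit_bytes=48 * 1024, dtype_bytes=4):
    """Enumerate matmul tile shapes that fit a shared-memory budget.

    The candidates depend only on hardware limits (assumed values here),
    never on the problem size M, N, K.
    """
    tile_sizes = [16, 32, 64, 128]   # hypothetical block tile sizes for M and N
    k_sizes = [8, 16, 32]            # hypothetical block tile sizes for K
    space = []
    for bm, bn, bk in product(tile_sizes, tile_sizes, k_sizes):
        # Shared memory needed to stage one tile of A (bm x bk) and B (bk x bn).
        smem = (bm * bk + bk * bn) * dtype_bytes
        if smem <= smem_limit_bytes:
            space.append((bm, bn, bk))
    return space


def grid_for(m, n, tile):
    """Number of thread blocks for an m x n output; edge tiles are partially full."""
    bm, bn, _ = tile
    return (-(-m // bm), -(-n // bn))  # ceiling division


space = hardware_centric_space()
# The same candidate list serves any input size; only the launch grid changes.
print(len(space), grid_for(1000, 1000, (64, 64, 16)), grid_for(4096, 512, (64, 64, 16)))
```

Because the candidate list is the same for every input shape, a tuner working over a space like this has a small, fixed number of schedules to try, which is consistent with the reduced tuning time the article describes.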

Extensive experiments on modern convolution and transformer models show Hidet outperforming state-of-the-art DNN inference frameworks such as ONNX Runtime, as well as the TVM compiler equipped with the AutoTVM and Ansor schedulers. On average, Hidet achieves a 1.22x speedup, with a maximum gain of 1.48x.

Beyond inference performance, Hidet also cuts tuning time substantially: compared to AutoTVM and Ansor, it reduces tuning time by 20x and 11x, respectively.

As Hidet continues to evolve, it’s setting new standards for efficiency and performance in deep learning compilation. With its approach to task mapping and fusion optimization, Hidet has the potential to become a cornerstone in the toolkit of developers seeking to push the boundaries of deep learning model serving.

Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.
