NVIDIA recently released version 12.0 of the CUDA Toolkit. This release, the first major update in some time, focuses on new programming models and on accelerating CUDA applications through new hardware capabilities. With it, developers can target architecture-specific features and instructions of the NVIDIA Hopper and NVIDIA Ada Lovelace architectures through custom CUDA code, updated libraries, and developer tools.
NVIDIA’s parallel computing platform, CUDA (Compute Unified Device Architecture), was created for general-purpose computing and is the foundation of GPGPU. It is a software layer that gives compute kernels direct access to the GPU’s virtual instruction set and parallel computational elements.
The new release adds support for the NVIDIA Hopper and NVIDIA Ada Lovelace architectures and brings enhancements for all GPUs, exposing new PTX instructions through higher-level C and C++ APIs. It delivers significant performance improvements through the updated CUDA dynamic parallelism APIs, and the CUDA Graphs API has also been improved. The release further adds support for the GCC 12 host compiler and C++20, and introduces nvJitLink, a new library for JIT LTO, along with several other updates.
Enhanced memory bandwidth, higher clock rates, and increased streaming multiprocessor (SM) counts in new GPU generations immediately benefit CUDA applications. On top of that, CUDA and the CUDA libraries expose new performance gains based on improvements to the GPU hardware architecture.
CUDA 12.0 introduces programmable capabilities for several features of the NVIDIA Hopper and NVIDIA Ada Lovelace architectures:
- Launch parameters to control membar domains on NVIDIA Hopper GPUs
- Support for the Hopper asynchronous transaction barrier in C++ and PTX
- Support for cooperative grid array (CGA) relaxed barriers via C intrinsics
A feature called “CUDA minor version compatibility,” introduced in CUDA 11.x, makes it possible to dynamically link an application against any minor version of the CUDA Toolkit within the same major release.
The CUDA Toolkit 12.0 now supports the C++20 standard. C++20 is enabled with the following host compilers and their base versions:
- GCC 10
- Clang 11
- NVC++ 22.x
- Arm C/C++ 22.x
- MSVC 2022
In C++20, modules are introduced as a new way to import and export entities between translation units. Note, however, that modules are not supported in CUDA C++, in either host or device code.
With this update, users of the NVIDIA Hopper and NVIDIA Ada Lovelace architectures and of the dynamic parallelism APIs can now target architecture-specific functionality.
This will be particularly beneficial to NVIDIA Hopper and Ada Lovelace users, who can use the CUDA Toolkit to create, optimize, and deploy their applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms, and HPC supercomputers.
Rishabh Jain is a consulting intern at MarktechPost. He is currently pursuing a B.Tech in Computer Science from IIIT, Hyderabad. He is a machine learning enthusiast with a keen interest in statistical methods in artificial intelligence and data analytics. He is passionate about developing better algorithms for AI.