PyTorch Releases Prototype Features to Execute Machine Learning Models on On-Device Hardware Engines

PyTorch has recently released four new prototype features. The first three enable mobile machine-learning developers to execute models on the full set of hardware (HW) engines that make up a system-on-chip (SOC). This allows developers to tune model execution for their specific performance, power, and system-level concurrency requirements.

The new features enable execution on the following on-device HW engines:

  • DSPs and NPUs via the Android Neural Networks API (NNAPI), developed in collaboration with Google Android
  • GPU execution on Android via Vulkan
  • GPU execution on iOS via Metal

ARM usage is growing in the PyTorch community, driven by platforms such as Raspberry Pi and Graviton(2). To improve developer efficiency on these platforms, the new release also adds support for ARM64 builds for Linux.

NNAPI Support with Google Android

PyTorch’s collaboration with the Google Android team brings support for Android’s Neural Networks API (NNAPI) to PyTorch Mobile. On-device machine learning runs ML models locally on the device without transmitting data to a server, which offers lower latency, improved privacy, and less reliance on network connectivity. NNAPI is designed for running computationally intensive machine learning operations on Android devices. Through it, machine learning models can access additional hardware blocks on the phone’s system-on-chip, allowing developers to unlock high-performance execution on Android phones. NNAPI lets Android apps run computationally intensive neural networks on the most powerful and efficient parts of the chips that power Android phones, including DSPs (digital signal processors) and NPUs (specialized neural processing units).

The API was first introduced in Android 8.1 and was significantly expanded in Android 10 and 11 to support a richer set of AI models. With this integration, developers can access NNAPI directly from PyTorch Mobile. The initial release includes fully functional support for a core set of features and operators, and Google and Facebook plan to expand its capabilities.

PyTorch Mobile GPU Support

GPU inference can provide excellent performance on many model types, particularly those that use high-precision floating-point math. Running machine learning models on GPUs such as those found in SOCs from Qualcomm, MediaTek, and Apple offloads work from the CPU, which frees the mobile CPU for non-machine-learning use cases. This initial prototype-level support for on-device GPUs is provided through the Metal API specification for iOS and the Vulkan API specification for Android. Because the feature is at an early stage, its performance is not yet optimized and model coverage is limited. The team expects both to improve significantly throughout 2021.
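As a rough illustration of how a model might be prepared for one of these GPU backends, the sketch below uses `optimize_for_mobile` from `torch.utils.mobile_optimizer`, which accepts a `backend` argument ("CPU", "Vulkan", or "Metal"). This workflow is not described in the announcement itself. The `TinyNet` model and the `prepare_for_mobile` helper are hypothetical names introduced here for illustration. The fallback to the CPU passes assumes that a PyTorch build without GPU backend support raises an exception when asked for the Vulkan or Metal passes.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class TinyNet(torch.nn.Module):
    """Hypothetical stand-in model; any scriptable module would do."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))


def prepare_for_mobile(model, preferred_backend="vulkan"):
    """Script the model and run the mobile optimization passes.

    Tries the preferred GPU backend first. If the local PyTorch build
    cannot run those passes (assumed here to surface as an exception),
    falls back to the default CPU passes.
    Returns the optimized module and the name of the backend used.
    """
    scripted = torch.jit.script(model.eval())
    try:
        optimized = optimize_for_mobile(scripted, backend=preferred_backend)
        return optimized, preferred_backend
    except Exception:
        # Assumption: stock desktop builds may lack Vulkan/Metal support.
        optimized = optimize_for_mobile(scripted, backend="cpu")
        return optimized, "cpu"


optimized, backend_used = prepare_for_mobile(TinyNet())
print("optimized for:", backend_used)
```

Passing `preferred_backend="metal"` would target iOS in the same way. Since the GPU backends are prototypes, the exact build requirements and supported operators may differ between PyTorch versions.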


Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields, and she is passionate about exploring new advancements in technology and their real-life applications.
