TensorFlow introduces the PluggableDevice architecture, which seamlessly integrates new accelerator devices with TensorFlow without requiring any changes to the TensorFlow core code. As the name suggests, PluggableDevice provides a plug-in mechanism for registering devices with TensorFlow. It is built on the StreamExecutor C API and on the work done for Modular TensorFlow. The PluggableDevice feature is available in TF 2.5.
Need for seamless integration
Previously, integrating a new device required changes to TensorFlow's core code. This approach was inflexible for several reasons:
- Complex build dependencies and compiler toolchains. Onboarding a new compiler is time-consuming and adds to the product's technical complexity.
- Slow development time. Changes to the code require code reviews from the TensorFlow team, which takes time and delays the development and testing of new features.
- Multiple build configurations to test. Changes to one device may affect other devices or components of TensorFlow, and each new device can multiply the number of test configurations.
- Easy to break. Without a contract via a well-defined API, it is easier to break a particular device.
The PluggableDevice mechanism requires no device-specific changes to the TensorFlow code. It relies on C APIs to interface with the TensorFlow binary in a stable manner, and each plug-in lives in its own code repository and distribution package. TensorFlow's build dependencies, toolchains, and testing procedures are therefore unaffected. The integration is also less brittle, since only changes to the C APIs or to the PluggableDevice components can affect the plug-in code.
The PluggableDevice mechanism has four major parts:
- PluggableDevice type: A new ‘device type’ in TensorFlow that enables device registration from plug-in packages. Users only need to install the plug-in in a specified directory, and the mechanism will discover it and register its capabilities automatically.
- Custom operations and kernels: Plug-ins use the Kernel and Op Registration C API to register their own operations and kernels to TensorFlow.
- Memory management and device execution: TensorFlow controls plug-in devices via the StreamExecutor C API.
- Custom graph optimization pass: Using the Graph Optimization C API, plug-ins can register one custom graph optimization pass that will be run after all regular Grappler passes.
How to use PluggableDevice
To use a particular device in TensorFlow, users only need to install the device plug-in package for that device. The example below installs and uses a hypothetical APU (Awesome Processing Unit). For the sake of simplicity, assume this APU plug-in has only one custom kernel, for ReLU.
```
$ pip install tensorflow-apu-0.0.1-cp36-cp36m-linux_x86_64.whl
...
Successfully installed tensorflow-apu-0.0.1
$ python
Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf   # TensorFlow registers PluggableDevices here
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:APU:0', device_type='APU')]
>>> a = tf.random.normal(shape=[5], dtype=tf.float32)  # Runs on CPU
>>> b = tf.nn.relu(a)  # Runs on APU
>>> with tf.device("/APU:0"):  # Users can also use 'with tf.device' syntax
...     c = tf.nn.relu(a)  # Runs on APU
...
>>> @tf.function  # Defining a tf.function
... def run():
...     d = tf.random.uniform(shape=[5], dtype=tf.float32)  # Runs on CPU
...     e = tf.nn.relu(d)  # Runs on APU
...
>>> run()  # PluggableDevices also work with tf.function and graph mode.
```
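To verify which device each operation actually runs on, TensorFlow can log op placement. The sketch below uses the real `tf.debugging.set_log_device_placement` API; on a machine without the (hypothetical) APU plug-in, the ops are reported on the CPU, while with the plug-in installed the ReLU would be reported on `/device:APU:0`:

```python
import tensorflow as tf

# Print the device on which each op executes. With a PluggableDevice
# plug-in installed, its ops would show up here (e.g. /device:APU:0);
# otherwise, placement falls to an available device such as the CPU.
tf.debugging.set_log_device_placement(True)

a = tf.random.normal(shape=[1, 2], dtype=tf.float32)
b = tf.nn.relu(a)

# The .device attribute also reports the placement of an eager tensor,
# e.g. /job:localhost/replica:0/task:0/device:CPU:0 on a CPU-only machine.
print(b.device)
```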
Upcoming plug-ins/PluggableDevices
Intel will be one of the first partners to release a PluggableDevice. Intel has submitted more than three RFCs implementing the overall mechanism and is about to release an Intel extension for TensorFlow (ITEX) plug-in package that will integrate Intel XPU with TensorFlow for AI workload acceleration. TensorFlow will also publish a detailed tutorial soon on how to develop a PluggableDevice plug-in. Questions can be asked on the TensorFlow Forum with the tag pluggable_device.