TensorFlow has released TensorFlow 2.4.0-rc4, a new release candidate. MultiWorkerMirroredStrategy is now a stable API, and the TensorFlow Profiler now supports profiling it and tracing multiple workers using the sampling mode API. This strategy enables synchronous distributed training across multiple workers, each potentially with multiple GPUs. Significant improvements include better handling of peer failures along with many bug fixes. See the Multi-worker training with Keras tutorial for details.
The internals of the Keras Functional API have also undergone a major refactoring, which improves the reliability, stability, and performance of constructing Functional models.
The update also adds support for TensorFloat-32 on NVIDIA Ampere-based GPUs. TensorFloat-32 (TF32) is a math mode for these GPUs and is enabled by default.
Because of TensorFloat-32, some float32 ops, including matmuls and convolutions, run at lower precision on Ampere-based GPUs: their inputs are rounded from 23 bits of mantissa precision to 10 bits. In a few cases, TensorFloat-32 is also used for complex64 ops. For workloads that need full float32 precision, TensorFloat-32 can be disabled by running tf.config.experimental.enable_tensor_float_32_execution(False).
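The rounding described above can be illustrated with a pure-Python simulation. This is a sketch, not TensorFlow's or NVIDIA's implementation: the helper to_tf32 (a hypothetical name) keeps the float32 sign and 8-bit exponent and rounds the 23-bit mantissa to its top 10 bits. It assumes round-to-nearest-even, and the rounding actually used by the hardware may differ.

```python
import struct

TF32_MANTISSA_BITS = 10  # explicit mantissa bits kept by TF32
FP32_MANTISSA_BITS = 23  # explicit mantissa bits in float32


def to_tf32(x: float) -> float:
    """Round a float32 value to TF32 precision (sketch; ignores NaN and inf).

    Assumption: round-to-nearest-even; real hardware rounding may differ.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    dropped = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # 13 low bits discarded
    kept_lsb = (bits >> dropped) & 1
    # Adding (half an ULP - 1 + kept LSB) and then clearing the dropped bits
    # implements round-to-nearest, ties-to-even. A carry out of the mantissa
    # correctly bumps the exponent.
    bits += (1 << (dropped - 1)) - 1 + kept_lsb
    bits &= ~((1 << dropped) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (1.0, 1 + 2**-11, 1 + 3 * 2**-11, 0.1):
    print(f"{value!r:>22} -> {to_tf32(value)!r}")
```

With this sketch, 1 + 2**-11 sits exactly halfway between two TF32 values and rounds to even (1.0). Meanwhile 0.1 becomes 0.0999755859375, a relative change of roughly 2.4e-4. That is far larger than float32's usual rounding error of about 6e-8, which is why precision-sensitive workloads may want to turn TF32 off.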
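Several C-API functions that are no longer relevant have been removed, namely TF_StringDecode, TF_StringEncode, and TF_StringEncodedSize. String access and modification in C is now done through core/platform/ctstring.h. Modules that are not part of the TensorFlow public API, such as tensorflow.python, tensorflow.core, and tensorflow.compiler, are now hidden.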
The steps_per_execution argument of compile() is now stable. It runs multiple batches inside a single tf.function call, which can improve performance on TPUs or for small models with a large Python overhead. Because the internals of the Keras Functional API were substantially refactored, code that relies on certain internal details may be affected.
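As a rough illustration of steps_per_execution, the sketch below compiles a tiny Functional model so that each tf.function call runs eight training batches. It assumes TensorFlow 2.4 or later. The model, the synthetic data, and the value 8 are arbitrary choices made for the example.

```python
import numpy as np
import tensorflow as tf


def build_model(steps_per_execution: int) -> tf.keras.Model:
    """A tiny Functional model; small models are where Python overhead dominates."""
    inputs = tf.keras.Input(shape=(4,))
    outputs = tf.keras.layers.Dense(1)(inputs)
    model = tf.keras.Model(inputs, outputs)
    # Run `steps_per_execution` batches inside each tf.function call.
    model.compile(optimizer="sgd", loss="mse",
                  steps_per_execution=steps_per_execution)
    return model


# Synthetic regression data (illustrative only): the target is the row sum.
rng = np.random.default_rng(0)
x = rng.random((256, 4), dtype=np.float32)
y = x.sum(axis=1, keepdims=True)

# 256 samples / batch_size 16 = 16 batches per epoch, which should be
# executed as 2 tf.function calls of 8 batches each.
model = build_model(steps_per_execution=8)
history = model.fit(x, y, batch_size=16, epochs=3, verbose=0)
print(history.history["loss"])
```

The best value is workload-dependent. Larger values cut per-batch Python overhead, but callbacks' per-batch hooks are then only invoked once per execution, that is, every N batches.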
tf.data.experimental.service.DispatchServer and tf.data.experimental.service.WorkerServer now take a config tuple instead of individual arguments. Existing code should be updated to call tf.data.experimental.service.DispatchServer(dispatcher_config) and tf.data.experimental.service.WorkerServer(worker_config), respectively. Grouping the settings into a single config object makes it easier to pass many options at once.
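A minimal local sketch of the config-based constructors, assuming TensorFlow 2.4 or later. For demonstration, the dispatcher and a single worker run in the same process on localhost. In practice they would usually run on separate machines. The toy pipeline and the "parallel_epochs" processing mode are illustrative choices.

```python
import tensorflow as tf

service = tf.data.experimental.service

# All dispatcher settings travel in one DispatcherConfig; port=0 lets the
# OS pick a free port (illustrative: both servers live in this process).
dispatcher = service.DispatchServer(service.DispatcherConfig(port=0))

# The worker is pointed at the dispatcher through a WorkerConfig.
# dispatcher.target looks like "grpc://localhost:<port>"; the worker
# expects just the "host:port" part.
dispatcher_address = dispatcher.target.split("://")[1]
worker = service.WorkerServer(
    service.WorkerConfig(dispatcher_address=dispatcher_address))

# Distribute a toy pipeline through the service and read it back.
dataset = tf.data.Dataset.range(10)
dataset = dataset.apply(service.distribute(
    processing_mode="parallel_epochs", service=dispatcher.target))
elements = [int(x) for x in dataset.as_numpy_iterator()]
print(sorted(elements))
```

With one worker and "parallel_epochs", that worker processes the whole dataset, so each of the ten elements should come back exactly once.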
Several built-in APIs have also been renamed or extended with new features in this release.
Bug fixes and other changes
- Calling ops with Python constants or NumPy values is now consistent with tf.convert_to_tensor behavior. This prevents operations like tf.reshape from truncating inputs, for example from int64 to int32.
- tf.data service: adds support for dispatcher fault tolerance.
- tf.data service: adds support for sharing dataset graphs through a shared file system rather than over RPC. This reduces the load on the dispatcher and improves the performance of distributing datasets.
- Improvements from the Functional API refactoring:
  - Functional model construction no longer needs to maintain a global workspace graph, which removes memory leaks, especially when building many models or very large models.
  - Functional model construction should be ~8-10% faster on average.
  - Functional models can now contain non-symbolic values in their call inputs inside the first positional argument.
  - Several classes of TF ops that were not reliably converted to Keras layers during Functional API construction should now work, e.g., tf.image.ssim_multiscale.
  - Error messages when Functional API construction goes wrong (or when ops cannot be converted to Keras layers automatically) should be more precise and easier to understand.
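The two tf.data service items in the list above, dispatcher fault tolerance and graph sharing through a shared file system, both revolve around the dispatcher's working directory. Below is a minimal local sketch, assuming TensorFlow 2.4 or later. A temporary directory stands in for a shared file system, and the squaring pipeline is arbitrary.

```python
import tempfile

import tensorflow as tf

service = tf.data.experimental.service

# Directory where the dispatcher journals its state so it can recover after
# a restart. In a real cluster this would live on a shared file system that
# workers can also read, letting them load dataset graphs from it instead of
# over RPC. Assumption: a local temp dir is enough for this demo.
work_dir = tempfile.mkdtemp()

dispatcher = service.DispatchServer(service.DispatcherConfig(
    port=0, work_dir=work_dir, fault_tolerant_mode=True))
worker = service.WorkerServer(service.WorkerConfig(
    dispatcher_address=dispatcher.target.split("://")[1]))

# A toy pipeline whose graph (including the map function) is sent to the worker.
dataset = tf.data.Dataset.range(5).map(lambda x: x * x)
dataset = dataset.apply(service.distribute(
    processing_mode="parallel_epochs", service=dispatcher.target))
squares = sorted(int(x) for x in dataset.as_numpy_iterator())
print(squares)
```

Note that fault_tolerant_mode requires work_dir to be set. If workers cannot read the work_dir, the release is described as falling back to RPC for transferring the dataset graph.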
Overall, this release brings much-needed changes: it adds features that improve performance and removes APIs that are no longer relevant. These improvements should help developers build more reliable ML models.