PyTorch Profiler v1.9 has been released. It is the newest in a series of releases meant to give you new tools for debugging machine learning performance issues, whether you train on one machine or many. The objective is to pinpoint the execution steps that consume the most time or memory, and then visualize where the bottleneck lies, whether on the GPUs or the CPUs.
The five major features in this release are: Distributed Training View, Memory View, GPU Utilization Visualization, Cloud Storage Support, and Jump to Source Code.
Distributed training can be an efficient way to train models, but issues often arise when splitting the workload across worker nodes. The Distributed Training View enables you to diagnose and debug these problems as they happen. Memory View gives you a better understanding of your memory usage and can help you avoid the infamous Out Of Memory error. The tool shows active allocations at various points in your program’s run, so if something takes up more space than expected, it won’t come as an unwelcome surprise!
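As a minimal sketch of how the memory data behind Memory View is collected, the snippet below profiles a toy model (the model and input sizes here are illustrative, not from the release notes) with `profile_memory=True`, which records tensor allocations and frees:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# A small illustrative model; any workload can be profiled this way.
model = torch.nn.Linear(512, 512)
inputs = torch.randn(8, 512)

# profile_memory=True records allocation events; record_shapes=True
# additionally captures the shapes of operator inputs.
with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,
    record_shapes=True,
) as prof:
    model(inputs)

# Print the operators that allocated the most CPU memory.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```

The same trace, when written out for the TensorBoard plugin, is what the Memory View panel visualizes over the course of a run.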
GPU Utilization Visualization helps you confirm that your GPU is fully utilized rather than sitting idle. As for Cloud Storage Support, the TensorBoard plugin can now read profiling data from Azure Blob Storage, Amazon S3, and Google Cloud Platform.
Jump to Source Code is a fantastic new feature that lets you go straight from a profiling result to the line of code that produced it. This makes it much faster to act on profiling results and optimize your application, so make sure not to miss out!
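A short sketch of a profiling run set up for the TensorBoard plugin: `with_stack=True` records Python source locations (which is what powers source-code links in the plugin), and `tensorboard_trace_handler` writes the trace to a log directory. The log path and step counts here are arbitrary choices for illustration:

```python
import torch
from torch.profiler import (
    profile, ProfilerActivity, schedule, tensorboard_trace_handler,
)

# Illustrative workload.
model = torch.nn.Linear(512, 512)
inputs = torch.randn(8, 512)

# schedule(wait=1, warmup=1, active=2): skip 1 step, warm up for 1,
# then record 2 active steps before the trace is flushed.
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
    on_trace_ready=tensorboard_trace_handler("./log/profiler_demo"),
    with_stack=True,  # record source locations for each op
) as prof:
    for _ in range(4):
        model(inputs)
        prof.step()  # advance the profiler's step schedule
```

After the run, pointing TensorBoard at the log directory (e.g. `tensorboard --logdir ./log/profiler_demo`) loads the trace into the plugin's views.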