Amazon has introduced the Amazon S3 plugin for PyTorch, Facebook's machine learning framework. The open-source library, built for use with the deep learning framework, lets data scientists stream datasets stored in Amazon Web Services (AWS) Simple Storage Service (S3) buckets and access them with greater ease.
The new Amazon S3 plugin for PyTorch is a high-performance dataset library that gives access to data in the cloud without having to provision local storage. It streams large volumes of data efficiently and is meant to be easy to use at any level of experience or expertise.
The plugin is designed to transfer data from Amazon S3 at high throughput, so users do not have to worry about thread safety or managing multiple connections themselves. It can also stream data in parallel, work with .zip or .tar archives, and shuffle datasets across shards as needed.
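To make the archive handling more concrete, here is a minimal sketch of reading a .tar archive as a forward-only stream once its bytes are available. It uses only Python's standard library and does not call the plugin's API. The helper names (build_tar_blob, iter_tar_members) are hypothetical, and the archive is built in memory as a stand-in for an object fetched from Amazon S3.

```python
import io
import tarfile


def build_tar_blob(files):
    """Pack a {name: bytes} mapping into an in-memory tar archive.

    Hypothetical helper: stands in for an archive stored in an S3 bucket.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def iter_tar_members(blob):
    """Yield (name, bytes) pairs from a tar archive, one member at a time."""
    # Mode "r|*" treats the file object as a non-seekable stream, so members
    # are read sequentially instead of loading an index of the whole archive.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|*") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()


samples = {"a.txt": b"first sample", "b.txt": b"second sample"}
blob = build_tar_blob(samples)
extracted = dict(iter_tar_members(blob))
```

Because the archive is consumed sequentially, each sample can be handed to the training loop as soon as it has been read, rather than after the whole archive has been unpacked to local storage.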
The benefits of the Amazon S3 plugin for PyTorch include:
- PyTorch supports two types of datasets: map-style and iterable-style. The Amazon S3 plugin for PyTorch provides the flexibility to use either, based on your needs.
- Training data in various formats can be used to train machine learning models with this plugin. The plugin is file-format agnostic and presents each object on Amazon S3 as a binary buffer (blob). Additional transformations can be applied to the input received from Amazon S3.
- The plugin provides a way to shuffle data in memory within shards using ShuffleDataset, or across shards by providing an input parameter.
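The two shuffling levels mentioned in the list above can be sketched in plain Python. This is a generic illustration, not the plugin's implementation: the function names, the buffer_size and shuffle_shards parameters, and the shard contents are all assumptions made for the example. Each sample is a binary blob that is decoded only after it has been read, which mirrors the format-agnostic behavior described above.

```python
import random


def shuffle_within_shard(samples, buffer_size, rng):
    """Yield samples in a locally shuffled order using a bounded buffer.

    Hypothetical helper: only buffer_size samples are held in memory at once.
    """
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end, then emit it.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Flush whatever is left once the shard is exhausted.
    rng.shuffle(buffer)
    yield from buffer


def iter_shards(shards, buffer_size, rng, shuffle_shards=True):
    """Visit shards in (optionally) random order, shuffling inside each one."""
    order = list(shards)
    if shuffle_shards:
        rng.shuffle(order)  # shuffling across shards: randomize shard order
    for name in order:
        yield from shuffle_within_shard(shards[name], buffer_size, rng)


# Hypothetical shards: each maps a name to a list of binary blobs.
shards = {
    "shard-0": [b"0", b"1", b"2"],
    "shard-1": [b"3", b"4", b"5"],
}
rng = random.Random(0)

# Decode each blob after reading it; here the "format" is just ASCII digits.
epoch = [int(blob.decode()) for blob in iter_shards(shards, buffer_size=2, rng=rng)]
```

In this sketch every sample appears exactly once per epoch, and samples from the same shard stay together, because only the order of the shards and the order inside each shard are randomized.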
The Amazon S3 plugin for PyTorch is available through pre-configured PyTorch Docker images and the GitHub repository.