Facebook AI Open Sources AugLy: A New Python Library For Data Augmentation To Develop Robust Machine Learning Models

Facebook has recently open-sourced AugLy, a new Python library that aims to help AI researchers use data augmentations to evaluate and improve the robustness of their machine learning models. AugLy provides data augmentation tools for creating samples to train and test different systems.

AugLy is an open-source data augmentation library that covers four modalities (audio, image, video, and text), which are becoming increasingly significant in several AI research fields. It offers over 100 data augmentations, with a focus on the kinds of transformations that people apply to images and videos on platforms like Facebook and Instagram.

As data sets and models become increasingly multimodal, it’s helpful to be able to transform all of a project’s data under one unified library and API. Moreover, combining different modalities using real-world augmentations can help machines better understand complex content.

The set of data augmentations provided in AugLy is also directly derived from the types of data transformations observed on Facebook’s platforms. Therefore, AugLy will be particularly beneficial for people working on models or data related to social media applications.

AugLy has four sub-libraries, each associated with a different modality, and all of them follow a similar interface. AugLy can also generate metadata to help users understand how their data was transformed. Facebook has aggregated several augmentations from different existing libraries and added some that it wrote itself.
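
To make the idea of a shared interface with optional metadata concrete, here is a minimal sketch in plain Python. It is not AugLy's code: the functions `upper_case` and `insert_separator`, the helper `_record`, and the metadata fields are all hypothetical, chosen only to show how augmentations with the same call shape can each log what they did to the data.

```python
from typing import Dict, List, Optional


def _record(metadata: Optional[List[Dict]], name: str, src: str, dst: str, **params) -> None:
    """Append a description of one transformation, if the caller passed a list."""
    if metadata is not None:
        metadata.append(
            {"name": name, "src_length": len(src), "dst_length": len(dst), **params}
        )


def upper_case(text: str, metadata: Optional[List[Dict]] = None) -> str:
    """Hypothetical augmentation: convert the text to upper case."""
    out = text.upper()
    _record(metadata, "upper_case", text, out)
    return out


def insert_separator(text: str, sep: str = ".", metadata: Optional[List[Dict]] = None) -> str:
    """Hypothetical augmentation: put a separator between every character."""
    out = sep.join(text)
    _record(metadata, "insert_separator", text, out, sep=sep)
    return out


# Chain two augmentations and collect one metadata entry per step.
meta: List[Dict] = []
result = insert_separator(upper_case("spam", metadata=meta), sep="-", metadata=meta)
print(result)
for entry in meta:
    print(entry)
```

Because every function accepts the same optional `metadata` list, a caller can chain any number of augmentations and inspect afterwards which ones ran and with which parameters.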


Data augmentations are essential to enhance the robustness of AI models. For example, if models can be taught to be robust to perturbations of unimportant data attributes, they will learn to focus on the attributes that matter for a particular use case.

According to Facebook, one important application is detecting exact copies or near-duplicates of a specific piece of content. For example, the same piece of misinformation can appear repeatedly in slightly different forms. Models trained on AugLy-augmented data can learn to recognize when a user is uploading content that is known to be infringing, which in turn helps to proactively prevent such uploads.

Apart from training models, AugLy can also be used to measure how robust a model is to a given set of augmentations. AugLy was also used to evaluate the robustness of deepfake detection models in the Deepfake Detection Challenge.
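
The evaluation use case can be illustrated with a small, self-contained sketch: score one model on the same labeled samples under several augmentations and compare the results. Everything here is an assumption made for illustration, including the toy keyword classifier, the four sample texts, and the `upper_case` and `leetspeak` transformations. None of it is taken from AugLy or from the Deepfake Detection Challenge.

```python
from typing import Callable, Dict, List, Tuple

# Tiny labeled data set of (text, is_spam) pairs. Purely illustrative.
SAMPLES: List[Tuple[str, bool]] = [
    ("win free money now", True),
    ("free prize inside", True),
    ("meeting at noon", False),
    ("see you tomorrow", False),
]


def toy_classifier(text: str) -> bool:
    """Flag text as spam if it contains the exact lowercase word 'free'."""
    return "free" in text.split()


def identity(text: str) -> str:
    """Baseline: leave the text unchanged."""
    return text


def upper_case(text: str) -> str:
    """Hypothetical augmentation: convert the text to upper case."""
    return text.upper()


def leetspeak(text: str) -> str:
    """Hypothetical augmentation: swap a few letters for look-alike digits."""
    return text.translate(str.maketrans({"e": "3", "o": "0", "i": "1"}))


def accuracy(model: Callable[[str], bool], augment: Callable[[str], str]) -> float:
    """Fraction of samples the model labels correctly after augmentation."""
    correct = sum(model(augment(text)) == label for text, label in SAMPLES)
    return correct / len(SAMPLES)


AUGMENTATIONS: Dict[str, Callable[[str], str]] = {
    "none": identity,
    "upper_case": upper_case,
    "leetspeak": leetspeak,
}

report = {name: accuracy(toy_classifier, aug) for name, aug in AUGMENTATIONS.items()}
for name, acc in report.items():
    print(f"{name}: {acc:.2f}")
```

In this toy setup the classifier is perfect on clean text but misses every spam sample once the text is upper-cased or rewritten in leetspeak. A per-augmentation report of this kind is meant to surface exactly that sort of brittleness.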

Many of the augmentations in AugLy are based on the ways people transform content to try to evade automatic systems. As a result, AugLy can aid researchers working on a range of tasks, from object detection to hate speech identification to voice recognition.


GitHub: https://github.com/facebookresearch/AugLy
