Facebook AI has released Dynabench, an ambitious new research platform for dynamic data collection and benchmarking. It is one of the first AI benchmarking platforms in which benchmarking happens dynamically over multiple rounds: it works by testing machine learning systems and asking adversarial human annotators to break them.
While there has been significant progress in AI research benchmarks, from MNIST to ImageNet to GLUE, we are still far from having machines that can truly understand natural language. Dynabench creates new, challenging datasets by pairing humans with models in order to measure NLP models more accurately. This process reveals where gaps in current models exist, which allows the next generation of AI models to be trained in the loop. It also measures how easily humans can fool AI models in a dynamic environment rather than against a static benchmark.
Dynabench uses a novel procedure called dynamic adversarial data collection to improve current AI benchmarking practices. This new approach to evaluating the robustness (or brittleness) of ML systems goes beyond the traditional static training-set paradigm.
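To make the idea concrete, here is a minimal toy sketch of one round of dynamic adversarial data collection. All names and the toy "model" are hypothetical illustrations, not the actual Dynabench API: annotators submit examples they believe will fool the current model, the examples that succeed are collected, and the model is updated on them for the next round.

```python
# Hypothetical sketch of dynamic adversarial data collection
# (illustrative only; not the real Dynabench interface).

def collect_round(model, annotator_attempts):
    """Keep only the annotator attempts that the current model gets wrong."""
    fooling_examples = []
    for text, true_label in annotator_attempts:
        if model(text) != true_label:  # the model was fooled
            fooling_examples.append((text, true_label))
    return fooling_examples

def retrain(memory, new_examples):
    """Toy 'retraining': memorize the collected adversarial examples."""
    memory.update(dict(new_examples))
    # Fall back to the old naive behavior for unseen inputs.
    return lambda text: memory.get(text, "positive")

# Round 1: a naive sentiment model that always predicts "positive".
model = lambda text: "positive"
memory = {}

attempts = [
    ("The movie was great", "positive"),     # does not fool the model
    ("Not bad at all!", "positive"),         # does not fool the model
    ("I expected more, sadly", "negative"),  # fools the model -> collected
]

collected = collect_round(model, attempts)
model = retrain(memory, collected)

# Round 2: the updated model now handles the previously fooling example,
# and annotators would try to break it again with harder examples.
```

The key design point is the loop itself: each round's collected examples both measure the current model's weaknesses and become training data for the next, stronger model.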
With these benchmarking innovations in Dynabench, we can hope that future AI systems will make fewer mistakes, exhibit fewer harmful biases, and be more useful in real-world applications.
Related Paper: https://arxiv.org/pdf/1910.14599.pdf
Related GitHub: https://github.com/facebookresearch/anli