Meet MultiRay: Meta AI’s New Platform For Efficiently Running Large-Scale Artificial Intelligence (AI) Models

Today’s state-of-the-art AI systems for handling text, images, and other modalities achieve their best performance by first training a massive model on a massive quantity of data and then training that model to specialize in a single job (for example, identifying harmful speech). The result is a high-quality but expensive specialized tool. When there are many problems to solve, the cost of maintaining so many massive models quickly escalates out of control. As a result, huge state-of-the-art models are rarely used in production, and considerably smaller and simpler models are typically deployed instead.

Meta AI researchers have built MultiRay, a new platform for running cutting-edge AI models at massive scale, with the goal of making AI systems more efficient. With MultiRay, numerous models can share the same input, so each model uses only a fraction of the processing time and resources, which lowers the overall cost of these AI-based operations. Concentrating the company’s computation in a single model also makes it easier to deploy AI accelerators and to trade off compute against data storage. The universal models in MultiRay are trained to perform well across a wide variety of applications.

Machine learning (ML) models for various uses, such as subject tagging of posts and hate speech detection, can be developed and refined by teams across Meta with the help of MultiRay. This method is more time- and labor-efficient than having multiple teams construct huge end-to-end models independently.

MultiRay makes Meta’s large core models more accessible in two ways: it offloads computation to specialized hardware such as graphics processing units (GPUs), and it minimizes the time and energy spent on recomputation by keeping frequently used data in a cache. MultiRay currently powers over 125 use cases across Meta, supporting up to 20 million queries per second (QPS) and serving 800 billion queries per day.
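As a quick sanity check on those two figures, the short calculation below converts the daily volume into an average rate and compares it with the quoted maximum. It assumes the 20 million QPS figure is a peak and that the 800 billion queries are counted over a full 24-hour day; neither assumption is stated in the source.

```python
# Back-of-the-envelope check of the throughput figures quoted in the article.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_queries = 800e9  # 800 billion queries per day (from the article)
peak_qps = 20e6        # "up to" 20 million QPS (treated here as a peak; an assumption)

average_qps = daily_queries / SECONDS_PER_DAY
peak_to_average = peak_qps / average_qps

print(f"average QPS: {average_qps:,.0f}")         # average QPS: 9,259,259
print(f"peak / average: {peak_to_average:.2f}")   # peak / average: 2.16
```

Under these assumptions the two numbers are consistent: the average load is a little over 9 million QPS, roughly half of the quoted maximum.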

MultiRay uses huge foundational models that map each input to a point in a high-dimensional vector space. This point is called an embedding, and it is a representation of the input that is more amenable to machine learning. To simplify task-specific models, MultiRay provides an embedding of the input data (such as text and images) that they can consume in place of the raw input. MultiRay’s core models are trained to perform well on various tasks, including similarity and classification. Because they must carry more information to serve so many tasks, MultiRay’s embeddings are large (several kilobytes in size).
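The idea of computing one embedding and letting several small task-specific models consume it can be sketched as follows. This is a minimal illustration, not MultiRay's actual code: `embed`, `linear_head`, the two heads, and the dimension of 1024 are all hypothetical, and the hash-based `embed` is only a deterministic stand-in for a real foundational model.

```python
import hashlib
import struct

EMBED_DIM = 1024  # assumed dimension; 1024 float32 values occupy 4 KB


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a large foundational model.

    Produces a deterministic pseudo-embedding from SHA-256 digests so the
    example runs without any ML library.
    """
    values: list[float] = []
    counter = 0
    while len(values) < EMBED_DIM:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte (0..255) to a float in [-1, 1].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    return values[:EMBED_DIM]


def linear_head(weights: list[float], bias: float):
    """Build a tiny task-specific model that scores an embedding."""
    def score(embedding: list[float]) -> float:
        return sum(w * x for w, x in zip(weights, embedding)) + bias
    return score


# Two hypothetical task-specific models with made-up weights.
topic_head = linear_head([0.01] * EMBED_DIM, 0.0)
hate_speech_head = linear_head([-0.01] * EMBED_DIM, 0.1)

post = "An example post"
embedding = embed(post)                    # expensive step, done once
topic_score = topic_head(embedding)        # cheap, reuses the embedding
hate_score = hate_speech_head(embedding)   # cheap, reuses the embedding

# Size of the embedding when stored as 32-bit floats.
embedding_bytes = len(struct.pack(f"{EMBED_DIM}f", *embedding))
```

The expensive call happens once per input, while each additional task only adds a small head on top, which is where the cost sharing comes from.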

Centralized, massive models provide the following advantages:

  1. Cost amortization across multiple teams
  2. Reduced complexity in production and operation
  3. Shorter times between discovery and commercialization: a speedup made in one place benefits every team

Clients of MultiRay’s external API submit one request at a time. To handle the high volume of requests arriving from many clients at once, MultiRay batches across requests internally. The batching logic only needs to be written once and can be tuned to produce batches of the optimal size for the model and hardware. The batching is fully transparent to the clients issuing the requests, even when significant performance changes are made, such as using a bigger batch size after migrating to the latest generation of GPU accelerator hardware.
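A minimal sketch of cross-request batching, assuming an asyncio-based server, is shown below. The class name `CrossRequestBatcher`, its parameters, and `fake_model` are hypothetical and chosen for illustration; the source does not describe how MultiRay implements this.

```python
import asyncio


class CrossRequestBatcher:
    """Collects single-item requests and runs them through the model in batches."""

    def __init__(self, batch_fn, max_batch_size=4, max_wait_s=0.01):
        self._batch_fn = batch_fn            # runs the model on a list of inputs
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s        # longest a request waits for company
        self._pending = []                   # list of (input, future) pairs
        self._flush_task = None
        self.batch_sizes = []                # recorded for inspection only

    async def submit(self, item):
        """Client-facing call: one input in, one result out."""
        future = asyncio.get_running_loop().create_future()
        self._pending.append((item, future))
        if len(self._pending) >= self._max_batch_size:
            self._flush()                    # batch is full: run it now
        elif self._flush_task is None:
            # First request of a new batch: start the wait timer.
            self._flush_task = asyncio.create_task(self._flush_after_wait())
        return await future

    async def _flush_after_wait(self):
        await asyncio.sleep(self._max_wait_s)
        self._flush_task = None              # timer is done; nothing to cancel
        self._flush()

    def _flush(self):
        if self._flush_task is not None:
            self._flush_task.cancel()        # batch filled before the timer fired
            self._flush_task = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        self.batch_sizes.append(len(batch))
        results = self._batch_fn([item for item, _ in batch])
        for (_, future), result in zip(batch, results):
            future.set_result(result)


def fake_model(batch):
    """Hypothetical stand-in for a batched model call."""
    return [len(text) for text in batch]


async def main():
    batcher = CrossRequestBatcher(fake_model, max_batch_size=4)
    texts = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff"]
    results = await asyncio.gather(*(batcher.submit(t) for t in texts))
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(main())
print(results)      # [1, 2, 3, 4, 5, 6]
print(batch_sizes)  # [4, 2]
```

Each caller only ever sees `submit`, so changing `max_batch_size` for new hardware requires no client changes, which is the transparency property described above.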

To minimize the time and energy spent on recomputation, MultiRay uses a cache. It is a multi-level cache designed to save money and time, in which each successive level offers a higher hit rate at the expense of slower access. Each MultiRay server has its own fast but limited RAM-based local cache. Those local caches are backed by a slower but larger globally distributed cache based on flash memory.
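The lookup order in such a two-level cache can be sketched as below. This is an illustrative sketch only: `TwoLevelCache` is a hypothetical name, the local level is assumed to use least-recently-used eviction, and a plain dictionary stands in for the distributed flash-based cache.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Small fast local cache in front of a larger, slower shared cache."""

    def __init__(self, compute_fn, local_capacity=2):
        self._compute_fn = compute_fn          # expensive model call on a full miss
        self._local = OrderedDict()            # per-server RAM cache (LRU, assumed)
        self._local_capacity = local_capacity
        self._global = {}                      # stand-in for the distributed cache
        self.stats = {"local_hit": 0, "global_hit": 0, "miss": 0}

    def get(self, key):
        # Level 1: local cache.
        if key in self._local:
            self._local.move_to_end(key)       # mark as most recently used
            self.stats["local_hit"] += 1
            return self._local[key]
        # Level 2: global cache.
        if key in self._global:
            self.stats["global_hit"] += 1
            value = self._global[key]
        else:
            # Miss in both levels: compute and publish to the global cache.
            self.stats["miss"] += 1
            value = self._compute_fn(key)
            self._global[key] = value
        self._store_local(key, value)
        return value

    def _store_local(self, key, value):
        self._local[key] = value
        self._local.move_to_end(key)
        if len(self._local) > self._local_capacity:
            self._local.popitem(last=False)    # evict the least recently used entry


cache = TwoLevelCache(lambda text: text.upper(), local_capacity=2)
outputs = [cache.get(key) for key in ["a", "b", "a", "c", "b"]]
print(outputs)      # ['A', 'B', 'A', 'C', 'B']
print(cache.stats)  # {'local_hit': 1, 'global_hit': 1, 'miss': 3}
```

In the example run, the final lookup of `"b"` has already been evicted from the small local level but is still found in the larger one, so the expensive computation is skipped.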

Check out the reference article. All credit for this research goes to the researchers on this project.