MixedBread AI Introduces Binary MRL: A Novel Embedding Compression Method That Makes Vector Search Scalable and Enables Embedding-Based Applications

Mixedbread.ai recently introduced Binary MRL, a method that compresses embeddings down to 64 bytes, to tackle the difficulty of scaling embeddings in natural language processing (NLP) applications, where they are memory-intensive. Embeddings play a vital role in many NLP tasks, such as recommendation systems, retrieval, and similarity search. However, their memory requirements pose a significant challenge, particularly when working with massive datasets. The goal of the method is to reduce the memory used by embeddings while preserving their utility and effectiveness in NLP applications.
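To make the memory pressure concrete, the short Python calculation below compares per-vector and total storage. It assumes a baseline of 1024 float32 dimensions (a common size for current state-of-the-art models) and notes that 64 bytes is 512 bits, so a 64-byte binary code can hold 512 one-bit dimensions. The 100-million-document corpus is a hypothetical figure chosen for illustration, and only raw vector storage is counted (no index overhead).

```python
# Back-of-the-envelope memory arithmetic for embedding storage.
# Assumptions (not from the article): a hypothetical corpus of 100 million
# documents, and raw vector storage only (no index or metadata overhead).

def bytes_per_embedding(dims: int, bits_per_dim: int) -> int:
    """Storage for one embedding, in bytes."""
    return dims * bits_per_dim // 8

float32_full = bytes_per_embedding(1024, 32)  # 1024 float32 dimensions
binary_mrl = bytes_per_embedding(512, 1)      # 512 one-bit dimensions

num_docs = 100_000_000  # hypothetical corpus size

print(f"float32, 1024 dims: {float32_full} B/vector, "
      f"{float32_full * num_docs / 1e9:.1f} GB total")
print(f"binary, 512 dims:   {binary_mrl} B/vector, "
      f"{binary_mrl * num_docs / 1e9:.1f} GB total")
print(f"compression ratio:  {float32_full // binary_mrl}x")
```

Under these assumptions the float32 vectors need 4096 bytes each (about 409.6 GB for the whole corpus), while the 64-byte codes need about 6.4 GB, a 64x reduction.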

Current state-of-the-art models produce high-dimensional embeddings (e.g., 1024 dimensions) encoded in float32 format, which require large amounts of memory for storage and retrieval. To address these limitations, researchers at mixedbread.ai identified two main approaches: Matryoshka Representation Learning (MRL) and Vector Quantization. MRL reduces the number of output dimensions of an embedding model while preserving accuracy. It does this by training the model to place the most important information in the earlier dimensions of the embedding, so that the less important trailing dimensions can be cut off. Vector Quantization, on the other hand, reduces the size of each dimension by representing it as a binary value instead of a floating-point number.
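The two building blocks can be sketched independently with NumPy. This is a minimal illustration, not Mixedbread's implementation: the function names `mrl_truncate` and `binary_quantize` are hypothetical, thresholding each dimension at zero (keeping only its sign) is an assumed quantization rule, and the random vectors merely stand in for real model outputs. Truncation only preserves meaning for a model actually trained with MRL.

```python
import numpy as np

def mrl_truncate(embeddings: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions (MRL-style) and re-normalize to unit length.

    Only meaningful for a model trained with Matryoshka Representation Learning,
    which concentrates the most important information in the leading dimensions.
    """
    truncated = embeddings[:, :dims]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.maximum(norms, 1e-12)  # guard against division by zero

def binary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Map each float dimension to one bit (1 if > 0, else 0), packed 8 bits per byte.

    Thresholding at zero is an assumption made for this sketch.
    """
    bits = embeddings > 0
    return np.packbits(bits, axis=1)  # shape (n, ceil(dims / 8)), dtype uint8

rng = np.random.default_rng(0)
float_embs = rng.standard_normal((4, 1024)).astype(np.float32)  # stand-in embeddings

short = mrl_truncate(float_embs, 512)   # fewer dimensions, still float32
packed = binary_quantize(float_embs)    # all 1024 dimensions, 1 bit each

print(short.shape, short.dtype, short[0].nbytes)     # bytes per truncated vector
print(packed.shape, packed.dtype, packed[0].nbytes)  # bytes per binary vector
```

Each technique on its own shrinks a 4096-byte float32 vector: truncating to 512 float32 dimensions gives 2048 bytes, and binarizing all 1024 dimensions gives 128 bytes.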


The proposed approach, Binary MRL, combines both methods to achieve simultaneous dimensionality reduction and compression of embeddings. By integrating MRL and Vector Quantization, Binary MRL aims to retain the semantic information encoded in embeddings while significantly reducing their memory footprint.

Binary MRL achieves compression by first reducing the number of output dimensions of the embedding model using MRL. The model is trained to preserve important information in fewer dimensions, which allows the less relevant dimensions to be truncated. Vector Quantization is then used to represent each dimension of the reduced embedding as a binary value. This binary representation greatly reduces the memory usage of embeddings while retaining their semantic information. Evaluations of Binary MRL on various datasets show that the method retains over 90% of the original model's performance while using significantly smaller embeddings.
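The two steps can be chained and paired with a retrieval step in a short Python sketch. Several details here are assumptions rather than facts from the article: truncation to 512 dimensions (which yields exactly 64 bytes per vector), thresholding at zero, and ranking by Hamming distance, a standard way to compare binary codes (the article does not describe the retrieval step). The helpers `binary_mrl_encode` and `hamming_search` are hypothetical names, and the random vectors are stand-ins, not outputs of an MRL-trained model, so this toy shows only the mechanics, not the reported accuracy.

```python
import numpy as np

def binary_mrl_encode(embeddings: np.ndarray, dims: int = 512) -> np.ndarray:
    """Sketch of Binary MRL encoding: truncate, then keep one bit per dimension."""
    truncated = embeddings[:, :dims]           # MRL step: drop trailing dimensions
    return np.packbits(truncated > 0, axis=1)  # quantization step: sign bit, 8 per byte

def hamming_search(query_code: np.ndarray, doc_codes: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k documents with the smallest Hamming distance to the query."""
    xor = np.bitwise_xor(doc_codes, query_code)          # differing bits, byte by byte
    distances = np.unpackbits(xor, axis=1).sum(axis=1)   # popcount per document
    return np.argsort(distances, kind="stable")[:k]

rng = np.random.default_rng(42)
docs = rng.standard_normal((1000, 1024)).astype(np.float32)  # stand-in document embeddings
# A query close to document 7: the same vector plus a little noise.
query = docs[7] + 0.1 * rng.standard_normal(1024).astype(np.float32)

doc_codes = binary_mrl_encode(docs)                # (1000, 64) uint8: 64 bytes per document
query_code = binary_mrl_encode(query[None, :])[0]  # (64,) uint8
top = hamming_search(query_code, doc_codes)

print(doc_codes.shape, top)
```

XOR followed by a bit count is cheap on modern hardware, which is part of why binary codes make large-scale search faster as well as smaller; a production system would typically use a vector index rather than this brute-force scan.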

In conclusion, Binary MRL is a novel approach to the scalability challenges of embeddings in NLP applications. By combining MRL and Vector Quantization, it compresses embeddings substantially while preserving their utility and effectiveness. The method not only reduces the cost of large-scale retrieval but also enables tasks that were previously impractical because of memory limits.

Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in software and data science applications, and she is always reading about developments in different fields of AI and ML.
