Jina AI Introduces ‘jina-embeddings-v2’: The World’s First 8k Open-Source Text Embedding Models

Jina AI has unveiled jina-embeddings-v2, its second-generation text embedding model. According to the company, it is the only open-source embedding model that supports an 8K (8192-token) context length. This puts it on par with OpenAI’s proprietary model, text-embedding-ada-002, in both capabilities and performance on the Massive Text Embedding Benchmark (MTEB) leaderboard.

Jina-embeddings-v2 is a big step for open-source text embedding models, rivalling established proprietary counterparts in both capacity and benchmark performance. According to Jina AI, it outperforms OpenAI’s 8K model, text-embedding-ada-002, on several key metrics: Classification Average, Reranking Average, Retrieval Average, and Summarization Average.

The researchers said that jina-embeddings-v2 can benefit a wide range of applications. In legal document analysis, it can capture details across extensive legal texts. In medical research, it can embed entire scientific papers to support holistic analytics. In literary analysis, it can process long-form content and capture thematic elements for a richer understanding. In financial forecasting, it helps users draw insights from detailed financial reports to support decision-making. In conversational AI, it can improve chatbot responses to intricate user queries. Across these domains, the long context window is what lets users derive insights from complex, lengthy documents.

Tests show that jina-embeddings-v2, with its 8K context window, outperforms other leading base embedding models, underscoring the practical advantages of longer context.
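One practical consequence of a longer context can be shown with simple arithmetic: how many pieces a document must be split into before it can be embedded. The sketch below is illustrative only. The 6000-token document length and the 512-token comparison window are assumptions chosen for the example, not figures from Jina AI.

```python
import math


def chunks_needed(num_tokens: int, context_length: int) -> int:
    """Return how many non-overlapping chunks are needed to cover a document."""
    if num_tokens < 0 or context_length <= 0:
        raise ValueError("num_tokens must be >= 0 and context_length must be > 0")
    return math.ceil(num_tokens / context_length)


# Hypothetical document length in tokens (assumption for illustration).
doc_tokens = 6000

# Assumed 512-token window for comparison against the 8192-token window.
print(chunks_needed(doc_tokens, 512))   # 12
print(chunks_needed(doc_tokens, 8192))  # 1
```

With the shorter window, the document is represented by twelve separate embeddings that each see only a fragment of the text. With an 8192-token window, a single embedding covers the whole document.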

Dr. Han Xiao, CEO of Jina AI, reflected on the journey and the significance of the launch. He said that the team set out to create the world’s first open-source model with an 8K context length and to compete with industry leaders like OpenAI, and that the release of jina-embeddings-v2 is a remarkable achievement. The mission at Jina AI remains clear: to democratize AI by providing tools that were once confined to exclusive ecosystems. He described this release as a significant stride toward that goal.

The researchers said they plan to publish an academic paper detailing the technical intricacies and benchmarks of jina-embeddings-v2, giving the AI community a chance to explore the model’s capabilities more deeply. The team is also at an advanced stage of developing an embedding API platform akin to OpenAI’s, intended to let users scale the embedding model to their needs. Furthermore, Jina AI is expanding into multilingual embeddings, starting with German-English models. This expansion aims to broaden its portfolio and reinforce its position in AI innovation.

The models can be downloaded for free on Hugging Face. Recognizing the varied requirements within the AI community, Jina AI offers two options. The Base Model, designed for demanding tasks that require high accuracy, suits fields like academic research or business analytics. The Small Model, with a compact size of 0.07 GB, is designed for lighter tasks, making it well suited to mobile apps or devices with limited computing resources. Users can choose whichever best fits their computational budget and application.
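For readers who want to try the release, here is a minimal sketch of loading a model from Hugging Face and comparing two sentences. It rests on assumptions not stated in the article: the repository IDs, the use of the `transformers` library with `trust_remote_code=True`, and the `encode` method supplied by the model's own code. Check the model card before relying on any of them.

```python
import math
from typing import Sequence

# Assumed Hugging Face repository IDs; the article does not give them.
BASE_MODEL_ID = "jinaai/jina-embeddings-v2-base-en"
SMALL_MODEL_ID = "jinaai/jina-embeddings-v2-small-en"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed_and_compare(text_a: str, text_b: str, model_id: str = SMALL_MODEL_ID) -> float:
    """Download the model, embed two texts, and return their cosine similarity.

    Requires the `transformers` package and network access.
    """
    # Imported lazily so cosine_similarity can be used without transformers installed.
    from transformers import AutoModel

    # trust_remote_code=True allows the custom model code in the repository to run.
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    # `encode` is assumed to come from the model's custom code, per its model card.
    embeddings = model.encode([text_a, text_b])
    return cosine_similarity(embeddings[0], embeddings[1])
```

Calling `embed_and_compare("How is the weather today?", "What is the current weather like today?")` should return a score close to 1 for near-paraphrases. Passing `BASE_MODEL_ID` instead selects the larger model.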


