Jina Embeddings v2 Base - en

Jina AI

Text embedding model (base) for input of size up to 8192 tokens.

  • The Jina Embeddings v2 Base model is optimized for embedding accuracy. For faster inference and lower memory use, use the Small model.
  • jina-embeddings-v2-base-en is an open-source English embedding model that supports sequences of up to 8192 tokens.
  • This state-of-the-art AI embedding model enables many applications, such as document clustering, classification, content personalization, vector search, or retrieval augmented generation.
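As an illustration of the vector search use case mentioned above, here is a minimal sketch that ranks documents by cosine similarity to a query. The three-dimensional vectors are made-up stand-ins for real embeddings, and the helper names `cosine_similarity` and `top_k` are hypothetical, not part of any Jina AI API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy vectors standing in for embeddings produced by the model (assumption).
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, docs, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.4f}")
```

In practice the vectors would come from the embedding model, and a dedicated vector index would usually replace the linear scan once the collection grows.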

Highlights:
  • Use cases: vector search, retrieval augmented generation, long document clustering, sentiment analysis.

  • Extended context length: This model supports an 8K (8192-token) context length, enabling it to process larger chunks of text in a single pass, which can yield richer embeddings and more accurate predictions.

  • Model size: 137M parameters.

  • High performance across tasks: Our model ranks among the top-performing models on HuggingFace's MTEB leaderboard for embedding models, especially considering its small size and extended context length.

  • The backbone of this model was pretrained on the C4 dataset. The model was then further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives. These pairs were drawn from various domains and selected through a thorough cleaning process.
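Documents longer than the 8192-token limit have to be split before embedding. The sketch below shows one simple way to do that with fixed-size, optionally overlapping windows. It assumes the text has already been tokenized into a list; the function name `chunk_tokens` and the overlap of 128 are illustrative choices, not part of the model or any Jina AI API, and real token counts depend on the model's own tokenizer.

```python
def chunk_tokens(tokens, max_len=8192, overlap=0):
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that text cut at a
    boundary still appears with some context in the next window.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Placeholder tokens standing in for a tokenized long document (assumption).
tokens = ["tok"] * 20000
chunks = chunk_tokens(tokens, max_len=8192, overlap=128)
print([len(c) for c in chunks])
```

Each resulting chunk can then be embedded separately. Whether to overlap windows, and by how much, is a design choice that depends on the retrieval task.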