Jina Embeddings v2 Base - code

Jina AI

Base-size text embedding model for programming languages, accepting inputs of up to 8192 tokens.

  • jina-embeddings-v2-base-code is an open-source code embedding model that supports sequences of up to 8192 tokens.
  • This state-of-the-art embedding model enables many applications, such as code review, static analysis, documentation assistance, code search, and retrieval-augmented generation (RAG).
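
As a rough illustration of how embeddings support code search, the sketch below ranks code snippets against a query by cosine similarity. It does not call the model: the 3-dimensional vectors are toy stand-ins for the 768-dimensional embeddings the model would produce, and the names `cosine_similarity`, `rank_snippets`, `parse_json`, and `open_socket` are hypothetical, chosen only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings would have 768 dimensions
# and would come from encoding the query text and each code snippet.
query = [1.0, 0.0, 1.0]
snippets = {
    "parse_json": [0.9, 0.1, 0.8],
    "open_socket": [0.0, 1.0, 0.1],
}

ranking = rank_snippets(query, snippets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real pipeline the snippet vectors would be computed once and stored, so that only the query needs to be embedded at search time.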

Highlights:
  • State-of-the-art: This model is designed for high performance across 30 programming languages and has been trained specifically to handle input that mixes English and code without bias toward either.
  • Extended Context: An 8192-token context length enables jina-embeddings-v2-base-code to handle longer code and document fragments, far surpassing models that support only a few hundred tokens at a time.
  • Compact Size: jina-embeddings-v2-base-code is built for high performance on standard computer hardware. With only 137 million parameters, the entire model is less than 300MB. The embeddings themselves have 768 dimensions, a relatively small vector size compared to many models, which saves storage space and run time for applications.
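
To make the storage point concrete, here is a back-of-the-envelope estimate of raw index size for 768-dimensional embeddings. It assumes each value is stored as a 4-byte float32 and ignores any index overhead or compression. The function name `index_size_bytes` and the 1536-dimension comparison are hypothetical, included only to show how size scales with dimension.

```python
EMBEDDING_DIM = 768      # embedding size stated in the listing
BYTES_PER_FLOAT32 = 4    # assumption: vectors are stored as float32


def index_size_bytes(num_vectors, dim=EMBEDDING_DIM,
                     bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors dense vectors, with no index overhead."""
    return num_vectors * dim * bytes_per_value


per_vector = index_size_bytes(1)
one_million = index_size_bytes(1_000_000)
# Hypothetical comparison: a model with twice the embedding dimension.
one_million_1536 = index_size_bytes(1_000_000, dim=1536)

print(f"one vector:         {per_vector} bytes")
print(f"1M vectors (768d):  {one_million / 1e9:.3f} GB")
print(f"1M vectors (1536d): {one_million_1536 / 1e9:.3f} GB")
```

Under these assumptions, storage grows linearly with both the number of vectors and the embedding dimension, so halving the dimension halves the raw index size.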