Instructions for using jinaai/jina-embeddings-v5-text-nano-text-matching with libraries and local apps. Follow the links below to get started.
- Libraries
- sentence-transformers
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v5-text-nano-text-matching", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
- llama-cpp-python
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

# This checkpoint is an embedding model, so load it with embedding=True
llm = Llama.from_pretrained(
    repo_id="jinaai/jina-embeddings-v5-text-nano-text-matching",
    filename="v5-nano-text-matching-F16.gguf",
    embedding=True,
)

# For the text-matching variant, prefix inputs with "Document: "
embedding = llm.embed("Document: A beautiful sunset over the beach")
print(len(embedding))
- Local Apps
- llama.cpp
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
Use Docker
docker model run hf.co/jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
- Ollama
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with Ollama:
ollama pull hf.co/jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
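Since this is an embedding model, you will typically call Ollama's embeddings endpoint after pulling it rather than chatting with it. A minimal sketch using the REST API (assuming Ollama's default port and the /api/embed route):

import requests

# Request an embedding from the locally pulled model
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "hf.co/jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M",
        "input": "Document: A beautiful sunset over the beach",
    },
)
print(len(resp.json()["embeddings"][0]))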
- Unsloth Studio
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for jinaai/jina-embeddings-v5-text-nano-text-matching to get started
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for jinaai/jina-embeddings-v5-text-nano-text-matching to get started
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jinaai/jina-embeddings-v5-text-nano-text-matching to get started
- Docker Model Runner
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with Docker Model Runner:
docker model run hf.co/jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
- Lemonade
How to use jinaai/jina-embeddings-v5-text-nano-text-matching with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull jinaai/jina-embeddings-v5-text-nano-text-matching:Q4_K_M
Run and chat with the model
lemonade run user.jina-embeddings-v5-text-nano-text-matching-Q4_K_M
List all available models
lemonade list
jina-embeddings-v5-text: Task-Targeted Embedding Distillation
Elastic Inference Service | ArXiv | Release Note | Blog
Model Overview
jina-embeddings-v5-text-nano-text-matching is part of the jina-embeddings-v5-text model family, which also includes jina-embeddings-v5-text-small, a larger model that offers better performance at a bigger size.
Trained using a novel approach that combines distillation with task-specific contrastive losses, jina-embeddings-v5-text-nano-text-matching outperforms existing state-of-the-art models of similar size across diverse embedding benchmarks.
| Feature | Value |
|---|---|
| Parameters | 239M |
| Supported Tasks | text-matching |
| Max Sequence Length | 8192 |
| Embedding Dimension | 768 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768 |
| Pooling Strategy | Last-token pooling |
| Base Model | jinaai/jina-embeddings-v5-text-nano |
Training and Evaluation
For training details and evaluation results, see our technical report.
Usage
Requirements
The following Python packages are required:
- transformers>=5.1.0
- torch>=2.8.0
- peft>=0.15.2
- vllm==0.15.1
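For example, assuming a standard pip setup, they can be installed with:

pip install "transformers>=5.1.0" "torch>=2.8.0" "peft>=0.15.2" "vllm==0.15.1"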
Optional / Recommended
- flash-attention: Installing flash-attention is recommended for improved inference speed and efficiency, but not mandatory.
- sentence-transformers: If you want to use the model via the sentence-transformers interface, install this package as well.
via Elastic Inference Service
The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
PUT _inference/text_embedding/jina-v5
{
"service": "elastic",
"service_settings": {
"model_id": "jina-embeddings-v5-text-nano"
}
}
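Once the endpoint is created, you can request embeddings through the same _inference API. A minimal sketch (the endpoint name jina-v5 comes from the PUT request above; the response contains one embedding per input):

POST _inference/text_embedding/jina-v5
{
  "input": [
    "A beautiful sunset over the beach"
  ]
}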
See the Elastic Inference Service documentation for setup details.
via sentence-transformers
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer(
"jinaai/jina-embeddings-v5-text-nano-text-matching",
trust_remote_code=True,
model_kwargs={"dtype": torch.bfloat16}, # Recommended for GPUs
config_kwargs={"_attn_implementation": "flash_attention_2"}, # Recommended but optional
)
# Optional: set truncate_dim in encode() to control embedding size
texts = [
"A beautiful sunset over the beach", # English
"غروب جميل على الشاطئ", # Arabic
"海滩上美丽的日落", # Chinese
"Un beau coucher de soleil sur la plage", # French
"Ein wunderschöner Sonnenuntergang am Strand", # German
"Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία", # Greek
"समुद्र तट पर एक खूबसूरत सूर्यास्त", # Hindi
"Un bellissimo tramonto sulla spiaggia", # Italian
"浜辺に沈む美しい夕日", # Japanese
"해변 위로 아름다운 일몰", # Korean
]
# Encode texts
embeddings = model.encode(texts)
print(embeddings.shape)
# (10, 768)
similarity = model.similarity(embeddings[0], embeddings[1:])
print(similarity)
# tensor([[0.8945, 0.9386, 0.9339, 0.9439, 0.7339, 0.9303, 0.9291, 0.9404, 0.9317]])
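To use the Matryoshka dimensions listed in the table above, set truncate_dim in encode(), as noted in the comment earlier; for example:

# Truncate embeddings to one of the supported Matryoshka dimensions
embeddings_256 = model.encode(texts, truncate_dim=256)
print(embeddings_256.shape)
# (10, 256)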
via vLLM
from vllm import LLM
from vllm.config.pooler import PoolerConfig
# Initialize model
name = "jinaai/jina-embeddings-v5-text-nano-text-matching"
model = LLM(
model=name,
dtype="float16",
runner="pooling",
trust_remote_code=True,
pooler_config=PoolerConfig(seq_pooling_type="LAST", normalize=True)
)
# Create text prompts
query = "Overview of climate change impacts on coastal cities"
query_prompt = f"Query: {query}"
document = "The impacts of climate change on coastal cities are significant.."
document_prompt = f"Document: {document}"
# Encode all prompts
prompts = [query_prompt, document_prompt]
outputs = model.encode(prompts, pooling_task="embed")
embed_query = outputs[0].outputs.data
embed_document = outputs[1].outputs.data
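Because the pooler is configured with normalize=True, both embeddings are unit vectors, so a dot product gives the cosine similarity directly. A minimal sketch, assuming outputs.data is a torch tensor as in the pinned vLLM version:

# Dot product of L2-normalized vectors equals cosine similarity
score = embed_query @ embed_document
print(f"query-document similarity: {float(score):.4f}")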
via llama.cpp (GGUF)
Since our nano model is based on jinaai/jina-embeddings-v5-text-nano, which is not yet supported by upstream llama.cpp, we provide our own branch of llama.cpp that implements the necessary changes.
To start the OpenAI API compatible HTTP server, run with the respective model version:
llama-server \
-hf jinaai/jina-embeddings-v5-text-nano-text-matching:F16 \
--embedding \
--pooling last \
--batch-size 8192 \
--ubatch-size 8192 \
--ctx-size 8192
Client:
curl -X POST "http://127.0.0.1:8080/v1/embeddings" \
-H "Content-Type: application/json" \
-d '{
"input": [
"Document: A beautiful sunset over the beach",
"Document: Un beau coucher de soleil sur la plage",
"Document: 海滩上美丽的日落",
"Document: 浜辺に沈む美しい夕日",
"Document: Golden sunlight melts into the horizon, painting waves in warm amber and rose, while the sky whispers goodnight to the quiet, endless sea."
]
}'
Note: For the text-matching variant, always add the Document: prefix in front of your input, as shown above.
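Because llama-server exposes an OpenAI-compatible API, the official openai Python client works as well. A sketch, assuming the server from above is running on port 8080 (the api_key can be any placeholder string, and llama-server serves whichever model it was started with regardless of the model field):

from openai import OpenAI

# Point the client at the local llama-server instead of api.openai.com
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")
response = client.embeddings.create(
    model="jina-embeddings-v5-text-nano-text-matching",
    input=["Document: A beautiful sunset over the beach"],
)
print(len(response.data[0].embedding))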
via Optimum (ONNX)
You can run the ONNX-optimized version of the model locally using Hugging Face's optimum library. Make sure you have the required dependencies installed (e.g., pip install optimum[onnxruntime] transformers torch):
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch
model_id = "jinaai/jina-embeddings-v5-text-nano-text-matching"
# 1. Load tokenizer and ONNX model
# We specify the subfolder 'onnx' where the weights are located
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
model_id,
subfolder="onnx",
file_name="model.onnx",
provider="CPUExecutionProvider", # Or "CUDAExecutionProvider" for GPU
trust_remote_code=True,
)
# 2. Prepare input
texts = ["Document: How do I use Jina ONNX models?", "Document: Information about semantic matching."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
# 3. Inference
with torch.no_grad():
outputs = model(**inputs)
# 4. Pooling (crucial for Jina-v5)
# Jina-v5 uses LAST-TOKEN pooling.
# We take the hidden state of the last non-padding token.
last_hidden_state = outputs.last_hidden_state
# Find the indices of the last token (usually the end of the sequence)
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]
print('embeddings shape:', embeddings.shape)
print('embeddings:', embeddings)
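Unlike the vLLM setup above (normalize=True), this ONNX path returns unnormalized embeddings; if you need cosine similarities, L2-normalize them first. A minimal sketch:

import torch.nn.functional as F

# L2-normalize so that dot products equal cosine similarities
normalized = F.normalize(embeddings, p=2, dim=1)
print('cosine similarity:', (normalized[0] @ normalized[1]).item())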
License
The model is licensed under CC BY-NC 4.0. For commercial use, please contact us.
Citation
If you find jina-embeddings-v5-text-nano-text-matching useful in your research, please cite the following paper:
@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation},
author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael Günther and Maximilian Werk and Han Xiao},
year={2026},
eprint={2602.15547},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.15547},
}