ViCLSR

ViCLSR (Vietnamese Contrastive Learning for Sentence Representations) is a supervised contrastive learning framework for Vietnamese Natural Language Understanding (NLU). It learns sentence embeddings from Natural Language Inference (NLI) datasets, treating entailment pairs as positives and contradiction pairs as negatives, and targets low-resource Vietnamese NLU settings.

📄 Paper: arXiv:2603.21084
🤗 Model: huynhtin/ViCLSR

Model Details

Field	Details
Model name	ViCLSR
Base model	XLM-RoBERTa-Large
Language	Vietnamese
Task	Sentence Embedding / NLU
Training objective	Supervised Contrastive Learning with NLI
License	CC BY-NC-SA 4.0
Paper	arXiv:2603.21084 (March 2026)

Abstract

High-quality text representations are crucial for NLU, but low-resource languages like Vietnamese face challenges due to limited annotated data. We propose ViCLSR, a novel supervised contrastive learning framework that optimizes sentence embeddings for Vietnamese by leveraging existing NLI datasets. ViCLSR significantly outperforms strong baselines on five Vietnamese NLU benchmarks, demonstrating that supervised contrastive learning can effectively address resource limitations in low-resource NLU tasks.

Performance

ViCLSR is evaluated on five Vietnamese NLU benchmarks spanning NLI, Fact Checking, Constructive Speech Detection, and Reading Comprehension (Table 4 in the paper).

ViCLSR Results

Dataset	Task	Metric	ViCLSR	vs. XLM-R Large	vs. PhoBERT Large
ViNLI	Natural Language Inference	F1	82.84	↑1.53	↑6.97
ViWikiFC	Fact Checking	F1	86.57	↑1.42	↑4.97
ViFactCheck	Fact Checking	F1	88.78	↑0.76	↑9.02
UIT-ViCTSD	Constructive Speech Detection	F1	82.22	↑2.78	↑5.36
ViMMRC2.0	Reading Comprehension	Acc	59.06	↑1.54	↑4.33

Full results and analysis are available in the paper.
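
The last two columns of the table appear to report ViCLSR's gain over each baseline. Assuming the arrows denote absolute differences in points (an assumption; the card does not state this explicitly), the baseline scores they imply can be recovered by subtraction. The sketch below uses only the numbers from the table above; the `implied_baselines` helper is an illustrative name, and the paper remains the authoritative source for baseline scores.

```python
# ViCLSR score and reported gains (vs. XLM-R Large, vs. PhoBERT Large),
# copied from the results table.
RESULTS = {
    "ViNLI":       (82.84, 1.53, 6.97),
    "ViWikiFC":    (86.57, 1.42, 4.97),
    "ViFactCheck": (88.78, 0.76, 9.02),
    "UIT-ViCTSD":  (82.22, 2.78, 5.36),
    "ViMMRC2.0":   (59.06, 1.54, 4.33),
}


def implied_baselines(results):
    """Recover baseline scores, ASSUMING each gain is an absolute point difference."""
    implied = {}
    for dataset, (viclsr, gain_xlmr, gain_phobert) in results.items():
        implied[dataset] = {
            "XLM-R Large": round(viclsr - gain_xlmr, 2),
            "PhoBERT Large": round(viclsr - gain_phobert, 2),
        }
    return implied


for dataset, scores in implied_baselines(RESULTS).items():
    print(dataset, scores)
```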

Intended Uses

ViCLSR is designed for Vietnamese NLU research and can be applied to:

✅ Sentence embedding
✅ Semantic similarity
✅ Natural Language Inference (NLI)
✅ Information retrieval
✅ Fact checking
✅ Sentiment analysis
✅ Vietnamese NLU tasks in general

Out-of-Scope Uses

❌ Non-Vietnamese languages (model is optimized for Vietnamese)
❌ Commercial use (CC BY-NC-SA 4.0 license)

Architecture Note

ViCLSR extends XLM-RoBERTa-Large with a custom MLP projection head (mlp.dense, Linear 1024→1024) trained with a supervised contrastive loss. This projection head is essential for obtaining high-quality embeddings: the raw CLS token without it yields suboptimal results. The examples in the Usage section demonstrate the correct loading procedure.
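
Written out, the embedding computed in the Usage examples is the L2-normalized output of the projection head applied to the encoder's CLS vector. The symbols below ($h_{\mathrm{CLS}}$, $W$, $b$, $z$, $e$) are notation introduced here for illustration and are not taken from the paper:

```latex
\begin{aligned}
h_{\mathrm{CLS}} &= \mathrm{Encoder}(x)_{[0]} \in \mathbb{R}^{1024}, \\
z &= W\, h_{\mathrm{CLS}} + b, \qquad W \in \mathbb{R}^{1024 \times 1024},\; b \in \mathbb{R}^{1024}, \\
e &= \frac{z}{\lVert z \rVert_2}.
\end{aligned}
```

Because $e$ has unit length, the dot product of two embeddings equals their cosine similarity.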

Usage

Installation

pip install transformers torch huggingface_hub

Model Loading Helper

Both usage examples share the same model loading procedure. We recommend defining a helper function:

import os
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "1"
import transformers
from transformers import AutoTokenizer, XLMRobertaModel
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download


def load_viclsr(model_name="huynhtin/ViCLSR"):
    transformers.logging.set_verbosity_error()  # suppress load warnings

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model     = XLMRobertaModel.from_pretrained(
        model_name,
        use_safetensors=False  # use pytorch_model.bin directly
    )

    # Attach the custom projection head and load the weights trained with
    # the contrastive loss (stored under "mlp.dense.*" in the checkpoint)
    model.mlp  = nn.Linear(1024, 1024)
    ckpt_path  = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # load_state_dict raises if the checkpoint shapes do not match the layer
    model.mlp.load_state_dict({
        "weight": state_dict["mlp.dense.weight"],
        "bias":   state_dict["mlp.dense.bias"],
    })
    model.eval()
    return tokenizer, model

Sentence Embedding

tokenizer, model = load_viclsr()

text   = "Trí tuệ nhân tạo đang phát triển rất nhanh."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs   = model(**inputs)
    cls_emb   = outputs.last_hidden_state[:, 0]
    embedding = model.mlp(cls_emb)             # pass through MLP head
    embedding = F.normalize(embedding, dim=-1) # L2 normalize

print(embedding.shape)  # torch.Size([1, 1024])

Semantic Similarity

tokenizer, model = load_viclsr()

def get_embedding(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=256
    )
    with torch.no_grad():
        outputs = model(**inputs)
        cls_emb = outputs.last_hidden_state[:, 0]
        # Apply the projection head inside no_grad so no autograd graph is built
        return F.normalize(model.mlp(cls_emb), dim=-1)

sentence1 = "Hà Nội là thủ đô của Việt Nam."
sentence2 = "Thành phố Hà Nội là thủ đô nước Việt Nam."
sentence3 = "Bóng đá là môn thể thao phổ biến nhất thế giới."

emb1 = get_embedding(sentence1)
emb2 = get_embedding(sentence2)
emb3 = get_embedding(sentence3)

sim_12 = (emb1 * emb2).sum().item()
sim_13 = (emb1 * emb3).sum().item()

print(f"Similarity (sentence1 vs sentence2): {sim_12:.4f}")  # ~0.98
print(f"Similarity (sentence1 vs sentence3): {sim_13:.4f}")  # ~0.48
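
The same dot product extends to retrieval: embed a query and a set of candidate sentences, then sort the candidates by similarity. The sketch below works on plain lists of floats so that it runs without the model; in practice the vectors would come from `get_embedding` above (for example via `.squeeze(0).tolist()`). The `rank_by_similarity` helper and the toy 3-dimensional vectors are illustrative assumptions, not part of ViCLSR.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_by_similarity(query, candidates, top_k=None):
    """Return (index, cosine similarity) pairs, most similar first."""
    q = l2_normalize(query)
    scored = []
    for idx, cand in enumerate(candidates):
        c = l2_normalize(cand)
        scored.append((idx, sum(a * b for a, b in zip(q, c))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps every candidate


# Toy stand-ins for sentence embeddings (real ones are 1024-dimensional).
query = [1.0, 0.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]
ranking = rank_by_similarity(query, corpus)
print([idx for idx, _ in ranking])  # [1, 0, 2]
```

For large corpora, a matrix product over stacked embeddings or an approximate nearest-neighbor index would typically replace the Python loop.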

Training Details

Base model: XLM-RoBERTa-Large
Training framework: Supervised Contrastive Learning
Training data: Vietnamese NLI datasets (entailment/contradiction pairs)
Objective: Contrastive loss using positive (entailment) and negative (contradiction) pairs
Projection head: MLP Linear(1024 → 1024) trained jointly with contrastive loss
Language: Vietnamese
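
This card does not give the exact loss, so the following is only a sketch of one common formulation: a SimCSE-style supervised objective in which each premise is pulled toward its entailment hypothesis and pushed away from its contradiction hypothesis and from the other hypotheses in the batch. The temperature value (0.05), the use of in-batch negatives, and all function names are assumptions made for illustration and may differ from what the paper uses. It is written in plain Python for readability; a PyTorch version would build the same logits and apply cross-entropy.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def supervised_contrastive_loss(anchors, positives, negatives, temperature=0.05):
    """Mean cross-entropy over a batch of (premise, entailment, contradiction) triplets.

    For anchor i, the target is positives[i]; every other positive and every
    negative in the batch acts as a distractor. Hypothetical sketch only.
    """
    anchors = [_normalize(v) for v in anchors]
    candidates = [_normalize(v) for v in positives + negatives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [_dot(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy 2-dimensional "embeddings" for a batch of two triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
entailments = [[1.0, 0.1], [0.1, 1.0]]
contradictions = [[-1.0, 0.0], [0.0, -1.0]]

good = supervised_contrastive_loss(anchors, entailments, contradictions)
bad = supervised_contrastive_loss(anchors, contradictions, entailments)
print(good < bad)  # True: aligned positives give the lower loss
```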

Limitations

Optimized specifically for Vietnamese; performance may degrade significantly on other languages
Performance depends on the quality and domain of input text
Best suited for research purposes under CC BY-NC-SA 4.0

Citation

If you use ViCLSR in your research, please cite:

@article{huynh2026viclsr,
  title={ViCLSR: A Supervised Contrastive Learning Framework with Natural Language Inference for Natural Language Understanding Tasks},
  author={Huynh, Tin Van and Nguyen, Kiet Van and Nguyen, Ngan Luu-Thuy},
  journal={arXiv preprint arXiv:2603.21084},
  year={2026}
}

Paper: arXiv:2603.21084, published Mar 22