ViCLSR

ViCLSR (Vietnamese Contrastive Learning for Sentence Representations) is a supervised contrastive learning framework for Vietnamese Natural Language Understanding (NLU). The model leverages Natural Language Inference (NLI) datasets to learn high-quality sentence embeddings using entailment and contradiction relationships, targeting low-resource Vietnamese NLU settings.


Model Details

| Field | Details |
|---|---|
| Model name | ViCLSR |
| Base model | XLM-RoBERTa-Large |
| Language | Vietnamese |
| Task | Sentence Embedding / NLU |
| Training objective | Supervised Contrastive Learning with NLI |
| License | CC BY-NC-SA 4.0 |
| Paper | arXiv:2603.21084 (March 2026) |

Abstract

High-quality text representations are crucial for NLU, but low-resource languages like Vietnamese face challenges due to limited annotated data. We propose ViCLSR, a novel supervised contrastive learning framework that optimizes sentence embeddings for Vietnamese by leveraging existing NLI datasets. ViCLSR significantly outperforms strong baselines on five Vietnamese NLU benchmarks, demonstrating that supervised contrastive learning can effectively address resource limitations in low-resource NLU tasks.


Performance

ViCLSR is evaluated on five Vietnamese NLU benchmarks spanning NLI, Fact Checking, Constructive Speech Detection, and Reading Comprehension (Table 4 in the paper).

ViCLSR Results

| Dataset | Task | Metric | ViCLSR | vs. XLM-R Large | vs. PhoBERT Large |
|---|---|---|---|---|---|
| ViNLI | Natural Language Inference | F1 | 82.84 | ↑1.53 | ↑6.97 |
| ViWikiFC | Fact Checking | F1 | 86.57 | ↑1.42 | ↑4.97 |
| ViFactCheck | Fact Checking | F1 | 88.78 | ↑0.76 | ↑9.02 |
| UIT-ViCTSD | Constructive Speech Detection | F1 | 82.22 | ↑2.78 | ↑5.36 |
| ViMMRC2.0 | Reading Comprehension | Acc | 59.06 | ↑1.54 | ↑4.33 |
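The table gives only ViCLSR's scores and its gains over each baseline. If each ↑ value is an absolute difference in score points (the card does not say so explicitly, so this is an assumption), the baseline scores and the average gain follow from simple arithmetic. The sketch below does this calculation. Averaging F1 and accuracy gains together gives only a rough summary.

```python
# Scores and gains copied from the results table above.
# Assumption: each "↑" value is an absolute difference in score points.
RESULTS = {
    # dataset: (ViCLSR score, gain vs. XLM-R Large, gain vs. PhoBERT Large)
    "ViNLI": (82.84, 1.53, 6.97),
    "ViWikiFC": (86.57, 1.42, 4.97),
    "ViFactCheck": (88.78, 0.76, 9.02),
    "UIT-ViCTSD": (82.22, 2.78, 5.36),
    "ViMMRC2.0": (59.06, 1.54, 4.33),
}


def implied_baselines(results):
    """Baseline score = ViCLSR score minus reported gain, rounded to 2 decimals."""
    return {
        name: (round(score - gain_xlmr, 2), round(score - gain_pho, 2))
        for name, (score, gain_xlmr, gain_pho) in results.items()
    }


def mean_gain(results, column):
    """Unweighted mean gain across datasets (column 1: XLM-R Large, 2: PhoBERT Large)."""
    gains = [row[column] for row in results.values()]
    return round(sum(gains) / len(gains), 3)


baselines = implied_baselines(RESULTS)
print(baselines["ViNLI"])     # (81.31, 75.87)
print(mean_gain(RESULTS, 1))  # 1.606
print(mean_gain(RESULTS, 2))  # 6.13
```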

Full results and analysis are available in the paper.


Intended Uses

ViCLSR is designed for Vietnamese NLU research and can be applied to:

  • ✅ Sentence embedding
  • ✅ Semantic similarity
  • ✅ Natural Language Inference (NLI)
  • ✅ Information retrieval
  • ✅ Fact checking
  • ✅ Sentiment analysis
  • ✅ Vietnamese NLU tasks in general
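Semantic similarity and information retrieval usually follow the same pattern: embed a query and a corpus, then rank the corpus by cosine similarity. The sketch below assumes L2-normalizable embeddings like those produced in the usage examples. `top_k_similar` is a hypothetical helper, and the toy 2-d vectors stand in for real 1024-d ViCLSR embeddings.

```python
import torch
import torch.nn.functional as F


def top_k_similar(query_emb: torch.Tensor, corpus_emb: torch.Tensor, k: int = 2):
    """Return (indices, scores) of the k corpus rows most similar to the query.

    query_emb: (dim,) tensor; corpus_emb: (n, dim) tensor.
    Both are L2-normalized first, so the dot product equals cosine similarity.
    """
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(corpus_emb, dim=-1)
    scores = c @ q                                 # (n,) cosine similarities
    top = torch.topk(scores, k=min(k, c.size(0)))  # sorted, highest score first
    return top.indices.tolist(), top.values.tolist()


# Toy vectors standing in for real ViCLSR embeddings (hypothetical values).
corpus = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = torch.tensor([1.0, 0.1])
indices, scores = top_k_similar(query, corpus, k=2)
print(indices)  # [0, 2]
```

With the real model, `get_embedding(...)` from the Semantic Similarity example returns shape (1, 1024). The query would therefore be passed as `get_embedding(query_text).squeeze(0)`, and the corpus as the concatenation of the per-sentence embeddings along dimension 0.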

Out-of-Scope Uses

  • ❌ Non-Vietnamese languages (model is optimized for Vietnamese)
  • ❌ Commercial use (not permitted under the CC BY-NC-SA 4.0 license)

Architecture Note

ViCLSR extends XLM-RoBERTa-Large with a custom MLP projection head (mlp.dense, Linear 1024→1024) trained with the supervised contrastive loss. Embeddings should be taken from the output of this head. Using the raw CLS token without it yields suboptimal results. The examples below show the correct loading procedure.
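Wrapping the pooling and projection steps in a single module can prevent the head from being skipped by accident. The class below is a hypothetical sketch and is not part of the released code. It assumes CLS pooling, a single Linear layer with no activation (matching the description above), and L2 normalization, as in the usage examples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLSProjectionPooler(nn.Module):
    """Hypothetical wrapper: CLS pooling -> Linear projection -> L2 normalization."""

    def __init__(self, hidden_size: int = 1024):  # 1024 = XLM-RoBERTa-Large hidden size
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # last_hidden_state: (batch, seq_len, hidden); position 0 is the <s> (CLS) token
        cls_emb = last_hidden_state[:, 0]
        return F.normalize(self.dense(cls_emb), dim=-1)


# Shape check with random hidden states (toy sizes, no model download needed)
pooler = CLSProjectionPooler(hidden_size=8)
hidden_states = torch.randn(3, 5, 8)
print(pooler(hidden_states).shape)  # torch.Size([3, 8])
```

Because the layer is named `dense`, the checkpoint weights could be loaded with `pooler.load_state_dict({k[len("mlp."):]: v for k, v in state_dict.items() if k.startswith("mlp.")})`, assuming `mlp.dense.weight` and `mlp.dense.bias` are the only `mlp.*` keys. The module would then be applied to `outputs.last_hidden_state`.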


Usage

Installation

pip install transformers torch huggingface_hub

Model Loading Helper

Both usage examples share the same model loading procedure. We recommend defining a helper function:

import os
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "1"
import transformers
from transformers import AutoTokenizer, XLMRobertaModel
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download


def load_viclsr(model_name="huynhtin/ViCLSR"):
    transformers.logging.set_verbosity_error()  # suppress warnings about unused "mlp.*" keys

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model     = XLMRobertaModel.from_pretrained(
        model_name,
        use_safetensors=False  # use pytorch_model.bin directly
    )

    # Load the custom MLP projection head trained with contrastive loss.
    # XLMRobertaModel ignores the "mlp.dense.*" weights, so read them from the checkpoint.
    ckpt_path  = hf_hub_download(repo_id=model_name, filename="pytorch_model.bin")
    state_dict = torch.load(ckpt_path, map_location="cpu", weights_only=True)
    hidden     = model.config.hidden_size  # 1024 for XLM-RoBERTa-Large
    model.mlp  = nn.Linear(hidden, hidden)
    model.mlp.load_state_dict({
        "weight": state_dict["mlp.dense.weight"],
        "bias":   state_dict["mlp.dense.bias"],
    })
    model.eval()
    return tokenizer, model

Sentence Embedding

tokenizer, model = load_viclsr()

text   = "Trí tuệ nhân tạo đang phát triển rất nhanh."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs   = model(**inputs)
    cls_emb   = outputs.last_hidden_state[:, 0]
    embedding = model.mlp(cls_emb)             # pass through MLP head
    embedding = F.normalize(embedding, dim=-1) # L2 normalize

print(embedding.shape)  # torch.Size([1, 1024])

Semantic Similarity

tokenizer, model = load_viclsr()

def get_embedding(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=256
    )
    with torch.no_grad():
        outputs = model(**inputs)
    cls_emb = outputs.last_hidden_state[:, 0]
    return F.normalize(model.mlp(cls_emb), dim=-1)

sentence1 = "Hà Nội là thủ đô của Việt Nam."
sentence2 = "Thành phố Hà Nội là thủ đô nước Việt Nam."
sentence3 = "Bóng đá là môn thể thao phổ biến nhất thế giới."

emb1 = get_embedding(sentence1)
emb2 = get_embedding(sentence2)
emb3 = get_embedding(sentence3)

# Embeddings are L2-normalized, so the dot product equals cosine similarity
sim_12 = (emb1 * emb2).sum().item()
sim_13 = (emb1 * emb3).sum().item()

print(f"Similarity (sentence1 vs sentence2): {sim_12:.4f}")  # ~0.98
print(f"Similarity (sentence1 vs sentence3): {sim_13:.4f}")  # ~0.48
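The example above scores one pair of sentences at a time. To compare many sentences, one option is to stack their normalized embeddings and compute all similarities with a single matrix product. For example, `torch.cat([emb1, emb2, emb3], dim=0)` has shape (3, 1024). `similarity_matrix` is a hypothetical helper. Toy vectors replace real model outputs so the sketch runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def similarity_matrix(embeddings: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarities for a (n, dim) stack of embeddings."""
    normed = F.normalize(embeddings, dim=-1)  # effectively a no-op for unit-length rows
    return normed @ normed.T                   # (n, n); diagonal entries are 1.0


# Toy 3-d vectors standing in for stacked ViCLSR embeddings (hypothetical values).
embs = torch.tensor([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])
sims = similarity_matrix(embs)
print(sims)
```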

Training Details

  • Base model: XLM-RoBERTa-Large
  • Training framework: Supervised Contrastive Learning
  • Training data: Vietnamese NLI datasets (entailment/contradiction pairs)
  • Objective: Contrastive loss using positive (entailment) and negative (contradiction) pairs
  • Projection head: MLP Linear(1024 → 1024) trained jointly with contrastive loss
  • Language: Vietnamese
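The card does not give the exact loss formula. A common way to use entailment hypotheses as positives and contradiction hypotheses as hard negatives is the supervised SimCSE objective. The sketch below assumes that formulation, including a temperature of 0.05 (SimCSE's default). It is not confirmed to match ViCLSR's training code, and `supervised_contrastive_loss` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(anchor, positive, hard_negative, temperature=0.05):
    """SimCSE-style supervised contrastive loss (assumed formulation).

    anchor, positive, hard_negative: (batch, dim) embeddings of the premise,
    entailment hypothesis, and contradiction hypothesis of each NLI triple.
    """
    # L2-normalize so that dot products are cosine similarities
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(hard_negative, dim=-1)

    # Candidates for every anchor: all positives and all hard negatives in the batch
    candidates = torch.cat([p, n], dim=0)    # (2B, dim)
    logits = a @ candidates.T / temperature  # (B, 2B)

    # For anchor i, the correct candidate is its own entailment hypothesis (column i)
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)


# Toy batch of 4 triples with random 16-d embeddings (for shape checking only)
torch.manual_seed(0)
toy_anchor = torch.randn(4, 16)
loss = supervised_contrastive_loss(toy_anchor, torch.randn(4, 16), torch.randn(4, 16))
print(loss.item())
```

In this formulation, each batch holds B NLI triples. Every other example's entailment and contradiction hypotheses act as additional in-batch negatives, so larger batches provide more negatives per anchor.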

Limitations

  • Optimized specifically for Vietnamese; performance may degrade significantly on other languages
  • Performance depends on the quality and domain of input text
  • Best suited for research purposes under CC BY-NC-SA 4.0

Citation

If you use ViCLSR in your research, please cite:

@article{huynh2026viclsr,
  title={ViCLSR: A Supervised Contrastive Learning Framework with Natural Language Inference for Natural Language Understanding Tasks},
  author={Huynh, Tin Van and Nguyen, Kiet Van and Nguyen, Ngan Luu-Thuy},
  journal={arXiv preprint arXiv:2603.21084},
  year={2026}
}
