QLoRA Adapter for Dutch Definition Expansion (mT5-xl)

This repository contains a QLoRA adapter fine-tuned on google/mt5-xl for the task of sense-preserving definitional expansion in Dutch.

This work was developed as part of the Master's thesis, "Transformer-based Expansion of Dutch Dictionary Definitions", submitted for the degree of Master of Science in Artificial Intelligence at KU Leuven.

About the Thesis

The research investigates the potential of transformer-based models to automate a significant bottleneck in contemporary lexicography: the manual expansion of concise, core-meaning definitions into comprehensive, formally structured dictionary entries. The study focuses on Dutch, and the task requires not only semantic accuracy but also strict adherence to lexicographical style and structure.

The thesis empirically compares two primary methodologies: in-context learning via few-shot prompting and adaptation via parameter-efficient fine-tuning (specifically, QLoRA). This comparison was conducted across a range of powerful multilingual and Dutch-specific models, including mT5-xl, GEITje Ultra, Aya-101, and Aya-23, to determine the most effective strategy for this high-precision domain.

This Model's Role and Performance

This fine-tuned mT5-xl model represents a key baseline in the study, acting as the "unaligned blank slate" against which more modern instruction-tuned models were compared. Unlike models pre-aligned for conversational interaction, mT5 was pre-trained exclusively on an unsupervised objective without instruction tuning, making it a pure test of domain adaptation through fine-tuning.

While it successfully learned the task, its performance was surpassed by newer decoder-only architectures. The study concluded that although a blank-slate model like mT5 avoids conversational bias, a stronger instruction-tuned base model like Aya-23 provided a better foundation for this specific fine-tuning task.

How to Use

To use this adapter, you must first load the base model (`google/mt5-xl`) in 4-bit and then apply this adapter on top of it.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "google/mt5-xl"
adapter_id = "RobbedoesHF/mt5-xl-dutch-definition-expansion-qlora" # The repo ID of this adapter

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

print("Model loaded successfully!")

Prompting Format

This adapter was fine-tuned on a specific instructional prompt. For best results, your input should match this structure.

# Define the lemma and short definition you want to expand
lemma = "ecoroman"
short_def = "roman over milieuproblematiek"

# Define the prompt components, matching the training script
system_prompt = "Je bent een expert-lexicograaf die definities schrijft voor een Nederlands woordenboek."
instruction = f"Breid de volgende korte definitie voor het woord '{lemma}' uit tot een volledige definitie: '{short_def}'"
prompt = f"{system_prompt}\n\n{instruction}"

# Tokenize the prompt and move the tensors to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the output tokens
print("\nGenerating definition...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512, # Chosen based on the longest full definition's token length for this model
        num_beams=4,  # What was used for the thesis
        early_stopping=True
    )

# Decode the tokens into a string
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("\n--- Prompt ---")
print(prompt)
print("\n--- Model Output ---")
print(decoded_output)
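
When expanding more than one entry, it can help to wrap the prompt construction in a small function so every input matches the training format exactly. The following is a minimal sketch: the prompt strings are copied from the example above, while the names `SYSTEM_PROMPT`, `build_prompt`, and `entries` are hypothetical helpers introduced here for illustration and are not part of the thesis code.

```python
# Prompt components copied verbatim from the example above.
SYSTEM_PROMPT = (
    "Je bent een expert-lexicograaf die definities schrijft "
    "voor een Nederlands woordenboek."
)


def build_prompt(lemma: str, short_def: str) -> str:
    """Return the prompt for one lemma and its short definition.

    Hypothetical helper: it only reproduces the string layout shown above.
    """
    instruction = (
        f"Breid de volgende korte definitie voor het woord '{lemma}' uit "
        f"tot een volledige definitie: '{short_def}'"
    )
    return f"{SYSTEM_PROMPT}\n\n{instruction}"


# Illustrative input list; add further (lemma, short definition) pairs as needed.
entries = [("ecoroman", "roman over milieuproblematiek")]
prompts = [build_prompt(lemma, short_def) for lemma, short_def in entries]
```

Each string in `prompts` can then be tokenized and passed to `model.generate` in the same way as the single prompt above.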
