QLoRA Adapter for Dutch Definition Expansion (mT5-xl)
This repository contains a QLoRA adapter fine-tuned on google/mt5-xl for the task of sense-preserving definitional expansion in Dutch.
This work was developed as part of the Master's thesis, "Transformer-based Expansion of Dutch Dictionary Definitions", submitted for the degree of Master of Science in Artificial Intelligence at KU Leuven.
About the Thesis
The research investigates the potential of transformer-based models to automate a significant bottleneck in contemporary lexicography: the manual expansion of concise, core-meaning definitions into comprehensive, formally structured dictionary entries. The study focuses on Dutch, where the task requires not only semantic accuracy but also strict adherence to lexicographical style and structure.
The thesis empirically compares two primary methodologies: in-context learning via few-shot prompting and adaptation via parameter-efficient fine-tuning (specifically, QLoRA). This comparison was conducted across a range of powerful multilingual and Dutch-specific models, including mT5-xl, GEITje Ultra, Aya-101, and Aya-23, to determine the most effective strategy for this high-precision domain.
This Model's Role and Performance
This fine-tuned mT5-xl model represents a key baseline in the study, acting as the "unaligned blank slate" against which more modern instruction-tuned models were compared. Unlike models pre-aligned for conversational interaction, mT5 was pre-trained exclusively on an unsupervised objective without instruction tuning, making it a pure test of domain adaptation through fine-tuning.
While it successfully learned the task, its performance was ultimately surpassed by newer decoder-only architectures. The study concluded that while a blank slate model like mT5 avoids conversational bias, a stronger instruction-tuned base model like Aya-23 ultimately provided a better foundation for this specific fine-tuning task.
How to Use
To use this adapter, you must first load the base model (`google/mt5-xl`) in 4-bit and then apply this adapter on top of it.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
base_model_id = "google/mt5-xl"
adapter_id = "RobbedoesHF/mt5-xl-dutch-definition-expansion-qlora" # The repo ID of this adapter
# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
print("Model loaded successfully!")
Prompting Format
This adapter was fine-tuned on a specific instructional prompt. For best results, your input should match this structure.
# Define the lemma and short definition you want to expand
lemma = "ecoroman"
short_def = "roman over milieuproblematiek"
# Define the prompt components, matching the training script
system_prompt = "Je bent een expert-lexicograaf die definities schrijft voor een Nederlands woordenboek."
instruction = f"Breid de volgende korte definitie voor het woord '{lemma}' uit tot een volledige definitie: '{short_def}'"
prompt = f"{system_prompt}\n\n{instruction}"
# Tokenize the prompt and move the tensors to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generate the output tokens
print("\nGenerating definition...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,  # Chosen based on the longest full definition's token length for this model
        num_beams=4,  # Value used for the thesis
        early_stopping=True,
    )
# Decode the tokens into a string
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\n--- Prompt ---")
print(prompt)
print("\n--- Model Output ---")
print(decoded_output)
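When expanding more than one entry, the prompt construction shown above can be wrapped in a small helper so that every input follows the same structure. The following is a minimal sketch, not part of the original training script: the names `SYSTEM_PROMPT` and `build_prompt` are hypothetical, and the template text is copied verbatim from the example above.

```python
# Hypothetical helper; the template strings are taken from the example above.
SYSTEM_PROMPT = (
    "Je bent een expert-lexicograaf die definities schrijft "
    "voor een Nederlands woordenboek."
)


def build_prompt(lemma: str, short_def: str) -> str:
    """Return a prompt in the system-prompt + instruction format shown above."""
    instruction = (
        f"Breid de volgende korte definitie voor het woord '{lemma}' uit "
        f"tot een volledige definitie: '{short_def}'"
    )
    # The system prompt and instruction are separated by one blank line.
    return f"{SYSTEM_PROMPT}\n\n{instruction}"


# Build prompts for a list of (lemma, short definition) pairs.
entries = [("ecoroman", "roman over milieuproblematiek")]
prompts = [build_prompt(lemma, short_def) for lemma, short_def in entries]
```

Each string in `prompts` can then be tokenized and passed to `model.generate` exactly as in the single-prompt example.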