MrBERT-es ESG News Classifier (Spanish)

Model summary

Fine-tuned ESG news classifier for Spanish equity market headlines. Based on BSC-LT/MrBERT-es (ModernBERT, 150M parameters, bilingual ES/EN). Classifies Spanish financial news headlines into ESG pillars (Environmental, Social, Governance) and sentiment (Positive / Negative / Neutral / NA). Training regime: human-gold annotations (1,688 events) augmented with LLaMA-3.1-8B SFT silver labels (62,800 events).

Repository layout

This repository ships four sibling sub-models — one per task head — because the fine-tuned architecture is a shared BSC-LT/MrBERT-es encoder with separate classification heads (three binary pillar heads plus one 4-class sentiment head), not a single multi-class classifier.

mrbert-es-esg/
├── esg_E/         binary head: Environmental pillar (0 = not-E, 1 = E)
├── esg_S/         binary head: Social pillar       (0 = not-S, 1 = S)
├── esg_G/         binary head: Governance pillar   (0 = not-G, 1 = G)
└── sentiment/     4-class head: {Pos, Neg, Neu, NA}
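Each sub-folder exposes its own head, so downstream code has to map raw head outputs back to labels one task at a time. The sketch below assumes three things. First, each head emits one logit per class: two for the pillar heads and four for sentiment. Second, the sentiment classes are indexed in the order Pos, Neg, Neu, NA, as listed above. Third, the names `HEAD_LABELS` and `decode` are hypothetical. Check the actual index order against the project code before relying on it.

```python
import math

# Hypothetical label maps. Pillar heads follow the 0 = not-X / 1 = X convention
# stated in the layout; the sentiment index order is an ASSUMPTION.
HEAD_LABELS = {
    "esg_E": ["not-E", "E"],
    "esg_S": ["not-S", "S"],
    "esg_G": ["not-G", "G"],
    "sentiment": ["Pos", "Neg", "Neu", "NA"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(head, logits):
    """Return (label, probability) for the arg-max class of one head."""
    labels = HEAD_LABELS[head]
    if len(logits) != len(labels):
        raise ValueError(f"{head} expects {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


print(decode("esg_E", [-1.2, 0.8]))                # ('E', ~0.88)
print(decode("sentiment", [0.1, 2.0, 0.3, -1.0]))  # ('Neg', ...)
```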

Each sub-folder contains:

model.safetensors — TAPT-adapted ModernBERT-es encoder (~599 MB)
separate_head.pt — classification head weights for this task
config.json — ModernBertModel encoder config
separate_classifier_config.json — head metadata (pillar, hidden_size, num_sentiment_classes)
tokenizer.json, tokenizer_config.json — fast tokenizer (ModernBERT uses a single tokenizer.json; no vocab.txt / special_tokens_map.json)
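After you download a sub-folder, it can help to confirm that every file listed above is present before you load anything. For example, you can fetch a single sub-folder with `huggingface_hub.snapshot_download` and `allow_patterns`. The sketch below uses only the standard library. `check_head_folder` is a hypothetical helper. The card does not document the full schema of `separate_classifier_config.json`, so the helper reads only the three keys named above and tolerates any that are missing.

```python
import json
from pathlib import Path

# File names as listed in the repository layout.
EXPECTED_FILES = [
    "model.safetensors",
    "separate_head.pt",
    "config.json",
    "separate_classifier_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def check_head_folder(folder):
    """Return (missing_files, head_meta) for one task sub-folder.

    head_meta holds the pillar / hidden_size / num_sentiment_classes entries
    from separate_classifier_config.json (None where a key is absent).
    """
    folder = Path(folder)
    missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
    meta = {}
    meta_path = folder / "separate_classifier_config.json"
    if meta_path.is_file():
        raw = json.loads(meta_path.read_text(encoding="utf-8"))
        # Only the three keys named in the model card; the full schema is an assumption.
        meta = {k: raw.get(k) for k in ("pillar", "hidden_size", "num_sentiment_classes")}
    return missing, meta
```

Typical usage: `local = snapshot_download("DReggio/mrbert-es-esg", allow_patterns=["esg_E/*"])`, then `check_head_folder(Path(local) / "esg_E")`.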

ESG pillar predictions are independent binary classifications — the same headline can be flagged on any combination of E, S, G, or none (multi-label by design, per the project codebook).
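The three pillar heads are independent, so a headline's ESG tag set is simply the set of pillars whose head fires. The sketch below assumes each head's positive-class probability has already been computed. The card does not state a decision threshold, so `threshold` (default 0.5) and the helper name `esg_tags` are hypothetical.

```python
def esg_tags(pillar_probs, threshold=0.5):
    """Map {"E": p_E, "S": p_S, "G": p_G} to the sorted list of flagged pillars.

    Each probability comes from its own binary head, so any combination
    (including none) is a valid result; the values need not sum to 1.
    """
    return sorted(pillar for pillar, prob in pillar_probs.items() if prob >= threshold)


print(esg_tags({"E": 0.91, "S": 0.12, "G": 0.67}))  # ['E', 'G']
print(esg_tags({"E": 0.05, "S": 0.20, "G": 0.31}))  # []
```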

Intended use

ESG signal extraction from Spanish business press
Event study research on ESG news and equity market response
Spanish financial NLP benchmarking

Out of scope: high-stakes automated decisions without human review; languages other than Spanish.

How to use

The classification head is a custom nn.Module stored in separate_head.pt, separate from the encoder weights — so the standard AutoModelForSequenceClassification / pipeline("text-classification", ...) path does not work out of the box. To run inference, load the encoder with AutoModel and apply the matching separate_head.pt for each task.

Minimal loading sketch (the encoder and tokenizer load with standard Transformers calls; the head needs the project's SeparateClassifier class):

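```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModel

repo_id = "DReggio/mrbert-es-esg"
head = "esg_E"  # or "esg_S", "esg_G", "sentiment"

# Each task lives in its own sub-folder of the same Hub repository.
tok = AutoTokenizer.from_pretrained(repo_id, subfolder=head)
enc = AutoModel.from_pretrained(repo_id, subfolder=head).eval()

# torch.load cannot read Hub paths directly, so download the head weights first.
head_path = hf_hub_download(repo_id, "separate_head.pt", subfolder=head)
state = torch.load(head_path, map_location="cpu")
# Instantiate SeparateClassifier(...) from the project repo, load `state`, then run forward.
```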

Training data

Gold set: 1,688 human-annotated Spanish ESG events (E / S / G binary + 4-class sentiment). Silver augmentation (R3): 62,800 events labelled by a LLaMA-3.1-8B SFT annotator (variant4 prompt, masked-loss QLoRA, mean κ = 0.761 vs human gold). R3 was selected over R2 (Qwen3-8B SFT silver) because LLaMA's higher Governance base rate yields a richer positive-class signal for the G pillar. Relative to the gold-only R1b baseline, the R3 fine-tune lifts macro-F1 by 0.7–3.1 pp and MCC_G by 2.2–6.6 pp.
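The κ values quoted here and in the evaluation table measure agreement after correcting for chance. The card does not say which κ variant it uses. The sketch below assumes Cohen's κ, computed as κ = (p_o − p_e) / (1 − p_e) from two label sequences. The labels are toy values, not project data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' labels for the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (p_o - p_e) / (1 - p_e)


# Toy example: gold vs. silver sentiment labels for six headlines.
gold = ["Pos", "Neg", "Neu", "Neg", "NA", "Pos"]
silver = ["Pos", "Neg", "Neu", "Pos", "NA", "Pos"]
print(round(cohen_kappa(gold, silver), 3))  # 0.769
```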

Evaluation (gold_test, n = 267)

Metric	Value
Macro F1	0.8460
F1_E	0.8866
F1_S	0.8485
F1_G	0.8029
MCC_G	0.5953
κ_sentiment	0.7006
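The reported macro F1 matches the unweighted mean of the three pillar F1 scores to four decimal places. This suggests it is averaged over the binary pillar heads only, with sentiment reported separately as κ. A quick arithmetic check using the table values:

```python
# Per-pillar F1 scores from the gold_test table.
pillar_f1 = {"E": 0.8866, "S": 0.8485, "G": 0.8029}

# Unweighted (macro) mean over the three binary pillar heads.
macro_f1 = sum(pillar_f1.values()) / len(pillar_f1)
print(f"{macro_f1:.4f}")  # 0.8460
```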

Training procedure

Base model: BSC-LT/MrBERT-es
Architecture: separate binary classification head per pillar (E / S / G) plus a 4-class sentiment head, over a shared TAPT-adapted ModernBERT-es encoder
Training regimes: R1b (gold only) → R2 (gold + Qwen3 silver) → R3 (gold + LLaMA silver, this checkpoint)
Framework: HuggingFace Transformers + PyTorch
Hardware: Google Colab Pro A100 40GB

Limitations

Governance (G) pillar: F1 0.80, MCC_G 0.60. G is a structurally ambiguous category, so use G predictions with caution in G-specific downstream tasks.
Silver-label bias: R3 inherits the labelling distribution of the LLaMA-3.1-8B SFT annotator, which has a higher G base-rate (47.5 %) than the Qwen3 silver corpus (34.0 %); applications sensitive to G prevalence should consider the gold-only R1b checkpoint or the Qwen3-silver R2 checkpoint as sensitivity comparisons.
Training data: Spanish peninsular financial press (2014–2024); Latin American Spanish and social media not covered.
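The gap between F1_G (0.80) and MCC_G (0.60) is expected. F1 ignores true negatives, whereas the Matthews correlation coefficient uses all four cells of the confusion matrix and is 0 for chance-level prediction. The sketch below implements the standard binary formulas. The counts are made-up illustrative values, not the model's confusion matrix.

```python
import math


def f1_binary(tp, fp, fn):
    """F1 for the positive class; true negatives play no role."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty (denominator is zero).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative counts only (not the model's confusion matrix).
tp, fp, fn, tn = 40, 10, 10, 20
print(round(f1_binary(tp, fp, fn), 3))  # 0.8
print(round(mcc(tp, fp, fn, tn), 3))    # 0.467
```

With relatively few true negatives, the same F1 of 0.8 corresponds to a much lower MCC. This is why MCC_G is the stricter summary of G-pillar quality.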

License

Apache 2.0 — derived from BSC-LT/MrBERT-es (© Barcelona Supercomputing Center, Apache 2.0).

Citation

```bibtex
@mastersthesis{reggio2026esg,
  author = {Damien Reggio},
  title  = {ESG News Classification and Market Response
            in Spanish Equity Markets},
  school = {FernUni Switzerland},
  year   = {2026}
}
```

Model tree for DReggio/mrbert-es-esg: BSC-LT/MrBERT (base model) → BSC-LT/MrBERT-es (fine-tuned) → this model.