MrBERT-es ESG News Classifier (Spanish)

Model summary

Fine-tuned ESG news classifier for Spanish equity market headlines. Based on BSC-LT/MrBERT-es (ModernBERT, 150M parameters, bilingual ES/EN). Classifies Spanish financial news headlines into ESG pillars (Environmental, Social, Governance) and sentiment (Positive / Negative / Neutral / NA). Training regime: human-gold annotations (1,688 events) augmented with LLaMA-3.1-8B SFT silver labels (62,800 events).

Repository layout

This repository ships four sibling sub-models, one per task head, because the fine-tuned architecture is a shared BSC-LT/MrBERT-es encoder with separate classification heads (three binary pillar heads plus one 4-class sentiment head), not a single multi-class classifier.

mrbert-es-esg/
├── esg_E/         binary head: Environmental pillar (0 = not-E, 1 = E)
├── esg_S/         binary head: Social pillar       (0 = not-S, 1 = S)
├── esg_G/         binary head: Governance pillar   (0 = not-G, 1 = G)
└── sentiment/     4-class head: {Pos, Neg, Neu, NA}

Each sub-folder contains:

  • model.safetensors: TAPT-adapted ModernBERT-es encoder (~599 MB)
  • separate_head.pt: classification head weights for this task
  • config.json: ModernBertModel encoder config
  • separate_classifier_config.json: head metadata (pillar, hidden_size, num_sentiment_classes)
  • tokenizer.json, tokenizer_config.json: fast tokenizer (ModernBERT uses a single tokenizer.json; no vocab.txt / special_tokens_map.json)

ESG pillar predictions are independent binary classifications: the same headline can be flagged on any combination of E, S, G, or none (multi-label by design, per the project codebook).
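
Because the three pillar heads are independent, their outputs are combined as a label set rather than through a single argmax. The following is a minimal sketch of that combination step, assuming each binary head has already been reduced to a positive-class probability. The function name `flagged_pillars`, the 0.5 threshold, and the example probabilities are illustrative assumptions, not values taken from the project.

```python
PILLARS = ("E", "S", "G")


def flagged_pillars(probs, threshold=0.5):
    """Return every pillar whose positive-class probability meets the threshold.

    `probs` maps a pillar name to P(class = 1) from that pillar's binary head.
    The 0.5 default is an assumed cut-off, not a documented project setting.
    """
    return [p for p in PILLARS if probs.get(p, 0.0) >= threshold]


# Made-up probabilities for one headline: flagged on E and G, not on S.
example = {"E": 0.91, "S": 0.12, "G": 0.67}
print(flagged_pillars(example))

# A headline can also be flagged on no pillar at all.
print(flagged_pillars({"E": 0.05, "S": 0.10, "G": 0.20}))
```

A per-pillar threshold could be substituted if the pillars need different precision/recall trade-offs.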

Intended use

  • ESG signal extraction from Spanish business press
  • Event study research on ESG news and equity market response
  • Spanish financial NLP benchmarking

Out of scope: high-stakes automated decisions without human review; languages other than Spanish.

How to use

The classification head is a custom nn.Module stored in separate_head.pt, separately from the encoder weights, so the standard AutoModelForSequenceClassification / pipeline("text-classification", ...) path does not work out of the box. To run inference, load the encoder with AutoModel and apply the matching separate_head.pt for each task.

Minimum loading sketch (encoder only; head loading uses the project's SeparateClassifier class):

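import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModel, AutoTokenizer

repo = "DReggio/mrbert-es-esg"
head = "esg_E"  # or "esg_S", "esg_G", "sentiment"

# Each head lives in a sub-folder of the repo, so pass it via `subfolder`.
tok = AutoTokenizer.from_pretrained(repo, subfolder=head)
enc = AutoModel.from_pretrained(repo, subfolder=head)

# torch.load needs a local file, so fetch the head weights from the Hub first.
head_path = hf_hub_download(repo_id=repo, filename=f"{head}/separate_head.pt")
state = torch.load(head_path, map_location="cpu")
# Instantiate SeparateClassifier(...) from the project repo, load `state`, then run forward.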
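
Once the sentiment head has produced its four logits, they still need to be mapped to a label. The sketch below shows one way to do that in plain Python. It assumes the class indices follow the order {Pos, Neg, Neu, NA} listed in the repository layout. That index order is an assumption and should be checked against separate_classifier_config.json or the project code. The helper names and the example logits are hypothetical.

```python
import math

# ASSUMED index order, taken from the {Pos, Neg, Neu, NA} listing; verify before use.
SENTIMENT_LABELS = ("Pos", "Neg", "Neu", "NA")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_sentiment(logits):
    """Return (label, probability) for the highest-scoring sentiment class."""
    if len(logits) != len(SENTIMENT_LABELS):
        raise ValueError("expected one logit per sentiment class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_LABELS[best], probs[best]


# Made-up logits standing in for the output of the sentiment head.
label, confidence = decode_sentiment([0.2, 2.1, -0.5, -1.3])
print(label, round(confidence, 3))
```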

Training data

Gold set: 1,688 human-annotated Spanish ESG events (E / S / G binary + 4-class sentiment). Silver augmentation (R3): 62,800 events labelled by a LLaMA-3.1-8B SFT annotator (variant4 prompt, masked-loss QLoRA, mean κ = 0.761 vs human gold). R3 was selected over R2 (Qwen3-8B SFT silver) because LLaMA's higher Governance base-rate yields richer positive-class signal for the G pillar; the R3 fine-tune lifts macro-F1 by 0.7–3.1 pp and MCC_G by 2.2–6.6 pp over the gold-only R1b baseline.
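
The agreement between the silver annotator and the human gold labels is reported as κ. Assuming this denotes Cohen's kappa (the card does not say so explicitly), the statistic corrects raw agreement for the agreement expected by chance. The following sketch computes it for two label sequences. The function name `cohen_kappa` and the toy label lists are illustrative and are not drawn from the training data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels for illustration only.
human = ["Pos", "Neg", "Neu", "Neg", "NA", "Pos", "Neu", "Neg"]
silver = ["Pos", "Neg", "Neu", "Pos", "NA", "Pos", "Neg", "Neg"]
print(round(cohen_kappa(human, silver), 4))
```

With these toy labels the raw agreement is 6 of 8, but kappa is lower because some of that agreement would occur by chance.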

Evaluation (gold_test, n = 267)

Metric         Value
Macro F1       0.8460
F1_E           0.8866
F1_S           0.8485
F1_G           0.8029
MCC_G          0.5953
κ_sentiment    0.7006
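
The reported figures are consistent with Macro F1 being the unweighted mean of the three pillar F1 scores. The card does not state the averaging rule, so that reading is an inference from the numbers. A short arithmetic check:

```python
from statistics import mean

# Pillar F1 scores as reported in the evaluation table.
pillar_f1 = {"E": 0.8866, "S": 0.8485, "G": 0.8029}

# Unweighted (macro) average over the three pillars.
macro_f1 = mean(pillar_f1.values())
print(f"{macro_f1:.4f}")  # prints 0.8460
```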

Training procedure

  • Base model: BSC-LT/MrBERT-es
  • Architecture: separate binary classification head per pillar (E / S / G) plus a 4-class sentiment head, over a shared TAPT-adapted ModernBERT-es encoder
  • Training regimes: R1b (gold only) → R2 (gold + Qwen3 silver) → R3 (gold + LLaMA silver, this checkpoint)
  • Framework: HuggingFace Transformers + PyTorch
  • Hardware: Google Colab Pro A100 40GB

Limitations

  • Governance (G) pillar: F1 0.80, MCC_G 0.60. This is a structurally ambiguous category; use with caution for G-specific downstream tasks.
  • Silver-label bias: R3 inherits the labelling distribution of the LLaMA-3.1-8B SFT annotator, which has a higher G base-rate (47.5 %) than the Qwen3 silver corpus (34.0 %); applications sensitive to G prevalence should consider the gold-only R1b checkpoint or the Qwen3-silver R2 checkpoint as sensitivity comparisons.
  • Training data: Spanish peninsular financial press (2014–2024); Latin American Spanish and social media not covered.
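
The gap between F1 and MCC on the Governance pillar noted in the first limitation arises because F1 ignores true negatives, whereas MCC uses all four cells of the confusion matrix. The sketch below computes both from the same counts. The counts are hypothetical, chosen only to show that an F1 near 0.80 can coexist with an MCC near 0.60; they are not the model's actual confusion matrix.

```python
import math


def f1_score(tp, fp, fn):
    """F1 uses only the positive-class cells; true negatives do not enter."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient over all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# HYPOTHETICAL counts for illustration only, not taken from gold_test.
tp, fp, fn, tn = 80, 20, 20, 80
print(round(f1_score(tp, fp, fn), 2), round(mcc(tp, fp, fn, tn), 2))
```

When the positive class is common, as with the G base-rates quoted above, F1 can stay high even with a sizeable number of false positives, which is why MCC is the more conservative figure to consult.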

License

Apache 2.0, derived from BSC-LT/MrBERT-es (© Barcelona Supercomputing Center, Apache 2.0).

Citation

@mastersthesis{reggio2026esg,
  author = {Damien Reggio},
  title  = {ESG News Classification and Market Response
            in Spanish Equity Markets},
  school = {FernUni Switzerland},
  year   = {2026}
}