How to use temsa/eurocivic-gliner2-onnx-cpu-v2 with GLiNER:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("temsa/eurocivic-gliner2-onnx-cpu-v2")
```

How to use temsa/eurocivic-gliner2-onnx-cpu-v2 with GLiNER2:

```python
from gliner2 import GLiNER2

model = GLiNER2.from_pretrained("temsa/eurocivic-gliner2-onnx-cpu-v2")

# Extract entities
text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = model.extract_entities(text, ["company", "person", "product", "location"])
print(result)
```
EuroCivic GLiNER2 ONNX CPU
EuroCivic GLiNER2 ONNX CPU is a CPU-oriented GLiNER2 bundle for zero-shot entity extraction and label-set classification in civic and public-service text. It is based on fastino/gliner2-base-v1, then fine-tuned for civic-service intent classification, sentiment-style feedback classification, and PII extraction.
The primary language targets are English and Irish. Additional European-language coverage was kept in the training mix for German, Spanish, French, Polish, Ukrainian, Dutch, Portuguese, Italian, and Swedish. Quality is not expected to be equal across all languages; validate thresholds for your own domain.
Precision variants
The gliner2_config.json file exposes three precision keys:
| Precision key | Encoder | Classifier | Span representation | Count embedding | Notes |
|---|---|---|---|---|---|
| fp32 | fp32 | fp32 | fp32 | fp32 | Highest-fidelity export. |
| qint8 | fp32 | QInt8 | QInt8 | fp32 passthrough | Recommended CPU setting from validation. Partial q8, not full-model q8. |
| dynamic_q8 | fp32 | fp32 | QInt8 | fp32 passthrough | Alternative partial dynamic-q8 setting. |
The quantized variants are intentionally partial. Full encoder quantization was not retained because validation quality dropped sharply. Use qint8 first unless your runtime has a specific reason to prefer dynamic_q8.
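The precision table can be restated as a small lookup, which makes it easy to check which components a given key actually quantizes. This is only an illustrative sketch: `PRECISION_TABLE` and `quantized_components` are hypothetical names, and the values are transcribed from the table by hand rather than read from gliner2_config.json.

```python
# Illustrative restatement of the precision table. Names are hypothetical and
# values are transcribed by hand, not loaded from gliner2_config.json.
# "fp32 passthrough" count embeddings are recorded simply as fp32.
PRECISION_TABLE = {
    "fp32": {"encoder": "fp32", "classifier": "fp32", "span_rep": "fp32", "count_embed": "fp32"},
    "qint8": {"encoder": "fp32", "classifier": "QInt8", "span_rep": "QInt8", "count_embed": "fp32"},
    "dynamic_q8": {"encoder": "fp32", "classifier": "fp32", "span_rep": "QInt8", "count_embed": "fp32"},
}


def quantized_components(precision_key: str) -> list[str]:
    """Return the components that are QInt8 under the given precision key."""
    row = PRECISION_TABLE[precision_key]
    return [name for name, dtype in row.items() if dtype == "QInt8"]


for key in PRECISION_TABLE:
    print(key, quantized_components(key))
```

In this restatement the encoder is fp32 under every key, which is why the quantized variants are described as partial.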
What it is tuned for
- Zero-shot entity extraction with arbitrary labels.
- PII-like entity extraction for names, email addresses, phone numbers, postal addresses, passport-like identifiers, national identifiers, and IBAN-style values.
- Civic and public-service query classification, including task/service intent, publication/report intent, update/news intent, initiative/hub intent, specialist workflow intent, recurring publication series, annual programme cycles, generic topics, and people/office/organisation queries.
- Sentiment-style feedback classification with labels such as satisfied, frustrated, and neutral.
- Irish-centric examples, including Irish-language text, Irish names with and without diacritics, and PPS/PSP-style identifiers.
- European-friendly multilingual classification and PII extraction examples.
Training data transparency
Public Hugging Face datasets used in the adapter lineage:
- nvidia/Nemotron-PII (cc-by-4.0)
- DataikuNLP/kiji-pii-training-data (apache-2.0)
- ai4privacy/pii-masking-300k (other; review the dataset card before redistribution-sensitive use)
The final tuning stage used 133,901 training rows and 5,579 validation rows. It mixed synthetic civic-intent, synthetic multilingual PII, synthetic sentiment, and local replay-derived examples. The non-public training rows are not redistributed in this repository.
Explicit language-tagged rows in the final stage included: English 72,080, Irish 8,781, German 2,989, Spanish 2,809, French 2,742, Polish 2,541, Ukrainian 2,438, and 845 each for Dutch, Portuguese, Italian, and Swedish.
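To see how uneven the language mix is, the listed counts can be turned into shares. The sketch below uses only the figures quoted above; the shares are computed relative to the listed language-tagged rows, not the full training set, and `TAGGED_ROWS` is a hypothetical name.

```python
# Row counts copied from the paragraph above (language-tagged rows only).
TAGGED_ROWS = {
    "English": 72_080, "Irish": 8_781, "German": 2_989, "Spanish": 2_809,
    "French": 2_742, "Polish": 2_541, "Ukrainian": 2_438,
    "Dutch": 845, "Portuguese": 845, "Italian": 845, "Swedish": 845,
}

tagged_total = sum(TAGGED_ROWS.values())

# Share of each language among the listed language-tagged rows.
for language, rows in TAGGED_ROWS.items():
    print(f"{language:<11}{rows:>7,}  {rows / tagged_total:6.1%}")
print(f"listed tagged rows: {tagged_total:,}")
```

English accounts for roughly three quarters of the listed tagged rows, which is worth keeping in mind when validating the smaller languages.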
See training_summary.json for the compact provenance and validation summary.
Validation snapshot
Small internal validation checks for the recommended qint8 key:
- Civic intent contrastive set: 81/135
- Civic intent smoke set: 7/9
- Sentiment smoke set: 5/6
- PII label-level smoke: precision 0.6875, recall 0.9167
These are not broad public benchmarks. Treat them as smoke/regression checks for this export, not a guarantee of production accuracy.
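For comparison across checks, the counts above can be converted to rates, and the PII precision and recall can be combined into an F1 score. The F1 value is derived here from the two reported figures; it is not a number reported by this card.

```python
from fractions import Fraction

# Counts copied from the validation snapshot above.
checks = {
    "civic intent contrastive": Fraction(81, 135),
    "civic intent smoke": Fraction(7, 9),
    "sentiment smoke": Fraction(5, 6),
}
for name, ratio in checks.items():
    print(f"{name}: {float(ratio):.3f}")

# F1 is the harmonic mean of precision and recall. It is derived here from
# the two reported figures and is not itself reported by the model card.
precision, recall = 0.6875, 0.9167
f1 = 2 * precision * recall / (precision + recall)
print(f"PII label-level F1 (derived): {f1:.4f}")
```

With sets this small, a single example changes the smoke-set rates by more than ten percentage points, so these figures are best read as regression signals.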
Files
- gliner2_config.json: precision map and export metadata.
- onnx/encoder.onnx: shared fp32 encoder used by all active precision keys.
- onnx/classifier.onnx, onnx/classifier.qint8.onnx: classification heads.
- onnx/span_rep.onnx, onnx/span_rep.qint8.onnx, onnx/span_rep.dynamic_q8.onnx: span representation heads.
- onnx/count_embed.onnx, onnx/count_embed.qint8.onnx, onnx/count_embed.dynamic_q8.onnx: count embedding component; q8-named variants are fp32 passthrough copies for precision-key completeness.
- Tokenizer and base config files needed by GLiNER2-compatible ONNX runtimes.
Runtime notes
A runtime should select files from gliner2_config.json -> onnx_files -> <precision key>. For CPU use, start with:
```
MODEL_ID=temsa/eurocivic-gliner2-onnx-cpu-v2
MODEL_PRECISION=qint8
```
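A minimal sketch of the file-selection step, assuming that each entry under `onnx_files` maps component names to paths relative to the repository root. That layout is an assumption, as are the component names and the helper `resolve_onnx_files`; the file names in `SAMPLE_CONFIG` are taken from the Files list, but the real gliner2_config.json should be inspected before relying on this shape.

```python
from pathlib import Path

# ASSUMPTION: hand-written stand-in for gliner2_config.json. The real layout
# under onnx_files -> <precision key> may differ; check the actual file.
SAMPLE_CONFIG = {
    "onnx_files": {
        "qint8": {
            "encoder": "onnx/encoder.onnx",
            "classifier": "onnx/classifier.qint8.onnx",
            "span_rep": "onnx/span_rep.qint8.onnx",
            "count_embed": "onnx/count_embed.qint8.onnx",
        },
    }
}


def resolve_onnx_files(config: dict, precision: str, root: Path) -> dict[str, Path]:
    """Map each component name to its file path for one precision key."""
    available = config.get("onnx_files", {})
    if precision not in available:
        raise KeyError(f"unknown precision {precision!r}; available: {sorted(available)}")
    return {name: root / rel for name, rel in available[precision].items()}


# "model_dir" is a placeholder for wherever the repository was downloaded.
files = resolve_onnx_files(SAMPLE_CONFIG, "qint8", Path("model_dir"))
for name, path in files.items():
    print(f"{name}: {path}")
```

In real use, the config would be loaded with `json.loads((root / "gliner2_config.json").read_text())` and the precision key taken from `MODEL_PRECISION`. Failing loudly on an unknown key avoids silently falling back to a different precision.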
For PII extraction, tune score thresholds against your target text. For classification, use short label descriptions rather than opaque IDs where possible.
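Threshold tuning can be as simple as sweeping a few cut-offs over a labelled sample and reading off precision and recall. The sketch below uses made-up scores purely for illustration; `scored`, `TOTAL_GOLD`, and `precision_recall` are hypothetical names, and the numbers say nothing about this model's actual behaviour.

```python
# Hypothetical scored predictions: (confidence score, is it a true PII entity?).
# In practice these come from running the model on your own labelled sample.
scored = [
    (0.95, True), (0.90, True), (0.72, True), (0.65, False),
    (0.55, True), (0.40, False), (0.35, False), (0.20, True),
]
# Number of true entities in the labelled sample; here every one of them
# appears somewhere in `scored`.
TOTAL_GOLD = 5


def precision_recall(threshold: float) -> tuple[float, float]:
    """Precision and recall when keeping only predictions scoring >= threshold."""
    kept = [is_true for score, is_true in scored if score >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / TOTAL_GOLD
    return precision, recall


for threshold in (0.3, 0.5, 0.7, 0.9):
    p, r = precision_recall(threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

For PII redaction, missing an entity is usually costlier than a false alarm, so it is reasonable to pick the lowest threshold whose precision is still acceptable for your text.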
Limitations
- This is a civic/public-service specialist, not a general-domain NER benchmark leader.
- Quantized precision keys are partial because the encoder remains fp32.
- PII extraction is not a compliance guarantee and should be paired with tests, thresholds, and review for high-risk uses.
- English and Irish were prioritized; other European languages are included but less heavily weighted.