---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- chat
- filter
- quality
- classification
- instruct
- qwen
- portuguese
metrics:
- precision
- recall
- accuracy
model-index:
- name: portuguese-qwen3-4b-instruct-quality-classifier
  results: []
datasets:
- Polygl0t/portuguese-instruct-quality-qwen-annotations
language:
- pt
pipeline_tag: text-classification
---

# Qwen3-4B Instruct Quality Classifier

Qwen3-4B Instruct Quality Classifier is a [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)-based model that judges the quality of a user-assistant conversation, so it can serve as a quality filter for instruction-following data. It was trained on the [Portuguese Instruct Quality Qwen Annotations](https://huggingface.co/datasets/Polygl0t/portuguese-instruct-quality-qwen-annotations) dataset.

## Details

We added a classification head with a single regression output to [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). During training, only the weights of the embedding layer were frozen.

- **Dataset:** [Portuguese Instruct Quality Qwen Annotations](https://huggingface.co/datasets/Polygl0t/portuguese-instruct-quality-qwen-annotations)
- **Language:** Portuguese
- **Number of Training Epochs:** 2
- **Batch size:** 64
- **Optimizer:** `torch.optim.AdamW` (cosine learning rate scheduler with 100 warmup steps)
- **Learning Rate:** 5e-5
- **Eval Metric:** `f1-score`

The [source code](https://github.com/Polygl0t/llm-foundry) used to train this model is available on GitHub.

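To make the optimizer settings listed above more concrete, here is a minimal sketch of a learning rate schedule with linear warmup followed by cosine decay, using the stated peak learning rate (5e-5) and warmup length (100 steps). The function name `cosine_lr_with_warmup`, the decay-to-zero shape, and the total step count are assumptions for illustration; the card does not state the total number of steps or the exact scheduler implementation used in training.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-5, warmup_steps=100):
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Hypothetical helper for illustration only; not taken from the training code.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    # Half a cosine period: base_lr at progress 0, zero at progress 1.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed value: the real number of optimizer steps is not given in this card.
TOTAL_STEPS = 1_000

for step in (0, 50, 100, 550, 1_000):
    print(step, f"{cosine_lr_with_warmup(step, TOTAL_STEPS):.2e}")
```

With this shape, the rate climbs to its peak at the end of warmup, passes through half the peak midway through the decay phase, and reaches zero at the final step.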
### Evaluation Results

#### Confusion Matrix

|       | **1** | **2** | **3** | **4** | **5** |
|-------|-------|-------|-------|-------|-------|
| **1** | 153   | 35    | 4     | 1     | 2     |
| **2** | 17    | 204   | 91    | 11    | 4     |
| **3** | 2     | 60    | 578   | 143   | 7     |
| **4** | 0     | 1     | 99    | 2076  | 348   |
| **5** | 0     | 0     | 5     | 299   | 5860  |

- Precision: 0.8248
- Recall: 0.7821
- F1 Macro: 0.8082
- Accuracy: 0.894

## Usage

Here's an example of how to use Qwen3-4B Instruct Quality Classifier to score a conversation in Portuguese:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Polygl0t/portuguese-qwen3-4b-instruct-quality-classifier"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.to(device)
model.eval()

good_messages = [
    {"role": "user", "content": "Qual é a capital de Portugal?"},
    {"role": "assistant", "content": "A capital de Portugal é Lisboa."},
]

bad_messages = [
    {"role": "user", "content": "Qual é a capital de Portugal?"},
    {"role": "assistant", "content": "Minha cor favorita é azul."},
]

for messages in [good_messages, bad_messages]:
    # Format the conversation into a single string (which is how the model was trained)
    text = tokenizer.apply_chat_template(messages, tokenize=False)

    # This model was fine-tuned with sequences up to 6032 tokens long
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding="longest",
        truncation=True,
        max_length=6032,
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)

    # The regression head returns one value per sequence, in the range [0, 4]
    raw_score = outputs.logits.squeeze(-1).float().cpu().tolist()[0]

    print({
        "text": text,
        # Shift the raw score from [0, 4] to [1, 5]
        "score": raw_score + 1,
        # Clamp to [0, 4], round to the nearest integer, then shift to [1, 5]
        "int_score": int(round(max(0, min(raw_score, 4)))) + 1,
    })
```

## Cite as 🤗

```latex
@misc{correa2026tucano2cool,
      title={{Tucano 2 Cool: Better Open Source LLMs for Portuguese}},
      author={Nicholas Kluge Corr{\^e}a and Aniket Sen and Shiza Fatimah and Sophia Falk and Lennard Landgraf and Julia Kastner and Lucie Flek},
      year={2026},
      eprint={2603.03543},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.03543},
}
```

## Acknowledgments

Polyglot is a project funded by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MWK) as part of TRA Sustainable Futures (University of Bonn) and the Excellence Strategy of the federal and state governments.

We also gratefully acknowledge the access granted to the [Marvin cluster](https://www.hpc.uni-bonn.de/en/systems/marvin), hosted by the [University of Bonn](https://www.uni-bonn.de/en), along with the support provided by its High Performance Computing & Analytics Lab.

## License

Qwen3-4B Instruct Quality Classifier is licensed under the Apache License, Version 2.0. For more details, see the [LICENSE](LICENSE) file.