Charsiu G2P (byT5-tiny-16-layers) — safetensors port
A self-contained, safetensors-packaged conversion of Charsiu's
g2p_multilingual_byT5_tiny_16_layers
multilingual grapheme-to-phoneme (G2P) model.
The upstream repo ships pytorch_model.bin and no tokenizer files. This
repo adds:
- model.safetensors: same weights, HF-standard T5 key naming, F32.
- config.json: upstream's config, with the stale local path stripped.
- tokenizer_config.json + special_tokens_map.json: byte-level byT5 tokenizer config (from google/byt5-small, which byT5 G2P uses).
No architecture or training changes. Predictions match the upstream PyTorch model within floating-point tolerance.
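One way to check the F32 and key-naming claims locally without loading the weights is to read only the header of model.safetensors. The following is a minimal sketch that relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names read_safetensors_header and tensor_dtypes are hypothetical helpers, not part of this repo or of any library.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor name -> metadata.
        return json.loads(f.read(header_len).decode("utf-8"))


def tensor_dtypes(header):
    """Collect the dtypes in use, skipping the optional __metadata__ entry."""
    return {
        info["dtype"]
        for name, info in header.items()
        if name != "__metadata__"
    }
```

If the description above holds, calling tensor_dtypes(read_safetensors_header("model.safetensors")) on a downloaded copy should yield a set containing only "F32", and the header's keys should be the tensor names.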
Usage
Python (transformers / PyTorch)
from transformers import AutoTokenizer, T5ForConditionalGeneration
tok = AutoTokenizer.from_pretrained("bearcove/g2p-multilingual-byT5-tiny-16-layers-mlx")
model = T5ForConditionalGeneration.from_pretrained("bearcove/g2p-multilingual-byT5-tiny-16-layers-mlx")
# Prepend the language tag, e.g. "<eng-us>: " for American English.
inputs = tok(["<eng-us>: hello"], return_tensors="pt")
out = model.generate(**inputs, num_beams=1, max_length=50)
print(tok.batch_decode(out, skip_special_tokens=True)) # => ['ˈhɛɫoʊ']
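The input format used above (language tag in angle brackets, then a colon, a space, and the word) can be wrapped in a small helper when preparing many words at once. The sketch below also approximates what a byte-level tokenizer does with such a string. The names make_prompt and byt5_ids are hypothetical, and the offset of 3 and EOS id of 1 are assumptions based on how the google/byt5-small tokenizer is commonly described; use the real tokenizer for inference.

```python
def make_prompt(word, lang):
    """Build the model input: '<lang>: word', as in the usage example."""
    return f"<{lang}>: {word}"


def byt5_ids(text, offset=3, eos_id=1):
    """Approximate ByT5 encoding: UTF-8 bytes shifted by `offset`, then EOS.

    Assumption: ids 0..2 are reserved for pad, EOS, and unk, so every
    byte value is shifted up by 3.
    """
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


prompts = [make_prompt(w, "eng-us") for w in ["hello", "world"]]
print(prompts)            # ['<eng-us>: hello', '<eng-us>: world']
print(byt5_ids("hello"))  # [107, 104, 111, 111, 114, 1]
```

A list built this way can be passed to the tokenizer in one call, for example tok(prompts, padding=True, return_tensors="pt"), so that words of different lengths are padded into a single batch.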
Rust (MLX, Apple Silicon)
The reference Rust implementation is in
bearcove/bee — see the
bee-g2p-charsiu-mlx crate. It reads model.safetensors directly via
mlx-rs.
use bee_g2p_charsiu_mlx::engine::G2pEngine;
let mut engine = G2pEngine::load("path/to/model-dir")?;
let ipa = engine.g2p("hello", "eng-us")?; // => "ˈhɛɫoʊ"
Language codes
Charsiu uses ISO 639-derived codes with dialect suffixes where
applicable — e.g. eng-us for American English, eng-uk for British
English. See the
Charsiu language code table
for all 100 supported languages.
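For code that needs to group or display dialects, a language code can be split into its base language and optional dialect suffix. This is a minimal sketch that assumes the first hyphen always separates the two parts, as in eng-us and eng-uk; split_lang_code is a hypothetical helper, and it does not check that a code is actually in the Charsiu table.

```python
def split_lang_code(code):
    """Split 'eng-us' into ('eng', 'us'); codes without a suffix give None."""
    base, _, dialect = code.partition("-")
    return base, (dialect or None)


for code in ["eng-us", "eng-uk"]:
    print(code, split_lang_code(code))
# eng-us ('eng', 'us')
# eng-uk ('eng', 'uk')
```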
License
MIT, matching the upstream Charsiu project.
Citation
If you use this model, please cite the original Charsiu paper:
@inproceedings{zhu2022byt5,
title={{ByT5} model for massively multilingual grapheme-to-phoneme conversion},
author={Zhu, Jian and Zhang, Cong and Jurgens, David},
booktitle={Proc. Interspeech 2022},
year={2022},
eprint={2204.03067},
archivePrefix={arXiv}
}
Upstream project: https://github.com/lingjzhu/CharsiuG2P
Base model: charsiu/g2p_multilingual_byT5_tiny_16_layers