Jahaz/Qwen3-tts-0.6b-gguf-for-koboldcpp


Hi, and welcome!

The iq4_xs audio tokenizer GGUF file is, admittedly, only a little smaller and faster than the q8_0 one.
This repo's qwen3-tts-0.6b-q8_0.gguf is smaller. The iq4_xs quant hurts voice-cloning ability, but the loss is small enough that it probably doesn't matter.
Update: new iq4_xs-quantized tokenizer, which downcasts some fp16 tensors to q8_0 (211M).
Update: iq3_s tokenizer. Yes, it works well and is faster.
Update: new q5_k quant, which can handle longer transcripts than q4, as q8 can.
Update: MXFP4, which is faster. Now you only need 703 MB of RAM in total to run this model!

Note:

Use the 4-bit base only for a single short paragraph or for real-time LLM output transcripts. It will not work even for a short article! (Maybe I should delete those files.)
All WavTokenizer files are functioning well.
Recommended: q5_k base + MXFP4 tokenizer for robustness, speed, and VRAM savings, which is 869 M in total.
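
As a rough illustration of the recommended pairing, here is a minimal Python sketch that builds a KoboldCpp launch command for a base model plus an audio tokenizer. It assumes KoboldCpp's `--ttsmodel` and `--ttswavtokenizer` options are the ones that load these two files (check `koboldcpp --help` for your version). The two file names and the `koboldcpp` executable name are placeholders, not the actual names in this repo.

```python
import shlex


def build_koboldcpp_tts_command(base_gguf, tokenizer_gguf, executable="koboldcpp"):
    """Return the argument list for launching KoboldCpp with a TTS base
    model and a separate audio tokenizer file."""
    return [
        executable,
        "--ttsmodel", base_gguf,              # assumed flag for the TTS base model
        "--ttswavtokenizer", tokenizer_gguf,  # assumed flag for the audio tokenizer
    ]


# Hypothetical file names: substitute the real q5_k base and MXFP4
# tokenizer files from the repo's file list.
BASE = "qwen3-tts-0.6b-q5_k.gguf"
TOKENIZER = "qwen3-tts-tokenizer-mxfp4.gguf"

command = build_koboldcpp_tts_command(BASE, TOKENIZER)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

To actually start the server, the list can be passed to `subprocess.run(command)` once the placeholder names are replaced.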

Downloads last month: 908

Format: GGUF
Model size: 0.9B params
Architecture: qwen3-tts


Base model: Qwen/Qwen3-TTS-12Hz-0.6B-Base (this model is one of 14 quantized versions)