Hi, and welcome!
- The iq4_xs audio tokenizer GGUF file is, admittedly, only a little smaller and faster than q8_0.
- This repo's qwen3-tts-0.6b-q8_0.gguf is smaller. The iq4_xs quant hurts voice-cloning ability, but the loss is small enough that we may not really care.
- Update: new iq4_xs tokenizer quant, with some fp16 tensors downcast to q8_0 (211 MB).
- Update: iq3s tokenizer added. Yes, it works well and is faster.
- Update: new q5k quant, which can handle longer transcripts than q4, as q8 can.
- Update: MXFP4 added, which is faster. You now need only 703 MB of RAM in total to run this model!
Note:
- Use the 4-bit base only for a single short paragraph or for a real-time LLM output transcript. It will not work even for a short article! (Maybe I should delete the 4-bit files.)
- All WavTokenizer files are functioning well.
- Recommended: q5k base + MXFP4 tokenizer for robustness, speed, and VRAM savings, which is 869 MB in total.
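
The totals quoted above (703 MB, 869 MB) are the base file and the tokenizer file added together. A minimal sketch for checking a pairing of your own is below. It assumes decimal megabytes and that on-disk file size is a fair stand-in for memory use, which is only approximate since the runtime also needs working buffers. The helper names `total_size_mb` and `fits_budget` are made up for this illustration.

```python
from pathlib import Path

# Assumption: sizes are counted in decimal megabytes (1 MB = 1,000,000 bytes).
MB = 1000 * 1000


def total_size_mb(paths):
    """Return the combined on-disk size of the given files, in MB."""
    return sum(Path(p).stat().st_size for p in paths) / MB


def fits_budget(paths, budget_mb):
    """Return True if the files together are no larger than budget_mb."""
    return total_size_mb(paths) <= budget_mb


# Example (file names are placeholders, substitute the files you downloaded):
#   files = ["base-q5k.gguf", "tokenizer-mxfp4.gguf"]
#   print(round(total_size_mb(files)), "MB")
#   print(fits_budget(files, 1000))
```

Leave some headroom above the file total when picking a budget, since the runtime allocates extra memory on top of the model weights.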
Model tree for Jahaz/Qwen3-tts-0.6b-gguf-for-koboldcpp
Base model: Qwen/Qwen3-TTS-12Hz-0.6B-Base