---
base_model:
- unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit
- canopylabs/orpheus-3b-0.1-ft
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
- tts
license: apache-2.0
language:
- en
datasets:
- Jinsaryko/Ceylia
---
# Introduction
VT-Orpheus-3B-TTS-lora-adapter is a LoRA adapter fine-tuned from [Orpheus-TTS](https://github.com/canopyai/Orpheus-TTS).
The training dataset is [Jinsaryko/Ceylia](https://huggingface.co/datasets/Jinsaryko/Ceylia).
# Sample Audio
Check my [setup guide](https://huggingface.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF#running-locally) for running the Orpheus model locally with my LoRA adapter.
```bash
python gguf_orpheus.py --text "Seriously? <giggle> That's the cutest thing I've ever heard ! " --voice ceylia
```
<audio controls><source src="https://huggingface.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF/resolve/main/output.wav" type="audio/wav"></audio>
```bash
python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> This is so exciting! <giggle>" --voice ceylia
```
<audio controls><source src="https://huggingface.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF/resolve/main/ceylia_20250409_010117.wav" type="audio/wav"></audio>
```bash
python gguf_orpheus.py --text "Morning! <giggle> I finally finished that project last night. It took forever, but the results look amazing. <yawn> Sorry, still a bit tired from staying up so late." --voice ceylia
```
<audio controls><source src="https://huggingface.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF/resolve/main/ceylia_20250409_013043.wav" type="audio/wav"></audio>
# Running Locally
This section provides a step-by-step guide to running the `VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf` model locally on your machine. There are two main methods to run this model:
## Method 1: Using LM Studio (Recommended for beginners)
### Prerequisites
1. [LM Studio](https://lmstudio.ai/) installed on your computer
2. Python 3.8+ installed
3. The `VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf` model file
### Setup Steps
1. **Install LM Studio**
- Download and install LM Studio from [lmstudio.ai](https://lmstudio.ai/)
- Launch LM Studio
2. **Load the GGUF model**
- In LM Studio, click "Add Model"
- Select the `VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf` file from your computer
- Once added, click on the model to load it
3. **Start the local server**
- Go to the "Local Server" tab in LM Studio
- Click "Start Server" to launch the local API server (default address is `http://127.0.0.1:1234`)
4. **Clone orpheus-tts-local repository**
```bash
git clone https://github.com/isaiahbjork/orpheus-tts-local.git
cd orpheus-tts-local
```
5. **Install dependencies**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
5.1 **Edit `gguf_orpheus.py` to add the `ceylia` voice**
Open `gguf_orpheus.py` in the `orpheus-tts-local` directory, find the `AVAILABLE_VOICES` and `DEFAULT_VOICE` lines, add `ceylia` to the voice list, and set it as the default voice (the original default is `tara`).
```python
# Available voices based on the Orpheus-TTS repository
AVAILABLE_VOICES = ["tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe", "ceylia"]
DEFAULT_VOICE = "ceylia"
```
Save the file `gguf_orpheus.py`.
6. **Run the model**
```bash
python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> This is so exciting! <giggle>" --voice ceylia --output output.wav
```
### Available Parameters
- `--text`: The text to convert to speech (required)
- `--voice`: The voice to use (the upstream default is "tara"; use "ceylia" for this model)
- `--output`: Output WAV file path (default: auto-generated filename)
- `--temperature`: Temperature for generation (default: 0.6)
- `--top_p`: Top-p sampling parameter (default: 0.9)
- `--repetition_penalty`: Repetition penalty (default: 1.1)
- `--backend`: Specify the backend (default: "lmstudio", also supports "ollama")
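
If you want to call the script from your own Python code, the parameters above can be assembled into a command line programmatically. The following is a minimal sketch, not part of `orpheus-tts-local`: `build_tts_command` is a hypothetical helper, and it assumes the script accepts exactly the flags and defaults listed above.

```python
import shlex
import sys
from typing import List, Optional


def build_tts_command(
    text: str,
    voice: str = "ceylia",
    output: Optional[str] = None,
    temperature: float = 0.6,
    top_p: float = 0.9,
    repetition_penalty: float = 1.1,
) -> List[str]:
    """Assemble an argument list for gguf_orpheus.py (hypothetical helper).

    Only flags documented in the parameter list above are used.
    """
    if not text.strip():
        raise ValueError("--text is required and must not be empty")
    cmd = [
        sys.executable, "gguf_orpheus.py",
        "--text", text,
        "--voice", voice,
        "--temperature", str(temperature),
        "--top_p", str(top_p),
        "--repetition_penalty", str(repetition_penalty),
    ]
    # Leave --output out so the script falls back to its auto-generated name.
    if output is not None:
        cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command("Hi! I'm Ceylia. <laugh>", output="output.wav")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it from inside the orpheus-tts-local directory:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with the `<laugh>`-style tags and apostrophes in the text.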
## Method 2: Using llama.cpp directly
### Prerequisites
1. [llama.cpp](https://github.com/ggerganov/llama.cpp) installed and built on your system
2. The [VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf](https://huggingface.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF/blob/main/VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf) model file
### Setup Steps
1. **Clone and build llama.cpp**
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
2. **Start the server**
```bash
# The CMake build above places the binaries in build/bin
./build/bin/llama-server -m /path/to/VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf --port 8080
```
3. **Clone orpheus-tts-local repository**
```bash
git clone https://github.com/isaiahbjork/orpheus-tts-local.git
cd orpheus-tts-local
```
4. **Install dependencies**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
5. **Run the model with custom API URL**
```bash
python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> Let's play! <sniffle> This is so exciting! <giggle>" --voice ceylia --output output.wav --api_url http://localhost:8080/v1
```
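
Once the command finishes, you may want to confirm that the output file is a valid, non-empty WAV. The sketch below uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper name, and `output.wav` is assumed to be the path passed to `--output` above.

```python
import os
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    path = "output.wav"  # assumed: the file written by the command above
    if os.path.exists(path):
        print(describe_wav(path))
    else:
        print(f"{path} not found; run gguf_orpheus.py first")
```

A duration close to zero usually means generation stopped early, which is worth checking before tuning sampling parameters.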
## Emotion Tags
You can add emotion to the speech by including the following tags in your text:
- `<giggle>`
- `<laugh>`
- `<chuckle>`
- `<sigh>`
- `<cough>`
- `<sniffle>`
- `<groan>`
- `<yawn>`
- `<gasp>`
Example:
```bash
python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> This is so exciting! <giggle>" --voice ceylia
```
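
A misspelled tag is easy to miss in a long prompt. As a rough sketch, the text can be checked against the tag list above before it is sent to the model. `find_unsupported_tags` is a hypothetical helper, and it assumes tags are written in lowercase exactly as listed.

```python
import re
from typing import List

# Emotion tags taken from the list above.
SUPPORTED_TAGS = {
    "giggle", "laugh", "chuckle", "sigh", "cough",
    "sniffle", "groan", "yawn", "gasp",
}

# Matches simple <word> tags such as <laugh>.
TAG_PATTERN = re.compile(r"<([A-Za-z_]+)>")


def find_unsupported_tags(text: str) -> List[str]:
    """Return the tag names in `text` that are not in SUPPORTED_TAGS, in order."""
    return [name for name in TAG_PATTERN.findall(text) if name not in SUPPORTED_TAGS]


if __name__ == "__main__":
    sample = "Hi! I'm Ceylia. <laugh> This is so exciting! <gigle>"
    print(find_unsupported_tags(sample))
```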
## Troubleshooting
1. **Error connecting to server**: Make sure the LM Studio server (default `http://127.0.0.1:1234`) or the llama.cpp server (port 8080 in the example above) is running, and that the script points at the matching address and port
2. **Low-quality audio**: Try adjusting `--temperature` (higher values give more varied output) or `--repetition_penalty` (values above 1.1 are recommended)
3. **Slow generation**: Use a lower-precision quantization if one is available, or run on a more powerful GPU
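
For connection errors, a quick reachability check can narrow down whether the server or the script is at fault. The sketch below queries the `/v1/models` endpoint that OpenAI-compatible servers such as LM Studio and `llama-server` expose. `list_server_models` is a hypothetical helper, and the two base URLs are the addresses used earlier in this guide; adjust them if your setup differs.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

# The server is local, so bypass any system-wide HTTP proxy settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def list_server_models(base_url: str, timeout: float = 2.0) -> Optional[List[str]]:
    """Return model ids from `<base_url>/models`, or None if the server is unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("id", "") for m in payload.get("data", []) if isinstance(m, dict)]


if __name__ == "__main__":
    for base in ("http://127.0.0.1:1234/v1", "http://localhost:8080/v1"):
        models = list_server_models(base)
        print(base, "->", "unreachable" if models is None else models)
```

If the server is reachable but the list of models is empty, the model has probably not been loaded yet.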
# Uploaded model
- **Developed by:** vinhnx90
- **License:** apache-2.0
- **Finetuned from model:** unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)