Instructions to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF", dtype="auto")

llama-cpp-python

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF",
	filename="VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Use Docker

docker model run hf.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

LM Studio
Jan
Ollama
How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Ollama:
```
ollama run hf.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M
```

Unsloth Studio

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF to start chatting

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Docker Model Runner:
```
docker model run hf.co/vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M
```

Lemonade

How to use vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull vinhnx90/VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF:Q4_K_M

Run and chat with the model

lemonade run user.VT-Orpheus-3B-TTS-Ceylia-Q4KM-GGUFF-Q4_K_M

List all available models

lemonade list

Introduction

VT-Orpheus-3B-TTS-lora-adapter is a Lora adapter fine-tuned from Orpheus-TTS.

Dataset is from https://huggingface.co/datasets/Jinsaryko/Ceylia.

Sample Audio

Check my setup guide for running the local Orpheus model with my Lora adapter.

python gguf_orpheus.py --text "Seriously? <giggle> That's the cutest thing I've ever heard ! " --voice ceylia

python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> This is so exciting! <giggle>" --voice ceylia

python gguf_orpheus.py --text "Morning! <giggle> I finally finished that project last night. It took forever, but the results look amazing. <yawn> Sorry, still a bit tired from staying up so late." --voice ceylia

Running Locally

This section provides a step-by-step guide to running the VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf model locally on your machine. There are two main methods to run this model:

Method 1: Using LM Studio (Recommended for beginners)

Prerequisites

LM Studio installed on your computer
Python 3.8+ installed
The VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf model file

Setup Steps

Install LM Studio

Download and install LM Studio from lmstudio.ai
Launch LM Studio

Load the GGUF model

In LM Studio, click "Add Model"
Select the VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf file from your computer
Once added, click on the model to load it

Start the local server

Go to the "Local Server" tab in LM Studio
Click "Start Server" to launch the local API server (default address is http://127.0.0.1:1234)

Clone orpheus-tts-local repository

git clone https://github.com/isaiahbjork/orpheus-tts-local.git
cd orpheus-tts-local

Install dependencies

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

5.1 Edit gguf_orpheus.py to include new ceylia voice

Open gguf_orpheus.py file in ./orpheus-tts-local directory, find the line of AVAILABLE_VOICES and DEFAULT_VOICE and edit to include ceylia voice, default is tara.

# Available voices based on the Orpheus-TTS repository
AVAILABLE_VOICES = ["tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe", "ceylia"]
DEFAULT_VOICE = "ceylia"

Save the file gguf_orpheus.py.

Run the model

python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> This is so exciting! <giggle>" --voice ceylia --output output.wav

Available Parameters

--text: The text to convert to speech (required)
--voice: The voice to use (default is "tara", but use "ceylia" for this model)
--output: Output WAV file path (default: auto-generated filename)
--temperature: Temperature for generation (default: 0.6)
--top_p: Top-p sampling parameter (default: 0.9)
--repetition_penalty: Repetition penalty (default: 1.1)
--backend: Specify the backend (default: "lmstudio", also supports "ollama")

Method 2: Using llama.cpp directly

Prerequisites

llama.cpp installed and built on your system
The VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf model file

Setup Steps

Clone and build llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Start the server

./llama-server -m /path/to/VT-Orpheus-3B-TTS-Ceylia.Q4_K_M.gguf --port 8080

Clone orpheus-tts-local repository

git clone https://github.com/isaiahbjork/orpheus-tts-local.git
cd orpheus-tts-local

Install dependencies

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Run the model with custom API URL

python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> Let's play! <sniffle> This is so exciting! <giggle>" --voice ceylia --output output.wav --api_url http://localhost:8080/v1

Emotion Tags

You can add emotion to the speech by including the following tags in your text:

<giggle>
<laugh>
<chuckle>
<sigh>
<cough>
<sniffle>
<groan>
<yawn>
<gasp>

Example:

python gguf_orpheus.py --text "Hi! I'm Ceylia. <laugh> This is so exciting! <giggle>" --voice ceylia

Troubleshooting

Error connecting to server: Make sure LM Studio's server is running or llama.cpp server is running on the correct port
Low-quality audio: Try adjusting the temperature (higher = more variance) or repetition_penalty (>1.1 recommended)
Slow generation: Reduce model precision or run on a more powerful GPU if available