Text Generation
GGUF
English
creative
creative writing
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
science fiction
romance
all genres
story
writing
vivid prosing
vivid writing
fiction
roleplaying
bfloat16
swearing
role play
sillytavern
backyard
horror
llama 3.1
context 128k
mergekit
Merge
llama
llama-3
llama-3.1
conversational
Instructions to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf",
    filename="DS-R1-D-L3.1-16.5B-Brainstorm-D_AU-IQ4_XS.gguf",
)
llm.create_chat_completion(
    messages = [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
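The model card's CRITICAL SETTINGS (see the README diff at the end of this page) can be passed through the same API; a minimal sketch, where the temperature, repeat-penalty, and context values are the card's recommendations rather than library defaults, and the prompt is a hypothetical example:

# Sketch: the card recommends temp ~.6, repeat penalty 1.02-1.08 over the
# last 64-128 tokens, and at least 4096 context for "thinking" output.
llm = Llama.from_pretrained(
    repo_id="DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf",
    filename="DS-R1-D-L3.1-16.5B-Brainstorm-D_AU-IQ4_XS.gguf",
    n_ctx=4096,             # leave room for generated "thoughts"
    last_n_tokens_size=64,  # repeat penalty range
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write an 800 word opening scene of a horror story."}],
    temperature=0.6,
    repeat_penalty=1.05,
)
print(response["choices"][0]["message"]["content"])

Per the card's tip 7, the prompt states its target word count explicitly, which helps the "thinking" phase activate.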
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
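Once llama-server is running, it can also be called over HTTP; a minimal curl sketch against its OpenAI-compatible endpoint (8080 is llama-server's default port, not a value from this page, and the temperature is the model card's recommendation):

# llama-server listens on http://localhost:8080 by default.
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "messages": [ { "role": "user", "content": "What is the capital of France?" } ],
    "temperature": 0.6
  }'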
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
Use Docker
docker model run hf.co/DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
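Whichever install route you pick, the model card's critical settings map directly onto standard llama.cpp sampler flags; a sketch, where the values are the card's recommendations rather than llama.cpp defaults:

# Temp .6, repeat penalty 1.05 over the last 64 tokens, 4096 context,
# per the model card's CRITICAL SETTINGS.
llama-cli -hf DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M \
  --temp 0.6 --repeat-penalty 1.05 --repeat-last-n 64 -c 4096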
- LM Studio
- Jan
- vLLM
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
Use Docker
docker model run hf.co/DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
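The vLLM server also accepts any OpenAI-compatible client; a minimal Python sketch using the openai package (the base_url and placeholder API key follow the usual local-vLLM convention and are not from this page; the temperature is the model card's recommendation):

# pip install openai
from openai import OpenAI

# vLLM serves an OpenAI-compatible API on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.6,  # the model card's recommended "thinking" temp
)
print(resp.choices[0].message.content)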
- Ollama
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with Ollama:
ollama run hf.co/DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
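ollama run uses Ollama's default sampling; to bake in the model card's critical settings, a Modelfile sketch ("brainstorm" is a hypothetical local name, the parameter values are the card's recommendations, and the FROM line references the model after it has been pulled with the command above):

# Save as "Modelfile", then: ollama create brainstorm -f Modelfile
FROM hf.co/DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
PARAMETER temperature 0.6
PARAMETER repeat_penalty 1.05
PARAMETER repeat_last_n 64
PARAMETER num_ctx 4096

Then chat with: ollama run brainstorm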
- Unsloth Studio
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf to start chatting
- Docker Model Runner
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with Docker Model Runner:
docker model run hf.co/DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
- Lemonade
How to use DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull DavidAU/DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf:Q4_K_M
Run and chat with the model
lemonade run user.DeepSeek-R1-Distill-Llama-3.1-16.5B-Brainstorm-gguf-Q4_K_M
List all available models
lemonade list
Update README.md

README.md CHANGED

@@ -34,7 +34,7 @@ tags:
 pipeline_tag: text-generation
 ---
 
-<h2>
+<h2>Deepseek-R1-Llama3.1 with Brainstorm 40x, 16.5B. (72 layers, 643 tensors) </h2>
 
 <img src="deepseek.jpg" style="float:right; width:300px; height:300px; padding:10px;">
 
@@ -46,6 +46,8 @@ Keep in mind this model is experimental and may require one or more regens to wo
 
 Brainstorm 40x is by DavidAU, and extends the "decision making" and "creativity" of an LLM/AI.
 
+Higher temps will result in deeper, richer "thoughts"... and frankly more interesting ones too.
+
 The "thinking/reasoning" tech (for the model at this repo) is from the original Llama 3.1 "Distill" model from Deepseek:
 
 [ https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B ]
 
@@ -64,16 +66,16 @@ The Grand Horrors retain all of their "horror/creative power" and are augmented
 
 <b>CRITICAL SETTINGS:</B>
 
-1. Set Temp between 0 and .8, higher than this "think" functions
-2. Set "repeat penalty" to 1.02 to 1.08 and "repeat penalty range" to 64-128.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.
+1. Set Temp between 0 and .8, higher than this "think" functions MAY not activate. The most "stable" temp seems to be .6, with a variance of +-0.05. Lower for more "logic" reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096, to account for "thoughts" generation.
+2. Set "repeat penalty" to 1.02 to 1.08 and "repeat penalty range" to 64-128.
+3. Temps 1+, 2+ will deepen thoughts, conclusions, and generation thinking.
+4. This model requires a Llama 3 Instruct and/or Command-R chat template. (see notes on "System Prompt" / "Role" below)
+5. It may take one or more regens for "thinking" to "activate"... depending on your prompt.
+6. If you enter a prompt without implied "step by step" requirements, "thinking" (one or more) will activate AFTER first generation. You will also get a lot of variations - some will continue the generation, others will talk about how to improve it, and some (ie generation of a scene) will cause the characters to "reason" about this situation. In some cases, the model will ask you to continue generation / thoughts too. In some cases the model's "thoughts" may appear in the generation itself.
+7. State the word size length max IN THE PROMPT for best results, especially for activation of "thinking."
+8. I have found opening a "new chat" per prompt works best with "thinking/reasoning activation", with temp .6
+9. Depending on your AI app, "thoughts" may appear with "< THINK >" and "</ THINK >" tags AND/OR the AI will generate "thoughts" directly in the main output or later output(s).
+10. Although quant Q4KM was used for testing/examples, higher quants will provide better generation / more sound "reasoning/thinking".
 
 ---
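For setting 4 in the list above, the standard Llama 3 Instruct chat template looks like this (this is the generic Llama 3 format, not something specific to this repo; most apps apply it automatically from the GGUF metadata):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>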