Instructions for using AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged with libraries, inference servers, and local apps.

Libraries

How to use AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged")
model = AutoModelForCausalLM.from_pretrained("AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged")

vLLM

How to use AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
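
The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.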

Use Docker

docker model run hf.co/AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged

SGLang

How to use AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged" \
        --host 0.0.0.0 \
        --port 30000

Training details:

LoRA fine-tuning:
- lora_r = 16
- lora_alpha = 64
- lora_dropout = 0.2
- target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
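
As a rough illustration of what these LoRA values imply, the sketch below computes the standard LoRA scaling factor (`lora_alpha / lora_r`) and the number of trainable parameters that a rank-`r` adapter adds to a single linear layer. The layer dimensions in the usage note are hypothetical and chosen only for the arithmetic; they are not taken from this model card.

```python
lora_r = 16
lora_alpha = 64

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r


def lora_param_count(d_in, d_out, r=lora_r):
    """Trainable parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


print(lora_scaling)
```

For example, `lora_param_count(2048, 2048)` gives the adapter size for a hypothetical square 2048-wide projection.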
Training args:
- num_train_epochs = 1.25
- per_device_train_batch_size = 1
- per_device_eval_batch_size = 1
- gradient_accumulation_steps = 4
- learning_rate = 2e-5
- weight_decay = 0.001
- optim = "paged_adamw_8bit"
- lr_scheduler_type = "constant"
- warmup_ratio = 0.03
- max_seq_length = 1024
- neftune_noise_alpha = 5
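
The batch-related arguments above combine as follows. This is a small arithmetic sketch assuming a single training device (the card does not state the device count): the effective batch size per optimizer step and the upper bound on tokens seen per step.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_seq_length = 1024
num_devices = 1  # assumption: the card does not say how many devices were used

# Sequences contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Upper bound on tokens per update, reached only if every sequence is full length.
max_tokens_per_step = effective_batch_size * max_seq_length

print(effective_batch_size, max_tokens_per_step)
```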
Extra added tokens:
- Task:
- Output Schema:
- <|system|>
- <|user|>
- <|assistant|>
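
These tokens appear to delimit the prompt layout used in the usage example further down. The helper below is a sketch of that layout inferred from that single example, not a documented template; `build_prompt` is a hypothetical name, and rendering the schema with `str()` is an assumption based on the Python-style single quotes in the example.

```python
EOS = "<｜end▁of▁sentence｜>"  # end-of-sentence token as written in the usage example


def build_prompt(task, output_schema, user_message):
    """Assemble a prompt in the system/user/assistant layout of the usage example."""
    # The example shows the schema as a Python dict repr, so str() is used here.
    schema_text = output_schema if isinstance(output_schema, str) else str(output_schema)
    return (
        f"<|system|>\nTask: {task}\n\nOutput Schema:\n{schema_text}{EOS}"
        f"<|user|>\n{user_message}{EOS}<|assistant|>"
    )
```

The returned string ends with `<|assistant|>`, so generation continues with the model's answer.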

Usage:

# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged")
model = AutoModelForCausalLM.from_pretrained(
  "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged",
  low_cpu_mem_usage=True,
  return_dict=True,
  torch_dtype=torch.float16,  # load weights in half precision
  device_map={"": 0},  # place the whole model on GPU 0
)

eval_prompt = """<|system|>
Task: Given the a list of previous user queries, predict 3 future queries.

Output Schema:
{'properties': {'predicted_queries': {'description': 'The list of predicted queries', 'items': {'type': 'string'}, 'title': 'Predicted Queries', 'type': 'array'}}, 'required': ['predicted_queries'], 'title': 'ResponseModel', 'type': 'object'}<｜end▁of▁sentence｜><|user|>
# Previous queries:
---
- Your core strength lies in understanding user intent and delivering clear, truthful, and empathetic responses.
- Utilize the provided reference information to enhance your responses and ensure accuracy.
- Always cross-reference the information for reliability, as references may vary in accuracy.
---<｜end▁of▁sentence｜><|assistant|>"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

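model.eval()
with torch.no_grad():
    # Stop generating at the end-of-sentence token.
    stop_token_id = tokenizer.convert_tokens_to_ids("<｜end▁of▁sentence｜>")
    gen_config = model.generation_config
    # Note: temperature only takes effect when sampling is enabled (do_sample=True).
    gen_config.temperature = 0.1
    gen_config.max_length = 500  # counts prompt tokens plus generated tokens
    gen_config.eos_token_id = stop_token_id
    output = model.generate(**model_input, generation_config=gen_config)
    # `output` holds one sequence of token ids per input prompt.
    decoded_output = [tokenizer.decode(sequence) for sequence in output]
    print(decoded_output[0])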

# Output:
# {"predicted_queries": ["How can I effectively communicate my core strength to users?", "What are some effective strategies for delivering accurate and truthful responses to users?", "How can I ensure accuracy and reliability of my responses to users?"]}
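
Since the model is prompted to answer in JSON, the decoded text can be parsed and checked before use. The following is a minimal sketch, assuming the decoded string may still contain the prompt and a trailing end-of-sentence token; `extract_predicted_queries` is a hypothetical helper, not part of any library.

```python
import json

EOS = "<｜end▁of▁sentence｜>"


def extract_predicted_queries(decoded_text):
    """Return the list of predicted queries from a decoded generation."""
    # Keep only what follows the last assistant marker, if one is present.
    completion = decoded_text.rsplit("<|assistant|>", 1)[-1]
    # Drop the end-of-sentence token and anything after it.
    completion = completion.split(EOS, 1)[0].strip()
    data = json.loads(completion)  # raises json.JSONDecodeError on malformed output
    queries = data["predicted_queries"]
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("predicted_queries must be a list of strings")
    return queries
```

Small models do not always emit valid JSON, so callers may want to catch `json.JSONDecodeError`, `KeyError`, and `ValueError` around this call.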

Model size: 1B params (Safetensors)

Tensor type: F16