Training details:
LoRA fine-tuning:
- lora_r = 16
- lora_alpha = 64
- lora_dropout = 0.2
- target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
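To show what these settings imply, the sketch below computes the LoRA scaling factor and the number of trainable adapter parameters. It assumes standard LoRA, as in PEFT with `use_rslora=False`: an adapted linear layer of shape `d_in → d_out` gains matrices `A` (`r × d_in`) and `B` (`d_out × r`), and the update `B @ A` is scaled by `lora_alpha / r`. The layer sizes in the example (hidden size 2048, MLP size 5504, 24 layers, 32,256-token vocabulary, no grouped-query attention) are assumptions about the 1.3B base model and are not stated in this card.

```python
"""Back-of-the-envelope arithmetic for the LoRA settings above (standard LoRA, not rsLoRA)."""

LORA_R = 16
LORA_ALPHA = 64


def lora_scaling(r: int, alpha: int) -> float:
    # Standard LoRA scales the low-rank update B @ A by alpha / r.
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


def adapter_params_per_layer(hidden: int, intermediate: int, r: int) -> int:
    # ASSUMPTION: no grouped-query attention, so k_proj and v_proj are hidden -> hidden.
    attn = 4 * lora_params(hidden, hidden, r)  # q_proj, k_proj, v_proj, o_proj
    # Gated MLP: gate_proj and up_proj expand, down_proj contracts.
    mlp = 2 * lora_params(hidden, intermediate, r) + lora_params(intermediate, hidden, r)
    return attn + mlp


if __name__ == "__main__":
    # ASSUMED dimensions; check them against the model's config.json.
    hidden, intermediate, layers = 2048, 5504, 24
    vocab = 32256  # ASSUMED; the real lm_head width also includes the extra added tokens
    per_layer = adapter_params_per_layer(hidden, intermediate, LORA_R)
    lm_head = lora_params(hidden, vocab, LORA_R)
    print("scaling:", lora_scaling(LORA_R, LORA_ALPHA))
    print("adapter params:", layers * per_layer + lm_head)
```

Under these assumed sizes the adapters add about 15.5 million trainable parameters, and each update is scaled by 4.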
Training args:
- num_train_epochs = 1.25
- per_device_train_batch_size = 1
- per_device_eval_batch_size = 1
- gradient_accumulation_steps = 4
- learning_rate = 2e-5
- weight_decay = 0.001
- optim = "paged_adamw_8bit"
- lr_scheduler_type = "constant"
- warmup_ratio = 0.03
- max_seq_length = 1024
- neftune_noise_alpha = 5
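These arguments determine a few derived quantities. On one device, each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps = 4` sequences. The total step count depends on the dataset size, which this card does not state.

Two caveats apply:

- In Hugging Face Transformers, `lr_scheduler_type = "constant"` applies no warmup (`"constant_with_warmup"` does), so `warmup_ratio = 0.03` is most likely ignored here.
- `neftune_noise_alpha` enables NEFTune, which adds noise drawn uniformly from `[-α/√(L·d), α/√(L·d)]` to the input embeddings during training. Here `L` is the padded sequence length of the batch and `d` is the embedding width.

The sketch below computes these quantities. The step formula approximates Trainer's behaviour and may differ between library versions. Any dataset size or embedding width you pass in is hypothetical.

```python
"""Derived quantities for the training arguments above (a sketch, not Trainer's exact code)."""
import math
import random

PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4
NUM_EPOCHS = 1.25
WARMUP_RATIO = 0.03
NEFTUNE_ALPHA = 5


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    # Sequences contributing to each optimizer step.
    return per_device * grad_accum * num_devices


def optimizer_steps(num_examples: int, per_device: int, grad_accum: int,
                    epochs: float, num_devices: int = 1) -> int:
    # Approximates how a fractional epoch count becomes a number of optimizer steps.
    batches_per_epoch = math.ceil(num_examples / (per_device * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


def warmup_steps(total_steps: int, ratio: float) -> int:
    # Only used by schedules that support warmup, not plain "constant".
    return math.ceil(total_steps * ratio)


def neftune_bound(alpha: float, seq_len: int, embed_dim: int) -> float:
    # NEFTune noise is uniform in [-b, b] with b = alpha / sqrt(L * d).
    return alpha / math.sqrt(seq_len * embed_dim)


def neftune_perturb(embeddings: list[list[float]], alpha: float,
                    rng: random.Random) -> list[list[float]]:
    # embeddings: one sequence of shape (L, d); NEFTune applies this only while training.
    seq_len, dim = len(embeddings), len(embeddings[0])
    b = neftune_bound(alpha, seq_len, dim)
    return [[x + rng.uniform(-b, b) for x in row] for row in embeddings]
```

For example, under this approximation a hypothetical 10,000-example dataset on a single GPU gives 2,500 updates per epoch and 3,125 optimizer steps for 1.25 epochs.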
Extra added tokens:
- Task:
- Output Schema:
- <|system|>
- <|user|>
- <|assistant|>
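The usage example in this card arranges these tokens into a fixed prompt layout:

- a `<|system|>` turn holding `Task:` and `Output Schema:`;
- a `<|user|>` turn listing the previous queries;
- a trailing `<|assistant|>` marker for the model to continue from.

Each turn is closed by `<|end▁of▁sentence|>`. The helper below is a hypothetical convenience function, not part of this model's release, that reproduces that layout. The layout is inferred from the single example in the card rather than from a documented chat template, so check it against your own data.

```python
"""Hypothetical prompt builder mirroring the layout of this card's usage example."""
from __future__ import annotations

EOS = "<|end▁of▁sentence|>"


def build_prompt(task: str, schema: dict | str, previous_queries: list[str]) -> str:
    # The card's example embeds the schema as a Python dict repr (single quotes),
    # so dicts are rendered with str() rather than json.dumps().
    schema_text = schema if isinstance(schema, str) else str(schema)
    query_lines = "\n".join(f"- {q}" for q in previous_queries)
    system = f"<|system|>\nTask: {task}\nOutput Schema:\n{schema_text}{EOS}"
    user = f"<|user|>\n# Previous queries:\n---\n{query_lines}\n---{EOS}"
    # End with the assistant marker so generation starts the reply.
    return system + user + "<|assistant|>"


if __name__ == "__main__":
    example_schema = {
        "properties": {"predicted_queries": {"items": {"type": "string"}, "type": "array"}},
        "required": ["predicted_queries"],
        "type": "object",
    }
    print(build_prompt("Predict 3 future queries.", example_schema, ["How do I reset my password?"]))
```

The returned string can replace `eval_prompt` in the usage code.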
Usage:
# Load model directly
import copy

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},  # place the whole model on GPU 0 (requires accelerate)
)

eval_prompt = """<|system|>
Task: Given the a list of previous user queries, predict 3 future queries.
Output Schema:
{'properties': {'predicted_queries': {'description': 'The list of predicted queries', 'items': {'type': 'string'}, 'title': 'Predicted Queries', 'type': 'array'}}, 'required': ['predicted_queries'], 'title': 'ResponseModel', 'type': 'object'}<|end▁of▁sentence|><|user|>
# Previous queries:
---
- Your core strength lies in understanding user intent and delivering clear, truthful, and empathetic responses.
- Utilize the provided reference information to enhance your responses and ensure accuracy.
- Always cross-reference the information for reliability, as references may vary in accuracy.
---<|end▁of▁sentence|><|assistant|>"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to(model.device)

model.eval()
with torch.no_grad():
    stop_token_id = tokenizer.convert_tokens_to_ids("<|end▁of▁sentence|>")
    # Copy the generation config so the model's defaults are not modified in place
    gen_config = copy.deepcopy(model.generation_config)
    gen_config.do_sample = True  # temperature only takes effect when sampling is enabled
    gen_config.temperature = 0.1
    gen_config.max_length = 500  # prompt + generated tokens
    gen_config.eos_token_id = stop_token_id  # stop generating at <|end▁of▁sentence|>
    output = model.generate(**model_input, generation_config=gen_config)

# Decode only the newly generated tokens, dropping the prompt and special tokens
prompt_length = model_input["input_ids"].shape[1]
decoded_output = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
print(decoded_output)
# Output:
# {"predicted_queries": ["How can I effectively communicate my core strength to users?", "What are some effective strategies for delivering accurate and truthful responses to users?", "How can I ensure accuracy and reliability of my responses to users?"]}
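The model is asked to answer with JSON that matches the output schema, but a small model can still emit malformed or incomplete JSON, so it is worth checking the reply before using it. The sketch below is a hypothetical helper, not part of this model's release. It:

- extracts the outermost `{...}` span from the decoded text;
- parses that span with Python's standard `json` module;
- checks the one required field, `predicted_queries`, as a list of strings.

```python
"""Hypothetical validator for replies that should match the card's ResponseModel schema."""
import json


def parse_predicted_queries(text: str) -> list[str]:
    # Keep only the outermost JSON object, in case the reply has stray text around it.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])  # raises json.JSONDecodeError (a ValueError)
    queries = data.get("predicted_queries") if isinstance(data, dict) else None
    # The schema marks predicted_queries as required and as an array of strings.
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("'predicted_queries' must be a list of strings")
    return queries


if __name__ == "__main__":
    sample = '{"predicted_queries": ["How can I ensure accuracy and reliability of my responses to users?"]}'
    print(parse_predicted_queries(sample))
```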