Block Diffusion for Flash Speculative Decoding
AI & ML interests
Efficient AI
Papers
DFlash: Block Diffusion for Flash Speculative Decoding
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Paper • 2511.10645 • Published • 10
z-lab/Qwen3.6-27B-PARO
Image-Text-to-Text • 6B • Updated • 5.38k • 21
z-lab/gemma-4-31B-it-PARO
Image-Text-to-Text • 6B • Updated • 8k • 19
z-lab/gemma-4-E4B-it-PARO
Image-Text-to-Text • 5B • Updated
Accelerating LLM Fine-Tuning with Contextual Sparsity