SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 4 days ago • 67
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Paper • 1910.01108 • Published Oct 2, 2019 • 23
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents Paper • 2605.09530 • Published 8 days ago • 141
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE Paper • 2505.19645 • Published Feb 16 • 1
Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware RakshitAralimatti • Aug 8, 2025 • 35
SD-MoE: Spectral Decomposition for Effective Expert Specialization Paper • 2602.12556 • Published Feb 13 • 1
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 11 days ago • 183
Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 329
🍎 Qwopus3.6 Collection This collection features the advanced Qwopus3.6 series of multimodal large models, which are fine-tuned from the Qwen3.6 base models with a focus on e • 4 items • Updated 11 days ago • 43
Thinking with Drafting: Optical Decompression via Logical Reconstruction Paper • 2602.11731 • Published Feb 12 • 35
Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 18 days ago • 50
Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case with Ariel OS Paper • 2604.25679 • Published 20 days ago • 1
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 22
Article How 🤗 Accelerate runs very large models thanks to PyTorch sgugger • Sep 27, 2022 • 18