SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 4 days ago • 67
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Paper • 1910.01108 • Published Oct 2, 2019 • 23
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents Paper • 2605.09530 • Published 8 days ago • 141
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE Paper • 2505.19645 • Published Feb 16 • 1
Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware RakshitAralimatti • Aug 8, 2025 • 35
SD-MoE: Spectral Decomposition for Effective Expert Specialization Paper • 2602.12556 • Published Feb 13 • 1
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 11 days ago • 183
Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 329
🍎 Qwopus3.6 Collection This collection features the advanced Qwopus3.6 series of multimodal large models, which are fine-tuned from the Qwen3.6 base models with a focus on e • 4 items • Updated 11 days ago • 43
Thinking with Drafting: Optical Decompression via Logical Reconstruction Paper • 2602.11731 • Published Feb 12 • 35
Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 18 days ago • 50
Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case with Ariel OS Paper • 2604.25679 • Published 20 days ago • 1
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 22
Article How 🤗 Accelerate runs very large models thanks to PyTorch sgugger • Sep 27, 2022 • 18