high-git-star
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Paper
• 2504.12626
• Published • 51
Paper
• 2505.09388
• Published • 341
Qwen-Image Technical Report
Paper
• 2508.02324
• Published • 276
Paper
• 2508.10104
• Published • 308
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper
• 2508.18265
• Published • 224
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper
• 2508.05748
• Published • 143
VibeVoice Technical Report
Paper
• 2508.19205
• Published • 171
Mobile-Agent-v3: Fundamental Agents for GUI Automation
Paper
• 2508.15144
• Published • 66
Prompt Orchestration Markup Language
Paper
• 2508.13948
• Published • 48
WebSailor: Navigating Super-human Reasoning for Web Agent
Paper
• 2507.02592
• Published • 126
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Paper
• 2507.04009
• Published • 55
MiniCPM4: Ultra-Efficient LLMs on End Devices
Paper
• 2506.07900
• Published • 99
OmniGen2: Exploration to Advanced Multimodal Generation
Paper
• 2506.18871
• Published • 78
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper
• 2506.07491
• Published • 51
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
• 2504.10479
• Published • 311
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
• 2504.17192
• Published • 124
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published • 87
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published • 173
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Paper
• 2503.08638
• Published • 73
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper
• 2503.10291
• Published • 36
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper
• 2503.09516
• Published • 40
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Paper
• 2503.18892
• Published • 31
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper
• 2501.03262
• Published • 104
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper
• 2501.12326
• Published • 64