VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Paper: arXiv:2408.02629
How to use Fudan-FUXI/VIDGEN-v1.0 with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("Fudan-FUXI/VIDGEN-v1.0", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

We trained a video generation model on VidGen-1M, a training dataset for text-to-video models. Produced through a coarse-to-fine curation strategy, the dataset provides high-quality videos and detailed captions with excellent temporal consistency. When used to train the video generation model