UniVideo

Unified Understanding, Generation, and Editing for Videos

Cong Wei^{* 1,2} Quande Liu^{† 2} Zixuan Ye² Qiulin Wang² Xintao Wang²
Pengfei Wan² Kun Gai² Wenhu Chen^{† 1}

¹University of Waterloo ²Kling Team, Kuaishou Technology
^*Work done during an internship at Kling Team, Kuaishou Technology. ^†Corresponding authors


In‑Context Generation

Instruction: "Two men engrossed in a deep conversation. The setting is the interior of a high-tech laboratory."

Instruction: "A man dressed in a vibrant Hawaiian shirt with a colorful floral pattern, sits on a beach lounge chair. On his shoulder, a Pikachu with a small detective hat perches. The man holds an ice cream cone, taking a bite."

Instruction: "A man wearing a black T-shirt rides a majestic tiger across a sunlit plain. He holds a giant RTX 4090 graphics card in one hand, maintaining perfect balance as the tiger moves gracefully."

Instruction: "Wu kong, clad in ornate golden armor adorned with intricate red and black patterns, strides confidently through the aisles of a brightly lit modern supermarket."