Generate Minutes-Long Videos with AI

Free & Open Source • 13.6B Parameters • MIT Licensed

⭐ 47 GitHub Stars • ❤️ 28 HuggingFace Likes • 13.6B Parameters • 720p 30fps Video Quality

Key Features

🎬 Three Tasks, One Model

Unified architecture supporting Text-to-Video, Image-to-Video, and Video-Continuation within a single framework. No need for multiple models.

Efficient Inference

Generate 720p 30fps videos within minutes, using a coarse-to-fine generation strategy and Block Sparse Attention for efficient high-resolution inference.

🎞️ Long Video Generation

Natively pretrained on the Video-Continuation task, enabling minutes-long videos without color drift or quality degradation.

🏆 Multi-Reward RLHF

Powered by Group Relative Policy Optimization (GRPO), achieving performance comparable to leading commercial solutions.
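
As a rough illustration only (not LongCat-Video's actual training code), GRPO samples a group of videos per prompt, scores each one, and normalizes its reward against the group statistics; with multiple reward models the scores are first combined. The Python sketch below uses hypothetical reward weights and a made-up group of scores:

# Generic sketch of multi-reward GRPO advantages (illustrative, not the official code)
import torch

def group_relative_advantages(rewards, weights, eps=1e-6):
    # rewards: (group_size, num_rewards) scores for one prompt's sampled videos
    # weights: (num_rewards,) hypothetical mixing weights for the reward models
    combined = rewards @ weights                                  # weighted total reward per sample
    return (combined - combined.mean()) / (combined.std() + eps)  # normalize within the group

# Example: 4 sampled videos scored by 3 reward models (values are made up)
rewards = torch.tensor([[0.8, 0.6, 0.7],
                        [0.5, 0.7, 0.6],
                        [0.9, 0.8, 0.8],
                        [0.4, 0.5, 0.5]])
print(group_relative_advantages(rewards, torch.tensor([0.4, 0.3, 0.3])))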

🔓 MIT Licensed

Completely free and open source. Use commercially, modify freely, and deploy anywhere without restrictions.

💪 Dense Architecture

13.6B dense parameters that outperform larger 28B MoE models on overall Text-to-Video quality. All 13.6B parameters are active for every generation, ensuring consistent quality.

Performance Benchmarks

Text-to-Video MOS Scores

| Model | Accessibility | Architecture | Total Params | Text-Alignment ↑ | Visual Quality ↑ | Motion Quality ↑ | Overall Quality ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Veo3 | Proprietary | - | - | 3.99 | 3.23 | 3.86 | 3.48 |
| PixVerse-V5 | Proprietary | - | - | 3.81 | 3.13 | 3.81 | 3.36 |
| Wan 2.2-T2V-A14B | Open Source | MoE | 28B (14B activated) | 3.70 | 3.26 | 3.78 | 3.35 |
| LongCat-Video | Open Source | Dense | 13.6B | 3.76 | 3.25 | 3.74 | 3.38 |

Image-to-Video MOS Scores

| Model | Accessibility | Architecture | Total Params | Image-Alignment ↑ | Text-Alignment ↑ | Visual Quality ↑ | Motion Quality ↑ | Overall Quality ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Seedance 1.0 | Proprietary | - | - | 4.12 | 3.70 | 3.22 | 3.77 | 3.35 |
| Hailuo-02 | Proprietary | - | - | 4.18 | 3.85 | 3.18 | 3.80 | 3.27 |
| Wan 2.2-I2V-A14B | Open Source | MoE | 28B (14B activated) | 4.18 | 3.33 | 3.23 | 3.79 | 3.26 |
| LongCat-Video | Open Source | Dense | 13.6B | 4.04 | 3.49 | 3.27 | 3.59 | 3.17 |

Quick Start

1. Clone the Repository

git clone https://github.com/meituan-longcat/LongCat-Video
cd LongCat-Video

2. Install Dependencies

# Create conda environment
conda create -n longcat-video python=3.10
conda activate longcat-video

# Install torch
pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 --index-url https://download.pytorch.org/whl/cu124

# Install flash-attn-2
pip install ninja psutil packaging flash_attn==2.7.4.post1

# Install other requirements
pip install -r requirements.txt
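
Before downloading the weights, it can be worth a quick sanity check of the environment; the short Python snippet below (optional, not part of the official instructions) verifies the CUDA build of torch and the flash-attn install:

# Optional sanity check after installing dependencies
import torch

print("torch:", torch.__version__)                # expect 2.6.0+cu124
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)  # expect 2.7.4.post1
except ImportError:
    print("flash-attn is missing or failed to build")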

3. Download Model

pip install "huggingface_hub[cli]"
huggingface-cli download meituan-longcat/LongCat-Video --local-dir ./weights/LongCat-Video
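
If you prefer to do this from Python instead of the CLI, huggingface_hub exposes an equivalent call (an optional alternative; same weights and destination as above):

# Equivalent download using the huggingface_hub Python API
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meituan-longcat/LongCat-Video",
    local_dir="./weights/LongCat-Video",
)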

4. Run Text-to-Video

# Single-GPU inference
torchrun run_demo_text_to_video.py --checkpoint_dir=./weights/LongCat-Video --enable_compile

# Multi-GPU inference
torchrun --nproc_per_node=2 run_demo_text_to_video.py --context_parallel_size=2 --checkpoint_dir=./weights/LongCat-Video --enable_compile
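
After a run finishes, you can read the generated clip back to confirm its resolution and frame rate. The snippet below is optional: the output path is a hypothetical placeholder (the real filename depends on the demo script), and torchvision's read_video needs the PyAV backend (pip install av) if it is not already present:

# Optional: inspect a generated clip (output path is a hypothetical placeholder)
from torchvision.io import read_video

frames, _, info = read_video("outputs/sample.mp4", pts_unit="sec")
print("frames x H x W x C:", tuple(frames.shape))  # expect 720p resolution
print("fps:", info.get("video_fps"))               # expect ~30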

Use Cases

📱 Social Media Content

Create engaging videos for Instagram, TikTok, and YouTube from simple text prompts.

🎓 Educational Material

Generate educational videos and visual explanations for online courses and tutorials.

🛍️ Marketing Videos

Produce product demos and promotional content without expensive video production.

🎨 Creative Projects

Bring your artistic visions to life with AI-powered video generation.

🔬 Research

Experiment with video generation models and advance AI research.

📺 Content Production

Scale video content production for media companies and agencies.

What People Are Saying

Community Reactions

"MIT license foundation models changing the game for video generation accessibility"

- AI Community Member

"Weird right? Still if it performs as described at those parameter numbers this will be a banger."

- Developer on Twitter

"47 stars on GitHub already! The open-source AI community is moving fast."

- GitHub Community

Frequently Asked Questions

Is LongCat Video really free?
Yes! LongCat Video is completely free and open source under the MIT license. You can use it commercially, modify it freely, and deploy it anywhere without restrictions or licensing fees.
How long can the generated videos be?
LongCat Video can generate videos up to several minutes long without color drift or quality degradation, thanks to native pretraining on the Video-Continuation task. This is one of its key advantages over other models.
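
Conceptually, a long clip is built by repeatedly conditioning the model on the tail of what has already been generated. The sketch below only illustrates that loop; generate_continuation is a hypothetical stand-in, not the repository's API:

# Hedged sketch of chunked long-video generation via Video-Continuation
def generate_continuation(condition_frames, prompt):
    # Hypothetical stand-in: a real call would return new frames conditioned on the context
    return ["frame"] * 93

OVERLAP = 16                                                       # frames carried into the next segment
video = generate_continuation(None, "a cat walking on a beach")    # first segment
for _ in range(5):                                                 # five more continuation rounds
    context = video[-OVERLAP:]                                     # tail of the clip becomes the new condition
    video += generate_continuation(context, "a cat walking on a beach")
print(len(video), "frames total")
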
What hardware do I need to run LongCat Video?
LongCat Video supports both single-GPU and multi-GPU inference. For optimal performance, we recommend using modern NVIDIA GPUs with at least 24GB VRAM. The model uses FlashAttention-2 for efficient memory usage.
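
To check whether your GPUs meet the suggested 24GB threshold before launching a job, a short torch query works (an optional check, not part of the official setup):

# Optional: list each visible GPU and its total memory
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")  # aim for >= 24 GB
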
How does LongCat compare to commercial solutions like Sora or Runway?
LongCat Video achieves performance comparable to leading commercial solutions (Overall Quality MOS score of 3.38), while being completely free and open source. With only 13.6B dense parameters, it outperforms larger 28B MoE models.
Can I use LongCat Video for commercial projects?
Absolutely! The MIT license allows you to use LongCat Video for any purpose, including commercial projects. You can modify the model, integrate it into your products, and even sell services based on it.
What makes LongCat Video's architecture unique?
LongCat Video uses a unified architecture that handles Text-to-Video, Image-to-Video, and Video-Continuation tasks within a single model. It employs Block Sparse Attention and a coarse-to-fine generation strategy for efficient 720p 30fps video generation.
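
As a rough, generic illustration of what block-sparse attention means (not LongCat-Video's actual kernels), the sketch below keeps only a local neighborhood of key/value blocks for each query block and passes the resulting mask to PyTorch's scaled_dot_product_attention; the block size and kept-block rule are hypothetical simplifications:

# Generic block-sparse attention sketch (illustrative only)
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, keep_local=1):
    # q, k, v: (batch, heads, seq_len, head_dim); seq_len assumed divisible by block_size
    num_blocks = q.shape[-2] // block_size
    blk = torch.arange(num_blocks)
    # Block-level mask: query block i may attend to key blocks within keep_local of i
    block_mask = (blk[:, None] - blk[None, :]).abs() <= keep_local
    # Expand the block mask to token resolution (True = attend)
    mask = block_mask.repeat_interleave(block_size, 0).repeat_interleave(block_size, 1)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask.to(q.device))

# Tiny example: 256 tokens split into 4 blocks of 64
q = k = v = torch.randn(1, 8, 256, 64)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 8, 256, 64])
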
Is there an API or cloud service available?
Currently, LongCat Video is available for self-hosting. You can deploy it on your own infrastructure using the provided installation instructions. Cloud API services may be available in the future.
How can I contribute to the project?
You can contribute by submitting issues, pull requests, or improvements on the GitHub repository. The project welcomes contributions from the community under the MIT license.