Wan 2.1 is a family of video models.
You will first need:
umt5_xxl_fp8_e4m3fn_scaled.safetensors goes in: ComfyUI/models/text_encoders/
wan_2.1_vae.safetensors goes in: ComfyUI/models/vae/
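If you script your setup, a quick check like the one below can confirm the two files above ended up in the right folders. This is a minimal sketch; the ComfyUI root path is an assumption you may need to adjust.

```python
from pathlib import Path

# Assumed ComfyUI install location; adjust to your setup.
COMFYUI_ROOT = Path("ComfyUI")

# The two files listed above and the folders they belong in.
REQUIRED = {
    "models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "text encoder",
    "models/vae/wan_2.1_vae.safetensors": "VAE",
}

for rel_path, role in REQUIRED.items():
    path = COMFYUI_ROOT / rel_path
    print(f"[{'ok' if path.is_file() else 'MISSING'}] {role}: {path}")
```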
The diffusion models can be found here.
These files go in: ComfyUI/models/diffusion_models/
These examples use the 16-bit files, but you can use the fp8 ones instead if you don’t have enough memory.
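If you are not sure whether your GPU can handle the 16-bit weights, a rough look at total VRAM can help you pick a file. This is only a sketch: the 16 GB threshold is an arbitrary assumption, not an official requirement, and real usage depends on resolution, frame count, and offloading.

```python
import torch

def pick_precision(threshold_gb: float = 16.0) -> str:
    """Suggest fp16 or fp8 weights based on total VRAM (rough heuristic)."""
    if not torch.cuda.is_available():
        return "fp8 (no CUDA device detected, expect heavy offloading)"
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return "fp16" if total_gb >= threshold_gb else "fp8"

print("Suggested diffusion model precision:", pick_precision())
```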
This workflow requires the wan2.1_t2v_1.3B_fp16.safetensors file (put it in: ComfyUI/models/diffusion_models/). You can also use this workflow with the 14B model.
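Once the workflow runs in the UI, you can also queue it from a script through ComfyUI’s HTTP API. The sketch below assumes ComfyUI is running on the default local port and that you exported the workflow in API format to a hypothetical wan_t2v_api.json file; adjust both to your setup.

```python
import json
import urllib.request

# Assumptions: ComfyUI running locally on the default port, and the
# text-to-video workflow exported in API format to this (hypothetical) file.
COMFYUI_URL = "http://127.0.0.1:8188"
WORKFLOW_FILE = "wan_t2v_api.json"

with open(WORKFLOW_FILE) as f:
    workflow = json.load(f)

# POST the graph to ComfyUI's /prompt endpoint to queue a generation.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    f"{COMFYUI_URL}/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```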
This workflow requires the wan2.1_i2v_480p_14B_bf16.safetensors file (put it in: ComfyUI/models/diffusion_models/) and clip_vision_h.safetensors, which goes in: ComfyUI/models/clip_vision/
Note that this example only generates 33 frames at 512x512 because I wanted it to be accessible; the model can do more than that. The 720p model is pretty good if you have the hardware/patience to run it.
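To try more frames, you can change the values in the UI, or patch an exported API-format copy of the workflow as in this sketch. The wan_i2v_api.json filename is hypothetical, and the assumption that the node sizing the video exposes width, height, and length inputs should be verified against your own export.

```python
import json

# Hypothetical filenames; point these at your own exported workflow.
SRC = "wan_i2v_api.json"
DST = "wan_i2v_api_edited.json"

with open(SRC) as f:
    workflow = json.load(f)

# API-format exports map node ids to {"class_type": ..., "inputs": {...}}.
# This assumes the node that sizes the video exposes "width", "height",
# and "length" inputs; check the names against your own export.
for node_id, node in workflow.items():
    inputs = node.get("inputs", {})
    if {"width", "height", "length"} <= inputs.keys():
        inputs["length"] = 49  # more frames than the 33 used in this example
        print(f"patched node {node_id} ({node.get('class_type')})")

with open(DST, "w") as f:
    json.dump(workflow, f, indent=2)
```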
The input image can be found on the Flux page.
Here’s the same example with the 720p model: