ComfyUI_examples

Nvidia Cosmos Predict2

Cosmos Predict2 is a family of text-to-image and image-to-video models from Nvidia.

Files to Download

You will first need:

Text encoder and VAE:

oldt5_xxl_fp8_e4m3fn_scaled.safetensors goes in: ComfyUI/models/text_encoders/

wan_2.1_vae.safetensors goes in: ComfyUI/models/vae/

Note: oldt5_xxl is not the same as the t5xxl used in Flux and other models. oldt5_xxl is t5xxl 1.0, while the one used in Flux and others is t5xxl 1.1.

All the diffusion models go in ComfyUI/models/diffusion_models/. You can find them here: Repackaged safetensors files or Official Nvidia Model Files
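As a quick sanity check before loading a workflow, the file placement above can be verified with a short script. This is only a sketch: the helper name `find_missing_models` and the default install path `ComfyUI` are assumptions for illustration and are not part of ComfyUI itself. The file and folder names are the ones listed above.

```python
from pathlib import Path

# Files named in this section, keyed by their folder under ComfyUI/models/.
# Add the diffusion model you downloaded under "diffusion_models" if you
# want it checked as well.
EXPECTED_FILES = {
    "text_encoders": ["oldt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "vae": ["wan_2.1_vae.safetensors"],
}


def find_missing_models(comfyui_root, expected=EXPECTED_FILES):
    """Return the expected model paths that do not exist under comfyui_root.

    Hypothetical helper, not a ComfyUI API.
    """
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for subfolder, filenames in expected.items():
        for name in filenames:
            path = models_dir / subfolder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; change it to your own path.
    for path in find_missing_models("ComfyUI"):
        print(f"missing: {path}")
```

If the script prints nothing, the text encoder and VAE are where the workflows expect them.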

Workflows

Text to Image

This workflow uses the 2B text to image cosmos predict2 model. The file used in the workflow is cosmos_predict2_2B_t2i.safetensors, which goes in: ComfyUI/models/diffusion_models/

Example

You can load this image in ComfyUI to get the full workflow.

I think the 2B model is the most interesting one, but you can find the bigger 14B model here: cosmos_predict2_14B_t2i.safetensors and use it in the workflow above.

Image to Video

These models are pretty picky about the resolution and length of the videos. This workflow is for the 480p models. For the 720p models, you will have to set the resolution to 720p or your results might be bad.

This workflow uses the 2B image to video cosmos predict2 model. The file used in the workflow is cosmos_predict2_2B_video2world_480p_16fps.safetensors, which goes in: ComfyUI/models/diffusion_models/
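Since a resolution mismatch can give bad results, it may help to read the resolution tag from the model file name before setting up the workflow. The sketch below assumes the `_<N>p_<M>fps` pattern seen in the file name above; other model files may not follow that pattern. The function `parse_video_model_tags` is a hypothetical helper, not part of ComfyUI.

```python
import re


def parse_video_model_tags(filename):
    """Extract the resolution and frame-rate tags from a model file name.

    Returns a tuple such as ("480p", 16), or None when the name does not
    contain a "_<N>p_<M>fps" pattern. The pattern is an assumption based
    on the single file name given in this section.
    """
    match = re.search(r"_(\d+p)_(\d+)fps", filename)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


tags = parse_video_model_tags(
    "cosmos_predict2_2B_video2world_480p_16fps.safetensors"
)
print(tags)  # ('480p', 16)
```

A result of `None` means the file name carries no such tag, as with the text to image model files, so the resolution has to be checked by hand.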

Example

Workflow in JSON format