Cosmos Predict2 is a family of text to image and image to video models from Nvidia.
You will first need:
oldt5_xxl_fp8_e4m3fn_scaled.safetensors goes in: ComfyUI/models/text_encoders/
wan_2.1_vae.safetensors goes in: ComfyUI/models/vae/
Note: oldt5_xxl is not the same as the t5xxl used in flux and other models; oldt5_xxl is t5xxl 1.0, while the one used in flux and the others is t5xxl 1.1.
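If you want to double-check the file placement, here is a minimal Python sketch (not part of ComfyUI) that verifies the two files above are where ComfyUI expects them. The COMFYUI_ROOT path is an assumption; adjust it to your install.

```python
# Minimal sketch: check that the required text encoder and VAE files
# are in the expected ComfyUI folders. COMFYUI_ROOT is an assumption.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to where your ComfyUI install lives

REQUIRED_FILES = {
    "models/text_encoders": ["oldt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "models/vae": ["wan_2.1_vae.safetensors"],
}

for subdir, filenames in REQUIRED_FILES.items():
    for name in filenames:
        path = COMFYUI_ROOT / subdir / name
        status = "OK" if path.is_file() else "MISSING"
        print(f"[{status}] {path}")
```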
You can find all the diffusion models (they go in ComfyUI/models/diffusion_models/) here: Repackaged safetensors files or Official Nvidia Model Files.
This workflow uses the 2B text to image cosmos predict2 model. The file used in the workflow is cosmos_predict2_2B_t2i.safetensors, which goes in: ComfyUI/models/diffusion_models/
You can load this image in ComfyUI to get the full workflow.
I think the 2B model is the most interesting one, but you can find the bigger 14B model here: cosmos_predict2_14B_t2i.safetensors and use it in the workflow above.
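If you would rather queue the workflow from a script than from the UI, you can export it in API format from the ComfyUI menu and POST the JSON to the server's /prompt endpoint. A minimal sketch, assuming ComfyUI is running locally on the default port 8188; the exported filename is hypothetical:

```python
# Minimal sketch: queue an exported (API-format) workflow on a local ComfyUI server.
# Assumes ComfyUI is listening at 127.0.0.1:8188; the JSON filename is hypothetical.
import json
import urllib.request

with open("cosmos_predict2_t2i_api.json", "r") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # response includes the id of the queued prompt
```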
These models are pretty picky about the resolution and length of the videos. This workflow is for the 480p models; if you use the 720p models you will have to set the resolution to 720p or your results might be bad.
This workflow uses the 2B image to video cosmos predict2 model. The file used in the workflow is cosmos_predict2_2B_video2world_480p_16fps.safetensors, which goes in: ComfyUI/models/diffusion_models/
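Since it is easy to queue a 480p workflow with the wrong resolution, here is a minimal sketch that scans an exported API-format workflow for height inputs and warns when they don't match the variant implied by the model filename (480p means 480 lines, 720p means 720). The JSON filename is hypothetical, and the assumption that the relevant nodes expose a plain "height" input may not hold for every workflow:

```python
# Minimal sketch: warn if a workflow's height doesn't match the model variant
# implied by its filename. The JSON filename below is hypothetical.
import json

def check_resolution(workflow_path: str, model_filename: str) -> None:
    if "480p" in model_filename:
        expected = 480
    elif "720p" in model_filename:
        expected = 720
    else:
        print("Model filename does not indicate 480p or 720p; skipping check.")
        return
    with open(workflow_path, "r") as f:
        workflow = json.load(f)
    # API-format workflows map node ids to {"class_type": ..., "inputs": {...}}
    for node_id, node in workflow.items():
        inputs = node.get("inputs", {})
        height = inputs.get("height")
        if isinstance(height, int):
            status = "OK" if height == expected else f"expected {expected}"
            print(f"node {node_id} ({node.get('class_type')}): height={height} -> {status}")

check_resolution(
    "cosmos_predict2_i2v_api.json",  # hypothetical exported workflow file
    "cosmos_predict2_2B_video2world_480p_16fps.safetensors",
)
```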