Virtual Fashion Try-On & Animation from Stills & Video

This project presents a ComfyUI workflow called "Fashion Try On and Animate," which transforms a still photograph of a person, an image of a garment, and a control video into an animated clip.
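
Because everything is packaged as a ComfyUI workflow, the clip can also be generated headlessly through ComfyUI's standard HTTP API after exporting the graph in API format. Below is a minimal sketch, assuming a local server on the default port; the JSON filename, node ids, and input field names are placeholders you would read from your own export.

```python
import json
import urllib.request

# Load the graph exported from ComfyUI via "Save (API Format)".
# The filename is a placeholder; use whatever you exported.
with open("fashion_tryon_animate_api.json") as f:
    workflow = json.load(f)

# Point the loader nodes at your inputs. The node ids and field names
# below are hypothetical; inspect your own export for the real ones.
workflow["1"]["inputs"]["image"] = "person.jpg"    # still photo of the person
workflow["2"]["inputs"]["image"] = "garment.jpg"   # garment image
workflow["3"]["inputs"]["video"] = "control.mp4"   # driving/control video

# Queue the graph on a locally running ComfyUI server (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```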

The system first uses CatVTON, a lightweight virtual try-on network, to dress the person in the new garment while attempting to preserve their identity. It then employs Wan 2.1 Control, a video diffusion model, to animate the re-clothed person using motion extracted from the control video; the driving video can itself be AI-generated (Wan T2V 1.3B was used here). Stitching the stages together, chiefly by locating and masking the garment region, is partially automated with Florence2 and SegmentAnything; a sketch of that step follows.
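
The masking step can be illustrated with the two models' published Python APIs: Florence-2 grounds a text phrase to a bounding box, and SAM refines that box into a pixel-accurate mask. This is a minimal standalone sketch, not the workflow's actual node graph; the checkpoint names, the grounding phrase, and the file paths are assumptions.

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from segment_anything import SamPredictor, sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Florence-2 grounds a text phrase to a bounding box around the garment.
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True
)
florence = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True
).to(device)

image = Image.open("person.jpg").convert("RGB")  # input path is an assumption
task = "<CAPTION_TO_PHRASE_GROUNDING>"
inputs = processor(
    text=task + "the garment", images=image, return_tensors="pt"
).to(device)
generated = florence.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
decoded = processor.batch_decode(generated, skip_special_tokens=False)[0]
parsed = processor.post_process_generation(decoded, task=task, image_size=image.size)
box = np.array(parsed[task]["bboxes"][0])  # [x0, y0, x1, y1]

# 2) SAM turns the coarse box into a pixel-accurate garment mask.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))
masks, _, _ = predictor.predict(box=box, multimask_output=False)
garment_mask = masks[0]  # boolean HxW array marking the region to re-dress
```

In the ComfyUI graph, the equivalent nodes produce this kind of mask and hand it to CatVTON, which confines the new garment to the masked region.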

Current limitations include imperfect identity preservation, loss of intricate garment patterns during animation, and occasional segmentation errors. Future development aims to address these by fine-tuning CatVTON and Wan 2.1 Control, adopting stronger VLMs and segmentation models, and potentially tailoring the system to culturally specific attire such as intricately patterned, flowing Indian garments.

Check out the code here: [GitHub]