Automated E-commerce Product Shots: SDXL & ControlNet Inpainting Workflow

This project details a ComfyUI inpainting workflow that uses SDXL, ControlNet, and LoRA models to generate images of models holding physical products. The ultimate goal is to fully automate this process from a single product image input, though the current version requires manual image preparation.

The workflow focuses on getting high-quality SDXL outputs on low-VRAM systems through a multi-stage approach that generates and then refines how the model grips the item. The first stage uses custom-trained LoRAs to establish the basic pose and grip; a second stage applies targeted hand-refinement techniques before the original product is composited back into the image. Both denoising passes are guided by ControlNet.
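The final reintegration step amounts to mask compositing: the untouched product pixels are pasted back over the refined render. A minimal sketch of that idea, using NumPy and synthetic arrays (inside ComfyUI this happens in the workflow's composite nodes, not in user code):

```python
import numpy as np

def reintegrate_product(generated, original, mask):
    """Paste original product pixels back over the refined render.

    generated: HxWx3 float array, the SDXL/ControlNet output
    original:  HxWx3 float array, the untouched product image
    mask:      HxW float array in [0, 1], 1 where the product sits
    """
    alpha = mask[..., None]  # broadcast the mask over colour channels
    return alpha * original + (1.0 - alpha) * generated

# Tiny synthetic example: 2x2 image, product occupies the left column.
generated = np.zeros((2, 2, 3))   # stand-in for the refined render
original = np.ones((2, 2, 3))     # stand-in for the product photo
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])

result = reintegrate_product(generated, original, mask)
# Left column keeps the original product; right column keeps the render.
```

Using a soft (feathered) mask instead of a hard binary one gives a cleaner seam between the generated hand and the pasted product.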

Currently, the custom-trained LoRAs work best on handled items such as bags and briefcases; they remain undertrained, so hand generation can be inconsistent. This workflow is experimental and not yet production-ready. Future plans include migrating to more advanced models such as Flux, fully automating the input-preparation steps, and improving the LoRAs with substantially more training data.

Check out the code here: [Github]