---
library_name: diffusers
tags:
- modular-diffusers
- diffusers
- stable-diffusion-xl
- inpainting
- image-to-image
- controlnet
- text-to-image
---

This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

**Pipeline Type**: StableDiffusionXLAutoBlocks

**Description**: Auto modular pipeline for text-to-image, image-to-image, inpainting, and ControlNet tasks using Stable Diffusion XL.

This pipeline uses a 5-block architecture that can be customized and extended.

## Example Usage

[TODO]

## Pipeline Architecture

This modular pipeline is composed of the following blocks:

1. **text_encoder** (`StableDiffusionXLTextEncoderStep`)
   - Text encoder step that generates text embeddings to guide the image generation.
2. **ip_adapter** (`StableDiffusionXLAutoIPAdapterStep`)
   - Runs the IP-Adapter step if `ip_adapter_image` is provided. This step should be placed before the 'input' step.
3. **vae_encoder** (`StableDiffusionXLAutoVaeEncoderStep`)
   - VAE encoder step that encodes the image inputs into their latent representations.
4. **denoise** (`StableDiffusionXLCoreDenoiseStep`)
   - Core step that performs the denoising process.
5. **decode** (`StableDiffusionXLAutoDecodeStep`)
   - Decode step that decodes the denoised latents into output images.

## Model Components

1. text_encoder (`CLIPTextModel`)
2. text_encoder_2 (`CLIPTextModelWithProjection`)
3. tokenizer (`CLIPTokenizer`)
4. tokenizer_2 (`CLIPTokenizer`)
5. guider (`ClassifierFreeGuidance`)
6. image_encoder (`CLIPVisionModelWithProjection`)
7. feature_extractor (`CLIPImageProcessor`)
8. unet (`UNet2DConditionModel`)
9. vae (`AutoencoderKL`)
10. image_processor (`VaeImageProcessor`)
11. mask_processor (`VaeImageProcessor`)
12. scheduler (`EulerDiscreteScheduler`)
13. controlnet (`ControlNetUnionModel`)
14. control_image_processor (`VaeImageProcessor`)

## Configuration Parameters

- force_zeros_for_empty_prompt (default: True)
- requires_aesthetics_score (default: False)

## Workflow Input Specification
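`StableDiffusionXLAutoBlocks` is an auto pipeline: which workflow runs depends on which inputs you supply. The sketch below infers that mapping purely from the required inputs listed for each workflow in this section. `select_workflow` is a hypothetical helper for illustration only; it is not part of the Diffusers API, and the real dispatch happens inside the auto blocks themselves.

```python
def select_workflow(inputs: dict) -> str:
    """Hypothetical helper: map supplied (non-None) inputs to a workflow name.

    The rules mirror the required inputs listed under "Workflow Input
    Specification"; they are an illustration, not the Diffusers dispatch code.
    """
    provided = {name for name, value in inputs.items() if value is not None}

    # Base task: an image plus a mask means inpainting, an image alone means
    # image-to-image, and no image means text-to-image.
    if {"image", "mask_image"} <= provided:
        task = "inpainting"
    elif "image" in provided:
        task = "image2image"
    else:
        task = "text2image"

    parts = []
    if "ip_adapter_image" in provided:
        parts.append("ip_adapter")
    if "control_image" in provided:
        # ControlNet Union workflows additionally require a control_mode.
        parts.append("controlnet_union" if "control_mode" in provided else "controlnet")
    parts.append(task)
    return "_".join(parts)


# Placeholder strings stand in for real images here.
print(select_workflow({"prompt": "a cat"}))
print(select_workflow({"prompt": "a cat", "image": "img.png", "mask_image": "mask.png"}))
print(select_workflow({"prompt": "a cat", "ip_adapter_image": "ref.png",
                       "control_image": "depth.png", "control_mode": 1}))
```

The three calls print `text2image`, `inpainting`, and `ip_adapter_controlnet_union_text2image`, matching workflow names listed below.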
**text2image**

- `prompt` (`None`, *optional*): No description provided

**image2image**

- `prompt` (`None`, *optional*): No description provided
- `image` (`None`): No description provided

**inpainting**

- `prompt` (`None`, *optional*): No description provided
- `image` (`None`): No description provided
- `mask_image` (`None`): No description provided

**controlnet_text2image**

- `prompt` (`None`, *optional*): No description provided
- `control_image` (`None`): No description provided

**controlnet_image2image**

- `prompt` (`None`, *optional*): No description provided
- `image` (`None`): No description provided
- `control_image` (`None`): No description provided

**controlnet_inpainting**

- `prompt` (`None`, *optional*): No description provided
- `image` (`None`): No description provided
- `mask_image` (`None`): No description provided
- `control_image` (`None`): No description provided

**controlnet_union_text2image**

- `prompt` (`None`, *optional*): No description provided
- `control_image` (`None`): No description provided
- `control_mode` (`None`): No description provided

**controlnet_union_image2image**

- `prompt` (`None`, *optional*): No description provided
- `image` (`None`): No description provided
- `control_image` (`None`): No description provided
- `control_mode` (`None`): No description provided

**controlnet_union_inpainting**

- `prompt` (`None`, *optional*): No description provided
- `image` (`None`): No description provided
- `mask_image` (`None`): No description provided
- `control_image` (`None`): No description provided
- `control_mode` (`None`): No description provided
**ip_adapter_text2image**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.

**ip_adapter_image2image**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `image` (`None`): No description provided

**ip_adapter_inpainting**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `image` (`None`): No description provided
- `mask_image` (`None`): No description provided

**ip_adapter_controlnet_text2image**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `control_image` (`None`): No description provided

**ip_adapter_controlnet_image2image**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `image` (`None`): No description provided
- `control_image` (`None`): No description provided

**ip_adapter_controlnet_inpainting**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `image` (`None`): No description provided
- `mask_image` (`None`): No description provided
- `control_image` (`None`): No description provided

**ip_adapter_controlnet_union_text2image**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `control_image` (`None`): No description provided
- `control_mode` (`None`): No description provided

**ip_adapter_controlnet_union_image2image**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `image` (`None`): No description provided
- `control_image` (`None`): No description provided
- `control_mode` (`None`): No description provided

**ip_adapter_controlnet_union_inpainting**

- `prompt` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`): The image(s) to be used as IP-Adapter input.
- `image` (`None`): No description provided
- `mask_image` (`None`): No description provided
- `control_image` (`None`): No description provided
- `control_mode` (`None`): No description provided
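In the workflows that take an `image` (image2image, inpainting, and their ControlNet and IP-Adapter variants), the `strength` input (default `0.3`, listed under Input/Output Specification) controls how much of the noise schedule is actually run. The sketch below assumes the convention of the standard Diffusers SDXL image-to-image pipelines, where only the last `int(num_inference_steps * strength)` steps are executed; it ignores `denoising_start`, and `steps_for_strength` is a hypothetical helper, not a Diffusers function.

```python
def steps_for_strength(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Hypothetical helper: return (t_start, steps_run) for image-conditioned workflows.

    Assumes the usual Diffusers img2img convention: denoising starts partway
    through the schedule, so only the last int(num_inference_steps * strength)
    steps are run.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first scheduler timestep that is actually used.
    t_start = max(num_inference_steps - steps_run, 0)
    return t_start, steps_run


# With the defaults listed on this card (50 steps, strength 0.3).
t_start, steps_run = steps_for_strength(50, 0.3)
print(t_start, steps_run)  # 35 15
```

A lower `strength` skips more of the early, high-noise steps, so the output stays closer to the input image.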
## Input/Output Specification

**Inputs:**

- `prompt` (`None`, *optional*): No description provided
- `prompt_2` (`None`, *optional*): No description provided
- `negative_prompt` (`None`, *optional*): No description provided
- `negative_prompt_2` (`None`, *optional*): No description provided
- `cross_attention_kwargs` (`None`, *optional*): No description provided
- `clip_skip` (`None`, *optional*): No description provided
- `ip_adapter_image` (`Image | ndarray | Tensor | list[Image] | list[ndarray] | list[Tensor]`, *optional*): The image(s) to be used as IP-Adapter input.
- `height` (`None`, *optional*): No description provided
- `width` (`None`, *optional*): No description provided
- `image` (`None`, *optional*): No description provided
- `mask_image` (`None`, *optional*): No description provided
- `padding_mask_crop` (`None`, *optional*): No description provided
- `dtype` (`dtype`, *optional*): The dtype of the model inputs.
- `generator` (`None`, *optional*): No description provided
- `preprocess_kwargs` (`dict | NoneType`, *optional*): A kwargs dictionary that, if specified, is passed along to the `ImageProcessor` as defined under `self.image_processor` in [diffusers.image_processor.VaeImageProcessor].
- `num_images_per_prompt` (`None`, *optional*, defaults to `1`): No description provided
- `ip_adapter_embeds` (`list`, *optional*): Pre-generated image embeddings for IP-Adapter. Can be generated in the ip_adapter step.
- `negative_ip_adapter_embeds` (`list`, *optional*): Pre-generated negative image embeddings for IP-Adapter. Can be generated in the ip_adapter step.
- `num_inference_steps` (`None`, *optional*, defaults to `50`): No description provided
- `timesteps` (`None`, *optional*): No description provided
- `sigmas` (`None`, *optional*): No description provided
- `denoising_end` (`None`, *optional*): No description provided
- `strength` (`None`, *optional*, defaults to `0.3`): No description provided
- `denoising_start` (`None`, *optional*): No description provided
- `latents` (`None`): No description provided
- `image_latents` (`Tensor`, *optional*): The latents representing the reference image for image-to-image/inpainting generation. Can be generated in the vae_encode step.
- `mask` (`Tensor`, *optional*): The mask for inpainting generation. Can be generated in the vae_encode step.
- `masked_image_latents` (`Tensor`, *optional*): The masked image latents for inpainting generation (only used by inpainting-specific UNets). Can be generated in the vae_encode step.
- `original_size` (`None`, *optional*): No description provided
- `target_size` (`None`, *optional*): No description provided
- `negative_original_size` (`None`, *optional*): No description provided
- `negative_target_size` (`None`, *optional*): No description provided
- `crops_coords_top_left` (`None`, *optional*, defaults to `(0, 0)`): No description provided
- `negative_crops_coords_top_left` (`None`, *optional*, defaults to `(0, 0)`): No description provided
- `aesthetic_score` (`None`, *optional*, defaults to `6.0`): No description provided
- `negative_aesthetic_score` (`None`, *optional*, defaults to `2.0`): No description provided
- `control_image` (`None`, *optional*): No description provided
- `control_mode` (`None`, *optional*): No description provided
- `control_guidance_start` (`None`, *optional*, defaults to `0.0`): No description provided
- `control_guidance_end` (`None`, *optional*, defaults to `1.0`): No description provided
- `controlnet_conditioning_scale` (`None`, *optional*, defaults to `1.0`): No description provided
- `guess_mode` (`None`, *optional*, defaults to `False`): No description provided
- `crops_coords` (`tuple | NoneType`, *optional*): The crop coordinates used to preprocess/postprocess the image and mask (inpainting only). Can be generated in the vae_encode step.
- `controlnet_cond` (`Tensor`, *optional*): The control image to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `conditioning_scale` (`float`, *optional*): The ControlNet conditioning scale to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `controlnet_keep` (`list`, *optional*): The ControlNet keep values to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `**denoiser_input_fields` (`None`, *optional*): All conditional model inputs that need to be prepared with the guider. It should contain prompt_embeds/negative_prompt_embeds, add_time_ids/negative_add_time_ids, pooled_prompt_embeds/negative_pooled_prompt_embeds, and ip_adapter_embeds/negative_ip_adapter_embeds (optional). Please add `kwargs_type=denoiser_input_fields` to their parameter spec (`OutputParam`) when they are created and added to the pipeline state.
- `eta` (`None`, *optional*, defaults to `0.0`): No description provided
- `output_type` (`None`, *optional*, defaults to `pil`): No description provided

**Outputs:**

- `images` (`list`): Generated images.
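The size inputs (`original_size`, `crops_coords_top_left`, `target_size`, and their `negative_*` counterparts) and the aesthetic scores end up in SDXL's additional time-conditioning vector (`add_time_ids`). The sketch below assumes the layout used by the standard Diffusers SDXL pipelines and shows how the `requires_aesthetics_score` configuration parameter (default `False`) switches between the two layouts. `build_add_time_ids` is a hypothetical helper; returning plain Python lists instead of tensors, and falling back to the positive sizes when negative ones are not given, are simplifying assumptions.

```python
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (height, width) or (top, left)


def build_add_time_ids(
    original_size: Size,
    target_size: Size,
    crops_coords_top_left: Size = (0, 0),
    aesthetic_score: float = 6.0,
    negative_original_size: Optional[Size] = None,
    negative_target_size: Optional[Size] = None,
    negative_crops_coords_top_left: Size = (0, 0),
    negative_aesthetic_score: float = 2.0,
    requires_aesthetics_score: bool = False,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (add_time_ids, negative_add_time_ids) as lists."""
    # Assumption: negative sizes fall back to the positive ones when not given.
    negative_original_size = negative_original_size or original_size
    negative_target_size = negative_target_size or target_size

    if requires_aesthetics_score:
        # Refiner-style layout: the aesthetic score takes the place of target_size.
        pos = [*original_size, *crops_coords_top_left, aesthetic_score]
        neg = [*negative_original_size, *negative_crops_coords_top_left, negative_aesthetic_score]
    else:
        # Base-model layout: (orig_h, orig_w, crop_top, crop_left, target_h, target_w).
        pos = [*original_size, *crops_coords_top_left, *target_size]
        neg = [*negative_original_size, *negative_crops_coords_top_left, *negative_target_size]
    return pos, neg


pos, neg = build_add_time_ids(original_size=(1024, 1024), target_size=(1024, 1024))
print(pos)  # [1024, 1024, 0, 0, 1024, 1024]
```

With `requires_aesthetics_score=True` (as in the SDXL refiner configuration) each vector has five entries instead of six, which is why the flag must match the UNet the pipeline is loaded with.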
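`control_guidance_start` and `control_guidance_end` (defaults `0.0` and `1.0`) restrict ControlNet to a fraction of the denoising steps; the per-step result is what the `controlnet_keep` input carries. The sketch below assumes the convention of the standard Diffusers ControlNet pipelines: step `i` of `N` is kept only if `i / N >= start` and `(i + 1) / N <= end`, and the effective scale at that step is `controlnet_conditioning_scale * keep[i]`. Both functions are hypothetical helpers covering a single ControlNet.

```python
def controlnet_keep_schedule(num_steps: int, start: float = 0.0, end: float = 1.0) -> list[float]:
    """Hypothetical helper: 1.0 where ControlNet is active at step i, else 0.0."""
    keep = []
    for i in range(num_steps):
        # Step i covers the fraction [i/N, (i+1)/N) of the schedule; keep it
        # only if that whole interval lies inside [start, end].
        outside = i / num_steps < start or (i + 1) / num_steps > end
        keep.append(0.0 if outside else 1.0)
    return keep


def conditioning_scale_at(step: int, keep: list[float], controlnet_conditioning_scale: float = 1.0) -> float:
    """Hypothetical helper: effective ControlNet strength for one denoising step."""
    return controlnet_conditioning_scale * keep[step]


# Apply ControlNet only during the first half of a 10-step schedule.
keep = controlnet_keep_schedule(10, start=0.0, end=0.5)
print(keep)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Ending ControlNet guidance early keeps the overall composition from the control image while letting the final steps refine detail without it.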