FreeU: Free Lunch in Diffusion U-Net
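Abstract

The authors investigate the key contributions of the U-Net architecture to the denoising process. They find that the backbone mainly contributes to denoising, whereas the skip connections mainly introduce high-frequency features into the decoder module, which causes the network to overlook the semantic information in the backbone. Based on this finding, they propose a simple yet effective method, "FreeU", which improves generation quality without any additional training or fine-tuning. The key idea is to strategically re-weight the contributions of the U-Net's skip connections and backbone feature maps, so that the strengths of both components are used. Promising results on image and video generation tasks show that FreeU can be readily integrated into existing diffusion models such as Stable Diffusion, DreamBooth, ModelScope, Rerender and ReVersion, improving generation quality with only a few lines of code.

Experiments show that amplifying the backbone in every decoder stage leads to oversmoothed textures. To mitigate this, the backbone is amplified only in the first two decoder stages, and the skip features are scaled down there. The skip features go through an FFT and an IFFT; see the fourier_filter function in the code for details.

For the complete UNet structure of Stable Diffusion 1.5, see UNet2DConditionModel.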
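As a compact restatement of the re-weighting, read off the diffusers code quoted later in this post rather than taken from the paper, the two operations for decoder stage l (l = 1 or 2) can be written as below. The symbols x_l (backbone feature with C channels), h_l (skip feature), and the mask beta_l are notation chosen here for illustration; beta_l is defined on the centered (fftshift-ed) spectrum, and the 2 x 2 window corresponds to threshold=1.

```latex
\begin{aligned}
x'_{l,c} &=
  \begin{cases}
    b_l \, x_{l,c}, & c < \lfloor C/2 \rfloor \\
    x_{l,c},        & \text{otherwise}
  \end{cases} \\[4pt]
h'_l &= \operatorname{Re}\,\mathrm{IFFT}\bigl(\mathrm{FFT}(h_l) \odot \beta_l\bigr),
\qquad
\beta_l(u, v) =
  \begin{cases}
    s_l, & (u, v) \in \text{central } 2 \times 2 \text{ low-frequency window} \\
    1,   & \text{otherwise}
  \end{cases}
\end{aligned}
```

With b_l > 1 and s_l < 1, the first half of the backbone channels is amplified and the low-frequency part of the skip features is attenuated; later decoder stages are left unchanged.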
SDXL results comparison. The parameters come from FreeU:

SD1.4: (will be updated soon)
b1: 1.3, b2: 1.4, s1: 0.9, s2: 0.2

SD1.5: (will be updated soon)
b1: 1.5, b2: 1.6, s1: 0.9, s2: 0.2

SD2.1
b1: 1.1, b2: 1.2, s1: 0.9, s2: 0.2
b1: 1.4, b2: 1.6, s1: 0.9, s2: 0.2

SDXL
b1: 1.3, b2: 1.4, s1: 0.9, s2: 0.2

Range for More Parameters
When trying additional parameters, consider the following ranges:
b1: 1 ≤ b1 ≤ 1.2
b2: 1.2 ≤ b2 ≤ 1.6
s1: s1 ≤ 1
s2: s2 ≤ 1

Code
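The settings and ranges listed above can be kept in a small lookup table. The sketch below is only an illustration: the names FREEU_PRESETS, SUGGESTED_RANGES and outside_suggested_range are hypothetical, as are the keys "SD2.1 (first)" and "SD2.1 (second)", which simply label the two SD2.1 lines in the order they appear.

```python
from typing import Dict, List, Optional, Tuple

# Values copied from the list above; the dictionary names and keys are hypothetical.
FREEU_PRESETS: Dict[str, Dict[str, float]] = {
    "SD1.4": dict(b1=1.3, b2=1.4, s1=0.9, s2=0.2),
    "SD1.5": dict(b1=1.5, b2=1.6, s1=0.9, s2=0.2),
    "SD2.1 (first)": dict(b1=1.1, b2=1.2, s1=0.9, s2=0.2),
    "SD2.1 (second)": dict(b1=1.4, b2=1.6, s1=0.9, s2=0.2),
    "SDXL": dict(b1=1.3, b2=1.4, s1=0.9, s2=0.2),
}

# (lower bound, upper bound); None means no bound is given in the list above.
SUGGESTED_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "b1": (1.0, 1.2),
    "b2": (1.2, 1.6),
    "s1": (None, 1.0),
    "s2": (None, 1.0),
}


def outside_suggested_range(params: Dict[str, float]) -> List[str]:
    """Return the parameter names whose values fall outside SUGGESTED_RANGES."""
    flagged = []
    for name, value in params.items():
        low, high = SUGGESTED_RANGES[name]
        if (low is not None and value < low) or (high is not None and value > high):
            flagged.append(name)
    return sorted(flagged)


for model, params in FREEU_PRESETS.items():
    print(model, outside_suggested_range(params))
```

Several of the listed settings use a b1 above the suggested upper bound of 1.2, so the ranges read best as a starting point for exploration rather than a hard limit. A preset can be passed to the pipeline shown in the usage section with pipeline.enable_freeu(**FREEU_PRESETS["SDXL"]).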
Usage

    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
    ).to("cuda")
    # The only added line: enable FreeU with the SDXL parameters listed above.
    pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.3, b2=1.4)
    # Fix the seed so that runs are reproducible.
    generator = torch.Generator(device="cpu").manual_seed(13)
    prompt = "A squirrel eating a burger"
    image = pipeline(prompt, generator=generator).images[0]
    image

The FreeU functions, from diffusers
    from typing import Tuple

    import torch
    from torch.fft import fftn, fftshift, ifftn, ifftshift


    def apply_freeu(
        resolution_idx: int, hidden_states: torch.Tensor, res_hidden_states: torch.Tensor, **freeu_kwargs
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Applies the FreeU mechanism as introduced in https://arxiv.org/abs/2309.11497.
        Adapted from the official code repository: https://github.com/ChenyangSi/FreeU.

        Args:
            resolution_idx (int): Integer denoting the UNet block where FreeU is being applied.
            hidden_states (torch.Tensor): Inputs to the underlying block.
            res_hidden_states (torch.Tensor): Features from the skip block corresponding to the underlying block.
            s1 (float): Scaling factor for stage 1 to attenuate the contributions of the skip features.
            s2 (float): Scaling factor for stage 2 to attenuate the contributions of the skip features.
            b1 (float): Scaling factor for stage 1 to amplify the contributions of backbone features.
            b2 (float): Scaling factor for stage 2 to amplify the contributions of backbone features.
        """
        # First decoder stage: amplify half of the backbone channels by b1, filter the skip features with s1.
        if resolution_idx == 0:
            num_half_channels = hidden_states.shape[1] // 2
            hidden_states[:, :num_half_channels] = hidden_states[:, :num_half_channels] * freeu_kwargs["b1"]
            res_hidden_states = fourier_filter(res_hidden_states, threshold=1, scale=freeu_kwargs["s1"])
        # Second decoder stage: same operations with b2 and s2.
        if resolution_idx == 1:
            num_half_channels = hidden_states.shape[1] // 2
            hidden_states[:, :num_half_channels] = hidden_states[:, :num_half_channels] * freeu_kwargs["b2"]
            res_hidden_states = fourier_filter(res_hidden_states, threshold=1, scale=freeu_kwargs["s2"])

        return hidden_states, res_hidden_states


    def fourier_filter(x_in: torch.Tensor, threshold: int, scale: int) -> torch.Tensor:
        """Fourier filter as introduced in FreeU (https://arxiv.org/abs/2309.11497).

        This version of the method comes from here:
        https://github.com/huggingface/diffusers/pull/5164#issuecomment-1732638706
        """
        x = x_in
        B, C, H, W = x.shape

        # Non-power of 2 images must be float32
        if (W & (W - 1)) != 0 or (H & (H - 1)) != 0:
            x = x.to(dtype=torch.float32)
        # fftn does not support bfloat16
        elif x.dtype == torch.bfloat16:
            x = x.to(dtype=torch.float32)

        # FFT, shifted so that the zero frequency sits at the center
        x_freq = fftn(x, dim=(-2, -1))
        x_freq = fftshift(x_freq, dim=(-2, -1))

        B, C, H, W = x_freq.shape
        mask = torch.ones((B, C, H, W), device=x.device)

        # Scale only the central (low-frequency) window of the spectrum
        crow, ccol = H // 2, W // 2
        mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = scale
        x_freq = x_freq * mask

        # IFFT
        x_freq = ifftshift(x_freq, dim=(-2, -1))
        x_filtered = ifftn(x_freq, dim=(-2, -1)).real

        return x_filtered.to(dtype=x_in.dtype)
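To see what the Fourier filter does to a feature map, the following toy example ports the masking step to NumPy and applies it to two synthetic 8 x 8 inputs. It is a sketch for illustration, not the diffusers implementation: the function name scale_low_frequencies and the test inputs are assumptions made here, and the dtype handling of the original is left out.

```python
import numpy as np


def scale_low_frequencies(x: np.ndarray, threshold: int, scale: float) -> np.ndarray:
    """Scale the centered low-frequency window of a (..., H, W) real array."""
    H, W = x.shape[-2:]
    # Forward FFT over the two spatial axes, then move the zero frequency to the center.
    freq = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    # The mask is 1 everywhere except a (2 * threshold)-wide window around the center.
    mask = np.ones(x.shape)
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = scale
    freq = freq * mask
    # Undo the shift, transform back, and keep the real part.
    return np.fft.ifft2(np.fft.ifftshift(freq, axes=(-2, -1)), axes=(-2, -1)).real


# A constant map contains only the zero-frequency (DC) component.
flat = np.full((8, 8), 3.0)
# A +1/-1 checkerboard contains only the highest spatial frequency.
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 2.0 - 1.0

flat_out = scale_low_frequencies(flat, threshold=1, scale=0.2)
checker_out = scale_low_frequencies(checker, threshold=1, scale=0.2)
mixed_out = scale_low_frequencies(flat + checker, threshold=1, scale=0.2)

print(np.allclose(flat_out, 0.2 * flat))           # the constant part is scaled by 0.2
print(np.allclose(checker_out, checker))           # the fine pattern passes through unchanged
print(np.allclose(mixed_out, 0.2 * flat + checker))
```

All three checks print True: the filter attenuates the smooth, low-frequency content of the skip features and leaves the high-frequency detail untouched, which matches the role of s1 and s2 described above.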