PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
The article introduces PiD, a pixel diffusion decoder that maps latent representations directly to high-resolution images. By unifying decoding and upsampling in a single generative module, it replaces the usual decode-then-super-resolve cascade and is reported to achieve lower latency and higher visual quality. PiD can decode images at 2048×2048 pixels in under one second.
- PiD replaces the traditional decode-then-super-resolve cascade with a single pixel diffusion decoder.
- The model can decode the latents of 512×512 images into 2048×2048-pixel outputs in under 1 second on consumer hardware.
- PiD is reported to be up to 5.9 times faster than existing super-resolution pipelines while maintaining better visual fidelity.
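To put the resolution figures in perspective, the short sketch below works out how much upsampling the 512×512 to 2048×2048 case involves. The helper name `upscale_stats` is hypothetical and used only for illustration; it is plain arithmetic on the numbers quoted above, not part of PiD.

```python
def upscale_stats(src_side: int, dst_side: int) -> tuple[float, float]:
    """Return (per-side scale factor, pixel-count ratio) for square images."""
    per_side = dst_side / src_side
    pixel_ratio = (dst_side * dst_side) / (src_side * src_side)
    return per_side, pixel_ratio


# Figures quoted in the summary: latents of 512x512 images decoded to 2048x2048.
per_side, pixel_ratio = upscale_stats(512, 2048)
print(f"{per_side:g}x per side, {pixel_ratio:g}x pixels")  # 4x per side, 16x pixels
```

In other words, the decoder has to produce 16 times as many pixels as the source resolution, which is the work a separate super-resolution stage would otherwise do.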
Opening excerpt (first ~120 words)
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren (NVIDIA)
TL;DR: PiD directly decodes latent representations into high-resolution images, replacing the decode–then–super-resolve cascade while achieving lower latency and higher visual quality.
[Figure: decoding results for real-image and generated-image latents. Panels: SD3 VAE (VAE Decoder, PiD); DINOv2 (RAE Decoder, PiD); Z-Image (VAE Decoder, PiD); Flux.2 [dev] (VAE Decoder, PiD).]
Abstract: Most practical high-resolution text-to-image systems rely on latent diffusion models, where generation is performed in a compact latent space and a decoder maps latents back to pixels.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Nvidia.