Generating Cats with KPN Filtering
- #KPN-filtering
- #generative-modeling
- #image-generation
- The post explores generative modeling of cat images using kernel prediction network (KPN) denoising in pixel space.
- Unlike typical diffusion models that operate in a latent space, this approach applies KPN bilateral filters and predicts low-rank targets directly.
- KPN filters bring a useful regularization bias and an efficient GPU implementation, making them suitable for edge devices.
- The model is trained on 64x64 cat images with an architecture built from an 8x8 patch transformer followed by upscaling convolutions (a minimal architecture sketch follows the list).
- Training gradually noises images toward pure Gaussian noise and trains the network to predict the original image with L2 and LPIPS losses (see the training-step sketch after the list).
- Bilateral filtering can only produce convex combinations of existing pixels; this limitation is mitigated by a low-capacity network that predicts a color drift (bias) term added after filtering.
- Non-convex filtering is further enabled by leaving the bilateral weights unnormalized and passing them through a tanh activation, which lets the filter generate new colors and detail (see the filtering sketch after the list).
- The filtering network is a simplified variant of Neural Partitioning Pyramids and Procedural Kernel Networks.
- Color drift is predicted by a small U-Net operating on low-frequency components, which improves color fidelity while keeping the KPN filtering path amenable to quantization (see the color-drift sketch after the list).
- Generated samples after 5k epochs serve as a proof of concept, though results are not yet impressive.
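
The post does not spell out the backbone in code; below is a minimal sketch under assumed hyperparameters (hidden width, depth, head count, output channel count) combining an 8x8 patchifier, a standard Transformer encoder, and transposed-convolution upscaling back to 64x64. The class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class PatchTransformerKPNBackbone(nn.Module):
    """Hypothetical backbone: 8x8 patch transformer + upscaling convolutions.

    Widths, depth and head count are assumptions; the post only states
    64x64 inputs, 8x8 patches and upscaling convolutions.
    """
    def __init__(self, dim=256, depth=6, heads=4, out_channels=32):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=8, stride=8)   # 64x64 -> 8x8 tokens
        self.pos = nn.Parameter(torch.zeros(1, 8 * 8, dim))          # learned positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Three stride-2 transposed convs: 8x8 -> 16 -> 32 -> 64
        self.upscale = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose2d(dim // 2, dim // 4, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose2d(dim // 4, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, x):                                # x: (B, 3, 64, 64)
        tokens = self.patchify(x)                        # (B, dim, 8, 8)
        b, d, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2) + self.pos   # (B, 64, dim)
        seq = self.encoder(seq)
        feat = seq.transpose(1, 2).reshape(b, d, h, w)
        return self.upscale(feat)                        # (B, out_channels, 64, 64) filter params
```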
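
A single training step, as summarized above, could look roughly like the following. The exact noise schedule is not given in the summary, so a simple linear blend toward Gaussian noise is assumed; `model` stands for the full pipeline (backbone, KPN filtering, color drift), and the `lpips` package supplies the perceptual loss.

```python
import torch
import lpips  # pip install lpips; VGG-based perceptual loss

# Move lpips_loss to the same device as the data in practice.
lpips_loss = lpips.LPIPS(net='vgg')

def training_step(model, x0, optimizer, lambda_lpips=0.5):
    """One step: noise a clean batch x0 (scaled to [-1, 1]) and predict it back."""
    b = x0.shape[0]
    t = torch.rand(b, 1, 1, 1, device=x0.device)   # per-sample noise level (assumed uniform)
    eps = torch.randn_like(x0)
    xt = (1.0 - t) * x0 + t * eps                  # blend toward pure Gaussian noise

    pred = model(xt, t)                            # predicted clean image
    loss = torch.mean((pred - x0) ** 2)            # L2 reconstruction
    loss = loss + lambda_lpips * lpips_loss(pred, x0).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```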
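
The unnormalized, tanh-activated filtering described above might be applied like this; the kernel size, the choice to share one kernel across RGB channels, and the omission of an explicit bilateral range term are all assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def apply_kpn_filter(x, weights, bias, kernel_size=3):
    """Apply per-pixel predicted kernels to image x.

    x:       (B, 3, H, W) noisy image
    weights: (B, k*k, H, W) per-pixel kernel weights from the backbone
    bias:    (B, 3, H, W) predicted color drift, added after filtering
    """
    b, c, h, w = x.shape
    k = kernel_size
    weights = torch.tanh(weights)                        # unnormalized, squashed to (-1, 1)

    patches = F.unfold(x, k, padding=k // 2)             # (B, 3*k*k, H*W) neighborhoods
    patches = patches.view(b, c, k * k, h, w)
    out = (patches * weights.unsqueeze(1)).sum(dim=2)    # weighted sum per pixel
    return out + bias                                    # color drift restores low frequencies
```

Because the weights are neither normalized nor constrained to be positive, the output is not restricted to a convex combination of input pixels, which is what allows the filter to synthesize new colors and detail rather than only averaging what is already there.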
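
Finally, a hypothetical low-frequency color-drift head: the network runs at reduced resolution and its output is upsampled bilinearly, so it only contributes smooth color corrections. A plain convolutional stack stands in for the small U-Net; widths and the downsampling factor are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowFreqColorDrift(nn.Module):
    """Hypothetical color-drift head predicting the post-filter bias term."""
    def __init__(self, in_ch=3, base=16, down_factor=4):
        super().__init__()
        self.down_factor = down_factor
        self.net = nn.Sequential(                 # stand-in for the small U-Net
            nn.Conv2d(in_ch, base, 3, padding=1), nn.GELU(),
            nn.Conv2d(base, base, 3, padding=1), nn.GELU(),
            nn.Conv2d(base, 3, 3, padding=1),
        )

    def forward(self, x):                         # x: (B, 3, 64, 64)
        lo = F.avg_pool2d(x, self.down_factor)    # 64x64 -> 16x16 low-frequency view
        drift = self.net(lo)                      # predict bias at low resolution
        return F.interpolate(drift, size=x.shape[-2:],
                             mode='bilinear', align_corners=False)
```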