Generative models have revolutionized the landscape of artificial intelligence, enabling the creation of data that closely mimics real-world inputs. Among these, consistency models have emerged as a promising new class, offering the ability to generate high-quality samples in a single step and circumventing the complex adversarial training required by models like GANs. Despite their potential, existing consistency models face significant limitations, primarily their dependence on distillation from pre-trained diffusion models and their use of LPIPS, a learned metric whose biases can leak into evaluation. We introduce techniques that substantially enhance consistency training, allowing models to learn directly from data, overcoming the constraints of distillation, and improving overall performance.
Understanding Consistency Models
Consistency models are designed to sample data in a single step, a significant computational advantage over multi-step samplers such as diffusion models. Traditional consistency models rely on a process called distillation, in which they learn from a pre-trained diffusion model. While effective, this method inherently caps a consistency model's performance at that of the diffusion model it is distilled from. Additionally, using a learned metric such as LPIPS (Learned Perceptual Image Patch Similarity) as the distance function in the training objective introduces bias: LPIPS is itself trained on ImageNet, so the features it rewards overlap with those behind evaluation metrics like FID, which can skew reported results.
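To make the single-step property concrete, here is a minimal sketch of one-step sampling. The `model` callable and the `sigma_max` value are hypothetical stand-ins: `model(x, sigma)` is assumed to be a trained consistency function that maps a noisy input at noise level `sigma` directly to a clean sample.

```python
import numpy as np

def one_step_sample(model, shape, sigma_max=80.0, rng=None):
    """One-step consistency sampling sketch.

    model(x, sigma) is assumed to map a noisy input at noise level sigma
    straight to a clean sample. sigma_max is an illustrative maximum
    noise level, not a tuned value.
    """
    rng = rng or np.random.default_rng()
    x_T = sigma_max * rng.standard_normal(shape)  # start from pure Gaussian noise
    return model(x_T, sigma_max)                  # a single network call

# Usage with a placeholder "model" that ignores its input:
sample = one_step_sample(lambda x, sigma: np.zeros_like(x), (4, 4))
```

The entire cost of generation is one forward pass, which is the computational advantage described above.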
Addressing the Challenges
To push the boundaries of what consistency models can achieve, our approach focuses on eliminating the constraints of distillation and mitigating the biases introduced by learned metrics. We propose several novel techniques that fundamentally enhance the training and evaluation of consistency models:
Direct Learning from Data: By enabling consistency models to learn directly from raw data rather than relying on a pre-trained diffusion model, we remove the ceiling imposed by distillation. This approach allows the model to develop a more nuanced understanding of the data distribution.
Replacing LPIPS with Pseudo-Huber Losses: LPIPS, while useful, carries the biases of its pretrained feature extractor. To address this, we adopt the Pseudo-Huber loss from robust statistics as the distance function in the consistency training objective. It is robust to outliers like an L1 loss while remaining smooth like a squared loss, and because it involves no pretrained network, it avoids contaminating evaluation metrics such as FID.
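A minimal sketch of the Pseudo-Huber distance described above; the constant `c` below is an illustrative placeholder, not a tuned setting.

```python
import numpy as np

def pseudo_huber(x, y, c=0.03):
    """Pseudo-Huber distance: sqrt(||x - y||^2 + c^2) - c.

    Behaves like a squared (L2) penalty for small residuals and like an
    L1 penalty for large ones, making it robust to outliers. The constant
    c controls the crossover point between the two regimes (the value
    here is illustrative).
    """
    return np.sqrt(np.sum((x - y) ** 2) + c * c) - c

# Small residuals are penalized gently; large ones grow roughly linearly.
a = np.zeros(4)
b = np.ones(4)
d = pseudo_huber(a, b)  # close to the plain L2 distance of 2 for this gap
```

Because the loss is smooth everywhere (unlike a plain L1 loss), it remains well-behaved under gradient-based training.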
Introducing Lognormal Noise Schedules: We sample noise levels for the consistency training objective from a lognormal distribution. This concentrates training effort on the intermediate noise levels that matter most for sample quality, rather than spreading it uniformly across all levels, and leads to higher-quality samples.
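A continuous sketch of lognormal noise-level sampling; the mean and standard deviation parameters below are illustrative placeholders, not the exact settings used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sigmas(n, p_mean=-1.1, p_std=2.0):
    """Draw noise levels sigma = exp(z) with z ~ N(p_mean, p_std^2).

    A lognormal distribution keeps every sigma positive and concentrates
    mass around exp(p_mean), so training focuses on moderate noise levels
    while still covering the tails. The parameter values are illustrative.
    """
    return np.exp(rng.normal(p_mean, p_std, size=n))

sigmas = sample_sigmas(10_000)
# Every drawn noise level is strictly positive by construction.
```

In practice this distribution would be discretized over the training grid of noise levels, but the continuous version conveys the shape of the schedule.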
Enhanced Training Regimen: We double the total number of discretization steps at regular intervals during training, so the model learns on a coarse noise grid early (when the objective is easier) and a progressively finer grid later. Combined with careful hyperparameter tuning, this curriculum significantly improves the model's performance.
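The doubling curriculum can be sketched as a simple schedule function; the starting step count, ceiling, and doubling interval below are hypothetical placeholders.

```python
def discretization_steps(k, s0=10, s1=1280, doubling_every=10_000):
    """Number of discretization steps at training iteration k.

    Starts at s0 steps and doubles at a fixed interval until hitting the
    ceiling s1. The interval and endpoints here are illustrative, not
    tuned values.
    """
    n = s0 * 2 ** (k // doubling_every)
    return min(n, s1)

# The grid coarsens the objective early and refines it late:
# iteration 0 -> 10 steps, iteration 25,000 -> 40 steps, and the
# schedule eventually saturates at 1,280 steps.
```

Geometric (doubling) growth means each refinement roughly halves the spacing between adjacent noise levels, so the objective tightens smoothly rather than in one abrupt jump.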
Empirical Results
Our improvements yield substantial gains in model performance. On the CIFAR-10 dataset, our consistency model achieves an FID (Fréchet Inception Distance) score of 2.51 in a single sampling step, a 3.5× improvement over prior consistency training methods. On ImageNet 64×64, we attain an FID score of 3.25, a 4× improvement.
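For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images (lower is better). A minimal NumPy sketch of the closed-form distance, given precomputed feature means and covariances:

```python
import numpy as np

def fid(mu1, cov1, mu2, cov2):
    """Fréchet distance between N(mu1, cov1) and N(mu2, cov2):

        ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})

    Tr((C1 C2)^{1/2}) equals the sum of square roots of the eigenvalues
    of C1 @ C2, which are nonnegative for PSD covariance matrices.
    """
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sum(np.sqrt(np.abs(eigvals.real)))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2) - 2.0 * tr_sqrt)

# Identical feature distributions give a distance of zero.
```

In a real evaluation the means and covariances come from Inception-v3 activations over tens of thousands of images; the statistics here are stand-ins.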
By employing a two-step sampling process, we reduce FID scores further, to 2.24 for CIFAR-10 and 2.77 for ImageNet 64×64. These results not only surpass those achieved via distillation but also narrow the performance gap between consistency models and other state-of-the-art generative models.
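The two-step procedure trades one extra network call for quality: generate once from pure noise, re-inject a moderate amount of noise, and denoise again. A minimal sketch, where `model` and both noise levels are hypothetical stand-ins:

```python
import numpy as np

def two_step_sample(model, shape, sigma_max=80.0, sigma_mid=0.8, rng=None):
    """Two-step consistency sampling sketch.

    model(x, sigma) is assumed to map a noisy input at noise level sigma
    directly to a clean sample (the consistency function). sigma_max and
    sigma_mid are illustrative noise levels, not tuned values.
    """
    rng = rng or np.random.default_rng()
    x = sigma_max * rng.standard_normal(shape)      # start from pure noise
    x = model(x, sigma_max)                         # first denoising call
    x = x + sigma_mid * rng.standard_normal(shape)  # re-inject moderate noise
    return model(x, sigma_mid)                      # second denoising call
```

The re-noising step gives the model a second chance to correct artifacts from the first pass, which is consistent with the lower FID scores reported above.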
Conclusion
The advancements in consistency training presented in this article significantly elevate the potential of consistency models. By removing the dependency on distillation, replacing LPIPS with a robust loss function, and refining the training process, we set a new benchmark in generative modeling. Our approach achieves state-of-the-art one-step FID scores and establishes a solid foundation for future research in generative AI. As consistency models continue to evolve, they promise to unlock new possibilities in high-fidelity data generation, paving the way for more efficient and effective AI applications.