
FID of DDPMs on CIFAR-10 #3

Closed
GloryyrolG opened this issue Nov 2, 2021 · 3 comments

Comments

@GloryyrolG

GloryyrolG commented Nov 2, 2021

Hi,

I found that with the converted pretrained CIFAR-10 DDPM,

`# This used the pretrained DDPM model, see https://github.com/pesser/pytorch_diffusion`

I got an FID of 5.68 with timesteps=100, eta=0, which is noticeably worse than the 4.16 reported in the paper. Was the reported number obtained with a model you trained yourselves?

For the FID calculation, I used https://github.com/mseitzer/pytorch-fid
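For reference, the score pytorch-fid reports is the Fréchet distance between two Gaussians fitted to Inception-v3 features (2048-dimensional pool features by default) of real and generated images, with means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
$$

Lower is better; the estimate also depends on how many generated samples are used, so numbers are most comparable when computed under the same setup.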

@jiamings
Member

jiamings commented Nov 4, 2021

I suspect that you are using uniform skips (the default) instead of quadratic skips, which work better for CIFAR-10.
See the option here: https://github.com/ermongroup/ddim/blob/main/main.py#L74

I reran with a different seed and got an FID of 4.26.
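A minimal sketch of the two timestep-skipping schemes discussed in this thread: a uniform stride versus a "quadspace" between 0 and 0.8T (the `skip_type` option). The function names `uniform_skips` and `quad_skips` are hypothetical and not the repository's API; the sketch assumes T = 1000 training steps and S = 100 sampling steps, matching timesteps=100 above.

```python
import math
from typing import List


def uniform_skips(num_timesteps: int, steps: int) -> List[int]:
    """Evenly strided timesteps 0, k, 2k, ... with stride k = T // S."""
    skip = num_timesteps // steps
    return list(range(0, num_timesteps, skip))


def quad_skips(num_timesteps: int, steps: int, frac: float = 0.8) -> List[int]:
    """Timesteps spaced evenly in sqrt(t) on [0, sqrt(frac * T)], then squared.

    Squaring packs consecutive steps close together near t = 0 and spreads
    them apart near t = frac * T. After int() truncation, some small t
    values repeat.
    """
    stop = math.sqrt(num_timesteps * frac)
    # Hand-rolled linspace(0, stop, steps) so the sketch needs no NumPy.
    grid = [i * stop / (steps - 1) for i in range(steps)]
    return [int(x ** 2) for x in grid]


if __name__ == "__main__":
    T, S = 1000, 100
    uni = uniform_skips(T, S)
    quad = quad_skips(T, S)
    print("uniform:", uni[:5], "...", uni[-3:])
    print("quad:   ", quad[:5], "...", quad[-3:])
    print("steps below t=200:", sum(t < 200 for t in uni), "vs", sum(t < 200 for t in quad))
```

With these settings the uniform grid ends at t = 990, while the quad grid stops near t = 800 and places 50 of its 100 steps below t = 200 (the uniform grid places 20 there).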

@GloryyrolG
Author

Okay, I found it in the supplementary material, and the results look better now. Thanks, Jiaming. I'm just surprised that this empirical change leads to such a sizable difference in FID: 5.68 vs. 4.26, a gap of 1.42, i.e. about a third of 4.26.

@FutureXiang

> I suspect that you are using uniform skips (default) instead of quadratic skips (works better for CIFAR-10). See option here: https://github.com/ermongroup/ddim/blob/main/main.py#L74
>
> I reran with a different seed and got FID of 4.26.

Hi Jiaming, I wonder why the `skip_type == "quad"` code implements a "quadspace" between $0$ and $0.8T$.
I guess the goal is to concentrate steps in the interval where the added noise is relatively larger (i.e., where the denoising is more dramatic).
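Written out, the quad option as described here takes $S$ evenly spaced points in $\sqrt{t}$ on $[0, \sqrt{0.8T}]$ and squares them (ignoring integer truncation):

$$
\tau_i = 0.8T\left(\frac{i}{S-1}\right)^2, \qquad i = 0, \dots, S-1,
$$

so the gap between consecutive steps is

$$
\tau_{i+1} - \tau_i = \frac{0.8T\,(2i+1)}{(S-1)^2},
$$

which grows linearly in $i$: steps are densest near $t = 0$ and sparsest near $t = 0.8T$, and half of them satisfy $\tau_i < 0.2T$. With $T = 1000$ and $S = 100$, the first gap is $800/99^2 \approx 0.08$ (so truncation to integers makes the earliest steps repeat) and the last is $800 \cdot 197/99^2 \approx 16.1$.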

But I still have a few questions:

  1. Where does the number $0.8$ come from? Is it empirical?
  2. Since it makes the denoising procedure focus more on timesteps near $t=0$, why isn't it beneficial for other datasets?
  3. Do you think the quadratic skips idea is similar to the cosine beta schedule in Improved Denoising Diffusion Probabilistic Models? Can it be replaced by the cosine schedule?
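To make question 3 concrete, here is a minimal sketch of the cumulative signal level $\bar\alpha_t$ under the DDPM linear beta schedule (assumed: $\beta$ from 1e-4 to 0.02 over T = 1000 steps) and under the Improved DDPM cosine schedule ($\bar\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\big(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\big)$, $s = 0.008$, and $\beta_t$ clipped at 0.999). The function names are hypothetical, and the sketch only compares the two schedules; it does not settle whether one can replace the other.

```python
import math
from typing import List


def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar for t = 1..T (list index t - 1) under a linear beta schedule."""
    out, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def cosine_alpha_bar(T: int = 1000, s: float = 0.008, max_beta: float = 0.999) -> List[float]:
    """alpha_bar for t = 1..T (list index t - 1) under the cosine schedule."""

    def f(t: float) -> float:
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    out, prod = [], 1.0
    for t in range(1, T + 1):
        # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid beta = 1 at t = T.
        beta = min(1.0 - f(t) / f(t - 1), max_beta)
        prod *= 1.0 - beta
        out.append(prod)
    return out


if __name__ == "__main__":
    lin, cos_ = linear_alpha_bar(), cosine_alpha_bar()
    for t in (1, 100, 200, 400, 500, 800, 1000):
        print(f"t={t:4d}  linear={lin[t - 1]:.5f}  cosine={cos_[t - 1]:.5f}")
```

Under these assumptions the cosine schedule keeps $\bar\alpha_t$ much higher through the middle of the chain (roughly 0.49 vs. 0.08 at t = 500). One difference worth keeping in mind: the cosine schedule changes the noise levels the model is trained on, whereas quad skips only change which already-trained timesteps the sampler visits.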

I'd appreciate any further information or thoughts!


3 participants