
Multimodal use of uvcgan2? #30

Open
PerrinAntonin opened this issue Apr 23, 2024 · 3 comments
Labels: question (Further information is requested)

@PerrinAntonin

Hello,

Congratulations on this very successful project! I wanted to ask you: do you think a multimodal use of uvcgan2 is possible, in order to use it like MUNIT, where different images can be generated from a single reference image? In MUNIT, you simply pick a different style vector for each new generation, but in UVCGAN it is the ViT that generates it, and I was wondering how to play with that.

Sincerely,
Antonin

usert5432 self-assigned this Apr 23, 2024
usert5432 added the question label Apr 23, 2024
@usert5432
Collaborator

Hello @PerrinAntonin,

Thank you for your interest in our work.

> I wanted to ask you: do you think a multimodal use of uvcgan2 is possible ...
> ... and I was wondering how to play with that.

I actually thought about this a bit. The short answer is that currently it is not possible.

I think, in principle, one can modify the generator architecture a bit to expose its style to the user. Then, one can implement a custom training setup following the MUNIT or DRIT examples. If done correctly, I believe everything will work and make UVCGAN multimodal. All the modifications are rather straightforward, but they will take some time to implement and debug, and, unfortunately, we do not have the resources to explore them at the moment.
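
To make the idea concrete, here is a minimal sketch of what "exposing the style" could look like, in the spirit of MUNIT/DRIT. This is not the actual uvcgan2 code: `content_encoder`, `decoder`, and `style_dim` are placeholders standing in for the existing pieces.

```python
import torch
from torch import nn

class StyleConditionedGenerator(nn.Module):
    """Illustrative generator whose style input is exposed to the caller.

    `content_encoder` and `decoder` are placeholders for the existing
    UNet/ViT components; this is a sketch, not the uvcgan2 architecture.
    """

    def __init__(self, content_encoder, decoder, style_dim=8):
        super().__init__()
        self.content_encoder = content_encoder
        self.decoder         = decoder
        self.style_dim       = style_dim

    def forward(self, image, style=None):
        content = self.content_encoder(image)

        if style is None:
            # Sample a random style vector, as MUNIT does during training.
            style = torch.randn(
                image.size(0), self.style_dim, device=image.device
            )

        return self.decoder(content, style)
```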

@PerrinAntonin
Author

Hi @usert5432

Thank you for your quick reply!
OK, I can see that it looks implementable, but it needs a bit of time. To avoid this problem, wouldn't it be possible to take the style vector produced for one image at the output of the ViT and to reinject that vector for the reconstruction of another image? But I have the impression that there isn't really a loss tied to this style token, so it will depend mainly on the image supplied the first time and won't work on another.

I also saw that you set different learning rates for the discriminator and the generator. If the generator's rate is smaller, is it because the generator learns too quickly compared to the discriminator?

@usert5432
Collaborator

> To avoid this problem, wouldn't it be possible to take the style vector produced for one image at the output of the ViT and to reinject that vector for the reconstruction of another image? But I have the impression that there isn't really a loss tied to this style token, so it will depend mainly on the image supplied the first time and won't work on another.

I cannot say definitively, since it is more of an empirical question, but my intuition matches yours. Currently, UVCGAN is not trained to work correctly with mismatching styles, so I would expect it to break if some unexpected style is substituted.
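
For anyone who wants to test this empirically, a style-swap experiment might look like the sketch below. The `encode`/`decode` interface is hypothetical; uvcgan2 does not currently expose one.

```python
import torch

@torch.no_grad()
def swap_style(gen, image_a, image_b):
    """Hypothetical style-swap: keep A's content, borrow B's style.

    `gen.encode`/`gen.decode` are assumed methods that split a generator
    into content/style extraction and reconstruction; they do not exist
    in uvcgan2 today.
    """
    content_a, _ = gen.encode(image_a)   # keep A's content
    _, style_b   = gen.encode(image_b)   # borrow B's style

    # Without a training loss tied to the style token, the output may
    # simply ignore, or break on, the mismatched style.
    return gen.decode(content_a, style_b)
```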

> I also saw that you set different learning rates for the discriminator and the generator. If the generator's rate is smaller, is it because the generator learns too quickly compared to the discriminator?

Yes, that is my working hypothesis, although I am not sure it is 100% correct.
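
For reference, setting separate learning rates amounts to building two optimizers, one per network (a two time-scale update rule, TTUR). The modules and values below are placeholders, not the actual uvcgan2 configuration.

```python
import torch
from torch import nn

# Placeholder networks; the real ones are the uvcgan2 generator and
# discriminator.
generator     = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))

# Smaller learning rate for the generator, larger for the discriminator.
opt_gen  = torch.optim.Adam(generator.parameters(),     lr=1.0e-4, betas=(0.5, 0.999))
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=4.0e-4, betas=(0.5, 0.999))
```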
