
Inconsistent values of sum(abs, _) between GPU and CPU (NaNs for zero input only on GPU) #1529

Open

piever opened this issue Oct 3, 2024 · 2 comments

piever (Contributor) commented Oct 3, 2024

Bug description

I've experienced the following inconsistency between GPU and CPU gradient computation for sum(abs, _).

julia> using Zygote, CUDA

julia> rl, cplx = [0.0f0], [0.0f0 + 0.0f0im]
(Float32[0.0], ComplexF32[0.0f0 + 0.0f0im])

julia> l1(x) = sum(abs, x)
l1 (generic function with 1 method)

julia> Zygote.gradient(l1, rl)
(Float32[0.0],)

julia> Zygote.gradient(l1, cplx)
(ComplexF32[0.0f0 + 0.0f0im],)

julia> Zygote.gradient(l1, cu(rl))
(Float32[1.0],)

julia> Zygote.gradient(l1, cu(cplx))
(ComplexF32[NaN32 + NaN32*im],)

The last one is particularly problematic, as it produces NaN values in the gradient that can be hard to track down in a more complex model.

Slack discussion

On Slack, @mcabbott explained to me the most likely cause for this:

  • on the GPU, Zygote rewrites sum(abs, x) as sum(abs.(x)) in the backward pass, and the broadcast is differentiated via ForwardDiff
  • ForwardDiff is responsible for the different gradient values:

julia> abs(ForwardDiff.Dual(0,1))
Dual{Nothing}(0,1)

julia> abs(ForwardDiff.Dual(0,1) + 0im)
Dual{Nothing}(0.0,NaN)

  • even though DiffRules has a rule for abs (used for real inputs), for complex inputs the computation goes through hypot, and the DiffRules rule for the derivative of hypot at (0, 0) gives NaN (see the sketch below)
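
For concreteness, here is the failure mode as I understand it: a minimal sketch, assuming the DiffRules partial derivative of hypot(x, y) with respect to x has the usual x / hypot(x, y) form (dhypot_dx is just a throwaway name):

julia> dhypot_dx(x, y) = x / hypot(x, y)   # assumed form of the DiffRules partial derivative
dhypot_dx (generic function with 1 method)

julia> dhypot_dx(0.0, 0.0)                 # 0/0 at the origin
NaN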

I'm not sure what the best fix is here. If DiffRules is open to it, maybe the easiest option is to fix their hypot derivative rule?
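
For example (only a sketch of one possible convention, not an actual DiffRules patch): guarding the origin and returning zero there would match the CPU gradients above; safe_dhypot_dx is a hypothetical helper name.

julia> safe_dhypot_dx(x, y) = iszero(x) && iszero(y) ? zero(float(x)) : x / hypot(x, y)
safe_dhypot_dx (generic function with 1 method)

julia> safe_dhypot_dx(0.0, 0.0)   # returns 0.0 instead of NaN at the origin
0.0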

Version info

I'm on Julia 1.10.5, in a fresh environment with

(jl_FHvUua) pkg> st
Status `/tmp/jl_FHvUua/Project.toml`
  [052768ef] CUDA v5.5.2
  [e88e6eb3] Zygote v0.6.71
mcabbott (Member) commented Oct 5, 2024

Any chance that JuliaDiff/ForwardDiff.jl#669 solves this?

piever (Contributor, Author) commented Oct 7, 2024

Somehow it doesn't... Unless I messed something up, I checked out JuliaDiff/ForwardDiff.jl#669 (manually changing the ForwardDiff version number to 0.10) and still get

julia> Zygote.gradient(l1, cu(cplx))
(ComplexF32[NaN32 + NaN32*im],)

which is weird, because indeed hypot differentiates just fine:

julia> f(x) = hypot(x, 0, 0)
f (generic function with 1 method)

julia> ForwardDiff.derivative(f, 0.0)
1.0

julia> ForwardDiff.derivative(f, -0.0)
-1.0
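
For what it's worth, abs(z::Complex) goes through the two-argument hypot(real(z), imag(z)) rather than the three-argument method I tested above, so that is probably the path to check. A minimal ForwardDiff-only sketch (g is just a throwaway name; on released ForwardDiff this gives the same NaN as the Dual example in the original post):

julia> g(x) = abs(x + 0im)   # internally calls the two-argument hypot
g (generic function with 1 method)

julia> ForwardDiff.derivative(g, 0.0)
NaN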
