
Remove incorrect push! and pop! gradients #1025

Merged · 9 commits into FluxML:master · Sep 28, 2021
Conversation

@mcabbott (Member) commented on Jul 12, 2021

This is a replacement for #876, which doesn't seem to get the right gradients at all.

It also looks into fixing #992, where similar gradients appear never to be called with nontrivial input.

@mcabbott (Member, Author)
After fiddling a bit, I'm not really sure this can be fixed. Here's the core problem:

julia> using Zygote: @showgrad

julia> gradient([1,2,3], 4) do xs, y
             a = sum(@showgrad(xs))
             b = sum(push!(@showgrad(xs), y))
             c = sum(@showgrad xs)
             a+b+c
           end
∂(xs) = Fill(1, 4)  # c, final size
∂(xs) = Fill(1, 3)  # b, size before push!
∂(xs) = Fill(1, 4)  # a, surprising, a bug?
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 3 and 4")
Stacktrace:
...
  [8] accum(::FillArrays.Fill{Int64, 1, Tuple{Base.OneTo{Int64}}}, ::FillArrays.Fill{Int64, 1, Tuple{Base.OneTo{Int64}}}, ::FillArrays.Fill{Int64, 1, Tuple{Base.OneTo{Int64}}})

Pushing to or popping from xs means that the same variable corresponds to arrays of different sizes at different points in the program, and the gradients being accumulated for it would somehow need to keep track of that. One could hack accum not to throw an error, but that would not make it do the right thing. Whether the deep internals can keep track of this, I'm not quite sure.
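A rough sketch of the accumulation step, assuming array gradients are combined elementwise (which is what the broadcast error above suggests; accum_sketch is an illustrative stand-in, not Zygote's actual accum):

accum_sketch(dx::AbstractArray, dy::AbstractArray) = dx .+ dy   # combine two gradient contributions

accum_sketch(ones(4), ones(4))   # fine: both contributions have the final length
accum_sketch(ones(3), ones(4))   # DimensionMismatch, as in the stacktrace above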

@darsnack (Member)
Yeah, it seems like you'd need element-level tracking to make sure things line up during accumulation, even if we did have some padding hack for the size mismatch.

@mcabbott (Member, Author)
It's possible that if you always wrote xs = push!(xs, y), then Zygote would understand that the label xs is attached to different-length arrays at different points in the code, and accumulate their gradients correctly. Maybe some attempt was made to automate that when trying to allow mutation via @adjoint! etc.? I've never looked that deep into the internals. Maybe @simeonschaub knows things.

But without something like that, we should probably just make these back into errors.
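For what it's worth, a non-mutating rewrite of the example above already works, since vcat is differentiable; a minimal sketch, not part of this PR:

using Zygote

gradient([1.0, 2.0, 3.0], 4.0) do xs, y
    ys = vcat(xs, y)       # build a new, longer array instead of mutating xs with push!
    sum(xs) + sum(ys)      # xs and ys are distinct variables, so there is no size clash
end
# expected: a length-3 gradient for xs and a scalar gradient for y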

@ToucheSir (Member)
Rolling back the clock a bit: is there a way we could tackle the original issue in #37 (comment) without something like #876? As noted in FluxML/Flux.jl#1614 (comment), that would get us back to a known point that is less buggy than what exists now.

@mcabbott (Member, Author)
That does sound like a good path. #37 doesn't have a stacktrace, but I can block out time on my calendar to compile the DiffEqverse and try it sometime soon...

@ToucheSir (Member)
Partially answering my own question now that this is up for review: @ChrisRackauckas's comment at #876 (comment) suggests that permitting push!/pop! for AbstractVector wouldn't have done much for pop!(::BinaryHeap) anyhow. This seems to be corroborated by the DataStructures.jl method as well, which not only calls push! but also performs scalar array mutation in a loop here.

@mcabbott (Member, Author) commented on Sep 5, 2021

OK, then you've got more out of these scattered threads than I did. If the goal is to make certain objects in DataStructures.jl, like BinaryHeap, AD-able, then I think someone should state that directly on an issue there, and we can discuss.

The percolate_up! and heappush! functions you link to don't look like good candidates, for the same reason as push!(::Vector, ...) above, and for the more general reason that allowing f!(x) assumes that no other operation has closed over x in order to re-use it in the backward pass. But maybe something can be attached to a higher-level function. Or maybe the bad things can have errors attached to them, to leave a safe happy path.
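To make that closure point concrete, here is a minimal sketch (sum_abs2_with_pullback and back are illustrative names, not Zygote internals):

# The pullback for sum(abs2, x) closes over x and reads it again on the
# backward pass, so mutating x in between corrupts the gradient.
function sum_abs2_with_pullback(x)
    y = sum(abs2, x)
    back(ȳ) = ȳ .* 2 .* x          # uses x as it is when back is called
    return y, back
end

x = [1.0, 2.0]
y, back = sum_abs2_with_pullback(x)
push!(x, 3.0)                      # some later operation mutates x in place
back(1.0)                          # [2.0, 4.0, 6.0]: length 3, not the correct [2.0, 4.0]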

The code in #37 (comment) leads me (eventually) to this error in all 4 cases, from which I learn nothing about what's desired:

ERROR: MethodError: no method matching fast_materialize(::Vector{Float64})
Closest candidates are:
  fast_materialize(::Base.Broadcast.Broadcasted{S}) where S at /Users/me/.julia/packages/FastBroadcast/WP7Ws/src/FastBroadcast.jl:18
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(ctx::Zygote.Context, f::typeof(FastBroadcast.fast_materialize), args::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface2.jl:9

@mcabbott (Member, Author) commented on Sep 5, 2021

The new CI error on 1.3 is not caused by this PR. It's a bad interaction between complex numbers in the gradient of ^ and Dual numbers from hessian. It's fixed (well, skipped) here: https://github.com/FluxML/Zygote.jl/pull/1044/files#diff-3483c521f73b08fdb6b00f014614cc0d69f87ea7b098a24aff838cbdc812704dR25
So it would be simplest to merge that first.

@mcabbott changed the title from "Fix push! and pop! gradients" to "Remove incorrect push! and pop! gradients" on Sep 10, 2021
@ToucheSir closed this on Sep 28, 2021
@ToucheSir reopened this on Sep 28, 2021
@ToucheSir (Member)
Apparently my attempt to re-run CI was a bust; I guess this needs a rebase?

Rolling things back sucks, but it'll buy us some time to ponder how best to address push!, pop! and co. without more issues like #992 (comment) popping up. So if the tests pass and there aren't any unrelated regressions, this is probably good to go.

Co-authored-by: Brian Chen <[email protected]>
@mcabbott merged commit 4e43922 into FluxML:master on Sep 28, 2021
@mcabbott deleted the pushpop branch on September 28, 2021 at 23:57