
Better handling for nesting Params #823

Closed
wants to merge 9 commits

Conversation

DhairyaLGandhi
Member

This allows differentiating through the Params code so that gradient calls can be nested with Params instead of the usual explicit arguments.

using Flux  # assumed: the example relies on Flux re-exporting Zygote's gradient/pullback

nn = Chain(Dense(3, 3), Dense(3, 1))
ip = rand(Float32, 3, 2)
ps = Flux.params(nn)  # assumed definition; `ps` was implied by the original snippet

gradient(ps) do
  # Inner differentiation over the same Params.
  y, b = pullback(ps) do
    sum(nn(ip))
  end
  _gs = b(y)  # seed the pullback with the primal output
  # Reduce the first-order gradients to a scalar for the outer gradient call.
  sum(x -> sum(_gs[x]), ps)
end

    d[k]
  else
    hk[] = false
    d[k] = default

Member

get should not mutate; maybe you want to define the adjoint for get! instead?
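
For reference, a non-mutating version could route the cotangent without ever writing into d. This is only a shape-level sketch of the idea (Zygote's real dictionary adjoints go through its gradient-accumulation context; the gradient representation here is illustrative):

using Zygote: @adjoint

# Sketch: three-argument get only reads from `d`, so the pullback can send the
# incoming gradient Δ to the stored entry when the key exists, and to the
# `default` argument otherwise. No writes to `d` in either pass.
@adjoint function Base.get(d::AbstractDict, k, default)
  hit = haskey(d, k)
  val = hit ? d[k] : default
  back(Δ) = hit ? (Dict(k => Δ), nothing, nothing) : (nothing, nothing, Δ)
  return val, back
end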

Member Author

We aren't mutating anything in the user-defined objects, so it should be fine. get! would still mutate the gradient dictionary, so it's not any better.

Member

you are defining an adjoint for get(d::AbstractDict, k, default) that mutates d; this is not fine at all

Member Author

Ok, I see what you mean

@DhairyaLGandhi
Member Author

This resolves the conflicts and moves iteration over to the dictionary, which more honestly captures gradients from globals. This can break the current assumption of algebra on Grads, but that's alright since most cases access the params fields directly, and those that don't can be fixed up.

@@ -171,15 +172,15 @@ const ADictOrGrads = Union{AbstractDict, Grads}
 # Dictionary interface.
 # Don't use the IdDict directly since it may contain some spurious pairs.
 Base.haskey(gs::Grads, x) = x ∈ gs.params
-Base.keys(gs::Grads) = gs.params
+# Base.keys(gs::Grads) = gs.params
 Base.values(gs::Grads) = (gs.grads[p] for p in gs.params)

Member

Forwarding keys to gs.grads while relying on gs.params for the values is not good. Either we base the dictionary interface entirely on gs.params or we base it on gs.grads; we can't have mixed stuff.

The comment above

# Don't use the IdDict directly since it may contain some spurious pairs

suggests that we should think changes through thoroughly or leave things as they are (which seems the best option to me)
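
For illustration, the two self-consistent choices would look roughly like this inside Zygote (a sketch, not code from the PR):

using Zygote: Grads  # sketch assumes Zygote's Grads type

# Option A: drive the whole interface from gs.params (the status quo).
Base.keys(gs::Grads)   = gs.params
Base.values(gs::Grads) = (gs.grads[p] for p in gs.params)

# Option B: drive the whole interface from gs.grads (exposes intermediaries).
Base.keys(gs::Grads)   = keys(gs.grads)
Base.values(gs::Grads) = values(gs.grads)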

Member Author

What is the spurious stuff you refer to? It contains references to the objects that had gradients along the way.

Member

I think Carlo is referring to the existing comment at https://github.com/FluxML/Zygote.jl/pull/823/files#diff-7511b224d7f3ebb56465690de8e307422e3c9798a22bdd4e960d5c86ba6528aaR173. My understanding of that is that Base.keys(Grads.grads) may contain items not in Base.keys(Grads.params). This seems like it should never happen, though, so is the comment out of date, or am I missing some scenario where it could happen?

Member

the comment is not outdated:

julia> using Flux

julia> m = Chain(Dense(2,2), x->relu.(x), BatchNorm(2)) 
Chain(Dense(2, 2), #1, BatchNorm(2))

julia> gs = gradient(() -> sum(m(rand(2,2))), Flux.params(m))
Grads(...)

julia> gs.grads
IdDict{Any, Any} with 8 entries:
  Float32[0.0, 0.0]                              => [0.0, 0.0]
  BatchNorm(2)                                   => RefValue{Any}((λ = nothing, β = nothing, γ = nothing, μ = nothing, σ² = nothing, ϵ = 0.0, momentum = nothing, affi…
  Float32[-0.63824 0.222623; -0.785237 0.536415] => [0.0 0.0; 0.0 0.0]
  :(Main.m)                                      => (layers = (nothing, nothing, RefValue{Any}((λ = nothing, β = nothing, γ = nothing, μ = nothing, σ² = nothing, ϵ = …
  Box([0.0; 0.0])                                => RefValue{Any}((contents = nothing,))
  Float32[0.0, 0.0]                              => [2.0, 2.0]
  Box([0.0; 0.0])                                => RefValue{Any}((contents = nothing,))
  Float32[1.0, 1.0]                              => [0.0, 0.0]

Member

we have to keep the current dict interface based on gs.params

Member Author

None of this is spurious, though; there is no prior knowledge of what needs to be tracked at the beginning of differentiation. The grads dictionary returns all the stuff it needed to track, even if those entities weren't present in the params. They may have been indirectly needed to get the grads of the params. What we can guarantee is that the grads dictionary will always have the params as keys. So the defensive thing is to return the entire dict, so that these values for the intermediaries are available to multiple levels of differentiation.
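
In code form, the guarantee being claimed is roughly the following (a hypothetical check, recreating the setup from Carlo's REPL session above):

using Flux  # same setup as the REPL session above
m = Chain(Dense(2, 2), x -> relu.(x), BatchNorm(2))
gs = gradient(() -> sum(m(rand(2, 2))), Flux.params(m))

# Every param is guaranteed an entry in gs.grads...
@assert all(p -> haskey(gs.grads, p), gs.params)
# ...while gs.grads may hold extra entries for intermediaries (globals, Boxes).
@assert length(gs.grads) >= length(gs.params)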

Member

Then per Carlo's point, Base.values(gs::Grads) should forward to .grads as well. Having the two be differently sized is unexpected (i.e. potentially subtly breaking), and arguably breaks the contract of keys + values.
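
The contract at stake, sketched with the gs from the snippet above: consumers assume keys and values zip up pairwise, so differently sized iterators misalign silently.

# Rebuilding a dictionary from keys/values assumes they align pairwise. If
# keys(gs) iterated gs.grads while values(gs) iterated gs.params, this would
# silently pair the wrong entries (hypothetical failure mode).
rebuilt = IdDict(zip(keys(gs), values(gs)))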

Member Author

Yes, I'll add that

@CarloLucibello
Member

This needs some tests of the features it adds, and for #941 if it actually fixes it.

@DhairyaLGandhi
Member Author

#941 may well require extra handling. I'd still prefer to solve the underlying problems first.

# the adjoint jacobian of an FFT with respect to its input is the reverse FFT of the
# gradient of its inputs, but with a different normalization factor
@adjoint function fft(xs)
  return AbstractFFTs.fft(xs), function(Δ)
    return (AbstractFFTs.bfft(Δ),)
  end
end
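
As a quick sanity check of the normalization remark in that comment, the adjoint can be compared against the normalized inverse FFT (a sketch; assumes FFTW is loaded to provide the transforms and that the fft adjoint above is in effect):

using FFTW, Zygote

x = rand(ComplexF64, 8)
y, back = Zygote.pullback(fft, x)
Δ = rand(ComplexF64, length(x))
# bfft is the unnormalized inverse, so bfft(Δ) == length(Δ) .* ifft(Δ)
@assert back(Δ)[1] ≈ bfft(Δ) ≈ length(Δ) .* ifft(Δ)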

@CarloLucibello
Member, Jun 6, 2021

? these empty lines shouldn't be removed

@@ -44,6 +44,27 @@ end
   end
 end

+@adjoint function Base._oidd_nextind(a, i)

Member

do we need to define an adjoint for an internal function of Base?

Member Author

Unfortunately I couldn't see a different way at the time. I'm with you on internal functions. Without it we ended up dropping some grads, which shouldn't happen.
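
For context, the adjoint being discussed is presumably a pass-through, since an iteration index carries no gradient. A sketch, mirroring the diff above rather than any public API (Base._oidd_nextind is internal and undocumented):

using Zygote: @adjoint

# Sketch: stepping an IdDict's internal index is non-differentiable, so the
# pullback returns nothing for both arguments.
@adjoint Base._oidd_nextind(a, i) =
  Base._oidd_nextind(a, i), _ -> (nothing, nothing)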

@DhairyaLGandhi
Member Author

Fixes #941

@CarloLucibello
Member

This is not fixing #941

@ToucheSir
Member

Yeah, the most direct fix for this is to define @adjoint Base.push!(ps::Params, x...), because that's what params calls and what #876 (implicitly/inadvertently?) eliminated.
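
A shape-level sketch of that suggested adjoint (hypothetical; not necessarily how it would finally land):

using Zygote: @adjoint, Params

# Sketch: push! on Params only registers parameters, so differentiation can
# treat it as transparent, returning no gradient for the Params or the values.
@adjoint function Base.push!(ps::Params, x...)
  return push!(ps, x...), Δ -> (nothing, map(_ -> nothing, x)...)
end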

@ToucheSir
Member

Since this was mentioned in #1035 (comment), here's what I'd like to see before a merge:

@DhairyaLGandhi
Member Author

Well, some of this is orthogonal. Handling Params in nested differentiation is more than just handling push!, so I wouldn't block it on push!. push! was needed for construction; this is necessary for nesting and tracking parameters in higher-order differentiation.

@ToucheSir
Member

Coming back to this with more clarity about the whole push! story (i.e. the understanding that this is completely unrelated), I think the only changes left are a non-mutating get adjoint and having Grads forward Base.values to its inner IdDict? Most of the CI failures seem unrelated, so I presume they'll go away after a rebase and we can safely land this.

@CarloLucibello
Member

closing as stale
