fvec #31 (Closed)

Conversation
mcabbott commented on Jan 31, 2022
Comment on lines +28 to +29:
functor(::Type{<:Transpose}, x) = (parent = parent(x),), y -> transpose(only(y))
lazywrap(x::Transpose) = parent(x), transpose
This is #28 and should perhaps be pulled out. From FluxML/Optimisers.jl#42 I think the right pattern is actually more like the one below: calling parent(x) is not safe, and we may want to transpose things which aren't matrix-like.
Functors.functor(::Type{<:Transpose}, x) = (parent = _transpose(x),), y -> _transpose(only(y))

_transpose(x) = transpose(x)                      # ordinary values: wrap lazily
_transpose(x::NamedTuple{(:parent,)}) = x.parent  # "structural" gradient: just unwrap the field
# _transpose(bc::Broadcast.Broadcasted{S}) where S = Broadcast.Broadcasted{S}(map(_transpose, bc.args)) do args...
#     transpose(bc.f(map(transpose, args)...))
# end
_transpose(bc::Broadcast.Broadcasted) = transpose(Broadcast.materialize(bc))  # lazy broadcast: materialize, then transpose
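A quick sketch of how these three methods dispatch, assuming the definitions above are in scope along with using Functors, LinearAlgebra; the inputs here are only placeholders:

_transpose(rand(3, 2))                               # plain array: wrapped lazily in a Transpose
_transpose((parent = rand(2, 3),))                   # structural gradient: just unwraps the field
_transpose(Broadcast.broadcasted(+, rand(3, 2), 1))  # lazy broadcast: materialized, then transposed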
This is an attempt to write a function which does what Flux.destructure does, but more carefully. It allows for missing branches of the structural gradient, and for shared weights having separate gradients, which ought to accumulate when making a vector (see the sketch below). When returning to a structural gradient, only the first node gets the gradient, to avoid mistakenly doubling it.

It also declares some Base types like Transpose to be functors, since a decoder with transposed weights seems like the canonical example, and it sets up a way to allow their gradients not to be Transpose.
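A minimal sketch of the accumulation point, assuming Zygote for the gradients; the "model" here is just a NamedTuple holding the same array twice:

using Zygote

w = rand(3)
m = (enc = w, dec = w)   # one array, two tied slots

g = gradient(m -> sum(m.enc) + 2 * sum(m.dec), m)[1]
# g.enc and g.dec are separate structural gradients for a single parameter;
# flattening to a vector must add them, giving fill(3.0, 3) here.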
And, thirdly, it changes fmap to not use the cache on isbits objects. This is to avoid falsely tying two parameters which are initially, say, SA[0, 0, 0]. We thought elsewhere that the way to indicate that you do want such things tied is to wrap them in some TiedArray type like https://gist.github.com/mcabbott/35f660ff9fb8e7b0b23a2abb94618cf4, but not in this PR.
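A minimal illustration of the false tying, assuming StaticArrays: for isbits values objectid is content-based, so an identity-keyed cache cannot tell "the same parameter" from "two parameters that happen to be equal".

using StaticArrays

a = SA[0, 0, 0]
b = SA[0, 0, 0]              # an independent parameter with the same value
a === b                      # true: isbits values compare by content
objectid(a) == objectid(b)   # true: so an IdDict-keyed cache would tie a and b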
Most of this can simply run with a different walk, to hit only the trainable arrays, which I think ought to be done in Optimisers.jl, since it owns trainable. The exception is that the gradient of fcopy doesn't seem to fit into just using a different walk (maybe there's a way?); it takes a different children function instead.

Not sure what should be exported here, or what anything should be called. I think the name destructure should belong to Optimisers.jl too. Perhaps this package should export a function which similarly returns a vector and a reconstructor; that's just one line given the pieces defined here. (But it's easier to write & test the pieces separately.)

Needs more tests, still.
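For reference, the calling pattern such an exported function would share with Flux.destructure, whose two return values are a flat parameter vector and a reconstructor:

using Flux

model = Dense(2, 3)
v, re = Flux.destructure(model)   # flat vector of all parameters + reconstructor
model2 = re(v .+ 0.1)             # rebuild the same structure from a modified vector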