RFC: Stateful Containers with tf.Module #56
Conversation
**`tf.Variable` creation and reuse**

- Modules should be able to create and reuse variables.
What do you mean by reuse here?
My understanding is that modules only create variables, and reuse is done by reusing modules, right?
Your understanding is correct. Variable creation (`self.w = tf.Variable(..)`) and reuse (`tf.whatever(some_input, self.w)`) is enabled via OOP (no magic).
We considered re-implementing `variable_scope` and `get_variable` but felt that the OOP design was a lot cleaner and more Pythonic.
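To make the OOP pattern concrete, here is a minimal sketch (the `Linear` class and shapes are illustrative, not part of the RFC): a variable is created once as an attribute, then simply referenced on every call.

```python
import tensorflow as tf

class Linear(tf.Module):
  def __init__(self, input_features, output_features, name=None):
    super(Linear, self).__init__(name=name)
    # Creation: plain attribute assignment, no variable_scope/get_variable.
    self.w = tf.Variable(
        tf.random.normal([input_features, output_features]), name='w')

  def __call__(self, x):
    # Reuse: just reference the attribute on every call.
    return tf.matmul(x, self.w)

layer = Linear(3, 2)
y1 = layer(tf.ones([1, 3]))  # Creates no new variables.
y2 = layer(tf.ones([4, 3]))  # Reuses the same layer.w.
```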
```python
def apply_gradients(self, vars_and_grads):
  for variable, grad in vars_and_grads:
    variable.assign_sub(grad * self.learning_rate)
```
I'd like to comment here on the following points:
- Module should be the base class for tf.keras.Layer
- We explicitly chose not to force a call signature or build() strategy
- Pros and cons of the name Module (familiarity with pytorch being a big issue here)
Agreed re (1), and re (2) I call out why a bit in the detailed design if anyone wants more context.
Re (3) I feel strongly that `Module` is the right name and what users will expect. The term `Module` and its usage in ML is almost as old as I am; the oldest reference I know about (h/t @jekbradbury for the reference) is from SN/Lush [0]:
In Lush, building and training a complex system is done by assembling basic blocks, called modules. Modules are subclasses of the class gb-module.
[0] http://lush.sourceforge.net/lush-manual/51d13a16626fd68f.html
@tomhennigan, that is awesome! It is very similar to what we have in GPflow. The RFC could also be extended with array-like modules, but don't confuse these with a Keras sequence: a Keras sequence implies dependency between layers/modules, whereas an array-like module is a different approach to organising submodules, with indexing restricted to integers. E.g. in GPflow we can construct a sum of kernels (modules) and represent it as a super-kernel class or a combination kernel. Essentially, a combination kernel is a list of kernels, and when we call, let's say, the `__call__` method, it sums over the sub-kernels' results. I'm happy to give more examples if needed. Thanks!
'custom_name/v:0'

Modules created inside methods of other modules inherit their parent module
How would variable scopes be affected in graph mode? For example, let's say we have a module named 'bar' that creates a tf.Variable 'v' and a tf.Tensor 't' inside `__call__`. Then we call our module while inside a 'foo' name scope. What would the names be in that case? Maybe 'bar/v' for the tf.Variable and 'foo/bar/t' for the tf.Tensor?
Whilst the design works in TF1 we have designed with TF2 in mind. In TF2 there are no variable scopes (we use name scopes to name variables and tensors). In eager mode `tf.Tensor` does not have a `name` property, although if you inspect the graph from a `@tf.function` you do see tensor names.
Module inherits the `name_scope` from when the module is constructed. I've run the following with TF1, and as you can see the 'use' `name_scope` makes no difference inside the module but is applied when using the result of calling the module:
class BarModule(tf.Module):
  def __call__(self):
    v = tf.Variable(1., name='v')
    t = tf.constant(1., name='t')
    return v, t

with tf.Graph().as_default():  # Enable TF1 graph mode.
  with tf.name_scope('create'):
    bar = BarModule()
  with tf.name_scope('use'):
    for i in range(2):
      print('iter', i)
      v, t = bar()
      x = v * t
      print('- v', v.name)
      print('- t', t.name)
      print('- x', x.name)
      print()

iter 0
- v create/bar_module/v:0
- t create/bar_module/t:0
- x use/mul:0

iter 1
- v create/bar_module/v_1:0
- t create/bar_module/t_1:0
- x use/mul_1:0
self.v = tf.Variable(1., name='v')

>>> m = MyModule()
>>> m.name
What happens if the module name ends in '/'? Usually that is used to set absolute name scopes ignoring the current stack. Similarly, what happens if a child module does it?
In the experimental implementation of this I've enforced (in the constructor) that the module name is a valid Python 2 identifier [0], so `/` is not allowed.
[0] https://docs.python.org/2/reference/lexical_analysis.html#identifiers
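A small sketch of what that constraint looks like in practice, assuming the constructor rejects invalid names with a ValueError as in the experimental implementation (the exact error type and message may differ):

```python
import tensorflow as tf

m = tf.Module(name='custom_name')  # A valid Python identifier is accepted.
print(m.name)  # 'custom_name'

try:
  tf.Module(name='foo/bar')  # '/' is not allowed in a module name.
except ValueError as e:
  print('rejected:', e)
```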
Thank you very much for this proposal! We have a remarkably similar class in our custom framework, where we use it as the base for all our own layers and composable networks. It makes me happy to see something like this being added to TF 2.0. Given the similarity, I'll try to keep an eye on the review in case I notice anything that reminds me of any problems we had developing or using our own approach, but so far I'd say it looks pretty good.
I was under the impression that inheriting from […]
If this RFC is accepted @alextp and I hope that we can make `tf.keras.Layer` a subclass of `tf.Module`. Conceptually I see Keras layers as modules with additional functionality. If the additional functionality of Keras is not useful for you then you may prefer to subclass `tf.Module` directly.
Serialization (as in saving/restoring parameters) should work great (since we extend the checkpointable base class).
I think this should "just work". I tried creating a simple example, please let me know if I've completely misunderstood what you're trying to achieve:

class MulKernel(tf.Module):
  def __init__(self, m, dtype=tf.float32, name=None):
    super(MulKernel, self).__init__(name=name)
    self.m = tf.Variable(tf.constant(m, dtype=dtype), name='m')

  def __call__(self, x):
    return self.m * x

class ArrayModule(tf.Module):
  def __init__(self, kernels):
    super(ArrayModule, self).__init__()
    self.kernels = kernels

  def __call__(self, x):
    x = tf.convert_to_tensor(x, name='x')
    results = [kernel(x) for kernel in self.kernels]
    return tf.reduce_sum(results)

a = ArrayModule([MulKernel(i, name='mul_%d' % i) for i in range(10)])
y = a(1.)

Which produces this rather pretty graph:
rfcs/20190117-tf-module.md

@property
def variables(self):
  """Returns a tuple of variables owned by this module and its submodules.
I think `list` is a better fit conceptually, since the type in question is [Variable] rather than (Variable, Variable, ...). Maybe we could simply remove the type from the docstring or replace it with something less restrictive, e.g. "collection"?
The same applies to the other `*variables` properties below.
Thanks for the IRL discussion, I'll use "Collection" in the docstring and still return a tuple at least for now.
In general, returning a tuple makes the API a bit more efficient since you don't have to return a copy to avoid callers mutating your internal data structures.
We agree returning a tuple is the right decision, but the signature/contract of the method should be less specific to allow this to change in future (e.g. if there is a faster immutable collection we might want to use that instead).
If we could include type hints in TF code I would go with the following:

@property
def variables(self) -> typing.Collection[tf.Variable]:
  ...
Perhaps `typing.Sequence` would be better here? Importantly, the returned variables have a well-defined, deterministic ordering.
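If `Sequence` were preferred, only the annotation in the sketch above would change (again assuming type hints were allowed; this mirrors the hypothetical snippet, it is not the actual API):

```python
import typing
import tensorflow as tf

class MyModule(tf.Module):
  @property
  def variables(self) -> typing.Sequence[tf.Variable]:
    # A tuple satisfies Sequence (ordered, indexable, immutable) while still
    # leaving room to swap in another immutable ordered collection later.
    ...
```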
your base class.
- Encouraging a lazy pattern might make it harder to create modules with multiple forward methods (e.g. a VAE module might be implemented with an `encode` and `decode` method with no `__call__`).
Would the users have to override `__call__` with `NotImplementedError` in that case?
We prototyped it such that you didn't have to implement `__call__` (e.g. `_call` was not an `@abstractmethod`), but if users called your module then we threw `NotImplementedError`. Even this has downsides though, like introspection/generated docs making it look like you had a call method.
@tomhennigan, I agree 100% with this decision. There is no need to bloat the `tf.Module` class more than is absolutely required. Since the goal of this class is to handle stateful DAGs of modules, it is best to focus efforts on that.
As pointed out in this doc, there are many use cases where we would want to handle stateful DAGs but don't require a `__call__` method. I see it as the responsibility of other libraries/sub-packages to extend the functionality of `tf.Module`. I find specialized classes much easier to extend and reason about.
each leaf. This means variables nested in properties will be found (e.g. variables in `list`s or `tuple`s on the module).

This "walk" function is used to provide the following properties:
Are these properties computed once per instance, or are they fully dynamic?
Fully dynamic, trying to reason about when to invalidate the caches would be non-trivial.
I'm thinking the performance cost of having this dynamic should be well documented so that users know to cache the results of these calls if using eager and good performance is required. Though, if everything works seamlessly with `tf.function`, then maybe this wouldn't be necessary and encouraging users to use `tf.function` for better performance would be sufficient.
Agreed re documentation, I'll add a note in the docstrings.
For a 100 linear layer sequential it takes 2ms to collect all 200 parameters (at least on my workstation). This could be worse for modules with deep hierarchies or large nests.
Assuming the call to `trainable_variables` (or whatever) is inside the `tf.function`, this cost is only paid when the function is traced (~once).
The finest abstraction is not actually a deep learning model (is it correct that Keras is still a deep-learning-dedicated framework?), and not a model at all. The […]
I'd say, making Keras layers depend on […]
That's great!
If […]
We've been mindful to design […]
Yep, this works as you would expect (we use […]).
Or Iterable? We could adjust the docstring now.
Come 2020, we will drop Py2 support, and all this typing can actually happen for real.
Can't wait 😄 Count me in on a fixit to go through and add types!
Addresses comments from tensorflow/community#56. PiperOrigin-RevId: 230225963
+1 for this proposal, it strikes a great balance between abstraction and usability.
**`tf.Module` tracking**

- Modules should track dependencies (e.g. a sequential should provide access
Something that started on Twitter: https://twitter.com/lc0d3r/status/1090351739537145858
Since we are also trying to track dependencies (e.g. a sequential should provide access to its layers) and support checkpointing as a graph, would it make sense to think about the next step and have an approachable way to access the DAG at the model level?
The idea is that once you inherit from tf.Module you can run something similar to Keras' model.summary and, even more exciting, model.plot_model.
Sorry for the somewhat confusing way of formulating this suggestion.
Hey @lc0, thanks for the suggestion! We've been thinking about something similar. I'm not sure if we should solve this inside tf.Module or if this is something we should leave to framework authors, what do you think?
FWIW the current module APIs expose a lot of useful information to do a modest job of this outside of Module:
import tabulate

def weight_size(m):
  size = 0
  for v in m.variables:
    size += v.numpy().nbytes
  suffix = 'B'
  suffixes = ['KB', 'MB', 'GB', 'TB']
  while size > 1024 and suffixes:
    size /= 1024
    suffix = suffixes.pop(0)
  return '%d %s' % (size, suffix)

def summarise(module):
  rows = []
  for m in module.submodules:
    row = [m.name, m.__class__.__name__, weight_size(m)]
    rows.append(row)
  print(tabulate.tabulate(rows, headers=['attr', 'class', 'size'], tablefmt='grid'))
+----------+---------+--------+
| attr | class | size |
+==========+=========+========+
| linear_0 | Linear | 8 KB |
+----------+---------+--------+
| linear_1 | Linear | 4 MB |
+----------+---------+--------+
| linear_2 | Linear | 4 MB |
+----------+---------+--------+
| linear_3 | Linear | 40 KB |
+----------+---------+--------+
The table seems excessive. Here is something with less chrome:

attr      class    size
-------------------------
linear_0  Linear   8 KB
linear_1  Linear   4 MB
linear_2  Linear   4 MB
linear_3  Linear   40 KB

Alternatively, you can use the PyTorch pattern, which IMO is cleaner:

YourFantasticModel: (
  (linear_0) Linear(100, 20, 100) 8 KB
  (linear_1) Linear(20, 40, 100) 3.2 KB
  (linear_2) Linear(40, 20, 100) 3.2 KB
  (linear_3) Linear(20, 20, 100) 1.6 KB
)
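For what it's worth, the PyTorch-style output could also be produced from the same public APIs as the table above. A rough sketch (the helper name is made up, and because `submodules` is recursive the nesting is flattened here):

```python
def torch_style_repr(module, indent='  '):
  lines = ['%s: (' % type(module).__name__]
  for m in module.submodules:  # Recursive walk, so nesting is flattened in this sketch.
    size_kb = sum(v.numpy().nbytes for v in m.variables) / 1024.0
    lines.append('%s(%s) %s %.1f KB' % (indent, m.name, type(m).__name__, size_kb))
  lines.append(')')
  return '\n'.join(lines)
```

Calling `torch_style_repr(model)` on any `tf.Module` instance would then print one line per submodule.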
>>> m.name_scope.name
'my_module/'
>>> m.v.name
'my_module/v:0'
Are the variable names (with scope) attached upon variable creation? An alternative might be the PyTorch way, where a variable does not know about its (parent's) scope, and the full scoped name is only constructed on the fly depending on which parent module we are dealing with (because a module can be re-used in different parent modules, and modules might even be constructed externally and passed in via the constructor of a custom module).
I'm thinking about the following pseudocode example:
class FunnyAutoEncoder(tf.Module):
  def __init__(self, encoder, decoder, name=None):
    super(...)
    self.encoder = encoder
    self.decoder = decoder

  def __call__(self, x):
    z = self.encoder(x)
    z2 = tf.funny_op(z)
    return self.decoder(z2)

encoder = MLP(...)
decoder = MLP(...)
ae = FunnyAutoEncoder(encoder, decoder)
then `ae.encoder.layerX.w.name` will probably not show anything related to being under an auto encoder? Instead, it will just have a top-level 'mlp/' scope?
I think a major scenario for having full scoped names available is to iterate through all the variables in a module and do some inspection/processing of each variable according to its name. In this case, it might be more useful to have the (scoped) names reflect what role each variable is playing in the module.
However, I guess one drawback might be detecting potential duplication. For example, if one does this

encoder = ResidualBlock(...)
decoder = encoder
ae = FunnyAutoEncoder(encoder, decoder)

then the same variables might be listed twice under different (scoped) names. But I guess this situation is rather rare, and it is probably better to explicitly define a `FunnyAutoEncoderSharedEncoderDecoder` that has only one `encoder_and_decoder` field.
So currently the proposal is that name scopes are determined when the module is constructed, and from that point the scope is fixed per module. For name-based filtering you can do something like this:
encoder = MLP(..., name='encoder')
decoder = MLP(..., name='decoder')
ae = FunnyAutoEncoder(encoder, decoder)
encoder_vars = encoder.variables
# .. or ..
encoder_vars = [v for v in ae.variables if v.name.startswith('encoder/')]
It's worth pointing out that you can also enter a name scope and build modules inside it if you want to group a set of modules that way:
with tf.name_scope('encoder'):
encoder = MLP(...) # name_scope will be `encoder/mlp/`
decoder = MLP(..., name='decoder') # name_scope := `decoder/`
ae = FunnyAutoEncoder(encoder, decoder) # name_scope := `funny_auto_encoder/`
Is that flexible enough for your use cases? We've had a similar pattern in Sonnet and I've not heard any complaints from users about it, but perhaps there's something I'm missing!
@tomhennigan Thanks! I'm not sure if it is because I'm too used to the other way of thinking. Let me state the use cases in my mind and we can see if they are easy to do in the current proposal. I think the main inflexibility of the current approach is that in order to have a meaningful name scope hierarchy, we can only construct the model top-down according to the hierarchy. It will be difficult, for example, to construct some part of the component (e.g. the `encoder` from the example above) without knowing beforehand what that component is going to be used for (i.e. in your example, you explicitly pass the `name='encoder'` argument, indicating you know that it is going to be used as an encoder).
This problem might be more pronounced when one wants to write a generic library of commonly used (sub)nets. For example, in library1 (say Sonnet 2), I have a `Module` that defines a default ResNet implementation. In library2, which is a general RL training framework, one can pass in a `Module` to be used for each specific component (e.g. the vision part of the agent). Now as the user, if I want to use the ResNet in the vision part of the agent, I might need to use some hack to figure out what proper `name` to pass in when constructing the ResNet to make it consistent with the naming hierarchy in the agent model. Moreover, if the agent is grouping all its vision code under the name scope `agent/vision`, then I cannot pass `agent/vision` as the `name=` parameter to `ResNet` because it is not a proper Python variable name.
My guess as to why this is not a problem in Sonnet (v1) is that Sonnet (v1) is still a graph-based library with an explicit `build()` function. In other words, when you create the Python object, the underlying graph building (and therefore the naming and scoping) has not been called yet. Therefore, one can still change the name scoping when the build is actually called. Here is some pseudo-code of my understanding of what might happen in Sonnet V1:
class Agent(...):
  def __init__(self, vision_net, ...):
    self.vision_net = vision_net
    # ....

  def build(self, ...):
    with tf.name_scope(self.name + '/vision'):
      self.vision_net.build(...)
On the other hand, with the current proposal, users could (and it seems are recommended to) create variables in the Python object's `__init__` function, which will attach a name to the variable upon creation. Therefore, one needs to know the exact name of the parent hierarchy upon creating that Python object. Once the `Module` object is created, the parent module can no longer re-wire its name scope as in the case of Sonnet V1. Is my understanding correct here?
I agree with @pluskid that dependency injection should at least be possible. My preference is for DI to be the recommended way to build up modules, but I'd understand if others disagreed.
Thanks for the IRL discussions @pluskid, @brilee, @superbobry and @chr1sj0nes.
To summarize our discussions: there are many cases where there are multiple paths to weights in a model (e.g. `model.encoder.w` might be the same instance as `model.decoder.w`) and this is somewhat confusing, since in TensorFlow variables have a single, immutable name (e.g. `w`'s name would be `encoder/w:0` in both cases if the encoder created it).
Perhaps the "right" solution is to remove the name from variables in Python, and instead use Python variable names to "name" variables. Sadly this is not in scope for TensorFlow 2, and we agreed that the current strategy of naming variables at creation time is OK for simple cases and better than not naming.
We discussed that the Python variables or object structure can be used instead to associate a path with each parameter, and in the above case you would have two paths to `w`. We agreed that it would be convenient to have some easy way to traverse the structure of arbitrary `Module`s using some simple protocol (e.g. somewhere inside the model there is a property called `encoder` and somewhere inside the model there is a property called `decoder`) to filter these paths.
In tensorflow/tensorflow@1d5fd9219 I added `with_path` to Module's `_flatten` method. This allows you to do exactly that. For example, you can find paths in some arbitrary model to all `tf.Variable`s inside a (possibly deeply nested) `Module` in a property called `encoder`:
>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))
>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder
>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}
>>> {k: v for k, v in mod.state_dict.items() if 'encoder' in k}
{('encoder', 'w'): <tf.Variable: ...>}
I hope that helps 😄. h/t @gehring who was asking about how to determine "depth" in another comment. I think this `with_path` option to `_flatten` (or the `recursive` flag) would help solve that too.
rfcs/20190117-tf-module.md

...   self.b = tf.Variable(tf.zeros([output_features]), name='b')
...
...   def __call__(self, x):
...     x = tf.convert_to_tensor(x, name='x')
I feel it is important to avoid having users call `tf.convert_to_tensor` every time they define their own `__call__` function. One might want to define an implicit pre-call hook to do the conversion automatically. Or a better approach might be to explicitly require the API to only accept `Tensor` (or nested tuples of tensors) as inputs, and ask users to do explicit conversion before calling from the top level.
This is because many of the modules are probably going to be used as building blocks in bigger architectures (e.g. residual blocks inside a ResNet). They will not directly interact with the user most of the time, so there is probably no need for them to call `convert_to_tensor` every time, but it might be better to enforce that the API has consistent behavior.
While I agree with you that it would be good for any callable subclass of `tf.Module` to have some automatic conversion to tensor (e.g., in an implementation of Layer 2.0), this is beyond the scope of `tf.Module`. The `Dense` class example only serves to demonstrate how one would subclass `tf.Module`. Let me try to lay out what I understand the purpose of `tf.Module` to be and what the reason for its limited scope is. (I am not affiliated with the TF team.)
After the shift to object-oriented storing of `tf.Variable`s, variable reuse is done by storing variables within object attributes. It would be impractical and clunky to store all variables in a unique container object (which would essentially recreate TF 1.0's collections). To avoid this, we prefer to have variables contained within the objects that use them, e.g., `kernel` and `bias` variables as attributes of the `Dense` instance that uses them. While this new way of organizing variables has many advantages, it does make checkpointing and operating on the "global" variables more difficult (e.g., collecting variables for gradient descent).
By design, `tf.Module` is lightweight and focuses on addressing the challenges of managing nested stateful objects. Its sole purpose is to abstract away the DAG of modules for the purpose of operating on all relevant variables. This allows us to act as if our top-level/root module explicitly contains all our variables. This is done by providing traversal methods which collect all the variables. As a bonus, `tf.Module` automatically handles the complexities of checkpointing when variables are allowed to be created before and after the checkpoint is loaded, e.g., hydration and instantiation of variables.
By limiting `tf.Module`'s responsibilities to managing nested stateful modules, we end up with code that is easy to understand, document and extend. As a result, any developer wishing to implement stateful containers can easily do so, without intricate knowledge of how best to manage "global" state, by subclassing `tf.Module`. Since handling stateful DAGs of modules is a very common requirement, `tf.Module` could easily be used by most sub-packages and libraries with stateful operations, e.g., `tf.train`, `tf.metrics`, `tf.estimator`. (Note, this specific proposal is arguing for the existence/inclusion of `tf.Module` in TF 2.0. The advantages of using `tf.Module` as a base class throughout TensorFlow will likely be examined on a case-by-case basis in a separate proposal.)
@gehring Yes, I agree with you. I don't think doing automatic conversion is a good idea. But enforcing that the inputs are Tensors is probably good (even if only implicitly, as a statement in the documentation). Anyhow, this is a minor thing in the whole proposal.
+1 to what @gehring says. The decision of whether to call `convert_to_tensor` is entirely up to whoever subclasses `tf.Module`, there is no requirement to do so 😄
I'm strongly against auto-conversion to tensors and agree with @tomhennigan; it is up to the scientist, engineer, or anyone else what to do with non-tensor types.
I've made this "private" since it exposes the implementation of modules, which in typical Python code does not represent part of the class's public API. As such I think having the leading _ makes it clear (at least to the linter) that you're using an "internal" method. As always this is a "sign" not a cop.
As a concrete example, I strongly encourage the following usage:
>>> class FooModule(...):
...   @property
...   def hash_tables(self):
...     return tuple(self._flatten(predicate=IS_HASH_TABLE))
But in general we should discourage usage outside of modules:
>>> m = FooModule()
>>> hash_tables = tuple(m._flatten(predicate=IS_HASH_TABLE))  # Naughty.
c.f. tensorflow/community#56
PiperOrigin-RevId: 232246092
Love the proposal.
- Each module instance must have a name scope. This name should include the name scope from when the module is constructed.
- Every method on the module must enter this name scope before calling user
Name scopes aren't reentrant; do you have a plan for how we can avoid creating ops with confusing names like 'mymodule/mymodule/mymodule/relu_2'?
In eager mode op names are not a thing, so that's good.
For `tf.function`, names are uniquified within the function body, so you would only get an _N suffix if there were actually two relus within the same module (or you called your module multiple times sequentially).
As a nit, scope names are re-entrant (as in you can do `with tf.name_scope("foo/"): ...` as many times as you like) but op names need to be unique.
Example: https://gist.github.com/tomhennigan/a183a6570d2a6fb9bce08758ee2d78b6
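A small sketch of the re-entrancy point (run in graph mode so op names are observable; the names in the comments are the expected results based on the behaviour described above):

```python
import tensorflow as tf

with tf.Graph().as_default():  # Graph mode, so ops get inspectable names.
  with tf.name_scope('foo/'):  # Trailing '/' re-enters this exact scope.
    a = tf.constant(1., name='a')
  with tf.name_scope('foo/'):  # Entering again does not create 'foo_1'.
    b = tf.constant(1., name='b')
  print(a.name)  # foo/a:0
  print(b.name)  # foo/b:0 -- same scope; only op names must be unique.
```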
Thanks for the correction. I see now that this works, although I'm still a bit confused about where the _1 suffix comes from.

@tf.function
def f(x):
  name_scope = tf.name_scope('blah')
  with name_scope:
    with name_scope:
      return x + 1

g = f.get_concrete_function(tf.TensorSpec([1, 64], tf.float32)).graph
for node in g.as_graph_def().node:
  print(node.name)

...
x
blah_1/add/y
blah_1/add
Identity
In this case you're creating the name scope in graph mode (inside the function), and in graph mode name scopes are de-duped. I guess the function is being traced twice (hence 'blah' followed by 'blah_1'). If you use a fully qualified name scope (e.g. `tf.name_scope('blah/')`) then the add op will be de-duped, not the name scope.
With `tf.Module` we use the module name to create a unique scope in the constructor [0] and then re-enter the fully qualified name whenever you use `with module.name_scope: ...` [1].
This is an interesting point though: in eager mode name scopes are not de-duped but in graph mode they are. We have a test explicitly defending this behavior [2], and I expect in TF2 almost all users to create modules outside a `tf.function` (aka eagerly) and to call them inside `tf.function`s.
[0] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L176-L177
[1] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module.py#L194-L198
[2] https://github.com/tensorflow/tensorflow/blob/e0585bc351b19da39610cc20f6d7622b439dca4d/tensorflow/python/module/module_test.py#L92-L109
Addresses feedback on tensorflow/community#56. PiperOrigin-RevId: 232457135
Thanks brilee@ for spotting!
This wasn't ever required and including it might cause confusion. h/t pluskid@ gehring@ and awav@
yield variables inside nested structures (lists etc) but not in other
modules.
"""

@property
def submodules(self):
@tomhennigan Is this done depth first or breadth first? I feel like it could be a good thing for `tf.Module` to offer a way to infer the depth of submodules/variables. Without `owned_*`, what would be the best way for a user to do this without re-implementing `tf.Module`'s logic?
Breadth first.
So there are two options (frustratingly some recent commits haven't been sync'd to GitHub so I can't just point you at the code!).
Firstly, we've made it really easy for subclasses to implement this if they want by exposing `_flatten`:

class MyModule(tf.Module):
  @property
  def owned_submodules(self):
    return tuple(self._flatten(recursive=False, predicate=lambda v: isinstance(v, tf.Module)))

I also discussed with brilee@ and pluskid@ earlier about their feedback on variable names. We were aligned that what they both wanted was not really `Variable.name`, but rather the path to all variables (e.g. allowing the same variable instance to be reachable via multiple paths). We're planning to add an option to `_flatten` to allow flattening with paths. So for example in the case where you have:

encoder = Encoder()
decoder = encoder
auto = Auto(encoder, decoder)

You can implement `variables_and_paths` in one line with `self._flatten(..., with_path=True)`:

>>> auto.variables_and_paths()
{"encoder.layers[0].w": <tf.Variable ...>,
 "encoder.layers[1].w": <tf.Variable ...>,
 "decoder.layers[0].w": <tf.Variable ...>,
 "decoder.layers[1].w": <tf.Variable ...>}

It seems to me like this is another way of representing depth. WDYT?
Actually the CL has been pushed 😄, `_flatten` is in tensorflow/tensorflow@5076adf6
I didn't have a particular use case in mind but I believe it is likely someone will require that type of info (if not simply for human readability). The `with_path` solution is lightweight and should pretty much cover any use cases related to the topology of the variables/sub-modules. I'm 100% on board, nice job!
FYI `with_path` is in tensorflow/tensorflow@1d5fd921 😄
```
>>> class MyModule(tf.Module):
...   @property
...   def state_dict(self):
...     return dict(self._flatten(
...         predicate=lambda v: isinstance(v, tf.Variable), with_path=True))
>>> mod = MyModule()
>>> mod.encoder = Encoder()
>>> mod.decoder = mod.encoder
>>> mod.state_dict
{('encoder', 'w'): <tf.Variable: ...>,
 ('decoder', 'w'): <tf.Variable: ...>}
```
h/t tensorflow/community#56
PiperOrigin-RevId: 232908045
An issue with decorating methods in the metaclass is that when users annotate methods with decorators that return objects (e.g. `tf.function`) they lose the ability to access methods on those objects (other than call) because our `with_name_scope` decorator is "on the outside".
A simple answer to this from other decorators is to copy the __dict__ from the wrapped object to the wrapper (or use `__getattr__`). However in our case this is not safe, since properties like `get_concrete_function` on `Function` would need special casing (so the returned callable also entered the module name scope before tracing).
After a bunch of discussion (thanks superbobry@ and chr1sj0nes@!) about how we could do this in horrible ways, it seems that the cleanest solution is to allow a Function to be rewrapped (iff it has not been traced). This pushes the decorator down the stack of decorators and allows tf.Module to preserve @tf.function annotated methods as Functions.
h/t tensorflow/community#56
PiperOrigin-RevId: 233034181
See tensorflow/tensorflow@5076adf6 for more context.
Thanks k-w-w@!
Returns:
  A collection of variables for the current module (sorted by attribute name)
  followed by variables from all submodules recursively (depth first).
"""
I'm a bit late to the party, but I don't see this in the discussion anywhere:
Keras layers and models implement a `trainable` flag as well. When it is set to False, none of the variables in the module or its submodules are returned by trainable_variables.
The current implementation of Module does not recognize a module-level trainable flag; it's not clear if that's intentional or an omission. Looking at the implementation... it would require a separate predicate for recursion.
This seems like a minor change that could reduce surprises and increase compatibility, especially with the plan to make `layer` and `model` sub-classes of Module.
WDYT?
This is intentional. Module uses the immutable trainable bit on `tf.Variable` to determine which variables are trainable. If you want to implement layer freezing like Keras does for a `Module` subclass you can do:
class KerasLike(tf.Module):
  def __init__(self, trainable=True, name=None):
    super(KerasLike, self).__init__(name=name)
    self.trainable = trainable

  @property
  def trainable_variables(self):
    if not self.trainable:
      return ()
    return super(KerasLike, self).trainable_variables
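A short usage sketch of the pattern above (the `w` attribute here is just for illustration):

```python
frozen = KerasLike(trainable=False)
frozen.w = tf.Variable(1., name='w')

assert frozen.trainable_variables == ()  # Frozen: nothing for an optimizer to touch.
assert len(frozen.variables) == 1        # Still tracked for checkpointing etc.
```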
Great. Thanks for the clarifications.
After attempting to integrate `tf.Module` into existing codebases (e.g. `tf.keras`) we've found that the automatic name scoping is too invasive (e.g. changing op and variable names) and it is desirable to disable it ~everywhere. We propose that name scoping for `tf.Module` becomes opt-in:

>>> class MyModule(tf.Module):
...
...   @tf.Module.with_name_scope
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def manual_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       with self.name_scope:
...         self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

We will move opt-out name scoping into Sonnet:

>>> class MyModule(snt.Module):
...
...   def auto_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w
...
...   @snt.no_name_scope
...   def no_name_scope(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(1., name='w')
...     return x * self.w

In TF2 name scopes are cosmetic and this should be less of a big deal. We might consider encouraging users who want to filter on names to instead use flatten to extract a state dictionary for their objects (c.f. tensorflow/community#56 (comment)).
I have moved the automatic name scoping logic (Metaclass etc) and associated tests into Sonnet 2.
PiperOrigin-RevId: 235540184
* RFC: Stateful Containers with tf.Module (tensorflow#56)
  * Create 20190117-tf-module.md
  * Update 20190117-tf-module.md
  * Loosen return type for variable properties.
  * Use Dense consistently. Thanks brilee@ for spotting!
  * Remove convert_to_tensor from examples. This wasn't ever required and including it might cause confusion. h/t pluskid@ gehring@ and awav@
  * Remove owned_* methods.
  * Document `_flatten`. See tensorflow/tensorflow@5076adf6 for more context.
  * Fix typo in module name. Thanks k-w-w@!
  * Update 20190117-tf-module.md
h/t. tensorflow/community#56 PiperOrigin-RevId: 250436499 Change-Id: I9332d90f6331756f12907566af9a9e23f86aded8
This review is open for public comment until 2019-02-11.

Summary

We propose a lightweight base class (`tf.Module`) for stateful containers in TensorFlow. This base class offers three main features: automatic name scoping, recursive collection of variables, and recursive collection of submodules.

Users are encouraged to subclass `tf.Module` to create their own stateful components, and we expect and encourage an ecosystem of third party modules to evolve.

Thanks in advance for your feedback!
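To close, here is a minimal end-to-end sketch of the subclassing pattern the summary describes; the `Dense` class mirrors the RFC's example in simplified form, and the `mlp` container is illustrative.

```python
import tensorflow as tf

class Dense(tf.Module):
  def __init__(self, input_features, output_features, name=None):
    super(Dense, self).__init__(name=name)
    self.w = tf.Variable(
        tf.random.normal([input_features, output_features]), name='w')
    self.b = tf.Variable(tf.zeros([output_features]), name='b')

  def __call__(self, x):
    return tf.matmul(x, self.w) + self.b

# A bare tf.Module works as a simple container; attributes are tracked.
mlp = tf.Module(name='mlp')
mlp.layer_1 = Dense(3, 8)
mlp.layer_2 = Dense(8, 1)

y = mlp.layer_2(mlp.layer_1(tf.ones([2, 3])))
print(len(mlp.trainable_variables))  # 4 -- collected through both attached layers.
```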