Update index.qmd #548
Conversation
Preview the changes: https://turinglang.org/docs/pr-previews/548
I think there might still be a misunderstanding here. The keyword argument So the real question is:
Ah, I see. Maybe @torfjelde is the right person to ask about this? I've not dug into exactly where to sort out RD gradient stuff yet in Turing.jl.
Typically this occurs in the initial step of the sampler, i.e. once (there are exceptions, but in those cases you can't really do much better anyway).
Unfortunately not for ReverseDiff.jl in compiled mode (they can however just not use compiled mode).
Cached tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
Thus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and `if`-statements should consistently execute the same branches.
For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the data will always execute the same branches during sampling (if the data is constant throughout sampling and, e.g., no mini-batching is used).
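As an illustrative sketch of this distinction (hypothetical model names; assumes Turing.jl's `@model` macro and the distributions it re-exports):

```julia
using Turing

# Safe to cache: the branch and the loop depend only on fixed properties
# of the data, so every execution records the same sequence of operations.
@model function fixed_flow(x)
    μ ~ Normal(0, 1)
    σ = length(x) > 10 ? 2.0 : 1.0  # decided once the data is fixed
    for i in eachindex(x)           # loop of fixed size
        x[i] ~ Normal(μ, σ)
    end
end

# NOT safe to cache: the branch depends on the sampled value of μ, so
# different executions can record different operation sequences, and a
# cached tape would silently replay the wrong branch.
@model function value_dependent_flow(x)
    μ ~ Normal(0, 1)
    if μ > 0
        x[1] ~ Normal(μ, 1)
    else
        x[1] ~ Normal(μ, 2)
    end
end
```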
Suggested change:
- For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the data will always execute the same branches during sampling (if the data is constant throughout sampling and, e.g., no mini-batching is used).
+ For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data.
I don't think there's much point mentioning minibatching here, as there's no "easy to use" support for this in Turing.jl and so it's really not something people do much of (don't think I've ever seen anyone actually do this in applications with Turing.jl).
Added some comments 👍
@@ -20,11 +20,12 @@ As of Turing version v0.30, the global configuration flag for the AD backend has
Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of 0 permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to the [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size).
Suggested change:
- For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of 0 permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size).
+ For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of `nothing` permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size).
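For context, a minimal sketch of how this keyword is passed (hypothetical model; whether the automatic-selection value is `0` or `nothing` is exactly what this suggestion is about):

```julia
using Turing

# Toy one-parameter model, purely for illustration.
@model function demo(y)
    m ~ Normal(0, 1)
    y ~ Normal(m, 1)
end

# Automatic chunk size (spelling as in the version under review):
chain_auto = sample(demo(1.5), NUTS(; adtype=AutoForwardDiff(; chunksize=0)), 1000)

# Explicit chunk size:
chain_fixed = sample(demo(1.5), NUTS(; adtype=AutoForwardDiff(; chunksize=4)), 1000)
```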
Maybe also add this: as @gdalle pointed out, this is not actually the case anymore (this was a left-over from using LogDensityProblemsAD.jl, I believe, but it changed when everything moved to being backed by ADTypes.jl).
@torfjelde I think that's the key confusion between us. My understanding is that, regardless of whether it is compiled or not, the tape only captures execution for one version of the control flow. Thus, the tape always becomes invalid if the control flow changes (e.g. a different branch of an `if`-statement). Source: https://juliadiff.org/ReverseDiff.jl/dev/api/#The-AbstractTape-API
The tape is purely internal and neither stored nor reused if
This is wrong. The tape is only reused if
Okay then IIUC it's a matter of inconsistency between
Note that in ADTypes, the `compile` argument to `AutoReverseDiff` is defined ambiguously too. So we should perhaps add more details to the struct, something like `AutoReverseDiff(tape=true, compile=false)`?
I'd love for you to chime in here @devmotion @torfjelde: SciML/ADTypes.jl#91
I'm not too opinionated about this, as compilation without caching seems somewhat useless? Are there scenarios where you'd like to do that?
Indeed, you can't compile a tape you never record in the first place. In any case, I think the ambiguous terminology was fixed by SciML/ADTypes.jl#91. It's just a shame that the word "compile" was chosen instead of "record", given how both are used in ReverseDiff's documentation. But it's a sunk cost now.
- For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional argument can be provided to `AutoReverseDiff` to specify whether to to compile the tape only once and cache it for later use (`false` by default, which means no caching tape). Be aware that the use of caching in certain types of models can lead to incorrect results and/or errors.
+ For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional argument can be provided to `AutoReverseDiff` to specify whether to to cache the tape only once and reuse it later use (`false` by default, which means no caching). This can substantially improve performance, but risks silently incorrect results if not used with care.
Suggested change:
- For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional argument can be provided to `AutoReverseDiff` to specify whether to to cache the tape only once and reuse it later use (`false` by default, which means no caching). This can substantially improve performance, but risks silently incorrect results if not used with care.
+ For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.
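A hedged usage sketch of the two settings (hypothetical model; assumes `NUTS` accepts the `adtype` keyword as described above):

```julia
using Turing

# Model with a fixed-size loop, so every execution records the same
# operations — safe for a pre-recorded tape.
@model function gdemo(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

data = [1.5, 2.0]

# Default: a fresh tape is recorded on every gradient call — always correct.
sample(gdemo(data), NUTS(; adtype=AutoReverseDiff()), 1000)

# Pre-recorded (compiled) tape: faster, but only valid when the model's
# control flow is identical on every execution, as discussed above.
sample(gdemo(data), NUTS(; adtype=AutoReverseDiff(; compile=true)), 1000)
```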
Cached tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
Suggested change:
- Cached tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
+ Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
Addresses part of #547.
@gdalle does this read more correctly?