Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in @time compilation time measurement, using _tryfinally macro. #41923

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented Aug 18, 2021

This PR reopens @IanButterworth's PR #39138, because in the meantime we've discovered a bug in the @time macro that needs a mechanism like this in order to solve it.

I'll describe the bug below. But first, some context:

Since the @time macro is currently broken without this change, and since we know that there is a mechanism on the way (via #39217) that will replace this, it seems to me (and @Sacha0) that this is an expedient solution to the existing bug, without baking-in a reference to the internal intermediate form that we need to maintain. I've left a note in the code that directly states that this macro is deprecated and should not be used, and that a replacement is coming. I've renamed it to @_tryfinally to discourage its use. Hopefully that seems like a reasonable tradeoff to allow us to fix the bug in @time compilation measurement?

Thanks! :)

The PR description follows:


Use a _tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown anywhere, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing @time blocks, without warning the user about it.

Here are some examples of the current, broken behavior:

  • Throwing (and catching) an exception disables compilation time
    measurement, even though the exception didn't escape, without warning
    the user - note 0 compilation time reported:
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
  • Throwing an exception while the first task is sleeping disables
    compilation time measurement, so this ends up incorrectly reporting 0
    compilation time:
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)
    
    julia> f(0) # warmup
    WARNING: replacing module M.
    4
    
    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780
    
    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1
    
    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)

After this commit, both of those examples correctly report their
compilation time.

@NHDaly
Copy link
Member Author

NHDaly commented Aug 18, 2021

For a more complete history:

BUT, multithreading / concurrency of course is here to make everything worse.

The approach taken in #39133 was reasonable if there's only ever a single Task on a single Thread. If you're in the middle of a @time block, and someone throws an exception, it's probably reasonable to assume that wasn't intended and just disable the timing. (Although you might get it wrong if the exception was intended, and would have been handled.)

But in a multithreaded world, where people are putting @time statements into functions being executed on different threads (as we do at RelationalAI), an exception anywhere disables the timing for all tasks, which seems definitely bad.

In the following two PRs, we attempted to fix this by using a different version of try / finally that don't introduce scope:

@JeffBezanson said we would prefer the second approach, since :tryfinally is an internal intermediate form and shouldn't be used in source ASTs. But the second PR has stalled, and so I've re-opened the first one now that we are aware of this bug.

If we think we can make progress on #39217 in a relatively short timescale, that's excellent! :) But in the meantime, this PR attempts to fix the bug in an expedient way that we know will currently work, while we wait for the better solution.

Does that sound reasonable?

Thanks for your help! :)

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch strikes me as a good stopgap solution and seems ready to go. (I am a bit biased though, having been involved in the scheming of this patch, so perhaps at least one neutral party should sign off as well. And perhaps Jeff in particular, given he had reservations about committing a patch like this previously. Maybe @vchuravy would like to take a look, what with #41762 (comment)? :))

@Sacha0
Copy link
Member

Sacha0 commented Aug 27, 2021

Looks like a conflict has crept in --- perhaps rebase @NHDaly? :)

base/timing.jl Outdated Show resolved Hide resolved
@NHDaly NHDaly added the status:triage This should be discussed on a triage call label Sep 2, 2021
@NHDaly
Copy link
Member Author

NHDaly commented Sep 2, 2021

I've marked this PR for #triage, given that @JeffBezanson had concerns with this approach the first time around, since ":tryfinally is an internal intermediate form and shouldn't be used in source ASTs."

Please see my description in the comment above, which explains why we are trying this approach again:
#41923 (comment)

Thanks!

elapsedtime = time_ns() - elapsedtime
local val = @__tryfinally($(esc(ex)),
(elapsedtime = time_ns() - elapsedtime;
compile_elapsedtime = cumulative_compile_time_ns_after() - compile_elapsedtime)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the order of these be switched to avoid the possibility of compile time being more than 100%?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Oscar made the point that we care more about an accurate elapsed time than about compile time fraction, so maybe it's ok.

@JeffBezanson
Copy link
Sponsor Member

Ok, I will stop being a purist about this and this can be merged :)

@JeffBezanson JeffBezanson removed the status:triage This should be discussed on a triage call label Sep 16, 2021
@NHDaly
Copy link
Member Author

NHDaly commented Sep 16, 2021

Awesome, thanks @JeffBezanson! Sorry I didn't make #triage today. Thanks for the flexibility 😊

@Sacha0
Copy link
Member

Sacha0 commented Sep 16, 2021

tester_freebsd64 is doing its thing, i.e. failing in various seemingly unrelated ways. The tester_linux32 failures seem to be Profile-related; not sure what to make of those.

Use a tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown _anywhere_, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing `@time` blocks, without warning the user about
it.

Here are some examples of the current, broken behavior:
- Throwing (and catching) an exception disables compilation time
  measurement, even though the exception didn't escape, without warning
  the user - note 0 compilation time reported:
    ```julia
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
    ```
- Throwing an exception while the first task is sleeping disables
  compilation time measurement, so this ends up incorrectly reporting 0
  compilation time:
    ```julia
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)

    julia> f(0) # warmup
    WARNING: replacing module M.
    4

    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780

    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1

    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)
    ```

After this commit, both of those examples correctly report their
compilation time.

Rename `@tryfinally` to `@_tryfinally` since it shouldn't be used.

And add note that it's deprecated and should not be used
@Sacha0
Copy link
Member

Sacha0 commented Sep 16, 2021

I hope you don't mind Nathan: I took the liberty of rebasing this pull request (in anticipation of merge) given it's a couple weeks stale and to check whether the tester_linux32 failure occurs again :).

@NHDaly
Copy link
Member Author

NHDaly commented Sep 17, 2021 via email

@KristofferC KristofferC merged commit 08e100b into master Sep 23, 2021
@KristofferC KristofferC deleted the nhd/time_tryfinally_macro branch September 23, 2021 11:29
KristofferC pushed a commit that referenced this pull request Sep 23, 2021
…ro. (#41923)

Use a tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown _anywhere_, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing `@time` blocks, without warning the user about
it.

Here are some examples of the current, broken behavior:
- Throwing (and catching) an exception disables compilation time
  measurement, even though the exception didn't escape, without warning
  the user - note 0 compilation time reported:
    ```julia
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
    ```
- Throwing an exception while the first task is sleeping disables
  compilation time measurement, so this ends up incorrectly reporting 0
  compilation time:
    ```julia
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)

    julia> f(0) # warmup
    WARNING: replacing module M.
    4

    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780

    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1

    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)
    ```

After this commit, both of those examples correctly report their
compilation time.

(cherry picked from commit 08e100b)
@NHDaly
Copy link
Member Author

NHDaly commented Sep 26, 2021

🙏 thanks @KristofferC and @Sacha0!

KristofferC pushed a commit that referenced this pull request Sep 28, 2021
…ro. (#41923)

Use a tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown _anywhere_, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing `@time` blocks, without warning the user about
it.

Here are some examples of the current, broken behavior:
- Throwing (and catching) an exception disables compilation time
  measurement, even though the exception didn't escape, without warning
  the user - note 0 compilation time reported:
    ```julia
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
    ```
- Throwing an exception while the first task is sleeping disables
  compilation time measurement, so this ends up incorrectly reporting 0
  compilation time:
    ```julia
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)

    julia> f(0) # warmup
    WARNING: replacing module M.
    4

    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780

    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1

    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)
    ```

After this commit, both of those examples correctly report their
compilation time.
janrous-rai pushed a commit to janrous-rai/julia that referenced this pull request Oct 28, 2021
…ro. (JuliaLang#41923)

Use a tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown _anywhere_, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing `@time` blocks, without warning the user about
it.

Here are some examples of the current, broken behavior:
- Throwing (and catching) an exception disables compilation time
  measurement, even though the exception didn't escape, without warning
  the user - note 0 compilation time reported:
    ```julia
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
    ```
- Throwing an exception while the first task is sleeping disables
  compilation time measurement, so this ends up incorrectly reporting 0
  compilation time:
    ```julia
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)

    julia> f(0) # warmup
    WARNING: replacing module M.
    4

    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780

    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1

    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)
    ```

After this commit, both of those examples correctly report their
compilation time.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
…ro. (JuliaLang#41923)

Use a tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown _anywhere_, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing `@time` blocks, without warning the user about
it.

Here are some examples of the current, broken behavior:
- Throwing (and catching) an exception disables compilation time
  measurement, even though the exception didn't escape, without warning
  the user - note 0 compilation time reported:
    ```julia
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
    ```
- Throwing an exception while the first task is sleeping disables
  compilation time measurement, so this ends up incorrectly reporting 0
  compilation time:
    ```julia
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)

    julia> f(0) # warmup
    WARNING: replacing module M.
    4

    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780

    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1

    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)
    ```

After this commit, both of those examples correctly report their
compilation time.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
…ro. (JuliaLang#41923)

Use a tryfinally macro to ensure compile timer is closed, while maintaining scope.

This fixes a bug where an exception thrown _anywhere_, on any Task on
any thread, completely disables the compilation time measurement for any
concurrently executing `@time` blocks, without warning the user about
it.

Here are some examples of the current, broken behavior:
- Throwing (and catching) an exception disables compilation time
  measurement, even though the exception didn't escape, without warning
  the user - note 0 compilation time reported:
    ```julia
    julia> @time begin
              # The following exception disables compilation time measurement
              try throw("ha HAH!") catch end
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
      0.003950 seconds (2.17 k allocations: 170.093 KiB)
    4
    ```
- Throwing an exception while the first task is sleeping disables
  compilation time measurement, so this ends up incorrectly reporting 0
  compilation time:
    ```julia
    julia> f(n) = begin
              # while we're sleeping, someone throws an exception somewhere else :'(
              sleep(n)
              @eval module M ; f(x) = 2*x ; end
              @eval M.f(2)
          end
    f (generic function with 1 method)

    julia> f(0) # warmup
    WARNING: replacing module M.
    4

    julia> @async @time f(5)
    Task (runnable) @0x000000010f2c0780

    julia> throw("oh no sad")
    ERROR: "oh no sad"
    Stacktrace:
    [1] top-level scope
      @ REPL[9]:1

    WARNING: replacing module M.
      5.008401 seconds (1.92 k allocations: 120.515 KiB)
    ```

After this commit, both of those examples correctly report their
compilation time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants