Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster next / finish by using rb_profile_frames #746

Merged
merged 4 commits into from
Nov 2, 2022

Conversation

WillHalto
Copy link
Contributor

Description

  • By using rb_profile_frames instead of rb_make_backtrace in DEBUGGER__.frame_depth, we can improve the performance of frame_depth substantially for common cases. This leads to much faster next and finish performance.
  • rb_profile_frames is notably faster than rb_make_backtrace as it returns pointers and does not perform allocations (which rb_make_backtrace does).
  • We must set a buffer size for rb_profile_frames which we have picked here as 4096. This should cover the vast majority of calls. (i.e. the common stack depth in our large codebase is in the hundreds). There is a fallback to rb_make_backtrace to support the very large stack depth cases.
    • Note: it would have been possible to use a smaller fixed size buffer to iterate over the entire stack with rb_profile_frames and not need the fallback, however there is currently an open issue regarding the "start" parameter not working properly, which would be required for that.

Co-authored by: @jhawthorn

Note
Currently, this approach will only work with ruby >= 3.0.
It appears that for ruby < 3.0, rb_profile_frames omits c frames. So before moving forward, we would need to modify the logic here to support ruby < 3.0, or just restrict the rb_profile_frames approach to only be used for ruby >= 3.0.

Examples/validation

This file simulates a large callstack with deep recursion + method call:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

Testing a step over the call on line 13 with this command:

time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb

Baseline - no stepping:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.197s
user    0m0.151s
sys     0m0.052s

Stepping before this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.024s
user    0m1.987s
sys     0m0.040s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.285s
user    0m0.218s
sys     0m0.069s
  • So this gives a ~95% improvement in the performance of next for the example case.
  • It also gives a ~80% performance improvement in the rare fallback case, where the stack depth exceeds 4096.
Expand to see the fallback results here

Extreme case: stack depth larger than BUFF_SIZE
We can test this with:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,4500) # line 13. Ensure stack depth > 4096

Baseline - no stepping:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.197s
user    0m0.187s
sys     0m0.013s

Stepping before this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m35.587s
user    0m35.536s
sys     0m0.045s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m7.375s
user    0m7.221s
sys     0m0.158s

^ It gives an ~80% performance improvement for this extreme example, where we hit the rb_make_backtrace fallback.

@ko1
Copy link
Collaborator

ko1 commented Sep 16, 2022

Basically I agree with the idea but at least on ruby 2.7 and before it doesn't work because it skips C method frames. I need to check more it is same or not on CRuby's code.

NOTE: next command can be more lightweight.

  • Providing stack-depth API for debuggers (I forget to introduce it to older Rubys. I need to keep in mind to introduce it into 3.2.
  • Using internal hack, we can get real stack depth.
  • Setting more clever breakpoints we can eliminate per-line traces for next command. But I'm not sure it is valuable (especially if this kind of optimization techniques works well).

@WillHalto
Copy link
Contributor Author

WillHalto commented Sep 17, 2022

@ko1 Yeah, currently this will only work for ruby 3.0 and up

Note
Currently, this approach will only work with ruby >= 3.0.
It appears that for ruby < 3.0, rb_profile_frames omits c frames. So before moving forward, we would need to modify the logic here to support ruby < 3.0, or just restrict the rb_profile_frames approach to only be used for ruby >= 3.0.

Is there any modification we could do for ruby <= 2.7? I don't think there would be a way to use rb_profile_frames to get the stack depth for ruby <= 2.7 but I could be wrong.

Alternatively, perhaps we could restrict the logic so that the rb_profile_frames approach is only used for ruby >= 3.0 environments. At least then we could begin getting a performance improvement for those versions.

@WillHalto
Copy link
Contributor Author

  • Providing stack-depth API for debuggers (I forget to introduce it to older Rubys. I need to keep in mind to introduce it into 3.2.

That sounds good. Maybe that would be a more long term improvement and this optimization could be seen as a short term way to improve the performance.

  • Using internal hack, we can get real stack depth.

Is that something we could do now? Would that be a better way to optimize instead of using rb_profile_frames?

  • Setting more clever breakpoints we can eliminate per-line traces for next command. But I'm not sure it is valuable (especially if this kind of optimization techniques works well).

Interesting, I am not sure if it would be needed, see the performance improvement in #743 (comment). It seems to work well. It might also be difficult to handle exceptions/conditional logic when setting the breakpoints in this way.

@ko1
Copy link
Collaborator

ko1 commented Oct 4, 2022

It might also be difficult to handle exceptions/conditional logic when setting the breakpoints in this way.

Correct.

@WillHalto
Copy link
Contributor Author

@ko1 What do you think of this: 533fef0?diff=split&w ?

If we wanted to use the approach in this PR to make performance fast for ruby >= 3.0 we could do something like that.

That will allow use of rb_profile_frames in the ruby >=3.0 and use existing frame_depth implementation in other versions.

@ko1
Copy link
Collaborator

ko1 commented Oct 28, 2022

Github actions shows errors on Ruby 3.0?
Could you rebase it and fix it?

On Ruby 3.2, I'll prepare the new debug feature to get depth directly.

@WillHalto
Copy link
Contributor Author

Yeah, I can work on a fix for the failing versions 👍

@WillHalto
Copy link
Contributor Author

Alright @ko1, I've fixed this in 55a0115 and checks are all passing 👍

@ko1
Copy link
Collaborator

ko1 commented Nov 2, 2022

  1. can you fix the conflict?
  2. can you squash commits if the separation of the commits is not important? (or I can merge with squash)

On ruby < 3.0 rb_profile_frames omits c frames so we cannot use it to get frame_depth.
On ruby < 3.2.0 it adds an extra dummy frame in the output that must be accounted for.
@WillHalto
Copy link
Contributor Author

  1. can you fix the conflict?

Done.

  1. can you squash commits if the separation of the commits is not important? (or I can merge with squash)

@ko1 you can go ahead and merge with squash 👍

@ko1 ko1 merged commit a1213eb into ruby:master Nov 2, 2022
ko1 added a commit that referenced this pull request Nov 25, 2022
@ko1 ko1 mentioned this pull request Nov 25, 2022
ko1 added a commit that referenced this pull request Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants