
Add signal support #1026

Merged
merged 1 commit into from
Feb 3, 2016
Conversation

@undingen (Contributor) commented Dec 8, 2015

We check for signals on most calls to runtime functions in the LLVM tier and the bjit, and in some additional places inside the interpreter, because checking on every bytecode is too large a slowdown (>10%).
I don't like that the behavior differs between the tiers, but it was the best result I could achieve.
This patch does more checks than PyPy does; see: https://bitbucket.org/pypy/pypy/issues/1841/signal-check-only-in-jump_absolute-when

The perf regression is not nice, but I think we can accept it:

           django_template3.py             2.1s (6)             2.2s (4)  +2.8%
                 pyxl_bench.py             1.8s (6)             1.8s (4)  +4.2%
     sqlalchemy_imperative2.py             2.2s (6)             2.3s (4)  +3.0%
                pyxl_bench2.py             1.1s (6)             1.1s (4)  +0.8%
       django_template3_10x.py            13.9s (6)            14.3s (4)  +2.8%
             pyxl_bench_10x.py            13.5s (6)            13.8s (4)  +2.0%
 sqlalchemy_imperative2_10x.py            16.9s (6)            17.3s (4)  +2.7%
            pyxl_bench2_10x.py             9.3s (6)             9.3s (4)  +0.1%
                       geomean                 4.8s                 4.9s  +2.3%

I tried generating better code in the bjit (e.g. directly emitting the check, or creating a special function that does the pendingCalls check with a special calling convention that does not modify any registers), but nothing improved the perf...

For fun I tried disabling the ticker check that gets executed on every bytecode in CPython and saw about a 3.5% perf improvement there...

One additional not-nice thing is that this currently does not add checks inside the LLVM codegen if the called function is a C API function...

undingen added a commit to undingen/pyston_v1 that referenced this pull request Jan 7, 2016
This function lets one register a potential root range from the C API.
Was part of the unmerged pyston#1026 but split out because pyston#1029 requires it too.
@kmod (Collaborator) commented Jan 25, 2016

Hey, sorry I didn't get to this before I left. I've lost all the context on this though -- should we close #1024?

@undingen (Contributor, Author)

I closed the old WIP PR implementation and rebased this one.
But perf is now even worse :-(

                                   upstream/master:  origin/signal_hand:
           django_template3.py             2.1s (4)             2.2s (4)  +4.2%
                 pyxl_bench.py             1.8s (4)             1.8s (4)  +3.5%
     sqlalchemy_imperative2.py             2.2s (4)             2.3s (4)  +4.4%
                pyxl_bench2.py             1.1s (4)             1.1s (4)  +3.6%
       django_template3_10x.py            13.8s (4)            14.4s (4)  +3.9%
             pyxl_bench_10x.py            13.4s (4)            13.7s (4)  +2.5%
 sqlalchemy_imperative2_10x.py            16.9s (4)            17.6s (4)  +4.1%
            pyxl_bench2_10x.py             9.1s (4)             9.6s (4)  +5.7%
                       geomean                 4.7s                 4.9s  +4.0%

@kmod (Collaborator) commented Jan 30, 2016

lgtm, though it looks like there are some merge conflicts with the frame-introspection change.

I'm not sure how much room we have to optimize this other than avoiding the checks -- LLVM produces pretty good code for this (keeps the address of the global in a register), but I think the sheer number of times we add those checks is what leads to the slowdown.

We check for signals on most calls to runtime functions in the LLVM tier and the bjit, and in some additional places inside the interpreter,
because checking on every bytecode is too large a slowdown (>10%).
undingen added a commit that referenced this pull request Feb 3, 2016
@undingen undingen merged commit dc91299 into pyston:master Feb 3, 2016
@undingen (Contributor, Author) commented Feb 3, 2016

I fixed the merge conflicts.
Final numbers are now:

                                   upstream/master:  origin/signal_hand:
           django_template3.py             2.2s (4)             2.3s (4)  +1.6%
                 pyxl_bench.py             1.9s (4)             2.0s (4)  +2.4%
     sqlalchemy_imperative2.py             2.3s (4)             2.3s (4)  +1.2%
                pyxl_bench2.py             1.1s (4)             1.1s (4)  +2.3%
       django_template3_10x.py            14.9s (4)            15.0s (4)  +0.2%
             pyxl_bench_10x.py            14.4s (4)            14.8s (4)  +2.6%
 sqlalchemy_imperative2_10x.py            17.3s (4)            17.6s (4)  +1.5%
            pyxl_bench2_10x.py            10.0s (4)            10.0s (4)  -0.4%
                       geomean                 5.0s                 5.1s  +1.4%

I'm still not very happy about this patch, but it should do the job until we come up with a nicer solution...
