WIP: bjit: register functions with GDB #796
base: master
Conversation
int(str) and int(float) don't always return ints (they can return longs, doh). If we call int() on a subclass of int, we should call its __int__ method in case the subclass overrode it.
This division is expensive; the divisor is always sizeof(char) or sizeof(Py_UNICODE), and it seems to be faster to do a branch and then possibly a shift.
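A minimal sketch of that trick, under the assumption stated above that the divisor is only ever sizeof(char) or a power-of-two sizeof(Py_UNICODE); the function name is illustrative, not Pyston's actual code:

```cpp
#include <cstddef>

// Element count = byte length / element size. Since elt_size is only ever
// 1 (char) or a power of two (Py_UNICODE is 2 or 4 bytes depending on the
// build), a branch plus an optional shift replaces the general division.
static inline size_t elementCount(size_t num_bytes, size_t elt_size) {
    if (elt_size == sizeof(char))
        return num_bytes;                        // dividing by 1: nothing to do
    return num_bytes >> (elt_size == 2 ? 1 : 2); // divide by 2 or 4 via a shift
}
```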
- put it into a header file (and start including it)
- move the grow-the-array part into a separate function to encourage the fast path to get inlined
Particularly for string slicing, where we would always memset the string data to zero, and then immediately memcpy it.
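A tiny illustration of the redundancy being removed; the allocator call and function name here are made up, not Pyston's actual allocation path:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Copy `len` bytes of a string slice. The buffer is deliberately left
// uninitialized: memcpy writes every byte anyway, so a prior memset/calloc
// would be pure overhead.
char* makeSliceCopy(const char* src, size_t start, size_t len) {
    char* buf = static_cast<char*>(malloc(len + 1));
    if (!buf)
        return nullptr;
    memcpy(buf, src + start, len);
    buf[len] = '\0';
    return buf;
}
```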
It was unused
- copy CPython's implementation (that uses C slots)
- implement the C slots for str and list
- avoid doing a division for non-step slices
Do this by adding "contains" to our codegen type system, and implementing a special contains on the unboxedtuple type. This makes the operation quite a lot faster, but it looks like that's largely because we don't implement a couple of optimizations that we should:
- we create a new tuple object every time we hit that line
- our generic contains code goes through compare(), which returns a box (since "<" and friends can return non-bools), but contains will always return a bool, so we have a bunch of extra boxing/unboxing

We should probably separate out the contains logic from the rest of the comparisons, since it works quite differently and doesn't gain anything by being there.
Another 2.7.9 compatibility fix
some fixes and cleanups
optimize regex handling
Convert "a in (b, c)" to "a == b or a == c"
Preparation for finalizer code, refactoring some simple_destructor logic.
i.e. django_template minus the lexing. We are faster now on the lexing, but the parsing is where most of the time gets spent. Also, change this benchmark and django_lexing to use a unicode template. Usually django does that conversion automatically, but these templates bypass the place where that happens, and we end up doing a lot of extra unicode decoding.
The main capi calling convention is to box all the positional arguments into a tuple, and then pass the tuple to PyArg_ParseTuple along with a format string that describes how to parse out the arguments. This ends up being pretty wasteful and misses all of the fast argument rearrangement that we are able to JIT out. These unicode functions are particularly egregious, since they use a helper function that ends up having to dynamically generate the format string to include the function name.

This commit is a very simple change that gets some of the common cases: in addition to the existing METH_O calling convention ('self' plus one positional arg), add the METH_O2 and METH_O3 calling conventions, plus METH_D1/D2/D3 as additional flags that can be or'd into the calling convention flags, which specify that there should be some number of default arguments. This is pretty limited:
- only handles up to 3 arguments / defaults
- only handles "O" type specifiers (ie no unboxing of ints)
- only allows NULL as the default value
- doesn't give as much diagnostic info on error

The first two could be handled by passing the format string as part of the function metadata instead of using it in the function body, though this would mean having to add the ability to understand the format strings. The last two issues are tricky from an API perspective, since they would require a larger change to pass through variable-length data structures. So anyway, punt on those issues for now and just use the simple flag approach.

This cuts the function call overhead by about 4x for the functions it's applied to, which are some common ones: string.count, unicode.count, unicode.startswith. (endswith, [r]find, and [r]index should all get updated as well.)
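A rough sketch of what dispatch for those flags could look like. METH_O is CPython's real convention; the METH_O3/METH_D1 flag values and the helper below are hypothetical, just to show the shape of the change (arguments passed directly instead of being boxed into a tuple for PyArg_ParseTuple):

```cpp
#include <cstddef>

struct PyObject; // opaque

// A METH_O3-style C function: 'self' plus three positional args.
typedef PyObject* (*threeArgFunc)(PyObject* self, PyObject* a, PyObject* b, PyObject* c);

#define METH_O3 0x1000 // hypothetical flag value: self + three positional args
#define METH_D1 0x0100 // hypothetical flag value: the last arg may be defaulted

// Dispatch for METH_O3 | METH_D1: accept 2 or 3 positional args and pass NULL
// for a missing trailing arg, with no tuple packing or format-string parsing.
PyObject* callO3D1(threeArgFunc f, PyObject* self, PyObject** args, int nargs) {
    if (nargs == 3)
        return f(self, args[0], args[1], args[2]);
    if (nargs == 2)
        return f(self, args[0], args[1], /* defaulted */ nullptr);
    return nullptr; // real code would raise a TypeError about the arg count
}
```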
They're not used any more, and even though they are empty NopLocks, they change the struct structure.
well, except that two fields were swapped, and there is an extra struct wrapper in there. But with some small changes we can now let capi code use the list macros for faster list manipulation.
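To make that concrete, here is the kind of extension-code pattern this change allows; PyList_GET_SIZE/GET_ITEM/SET_ITEM are CPython's real fast-path macros (direct struct accesses, no bounds or type checks), and the helper itself is just an illustration:

```cpp
#include <Python.h>

// Reverse a list in place using the macro (no-function-call) accessors.
// PyList_SET_ITEM doesn't decref the old slot, which is fine here because
// each displaced item is immediately stored back into the other slot.
static void reverseInPlace(PyObject* list) {
    Py_ssize_t n = PyList_GET_SIZE(list);
    for (Py_ssize_t i = 0; i < n / 2; i++) {
        PyObject* a = PyList_GET_ITEM(list, i);         // borrowed reference
        PyObject* b = PyList_GET_ITEM(list, n - 1 - i); // borrowed reference
        PyList_SET_ITEM(list, i, b);
        PyList_SET_ITEM(list, n - 1 - i, a);
    }
}
```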
And internStringMortal, which for now just resolves to internStringImmortal, but lets us mark strings that could eventually be collected. (We could use the same approach that CPython uses and have a string destructor that removes mortal strings from the intern table.)
The real benefit is that we intern the strings that end up getting used as attribute names, so we can compare them using pointer comparisons. It should also reduce the size overhead of hidden classes, since we no longer have to copy the string data into the hidden class.
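A sketch of the lookup pattern that interning enables; the HiddenClassEntry/findAttrOffset names are illustrative, not Pyston's actual hidden-class API:

```cpp
#include <vector>

struct BoxedString; // interned: one canonical pointer per distinct string

struct HiddenClassEntry {
    BoxedString* name; // interned attribute name
    int offset;        // slot index in the object's attribute array
};

// Because attribute names are interned, "is this the same attribute?" is a
// single pointer comparison rather than a character-by-character compare.
int findAttrOffset(const std::vector<HiddenClassEntry>& entries, BoxedString* attr) {
    for (const HiddenClassEntry& e : entries)
        if (e.name == attr)
            return e.offset;
    return -1; // not found
}
```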
speed calling of (some) capi code
reenable PyList macros
One of the downsides of the BoxedString-as-attributes change is that hidden classes become more complicated to gc-scan; try to claw some of that back.
Due to their non-standard bootstrapping, they ended up getting initialized back to NORMAL hidden classes. It's silly, but force them back to SINGLETON.
CMake already did this, and I think this discrepancy is why, if you did `make quick_check`, `make check_dbg` would then fail.
Add tip on how to debug infinite recursive bugs from tp_ slots.
Small bjit improvements + fixes the build
Import the ctypes module.
Less helpful than I thought for now -- the KeyErrors are thrown by a custom class that does `raise KeyError`, so we won't benefit from this until we can have jitted code throw (not just receive) capi exceptions.
And fix a bunch of resulting issues (mostly just some pretty benign assertion errors).
Add some functions to old-style classes
Make Raise and Subscript exprs use capi exceptions
Haven't looked into it that much, but my guess is that it was introduced in some newer version of clang than what some people are using. It was added for the use of the clang-pgo build, but we ended up dropping that so it should be ok to drop this flag.
Enable `test slice` and add missing function.
Recursive printing
Some improvements so that the unary test can pass
getiter: use tp_iter for non-Python types
Tell GDB the function name and EH table of the jitted function. This is extremely slow and should only be used for debugging, and the implementation is dangerous...
Would it be easier if we scaled back to only trying to get gdb to not barf? i.e. instead of trying to get the right function names (which we don't get already with the interpreter, so I think it's ok), we just try to make sure that the stack frames are right and that gdb doesn't start backtracing randomly through memory. I don't know why it doesn't get the rbp-based unwinding right, but maybe there's some trick we could do to convince it to do that, or maybe we could add one of those custom jit callbacks that would force it to use that.
Oh, sorry, I didn't realize that we are omitting the frame pointer for bjit code. Maybe we could try saving it and see if that helps gdb?
Yes, we omit the frame pointer, but even with frame pointers the stacktrace was wrong :-(. I didn't think that using this interface would be so slow; the bad perf of this one makes the new JIT gdb plugin interface look nicer. I just don't like that the plugin has to be GPL licensed and manually loaded... There is also this Mozilla bug report where they encounter the same issue with gdb backtraces on x64: https://bugzilla.mozilla.org/show_bug.cgi?id=642320. Maybe there is a way to let gdb know about our EH tables...
Force-pushed from 352fd89 to 6488a3e
Tell GDB the function name and EH table of the jitted function
This is extremely slow [0] and should only be used for debugging, and the implementation is dangerous...
I tried different routes (an ELF template, using the LLVM MC layer), but I only got this one to work :-(.
This may be okay if it's disabled by default and one can enable it if needed.
Currently WIP, but I would like to know if you guys think it's worth cleaning this up.
[0] probably: https://code.google.com/p/v8-wiki/wiki/GDBJITInterface
GDB side of JIT Interface currently (as of GDB 7.2) does not handle registration of code objects very effectively. Each next registration takes more time: with 500 registered objects each next registration takes more than 50ms, with 1000 registered code objects - more than 300 ms. This problem was reported to GDB developers (http://sourceware.org/ml/gdb/2011-01/msg00002.html) but currently there is no solution available.
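For reference, the mechanism being used here (and whose per-registration cost the quoted note warns about) is GDB's documented JIT compilation interface: the JIT links a jit_code_entry describing an in-memory ELF object (symbols plus EH info) into a well-known descriptor and calls a well-known empty function that GDB breakpoints. The declarations below follow the GDB manual; the registration steps in the trailing comment are a sketch, not Pyston's actual code.

```cpp
#include <cstdint>

extern "C" {

typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

struct jit_code_entry {
    jit_code_entry* next_entry;
    jit_code_entry* prev_entry;
    const char* symfile_addr; // start of the in-memory object file
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;     // must be 1
    uint32_t action_flag; // a jit_actions_t value
    jit_code_entry* relevant_entry;
    jit_code_entry* first_entry;
};

// GDB sets a breakpoint on this function; it must not be inlined or removed.
void __attribute__((noinline)) __jit_debug_register_code() {}

// The version must be set statically: the debugger may read it before main().
jit_descriptor __jit_debug_descriptor = { 1, JIT_NOACTION, nullptr, nullptr };

} // extern "C"

// Registering a freshly emitted function's object file (sketch):
//   entry->symfile_addr = elf_image; entry->symfile_size = elf_size;
//   link `entry` into __jit_debug_descriptor.first_entry;
//   __jit_debug_descriptor.relevant_entry = entry;
//   __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
//   __jit_debug_register_code(); // GDB wakes up and reads the new entry
```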