Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]Scikit-learn commit #1308

Open
wants to merge 4,712 commits into
base: master
Choose a base branch
from
Open

[WIP]Scikit-learn commit #1308

wants to merge 4,712 commits into from

Conversation

Daetalus
Copy link
Contributor

@Daetalus Daetalus commented Jul 27, 2016

In CPython list sort function, before the list get sort, it will reverse the list itself first. See here: https://github.com/python/cpython/blob/2.7/Objects/listobject.c#L2115

This will cause the incompatibility in below:

c = [(0, 0, 0), (1.0, 2, 0.0), (1.0, 3, 0.0)]
print("Original:")
print(c)
print("result:")
d = sorted(c, key=lambda x: x[0], reverse=True)
e = sorted(c, key=lambda x: x[0])
f = sorted(c, reverse=True)
print("Reverse with key:")
print(d)
print("Only has key:")
print(e)
print("Only has reverse")
print(f)

Pyston:

Original:
[(0, 0, 0), (1.0, 2, 0.0), (1.0, 3, 0.0)]
result:
Reverse with key:
[(1.0, 3, 0.0), (1.0, 2, 0.0), (0, 0, 0)]
Only has key:
[(0, 0, 0), (1.0, 2, 0.0), (1.0, 3, 0.0)]
Only has reverse
[(1.0, 3, 0.0), (1.0, 2, 0.0), (0, 0, 0)]

CPython && PyPy:

Original:
[(0, 0, 0), (1.0, 2, 0.0), (1.0, 3, 0.0)]
result:
Reverse with key:
[(1.0, 2, 0.0), (1.0, 3, 0.0), (0, 0, 0)]
Only has key:
[(0, 0, 0), (1.0, 2, 0.0), (1.0, 3, 0.0)]
Only has reverse
[(1.0, 3, 0.0), (1.0, 2, 0.0), (0, 0, 0)]

The differences is in Reverse with key.

CPython use TimSort, this algorithm need the list is descending. So for maintain the stability in reverse sort, it need to reverse the list first, then reverse again when the list got sorted.

So in order to let Pyston compatible with CPython. I add the second commit.

That's just my investigation, @undingen @kmod please correct me if I was wrong. Thanks!

References:

TimSort Java Implementation
TimSort CPython Implementation

undingen and others added 30 commits May 24, 2016 16:59
string is special in that it is a c++ type which has tp_as_number and tp_as_sequence.
This causes problems because when we fixup the slot dispatcher we will set the tp_as_number fields but not the
tp_as_sequence because setting both can cause problems.
Some extensions (e.g. numpy) require that we use the sq_* functions instead of nb_*.
Therefore clear the tp_as_number fields (except nb_remainder which cpython has set too because it is not part of
tp_as_sequence).
str: use tp_as_sequence instead of tp_as_number
that is just based off of pulling our latest release
PyObject_New: register the type if the type is not yet registered
The problem is that we emit an llvm "unreachable" instruction, and then
continue to emit other code, which fails the verifier.

endBlock(DEAD) is supposed to be the right way to handle that, but there
is some more work that would need to be done there to get that working
properly.

So just do the easy thing for now -- create a new BB so that it's ok to
emit more code.
We were doing a "call bumpUse() early" optimization to free up registers
when we can, but as a side-effect it looked to the refcounter like the
reference was done being used.
Not using it in this commit, just wanted to
get the unmodified version in so it's easier to see the changes.
This is for adding a guard on a non-immortal object, since we need
to be safe against that object getting deallocated and a different object
is allocated in its spot.

We had support for this already, but it leaked memory.  The biggest was
that we never freed our runtimeICs, so if those ended up getting any GC
references in them, then we would leak memory.  So I started freeing those,
but then that exposed the issue that the ICInvalidators expect that their
dependent ICs never get freed.  So I added back a mapping from ICSlotInfo->
ICInvalidators that reference them.
until I realize that it's because we were passing more tests than we expected.
The behavior changed in CPython 2.7.4, and Travis-CI runs 2.7.3.
Switch to CPython's descrobject.c
before we added a it as a module which made code fail
which does something like __builtins__["unicode"]
Some packaging / distributing updates
update list of failing cpython tests
exec, input: if globals has no __builtins__ add it as a dictwrapper
since we were installing it from the LLVM APT repo, which they
took down since it was getting too expensive.  I guess we can
just run with the gcc build until that situation gets resolved.
Boxiang Sun and others added 23 commits July 29, 2016 22:20
They end up generating "pass" statements with a lineno of 0, which
trips an assert later on.  This commit just sets them to have a lineno
of 1.

I'm not sure how to test this, since piping into stdin is supposed to
be treated as a file (not as the repl).  Though, we get that wrong right now.
Support empty lines on the repl
using cpython's `sys.flags` inplementation
enable `PyObject_Format` in `from_cpython/Objects/abstract.c`
+ use pyston::DenseMap to save a little more memory

this saves about 5%-10% of peak memory on django
hidden classes: split into subclasses to reduce memory consumption
Update to fewer-pyston-changes virtualenv
and change it to use a unordered_map because it uses less memory in this case and is faster
(I assume because it does not have align the key value tuples)
saves about 10% of peak memory on django
ScopeNameUsage merge dicts into a single big one
Comment out some part of listobject.c, use the CPython list sort and
apply some changes to existed Pyston code.
If BoxedList::allocated is -1, it means the items inside were changed.
Some CPython list functions need this to check exceptions.
Switch to CPython list sort(Not its list implementation)
this saves a lot of memory
delete the llvm module after code generation
ICSlotInfo: remove old invalidator entries
@Daetalus Daetalus force-pushed the sklearn_nexedi_1 branch 2 times, most recently from 94530a6 to e1e9870 Compare August 6, 2016 18:26
PyErr_BadInternalCall();
return NULL;
}
return setPop(static_cast<BoxedSet*>(set));
Copy link
Contributor

@undingen undingen Aug 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be return callCXXFromStyle<CAPI>(setPop, static_cast<BoxedSet*>(set));

@kmod kmod force-pushed the master branch 2 times, most recently from 352fd89 to 6488a3e Compare October 28, 2020 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants