-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #260 Partial python 3 compatibility #658
Issue #260 Partial python 3 compatibility #658
Conversation
Why is there a print here anyway?
The DCD parser is not compatible with python 3, this commit makes it optional on that version of python. As the LAMPPS coordinate parser uses the DCD parser, it is made optional as well. The TRZ parser requires some small modifications to be compatible. It is made optional until is can be tested.
The unicode representation for the Angstrom symbol in the dictionary used for unit conversion was wrong. While the escaping sequence was correct, it was used as a series of bytes rather than like a unicode string. Also, the same key was created twice as two escaping sequences were used (utf-16 and utf-8) and they were synonymous. The tests were passing because they were testing against the same series of bytes. This commit use a proper unicode string as key and remove the redundant definition. It also adds a test with the actual unicode symbol rather than an escaping sequence.
On python 2, user defined object are hashable by default unless the __hash__ method is overloaded to return None. On python 3, user objects are not hashable if the __eq__ method is overloaded. This commit defines a __hash__ method that returns a hash based on the sorted indices of the atoms involved in the topology object. This should reproduce the behavior advertised in the topology object documentation. Yet, it is different from the previous behavior. In the previous behavior, two different objects with the same atoms were equal for __eq__ but different regarding the hash.
* The test for format guessing ignore DCD, DATA, and LAMMPS on python 3 * fix some call to the `map` function that do not return a list on python 3 * other minor fixes
In python 3, numpy integers are not instances of int anymore.
* replace seek(0L) by seek(0) * use six for StringIO
* use six for cPickle * use the `next` builtin rather than the `next` method
An odd failure about StringIO remains in test_streamio.py on Python 3.
Read the TPR file as binary (mode='rb').
In Python 3, map returns a generator rather than a list. Make coordinates/test_dl_poly.py pass on python 3
Note also that tests cannot run with the |
@jbarnoud this looks awesome. You can spin out the things that aren't easy into more issues, (eg I don't mind fixing TRZ once I know what doesn't work) You can add in Py3 as an "allowed to fail" row in the matrix. I don't see any harm in having this on the develop build for now. It will let us see what needs doing to work towards passing |
|
||
def __hash__(self): | ||
"""Makes the object hashable""" | ||
return hash(self._cmp_key()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a hash needs to be a string, so you could just do hash(self.indices)
or maybe hash((self.__class__.__name__, self.indices))
. I think this hash will happen a lot, so I'd rather it was fast (not constructing strings each time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I changed the _cmp_key
method accordingly.
Tests that relies on DCD, LAMMPS, or TRZ test files ends with an error because the reader is not supported on python 3 yet. This creates a lot of noise in the nose report that hides the other failures. Hereby we make these tests being skipped if the parser is not loaded. Note that this reduce dramatically the test coverage on python 3!
bb51c39
to
0ad9078
Compare
The python 3 tests are allowed to fail so they will not mark pull requests as broken.
0ad9078
to
18beb92
Compare
Following the comment of @richardjgowers on MDAnalysis#658
Note however than some tests are skipped because they rely on files in unsupported formats.
I wouldn't bother skipping all the tests that currently don't work, I'd rather keep them as failing so we know what to fix. Once you're ready/driven insane we can merge this and split it up into many issues |
9db0c76
to
187272c
Compare
@richardjgowers A lot of tests fail because the DCD reader is not compatible. It makes the report impossible to read because of the noise. Of course failing tests should be reported as failing, but in that particular instance it is the same error that get reported over and over. I would like to fix the issue with |
@jbarnoud ready to review & merge? |
@richardjgowers Yes. I'll do an other PR for what remains to be done. |
if six.PY2: | ||
# DCD, LAMMPS, and TRZ are not supported on Python 3 yet | ||
formats += [ | ||
('DATA', mda.topology.LAMMPSParser.DATAParser, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a separate issue for each of these? (I've done DCD)
|
||
|
||
class TestAtom(TestCase): | ||
"""Tests of Atom.""" | ||
|
||
@dec.skipif(parser_not_found('DCD'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it ok to let all this stuff fail. We tell travis that a failed python3 test is ok for the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting these test fail creates a lot of noise that make nose's report painful to read. Also all the errors generated by these tests are unrelated to what the test tries to asses. The correct semantics would be known failures, but they still display the traceback in the report which left the problem unresolved.
I think the amount of skipped tests is enough to show that there is a problem with the coverage. Also, as soon as the DCD parser works, these tests will not be skipped any more. And so even before we remove the explicit skip decorator.
Issue #260 Partial python 3 compatibility
This pull request brings a bunch of small fixes to improve MDAnalysis compatibility with Python 3. MDAnalysis can now be imported on python 3, and some tests pass.
This PR, among other things deals, with:
map
returning a generator on python 3cStringIO
being moved in theio
modulecPickle
being renamedpickle
iteritems
method of dict now pass throughsix
(see Apply Six fixes for Python 3 #632)next
method anymore, instead one should use thenext
built-in;A few changes may have side effects.
First of all, the DCD and LAMMPS coordinate parsers are made optional on python 3 but not on python 2. In practice, MDAnalysis will fail to load the parsers but will continue to run. The TRZ coordinate parser is also made optional. Indeed, there is a small incompatibility with python 3 and I was not sure about how to fix it.
Python 3 changes the conditions for an object to be hashable (and therefore usable as a dict key or as part of a set). In python 2, a user-defined object is hashable unless the
__hash__
method is overloaded to returnNone
. It is assumed that two objects that are equal (a == b
) will have the same hash. This assumption can break when a user overload the__eq__
method. This is why, in python 3, a user-defined object is not hashable by default is the__eq__
method get overloaded.MDAnalysis
, relies on instances ofTopologyObject
to be hashable so they can be parts of set. Yet the__eq__
method is overloaded, soTopologyObject
is not hashable in python 3. I wrote a__hash__
method forTopologyObject
and I wrote it so as two instances that are equal have the same hash. This differs from the behavior of these objects in python 2. Before this PR, in python 2, if two instances involve the same atoms but in reverse order, then they are equal but they have a different hash. With this PR, these instances would be equal, and with the same hash (so only one of them will be kept in a set). I am not sure that it is the expected behavior, but is seemed to be what was expected from the documentation.The following tests do pass:
The main reason for tests not to pass is that a lot of them rely on DCD test files. Since the DCD parser is not working on python 3, these tests do not pass.
An other reason for tests not to pass is that
util.anyopen
open text files as bytes rather than string. This is especially puzzling as I had to explicitly specify that TPR files had to be open as bytes. I am still investigating that point.Besides fixing all the small things, here is what remains to be done:
anyopen
problemI tried to understand each line I changed, but I may have overlooked things. So please, have a look.