-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use mrcfile library for MRC/CCP4 file parsing #100
Conversation
orbeckst
commented
Dec 30, 2021
- fix CCP4 file format is out of date #83
- use mrcfile https://github.com/ccpem/mrcfile (light-weight dependency) for file parsing
- new mrc module; functionality replaces CCP4 module
- keep CCP4 module for one release cycle
- add tests
Hello @orbeckst! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-02-20 21:01:37 UTC |
Codecov Report
@@ Coverage Diff @@
## master #100 +/- ##
==========================================
+ Coverage 87.69% 88.07% +0.37%
==========================================
Files 5 6 +1
Lines 821 855 +34
Branches 141 146 +5
==========================================
+ Hits 720 753 +33
Misses 60 60
- Partials 41 42 +1
Continue to review full report at Codecov.
|
For reviewers: I intend to ignore the PEP8 formatting warnings... if you feel strongly about it then I suggest we run everything through black from here on out. I don't have the time to spare to manually prettify code that started out pretty ugly anyway. |
@MDAnalysis/coredevs if you want to have a quick look then that would be much appreciated. If you think it's good enough then you can approve it — I am mainly looking for a second pair of eyes to avoid really stupid mistakes. If nobody has time I'll merge it by the end of the week latest in order to get a release out. |
I've penciled time in to review on Thursday if that's ok? (assuming no one else gets to review first) |
@orbeckst, thank you for implementing this. Doing so has been on my to-do list for a long time but never managed to get to it. The code looks good to me but I have two questions about functionality.
I can add writing MRC files for orthorhombic cells, if you like. Once I understand the issues for non-orthorhombic cells, I should be able to add read/write support for these as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything but the one unnecessary import is a question.
gridData/mrc.py
Outdated
from __future__ import absolute_import | ||
from six.moves import range | ||
|
||
import warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unecessary import? (unless I've missed something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
if filename is not None: | ||
self.filename = filename | ||
with mrcfile.open(filename) as mrc: | ||
if not mrc.is_volume(): #pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is calling validate worth it too? (not sure how much of an issue a non-valid mrc file is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to read a file with an iffy header and just get the data. Not sure, TBH.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No implementing for now because I don't understand the implication/standard use.
if not mrc.is_volume(): #pragma: no cover | ||
raise ValueError( | ||
"MRC file {} is not a volumetric density.".format(filename)) | ||
self.header = h = mrc.header.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any difference between header and extended_header here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't know — experts??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://www.ccpem.ac.uk/mrc_format/mrc2014.php,
The extended header is now used by different software to hold various additional metadata instead, as indicated by the EXTTYP tag.
The EXTTYP
flag takes seven different values. You could make a copy incase the user wants it. Note that self.header
does not seem to be accessible from Grid
or, at least, it is not obvious to me how to access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to have access to header
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a personal-use case in mind. But, if it isn't difficult, it could be beneficial in cases we haven't thought of yet. Not everything in the header
ends up in a Grid
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding the header as undocumented Grid._mrc_header
. If it is used then we can think of a better way to keep format-specific data.
"supported, not " | ||
"alpha={0}, beta={1}, gamma={2}".format( | ||
h.cellb.alpha, h.cellb.beta, h.cellb.gamma)) | ||
# mrc.data[z, y, x] indexed: convert to x,y,z as used in GridDataFormats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do further manipulations here?
Note that the MRC format allows the data axes to be swapped using the header’s mapc, mapr and maps fields. This library does not attempt to swap the axes and simply assigns the columns to X, rows to Y and sections to Z. (The data array is indexed in C style, so data values can be accessed using mrc.data[z][y][x].) In general, EM data is written using the default axes, but crystallographic data files might use swapped axes in certain space groups – if this might matter to you, you should check the mapc, mapr and maps fields after opening the file and consider transposing the data array if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. If the file has mapc, mapr, maps = 3, 2, 1
, then the data should already be in C-order and np.transpose()
would be unnecessary. mapc, mapr, maps = 1, 2, 3
is Fortran-order. It might be best to check for these two cases and disallow the rest. np.transpose()
should be able to handle the other cases but it they are probably rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this needs to be fixed then. Perhaps that fixes #76, too.
Code welcome!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately, this and calculating the origin correctly did fix #76
gridData/tests/test_mrc.py
Outdated
]) | ||
def test_ccp4_read_header(ccp4data, name, value): | ||
if type(value) is float: | ||
assert_almost_equal(ccp4data.header[name], value, decimal=6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention is now to use assert_allclose
instead of assert_almost_equal
.
I fee like there should be a nicer way to do this comparison, but at the end of the day it works so there's no point trying to find the prettiest answer to this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YEs, can be changed, of course. I copied & pasted what we had before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kept it,,, need sleep
Thanks for review and comments. This week I definitely won't have time to look at anything, sorry. |
@tluchko can we try to get the simple reading code in and then add the enhancements, namely writing? (EDIT: I'd ❤️ a code contribution from you for the writing) For the non-orthorhombic cells I can't quite remember why I restricted it to orthorhombic. There might be something in the Grid class that's not working with triclinic cells. I'd try to get this PR done with the limitation and then raise a new issue for triclinic cells. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested some code to deal with the different potential orderings of mrc.data
.
The code with sorted mapc/r/s transposed axes does not fix #76 . In ChimeraX
1cbs.ccp4 :
This looks as if mrcfile does indeed always reorder the data into z, x, y order. |
@tluchko and @IAlibay I addressed most of your comments and after a long time thinking about what mrcfile does and what the mapc/r/s parameter mean I managed to get the conversion of MRC data to Grid to work (and fixed the super-irritating #76 along the way). This should be good for a final review. If you can find a nicer way to write the double-transposition axes_order = np.hstack([h.mapc, h.mapr, h.maps])
axes_c_order = np.argsort(axes_order)
self.array = np.transpose(np.transpose(mrc.data), axes=axes_c_order) then let me know. However, just reversing |
Honestly, I don't know why reversing self.array = np.transpose(mrc.data, axes=axes_c_order[::-1]) gives the wrong shape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested some revised code to avoid the double transpose of the data and match the delta and origin reported for test files in ChimeraX and VMD.
- fix #83 - use mrcfile https://github.com/ccpem/mrcfile (light-weight dependency) for file parsing - new mrc module; functionality replaces CCP4 module - keep CCP4 module for one release cycle - add docs for MRC - tests for mrc - basic tests (copied from CCP4 tests) - test mrc ValueError for triclinic boxes with EMD 3001 - test direct file init and read() on empty class - test almost all header elements for MRC/CCP4
- deprecation warning raised - test for warning - add deprecation to docs
- mrcfile stores data in z,y,x format: we use the information of the axis orientation in mapc, mapr. maps to orient the data in the GridDataFormats x,y,z coordinate system - correctly reorient and center coordinates from MRC - fix #76 - add tests - mrcfile only reshapes the data to nz,ny,nx but does not take the mapc, mapr, maps into account. For gridData we need to do the coordinate system reorientation after transposing the mrcfile.data to nx, ny, nz. - add test (using CCP4_1JZV from the test data which contains mapc, mapr, maps = 2, 1, 3 - manually checked 1cbs.ccp4 from issue #76 (also a 2, 1, 3) and orientation is now correct but there is still an offset error that needs to be fixed - update CHANGELOG (and add @tluchko) Co-authored-by: tluchko <31545346+tluchko@users.noreply.github.com>
- add versionchanged noting use of MRC instead of CCP4 - add undocumented Grid._mrc_header for storing the MRC header - add tests (for _mrc_header and generally checking that a correct Grid object is built from the MRC data) - code/doc cleanup
I rewrote the history so that this can be merged as is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest here I don't really have enough time to fully understand what is being done with the MRC file transpose. From a quick look it seems to make sense and being no MRC expert I am happy to default to yours and @tluchko's expertise here.
I'm not sure what was meant by https://github.com/MDAnalysis/GridDataFormats/pull/100/files#r810475496 but I've added a couple of suggested changes for assert_almost_equal -> assert_allclose if it's still of interest to make the switch.
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
Following up on #100 (comment), @tluchko please open issues for
Thanks! |
Thank you @orbeckst! |