Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use mrcfile library for MRC/CCP4 file parsing #100

Merged
merged 6 commits into from
Feb 20, 2022
Merged

use mrcfile library for MRC/CCP4 file parsing #100

merged 6 commits into from
Feb 20, 2022

Conversation

orbeckst
Copy link
Member

@pep8speaks
Copy link

pep8speaks commented Dec 30, 2021

Hello @orbeckst! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 169:80: E501 line too long (82 > 79 characters)
Line 206:80: E501 line too long (83 > 79 characters)

Line 118:80: E501 line too long (81 > 79 characters)

Line 96:63: E262 inline comment should start with '# '
Line 98:80: E501 line too long (80 > 79 characters)
Line 106:80: E501 line too long (81 > 79 characters)
Line 107:80: E501 line too long (84 > 79 characters)
Line 109:80: E501 line too long (89 > 79 characters)
Line 110:80: E501 line too long (87 > 79 characters)
Line 111:80: E501 line too long (84 > 79 characters)
Line 112:80: E501 line too long (89 > 79 characters)
Line 120:80: E501 line too long (98 > 79 characters)
Line 123:80: E501 line too long (102 > 79 characters)
Line 125:80: E501 line too long (83 > 79 characters)
Line 149:1: W391 blank line at end of file

Line 13:80: E501 line too long (84 > 79 characters)

Line 16:80: E501 line too long (90 > 79 characters)
Line 33:80: E501 line too long (90 > 79 characters)

Line 13:1: E302 expected 2 blank lines, found 1
Line 17:1: E302 expected 2 blank lines, found 1
Line 24:1: E302 expected 2 blank lines, found 1
Line 27:1: E302 expected 2 blank lines, found 1
Line 30:1: E302 expected 2 blank lines, found 1
Line 39:1: E303 too many blank lines (3)
Line 43:1: E302 expected 2 blank lines, found 1
Line 56:25: E128 continuation line under-indented for visual indent
Line 58:80: E501 line too long (89 > 79 characters)
Line 75:80: E501 line too long (92 > 79 characters)
Line 76:80: E501 line too long (92 > 79 characters)
Line 77:80: E501 line too long (92 > 79 characters)
Line 78:80: E501 line too long (92 > 79 characters)
Line 79:80: E501 line too long (92 > 79 characters)
Line 80:80: E501 line too long (92 > 79 characters)
Line 81:80: E501 line too long (92 > 79 characters)
Line 82:80: E501 line too long (92 > 79 characters)
Line 83:80: E501 line too long (92 > 79 characters)
Line 84:80: E501 line too long (93 > 79 characters)
Line 93:1: E302 expected 2 blank lines, found 1
Line 99:1: E302 expected 2 blank lines, found 1
Line 101:30: E203 whitespace before ','
Line 101:41: E202 whitespace before ']'
Line 102:19: E203 whitespace before ','
Line 102:41: E202 whitespace before ']'
Line 103:19: E203 whitespace before ','
Line 103:30: E203 whitespace before ','
Line 105:1: E302 expected 2 blank lines, found 1
Line 110:75: E202 whitespace before ']'
Line 112:1: E302 expected 2 blank lines, found 1
Line 118:1: E302 expected 2 blank lines, found 1

Comment last updated at 2022-02-20 21:01:37 UTC

@codecov
Copy link

codecov bot commented Dec 30, 2021

Codecov Report

Merging #100 (3129570) into master (01dfa7d) will increase coverage by 0.37%.
The diff coverage is 97.43%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
+ Coverage   87.69%   88.07%   +0.37%     
==========================================
  Files           5        6       +1     
  Lines         821      855      +34     
  Branches      141      146       +5     
==========================================
+ Hits          720      753      +33     
  Misses         60       60              
- Partials       41       42       +1     
Impacted Files Coverage Δ
gridData/__init__.py 100.00% <ø> (ø)
gridData/mrc.py 96.96% <96.96%> (ø)
gridData/CCP4.py 86.66% <100.00%> (+0.14%) ⬆️
gridData/core.py 94.40% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01dfa7d...3129570. Read the comment docs.

@orbeckst
Copy link
Member Author

For reviewers: I intend to ignore the PEP8 formatting warnings... if you feel strongly about it then I suggest we run everything through black from here on out. I don't have the time to spare to manually prettify code that started out pretty ugly anyway.

@orbeckst
Copy link
Member Author

orbeckst commented Jan 3, 2022

@MDAnalysis/coredevs if you want to have a quick look then that would be much appreciated. If you think it's good enough then you can approve it — I am mainly looking for a second pair of eyes to avoid really stupid mistakes.

If nobody has time I'll merge it by the end of the week latest in order to get a release out.

@IAlibay
Copy link
Member

IAlibay commented Jan 3, 2022

If nobody has time I'll merge it by the end of the week latest in order to get a release out.

I've penciled time in to review on Thursday if that's ok? (assuming no one else gets to review first)

@tluchko
Copy link
Contributor

tluchko commented Jan 8, 2022

@orbeckst, thank you for implementing this. Doing so has been on my to-do list for a long time but never managed to get to it.

The code looks good to me but I have two questions about functionality.

  1. Why are non-orthorhombic unit cells not supported? It seems they are supported for the DX format, as I get no complaints when I load such a file.
  2. Can we add support for writing MRC files? I played around with this and I think it will be straightforward. However, I don't know how to recover the unit cell angles for non-orthorhombic from a Grid object.

I can add writing MRC files for orthorhombic cells, if you like. Once I understand the issues for non-orthorhombic cells, I should be able to add read/write support for these as well.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything but the one unnecessary import is a question.

gridData/CCP4.py Show resolved Hide resolved
gridData/tests/datafiles/__init__.py Show resolved Hide resolved
gridData/core.py Show resolved Hide resolved
gridData/mrc.py Outdated
from __future__ import absolute_import
from six.moves import range

import warnings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unecessary import? (unless I've missed something)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

if filename is not None:
self.filename = filename
with mrcfile.open(filename) as mrc:
if not mrc.is_volume(): #pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is calling validate worth it too? (not sure how much of an issue a non-valid mrc file is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to read a file with an iffy header and just get the data. Not sure, TBH.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No implementing for now because I don't understand the implication/standard use.

if not mrc.is_volume(): #pragma: no cover
raise ValueError(
"MRC file {} is not a volumetric density.".format(filename))
self.header = h = mrc.header.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any difference between header and extended_header here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know — experts??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://www.ccpem.ac.uk/mrc_format/mrc2014.php,

The extended header is now used by different software to hold various additional metadata instead, as indicated by the EXTTYP tag.

The EXTTYP flag takes seven different values. You could make a copy incase the user wants it. Note that self.header does not seem to be accessible from Grid or, at least, it is not obvious to me how to access it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to have access to header ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a personal-use case in mind. But, if it isn't difficult, it could be beneficial in cases we haven't thought of yet. Not everything in the header ends up in a Grid object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am adding the header as undocumented Grid._mrc_header. If it is used then we can think of a better way to keep format-specific data.

gridData/mrc.py Show resolved Hide resolved
"supported, not "
"alpha={0}, beta={1}, gamma={2}".format(
h.cellb.alpha, h.cellb.beta, h.cellb.gamma))
# mrc.data[z, y, x] indexed: convert to x,y,z as used in GridDataFormats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do further manipulations here?

Note that the MRC format allows the data axes to be swapped using the header’s mapc, mapr and maps fields. This library does not attempt to swap the axes and simply assigns the columns to X, rows to Y and sections to Z. (The data array is indexed in C style, so data values can be accessed using mrc.data[z][y][x].) In general, EM data is written using the default axes, but crystallographic data files might use swapped axes in certain space groups – if this might matter to you, you should check the mapc, mapr and maps fields after opening the file and consider transposing the data array if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. If the file has mapc, mapr, maps = 3, 2, 1, then the data should already be in C-order and np.transpose() would be unnecessary. mapc, mapr, maps = 1, 2, 3 is Fortran-order. It might be best to check for these two cases and disallow the rest. np.transpose() should be able to handle the other cases but it they are probably rare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this needs to be fixed then. Perhaps that fixes #76, too.

Code welcome!!!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, this and calculating the origin correctly did fix #76

])
def test_ccp4_read_header(ccp4data, name, value):
if type(value) is float:
assert_almost_equal(ccp4data.header[name], value, decimal=6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention is now to use assert_allclose instead of assert_almost_equal.

I fee like there should be a nicer way to do this comparison, but at the end of the day it works so there's no point trying to find the prettiest answer to this problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YEs, can be changed, of course. I copied & pasted what we had before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept it,,, need sleep

@orbeckst
Copy link
Member Author

Thanks for review and comments. This week I definitely won't have time to look at anything, sorry.

@orbeckst
Copy link
Member Author

orbeckst commented Jan 11, 2022

@tluchko can we try to get the simple reading code in and then add the enhancements, namely writing? (EDIT: I'd ❤️ a code contribution from you for the writing)

For the non-orthorhombic cells I can't quite remember why I restricted it to orthorhombic. There might be something in the Grid class that's not working with triclinic cells. I'd try to get this PR done with the limitation and then raise a new issue for triclinic cells.

Copy link
Contributor

@tluchko tluchko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested some code to deal with the different potential orderings of mrc.data.

gridData/mrc.py Outdated Show resolved Hide resolved
@orbeckst
Copy link
Member Author

The code with sorted mapc/r/s transposed axes does not fix #76 . In ChimeraX

  • Opened 1cbs.ccp4 as 2, grid size 81,70,96, pixel 0.601,0.595,0.588, (= y, x, z)
  • Opened 1cbs.dx as 3, grid size 70,96,81, pixel 0.601,0.595,0.588 (=x, z, y)

1cbs.ccp4 :

  • mapc, mapr, maps: 2, 1, 3.
  • nx, ny, nz: 70, 81, 96
  • mrc.data.shape = (96, 81, 70) (automatically reordered into z, x, y)

This looks as if mrcfile does indeed always reorder the data into z, x, y order.

orbeckst added a commit that referenced this pull request Feb 19, 2022
@orbeckst orbeckst linked an issue Feb 19, 2022 that may be closed by this pull request
@orbeckst
Copy link
Member Author

@tluchko and @IAlibay I addressed most of your comments and after a long time thinking about what mrcfile does and what the mapc/r/s parameter mean I managed to get the conversion of MRC data to Grid to work (and fixed the super-irritating #76 along the way). This should be good for a final review.

If you can find a nicer way to write the double-transposition

            axes_order = np.hstack([h.mapc, h.mapr, h.maps])
            axes_c_order = np.argsort(axes_order)
            self.array = np.transpose(np.transpose(mrc.data), axes=axes_c_order)

then let me know. However, just reversing axes_c_order does not work as far as I can tell... there must be a way to permute the axes_c_order directly but I am too tired to figure it out. (The first transpose is always (210) (rearrange zyx to xyz), the second one depends on the mapc/r/s values and is in the test (102) (x is really y, y is really x, and z is z). The end result needs to be (201) ((zyx) -> (yxz)).

@orbeckst
Copy link
Member Author

Honestly, I don't know why reversing axes_order_c did not work for me because (102) reversed is (201) and that's also what you get when applying (210) to (102). But

self.array = np.transpose(mrc.data, axes=axes_c_order[::-1])

gives the wrong shape.

Copy link
Contributor

@tluchko tluchko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested some revised code to avoid the double transpose of the data and match the delta and origin reported for test files in ChimeraX and VMD.

gridData/mrc.py Outdated Show resolved Hide resolved
orbeckst and others added 4 commits February 20, 2022 12:00
- fix #83
- use mrcfile https://github.com/ccpem/mrcfile (light-weight dependency) for file parsing
- new mrc module; functionality replaces CCP4 module
- keep CCP4 module for one release cycle
- add docs for MRC
- tests for mrc
  - basic tests (copied from CCP4 tests)
  - test mrc ValueError for triclinic boxes with EMD 3001
  - test direct file init and read() on empty class
  - test almost all header elements for MRC/CCP4
- deprecation warning raised
- test for warning
- add deprecation to docs
- mrcfile stores data in z,y,x format: we use the information
  of the axis orientation in mapc, mapr. maps to orient the
  data in the GridDataFormats x,y,z coordinate system
- correctly reorient and center coordinates from MRC
  - fix #76
  - add tests
- mrcfile only reshapes the data to nz,ny,nx but does not take
  the mapc, mapr, maps into account. For gridData we need to
  do the coordinate system reorientation after transposing the
  mrcfile.data to nx, ny, nz.
- add test (using CCP4_1JZV from the test data which contains
  mapc, mapr, maps = 2, 1, 3
- manually checked 1cbs.ccp4 from issue #76 (also a 2, 1, 3)
  and orientation is now correct but there is still an offset error
  that needs to be fixed
- update CHANGELOG (and add @tluchko)

Co-authored-by: tluchko <31545346+tluchko@users.noreply.github.com>
- add versionchanged noting use of MRC instead of CCP4
- add undocumented Grid._mrc_header for storing the MRC header
- add tests (for _mrc_header and generally checking that a correct
  Grid object is built from the MRC data)
- code/doc cleanup
@orbeckst
Copy link
Member Author

I rewrote the history so that this can be merged as is.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be honest here I don't really have enough time to fully understand what is being done with the MRC file transpose. From a quick look it seems to make sense and being no MRC expert I am happy to default to yours and @tluchko's expertise here.

I'm not sure what was meant by https://github.com/MDAnalysis/GridDataFormats/pull/100/files#r810475496 but I've added a couple of suggested changes for assert_almost_equal -> assert_allclose if it's still of interest to make the switch.

gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
gridData/tests/test_mrc.py Outdated Show resolved Hide resolved
CHANGELOG Outdated Show resolved Hide resolved
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@orbeckst orbeckst self-assigned this Feb 20, 2022
@orbeckst orbeckst merged commit 07b4083 into master Feb 20, 2022
@orbeckst orbeckst deleted the mrcfile branch February 20, 2022 21:10
@orbeckst
Copy link
Member Author

Thank you @tluchko and @IAlibay !

@orbeckst
Copy link
Member Author

Following up on #100 (comment), @tluchko please open issues for

  • writing MRC files
  • handling of triclinic boxes

Thanks!

@tluchko
Copy link
Contributor

tluchko commented Feb 20, 2022

Thank you @orbeckst!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CCP4 file format is out of date Wrong conversion from CCP4 to DX
4 participants