-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1 A^3 boxes from EM structures in the PDB #2599
Comments
Yeah catching these boxes seems like a good idea and can’t really harm
anyone (except people with sub 1A boxes).
…On Mon, Mar 9, 2020 at 14:39, Irfan Alibay ***@***.***> wrote:
Context
This is related to this discussion:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/mdnalysis-discussion/IKYv18MqlFs
From a cursory look at a couple of systems (6NT5, 6CVM, 3JAC), it looks
like PDB files for EM structures have a tendency to have the following
CRYST1 records:
CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
As dicussed in the dicussion thread, this is causing issues with periodic
selections as it is folding the system into a cube of 1 A.
Proposed solution
My solution here would be to catch cases where the CRYST1 record is
defining a 1 A cube, warn the user and make the system dimensionless.
Code to reproduce the behavior
Very slightly altered code from Lucas Siemons' post.
import MDAnalysis as mdaimport numpy as np
u = mda.Universe('6cvm.pdb')
cur_sele = u.select_atoms('around 0.1 (resid 4 and name CA and segid A)')
ref_ca = u.select_atoms('resid 4 and name CA and segid A')
print('sele len: ', len(cur_sele))for i in cur_sele:
dist = np.linalg.norm(i.position-ref_ca[0].position)
print('Distance: %f '%(dist), i)
Current version of MDAnalysis
- Which version are you using? (run python -c "import MDAnalysis as
mda; print(mda.__version__)") 0.20.2-dev0
- Which version of Python (python -V)? 3.7.6
- Which operating system? Ubuntu 18.04.3 LTS
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#2599?email_source=notifications&email_token=ACGSGB4NWPD7HLCGCI2RWPDRGT5SXA5CNFSM4LEKGWN2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4ITS5R4Q>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACGSGB4BNB6JCEEE7GMAPOTRGT5SXANCNFSM4LEKGWNQ>
.
|
@siddharthjain1611 this issue is indeed currently free to work on if you wish to do so. I would however suggest possibly not over-extending yourself by looking to fix multiple issues concurrently (namely #2469), and instead focusing on one so that you can get a PR merged before the GSOC application window opens. Doing so means that it also doesn't block off issues from other prospective GSOC applicants. Edit: looks like the comment this was aimed at has now been deleted. |
Sir, I would first focus on the other issue, therefore, please feel free to assign someone else to this issue who is more worthy. |
Note:
Technically, it’s the users responsibility to provide correct input. In practice, this can easily lead to very wrong results when working with PDB files. I really dislike having to put a lot of special casing in readers but if
is a common thing in cryo-EM structures (which will soon make up a significant portion of the PDB) then we can’t just slip them through. Perhaps we should have a way to set the box to None (or whatever we use for no box) when we encounter either these magic box size or if we have another clear indication that the box is non-sensical. TODO
|
So the codes pasted above give no dimensions info when using We could try and do some sort of string match in the remarks if we really wanted, but I just think it's fine to warn and delete the box info. |
So to tackle the first "TODO" from @orbeckst's post, I had a quick thrawl through the RSCB PDB to investigate the frequency of cryo-EM PDB CRYST1 records matching:
Out of 2286 entries matching the a text search for "cryo-em", 2109 (~92%) matched the above CRYST1 record. I've not had a thorough look at the cases where the CRYST1 record was different, but in ~ dozen cases it was a case that the CRYST1 record was partially empty in some way, in ~ 50 cases there were no CRYST1 records at all, in the remainder of cases some kind of non-1 A^3 box was defined. |
I really don't want to start digging through REMARKS. Let's define what we mean by "a sensible box", document, throw it out & warn by default, and give user an option to keep whatever rubbish is in the file if they really want to. |
is someone currently working on this issue? |
@smrpn, as far as I can see there's no PR linked to this issue at the moment so by all means go ahead and put up a PR. |
Hello! |
@GomaruSai i'm not |
@GomaruSai as far as I'm aware no one has opened a PR on this, please do go ahead and open one if you'd like to. |
Hi! I'm trying this out. If you trace the issue back to groups.py in the core module, it seems like unless the periodic key in the dictionary is not specified as periodic=False (during select_atoms function call) it would take it as True by default. Could you (@IAlibay) kindly clarify this ? |
I was under the impression that @GomaruSai wanted to work on this issue.
I'm not sure I follow you fully here, but yes this is happening because periodicity is applied by default if a CRYST record exist in the PDB. That being said, we wouldn't want to change the default value of the periodicity flag. Instead we just want to pick up a case where an 1 A^3 box is defined as a possible EM structure that was given a default non-real CRYST record. On identifying this, we'd then just warn the user, and not set the box dimensions. |
I'm kind-of struggling to find the genesis of this default behavior on encountering CRYST. I found that the PDB.py has a section that reads CRYST1 but it doesn't explain the behavior. Any hints on what I might be missing? |
So the section you're probably looking for is: mdanalysis/package/MDAnalysis/coordinates/PDB.py Lines 391 to 400 in 5f51383
What we want to do is identify where the unit cell values being fed here define a 1 A^3 box, i.e. when the CRYST1 record is: CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1 .
|
Context
This is related to this discussion: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/mdnalysis-discussion/IKYv18MqlFs
From a cursory look at a couple of systems (6NT5, 6CVM, 3JAC), it looks like PDB files for EM structures have a tendency to have the following CRYST1 records:
CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
As dicussed in the dicussion thread, this is causing issues with periodic selections as it is folding the system into a cube of 1 A.
Proposed solution
My solution here would be to catch cases where the CRYST1 record is defining a 1 A cube, warn the user and make the system dimensionless.
Code to reproduce the behavior
Very slightly altered code from Lucas Siemons' post.
Current version of MDAnalysis
python -c "import MDAnalysis as mda; print(mda.__version__)"
) 0.20.2-dev0python -V
)? 3.7.6The text was updated successfully, but these errors were encountered: