-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should readers demand an n_atoms
kwarg
#733
Comments
I think a lot of the binary readers need the keyword in order to read their files, so Universe always passes the kwarg (and sometimes it's ignored). We then check that Reader and Topology had the same number of atoms here: mdanalysis/package/MDAnalysis/core/AtomGroup.py Line 4577 in c81ece3
|
If they truly need to know the number of atoms in advance we should make it an explicit part of the reader API. |
It's kind of there? But yeah, not completely clear |
OK so it is a requirement. This is ok. I can check the requirements then in the tests. |
Readers can ignore n_atoms in the list of arguments that they get so it is not a requirement for the reader that it must be processed in any way; that's why we used kwargs instead of explicit naming. See the Trajectory Reader specs in the Trajectory API docs. We don't really have a API for topology things... probably should have one with/after #363. |
I wrote a script to check which Readers currently require the
From these the PDB readers don't really need the kwarg. LAMMPS also looks like it can infer the number of atoms from the trajectory itself. For the TRJ and TRZ format we need number of atoms beforehand because they don't encode it separately in the trajectory file. From our implementation I also don't see a way that we could infer the number of atoms reliable from the trajectory. Personally I would like if the API to construct a Reader to be consistent between all formats. We have currently two formats which need a to know Test Scriptimport MDAnalysis as mda
from os.path import basename
from MDAnalysisTests.datafiles import (CRD, DCD, DMS, GRO, INPCRD, PDB, PQR,
TRJ, TRR, TRZ, GMS_ASYMOPT, DLP_CONFIG,
DLP_HISTORY, XTC, LAMMPScnt,
mol2_molecules, PDBQT_input, XYZ)
c = mda.coordinates
readers = {c.CRD.CRDReader: CRD,
c.DCD.DCDReader: DCD,
c.DMS.DMSReader: DMS,
c.GRO.GROReader: GRO,
c.INPCRD.INPReader: INPCRD,
c.PDB.PDBReader: PDB,
c.PDB.PrimitivePDBReader: PDB,
c.PDB.ExtendedPDBReader: PDB,
c.PQR.PQRReader: PQR,
c.TRJ.TRJReader: TRJ,
c.TRR.TRRReader: TRR,
c.TRZ.TRZReader: TRZ,
c.XTC.XTCReader: XTC,
c.XYZ.XYZReader: XYZ,
c.DLPoly.ConfigReader: DLP_CONFIG,
c.DLPoly.HistoryReader: DLP_HISTORY,
c.GMS.GMSReader: GMS_ASYMOPT,
c.MOL2.MOL2Reader: mol2_molecules,
c.PDBQT.PDBQTReader: PDBQT_input,
c.LAMMPS.DATAReader: LAMMPScnt,
}
for read, topo in readers.items():
# print("using Reader = {}".format(read))
try:
read(topo)
except Exception as e:
print("open failed for format {}".format(basename(topo)))
print("used Reader = {}".format(read))
print("Exception = {}", e)
print("") |
I second @orbeckst's view. Readers that don't need I would definitely not make it mandatory for every Reader. It brings unneeded work for the user for the majority of the cases. |
So I think I wrote DATAReader, and if I remember right I had in my mind that because n_atoms is read in the topology the Reader shouldn't have to do it. But yeah, upon reflection it makes more sense for the Reader to work standalone, so that's an issue. TRZReader I could get to work without n_atoms as each written fortran block is prefixed by its size in bytes, and you can figure out how many floats (==natoms). But that's a bit hacky. @kain88-de you've forgot TRJ.NCDF, it can calculate n_atoms on its own. I don't see a problem with an optional requirement, it just reflects the formats strengths/weaknesses |
The "problem" comes with the standardized tests If we are all OK to have Readers by default work without any topology information. I can keep As far as documentation goes for users. I would remove the hint about the |
I've started to add the new standard reader tests to our PDB tests. I've noticed now that
PrimitivePDBReader
andExtendedPDBReader
expect annatoms
kwarg. None of the other two readers (XDR, XYZ) I have adopted so far expects this kwarg. Does our topology system a check of then_atoms
read by the trajectory?Questions is should the
natom
kwarg be removed or become an explicit part of the reader API?The text was updated successfully, but these errors were encountered: