Should readers demand an `n_atoms` kwarg #733

kain88-de · 2016-02-23T09:06:31Z

I've started to add the new standard reader tests to our PDB tests. I've noticed now that PrimitivePDBReader and ExtendedPDBReader expect an natoms kwarg. None of the other two readers (XDR, XYZ) I have adopted so far expects this kwarg. Does our topology system a check of the n_atoms read by the trajectory?

Questions is should the natom kwarg be removed or become an explicit part of the reader API?

The text was updated successfully, but these errors were encountered:

richardjgowers · 2016-02-23T10:26:12Z

I think a lot of the binary readers need the keyword in order to read their files, so Universe always passes the kwarg (and sometimes it's ignored).

We then check that Reader and Topology had the same number of atoms here:

mdanalysis/package/MDAnalysis/core/AtomGroup.py

Line 4577 in c81ece3

if self.trajectory.n_atoms != self.atoms.n_atoms:

kain88-de · 2016-02-23T10:45:15Z

If they truly need to know the number of atoms in advance we should make it an explicit part of the reader API.

richardjgowers · 2016-02-23T12:00:53Z

https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/__init__.py#L430

It's kind of there? But yeah, not completely clear

kain88-de · 2016-02-23T12:23:56Z

OK so it is a requirement. This is ok. I can check the requirements then in the tests.

orbeckst · 2016-02-24T00:49:12Z

Readers can ignore n_atoms in the list of arguments that they get so it is not a requirement for the reader that it must be processed in any way; that's why we used kwargs instead of explicit naming. See the Trajectory Reader specs in the Trajectory API docs. We don't really have a API for topology things... probably should have one with/after #363.

kain88-de · 2016-02-25T09:48:44Z

I wrote a script to check which Readers currently require the n_atoms kwarg, see bottom.
The following readers currently trigger an exception if n_atoms isn't passed as an argument.

TRJReader, PrimitivePDBReader, ExtentedPDBReader, LAMMPS.DATAReader, TRZReader

From these the PDB readers don't really need the kwarg. LAMMPS also looks like it can infer the number of atoms from the trajectory itself. For the TRJ and TRZ format we need number of atoms beforehand because they don't encode it separately in the trajectory file. From our implementation I also don't see a way that we could infer the number of atoms reliable from the trajectory.

Personally I would like if the API to construct a Reader to be consistent between all formats. We have currently two formats which need a to know n_atoms to be parsed. Because of that I would say that all Readers require a n_atoms kwarg independent of if we can infer n_atoms from the trajectory or not.

Test Script

import MDAnalysis as mda
from os.path import basename

from MDAnalysisTests.datafiles import (CRD, DCD, DMS, GRO, INPCRD, PDB, PQR,
                                       TRJ, TRR, TRZ, GMS_ASYMOPT, DLP_CONFIG,
                                       DLP_HISTORY, XTC, LAMMPScnt,
                                       mol2_molecules, PDBQT_input, XYZ)


c = mda.coordinates
readers = {c.CRD.CRDReader: CRD,
           c.DCD.DCDReader: DCD,
           c.DMS.DMSReader: DMS,
           c.GRO.GROReader: GRO,
           c.INPCRD.INPReader: INPCRD,
           c.PDB.PDBReader: PDB,
           c.PDB.PrimitivePDBReader: PDB,
           c.PDB.ExtendedPDBReader: PDB,
           c.PQR.PQRReader: PQR,
           c.TRJ.TRJReader: TRJ,
           c.TRR.TRRReader: TRR,
           c.TRZ.TRZReader: TRZ,
           c.XTC.XTCReader: XTC,
           c.XYZ.XYZReader: XYZ,
           c.DLPoly.ConfigReader: DLP_CONFIG,
           c.DLPoly.HistoryReader: DLP_HISTORY,
           c.GMS.GMSReader: GMS_ASYMOPT,
           c.MOL2.MOL2Reader: mol2_molecules,
           c.PDBQT.PDBQTReader: PDBQT_input,
           c.LAMMPS.DATAReader: LAMMPScnt,
           }


for read, topo in readers.items():
    # print("using Reader = {}".format(read))
    try:
        read(topo)
    except Exception as e:
        print("open failed for format {}".format(basename(topo)))
        print("used Reader = {}".format(read))
        print("Exception = {}", e)
        print("")

mnmelo · 2016-02-25T10:09:04Z

I second @orbeckst's view. Readers that don't need n_atoms can just ignore the kwarg. We just need to properly document this aspect of the API.

I would definitely not make it mandatory for every Reader. It brings unneeded work for the user for the majority of the cases.
Plus, it raises the question: what happens when a user supplies n_atoms different from the trajectory's? Is the kwarg ignored? Then it makes little sense demanding it be present.

richardjgowers · 2016-02-25T10:17:45Z

So I think I wrote DATAReader, and if I remember right I had in my mind that because n_atoms is read in the topology the Reader shouldn't have to do it. But yeah, upon reflection it makes more sense for the Reader to work standalone, so that's an issue.

TRZReader I could get to work without n_atoms as each written fortran block is prefixed by its size in bytes, and you can figure out how many floats (==natoms). But that's a bit hacky.

@kain88-de you've forgot TRJ.NCDF, it can calculate n_atoms on its own.

I don't see a problem with an optional requirement, it just reflects the formats strengths/weaknesses

kain88-de · 2016-02-25T10:57:28Z

The "problem" comes with the standardized tests BaseReaderTest for the Readers I introduced last fall.
It expects that all Readers can be constructed with exactly the same calls, right now just a string pointing to a trajectory file. It wouldn't be to hard to overwrite the __init__ function for the formats that need to have the n_atoms kwarg.

If we are all OK to have Readers by default work without any topology information. I can keep BaseReaderTest as it is and add documentation what should happen for formats where we need additional information besides just the filename.

As far as documentation goes for users. I would remove the hint about the kwargs and say only that some formats need additional information from the topology to be read and that more information can be found in the doc-strings of the specific format readers.

orbeckst added question API Component-Readers labels Feb 24, 2016

richardjgowers mentioned this issue Jul 22, 2016

better documentation for Reader API and n_atoms kwarg #910

Merged

2 tasks

jbarnoud closed this as completed in #910 Jul 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should readers demand an `n_atoms` kwarg #733

Should readers demand an `n_atoms` kwarg #733

kain88-de commented Feb 23, 2016

richardjgowers commented Feb 23, 2016

kain88-de commented Feb 23, 2016

richardjgowers commented Feb 23, 2016

kain88-de commented Feb 23, 2016

orbeckst commented Feb 24, 2016

kain88-de commented Feb 25, 2016

mnmelo commented Feb 25, 2016

richardjgowers commented Feb 25, 2016

kain88-de commented Feb 25, 2016

Should readers demand an n_atoms kwarg #733

Should readers demand an n_atoms kwarg #733

Comments

kain88-de commented Feb 23, 2016

richardjgowers commented Feb 23, 2016

kain88-de commented Feb 23, 2016

richardjgowers commented Feb 23, 2016

kain88-de commented Feb 23, 2016

orbeckst commented Feb 24, 2016

kain88-de commented Feb 25, 2016

Test Script

mnmelo commented Feb 25, 2016

richardjgowers commented Feb 25, 2016

kain88-de commented Feb 25, 2016

Should readers demand an `n_atoms` kwarg #733

Should readers demand an `n_atoms` kwarg #733