Light Parser for Sparse OTU Tables

adamrp edited this page Oct 17, 2013 · 2 revisions

An alternative lightweight parser is available for sparse OTU tables. It is faster and more memory-efficient than the standard parser, but there are a few caveats:

  1. It can parse only sparse OTU tables -- it cannot parse dense tables, and it will treat all tables as OTU tables regardless of the actual table type;
  2. The parser works by reading 5000 characters of the input file at a time and searching each chunk for the identifiers of particular fields. If an identifier straddles a multiple of 5000 characters, so that it cannot be read in a single chunk, the parse will fail;
  3. The parser is not well-tested, but has produced the same results as the standard parser in all trials so far (as of 10-17-2013).
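The chunk-boundary caveat in item 2 can be illustrated with a minimal sketch. This is not the light parser's actual code: the function name `found_by_chunked_search` and the identifier `"data"` are assumptions chosen for illustration. It only shows why a search that inspects one fixed-size chunk at a time misses an identifier that is split across two chunks.

```python
import io


def found_by_chunked_search(stream, identifier, chunk_size=5000):
    """Return True if `identifier` lies wholly inside one chunk of `stream`.

    Hypothetical helper: each chunk is searched on its own, so an
    identifier split across two chunks is never seen in full.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        if identifier in chunk:
            return True


identifier = '"data"'  # assumed field identifier, for illustration only

# The identifier sits entirely inside the first 5000-character chunk.
inside = "x" * 100 + identifier + "x" * 10000

# The identifier starts 3 characters before the 5000-character boundary,
# so its first half ends one chunk and its second half starts the next.
straddling = "x" * 4997 + identifier + "x" * 10000

print(found_by_chunked_search(io.StringIO(inside), identifier))      # True
print(found_by_chunked_search(io.StringIO(straddling), identifier))  # False
```

In the second case the identifier is present in the input, yet no single chunk contains it, which is the situation in which the light parser reports that a section could not be found.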

Notes

  • If you attempt to parse a dense biom table, the parser will fail silently: no error is raised, but all data will be missing from the resulting table.
  • If you parse a biom table that is not an OTU table, the table will be converted into an OTU table; the special behaviors associated with other table types (metabolite table, gene table, pathway table, etc.) will not apply.
  • If one of the identifiers mentioned in (2) above cannot be found for any reason, including the boundary problem described there, the parser will raise an exception with the message "Did not find _____ section in biom file" (where _____ is the name of the identifier it could not find).
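Because the first two failure modes are silent, it may help to check a file once before relying on the light parser. The sketch below is an assumption-laden illustration, not part of biom-format: the helper name `light_parser_problems` is hypothetical, and it assumes a JSON-based biom file whose top-level `matrix_type` and `type` fields describe the table. It loads the whole file with Python's standard `json` module, so it is meant as a one-off sanity check rather than something to run on every parse.

```python
import json
import os
import tempfile


def light_parser_problems(biom_path):
    """Return reasons the light parser may mishandle this file (hypothetical helper)."""
    with open(biom_path) as handle:
        table = json.load(handle)

    problems = []
    matrix_type = table.get("matrix_type")
    table_type = table.get("type")
    if matrix_type != "sparse":
        # Dense tables are parsed without error but come back empty.
        problems.append("matrix_type is {!r}, expected 'sparse'".format(matrix_type))
    if table_type != "OTU table":
        # Other table types are silently treated as OTU tables.
        problems.append("type is {!r}, expected 'OTU table'".format(table_type))
    return problems


# Usage example with a tiny made-up file (not a complete biom table).
example = {"type": "Gene table", "matrix_type": "dense", "data": [[1, 0], [0, 2]]}
path = os.path.join(tempfile.mkdtemp(), "example.biom")
with open(path, "w") as handle:
    json.dump(example, handle)

for problem in light_parser_problems(path):
    print(problem)
```

An empty list means neither of the two silent failure modes applies; it says nothing about the chunk-boundary problem, which still surfaces as an exception.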

How to get this version of the light parser:

  1. If you don't already have the latest version of biom-format, get it by running
    git clone https://github.com/biom-format/biom-format.git
    Or if you already have biom-format and you need to update it to the latest version, run
    git pull https://github.com/biom-format/biom-format.git master
  2. Try merging the light parser with the latest version by running
    git pull https://github.com/adamrp/biom-format.git update_light_parser

At this point, you may see that there are conflicts. If so, run

git checkout master

and revert to the last known compatible version by running

git checkout 296f95d223d5de7612b22498fa4e25232ebf90f9

and then retry step (2) above.

You may get a message that you are in a detached HEAD state. At this point, you should run

git checkout -b update_light_parser

to save the light parser and the compatible biom-format version as a branch.

The light parser is now active and will be used in place of the standard parser, e.g., when you run your usual QIIME commands. You can test the parser by running, e.g., summarize_taxa.py on any biom-formatted OTU table:

summarize_taxa.py -i example.biom -o example_summarized_taxa

To change back to the standard parser, check out the master branch instead of the update_light_parser branch.