While submitting #904, I was looking at this bit of the code in `biom/parse.py`, function `parse_uc` (Create a Table object from a uclust/usearch/vsearch uc file): https://github.com/biocore/biom-format/blob/2.1.14/biom/parse.py#L282

I think it would be more memory efficient to replace `defaultdict(int)` with `Counter()`, see PyCQA/flake8-bugbear#323. However, that probably deserves a little benchmarking by someone familiar with the code to see whether it actually matters for the memory overhead here?
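For context, a minimal sketch of the suggested swap, assuming the dict is only used to tally counts per key (the function name `tally_clusters` and its input are made up; the real `parse_uc` parses .uc records and builds a sparse Table, not a plain mapping):

```python
from collections import Counter

def tally_clusters(cluster_ids):
    """Count occurrences per cluster id.

    Simplified stand-in for the counting pattern in parse_uc;
    not the actual biom-format implementation.
    """
    # Was: counts = defaultdict(int)
    counts = Counter()
    for cluster_id in cluster_ids:
        counts[cluster_id] += 1  # increments behave the same as defaultdict(int)
    return counts
```

For increment-only use the two are interchangeable; the difference only appears when missing keys are read, as the replies below discuss.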
That's a really interesting observation. I think we're safe here though, as we are not testing for the presence of missing keys, but instead incrementing specific keys (see https://github.com/biocore/biom-format/blob/2.1.14/biom/parse.py#L338). In PyCQA/flake8-bugbear#323, I think what's driving the memory bloat is the `if threshold < counts[str(x)]` check forcing the creation of a key: default-value pair.
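To illustrate the distinction (the `threshold` value and key names here are made up, not taken from the biom code): incrementing a missing key behaves identically for `defaultdict(int)` and `Counter`, but merely reading a missing key stores a zero entry in a `defaultdict`, whereas a `Counter` returns 0 without storing anything.

```python
from collections import Counter, defaultdict

dd = defaultdict(int)
c = Counter()

# Incrementing specific keys: identical behavior, one entry each.
dd["seen"] += 1
c["seen"] += 1
assert dict(dd) == dict(c) == {"seen": 1}

# Reading a missing key, e.g. in a check like `threshold < counts[x]`:
threshold = 5
_ = threshold < dd["never_incremented"]  # inserts {'never_incremented': 0}
_ = threshold < c["never_incremented"]   # returns 0, stores nothing

assert "never_incremented" in dd         # the defaultdict grew
assert "never_incremented" not in c      # the Counter did not
```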
Probably OK then - yes, the memory bloat happens if you try to access the missing entries (because the defaultdict then adds a 0 entry for that key), and it is made worse if you use long strings as keys, as I was doing in my own code.
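A rough sketch of how that adds up with long string keys (the key format and count are invented, and `sys.getsizeof` only measures the dict container, not the key strings it keeps alive):

```python
import sys
from collections import Counter, defaultdict

long_keys = [f"sample_with_a_very_long_identifier_{i:06d}" for i in range(100_000)]

dd = defaultdict(int)
c = Counter()

# Look up keys that were never incremented, as a thresholding pass might.
for key in long_keys:
    _ = dd[key]  # each lookup stores a 0 entry and keeps the key string alive
    _ = c[key]   # returns 0 without storing anything

print(len(dd), sys.getsizeof(dd))  # 100000 entries, a multi-MB dict
print(len(c), sys.getsizeof(c))    # 0 entries, tiny
```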