Expose --subsample-seed parameter
Didion, John (NIH/NHGRI) [F] committed May 1, 2017
1 parent 6228a8c commit d4a2cb0
Showing 4 changed files with 17 additions and 1 deletion.
8 changes: 7 additions & 1 deletion CHANGES.md
@@ -1,11 +1,17 @@
# Changes

v1.1.3 (dev)
v1.1.4 (dev)
------------
* Exposed option to set PRNG seed when subsampling reads.

v1.1.3 (2017.05.01)
-------------------
* Updated Dockerfile to use smaller, Alpine-based image.
* Added Docker image for v1.1.2 to Docker Hub.
* Updated Travis config to automatically build Docker images for each release.
* Ported over improvements to adapter parsing (635eea9) from Cutadapt.
* Fixed #12: tqdm progress bar not working.
* Fixed #13: unnecessary differences in summary output between Cutadapt and Atropos.

v1.1.2 (2017.04.12)
-------------------
1 change: 1 addition & 0 deletions README.md
@@ -89,6 +89,7 @@ We welcome any contributions via GitHub issues and pull requests. See the [docu
* Migrate to xphyle (https://github.com/jdidion/xphyle) for file management.
* Provide option for RNA-seq data that will trim polyA sequence.
* Accept multiple input files.
* Support SAM output.
* Expand the list of contaminants that are detected by default.
* Automate creation and sending of user statistics and crash reports using [pytattle](https://github.com/biologyguy/PyTattle).
* Accessibility:
4 changes: 4 additions & 0 deletions atropos/commands/base.py
@@ -194,6 +194,9 @@ def __init__(self, options, summary_class=Summary):
        # Wrap reader in subsampler
        if options.subsample:
            import random
            if options.subsample_seed:
                random.seed(options.subsample_seed)

            def subsample(reader, frac):
                """Generator that yields a random subsample of records.
@@ -204,6 +207,7 @@ def subsample(reader, frac):
                for reads in reader:
                    if random.random() < frac:
                        yield reads

            reader = subsample(reader, options.subsample)

        self.iterable = enumerate(reader, 1)
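
A note on the seeding logic in this hunk: the truthiness check `if options.subsample_seed:` skips seeding when the seed is `0`, and `random.seed` reseeds the module-level generator shared by any other code that uses `random`. The following is a minimal standalone sketch of one alternative, not the code in this commit. It assumes a dedicated `random.Random` instance and a hypothetical `seed` keyword argument on `subsample`.

```python
import random


def subsample(reader, frac, seed=None):
    """Yield each record from `reader` with probability `frac`.

    `seed` is a hypothetical parameter for this sketch; passing None
    lets the generator seed itself from system entropy.
    """
    # A dedicated generator leaves the module-level `random` state untouched,
    # and passing the seed straight through means a seed of 0 is honoured.
    rng = random.Random(seed)
    for reads in reader:
        if rng.random() < frac:
            yield reads


# Two passes with the same seed select the same records.
first = list(subsample(range(1000), 0.1, seed=0))
second = list(subsample(range(1000), 0.1, seed=0))
```

Because each call builds its own generator, repeated runs with the same seed and the same input order should yield the same subset, which is the behaviour the new option's help text describes.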
5 changes: 5 additions & 0 deletions atropos/commands/cli.py
@@ -183,6 +183,11 @@ def add_common_options(self):
            "--subsample",
            type=probability, default=None, metavar="PROB",
            help="Subsample a fraction of reads. (no)")
        group.add_argument(
            "--subsample-seed",
            type=int, default=None, metavar="SEED",
            help="The seed to use for the pseudorandom number generator. Using "
                 "the same seed will result in the same subsampling of reads.")
        group.add_argument(
            "--batch-size",
            type=int_or_str, metavar="SIZE",
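
To see how the two subsampling options parse together, here is a small self-contained `argparse` sketch. The `probability` function below is a hypothetical stand-in written for this example; the real `probability` type in `cli.py` is not shown in this diff and may behave differently.

```python
import argparse


def probability(value):
    """Hypothetical stand-in for the `probability` type used in cli.py."""
    prob = float(value)
    if not 0.0 <= prob <= 1.0:
        raise argparse.ArgumentTypeError("must be between 0 and 1")
    return prob


parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument(
    "--subsample", type=probability, default=None, metavar="PROB")
parser.add_argument(
    "--subsample-seed", type=int, default=None, metavar="SEED")

# argparse maps --subsample-seed to the attribute name subsample_seed.
options = parser.parse_args(["--subsample", "0.1", "--subsample-seed", "0"])
defaults = parser.parse_args([])
```

With `default=None`, an omitted seed is distinguishable from an explicit `--subsample-seed 0` only through an `is not None` comparison, since both values are falsy.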