-
Notifications
You must be signed in to change notification settings - Fork 265
Conversation
* improved docstrings * removed required methods for subclasses to implement * an iterator is now passed in instead of constructed * workflow is determined up front, so functions dependent on options are now actually completely avoided * function priority can be assigned
* simplified get_workflow * updated license and unittest import * added all_wf_methods test
@requires(state=has_barcode) | ||
def _quality_index_ambiguity(self): | ||
barcode_characters = set(self.state['Barcode']) | ||
valid_characters = set(['A', 'T', 'G', 'C', 'a', 't', 'g', 'c']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should source from skbio
@josenavas brought up an excellent point which is that this lacks usage tests. I'll get this pushed in relatively soon. I haven't used it yet, but click provides a test runner which should be awesome. On an aside, the coverage on qiime/process_seqs.py was at like 98 or 99% though I know we don't currently track in qiime |
Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1163/ |
-b $PWD/lane1_barcode.fastq.gz -b $PWD/lane2_barcode.fastq.gz \ | ||
-m $PWD/map.txt -o slout_q20 --rev-comp-mapping-barcodes -q 20 | ||
""" | ||
print kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops...
Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1165/ |
@wasade, given the size of this, would it make sense for us to sit down together at ISME next week and discuss? |
Want me to skype in for this talk? On Mon, Aug 18, 2014 at 11:00 AM, Greg Caporaso notifications@github.com
|
@gregcaporaso agree; @walterst, that would be fantastic if you could join. This is not a small PR, and it definitely needs to be guided to a mergable state. |
Yes, that would be great. I believe ~9am in Korea will be around 5pm in On Mon, Aug 18, 2014 at 8:13 AM, Daniel McDonald notifications@github.com
|
Same On Mon, Aug 18, 2014 at 9:18 AM, Greg Caporaso notifications@github.com
|
I can do either, but I'd prefer Monday. I think it's actually 6:00 PM MDT On Mon, Aug 18, 2014 at 11:38 AM, Daniel McDonald notifications@github.com
|
Monday is going to be hard for me, as it's the first day of the conference, On Mon, Aug 18, 2014 at 10:39 AM, Tony notifications@github.com wrote:
|
Ah, to clarify, that would be Monday for me (6 PM), Tuesday for you guys On Mon, Aug 18, 2014 at 1:56 PM, Greg Caporaso notifications@github.com
|
Oh, right, sorry. That would work for me. @wasade? On Mon, Aug 18, 2014 at 10:57 AM, Tony notifications@github.com wrote:
|
Not aware of any issues, lets do it |
Perfect, thanks! On Mon, Aug 18, 2014 at 11:06 AM, Daniel McDonald notifications@github.com
|
@wasade, based on recent discussions I think we're not merging this, but planning to use it as the basis for demultiplexing in QIIME 2, and will start with this as a model of new QIIME 2 code - is that right? If so, should we close this PR while you port to the metoo repo? |
That's my impression as well. It would of course be nice to preserve the On Mon, Oct 27, 2014 at 2:21 PM, Greg Caporaso notifications@github.com
|
Ok, thanks. I agree that preserving the commit history would be ideal, but not sure how we can do that. |
NOT READY FOR MERGE YET
I apologize in advance for the size of this pull request, and for the size of the associated comment.
This pull request covers an attempt at redoing the core concept behind split libraries (incl. fastq) with a lot of input from @walterst early on. This is not a complete rewrite of
split_libraries.py
orsplit_libraries_fastq.py
, though it covers most of the latter. The intent was to handle some of the core and primary functionality, support fasta/fastq/whatever, develop an easy to extend and test framework, and then merge in additional support and options as we proceed forward, assuming the concept being used here makes sense as a replacement. At the end of this long description is an example of the command and the type of data that is yielded by theWorkflow
. The use of theWorkflow
object allows for:If the concept does make sense, then there are a few areas that I'd like help with for guiding this to completion. Specifically:
Workflow
, are there others that would be nice to have?split_libraries_fastq.py
surrounds manipulating inputs (e.g., rc of mapping file barcodes) or manipulating outputs (e.g., rc of reads being written). These are transforms that are external to theWorkflow
or sequence iterators. Can some of this logic be managed elsewhere?slib
when iterating theWorkflow
here is somewhat awkward, but that is currently the best way to handle sequences that fail.A few deviations:
--sample_ids
is not currently easy to support as theWorkflow
does not know what file a given sequence originated from at this time. Is it critical to support this option? If so, it can be done external to theWorkflow
.--retain_unassigned_reads
is just on all the time, much easier to do it this way. This currently includes all failed sequences as well. Is it necessary for the unassigned reads to be raw or should they be processed, and should this include failed reads (e.g., poor quality)?--store_qual_scores
is done only if--to-fastq
is used--sample_ids
, this can be done external to the workflow if this is criticalIn addition to the split libraries work, this PR also includes a new command line interface via click, and is stubbed out for a possible structure that can be used for other scripts. This is meant as a possible starting point for using click, and I'm very curious about what other developers think here. The primary point of execution is
scripts/qiime
. This file simply imports the click command group (qiime_cli
) and any other click commands, and additionally, makes the qiime config readily available to any other command viactx
. Though the commands aren't used directly withinscripts/qiime
, the import makes them part ofqiime_cli
. The next piece isqiime/cli.py
, which provides the qiime command group, and which we likely want to expand out as these are bits that will be common to qiime. Last, there is aqiime/click_commands.py
script that contains theslib
command. There were a few choices made here that should be explained:slib
method. The benefit is that those imports are only executed if the slib command is actually used. If they were at the script level, they would be imported on import ofslib
thereby adding overhead.ctx
is passed in, which makes it trivial to access a common qiime configWorkflow
takeskwargs
, but not all of thekwargs
are relevant for the workflow. As a result, the unused ones are popped off.slib
, the motivation to minimize import overhead ifslib
is not actually executed.Example output:
And finally, here is what one of the yielded items from the Workflow looks like (pprint'd) and the resulting stats tracked for processing a the test data: