Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path #938

Closed
wants to merge 1 commit into from

Conversation

heuermh
Copy link
Member

@heuermh heuermh commented Feb 11, 2016

Fixes #937

@heuermh
Copy link
Member Author

heuermh commented Feb 11, 2016

I'm not sure this is comprehensive enough, e.g. sometimes even Parquet files won't have .seqdict files

$ ./bin/adam-submit fasta2adam \
  adam-cli/src/test/resources/contigs.fa contigs.contig.adam

$ ./bin/adam-submit transform -aligned_read_predicate \
  contigs.contig.adam output.adam

Command body threw exception:
java.io.FileNotFoundException: File contigs.contig.adam.seqdict does not exist
  Exception in thread "main" java.io.FileNotFoundException:
    File contigs.contig.adam.seqdict does not exist

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1079/
Test PASSed.

@fnothaft
Copy link
Member

Sorry; should be checking that the file is a Parquet file of alignment records.

Alternatively, we could make this a ValidationStringency effect. E.g., if you have ValidationStringency.LENIENT, we give a RecordGroupDictionary.empty/SequenceDictionary.empty if they're missing.

@heuermh
Copy link
Member Author

heuermh commented Feb 13, 2016

Sorry; should be checking that the file is a Parquet file of alignment records.

What do you mean by this, flipping the logic of isNonParquet around?

@heuermh heuermh changed the title Adding check for aligned read predicate or limit projection flags and non-parquet input path [ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path Feb 13, 2016
@fnothaft
Copy link
Member

What do you mean by this, flipping the logic of isNonParquet around?

Oh, I just meant that it shouldn't check for any Parquet dataset. Rather, the check should be more specific and check that it is a Parquet dataset of AlignmentRecords.

@heuermh
Copy link
Member Author

heuermh commented Feb 13, 2016

Can't do that by filename check, so it would have to come later, instead of fail fast in the cli class.

@fnothaft
Copy link
Member

Ah, yes, fair. I think we suggest "*.contig.adam" as a convention for files of NucleotideContigFragments, but we don't enforce it anywhere. We can patch that later, if needed.

Do you want to do any of the ValidationStringency stuff I suggested earlier, or shall we skip that for now and possibly come back to it later?

@heuermh
Copy link
Member Author

heuermh commented Feb 13, 2016

I think we suggest "*.contig.adam" as a convention for files of NucleotideContigFragments, but we don't enforce it anywhere.

Yep, I cribbed from here
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala#L858

I can add the check for *.contig.adam if useful. I think validation strategy might be overkill.

@fnothaft
Copy link
Member

I can add the check for *.contig.adam if useful. I think validation strategy might be overkill.

This is probably sufficient for now. We can add the two later if need be. I'll merge this shortly.

@heuermh heuermh modified the milestone: 0.19.0 Feb 16, 2016
@massie
Copy link
Member

massie commented Feb 24, 2016

Thanks, Michael.

@massie
Copy link
Member

massie commented Feb 24, 2016

Merged manually.

@massie massie closed this Feb 24, 2016
@heuermh heuermh deleted the issue-937 branch June 28, 2016 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants