-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBase seperate module PR #1388
HBase seperate module PR #1388
Conversation
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good update! You'll want to update jenkins-test
to run the HBase profile.
<groupId>org.bdgenomics.utils</groupId> | ||
<artifactId>utils-cli_${scala.version.prefix}</artifactId> | ||
</dependency> | ||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to trim out some of these dependencies, e.g., intervalrdd, fastutil, etc.
@@ -0,0 +1,18 @@ | |||
# Root logger option | |||
log4j.rootLogger=INFO, stderr, logfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this in addition to the adam-core log4j property file? As an aside, I think that @ryan-williams was making a compelling argument recently for not committing log4j property files, but I can't quite remember it.
package org.bdgenomics.adam.hbase | ||
|
||
import java.io.ByteArrayOutputStream | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: space
import org.bdgenomics.adam.rich.RichVariant | ||
import org.bdgenomics.formats.avro._ | ||
import org.bdgenomics.adam.models.{ ReferencePosition, ReferenceRegion, SequenceDictionary, VariantContext } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: space.
* | ||
* @param tableDescriptorMeta | ||
*/ | ||
private[hbase] def createTable(tableDescriptorMeta: HTableDescriptor) { admin.createTable(tableDescriptorMeta) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break long line here and in the next function.
Also, add docs for parameters.
private[hbase] object KeyStrategy1 extends KeyStrategy[KeyStrategy1rowKeyInfo, KeyStrategy1RangePrefixInfo] { | ||
def getKey(rowKeyInfo: KeyStrategy1rowKeyInfo): Array[Byte] = { | ||
|
||
println("rowkey: " + "%s_%010d_%s_%s_%d".format(rowKeyInfo.contigName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nix println.
@@ -0,0 +1,57 @@ | |||
##fileformat=VCFv4.1 | |||
##FILTER=<ID=IndelFS,Description="FS > 200.0"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to copy this file, you should be able to use the copyResource
function in ADAMFunSuite
.
import org.mockito.Mockito._ | ||
|
||
/** | ||
* Created by paschallj on 11/25/16. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nix author comments.
@@ -565,6 +591,31 @@ | |||
<artifactId>scala-guice_${scala.version.prefix}</artifactId> | |||
<version>4.0.1</version> | |||
</dependency> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can actually move these dependencies into the hbase profile as well.
<!-- distribution should be last --> | ||
<module>adam-hbase</module> | ||
</modules> | ||
<build> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need the build bits in the profile.
Added HBaseFunctions Added VCF genotype save and load Hbase functions Added saveHBaseGenotypesSingleSample packing multiple genotypes at same position and samle allele into same hbase value Added multisample capable loadHBaseGenotypes() Removed commented-out earlier versions of save and load genotypes removed more dead code Clean up formatting - limit line length Added saveHbaseSampleMetadata function Added save and load SequenceDictionary functions to HBaseFunctions Added createHbaseGenotypeTable Adding loadGenotypeRddFromHBase in progress updates to multi-sample hbase save multi sample VCF save and load now working Added repartitioning parameter tp hbase genotype load Added comments identifying the public api vs helper functions COB Aug 25 Added genomic start and stop parameters to loadGenotypesFromHBaseToGenotypeRDD Added boolean saveSequenceDictionary toggle parameter to saveVariantContextRddToHBase fixed start, stop null ptr exeception first steps in adding hbaseToVariantcontextRDD Changed region query to use ADAM ReferenceRegion Added custom HBaseEncoder1 save function Added custom Encoder1 Hbase loader Added Encoder1 hbase variant context loader Working - before rowkey int Changed end in key to be size, added data block encoding in create table Added create table splits Removed dead code of encoder2 Added option to repartion vcrdd before saving to HBase Added bulk load save to Hbase option changed to cdh hbase api depedencies in POM allow sample name file list as input to load functions made sample_ids lis parameter in load optional Added deleteSamplesFromHBase function Fixed bulk delete and made loadVariantContext work even when requested samplids are missing Removed code from failed version of sample delete function Moved delete function up with test of genotype code Fixed errors after rebase against master small formatting cleanup first pass hbase cli demo second pass hbase cli demo remove saveSeqDict check add seq dict id to cli import clean up removed undeeded demo and temp.js code Ran ./scripts/format-source due to build failure on Jenkins changed Hbase dependencies to provided addressed some review issues Changed hbases dependencies back to NOT being provided fixed None.get first step key Strategy pattern second step key Strategy pattern more cleanup from review made private functions package private for hbase Factored out hbase connection Factored out hbase connection - fix1 Factored out hbase connection - cleanup Removed the non-bulk load hbase function step one adding tests first step implement DAO step2 DAO step3 DAO Added first loadHBase Test Added some javadoc use dao everywhere added more javadoc step 1 in adding save hbase test step 2 in adding save hbase test clean up hbase doc add more assertions to hbase load test improved POM and imports in HBaseFunctions fixed more PR suggestions
Test PASSed. |
I find that to access the class (using the code from this PR) I'm not sure how the POM needs to modified so that adam-shell can see the hbase module, suggestions @heuermh ? |
I suppose we need a fix for #1349 and/or a conditional dependency on |
I would personally go with:
Since the |
Closing this in favor of #1878. |
Testing to see if tests succeed with Jenkins.
Not ready for review yet.