Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBase seperate module PR #1388

Closed
wants to merge 8 commits into from
Closed

HBase seperate module PR #1388

wants to merge 8 commits into from

Conversation

jpdna
Copy link
Member

@jpdna jpdna commented Feb 13, 2017

Testing to see if tests succeed with Jenkins.
Not ready for review yet.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1783/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1784/
Test PASSed.

Copy link
Member

@fnothaft fnothaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good update! You'll want to update jenkins-test to run the HBase profile.

<groupId>org.bdgenomics.utils</groupId>
<artifactId>utils-cli_${scala.version.prefix}</artifactId>
</dependency>
<dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to trim out some of these dependencies, e.g., intervalrdd, fastutil, etc.

@@ -0,0 +1,18 @@
# Root logger option
log4j.rootLogger=INFO, stderr, logfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this in addition to the adam-core log4j property file? As an aside, I think that @ryan-williams was making a compelling argument recently for not committing log4j property files, but I can't quite remember it.

package org.bdgenomics.adam.hbase

import java.io.ByteArrayOutputStream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space

import org.bdgenomics.adam.rich.RichVariant
import org.bdgenomics.formats.avro._
import org.bdgenomics.adam.models.{ ReferencePosition, ReferenceRegion, SequenceDictionary, VariantContext }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space.

*
* @param tableDescriptorMeta
*/
private[hbase] def createTable(tableDescriptorMeta: HTableDescriptor) { admin.createTable(tableDescriptorMeta) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break long line here and in the next function.
Also, add docs for parameters.

private[hbase] object KeyStrategy1 extends KeyStrategy[KeyStrategy1rowKeyInfo, KeyStrategy1RangePrefixInfo] {
def getKey(rowKeyInfo: KeyStrategy1rowKeyInfo): Array[Byte] = {

println("rowkey: " + "%s_%010d_%s_%s_%d".format(rowKeyInfo.contigName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix println.

@@ -0,0 +1,57 @@
##fileformat=VCFv4.1
##FILTER=<ID=IndelFS,Description="FS > 200.0">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to copy this file, you should be able to use the copyResource function in ADAMFunSuite.

import org.mockito.Mockito._

/**
* Created by paschallj on 11/25/16.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix author comments.

@@ -565,6 +591,31 @@
<artifactId>scala-guice_${scala.version.prefix}</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can actually move these dependencies into the hbase profile as well.

<!-- distribution should be last -->
<module>adam-hbase</module>
</modules>
<build>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need the build bits in the profile.

Added HBaseFunctions

Added VCF genotype save and load Hbase functions

Added saveHBaseGenotypesSingleSample packing multiple genotypes at same position and samle allele into same hbase value

Added multisample capable loadHBaseGenotypes()

Removed commented-out earlier versions of save and load genotypes

removed more dead code

Clean up formatting - limit line length

Added saveHbaseSampleMetadata function

Added save and load SequenceDictionary functions to HBaseFunctions

Added createHbaseGenotypeTable

Adding loadGenotypeRddFromHBase

in progress updates to multi-sample hbase save

multi sample VCF save and load now working

Added repartitioning parameter tp hbase genotype load

Added comments identifying the public api vs helper functions

COB Aug 25

Added genomic start and stop parameters to loadGenotypesFromHBaseToGenotypeRDD

Added boolean saveSequenceDictionary toggle parameter to saveVariantContextRddToHBase

fixed start, stop null ptr exeception

first steps in adding hbaseToVariantcontextRDD

Changed region query to use ADAM ReferenceRegion

Added custom HBaseEncoder1 save function

Added custom Encoder1 Hbase loader

Added Encoder1 hbase variant context loader

Working - before rowkey int

Changed end in key to be size, added data block encoding in create table

Added create table splits

Removed dead code of encoder2

Added option to repartion vcrdd before saving to HBase

Added bulk load save to Hbase option

changed to cdh hbase api depedencies in POM

allow sample name file list as input to load functions

made sample_ids lis parameter  in load optional

Added deleteSamplesFromHBase function

Fixed bulk delete and made loadVariantContext work even when requested samplids are missing

Removed code from  failed version of sample delete function

Moved delete function up with test of genotype code

Fixed errors after rebase against master

small formatting cleanup

first pass hbase cli demo

second pass hbase cli demo

remove saveSeqDict check

add seq dict id to cli

import clean up

removed undeeded demo and temp.js code

Ran ./scripts/format-source due to build failure on Jenkins

changed Hbase dependencies to provided

addressed some review issues

Changed hbases dependencies back to NOT being provided

fixed None.get

first step key Strategy pattern

second step key Strategy pattern

more cleanup from review

made private functions package private for hbase

Factored out hbase connection

Factored out hbase connection - fix1

Factored out hbase connection - cleanup

Removed the non-bulk load hbase function

step one adding tests

first step implement DAO

step2 DAO

step3 DAO

Added first loadHBase Test

Added some javadoc

use dao everywhere

added more javadoc

step 1 in adding save hbase test

step 2 in adding save hbase test

clean up hbase doc

add more assertions to hbase load test

improved POM and imports in HBaseFunctions

fixed more PR suggestions
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1813/
Test PASSed.

@jpdna
Copy link
Member Author

jpdna commented Feb 22, 2017

I find that to access the class (using the code from this PR)
org.bdgenomics.adam.hbase.HBaseFunctions
from adam-shell I must do:
./bin/adam-shell --jars adam-hbase/target/adam-hbase_2.10-0.20.1-SNAPSHOT.jar
otherwise without "--jars" the class cannot be found.

I'm not sure how the POM needs to modified so that adam-shell can see the hbase module, suggestions @heuermh ?

@heuermh
Copy link
Member

heuermh commented Feb 22, 2017

I suppose we need a fix for #1349 and/or a conditional dependency on adam-hbase from adam-assembly.

@fnothaft
Copy link
Member

I would personally go with:

a conditional dependency on adam-hbase from adam-assembly.

Since the hbase work comes in through a profile, I think we can just make the change (added dependency) there.

@heuermh heuermh added this to the 0.24.0 milestone Nov 29, 2017
@fnothaft fnothaft removed this from the 0.24.0 milestone Jan 21, 2018
@fnothaft
Copy link
Member

Closing this in favor of #1878.

@fnothaft fnothaft closed this Jan 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants