This Terraform template provides an end-to-end demonstration of how to connect from Azure Databricks to an HDInsight HBase cluster using the `hbase-spark` connector. In doing so it takes care of the following caveats:
- The Hortonworks `shc` connector is broken on Databricks, see this issue.
- `hbase-spark` and `shc` have subtle but important differences in package and data source names. Correct usage can be seen in this example published by Cloudera.
- Databricks and HDInsight HBase must be provisioned in the same VNET.
- Authentication to HBase is done via the config file `hbase-site.xml`. This file exists on the HDInsight head node and is copied to the attached Blob Storage. That Blob Storage container is also mounted to Databricks, i.e. the config file becomes available on all Databricks cluster nodes at `/dbfs/mnt/hdi/hbase-site.xml`.
- The Databricks cluster must be provisioned with a Scala 2.11 runtime, e.g. Runtime 6.6. Runtimes with Scala 2.12 won't work yet.
- The following three libraries must be attached to the cluster. Note the extra two in addition to `hbase-spark`:
  - `org.apache.hbase.connectors.spark:hbase-spark:1.0.0`
  - `org.apache.hbase:hbase-common:2.3.1`
  - `org.apache.hbase:hbase-server:2.3.1`
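The naming differences mentioned above are easy to trip over. As a minimal sketch, reading with `hbase-spark` looks roughly like this (the column family and qualifier names here are illustrative, not taken from this template):

```scala
// The hbase-spark connector's data source name is "org.apache.hadoop.hbase.spark".
// The shc connector instead used "org.apache.spark.sql.execution.datasources.hbase"
// together with a JSON catalog; the two are NOT interchangeable.
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  // hbase-spark maps columns with a schema string rather than an shc catalog:
  // "<dfColumn> <TYPE> <family>:<qualifier>", with ":key" for the row key.
  .option("hbase.columns.mapping",
    "rowkey STRING :key, name STRING Personal:Name, phone STRING Office:Phone")
  .option("hbase.table", "Contacts")
  .load()
```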
The template provisions the following Azure resources:
- Virtual Network
- Blob Storage
- Azure Databricks Workspace
- HDInsight HBase cluster
Note: The HBase cluster is provisioned with the cheapest possible VMs for the Head, Region, and Zookeeper nodes. It will still cost you ~$550/month in the West Europe region.
Once `terraform apply` has succeeded, navigate to the Databricks workspace and run the notebook `/Shared/TestHBase.scala`. This notebook connects to the HBase cluster and loads the `Contacts` table into a `DataFrame`. The table was populated in HBase as part of the Terraform provisioning.
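A rough sketch of what such a notebook does, assuming the mounted `hbase-site.xml` described above (the exact column mapping in the actual notebook may differ):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext

// Load the cluster's HBase configuration; the file is visible on every
// Databricks node through the DBFS mount of the HDInsight Blob Storage.
val conf = HBaseConfiguration.create()
conf.addResource(new Path("file:///dbfs/mnt/hdi/hbase-site.xml"))

// Constructing an HBaseContext registers it as the default context
// that the hbase-spark data source will pick up.
new HBaseContext(spark.sparkContext, conf)

val contacts = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping",
    "rowkey STRING :key, name STRING Personal:Name, phone STRING Office:Phone")
  .option("hbase.table", "Contacts")
  .load()

contacts.show()
```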