Terraform template which shows how to connect E2E from Azure Databricks to an HDInsight HBase cluster

syedhassaanahmed/tf-adb-hdi-hbase

tf-adb-hdi-hbase

This Terraform template is an end-to-end (E2E) demonstration of how to connect from Azure Databricks to an HDInsight HBase cluster using the hbase-spark connector. In doing so, it takes care of the following caveats:

  • The Hortonworks shc connector is broken on Databricks; see this issue.
  • hbase-spark and shc have some subtle but important differences in package and data source names. Correct usage can be seen in this example published by Cloudera.
  • Databricks and HDInsight HBase must be provisioned in the same VNET.
  • Authentication to HBase is done via the hbase-site.xml config file. This file exists on the HDInsight head node and is copied to the attached Blob Storage. The blob storage container is then also mounted to Databricks, so the config file becomes available to all Databricks cluster nodes at /dbfs/mnt/hdi/hbase-site.xml.
  • The Databricks cluster must be provisioned with a runtime based on Scala 2.11, e.g. Runtime v6.6. Runtimes based on Scala 2.12 won't work yet.
  • The following 3 libraries must be attached to the cluster. Note the extra two in addition to hbase-spark:
org.apache.hbase.connectors.spark:hbase-spark:1.0.0
org.apache.hbase:hbase-common:2.3.1
org.apache.hbase:hbase-server:2.3.1
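
With the libraries above attached, the mounted hbase-site.xml can be sanity-checked from a Scala notebook cell before any Spark job touches HBase. The following is a minimal sketch, not taken from this repository; it assumes the mount path mentioned above and only uses the standard Hadoop/HBase configuration API. The two property names printed are common HBase client settings and are assumed to be present in the HDInsight-generated file.

```scala
import java.io.File

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// Path at which the HDInsight config file is expected on every cluster node.
val hbaseSitePath = "/dbfs/mnt/hdi/hbase-site.xml"

// Fail early if the mount is missing, rather than silently
// continuing with HBase's built-in defaults.
require(new File(hbaseSitePath).exists, s"$hbaseSitePath not found; is the container mounted?")

// Start from the HBase defaults and layer the HDInsight settings on top.
val conf = HBaseConfiguration.create()
conf.addResource(new Path(hbaseSitePath))

// Assumed keys: these should point at the HDInsight Zookeeper nodes,
// not at a localhost default.
println(conf.get("hbase.zookeeper.quorum"))
println(conf.get("zookeeper.znode.parent"))
```

If the printed quorum still shows a loopback address, the file was most likely not picked up, and any later connection attempt would target the wrong host.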

Known Issue(s)

Requirements

Azure resources

  • Virtual Network
  • Blob Storage
  • Azure Databricks Workspace
  • HDInsight HBase cluster

Note: The HBase cluster is provisioned with the cheapest possible VMs for the Head, Region and Zookeeper nodes. It will cost you ~$550 / month in Western Europe.

Smoke Test

Once terraform apply has succeeded, navigate to the Databricks workspace and run the notebook /Shared/TestHBase.scala. This notebook connects to the HBase cluster and loads the Contacts table into a DataFrame. The table is populated in HBase as part of the Terraform provisioning.
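
For orientation, a read of this kind through hbase-spark might look roughly like the sketch below. It is not the contents of TestHBase.scala. The column families and qualifiers (Personal:Name, Personal:Phone, Office:Phone, Office:Address) are assumptions borrowed from the usual HDInsight HBase sample data and may differ from what this template provisions; the DataFrame column names are purely illustrative.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext

// Load the HBase client settings copied from the HDInsight head node.
val conf = HBaseConfiguration.create()
conf.addResource(new Path("/dbfs/mnt/hdi/hbase-site.xml"))

// Creating an HBaseContext registers it with the connector, so the
// data source below can reuse this configuration.
new HBaseContext(spark.sparkContext, conf)

// Hypothetical mapping: "<column> <type> <family>:<qualifier>", with ":key" for the row key.
val columnMapping =
  """rowkey STRING :key,
    |name STRING Personal:Name,
    |personalPhone STRING Personal:Phone,
    |officePhone STRING Office:Phone,
    |officeAddress STRING Office:Address""".stripMargin.replace("\n", " ")

// Note the hbase-spark data source name, which differs from the one used by shc.
val contacts = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "Contacts")
  .option("hbase.columns.mapping", columnMapping)
  .load()

contacts.show()
```

`spark` here is the session object that Databricks notebooks provide; outside a notebook it would have to be created explicitly.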
