Skip to content

nil1729/trino-hive-demo

Repository files navigation

Hive Metastore Integration with Trino

This github repository describes a simple way to integrate Hive Metastore with Trino.

Considerations

  • This integration uses Google Storage as the storage backend for Hive Metastore.
  • Used latest stable version of Trino which is 433.
  • Used hive metastore version 2.3.1 and corresponding supported version of Hadoop 2.10.1
  • Used latest stable version of Superset which is 2.1.3. (3.x.y version showing incorrect version number on UI)

Trino Configuration

  • Used mysql event listener to capture the query events from Trino. Read Docs
  • Fault Tolerant execution of queries: using TASK retry policy. Read Docs
  • Exchange Manager configuration for Fault Tolerant execution. Read Docs
  • JMX setup for monitoring Trino. Read Docs
  • Google Cloud Storage as the storage backend for Hive Metastore. Read Docs

Superset Configuration

  • Used postgres as the backend database for Superset. Read Docs
  • Customize superset image for adding trino driver. Read Docs
  • Adding Trino connection to Superset. Read Docs
  • Superset Initialization in Docker Container. Dockerhub Guide

Hive Metastore Configuration

  • Used mysql as the metadata database for Hive Metastore. Youtube Guide
  • Used Google File System as the Hadoop File System for Hive Metastore. Github Guide
  • Used custom Hive Image for running hiveserver2 and hive metastore together in a single container. Read more about hiverserver2 and hive thrift server here

Architecture (Docker Compose)

Docker Compose


Running the project locally

Prerequisites

  • Make sure you have a service account with access to a GCS bucket.
<!-- hive/core-site.xml -->

<property>
        <name>fs.gs.project.id</name>
        <value>{{YOUR GCP PROJECT}}</value>
</property>
  • Get your HMAC keys from GCP console and add them to exchange-manager.properties file.
# trino/common/exchange-manager.properties

exchange.base-directories={{YOUR BUCKET NAME}}
exchange.s3.region={{YOUR BUCKET REGION}}
exchange.s3.aws-access-key={{HMAC ACCESS KEY}}
exchange.s3.aws-secret-key={{HMAC SECRET KEY}}
  • Put your service account JSON file as gcp-sa.json in secrets directory.
  • Working directory should be as follows:
.
├── docker-compose.yml
├── hive
│   ├── Dockerfile
│   ├── core-site.xml
│   ├── docker-entrypoint.sh
│   └── hive-site.xml
├── issues
├── secrets
│   └── gcp-sa.json
├── superset
│   ├── Dockerfile
│   ├── docker-init.sh
│   └── superset_config.py
└── trino
    ├── Dockerfile
    ├── catalog
    │   ├── hive.properties
    │   └── jmx.properties
    ├── common
    │   └── exchange-manager.properties
    ├── coordinator
    │   ├── config.properties
    │   ├── jmx-config.yaml
    │   ├── jvm.config
    │   └── mysql-event-listener.properties
    └── worker
        └── config.properties

Running the services

docker compose up -d

Docker Compose Services

Executing query with Trino CLI

trino --server http://localhost:8080 --catalog hive

Trino CLI Demo

Access Superset

Login to Superset using admin as username and admin as password.

Superset Login

Create new Database Connection in Superset (Postgres, Trino, MySQL)

Superset Trino Connection

Run SQL query in Superset

Superset SQL Lab


Superset Troubleshooting


Acknowledgements


Made with ❤️ by nil1729

About

getting started with trino and hive on docker

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages