This repository has been archived by the owner on Mar 30, 2021. It is now read-only.
Installing and Setting Up Druid
hbutani edited this page Jan 14, 2016 · 8 revisions
As of this writing, we use Druid version 0.8.2.
Druid needs a relational database for its metadata storage. Here we describe how to set up MySQL. For other databases, see the Druid documentation on metadata storage.
Assuming you have root access to your MySQL server, the setup steps are:
- Start a CLI session:
mysql -u root -p
- Inside the session, issue the following commands:
CREATE DATABASE druid DEFAULT CHARACTER SET utf8;
CREATE USER 'druid'@'%' IDENTIFIED BY 'diurd';
GRANT ALL PRIVILEGES ON *.* TO 'druid'@'%' WITH GRANT OPTION;
CREATE USER 'druid'@'localhost' IDENTIFIED BY 'diurd';
GRANT ALL PRIVILEGES ON *.* TO 'druid'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;
The settings we use for a dev environment are listed below. For information on the settings, see the Druid production settings and configuration pages.
In addition, we set up helper start and stop scripts. The sequence we use to start the Druid services is:
cd <druid_home>
../zookeeper-3.4.6/bin/zkServer.sh stop
../zookeeper-3.4.6/bin/zkCleanup.sh
../zookeeper-3.4.6/bin/zkServer.sh start
./start-all.sh
We have a top-level folder named druid; underneath it we have ZooKeeper installed alongside different versions of Druid (for example, druid-0.8.2).
The script to stop all the services is ./stop-all.sh.
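The helper scripts themselves are not shown on this page. As a rough sketch only, a start-all.sh could look like the following, assuming the stock Druid 0.8.x layout (config/_common, config/<node> and lib/ under <druid_home>) and the io.druid.cli.Main entry point. The node list, heap size, log/ and run/ directories and the DRY_RUN switch are illustrative assumptions, not the authors' actual script. The realtime node is left out because it typically also needs a spec file argument.

```shell
#!/bin/sh
# Hypothetical start-all.sh sketch; run from <druid_home>.
set -eu

# Assumed JVM options; tune the heap size for your machine.
JAVA_OPTS="-Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8"

# Start (or, in dry-run mode, just print the command for) one node type.
start_node() {
  node="$1"
  # Assumed stock layout: shared config, per-node config, then all jars in lib/.
  classpath="config/_common:config/${node}:lib/*"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "java ${JAVA_OPTS} -classpath ${classpath} io.druid.cli.Main server ${node}"
  else
    mkdir -p log run
    # JAVA_OPTS is intentionally unquoted so it splits into separate options.
    nohup java ${JAVA_OPTS} -classpath "${classpath}" \
      io.druid.cli.Main server "${node}" > "log/${node}.log" 2>&1 &
    # Record the PID so a stop script can find the process later.
    echo $! > "run/${node}.pid"
  fi
}

# Start the node types configured on this page (realtime omitted, see above).
start_all() {
  for node in coordinator historical broker overlord; do
    start_node "$node"
  done
}

start_all
```

By default the sketch only prints the commands. Running it as `DRY_RUN=0 ./start-all.sh` launches the nodes in the background and records one PID file per node, which a matching stop-all.sh could read to stop them.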
# Extensions (no deep storage model is listed - using local fs for deep storage - not recommended for production)
druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage"]
# Zookeeper
druid.zk.service.host=localhost
# Metadata Storage (mysql)
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
# Deep storage (local filesystem for examples - don't use this in production)
druid.storage.type=local
druid.storage.storageDirectory=/Users/hbutani/druid/localStorage
# Query Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=10000000
# Indexing service discovery
druid.selectors.indexing.serviceName=overlord
# Monitoring (disabled for examples, if you enable SysMonitor, make sure to include sigar jar in your cp)
# druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]
# Metrics logging (disabled for examples - change this to logging or http in production)
druid.emitter=noop
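The backslash-escaped colons in connectURI above are Java properties escapes, so the value Druid loads is the plain JDBC URL. The following sketch shows one way to read a key from such a file and undo those escapes, which can help when checking the URL by hand. The get_prop helper and the temporary sample file are hypothetical, and the helper only handles escaped `:` and `=`, not the full properties syntax.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a .properties file.
set -eu

get_prop() {
  # $1 = properties file, $2 = exact key name
  # awk picks the line starting with "key=" and prints everything after it;
  # sed then drops the backslash in front of ':' and '='.
  awk -v key="$2" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }' "$1" \
    | sed 's/\\\([:=]\)/\1/g'
}

# Sample file holding two of the metadata storage settings from this page.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
EOF

get_prop "$sample" druid.metadata.storage.connector.connectURI
# prints: jdbc:mysql://localhost:3306/druid
rm -f "$sample"
```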
druid.service=broker
# We enable using the local query cache here
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# For prod: set numThreads = # cores - 1, and sizeBytes to 512mb
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1
# Default host: localhost. Default port: 8081. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8081
druid.service=coordinator
# The coordinator begins assignment operations after the start delay.
# We override the default here to start things up faster for examples.
# In production you should use PT5M or PT10M
druid.coordinator.startDelay=PT70s
# Default host: localhost. Default port: 8083. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8083
druid.service=historical
# Our intermediate buffer is also very small so longer topNs will be slow.
# In prod: set sizeBytes = 512mb
druid.processing.buffer.sizeBytes=100000000
# We can only scan 1 segment in parallel with these configs.
# In prod: set numThreads = # cores - 1
druid.processing.numThreads=1
# maxSize should reflect the performance you want.
# Druid memory maps segments.
# memory_for_segments = total_memory - heap_size - (processing.buffer.sizeBytes * (processing.numThreads+1)) - JVM overhead (~1G)
# The greater the memory/disk ratio, the better performance you should see
druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize"\: 10000000000}]
druid.server.maxSize=10000000000
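To make the memory_for_segments formula in the comments above concrete, here is a small worked calculation. The 16 GiB host and 4 GiB heap are assumed figures chosen for illustration; the buffer size and thread count are the dev values from this page, and the JVM overhead uses the roughly 1G mentioned in the comment.

```shell
#!/bin/sh
# Worked example of the memory_for_segments formula (all sizes in bytes).
set -eu

# memory_for_segments TOTAL HEAP BUFFER_SIZE NUM_THREADS JVM_OVERHEAD
memory_for_segments() {
  echo $(( $1 - $2 - $3 * ($4 + 1) - $5 ))
}

total_memory=$((16 * 1024 * 1024 * 1024))  # assumed 16 GiB host
heap_size=$((4 * 1024 * 1024 * 1024))      # assumed 4 GiB JVM heap
buffer_size=100000000                      # druid.processing.buffer.sizeBytes
num_threads=1                              # druid.processing.numThreads
jvm_overhead=$((1024 * 1024 * 1024))       # ~1G JVM overhead

memory_for_segments "$total_memory" "$heap_size" "$buffer_size" "$num_threads" "$jvm_overhead"
# prints: 11611160064
```

With these assumed numbers the result is about 11.6 GB, slightly above the configured druid.server.maxSize of 10000000000 bytes, so the whole segment cache could fit in memory available for memory mapping.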
# Default host: localhost. Default port: 8090. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8090
druid.service=overlord
# Run the overlord in local mode with a single peon to execute tasks
# This is not recommended for production.
druid.indexer.queue.startDelay=PT0M
# This setting is too small for real production workloads
druid.indexer.runner.javaOpts="-server -Xmx4g"
# These settings are also too small for real production workloads
# Please see our recommended production settings in the docs (http://druid.io/docs/latest/Production-Cluster-Configuration.html)
druid.indexer.fork.property.druid.processing.numThreads=4
druid.indexer.fork.property.druid.computation.buffer.size=500000000
# Default host: localhost. Default port: 8084. If you run each node type on its own node in production, you should override these values to be IP:8080
#druid.host=localhost
#druid.port=8084
druid.service=realtime
# We can only scan 1 segment in parallel with these configs.
# Our intermediate buffer is also very small so longer topNs will be slow.
# In production sizeBytes should be 512mb, and numThreads should be # cores - 1
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1
# Enable realtime monitoring
# druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor","io.druid.segment.realtime.RealtimeMetricsMonitor"]