This documentation is intended to be a quick-start guide, not a comprehensive
list of all available methods and configuration options. Please look through
the source for more information; a great place to get started is the
DruidDB::Client and DruidDB::Query modules, as they expose most of the methods
on the client.
This guide assumes significant knowledge of Druid. For more information, see http://druid.io/docs/latest/design/index.html
druiddb-ruby provides a client for your Ruby application to push data to Druid leveraging the Kafka Indexing Service. The client also provides an interface for querying and performing management tasks. It will automatically find and connect to Kafka and the Druid nodes through ZooKeeper, which means you only need to provide the ZooKeeper host and it will find everything else.
$ gem install druiddb
client = DruidDB::Client.new()
Note: There are many configuration options. Please take a look at
DruidDB::Configuration for more details.
This gem leverages the Kafka Indexing Service for ingesting data. The gem pushes datapoints onto Kafka topics (typically named after the datasource). You can also use the gem to upload an ingestion spec, which is needed for Druid to consume the Kafka topic.
This repo contains a docker-compose.yml build that may help bootstrap development with Druid and the Kafka Indexing Service. It's what we use for integration testing.
path = 'path/to/spec.json'
client.submit_supervisor_spec(path)
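As a rough illustration of what such a spec file might contain, the sketch below builds a minimal Kafka supervisor spec as a Ruby hash and serializes it to JSON. The layout follows the Kafka Indexing Service documentation for older Druid releases; newer versions structure the spec differently, so check it against the Druid version you run. The helper name, the dimension and metric names, and the broker address are placeholders, not part of the gem.

```ruby
require 'json'

# Hypothetical helper: builds a minimal Kafka supervisor spec as a Hash.
def build_supervisor_spec(datasource:, topic:, dimensions:, bootstrap_servers:)
  {
    type: 'kafka',
    dataSchema: {
      dataSource: datasource,
      parser: {
        type: 'string',
        parseSpec: {
          format: 'json',
          timestampSpec: { column: 'timestamp', format: 'iso' },
          dimensionsSpec: { dimensions: dimensions }
        }
      },
      # Placeholder metric; sums the "units" field of each datapoint.
      metricsSpec: [
        { type: 'longSum', name: 'units', fieldName: 'units' }
      ],
      granularitySpec: {
        type: 'uniform',
        segmentGranularity: 'DAY',
        queryGranularity: 'NONE'
      }
    },
    ioConfig: {
      topic: topic,
      consumerProperties: { 'bootstrap.servers' => bootstrap_servers }
    }
  }
end

spec = build_supervisor_spec(
  datasource: 'foo',
  topic: 'foo',
  dimensions: ['foo'],
  bootstrap_servers: 'localhost:9092' # assumed local Kafka broker
)

spec_json = JSON.pretty_generate(spec)
# File.write('path/to/spec.json', spec_json) would save it for submission.
```

Generating the file from Ruby keeps the datasource name and the Kafka topic in one place, which helps when the topic is named after the datasource.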
topic_name = 'foo'
datapoint = {
timestamp: Time.now.utc.iso8601,
foo: 'bar',
units: 1
}
client.write_point(topic_name, datapoint)
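Druid needs a parseable timestamp on every event, so it can help to normalize datapoints before writing them. The helper below is a hypothetical sketch, not part of the gem: it accepts a Time or a String for the timestamp, falls back to the current time, and always emits an ISO 8601 string in UTC. Note that Time#iso8601 and Time.parse require the standard time library.

```ruby
require 'time'

# Hypothetical helper (not part of druiddb-ruby): returns a copy of the
# datapoint whose :timestamp is an ISO 8601 string in UTC.
def normalize_datapoint(datapoint, now: Time.now)
  raw = datapoint.fetch(:timestamp, now)
  time = raw.is_a?(Time) ? raw : Time.parse(raw.to_s)
  # getutc returns a new Time, so the caller's object is left untouched.
  datapoint.merge(timestamp: time.getutc.iso8601)
end

point = normalize_datapoint(
  { foo: 'bar', units: 1 },
  now: Time.utc(2017, 1, 2, 3, 4, 5)
)
# point[:timestamp] is "2017-01-02T03:04:05Z"
```

The result can then be passed as the second argument to write_point.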
client.query(
queryType: 'timeseries',
dataSource: 'foo',
granularity: 'day',
intervals: Time.now.utc.advance(days: -30).iso8601 + '/' + Time.now.utc.iso8601,
aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }]
)
The query method POSTs the query to Druid. For information on querying Druid,
see http://druid.io/docs/latest/querying/querying.html. The method is
intentionally simple so that it supports all current features, and hopefully
all future features, of the Druid query language without updating the gem.
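Because the query hash is passed through to Druid, the intervals value has to be a string in Druid's start/end ISO 8601 form. The sketch below keeps that formatting in one place using plain Ruby (no ActiveSupport); the helper name last_days_interval is hypothetical and not part of the gem.

```ruby
require 'time'

# Hypothetical helper: returns a Druid interval string covering the
# `days` days that end at `ending`.
def last_days_interval(days, ending: Time.now)
  finish = ending.getutc
  start = finish - (days * 86_400) # 86,400 seconds per day
  "#{start.iso8601}/#{finish.iso8601}"
end

query = {
  queryType: 'timeseries',
  dataSource: 'foo',
  granularity: 'day',
  intervals: last_days_interval(30, ending: Time.utc(2017, 1, 31)),
  aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }]
}
# query[:intervals] is "2017-01-01T00:00:00Z/2017-01-31T00:00:00Z"
```

The resulting hash can be handed to the query method as keyword arguments, for example client.query(**query).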
Currently, Druid will not fill empty intervals for which there are no points. To
accommodate this need until it is handled more efficiently in Druid, use the
experimental fill_value feature in your query. This ensures you get a result
for every interval in intervals.
This has only been tested with 'timeseries' and single-dimension 'groupBy' queries with simple granularities.
client.query(
queryType: 'timeseries',
dataSource: 'foo',
granularity: 'day',
intervals: Time.now.utc.advance(days: -30).iso8601 + '/' + Time.now.utc.iso8601,
aggregations: [{ type: 'longSum', name: 'baz', fieldName: 'baz' }],
fill_value: 0
)
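To make the effect of fill_value concrete, the sketch below shows one way missing daily buckets could be filled on the client side. It illustrates the idea under simplifying assumptions (UTC, 'day' granularity, and a timeseries result made of timestamp and result keys); it is not the gem's actual implementation, and the helper name is hypothetical.

```ruby
require 'time'

# Hypothetical sketch: returns one row per day in [start, finish), using
# fill_value for any aggregation missing from the Druid response.
def fill_daily_gaps(rows, start:, finish:, names:, fill_value:)
  # Index the returned rows by their UTC calendar day.
  by_day = rows.each_with_object({}) do |row, acc|
    day_key = Time.parse(row['timestamp']).getutc.strftime('%Y-%m-%d')
    acc[day_key] = row['result']
  end

  filled = []
  day = start.getutc
  while day < finish
    default = names.each_with_object({}) { |name, acc| acc[name] = fill_value }
    found = by_day.fetch(day.strftime('%Y-%m-%d'), {})
    filled << { 'timestamp' => day.iso8601, 'result' => default.merge(found) }
    day += 86_400 # advance one day
  end
  filled
end

rows = [
  { 'timestamp' => '2017-01-01T00:00:00.000Z', 'result' => { 'baz' => 5 } },
  { 'timestamp' => '2017-01-03T00:00:00.000Z', 'result' => { 'baz' => 2 } }
]

filled = fill_daily_gaps(
  rows,
  start: Time.utc(2017, 1, 1),
  finish: Time.utc(2017, 1, 4),
  names: ['baz'],
  fill_value: 0
)
# filled has three rows; the 2017-01-02 row carries { 'baz' => 0 }.
```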
List datasources.
client.list_datasources
List supervisor tasks.
client.supervisor_tasks
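For orientation, Druid itself exposes this kind of information over HTTP: the coordinator lists datasources and the overlord lists supervisors. The sketch below builds those two URLs with the Ruby standard library. That the gem's methods map onto exactly these endpoints is an assumption, as are the localhost addresses; the helper fetch_body is hypothetical and needs a running cluster.

```ruby
require 'net/http'
require 'uri'

# Assumed local addresses; Druid's default ports are 8081 for the
# coordinator and 8090 for the overlord.
coordinator_base = 'http://localhost:8081'
overlord_base = 'http://localhost:8090'

datasources_uri = URI.join(coordinator_base, '/druid/coordinator/v1/datasources')
supervisors_uri = URI.join(overlord_base, '/druid/indexer/v1/supervisor')

# Hypothetical helper: performs a GET and returns the raw response body.
def fetch_body(uri)
  Net::HTTP.get_response(uri).body
end

# With a cluster running, fetch_body(datasources_uri) would return a JSON
# array of datasource names.
```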
This project uses docker-compose to provide a development environment.
- git clone the project
- cd into project
- docker-compose up (this will download the necessary images and run all dependencies in the foreground)
When changes are made to the project, rebuild the Docker image with:
$ docker build -t <some_tag> .
Where <some_tag> is something like druiddb-ruby.
To interact with the newly changed project, run it with:
$ docker run -it --network=druiddbruby_druiddb <some_tag> <some_command>
Where <some_command> is a shell command that can be run on the Docker image (e.g. bash or anything in the bin folder).
Viewing data in the database can be a bit annoying. A tool like Metabase makes this much easier, and it is what I personally use when developing.
Tests run inside the docker-compose environment.
docker-compose up
docker run -it --network=druiddbruby_druiddb <some_tag> bin/run_tests.sh
The gem is available as open source under the terms of the MIT License.