Containerized Kafka producer endlessly producing data for testing and demoing consumer apps. The project consists of:

- A Python script using `kafka-python` to endlessly produce data (sketched below)
- A modified Docker image already prepared with a topic and a producer, ready to produce data on launch
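For reference, a minimal sketch of such an endless producer loop with `kafka-python`; the broker address, topic name, and payload below are illustrative assumptions, and the real `producer.py` is tailored to my own use case:

```python
import json
import time

from kafka import KafkaProducer

# Illustrative values; the actual producer.py is configured differently.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

message_id = 0
while True:  # endless stream of test messages
    producer.send("test-topic", {"id": message_id, "timestamp": time.time()})
    message_id += 1
    time.sleep(5)  # matches the default TIME_BETWEEN_MESSAGES of 5000 ms
```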
Docker images available:

- `ulitol97/kafka-producer`: Kafka+Zookeeper image with a topic endlessly producing data
- `ulitol97/kafka-zookeeper`: up-to-date Kafka+Zookeeper bundled in a single image
- Tweak `producer.py` to suit your needs (it is programmed to produce data that serves my use case), or pull `ulitol97/kafka-producer` to use as is
- Build a Docker image using the project's `Dockerfile`
- Run your image in a new container:

  ```
  docker run --name kafka-producer -d -p 2181:2181 -p 9092:9092 --env TOPIC_NAME=my-topic-for-tests ulitol97/kafka-producer:dev
  ```

- Enjoy an endless Kafka stream on the configured host and port (a quick way to verify this is sketched below)
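To check that messages are actually flowing, a throwaway `kafka-python` consumer is enough. This sketch assumes the container from the command above, reachable on `localhost:9092`:

```python
from kafka import KafkaConsumer

# Read the test topic from the beginning and print whatever arrives.
consumer = KafkaConsumer(
    "my-topic-for-tests",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

for message in consumer:
    print(message.value)  # raw bytes, exactly as the producer sent them
```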
The following environment variables can be used to modify app containers (a sketch of how they might drive the producer follows this list):

- `TOPIC_NAME`: the name of the topic that will be constantly streaming messages from the container (default is `test-topic`)
- `TIME_BETWEEN_MESSAGES`: the number of milliseconds the producer waits between messages (default is `5000`)
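As an illustration (not necessarily how `producer.py` implements it), these variables map naturally onto environment lookups with the documented defaults:

```python
import os

# Defaults mirror the documented container defaults.
topic_name = os.environ.get("TOPIC_NAME", "test-topic")
wait_ms = int(os.environ.get("TIME_BETWEEN_MESSAGES", "5000"))

# The producer then sleeps wait_ms / 1000 seconds between sends to topic_name.
```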
Additionally, the following can be used to change Kafka's base behaviour (see them here):

- `ADVERTISED_HOST`: the external IP for the container (default is `localhost`)
- `ZK_CHROOT`: the Zookeeper chroot used by Kafka (without the `/` prefix), e.g. `kafka`
- `LOG_RETENTION_HOURS`: the minimum age of a log file, in hours, to be eligible for deletion
- `LOG_RETENTION_MINUTES`: the minimum age of a log file, in minutes, to be eligible for deletion (supersedes `LOG_RETENTION_HOURS` if defined; default is `15`)
- `LOG_RETENTION_BYTES`: the size at which segments are pruned from the log (default is `20971520`, i.e. 20 MB)
- `NUM_PARTITIONS`: the default number of log partitions per topic (default is `1`)
- `AUTO_CREATE_TOPICS`: whether a new topic should be created when a non-existent topic is written to (default is `true`)
`LOG_RETENTION_MINUTES` and `LOG_RETENTION_BYTES` are low by default to avoid wasting space, since the data is not relevant and is only used for testing.
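If you want to confirm the retention values a running container actually applied, one option is `kafka-python`'s admin client. A sketch, assuming the broker listens on `localhost:9092` and the topic kept its default name:

```python
from kafka.admin import ConfigResource, ConfigResourceType, KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Ask the broker for the effective configuration of the test topic.
resource = ConfigResource(ConfigResourceType.TOPIC, "test-topic")
for response in admin.describe_configs(config_resources=[resource]):
    print(response)  # entries include retention.ms and retention.bytes

admin.close()
```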
As explained in Behind the scenes, the demo itself relies on a custom image running Kafka and Zookeeper altogether. You are free to build your own derived images from its Dockerfile, although for most changes overriding one of these build arguments should do the trick (e.g. `docker build --build-arg KAFKA_VERSION=3.1.0 .`):

- `KAFKA_VERSION`: Kafka version to be downloaded and installed
- `ZOOKEEPER_VERSION`: Zookeeper version to be downloaded and installed. Download links are slightly different since 3.5, so downgrading below that won't work without adapting the Dockerfile
- `SCALA_VERSION`: Scala version; should remain `2.13` for a while for new Kafka versions
I was involved in a project requiring validation of persistent data streams, but I needed some data producers to begin testing and deployment!
Instead of hard-coding my way through, I thought of preparing a customizable Docker image that can serve anyone's needs.
The resulting project is a compendium of techniques from different sources; special thanks go to:
- Dario Radečić and his awesome blog posts for installing Kafka and creating a simple producer
- @hey-johnnypark for implementing a Docker image containing both Kafka and Zookeeper altogether, removing the hassle of setting up docker-compose.
  - The original image itself can be found in hey-johnnypark/docker-kafka-zookeeper, the main idea coming from Spotify's deprecated spotify/docker-kafka
  - This image was modified with the provided environment variables to generate my ulitol97/kafka-zookeeper with:
    - Apache Kafka `3.1.0` (the latest release as of March 2022)
    - Apache Zookeeper `3.7.0` (the latest release in the stable branch as of March 2022)