For this project, I implemented a distributed publish/subscribe system similar to the original Kafka design.
You can find part 2 here: https://github.com/CS682-S22/distributed-system-part2-Jennytang1224
Implemented a Producer API that may be used by an application running on any host to publish messages to a broker. The Producer allows the application to do the following:
- Connect to a Broker
- Send data to the Broker by providing a `byte[]` containing the data and a `String` containing the topic
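A minimal sketch of what that Producer API could look like in Java. The class and method names (`Producer`, `send`) are assumptions for illustration, not the project's actual code, and the network send is stubbed out with an in-memory list so the example is self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Producer {
    private final String brokerHost;
    private final int brokerPort;
    // Stand-in for a socket connection to the Broker (demo only).
    private final List<String> sent = new ArrayList<>();

    // "Connect to a Broker": the real implementation would open a socket here.
    public Producer(String brokerHost, int brokerPort) {
        this.brokerHost = brokerHost;
        this.brokerPort = brokerPort;
    }

    // "Send data to the Broker": a byte[] payload plus a String topic.
    public void send(String topic, byte[] data) {
        sent.add(topic + ":" + new String(data, StandardCharsets.UTF_8));
    }

    public int sentCount() {
        return sent.size();
    }
}
```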
Implemented a Consumer API that may be used by an application running on any host to consume messages from a broker. The Consumer allows the application to do the following:
- Connect to a Broker
- Retrieve data from the Broker using a pull-based approach by specifying a topic of interest and a starting position in the message stream
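The pull-based retrieval could be sketched as below. This is a hypothetical illustration of the idea, not the project's API: the Broker's per-topic log is modeled as a local list, and `poll` is an assumed method name taking the starting position described above.

```java
import java.util.List;

public class Consumer {
    // Stand-in for the Broker's message log for the topic of interest.
    private final List<byte[]> topicLog;

    public Consumer(List<byte[]> topicLog) {
        this.topicLog = topicLog;
    }

    // Pull-based read: the Consumer asks for messages starting at a given
    // position in the stream, rather than the Broker pushing them.
    public List<byte[]> poll(int startingPosition, int maxMessages) {
        if (startingPosition >= topicLog.size()) {
            return List.of(); // nothing new yet; the Consumer polls again later
        }
        int end = Math.min(topicLog.size(), startingPosition + maxMessages);
        return topicLog.subList(startingPosition, end);
    }
}
```

Because the Consumer supplies its own starting position on every pull, the Broker does not need to track per-Consumer offsets, which is what makes the stateless design below possible.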
The Broker accepts an unlimited number of connection requests from producers and consumers. The basic Broker implementation maintains a thread-safe, in-memory data structure that stores all messages. The basic Broker is stateless with respect to the Consumer hosts.
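One way the thread-safe, in-memory store could be structured (a sketch under assumed names, not necessarily the project's actual data structure): a `ConcurrentHashMap` from topic to an ordered, synchronized message log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class MessageStore {
    // topic -> ordered log of messages. ConcurrentHashMap makes topic lookup
    // safe across producer/consumer threads; each log is synchronized so
    // concurrent appends from multiple producers do not corrupt it.
    private final ConcurrentHashMap<String, List<byte[]>> logs = new ConcurrentHashMap<>();

    public void append(String topic, byte[] message) {
        logs.computeIfAbsent(topic, t -> Collections.synchronizedList(new ArrayList<>()))
            .add(message);
    }

    // Serves a pull request: return a copy of the messages from an offset on.
    // The Broker stays stateless -- the Consumer supplies the offset.
    public List<byte[]> read(String topic, int offset) {
        List<byte[]> log = logs.getOrDefault(topic, List.of());
        synchronized (log) {
            if (offset >= log.size()) {
                return List.of();
            }
            return new ArrayList<>(log.subList(offset, log.size()));
        }
    }
}
```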
In total, the basic setup runs three Producers, three Consumers, and one Broker.
Built multiple instances of the Broker running on separate hosts. Each topic may have multiple partitions, and each partition may be handled by a different Broker. When a new message is posted, it specifies both the topic and a key. As in the real Kafka implementation, the key is hashed to determine which Broker manages the partition for that <key, topic> pair. I also designed the mechanism for directing a request to the appropriate Broker by implementing a custom load balancer: essentially another service that accepts a request containing a key and returns the host information of the Broker that manages that partition.
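The key-hashing lookup at the heart of that load balancer can be sketched as follows. The class name, the `lookup` method, and the host:port strings are all illustrative assumptions; the point is only that the same key always hashes to the same Broker.

```java
import java.util.List;

public class LoadBalancer {
    // host:port of each Broker instance (hypothetical addresses).
    private final List<String> brokers;

    public LoadBalancer(List<String> brokers) {
        this.brokers = brokers;
    }

    // Hash the message key to a Broker index. floorMod keeps the index
    // non-negative even when hashCode() is negative, so every key maps
    // deterministically to exactly one Broker.
    public String lookup(String key) {
        int idx = Math.floorMod(key.hashCode(), brokers.size());
        return brokers.get(idx);
    }
}
```

A Producer would first ask this service for the right Broker, then connect directly to it; note that with plain modulo hashing, adding or removing a Broker remaps most keys, which is one reason real systems use fixed partition counts or consistent hashing.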