Kafka-with-twitter
In this project, Kafka is used with the Twitter API.
The main idea behind this workflow is to first fetch the required data (tweets) with the help of the Twitter API and then store that data in Elasticsearch. Kafka is used as the messaging system: it collects data from Twitter and stores it in Elasticsearch. Let's try to make things a bit clearer with the help of a flow diagram.
Steps:
- **Create Twitter Client**
We don't want all the tweets, just selected ones. In this case we only take tweets containing the keywords "bitcoin" and "kafka"; we can track as many terms as we want. Declare the host you want to connect to, the endpoint, and the authentication. To connect to the Twitter API you first need to create a Twitter Developer account and create an application; this gives you the credentials required for authentication. Connect to the client with `client.connect()` and we are done.
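The real client comes from Twitter's streaming client library, which isn't shown here. As a stdlib-only sketch of the idea, the hypothetical class below filters incoming tweets by the tracked terms and hands matches to a `BlockingQueue` (the `msgQueue` mentioned above); all names here are illustrative, not the project's actual code.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the Twitter streaming client: it tracks a
// list of terms and pushes matching tweets into the shared message queue.
class SimpleTwitterClient {
    private final List<String> trackedTerms;
    private final BlockingQueue<String> msgQueue;

    SimpleTwitterClient(List<String> trackedTerms, BlockingQueue<String> msgQueue) {
        this.trackedTerms = trackedTerms;
        this.msgQueue = msgQueue;
    }

    // In the real client this would arrive over the streaming connection;
    // here tweets are pushed in directly for illustration.
    void onTweet(String tweet) {
        String lower = tweet.toLowerCase();
        for (String term : trackedTerms) {
            if (lower.contains(term)) {
                msgQueue.offer(tweet); // keep only tweets matching a tracked term
                return;
            }
        }
    }
}

public class TwitterClientSketch {
    public static void main(String[] args) {
        BlockingQueue<String> msgQueue = new LinkedBlockingQueue<>(1000);
        SimpleTwitterClient client =
            new SimpleTwitterClient(List.of("bitcoin", "kafka"), msgQueue);

        client.onTweet("Kafka makes streaming easy");
        client.onTweet("nothing to see here");
        client.onTweet("bitcoin hits a new high");

        // Only the two tweets matching a tracked term end up in the queue.
        System.out.println(msgQueue.size());
    }
}
```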
- **Create Kafka Producer**
After the client, let's create the producer. The client fetches the desired tweets from Twitter and stores them in a message queue (`msgQueue`). For the producer, first define the Kafka topic. We need to create this topic manually with `kafka-topics --zookeeper localhost:2181 --create --topic twitter-tweets --partitions 6 --replication-factor 1`. My topic name is `twitter-tweets`. Each message (`msgQueue.poll()`) is taken from the queue to be produced to Kafka. To learn more, refer to the link.
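A minimal sketch of the producer configuration, assuming a local broker on the default port. The lines that actually create and use a `KafkaProducer` are commented out because they need the `kafka-clients` dependency on the classpath.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        String bootstrapServers = "127.0.0.1:9092"; // assumption: local broker

        // Standard producer properties: where the broker is and how to
        // serialize the (String) keys and values.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        // With kafka-clients available, the producer loop would poll the
        // queue and send each tweet to the topic:
        //
        // KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // String msg = msgQueue.poll(5, TimeUnit.SECONDS);
        // if (msg != null) {
        //     producer.send(new ProducerRecord<>("twitter-tweets", null, msg));
        // }

        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```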
Here is the output format from the producer; it will show the desired tweets.
- **Create Elasticsearch Client**
For a free Elasticsearch cluster, refer to app.bonsai.io. Sign up and create a free cluster with 3 nodes; this gives you your own Elasticsearch cluster. You need to provide credentials here as well, which are available in the Access section of Bonsai. This client will allow us to insert data into Elasticsearch.
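To show roughly what the client does, here is a stdlib-only sketch (using `java.net.http`, available since Java 11) that assembles an indexing request for the `twitter-tweets` index. The host and credentials are hypothetical placeholders — copy the real ones from Bonsai's Access section — and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

public class EsClientSketch {
    public static void main(String[] args) {
        // Hypothetical values: substitute the host and credentials shown
        // in the Access section of your own Bonsai cluster.
        String host = "https://my-cluster.eu-west-1.bonsaisearch.net";
        String user = "user";
        String pass = "secret";

        // Bonsai uses HTTP basic auth, so encode user:pass for the header.
        String auth = Base64.getEncoder()
            .encodeToString((user + ":" + pass).getBytes());

        // POST one tweet document into the twitter-tweets index.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(host + "/twitter-tweets/tweets"))
            .header("Authorization", "Basic " + auth)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"tweet\":\"hello kafka\"}"))
            .build();

        // HttpClient.newHttpClient().send(request, ...) would actually
        // submit it; here we only show how the request is assembled.
        System.out.println(request.uri() + " " + request.method());
    }
}
```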
- **Create Kafka Consumer**
For the consumer, first create the properties, then create the consumer, and then subscribe it to the Kafka topic, i.e. `twitter-tweets`. For now we display the number of received tweets and then forward them to Elasticsearch. An ID is generated through which you can access the exact tweet in Elasticsearch via `GET /twitter-tweets/tweets/ID`.
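The consumer setup can be sketched the same way: standard consumer properties, with the `KafkaConsumer` subscribe-and-poll loop shown as comments since it requires the `kafka-clients` dependency. The group name is an assumption, not necessarily the project's.

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092"); // assumption: local broker
        props.setProperty("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("group.id", "kafka-elasticsearch"); // assumed group name
        props.setProperty("auto.offset.reset", "earliest");   // read the topic from the start

        // With kafka-clients on the classpath, the consumer would subscribe
        // to the topic, report how many tweets arrived, and forward them:
        //
        // KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // consumer.subscribe(Collections.singleton("twitter-tweets"));
        // ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        // System.out.println("Received " + records.count() + " tweets");
        // // ...index each record into Elasticsearch...

        System.out.println(props.getProperty("group.id"));
    }
}
```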
Note - Before trying to run the producer and consumer, there are two things to make sure of first.
(1) Zookeeper should be running. To start Zookeeper: `zookeeper-server-start config/zookeeper.properties`.
(2) The Kafka server should be running. To start a Kafka broker: `kafka-server-start config/server.properties`.
Upcoming changes
In the future, I am also planning to add monitoring with the help of tools like Grafana or Prometheus. So stay tuned!
To learn more about Kafka, refer to the official documentation.