Start real-time Twitter stream ingestion into local AsterixDB
Note: this guide will not work if you run the scripts with sh, so please use the commands exactly as they are given below.
Please refer to https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html to get your own Twitter developer access keys and tokens.
cd apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/
./start-sample-cluster.sh
Fill in your Twitter API access tokens at the corresponding places in examples/twittermap/script/streamFeed.sh:
-ck Your Consumer Key
-cs Your Consumer Secret
-tk Your Access Token
-ts Your Access Token Secret
Copy the following code into the Query box and run it.
use twitter;
create feed TweetFeed with {
"adapter-name" : "socket_adapter",
"sockets" : "asterix_nc1:10001",
"address-type" : "nc",
"type-name" : "typeTweet",
"format" : "adm",
"upsert-feed" : "false"
};
connect feed TweetFeed to dataset ds_tweet;
start feed TweetFeed;
cd examples/twittermap
./script/streamFeed.sh
The real-time tweet stream is now being ingested into your local AsterixDB.
You can check the number of tweets ingested on this page: http://localhost:19002/admin/active
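The same check can also be made from a terminal. The sketch below is only an illustration and not part of the repository: the function names feed_stats and count_tweets are hypothetical, curl is assumed to be installed, and the second function assumes that the AsterixDB HTTP query service is reachable at /query/service on the same port as the admin page. The response format is not described in this guide, so the output is simply printed as received.

```shell
# Hypothetical helpers; neither function ships with the project.

# Fetch the active-feed statistics page mentioned above.
feed_stats() {
  curl -s "${1:-http://localhost:19002/admin/active}"
}

# Ask AsterixDB how many records the ds_tweet dataset currently holds.
# Assumption: the HTTP query service lives at /query/service on port 19002.
count_tweets() {
  curl -s --data-urlencode \
    "statement=use twitter; select count(*) from ds_tweet;" \
    "${1:-http://localhost:19002/query/service}"
}

# Usage, once the sample cluster and the feed are running:
#   feed_stats
#   count_tweets
```

Running count_tweets twice a few seconds apart should show a growing count if ingestion is working.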
The line -loc -173.847656,17.644022,-65.390625,70.377854 in the script restricts ingestion to tweets within this geographic bounding box, which roughly covers the U.S.
You can change it to any other area you are interested in. To get the coordinates of an area, open Google Maps and click a point to see its latitude and longitude.
The -loc parameter uses the pattern [Southwestern Corner], [Northeastern Corner], where each corner is written as longitude,latitude (as in the U.S. example above).
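Because the order of the four numbers is easy to get wrong, a small helper can assemble and sanity-check the value before you paste it into the script. This is a minimal sketch under stated assumptions: make_loc is a hypothetical function (not part of the repository), it expects numeric input in the order southwest longitude, southwest latitude, northeast longitude, northeast latitude, and it rejects boxes that cross the 180° meridian for simplicity.

```shell
# Hypothetical helper, not part of the repository: prints a value for -loc
# from a southwest and a northeast corner, each given as longitude latitude.
make_loc() {
  # awk does the floating-point checks; it exits 0 only for a valid box.
  awk -v w="$1" -v s="$2" -v e="$3" -v n="$4" 'BEGIN {
    ok = (w >= -180 && e <= 180 && s >= -90 && n <= 90 && w < e && s < n)
    exit !ok
  }' || { echo "invalid bounding box" >&2; return 1; }
  printf '%s,%s,%s,%s\n' "$1" "$2" "$3" "$4"
}

# The U.S. box from this guide:
make_loc -173.847656 17.644022 -65.390625 70.377854
# prints -173.847656,17.644022,-65.390625,70.377854
```

Swapping the two corners, or passing a latitude outside the range -90 to 90, makes the function print an error and return a non-zero status.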
You can list keywords after the -tr parameter to ingest only tweets containing the keywords you are interested in, e.g. -tr hurricane, storm, tornado.
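To get a feel for what such a keyword filter keeps, the effect can be approximated locally on a few sample lines. The sketch below is only an illustration: filter_keywords is a hypothetical function, the sample sentences are made up, and it assumes a tweet is kept when it mentions any one of the keywords, ignoring case. The real matching happens on the Twitter side and may differ in detail.

```shell
# Hypothetical local illustration, not how the ingestion script filters:
# keep only the input lines that mention at least one of the keywords.
filter_keywords() {
  # Join the keywords with "|" into one extended-regex alternation.
  pattern=$(printf '%s|' "$@")
  pattern=${pattern%"|"}
  grep -i -E -- "$pattern"
}

# Made-up sample text; the second line mentions no keyword and is dropped.
printf '%s\n' "Hurricane warning issued" "Nice sunny day" "A storm is coming" |
  filter_keywords hurricane storm tornado
```

Keywords are treated as regular expressions here, so this sketch is only suitable for plain words.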
If you only want to save the raw tweets to gzipped JSON files, you can add the -fo parameter to the end of streamFeed.sh.