This crawler scrapes all new submissions and comments posted on Reddit in real time. It is built as a topology of three Catenae modules, and the extracted texts are published to the Kafka topic `new_texts`.
Requirements:

- docker
- docker-compose
To launch the crawler in standalone mode, with its own Kafka broker, run the `launch.sh` script.
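Once the crawler is running, any Kafka client can read the texts from `new_texts`. The following is a minimal sketch that uses the third-party `kafka-python` package. That package is not part of this project or of Catenae. The sketch makes two assumptions. First, the broker address `localhost:9092` is hypothetical, so check the actual address in the docker-compose configuration. Second, it assumes message values are UTF-8 text. Catenae may serialize messages differently, so `decode_message` (a hypothetical helper) falls back to printing the raw bytes when they are not valid UTF-8.

```python
def decode_message(raw):
    """Turn a Kafka message value into printable text.

    Assumption: values are UTF-8 encoded text. If they are not (e.g. a
    different serialization is used), the raw bytes are shown via repr().
    """
    if raw is None:  # tombstone / empty message
        return ""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return repr(raw)


def consume_new_texts(bootstrap_servers="localhost:9092", max_messages=None):
    """Print texts from the `new_texts` topic.

    `bootstrap_servers` is a hypothetical default; adjust it to the broker
    started by the standalone setup. Stops after `max_messages` if given.
    """
    # Imported lazily so the module loads even without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "new_texts",
        bootstrap_servers=bootstrap_servers,
        auto_offset_reset="latest",  # only texts produced after subscribing
    )
    try:
        for count, message in enumerate(consumer, start=1):
            print(decode_message(message.value))
            if max_messages is not None and count >= max_messages:
                break
    finally:
        consumer.close()
```

With the broker up and `kafka-python` installed, calling `consume_new_texts(max_messages=10)` from a Python shell should print the next ten texts the crawler publishes.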