Welcome to the "Big Data Architectures for Investment Banking" exercise.
In this exercise you will build a complete end-to-end Big Data streaming architecture that processes and analyzes cryptocurrency data in real time. The diagram below provides a high-level view of the architecture:
The following diagram shows the architecture in more detail:
By the end of the classes, the whole architecture should be up and running. Each trainee will then have to find a new data source (ideally real-time) and build their own flow.
Follow the links below to access each exercise's details, code, etc.
- Exercise 0: Setup
- Exercise 1: Data Ingestion
- Exercise 2: Data Processing
- Exercise 3: Data Storage
- Exercise 4: Data Visualization
- Exercise 5: Machine Learning
- Post work: Build your own flow
Component | Service | Description | URL/port |
---|---|---|---|
Apache NiFi | nifi-upv | Data Ingestion Tool | https://localhost:8443/nifi |
Confluent Kafka | broker | Message Broker | localhost:9092 |
Kafka UI (Redpanda Console) | kafka-ui | Kafka Manager & Monitor | http://localhost:8080 |
Spark UI | N/A | Spark UI (jobs, environment, etc.) | http://localhost:4040 |
Elasticsearch | elasticsearch | Data storage and search engine | http://localhost:9200 |
Kibana | kibana | Dashboarding and Elasticsearch dev/admin tool | http://localhost:5601 |
Jupyter | jupyter | Notebook-based analytics | http://localhost:8888 |
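
Once the environment is started, it can be handy to confirm that each endpoint in the table is accepting connections. The following is a minimal sketch, not part of the exercise code: it uses only the Python standard library, and the names `SERVICES`, `parse_endpoint`, `is_port_open` and `check_services` are hypothetical helpers introduced here for illustration. It only tests whether the TCP port is open; it does not verify that the service behind it is healthy.

```python
import socket
from urllib.parse import urlsplit

# Endpoints copied from the table above (names are illustrative labels).
SERVICES = {
    "Apache NiFi": "https://localhost:8443/nifi",
    "Confluent Kafka": "localhost:9092",
    "Kafka UI (Redpanda Console)": "http://localhost:8080",
    "Spark UI": "http://localhost:4040",
    "Elasticsearch": "http://localhost:9200",
    "Kibana": "http://localhost:5601",
    "Jupyter": "http://localhost:8888",
}


def parse_endpoint(endpoint):
    """Return (host, port) for a URL or a bare 'host:port' string."""
    # urlsplit only recognises the host part after '//', so add it
    # for bare 'host:port' entries such as the Kafka broker.
    if "://" not in endpoint:
        endpoint = "//" + endpoint
    parts = urlsplit(endpoint)
    return parts.hostname, parts.port


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to True (port open) or False (unreachable)."""
    return {
        name: is_port_open(*parse_endpoint(endpoint))
        for name, endpoint in services.items()
    }


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name:30} {'open' if is_up else 'unreachable'}")
```

Note that the Spark UI on port 4040 is served by a running Spark application, so it will normally show as unreachable until a Spark job or session has been started.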