
Streaming Pac-Man

Streaming Pac-Man is the funniest application you will ever see while applying the principles of streaming analytics with Apache Kafka. Built around the famous Pac-Man game, this application ingests and stores events from the game in Kafka topics and lets you process them in near real-time using ksqlDB. To keep you focused, the application relies on fully managed services in Confluent Cloud for both the Kafka cluster and ksqlDB.

pacman game

To implement streaming analytics in the game we built a scoreboard using ksqlDB. The scoreboard is a table containing aggregated metrics for the players, such as their highest score, the highest level achieved, and the number of times they have lost. As new events arrive, the scoreboard is updated instantly by the continuous queries that keep processing those events as they happen.

What are you going to need?

  • Managed Kafka - You need an active account with Confluent Cloud to be able to spin up environments with the services required for this application. At a minimum, you will need a Kafka cluster where your topics will be created. Optionally, you may want to create a ksqlDB application to implement the scoreboard pipeline.

  • Terraform - The application is created automatically using Terraform. The default cloud provider supported is AWS, but you could port this logic to GCP or Azure as well. Besides having Terraform installed locally, you will need to provide your cloud provider credentials so Terraform can create and manage the resources for you (tested with Terraform v0.14.8).

  • Java and Maven - The UI layer of the application relies on two APIs that are implemented in Java, so you will need Java 11+ installed to build the source code. The build itself is implemented using Maven and is triggered automatically by Terraform.

  • Confluent Cloud CLI - During the demo setup, a bash script sets up the whole Confluent Cloud environment for you. To do that, it needs the Confluent Cloud CLI installed locally. You can find instructions about how to install it here. v1.7.0 or later is required, and you must be logged in with the --save argument, which saves your Confluent Cloud user login credentials or refresh token (in the case of SSO) to the local netrc file.

1) Configure the demo

The whole demo creation is scripted. The script performs two main actions:

  1. As mentioned before, the application uses a Kafka cluster running in a fully managed service for Apache Kafka, so the first thing it provisions is the Confluent Cloud resources, using the Confluent Cloud CLI. If you are interested in how you can create a cluster in Confluent Cloud via the Web UI, have a look at our Quick Start for Apache Kafka using Confluent Cloud.

  2. Terraform provisions all the resources needed in AWS.

Configure the Confluent CLI

  1. Log in to the Confluent Cloud CLI:

    confluent login --save

The --save flag will save your Confluent Cloud login credentials to the ~/.netrc file.

Configure the demo

The demo script will do everything for you, but for it to work you need to provide some configuration.

  1. Create the demo.cfg file using the example provided in the config folder

    mv config/demo.cfg.example config/demo.cfg
  2. Provide the required information in the demo.cfg file

    export AWS_ACCESS_KEY="<AWS_ACCESS_KEY>"
    export AWS_SECRET_KEY="<AWS_SECRET_KEY>"
  3. Notice the optional configuration in the same file. Uncomment and define as you see fit

    # These are optional configs
    # export S3_BUCKET_NAME="ksqldbpacman"
    # export SCHEMA_REGISTRY_GEO="eu"
    # export CLUSTER_REGION="eu-west-2"

2) Deploying the application

The application is essentially a set of HTML/CSS/JS files that form a microsite, which could be hosted statically anywhere. But for the sake of coolness we will deploy this microsite in an S3 bucket on AWS. This bucket is created in the same region selected for the Confluent Cloud cluster to ensure that the application is co-located. The application emits events that are processed by an event handler implemented as an API Gateway, which uses a Lambda function as its backend. This event handler API receives the events and writes them into Kafka using ksqlDB.

pac man arch

Please note that during deployment, the script takes care of creating the required Kafka topics and also the ksqlDB queries. Therefore, there is no need to manually create them.
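
To give a concrete idea of what such a write looks like, below is a minimal, hypothetical sketch of the kind of statement the event handler could issue against ksqlDB. The stream name USER_GAME and the column names are illustrative only; the actual stream definitions are created for you by the deployment scripts (see statements.sql).

    -- hypothetical event write issued by the event handler (names are illustrative)
    INSERT INTO USER_GAME (USER, GAME_SCORE, GAME_LEVEL, GAME_LIVES)
      VALUES ('player-42', 1200, 3, 2);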

  1. Start the demo creation

    ./start.sh
  2. At the end of the provisioning, the output with the demo endpoint will be shown. Paste the demo URL into your browser and start playing!

    Outputs:
    
    Pacman = http://streaming-pacman00000.s3-website-region.amazonaws.com

Note: When you are done with the application, you can automatically destroy all the resources created using the command below:

./stop.sh

3) The scoreboard

The scoreboard can be visualized in real time by clicking on the SCOREBOARD link in the pacman game (top right corner). Who will be the best player?

scoreboard

4) Looking under the hood

When users play the Pac-Man game, two types of events are generated. One is called User Game and contains data about the user’s current game, such as their score, current level, and number of lives. The other is called User Losses and, as the name implies, contains data about the user losing a game. To build a scoreboard out of this, a streaming analytics pipeline transforms these raw events into a table with the scoreboard, updated in near real-time.
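
For orientation, the raw events could be modeled in ksqlDB roughly as two streams, one per event type. This is only a sketch with illustrative column names; the authoritative definitions are the ones deployed from statements.sql.

    -- sketch: one stream per raw event type (illustrative columns)
    CREATE STREAM USER_GAME (USER VARCHAR, GAME_SCORE INT, GAME_LEVEL INT, GAME_LIVES INT)
      WITH (KAFKA_TOPIC='USER_GAME', VALUE_FORMAT='JSON', PARTITIONS=1);

    CREATE STREAM USER_LOSSES (USER VARCHAR)
      WITH (KAFKA_TOPIC='USER_LOSSES', VALUE_FORMAT='JSON', PARTITIONS=1);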

pipeline

To implement the pipeline we use ksqlDB. The code for this pipeline has been written for you and is automatically deployed to a fully managed ksqlDB server.
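
As a rough illustration of what the per-player aggregation behind the scoreboard might look like, here is a simplified sketch that derives the highest score and highest level per user. The real pipeline also folds in the number of losses per user (the TOTAL_LOSSES column used by the scoreboard query); refer to statements.sql for the exact statements that are deployed.

    -- sketch: per-player aggregation feeding the scoreboard (simplified)
    CREATE TABLE STATS_PER_USER AS
      SELECT USER,
             MAX(GAME_SCORE) AS HIGHEST_SCORE,
             MAX(GAME_LEVEL) AS HIGHEST_LEVEL
      FROM USER_GAME
      GROUP BY USER;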

The scoreboard logic

ksqlDB supports pull queries, which let you get the latest value for a given key. The Pac-Man app uses this feature to show you the scoreboard, with a simple trick:

  1. A first request is sent to get the set of all player user_ids. This collection of strings is calculated continuously, in real time, by ksqlDB using the COLLECT_SET aggregate function, as you can see in statements.sql. By using a constant as the key for the aggregation we effectively create a single aggregation over all the events in the stream, and we can then use this constant string as the key in our pull query (see the sketch after this list):

    SELECT HIGHEST_SCORE_VALUE, USERS_SET_VALUE FROM SUMMARY_STATS WHERE SUMMARY_KEY='SUMMARY_KEY';
  2. A query to the scoreboard is then sent, using the list retrieved with the first API call in the WHERE ... IN clause:

    select USER, HIGHEST_SCORE, HIGHEST_LEVEL, TOTAL_LOSSES from STATS_PER_USER WHERE USER IN (${userListCsv});
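
The single-row summary table that step 1 refers to could be created roughly as follows. This is a sketch only: it groups all game events under a constant key so that COLLECT_SET aggregates every player into one set; the actual statement lives in statements.sql.

    -- sketch: global aggregation under a constant key (illustrative)
    CREATE TABLE SUMMARY_STATS AS
      SELECT 'SUMMARY_KEY' AS SUMMARY_KEY,
             MAX(GAME_SCORE) AS HIGHEST_SCORE_VALUE,
             COLLECT_SET(USER) AS USERS_SET_VALUE
      FROM USER_GAME
      GROUP BY 'SUMMARY_KEY';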

Troubleshooting

If you face issues with the Pac-Man app, try opening your browser's developer tools and check what errors appear in the console. If you see a CORS-related issue, check your user in AWS IAM; we have seen cases where missing permissions result in this kind of issue. The solution is to add your user to the relevant groups.

License

This project is licensed under the Apache 2.0 License.

Previous Pacman Demo

Are you looking for the previous version of this demo? You can find it here: https://github.com/confluentinc/demo-scene/releases/tag/pacman-v1.0