Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Example of PyFlink application reading from Kinesis Data Stream and writing to Kinesis Data Firehose

  • Flink version: 1.15.2
  • Flink API: Table API/SQL
  • Language: Python

Simple application reading records from Kinesis Data Stream and publishing to Kinesis Data Firehose.

🚨 This example also shows how to package a PyFlink application with multiple jar dependencies (e.g. multiple connectors).

Packaging Instructions for multiple connectors in Amazon Managed Service for Apache Flink

If you need to use multiple connectors in your streaming application, you will need to create a fat jar, bundle it with your application and reference it in your application configuration as described here. This sample application shows how to bundle multiple connectors into a fat jar.

Pre-requisites:

  1. Apache Maven
  2. A Kinesis Data Stream and Kinesis Data Firehose Stream
  3. (If running locally) Apache Flink installed and appropriate AWS credentials to access Kinesis and Firehose streams

To get this sample application working locally:

  1. Run mvn clean package in the FirehoseSink folder
  2. Ensure the resulting jar is referenced correctly in the python script
  3. Ensure the application_properties.json parameters are configured correctly
  4. Set the environment variable IS_LOCAL=True
  5. Run the python script python streaming-firehose-sink.py

To get this sample application working in Amazon Managed Service for Apache Flink:

  1. Run mvn clean package in the FirehoseSink folder
  2. Zip the python script and fat jar generated in the previous step
  3. Upload the zip to an in region S3 bucket
  4. Create the Amazon Managed Service for Apache Flink application
  5. Configure the application to use the zip uploaded to the S3 bucket and configure the application IAM role to be able to access both Kinesis and Firehose streams
  6. Run the application

A sample script to produce appropriate Kinesis records, as well as detailed configuration instructions for Amazon Managed Service for Apache Flink can be found here.

Runtime configuration

Sample data

Use the Python script to generate sample stock data to Kinesis Data Stream.