This talk explores how to efficiently transfer data from a Pandas DataFrame into Elasticsearch, maintain an up-to-date index using Google Cloud Platform's Cloud Functions and Cloud Scheduler, and leverage the power of parallel bulk operations. The dataset that will be used for this talk will be Near Earth Object Web Service (NeoWs), a RESTful web service that provides near-earth Asteroid information.
- This example uses Elasticsearch version 8.15; if you are new, check out our Quick Start on Elasticsearch. Any 8.x version should work.
- Download the latest version of Python if you don't have it installed on your machine. This example utilizes Python 3.12.1.
- An API key for NASA's APIs.
- You will use the Requests package to connect to a NASA API, Pandas to manipulate data, the Elasticsearch Python Client to load data into an index and keep it up to date, and Jupyter Notebooks to work with your data interactively while testing. You can run the following line to install these required packages:
pip install requests pandas elasticsearch notebook
This Google Cloud Function retrieves data on near-earth objects from NASA's API and updates an Elasticsearch index with the new data. It's triggered by messages on a Google Cloud Pub/Sub topic, making it well-suited for regular updates based on streaming data or periodic events.
- Python 3.12
functions-framework
- For Google Cloud Function emulation and deployment.requests
- For making HTTP requests.pandas
- For data manipulation.elasticsearch
- For interacting with Elasticsearch.
Set the following environment variables:
ELASTIC_CLOUD_ID
: Your Elasticsearch Cloud ID.ELASTIC_API_KEY
: API Key for authenticating with Elasticsearch.NASA_API_KEY
: API Key for accessing NASA's NEO API.
To deploy this function to Google Cloud Functions:
- Create a Google Cloud Pub/Sub topic
- Deploy the function
- Schedule using Google Cloud Schedular
0 8 * * * (America/New_York)
- How to analyze data using Python, Elasticsearch and Kibana
- Keeping your Elasticsearch index current with Python and Google Cloud Platform Functions
- Asteroids - NeoWs API
- What is an Elasticsearch index?
- Intro to Elasticsearch
- Beginner's Crash Course to Elastic Stack
- Elasticsearch Python Client Helpers
- GCP Cloud Scheduler
- GCP Cloud Functions