aws-samples/amazon-athena-train-amazon-sagemaker

Amazon Athena Query Federation with Amazon SageMaker

Overview

This repository demonstrates how to integrate an Amazon Athena user-defined function (UDF) with Amazon SageMaker.

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own code. This enables you to integrate with new data sources and proprietary data formats, or to build new user-defined functions. Initially these customizations are limited to the parts of a query that occur during a TableScan operation, but they will eventually be expanded to include other parts of the query lifecycle using the same easy-to-understand interface.
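
To make the UDF idea concrete, here is a minimal Java sketch of the shape such a function might take: accept column values for one row, build a payload, send it to a model endpoint, and return the parsed prediction. It is not the code in this repository. The class name `SageMakerUdfSketch`, the `predict` signature, and the CSV payload format are all assumptions made for illustration, and the SageMaker call is replaced by a plain `java.util.function.Function` stub so the example runs without AWS credentials.

```java
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch of a UDF that delegates to a model endpoint. */
final class SageMakerUdfSketch {
    // Stand-in for a SageMaker runtime call (assumption):
    // takes a CSV request body and returns the response body as text.
    private final Function<String, String> invokeEndpoint;

    public SageMakerUdfSketch(Function<String, String> invokeEndpoint) {
        this.invokeEndpoint = invokeEndpoint;
    }

    /** Method a query engine could call once per row (hypothetical signature). */
    public Double predict(Double feature1, Double feature2) {
        // Propagate SQL NULL instead of sending an incomplete row to the model.
        if (feature1 == null || feature2 == null) {
            return null;
        }
        // Serialize the row as one CSV line (assumed payload format).
        String payload = String.format(Locale.ROOT, "%s,%s", feature1, feature2);
        String body = invokeEndpoint.apply(payload);
        // Assume the endpoint answers with a single numeric value.
        return Double.parseDouble(body.trim());
    }

    public static void main(String[] args) {
        // Fake endpoint that simply sums the two features.
        SageMakerUdfSketch udf = new SageMakerUdfSketch(csv -> {
            String[] parts = csv.split(",");
            double sum = Double.parseDouble(parts[0]) + Double.parseDouble(parts[1]);
            return Double.toString(sum);
        });
        System.out.println(udf.predict(1.5, 2.0));
    }
}
```

Injecting the endpoint call as a function keeps the row-level logic testable on its own; in a deployed Lambda, that function would wrap the real SageMaker runtime client.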

This functionality is currently in Public Preview while customers give us feedback on usability and on how easy it is to use the service and build new connectors. We do not recommend using these connectors in production or using this preview to make assumptions about the performance of Athena’s Federation features. As we receive more feedback, we will improve the preview and increase the limits associated with query/connector performance, APIs, SDKs, and user experience. The best way to understand the performance of Athena Data Source Connectors is to run a benchmark when they become generally available (GA) or to review our performance guidance.

To enable this Preview feature, create an Athena workgroup named AmazonAthenaPreviewFunctionality. Any query that federates to this connector, uses a UDF, or invokes SageMaker inference must be run from that workgroup.

TL;DR: Pre-reqs to Get Started:

  1. Ensure you have the proper permissions/policies to deploy and use Athena Federated Queries.
  2. Ensure the latest version of the SAM CLI is installed (tested with 0.45.0).
  3. Ensure the Amazon SageMaker Java SDK is installed on the machine.
  4. Ensure the Amazon SageMaker Java SDK is available in your local Maven repository. If it is not, add <packaging>jar</packaging> to the SageMaker SDK pom.xml, then build it with the Maven command mvn clean install.
  5. Navigate to the Serverless Application Repository and search for "athena-federation". Be sure to check the box to show entries that require custom IAM roles. Look for the entry named "AthenaUserDefinedFunctions", published by the "Amazon Athena Federation" author.
  6. Deploy the application.
  7. Go to the Athena Console in us-east-1 (N. Virginia) and create a workgroup called "AmazonAthenaPreviewFunctionality". Any queries run from that workgroup will be able to use the Preview features described in this repository.
  8. Run a query "show databases in `lambda:<func_name>`" where <func_name> is the name of the Lambda function you deployed in the previous steps.
  9. Go to the "athena-udfs" folder in this repo and follow the instructions there to create a custom Athena UDF with Amazon SageMaker.
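
The workgroup name from step 7 and the verification query from step 8 can be captured in a small helper, sketched below. The class `PreviewQueries` and its method are hypothetical names for this illustration, and the character check on the function name is an assumption added to avoid producing a malformed query; it is not something the steps above require.

```java
/** Hypothetical helper for the workgroup name and the step 8 query. */
final class PreviewQueries {
    /** Workgroup that enables Preview features, as named in step 7. */
    static final String PREVIEW_WORKGROUP = "AmazonAthenaPreviewFunctionality";

    private PreviewQueries() {
    }

    /** Builds the step 8 query for the Lambda function deployed in step 6. */
    static String showDatabases(String lambdaFunctionName) {
        // Assumed validation: letters, digits, hyphens, and underscores only,
        // so the name cannot break out of the backtick-quoted identifier.
        if (lambdaFunctionName == null
                || !lambdaFunctionName.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException(
                    "Unexpected Lambda function name: " + lambdaFunctionName);
        }
        return "show databases in `lambda:" + lambdaFunctionName + "`";
    }

    public static void main(String[] args) {
        // "my_udf_function" is a placeholder, not a name from this repository.
        System.out.println("Workgroup: " + PREVIEW_WORKGROUP);
        System.out.println(showDatabases("my_udf_function"));
    }
}
```

The resulting string is what you would paste into the Athena query editor while the AmazonAthenaPreviewFunctionality workgroup is selected.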

For more information please consult:

  1. Intro Video
  2. SDK ReadMe
  3. Quick Start Guide
  4. Available Connectors
  5. Federation Features
  6. How To Build A Connector or UDF
  7. Gathering diagnostic info for support
  8. Frequently Asked Questions
  9. Common Problems
  10. Installation Pre-requisites
  11. Known Limitations & Open Issues
  12. Predicate Pushdown How-To
  13. Our GitHub Wiki
  14. Java Doc

License

This project is licensed under the Apache-2.0 License.
