Athena Parameterized Queries Blog
Athena Parameterized Queries Blog
This is a repository for Use Amazon Athena parameterized queries to provide data as a service blog post. It contains a CloudFormation template for provisioning accompanying resources and assets for deploying a sample web application to demonstrate new ability to pass executions parameters to Amazon Athena StartQueryExecution API.
Prerequisites:
This sample application uses the following AWS services to demonstrate a Data-as-a-Service (DaaS) architecture pattern that uses Amazon Athena to query the Amazon.com customer reviews dataset.
- AWS Amplify
- Amazon API Gateway
- Amazon Athena
- AWS CloudFormation
- Amazon DynamoDB
- AWS Glue and AWS Glue Data Catalog
- AWS Identity and Access Management (IAM)
- AWS Lambda
- Amazon S3
The instructions assume you have:
- An AWS Account. For instructions, refer to Creating an AWS account.
- A CloudTrail trail created. For instructions, refer to Creating a trail.
- Amazon S3 Bucket created. For instructions, refer to Creating a bucket.
- Node.js version 16+ installed on your device.
Deploy the CloudFormation Stack
In this section, you deploy a CloudFormation template (athena-parameterized-queries.yaml) that creates the following resources:
- AWS Glue Data Catalog database
- AWS Glue Data Catalog table
- Amazon Athena workgroup
- Three Athena prepared statements
- Three Athena named queries
- The API Gateway HTTP API
- The Lambda execution role for Athena queries
- The Lambda execution role for API Gateway HTTP API authorization
- Five Lambda functions:
- Update the AWS Glue Data Catalog
- Authorize API Gateway requests
- Submit Athena queries
- List Athena prepared statements
- List Athena named queries
To deploy the CloudFormation stack template, follow these steps:
- Navigate to this GitHub repository.
- Clone the repository or copy the CloudFormation template
athena-parameterized-queries.yaml
. - On the AWS CloudFormation console, choose Create stack.
- Select Upload a template file and choose Choose file.
- Upload
athena-parameterized-queries.yaml
, then choose Next. - On the Specify stack details page, enter the stack name
athena-parameterized-queries
. - On the same page, there are two parameters:
- For S3QueryResultsBucketName, enter the S3 bucket name in your AWS Account and in the same AWS Region as where you’re running your CloudFormation stack. (For this post, we use the bucket name value, like
my-bucket
). - For APIPassphrase, enter a passphrase to authenticate API requests. You use this later.
- For S3QueryResultsBucketName, enter the S3 bucket name in your AWS Account and in the same AWS Region as where you’re running your CloudFormation stack. (For this post, we use the bucket name value, like
- Choose Next.
- On the Configure stack options page, choose Next.
- On the Review page, choose I acknowledge that AWS CloudFormation might create IAM resources with custom names, and choose Create stack.
The script takes less than two minutes to run and change to a CREATE_COMPLETE
state. If you deploy the stack twice in the same AWS account and Region, some resources may already exist, and the process fails with a message indicating the resource already exists in another template.
- On the Outputs tab, copy the
APIEndpoint
value to use later.
Refer to Use Amazon Athena parameterized queries to provide data as a service blog for guidance on least-privilege authorization for deployment of the CloudFormation template.
Deploy the Amplify application
In this section, you deploy your Amplify application.
- In the cloned repository, open
web-application/.env
in a text editor. - Set
AWS_API_ENDPOINT
as theAPIEndpoint
value from the CloudFormation stack Outputs. For example:AWS_API_ENDPOINT="https://123456abcd.execute-api.your-region.amazonaws.com"
. - Set
API_AUTH_CODE
as the value you input as the CloudFormation stack’sAPIPassphrase
parameter argument. For example:API_AUTH_CODE="YOUR_PASSPHRASE"
. - Navigate to the
web-application/
directory and runnpm install
. - Run
npm run build
to compile distribution assets. - On the Amplify console, choose All apps.
- Choose New app.
- Select Host web app, select Deploy without Git provider, then choose Continue.
- For App name, enter
Athena Parameterized Queries App
. - For Environment name, you don’t need to enter a value.
- Select Drag and Drop.
- Locate
dist/
directory insideweb-application/
, drag it into the window and drop it. Ensure you drag the entire directory, not the files within it.
- Choose Save and deploy to deploy the web application on Amplify.
This step will take less than a minute to complete.
- Under App settings, choose Access control, then choose Manage access
- Select Apply a global password, then enter values for Username and Password.
You will use these credentials to access your Amplify application.
Troubleshooting:
- If the Amplify app page shows as blank, refer back to Step 10 and make sure to drag and drop the entire dist/ directory instead of individual files inside of it
- If you see Network Error, verify that web-application/.env file has the correct API Endpoint URL (refer back to Step 12 in Instruction to deploy the API)
- If you see error status code 403, verify that web-application/.env file has the correct Passphrase (refer back to Step 1)
Access your Amplify application and run queries
In this section, you use the Amplify application to run Athena parameterized queries against the Amazon.com customer reviews dataset. The left side of the application shows how you can run parameterized queries using Athena prepared statements. The right side of the application shows how you can run parameterized queries without prepared statements, such as if the queries are written in your code; this sample uses named queries within the Athena workgroup. For more information about named queries, refer to NamedQuery in the Amazon Athena API Reference.
- Open the Amplify web application link located under Domain. For example:
https://dev123.abcd12345xyz.amplifyapp.com/
. - In the Sign in prompt, enter the user name and password your provided as the Amplify application global password.
- For Workgroup Name, choose
ParameterizedStatementsWG
workgroup. - Choose a statement example on the Prepared Statement or SQL Statement drop-down menu.
Selecting a statement displays a description about the query, including examples of parameters you can try with this statement, and the original SQL query string. SQL parameters of type string must be surrounded by single quotes, for example: 'your_string_value'
.
- Enter your query parameters.
The following figure depicts an example of the parameters to input for the product_helpful_reviews
prepared statement.
- Choose Run Query to send the query request to the API endpoint.
After the query runs, the results will be returned as a table. The complete query workflow is depicted in the previous architecture diagram.
Using execution parameters with the AWS SDK for Python (Boto3)
In this section, you inspect the Lambda function code for using the StartQueryExecution
API with and without prepared statements.
- On the Lambda console, choose Functions.
- Navigate to
LambdaAthenaFunction-athena-parameterized-queries
Lambda function. - Choose the Code Source window.
Examples of passing parameters to the Athena StartQueryExecution
API using AWS SDK for Python (Boto3) begin on lines 39 and 49. Note the new ExecutionParameters
option on lines 45 and 55.
The following code uses execution parameters with Athena prepared statements:
response = athena.start_query_execution(
QueryString=f'EXECUTE {statement}', # Example: "EXECUTE prepared_statement_name"
WorkGroup=workgroup,
QueryExecutionContext={
'Database': 'athena_prepared_statements'
},
ExecutionParameters=input_parameters
)
The following code uses execution parameters without Athena prepared statements:
response = athena.start_query_execution(
QueryString=statement, # Example: "SELECT \* FROM TABLE WHERE parameter_name = ?"
WorkGroup=workgroup,
QueryExecutionContext={
'Database': 'athena_prepared_statements'
},
ExecutionParameters=input_parameters
)
Clean up
In this post, you created several components, which generate cost. To avoid incurring future charges, remove the resources with the following steps:
- Delete the S3 bucket’s results prefix created after you ran a query on your workgroup.
With the default template, the prefix is named <S3QueryResultsBucketName>/athena-results
. Use caution in this step. Unless you are using versioning on your S3 bucket, deleting S3 objects cannot be undone.
- On the Amplify console, select the app to delete and on the Actions menu, choose Delete app, then confirm.
- On the AWS CloudFormation console, select the stack to delete, choose Delete, and confirm.
Security
See CONTRIBUTING for more information.
License
This project is licensed under the MIT-0 License. See the LICENSE file.