(step_functions): S3 Json Lines Item Reader #33601

Open
2 tasks
MFC-MiguelFerreira opened this issue Feb 27, 2025 · 1 comment
Labels
@aws-cdk/aws-stepfunctions (Related to AWS StepFunctions), effort/medium (Medium work item – several days of effort), feature-request (A feature should be added or improved.), p2

Comments

@MFC-MiguelFerreira

Describe the feature

The AWS Step Functions team recently introduced support for JSON Lines (JSONL) in Distributed Map, allowing efficient processing of large datasets stored in this format:
🔗 AWS Blog Post – JSONL Support in Step Functions Distributed Map

Currently, the AWS CDK provides S3JsonItemReader (docs), which reads items from a JSON array stored in an S3 object. However, this construct does not support JSONL files. Since Step Functions Distributed Map now supports JSONL natively, it would be highly beneficial for the CDK to support it natively as well.
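To make the difference between the two formats concrete, here is a minimal sketch using only the Python standard library. The sample payloads and the helper names `read_json_array` and `read_json_lines` are made up for illustration and are not part of the CDK or Step Functions.

```python
import json

# Hypothetical sample payloads, for illustration only.
json_array_text = '[{"id": 1}, {"id": 2}, {"id": 3}]'
jsonl_text = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'


def read_json_array(text):
    """Parse a document whose top-level value is a JSON array of items."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    return items


def read_json_lines(text):
    """Parse JSON Lines: one independent JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


array_items = read_json_array(json_array_text)
jsonl_items = read_json_lines(jsonl_text)

# A JSONL file is not a single valid JSON document, so an array-style parse
# fails (json.JSONDecodeError is a subclass of ValueError).
try:
    read_json_array(jsonl_text)
    jsonl_parses_as_array = True
except ValueError:
    jsonl_parses_as_array = False
```

Both inputs carry the same three items, but only the line-by-line reader can handle the JSONL text, which is why a reader built for JSON arrays cannot simply be pointed at a `.jsonl` file.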

Use Case

Developers using AWS Step Functions with CDK would be able to seamlessly leverage JSONL for large-scale data processing, without resorting to custom implementations or workarounds.

Proposed Solution

Introduce a new construct (or extend the existing S3JsonItemReader) to support JSONL files, aligning with the latest Step Functions capabilities.

Example:

s3_jsonl_reader = stepfunctions.S3JsonItemReader(
    bucket=s3_bucket,
    key="data.jsonl",
    format=stepfunctions.JsonFormat.JSONL  # Example of possible new parameter
)
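For reference, a rough sketch of the state machine fragment such a reader would presumably need to render. It is written as a plain Python dict rather than CDK code, and it assumes the JSONPath form of the Amazon States Language `ItemReader` field with `InputType` set to `JSONL`. The function name `render_jsonl_item_reader`, the bucket name, and the hard-coded `aws` partition in the resource ARN are assumptions made for illustration, not the CDK's actual implementation.

```python
def render_jsonl_item_reader(bucket_name, key, max_items=None):
    """Build an ItemReader fragment for a JSONL object in S3 (illustrative only)."""
    reader_config = {"InputType": "JSONL"}
    if max_items is not None:
        # Optional cap on how many items the Distributed Map reads.
        reader_config["MaxItems"] = max_items
    return {
        # Assumes the standard "aws" partition; a real construct would resolve this.
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": reader_config,
        "Parameters": {"Bucket": bucket_name, "Key": key},
    }


# Hypothetical bucket name; the key matches the example above.
item_reader = render_jsonl_item_reader("my-bucket", "data.jsonl", max_items=1000)
```

Compared with the JSON array case, the only difference in the rendered fragment should be the `InputType` value, which is part of why a `format` parameter on the existing reader looks like a small change.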

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.180.0

Environment details (OS name and version, etc.)

Windows 11, Python

@MFC-MiguelFerreira added the feature-request and needs-triage labels Feb 27, 2025
@github-actions bot added the @aws-cdk/aws-stepfunctions label Feb 27, 2025
@pahud
Contributor

pahud commented Feb 27, 2025

Hi @MFC-MiguelFerreira,

Thank you for submitting this feature request! We appreciate you bringing this new Step Functions capability to our attention.

This is a valid feature request that aligns well with the recent AWS service updates. The CDK should indeed provide a way to leverage JSONL support in Distributed Map for large-scale data processing.

Your proposed solution of extending the S3JsonItemReader with a format parameter is a good approach.

I'm making this a p2 feature request. If you or anyone from the community is interested in contributing this feature, we would welcome a pull request.

Thanks again for helping improve the AWS CDK!

@pahud added the p2 and effort/medium labels and removed the needs-triage label Feb 27, 2025