NIFI-14336: Creating processor to list box folder contents #9784
base: main
Given the way this processor is likely to be used, I think we should be prescriptive here and avoid generating one FlowFile per listed file in the configured folder. Instead, I would generate a single FlowFile whose JSON content is an array of records, each record containing the metadata of one listed file.
Assuming a folder with 50k files, that avoids generating 50k FlowFiles in a single execution of the processor. With one FlowFile, a user could then chain a first SplitRecord processor configured to split into chunks of 1000 records, a second SplitRecord configured to split into single records, and finally a ForkRecord with the path(s) of the fields that should be moved into FlowFile attributes. This way backpressure would do its job.
Thoughts?
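To make the suggestion concrete, here is a minimal sketch of what "one FlowFile with a JSON array of records" could look like as plain data. It deliberately avoids the NiFi and Box APIs; the `FileInfo` record and its fields (`id`, `name`, `size`) are hypothetical stand-ins for whatever metadata the processor actually exposes, and in the processor itself a configured record writer would presumably do the serialization.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ListingAsJson {
    // Hypothetical metadata record; the field names are illustrative only.
    record FileInfo(String id, String name, long size) {}

    // Minimal escaping of backslashes and double quotes.
    // Control characters are not handled in this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serialize the whole listing as one JSON array, one object per file,
    // instead of producing one output per file.
    static String toJsonArray(List<FileInfo> files) {
        return files.stream()
                .map(f -> "{\"id\":\"" + escape(f.id())
                        + "\",\"name\":\"" + escape(f.name())
                        + "\",\"size\":" + f.size() + "}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo("1", "report.pdf", 1024L),
                new FileInfo("2", "notes.txt", 42L));
        System.out.println(toJsonArray(files));
    }
}
```

Downstream, the two SplitRecord stages would then break this single array into progressively smaller pieces, so the queue between processors can fill up and apply backpressure.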
.../nifi-box-processors/src/main/java/org/apache/nifi/processors/box/FetchBoxFilesInFolder.java
Thanks for the review. Yes, I think that makes more sense for folders with large numbers of files in them. I've adjusted the logic to add a record writer and write the listing as records instead. I've also added batch-based writing to handle large numbers of files.
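The batching idea can be sketched independently of NiFi. The snippet below is not the code from this PR; it is an assumed, simplified illustration in which `partition` and the batch size of 1000 are hypothetical, chosen to match the 50k-file example from the review comment.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for 50k listed file IDs.
        List<Integer> fileIds = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            fileIds.add(i);
        }
        List<List<Integer>> batches = partition(fileIds, 1000);
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).size() + " items");
    }
}
```

Writing each batch as it fills keeps memory use bounded by the batch size rather than by the size of the folder.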
Summary
NIFI-14336
A processor responsible for listing folder items for a Box Folder.
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation