Globus support: optimize the file size lookup when completed Globus uploads are finalized #10977

Closed
landreev opened this issue Oct 28, 2024 · 1 comment · Fixed by #11040
Labels
Feature: Globus FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Comments

@landreev
Contributor

landreev commented Oct 28, 2024

In the current implementation of the addFiles workflow, a separate API call is made for each individual file when a completed Globus upload is finalized on the Dataverse end. This is simply inherited from how direct uploads to S3 are handled, and there is no real need to keep doing it. Instead, once we get confirmation that the transfer task has completed, we could call the Globus /operation/endpoint/.../ls on the entire folder a single time and populate the sizes for all the files in the upload batch from that one listing.
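The single-listing approach described above can be sketched as follows. This is a minimal Python illustration, not the actual Dataverse code (which is Java). It assumes the folder listing has already been fetched and parsed into a dict shaped like a Globus Transfer API `ls` response (a `DATA` list whose entries carry `name`, `type`, and `size`), and that the whole folder fits in one response. The batch structure and its `storageName` / `fileSize` keys are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


def sizes_from_ls_response(ls_response: Dict[str, Any]) -> Dict[str, int]:
    """Build a name -> size (bytes) map from ONE parsed Globus ls response."""
    sizes: Dict[str, int] = {}
    for entry in ls_response.get("DATA", []):
        # Keep only regular files; directories and other entry types are skipped.
        if entry.get("type") == "file":
            sizes[entry["name"]] = entry["size"]
    return sizes


def populate_batch_sizes(batch: List[Dict[str, Any]],
                         ls_response: Dict[str, Any]) -> List[str]:
    """Fill in sizes for every file in the upload batch from a single listing.

    `batch` is a hypothetical list of dicts, each with a "storageName" key that
    matches the file's name in the Globus folder. Returns the names that were
    not found in the listing, so the caller can decide how to handle them.
    """
    sizes = sizes_from_ls_response(ls_response)  # one lookup for the whole batch
    missing: List[str] = []
    for f in batch:
        name = f["storageName"]
        if name in sizes:
            f["fileSize"] = sizes[name]
        else:
            missing.append(name)
    return missing
```

With this shape, an 8,000-file batch needs one `ls` call plus in-memory dictionary lookups, rather than one remote call per file.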

In the ongoing production use case, depositors are trying to transfer entire datasets of about 8K files in one batch. Surprisingly, this has worked for at least one dataset, but the per-file size lookups proved to be the bottleneck and took an extremely long time.

@cmbz

cmbz commented Oct 29, 2024

Moved to Sprint Ready due to urgency; @landreev will size it appropriately.

@landreev landreev added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Oct 30, 2024
@cmbz cmbz moved this from SPRINT READY to In Progress 💻 in IQSS Dataverse Project Nov 7, 2024
@cmbz cmbz added the FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) label Nov 7, 2024
landreev added a commit that referenced this issue Nov 15, 2024
landreev added a commit that referenced this issue Nov 18, 2024
landreev added a commit that referenced this issue Nov 20, 2024
landreev added a commit that referenced this issue Nov 20, 2024
landreev added a commit that referenced this issue Nov 20, 2024
landreev added a commit that referenced this issue Nov 25, 2024
…the /addFiles API (i.e., we don't want to trust the users of the direct s3 upload api when it comes to file sizes). #10977
landreev added a commit that referenced this issue Nov 25, 2024
@pdurbin pdurbin added this to the 6.5 milestone Dec 3, 2024