Add ability to upload files immediately to S3 #1399
Conversation
The change has two parts: it adds an optional `uploadImmediately` field to the uploader metadata which, when present and set, causes the uploader to upload immediately on change, and it adds code in the on-detection function `handleNewOrModifiedS3Metadata` to perform the upload when that field is detected. This change in particular involved refactoring the vast majority of the code in `checkUploads` into a separate function. I wanted to avoid duplicating the logic from `checkUploads` in `handleNewOrModifiedS3Metadata`, but calling `checkUploads` directly would upload all of the queued files rather than just the files marked for immediate upload. So I made a new function that takes a map of files to upload and uploads only those files. When all queued files need to be uploaded, as in `checkUploads`, it can work directly on the `metadataToUploader` field; when only a single block of files needs to be uploaded, it can be passed just that single uploader/metadata pair.
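A minimal sketch of that refactoring, as a fragment of the driver class. The helper names (`uploadBatch`, `performUploadForMetadata`) and the method signatures are assumptions for illustration, not the actual Singularity code:

```java
// Sketch only: split the upload loop out of checkUploads so the same logic can
// serve both the scheduled pass and a single immediate upload.
private int uploadBatch(Map<S3UploadMetadata, SingularityS3Uploader> toUpload) {
  int uploadedFiles = 0;
  for (Map.Entry<S3UploadMetadata, SingularityS3Uploader> entry : toUpload.entrySet()) {
    // hypothetical per-uploader upload call
    uploadedFiles += performUploadForMetadata(entry.getKey(), entry.getValue());
  }
  return uploadedFiles;
}

private int checkUploads() {
  // scheduled pass: upload everything that is currently queued
  return uploadBatch(metadataToUploader);
}

private void handleNewOrModifiedS3Metadata(S3UploadMetadata metadata, SingularityS3Uploader uploader) {
  if (metadata.isUploadImmediately()) {
    // immediate pass: upload only this one uploader/metadata pair
    uploadBatch(Collections.singletonMap(metadata, uploader));
  }
  // ...existing handling...
}
```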
Commenting here to document a chat we had in person. We decided to go a slightly different route and have the immediate uploads submit their own future to the executorService, rather than trying to hook something immediate into the runs of the scheduledExecutorService. @PtrTeixeira is making updates and we'll review again once those are up.
The way I had written the code, the main thread, which is used to poll the filesystem for metadata changes, was blocked waiting for the files to finish uploading. This moves the code that depends on the uploads finishing (namely, logging that they finished and closing the uploaders) into callbacks on the futures. This change also no longer places any uploaders into the class-scoped expiredUpload queue.
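A rough sketch of that shape, assuming a Guava `ListeningExecutorService` and the hypothetical helpers from the sketch above; the logging and cleanup move into the callback so the polling thread never blocks:

```java
// Sketch: submit the immediate upload as its own future and finish up in a
// callback rather than blocking the metadata-polling thread.
ListenableFuture<Integer> upload = executorService.submit(() -> performUploadForMetadata(metadata, uploader));

Futures.addCallback(upload, new FutureCallback<Integer>() {
  @Override
  public void onSuccess(Integer uploadedFiles) {
    LOG.info("Immediate upload for {} finished ({} files)", metadata, uploadedFiles);
    closeUploader(uploader);  // hypothetical cleanup helper
  }

  @Override
  public void onFailure(Throwable t) {
    LOG.error("Immediate upload for {} failed", metadata, t);
    closeUploader(uploader);
  }
}, MoreExecutors.directExecutor());
```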
I modified the S3UploadMetadata resource to add the new optional field, but failed to update the constructor calls to match, so it didn't actually work. This _should_ make it so that the program actually compiles.
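For illustration, the optional field might look roughly like this on the metadata bean, defaulting to `false` so existing metadata files and constructor calls keep their old behavior; the exact constructor signature is an assumption:

```java
// Sketch of a backwards-compatible optional flag on the upload metadata.
public class S3UploadMetadata {
  private final boolean uploadImmediately;
  // ...existing fields...

  @JsonCreator
  public S3UploadMetadata(/* ...existing parameters..., */
                          @JsonProperty("uploadImmediately") Optional<Boolean> uploadImmediately) {
    // absent or missing -> false, so old metadata files behave exactly as before
    this.uploadImmediately = uploadImmediately != null && uploadImmediately.or(false);
  }

  public boolean isUploadImmediately() {
    return uploadImmediately;
  }
}
```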
cc @ssalinas Could you take a look?
@PtrTeixeira looks good overall. For ease of debugging when you try this out, I'd add some info or debug logging near the beginning of
Add some additional logging statements to help track the progress of the program before and through the callbacks on the immediate uploader. Hopefully this should make it easier to debug the program.
Depending on the settings, the immediate uploader may still get added to the expiring list. The change guarantees that it is always removed from the expiring list, in order to prevent a memory leak in the application.
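A tiny sketch of that guarantee, assuming the expiring collection is a field named `expiring` (an assumption, not the actual field name):

```java
// Sketch: whatever happens to the upload, the immediate uploader must not
// linger on the expiring list.
try {
  uploadedFiles = performUploadForMetadata(metadata, uploader);
} finally {
  expiring.remove(uploader);
}
```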
In general, the upload driver wants to avoid re-adding modified existing metadata, because it doesn't want to create a new uploader for it and it doesn't (for the most part) look at the contents of the metadata file, delegating that to the uploader itself. So on the whole it ignores existing metadata files on change. In this case, though, it needs to pick up changes in the metadata file because it should still perform the upload as soon as possible. This change just forces the upload: if an existing metadata file has been changed so that it requires immediate upload, the driver will upload the indicated files. As a heads-up, if you do this (modify an existing metadata file), it will still delete the metadata file after it performs the upload, so if you want to keep that around you will need to re-create the metadata file.
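A sketch of that check inside the on-detection path, reusing the hypothetical names from above; the field and helper names are assumptions:

```java
// Sketch: modifications to already-known metadata files are normally ignored,
// but if the modified file now requests an immediate upload, force one anyway.
SingularityS3Uploader existing = metadataToUploader.get(metadata);
if (existing != null) {
  if (metadata.isUploadImmediately()) {
    uploadBatch(Collections.singletonMap(metadata, existing));
  }
  return;  // otherwise, keep ignoring changes to existing metadata files
}
```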
FYI @PtrTeixeira I merged #1391, so you should be able to resolve the merge conflicts on this one now.
Rename metric for immediate uploaders
There _appears_ to be a threading issue where, if the filesystem poller happens to run at the same time as an immediate uploader, a second immediate uploader can be created for the same metadata. Typically this would be prevented by the metadata-to-uploader map, but immediate uploaders are removed from (or never added to) that map, so there was no check for an existing uploader for a given metadata file. This polls the list of immediate uploaders to check their metadata, and refuses to create another uploader if there is an existing uploader with the same metadata file.
This duplicates for the immediate uploaders the same structure we currently use to keep track of the normal uploaders: that is, I added another map from metadata to immediate uploaders. It is checked before creating a new uploader and only cleared when the existing uploader is definitely done and has been removed. Hopefully this prevents the creation of duplicate immediate uploaders more robustly than the prior attempt, which might not have maintained the block for the lifetime of the first immediate uploader.
The way we block attempts to create duplicate immediate uploaders is by keeping a list of metadata objects that already have immediate uploaders, and refusing to create a new uploader if that list already contains the metadata file associated with it. The check-and-add operation needs to happen atomically, so that two attempts cannot interleave in a way that creates both uploaders. This moves the check-and-add into a locked block of code to prevent that kind of interleaving.
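In code, the atomic check-and-add might look roughly like this; the field and method names (`immediateUploaderLock`, `metadataToImmediateUploader`, `tryRegisterImmediateUploader`) are assumptions, not the actual Singularity code:

```java
// Sketch: guard the check-and-add with a lock so two threads cannot both
// decide to create an immediate uploader for the same metadata.
private final ReentrantLock immediateUploaderLock = new ReentrantLock();
private final Map<S3UploadMetadata, SingularityS3Uploader> metadataToImmediateUploader = new HashMap<>();

private boolean tryRegisterImmediateUploader(S3UploadMetadata metadata, SingularityS3Uploader uploader) {
  immediateUploaderLock.lock();
  try {
    if (metadataToImmediateUploader.containsKey(metadata)) {
      LOG.debug("Already have an immediate uploader for {}, not creating another", metadata);
      return false;
    }
    metadataToImmediateUploader.put(metadata, uploader);
    return true;
  } finally {
    immediateUploaderLock.unlock();
  }
}

// The entry is removed only once that uploader is completely finished, e.g. in
// the completion callback: metadataToImmediateUploader.remove(metadata);
```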
Log the metadata involved when attempting to get a lock on creating a new immediate SingularityS3Uploader.
Prevent building duplicate immediate uploaders