S3 Sync Deleting Excluded Files #1268
@jarednjames
OLD, SEE BELOW POST (left for reference and continuity).

@kyleknap Thanks. At present we have a group of guys working on SolidWorks. They save to the server, and every 30 minutes the server uploads the changes to S3. The problems begin when I add the line to sync back down so it retrieves any changes. In the future the plan is to have multiple servers syncing to S3; at present we only have one local server syncing up to S3, and I am testing the ability to sync back down.

So, for example: if I bring the download in as well, the local server will sync up to the cloud with any changes (deleting files from S3 that no longer exist locally) and then attempt to sync down. The problem is that the down sync ignores the --exclude="~*" and deletes the temporary files from the local machine, quite simply because they don't exist on S3.
Update After Testing

I have simulated the situation by syncing a single folder up and down and have locked onto the problem, as follows. Take three files: text1.txt, text2.txt, ~text3.txt. The temporary file is indeed ignored during both upload and download, so that isn't the problem. The problem is that if I sync text1.txt up to the server and text1.txt then changes on the local machine before the downward sync from S3 occurs (say the user saves the file again), the sync doesn't acknowledge that the file on the local machine is the latest version; instead it downloads the S3 version (now older than the local version) and replaces the local file. This seems to be a major glitch in the sync function.
Second Update

I synced text1.txt and it uploaded to S3. I then did a sync back down from S3 and it downloaded the file, even without any changes to the remote copy. It's as if the sync back from S3 doesn't work, although strangely not all files re-download, only recently changed ones which synced up to the server first. Once a file has gone up and then come back down, any subsequent up/down syncs don't have problems - until the file changes again and the cycle repeats. I ran a debug: the up sync says the file size and last modified time have changed - fine - but the down sync then says the same and re-downloads the file, even if the local file was updated after the S3 copy was uploaded. A similar issue exists, but it doesn't explain why the newer local file (modified after an S3 up sync) is overwritten during a down sync:
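A minimal shell sketch of the timing issue described above (the bucket name and directory are hypothetical):

echo one > text1.txt
aws s3 sync . s3://my-test-bucket/repro     # up sync: uploads text1.txt
echo two > text1.txt                        # local file changes again before the down sync runs
aws s3 sync s3://my-test-bucket/repro .     # down sync: replaces the newer local text1.txt
                                            # with the older S3 copy

Because the down sync compares size and last modified time rather than asking which side is newer, the locally updated file is treated as out of sync and overwritten.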
@jarednjames
That behavior is expected. When we sync, we treat S3 as a backup system, so if you are syncing from S3 to local and there are changes to the local object, the local object will be restored from the S3 object.

If the logic were instead to sync only when the local object is older than the S3 object, we would have a round-tripping issue: if you synced a brand-new directory and its files to S3 and then synced it back down, the local files would be overwritten with what was in S3 even though they were never touched. This is because we use last modified time and size to decide whether to sync, and S3 sets the last modified time to when the file was uploaded (not to the actual last modified time of the local file).

I am unable to think of a good workaround for your use case other than using --exclude filters or syncing down to a different directory. This would have to be a feature request where we add a sync strategy that updates local objects that are older than the S3 object, but does not update objects that are newer than the S3 object.

As to your second issue, I am not able to reproduce it. What version of the CLI are you using? You may want to upgrade to the latest, which is 1.7.20. Let me know if you have any more questions or comments.
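A small shell illustration of the round-trip problem described above (timestamps and bucket name are hypothetical):

date > report.txt                        # local mtime: 09:00
aws s3 cp report.txt s3://my-bucket/     # S3 LastModified: 09:05 (set to upload time, not file mtime)
aws s3 sync s3://my-bucket/ .            # the S3 object now looks newer (09:05 > 09:00), so even a
                                         # "newer wins" strategy would re-download the untouched local file

For cases where timestamps are unreliable for this reason, the sync command also offers a --size-only flag, which makes file size the only criterion for deciding whether to transfer, at the cost of missing edits that leave the size unchanged.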
@kyleknap OK, but this is extremely poorly documented (well, it isn't documented at all). Sync is quite a key feature, and the fact that a sync to S3 will upload a newer file and ignore an older one, but a sync from S3 will replace the local copy regardless of whether it is newer or older, isn't mentioned anywhere and has now crippled several weeks of development. Your "round trip" problem is curious, as I also use rsync to produce a local backup and it doesn't have such a problem. I was expecting operation similar to rsync, as that is what the documentation indicates. Looks like I'll have to write my own client to deal with this.
Further to the above, rsync has a -u option which tells it to skip files that are newer on the receiving side. Sounds like exactly what is needed.
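For comparison, the rsync behavior being requested (paths are hypothetical):

rsync -av --update /mnt/backup/home/ /home/    # -u/--update: skip any destination file whose
                                               # modification time is newer than the source's

The equivalent aws s3 sync strategy would only download an object when the local copy is older, rather than whenever size or timestamp differ in either direction.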
Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI. This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We've imported existing feature requests from GitHub - search for this issue there! And don't worry, this issue will still exist on GitHub for posterity's sake. As it's a text-only import of the original post into UserVoice, we'll still be keeping in mind the comments and discussion that already exist here on the GitHub issue. GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team
Based on community feedback, we have decided to return feature requests to GitHub issues. |
Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. Because it has been longer than one year since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment to prevent automatic closure, or if the issue is already closed, please feel free to reopen it. |
I do not want to delete excluded files.
I have a simple bash script set up which syncs from a local server to an S3 bucket and then from the S3 bucket back to the local server.
The idea is that several local servers can sync up to the cloud and then retrieve any updates from other servers following their own update.
I am using the following commands:
aws s3 sync /home s3://bucket-name/home --delete --exclude "~*"
aws s3 sync s3://bucket-name/home /home --delete --exclude "~*"
The first is to upload all changed files whilst ignoring any temporary files, and the second is to download any newer files from the cloud whilst ignoring temporary files (not really required as they should never get uploaded).
My problem is that the upload works perfectly well, ignoring the temporary files and deleting files from S3 that no longer exist on the server; but when syncing back down, it downloads any updated files correctly and then starts deleting the local temporary files because they aren't in the S3 bucket. It would appear that the --exclude option doesn't cover the whole sync operation: it's as if files excluded from the sync are still included for the purposes of the --delete option.
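A minimal repro of the delete behavior described above (bucket name is hypothetical):

mkdir repro && cd repro
echo a > text1.txt
echo tmp > "~text3.txt"                                            # a temporary file
aws s3 sync . s3://my-test-bucket/repro --delete --exclude "~*"    # ~text3.txt is correctly skipped
aws s3 sync s3://my-test-bucket/repro . --delete --exclude "~*"    # ~text3.txt is deleted locally,
                                                                   # despite matching the exclude filter

Since ~text3.txt was never uploaded, the down sync sees it as a local file with no counterpart in the bucket and removes it under --delete, even though the filter should exempt it.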
I've tried putting the --delete option after the --exclude but it then ignores the excludes and syncs everything.
Would appreciate any help with regards to a correct syntax or confirming this is a bug.