S3 Sync Deleting Excluded Files #1268

Closed
ghost opened this issue Apr 7, 2015 · 11 comments
Labels
closed-for-staleness · feature-request · s3filters · s3

Comments

ghost commented Apr 7, 2015

I have a simple bash script setup which syncs from a local server to an S3 bucket and then from the S3 bucket to the local server.

The idea is that several local servers can sync up to the cloud and then retrieve any updates from other servers following their own update.

I am using the following commands:
aws s3 sync /home s3://bucket-name/home --delete --exclude "~*" --exclude "*~"
aws s3 sync s3://bucket-name/home /home --delete --exclude "~*" --exclude "*~"

The first is to upload all changed files whilst ignoring any temporary files, and the second is to download any newer files from the cloud whilst ignoring temporary files (not really required, as they should never get uploaded).

My problem is that the upload works perfectly well: it ignores the temporary files and deletes from S3 the files that no longer exist on the server. When syncing back down, however, it downloads any updated files correctly but then starts deleting the local temporary files because they aren't in the S3 bucket. It would appear that the --exclude option doesn't encompass the whole sync operation: the exclude seems to be ignored for deletion purposes, and files excluded from the sync are still included when the --delete option decides what to remove.

I've tried putting the --delete option after the --exclude but it then ignores the excludes and syncs everything.

I'd appreciate any help with the correct syntax, or confirmation that this is a bug.
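
For illustration, here is a minimal shape of the failure described above (the project folder and file names are made up, and the delete line shows the reported outcome rather than captured output):

$ ls /home/project
a.txt  b.txt  ~a.txt

# ~a.txt is a local temporary file; it is excluded, so it never reaches S3.
$ aws s3 sync s3://bucket-name/home /home --delete --exclude "~*" --exclude "*~"
delete: /home/project/~a.txt
# The excluded local file is deleted anyway, because it has no counterpart
# in the bucket.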

kyleknap commented Apr 7, 2015

@jarednjames
We will look into this. In the meantime, could you provide a minimal set of steps that we can follow to reproduce your issue? For example, could you provide a small sample setup (with filenames) that we can use to sync up to s3 and sync down to reproduce the issue?

@kyleknap kyleknap added the bug and response-needed labels Apr 7, 2015

ghost commented Apr 8, 2015

OLD, SEE BELOW POST (left for reference and continuity).

@kyleknap Thanks.

At present we have a group of guys working on SolidWorks. They save to the server, and then every 30 minutes the server uploads the changes to S3. The problems begin when I add the line to sync back down so that it retrieves any changes.

In the future, the plan is to have multiple servers syncing to S3. At present we only have one local server syncing up to S3 and I am testing the ability to sync back down.

So for example:
CAD A will be working on a model containing a.sldprt, b.sldprt, c.sldprt. These will get synced up to the server but their temporary files (prefixed by a ~) are ignored. 30 minutes later it will sync again and update them if they've changed.

If I bring download in as well, the local server will sync up to the cloud with any changes (deleting files from S3 that no longer exist locally) and then attempt to sync down. The problem is that it will ignore the --exclude="~*" filter and delete the temporary files from the local machine, quite simply because they don't exist on S3.

ghost commented Apr 8, 2015

Update After Testing

I have simulated the situation by syncing a single folder up and down and have pinned down the problem, as follows:

3 files - text1.txt, text2.txt, ~text3.txt

The temporary file is indeed ignored during upload and download, so that isn't the problem.

The problem is that if I sync text1.txt up to the server and text1.txt then changes on the local machine before the downward sync from S3 occurs (let's say the user saves the file again), the sync doesn't recognise that the file on the local machine is the latest version; instead it downloads the S3 version (now older than the local version) and replaces the local file.

This seems to be a major glitch in the sync function.

ghost commented Apr 8, 2015

Second Update

I synced text1.txt and it uploaded to S3. I then did a sync back down from S3 and it downloaded it, even without any changes to the remote file.

It's as if the sync back from S3 doesn't work, although it's strange because not all files will download - only recently changed ones which synced up to the server first. Once a file has gone up and then come back down, any subsequent up/down syncs don't have problems - until the file changes again and the cycle repeats.

I ran a debug and the up sync says the file size and last modified time have changed - OK - but the down sync then says the same and redownloads the file, even if the local file was updated after the S3 copy was uploaded.
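
(For reference, such a trace can be produced with the CLI's global --debug flag; the command shape is below. The grep is only there to pick out the comparator lines that mention the modified time, and the exact log wording may vary by version.)

$ aws s3 sync s3://bucket-name/home /home --delete --debug 2>&1 | grep -i "modified"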

Similar issue, but it doesn't explain why the newer local file (modified after an S3 up sync) is overwritten during a down sync:
#599

kyleknap commented Apr 8, 2015

@jarednjames
I think I see what you are trying to do. So you are seeing this currently?

# List out all of the files in the directory
$ ls temp
text1.txt   text2.txt   ~text3.txt

# Sync the files up to the s3 bucket.
$ aws s3 sync temp s3://mybucket/temp --delete --exclude "~*" --exclude "*~"
upload: temp/text1.txt to s3://mybucket/temp/text1.txt
upload: temp/text2.txt to s3://mybucket/temp/text2.txt

# Modify a local file
$ touch temp/text1.txt

# Sync the files down; you expected text1.txt not to be synced down, as it clobbers your local file.
$ aws s3 sync s3://mybucket/temp temp --delete --exclude "~*" --exclude "*~"
download: s3://mybucket/temp/text1.txt to temp/text1.txt

That behavior is expected: when we sync, we treat s3 as a backup system. So if you are syncing from s3 to local and there are changes to the local object, the local object will be restored from the s3 object.

If the logic were to sync only when the local object was older than the s3 object, we would have a round-tripping issue: if you synced a brand new directory and its files to s3 and then synced it back down, the local files would be overwritten with what was in s3 even though they had not been touched. This is because we use last modified time and size to determine whether to sync, and s3 sets the last modified time to when the file was uploaded (not to the actual last modified time of the local file).
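
To make that rule concrete, here is a rough sketch of the comparison described above (an illustration only, not the CLI's actual code; transfer_file is a hypothetical stand-in for the upload or download step):

# Sketch: sync whenever size or last modified time differs, in either direction.
if [ "$src_size" != "$dest_size" ] || [ "$src_mtime" != "$dest_mtime" ]; then
    transfer_file    # hypothetical helper standing in for the copy
fi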

I am unable to think of a good workaround for your use case other than using --exclude filters or syncing down to a different directory. This would have to be a feature request where we add a sync strategy that updates local objects that are older than the s3 object but does not update objects that are newer than the s3 object.
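
As a rough sketch of the "different directory" idea (the staging path is invented, and --delete is deliberately left off; cp -u is the GNU coreutils flag that copies only when the source is newer than the destination or the destination file is missing):

# Sync down to a staging area, then promote only files that are newer there.
aws s3 sync s3://bucket-name/home /tmp/s3-staging --exclude "~*" --exclude "*~"
cp -Ru /tmp/s3-staging/. /home/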

As to your second issue, I am not able to reproduce it. What version of the CLI are you using? You may want to upgrade to the latest, which is 1.7.20.

Let me know if you have any more questions or comments.

@kyleknap kyleknap added the feature-request label and removed the bug label Apr 8, 2015

ghost commented Apr 9, 2015

@kyleknap OK, but this is extremely poorly documented (well, it isn't documented).

Sync is quite a key feature to have, and the fact that a sync to S3 will upload a newer file and ignore an older one, while a sync from S3 will replace the local copy regardless of whether it is newer or older, isn't mentioned anywhere. This has now crippled several weeks of development.

Your "round trip" problem is curious as I also use rsync to produce a local backup and this doesn't have such a problem.

I was expecting operation similar to rsync, as this is what the documentation indicates.

Looks like I'll have to write my own client to deal with this.

ghost commented Apr 9, 2015

Further to the above, rsync has a -u option which tells it to ignore newer files on the receiving file system. Sounds like exactly what is needed.
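
For comparison, that looks like this (the paths are invented; -a is rsync's archive mode, and -u/--update skips files that are newer on the receiving side):

# Copies only files that are missing locally or newer in the mirror.
rsync -au /mnt/s3-mirror/ /home/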

ASayre commented Feb 6, 2018

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

@ASayre ASayre closed this as completed Feb 6, 2018
@jamesls jamesls reopened this Apr 6, 2018

jamesls commented Apr 6, 2018

Based on community feedback, we have decided to return feature requests to GitHub issues.

github-actions bot commented Sep 18, 2020

Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. Because it has been longer than one year since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment to prevent automatic closure, or if the issue is already closed, please feel free to reopen it.

@github-actions github-actions bot added the closing-soon and closed-for-staleness labels and removed the closing-soon label Sep 18, 2020

shwetao commented Aug 24, 2022

I do not want to delete excluded files. What options can I use to delete from the included set but not from the excluded set?
This does not work: --include '*' --exclude 'doc/*' --delete
