Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New behaviour of S3 storage exists() breaks current workflow #1430

Open
violuke opened this issue Jul 9, 2024 · 19 comments
Open

New behaviour of S3 storage exists() breaks current workflow #1430

violuke opened this issue Jul 9, 2024 · 19 comments

Comments

@violuke
Copy link

violuke commented Jul 9, 2024

Since upgrading from 1.14.3 to 1.14.4, we're having a problem with this code:

from django.core.files.base import ContentFile
from django.core.files.storage import storages

storage = storages["default"]

file_name = storage.save("example/example.txt", ContentFile(content=b"hello there!"))

if not storage.exists(file_name):
    raise Exception(f"File {file_name} does not exist")

print("All good!")

On 1.14.3, I get the "All good!" print but with 1.14.4, I get the Exception.

Django 5.0.6. I suspect this is a problem with #1381 edit: see below

I've downgraded again for now, but any help would be much appreciated. Thanks to everyone involved for a really helpful package 🙏

@violuke violuke changed the title Bug introduced in 1.14.4 with S3 (related to #1381 maybe) Bug introduced in 1.14.4 with S3 Jul 9, 2024
@violuke
Copy link
Author

violuke commented Jul 9, 2024

This is probably related to #1429 that was posted while I was typing my issue up 😄 The fact that #1422 was not in the changelog is why I wrongly assumed the issue might be related to a different PR. Thank you.

@jschneier
Copy link
Owner

jschneier commented Jul 10, 2024

Thanks for posting. The reason I made this change was two-fold:

  1. A security issue (which I reported) that was patched upstream: https://www.djangoproject.com/weblog/2024/jul/09/security-releases/
  2. The overwrite functionality was added to the upstream FilesystemStorage (part of how I found the issue) django/django@0b33a3a and as part of this they implemented it in terms of exists, see this comment: Fixed #35326 -- added allow_overwrite parameter to FileSystemStorage. django/django#18020 (comment)

I'm not surprised people were relying on the previous behavior and I frankly find the Django documentation bizarre but I feel a slight bit paralyzed.

@jschneier jschneier changed the title Bug introduced in 1.14.4 with S3 New behaviour of S3 storage exists() breaks current workflow Jul 10, 2024
@saemideluxe
Copy link

Sorry, I am not versatile enough to get the full picture of the security issue.
However, if you say those changes are necessary to prevent a potential directory traversal exploit, I am definitely in favor of not reversing this change.
However, it would still be good to have a way to a) allow for overwrites b) checking if a file already exists at the same time. Or are you saying that this combination (by its nature) is the vulnerability, per se?

@jschneier
Copy link
Owner

Sorry, I am not versatile enough to get the full picture of the security issue. However, if you say those changes are necessary to prevent a potential directory traversal exploit, I am definitely in favor of not reversing this change. However, it would still be good to have a way to a) allow for overwrites b) checking if a file already exists at the same time. Or are you saying that this combination (by its nature) is the vulnerability, per se?

Before the upstream fix, yes. Now that Django has patched, no.

The issue, in short, is that if the Storage overrides get_available_name and does not mirror all of the security checks in it (which this library respectively did and did not because the CVE fixes were added later) then calling storage.save(name) with a user supplied name was vulnerable. NOTE that field.save was not vulnerable because we were not overwriting generate_filename (although there were folks via GitHub search who did so).

Also, I still provide support for unsupported versions of Django (though they will be dropped shortly) which did not get the patch.

@jschneier
Copy link
Owner

The corresponding fix in Django: django/django@fe4a0bb

@saemideluxe
Copy link

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

  • Wait until django-storage can declare the requirement Django>=4.2.14 (because we can assume from then on the vulnerability is fixed upstream) and then remove the lines at
    if self.file_overwrite:
    return False
  • Let users of django-storage (who rely on the old behavior) subclass the S3Storage class and overwrite the exists method according to their needs.

@jschneier
Copy link
Owner

If the fix in Django is there then the following I think is fine because we are no longer relying on Storage.get_available_name for any of the security requirements.

def get_available_name(self, name, max_length=None):
  # TODO: something with max_length?
  if self.file_overwrite:
    return name
  return super().get_available_name(name, max_length=max_length)

@violuke
Copy link
Author

violuke commented Jul 10, 2024

Thanks everyone for the comments and time spent on this 🙏

I think I understand the reasoning behind the change originally, and we don't want a security issue, of course, but I personally feel that this should be reverted (or partially, at least).

In my opinion, it's a bug of this package that I can create a file, then call .exists() and get a False returned value? I should get the right boolean response to match the Django docs. Especially with file_overwrite being set to its default value of True and it not being obviously a setting that impacts if the exists() function works or not.

If people are bothered about having security fixes (which they should be) then they should be updating this package and Django. If they don't do updates properly and continue to use old versions of this package and/or Django, then that's their (ill-advised) choice.

If this change is reverted, then my thinking is that they have the security risk because they haven't also updated Django, as the fix is in Django and as long as they use the latest patch release of a supported version, they'll have the security fix. How do others feel about this logic?

Thank you.

@craiga
Copy link

craiga commented Jul 10, 2024

Yeah, the documentation is a bit confusing:

Returns True if a file referenced by the given name already exists in the storage system, or False if the name is available for a new file.

It's not clear what should happen when a file referenced by the given name already exists and the name is available for a new file.

I suppose I'd lean toward emulating whatever happens with a local filesystem, and maybe issue a warning if the user is using an unpatched version of Django.

Another solution would be to add a head method to the S3 storage which implements the old functionality. Possibly with a log message in exists to try and explain what's going on?

I'd be happy to try making a contribution of either of these fixes.

@th3hamm0r
Copy link
Contributor

I just stumbled over this issue too, so I'm also a bit late to the changes in Django, which I'm not real fan of. I think the basic "exists" operation of a storage class should return exactly that information.

But since this change is now part of Django too, I think the best option is to also set file_overwrite to False by default analog to the FileSystemStorage.

We're using wagtail a lot and setting file_overwrite to False is actually required there (see docs), but we've missed it for some other usages... So this default now seems to be the safer option, also because its analog to Django's handling.

@jschneier
Copy link
Owner

I think we should probably open a bug with Django. If someone has the time, please also to link to this issue.

@th3hamm0r
Copy link
Contributor

Good idea, done: https://code.djangoproject.com/ticket/35604
I hope it is explained clearly, otherwise you are welcome to edit it.

@sarahboyce
Copy link
Contributor

Thank you all for this discussion 👍

From what I understand, #1422 is motivated by django/django@0b33a3a.
This aligns the behaviour of django-storages as if the storages defined in this package were inheriting from FileSystemStorage with the new flag allow_overwrite=True (which is added in Django 5.1).
This is also motivated by Django discouraging overwriting get_available_name due to security concerns

However, after the security release to Django, having get_available_name to be defined similar to as described by @jschneier should be secure and django-storages can keep the previous behaviour of .exists()

The "side question" is "what does .exists() mean when you can overwrite files" given that the docs here are a little ambiguous

Returns True if a file referenced by the given name already exists in the storage system, or False if the name is available for a new file.

Given the engagement on this discussion around what .exists() should do in the overwrite case, and the recent security patch to Django, it makes sense for FileSystemStorage to also take this approach and overwrite get_available_name but leave the existing exists behaviour.
A clarification to the FileSystemStorage docs also make sense

This is roughly my conclusion 👍 I will also write on the ticket

So, in short, I hear you and will work on this.
Changes to django-storages are independent to this effort and can be worked on seperately

@nitsujri
Copy link

Hi all, I also attempted to upgrade to 1.14.4 and, as well, was hit by:

if self.file_overwrite:
return False

For us, this behavior based on variable naming is quite confusing because we want both exists() to return True AND file_overwrite to be True.

Doubt it matters, but our use case is uploading compiled web assets to S3.

I'm reading this thread and am very confused (I understand a lot of people are). At the root, is unintuitive that exists() is affected by file_overwrite because those should be independent options?

@jschneier
Copy link
Owner

Given the way Django is moving there will be another release that partially undoes this change.

derlin added a commit to divio/django-storage-url that referenced this issue Jul 19, 2024
The overwrite_files is True by default, and should be settable from the
django settings using AZURE_OVERWRITE_FILES.

Removing this override will keep the default behaviour (True), while
still allowing users to change it.

NOTE: this change is important because django / django-storages changed
the semantics of the .exists() function, which now returns whether a
path is free for upload (and not whether the file already exists on the
remote storage). With overwrite_files set to True, .exists() will now
always return False.

See jschneier/django-storages#1430
derlin added a commit to divio/django-storage-url that referenced this issue Jul 19, 2024
The overwrite_files is True by default, and should be settable from the
django settings using AZURE_OVERWRITE_FILES.

Removing this override will keep the default behaviour (True), while
still allowing users to change it.

NOTE: this change is important because django / django-storages changed
the semantics of the .exists() function, which now returns whether a
path is free for upload (and not whether the file already exists on the
remote storage). With overwrite_files set to True, .exists() will now
always return False.

See jschneier/django-storages#1430
@berkcoker
Copy link

Seconding the opinion that this should have been included in the changelog. We had this commit break the functionality of our app unexpectedly

@mindcruzer
Copy link

mindcruzer commented Jul 22, 2024

This change breaks the ManifestFilesMixin when collecting static files. Before attempting to get the file blob to compute the hash, it first checks if the file exists. If file overwrite is enabled it's not possible for this mixin to function.

@sarahboyce
Copy link
Contributor

django/django@8d6a20b just landed in Django

I want to thank @jschneier for all his work on the recent Django improvements and security release.
I know it's generated extra work (and possibly stress), and a lot of that work is behind closed doors. You've been fantastic and t's very appreciated, thank you 👏 👏 🥇

derlin added a commit to divio/django-storage-url that referenced this issue Jul 25, 2024
The overwrite_files is True by default, and should be settable from the
django settings using AZURE_OVERWRITE_FILES.

Removing this override will keep the default behaviour (True), while
still allowing users to change it.

NOTE: this change is important because django / django-storages changed
the semantics of the .exists() function, which now returns whether a
path is free for upload (and not whether the file already exists on the
remote storage). With overwrite_files set to True, .exists() will now
always return False.

See jschneier/django-storages#1430
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
lunika added a commit to openfun/marsha that referenced this issue Jul 30, 2024
In version 1.14.4 there are at least two modifications made that lead to
a breaking change in Marsha.
The most annoying one is linked to this issue:
jschneier/django-storages#1430 and we have to
wait a newer version with a fix to have the previous behaviour. This fix
is related to a security issue in django. This security is fixed in
version 4.2.14 and we already use this version, so we are safe.
The second one is related to how the signature in computed when an url
is generated. Previously the signature was generated no matter if we
need it or not and then we choose to remove the signautre part using the
private method `_strip_signing_parameters`. This private does not exists
anymore, instead a new setting is used, we have to set the setting
`querystring_auth` to False to not compute the signature, it's real
improvement as it saves the cost of computing the signature.
@MaximePETIT-code
Copy link

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

  • Wait until django-storage can declare the requirement Django>=4.2.14 (because we can assume from then on the vulnerability is fixed upstream) and then remove the lines at
    if self.file_overwrite:
    return False
  • Let users of django-storage (who rely on the old behavior) subclass the S3Storage class and overwrite the exists method according to their needs.

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

  • Wait until django-storage can declare the requirement Django>=4.2.14 (because we can assume from then on the vulnerability is fixed upstream) and then remove the lines at
    if self.file_overwrite:
    return False
  • Let users of django-storage (who rely on the old behavior) subclass the S3Storage class and overwrite the exists method according to their needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants