Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensible serializers support #209

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

subnix
Copy link

@subnix subnix commented Dec 31, 2020

Pickle serializer is not secure and may lead to remote code execution or local privilege escalation. If attacker gains access to cache storage (filesystem, memcached, redis) then he can craft special payload, poison the cache and execute python-code. To avoid the attack we can use a safer serializer (eg. JSON).
I’ve added possibility to use any serializer. All changes are backward compatible and default serializer is pickle.

References:
https://cwe.mitre.org/data/definitions/94.html
https://cwe.mitre.org/data/definitions/502.html
https://davidhamann.de/2020/04/05/exploiting-python-pickle/ (attack example)

@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 79.805% when pulling 8d7b253 on subnix:feature/extensible-serializer into 82fa1d5 on sh4nks:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 79.805% when pulling 8d7b253 on subnix:feature/extensible-serializer into 82fa1d5 on sh4nks:master.

@oittaa
Copy link
Contributor

oittaa commented Mar 6, 2021

Would it make sense to encode bytes in base64 (or something else) with JSON? Having bytes in an object is a relatively common case and JSON can't serialize bytes. If you look at the example below, cls=BytesEncoder and object_hook=as_bytes could be passed to json.dumps and json.loads respectively as defaults.

import json
from base64 import b64decode, b64encode

BYTES_HINT = "__bytes__"


class BytesEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (bytes, bytearray)):
            return {BYTES_HINT: True, "bytes": b64encode(obj).decode("ascii")}
        return json.JSONEncoder.default(self, obj)


def as_bytes(obj):
    if BYTES_HINT in obj:
        return b64decode(obj["bytes"])
    else:
        return obj


test_data = dict()
test_data["float"] = 1.5
test_data["bytes"] = b"bytes"
test_data["nested_dict"] = {"1": 1, "2": b"dict_bytes"}
test_data["list"] = [True, "string", b"list_bytes", None]
print(test_data)

jsonstring = json.dumps(test_data, cls=BytesEncoder)
assert test_data == json.loads(jsonstring, object_hook=as_bytes)

@subnix
Copy link
Author

subnix commented Mar 7, 2021

  1. It’s a “fake responsibility” that library is responsive for serializing types unsupported by json. We can’t cover all the cases and shouldn’t try.
  2. Explicit is better than implicit. This may lead to unexpected behavior by developer and requires strong input validation. For example, application can cache user-provided (or third party) data that may contain special keys (“bytes”: True).

All in all, it’s developer’s responsibility, not library’s one.

@subnix
Copy link
Author

subnix commented Mar 9, 2021

I see 2 options for supporting bytes:

  1. Low level. Add some flag (e.g. raw=True) to get/set methods, that allows to bypass serialization.
  2. High level. Add decorator (e.g. memoize_bytes) with auto base64 encoding/decoding.

@oittaa
Copy link
Contributor

oittaa commented Mar 9, 2021

Explicit is better than implicit. This may lead to unexpected behavior by developer and requires strong input validation. For example, application can cache user-provided (or third party) data that may contain special keys (“bytes”: True).

Just to point out that bytes isn't really the special key in the example. I tried to show that with test_data["bytes"] = b"bytes"

The special key was__bytes__and if needed you could use something more specific like __flask_caching_bytes_hint__.

Here's an example that's even more clear showing that having bytes as a key doesn't break things. That key could be almost anything and it shouldn't matter:

test_data = {'bytes': True}
jsonstring = json.dumps(test_data, cls=BytesEncoder)
assert test_data == json.loads(jsonstring, object_hook=as_bytes)

@subnix
Copy link
Author

subnix commented Mar 9, 2021

My bad, I mean __bytes__: True.

test_data = (
    {"__bytes__": True},
    {"__bytes__": True, "bytes": "AAA"},
    {"__bytes__": True, "bytes": "!"},
    {"__bytes__": True, "bytes": "Cg=="}
)
for case in test_data:
    jsonstring = json.dumps(case, cls=BytesEncoder)
    assert test_data == json.loads(jsonstring, object_hook=as_bytes)

@subnix
Copy link
Author

subnix commented Mar 18, 2021

@sh4nks are there any updates?

@subnix
Copy link
Author

subnix commented May 14, 2021

I’ve obtained CVE ID (CVE-2021-33026) for this issue.

@ndouchin
Copy link

ndouchin commented Jun 1, 2021

@subnix @sh4nks are there any updates on this issue?

We discovered the CVE through python-safety in our codebase.

@subnix Do you need help with this issue?

@sh4nks
Copy link
Collaborator

sh4nks commented Jun 1, 2021

Hi, sorry for ignoring this issue for a while. Due to RL I don't have as much time as I'd like for maintenance work. I'll try to take a closer look today or tomorrow.

flask_caching/backends/base.py Outdated Show resolved Hide resolved
@subnix
Copy link
Author

subnix commented Jun 2, 2021

Pylibmc also uses pickle by default. @sh4nks maybe we should force (de-)serialize data on our own?

https://github.com/lericson/pylibmc/blob/8e783a69cc69fb04b9faad6c61ab43193569ab0f/src/_pylibmcmodule.c#L1245-L1284

Revert "Add test_generic_get_bytes"
Refs: 6ace510
@dimon222

This comment has been minimized.

@aduggirala28

This comment has been minimized.

@davidism
Copy link
Member

davidism commented Nov 8, 2021

As stated above, we consider the CVE invalid. You'll want to figure out how to mark those as invalid in whatever scanning tool is reporting them to you. Use of pickle on its own isn't a security issue, it would only be an issue if we were loading arbitrary data, which we're not.

This repo has recently changed maintainers, and the volunteer maintainers are still trying to find time to start working on various issues. There is a PR in cachelib to support serializer customization, and we will be transitioning flask-caching to use cachelib.

@fluffy-critter
Copy link

Is there any way to get the CVE closed? I use flask-caching in a library that I provide to others and I'm not thrilled with people being concerned about getting weekly dependabot notifications for this issue which boils down to "if you let people upload code to your server, they can run it."

@tvb
Copy link

tvb commented Dec 16, 2021

Still not able to close the CVE? My gitlab vulnerability report is also complaining.

@fluffy-critter
Copy link

For my projects I ended up just telling dependabot to ignore the CVE on the basis of “risk is tolerable to this project.” That doesn’t help downstream users though. :(

@pallets-eco pallets-eco deleted a comment from long76 Dec 31, 2021
@FLIGHTLAMA
Copy link

FLIGHTLAMA commented Jan 31, 2022

First of all, thanks for the great work with flask-caching.
Unfortunately I can't disable warnings in the scanning tool as they are managed centrally at some institutes.
Is there any workaround or fork in the meantime until cachelib is used?

@northernSage
Copy link
Member

northernSage commented Feb 21, 2022

Once #308 is merged, this will need some updates due to cachelib:#63

@subnix
Copy link
Author

subnix commented Feb 21, 2022

Once #308 is merged, this will need some updates due to cachelib:#63

I see only one way to update this according to changes - close the PR and open a new one :)

@northernSage may be we can implement common serializers (json, pickle with hmac) in cachelib? Otherwise every library user will have to implement boilerplate serialization engines.

Note about severity:
According to Censys, there are 300,000 memched/redis servers exposed to internet, but what about intranet?.. I hope they don't use pickle :). Sure, there isn't a real-time data and honeypots make some part of it.
So we can endlessly discuss pickle (un-)security, but this still holds true - cache storages are not properly secured. The main goal of the CVE is to warn end users about this behaviour. I have decided for me to patch library, because I don't agree with this behaviour, but everyone decides for himself what to do.

self._serializer = serializer

def dumps(self, obj, *args, **kwargs):
result = self._serializer.dumps(obj, *args, **kwargs)
Copy link

@NotoriousPyro NotoriousPyro Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I set the default serializer in base.py to json, many tests failed because they could not serialize bytes to str.
I fixed it with the following, which I think should be included if this BytesSerializer is to work with the JSON serializer properly: NotoriousPyro/flask-caching-json@591f89e#diff-67f94315529cee25fa3dc7c13b27c29fdf9c96fd765156090247b49d587ba08cR16-R17

    def dumps(self, obj, *args, **kwargs):
        if isinstance(obj, bytes):
            obj = obj.decode("UTF-8")
        result = self._serializer.dumps(obj, *args, **kwargs)
        if isinstance(result, str):
            return result.encode()
        return result

@rylyade1
Copy link

rylyade1 commented May 9, 2022

Hi, are there any updates on how this pull request or the corresponding CVE will be handled in the library? Will other serializers (e.g. json) be offered as @subnix suggested?

@shahwaizkhan
Copy link

Hi, Any update on when this PR will be merged to reslove this CVE issue ?
AWS vulnerability report is also complaining.

@northernSage
Copy link
Member

@shahwaizkhan, @rylyade1 We are currently going through an internal refactor process, please see #308 mentioned in my previous comment:

Once #308 is merged, this will need some updates due to cachelib:#63

Bear in mind that cachelib (which is now flask-caching's backend) already supports custom/configurable serializers, what is left now is exposing this functionality in a convenient way through flask-caching's API, ongoing in pallets-eco/cachelib#136 (comment)

@bkhakshoor
Copy link

Can I get some clarity here, if a user controls the parameters being passed to the memoize function, can they execute code?

@fluffy-critter
Copy link

The issue isn't about memoization, it's about the deserialization from the cache, due to its use of Pickle. See https://security.stackexchange.com/a/201243/5176 for an example of such an attack.

For an attack like this to work, an attacker must:

  1. Be able to write arbitrary values to the cache
  2. Be able to generate a cache key that will collide with a value being read by the application
  3. Cause the application to read a maliciously-injected value

Any situation where all 3 of those is true is a situation where the application has larger problems; for example, if someone's able to inject malicious cached rendered pages into a Flask app's cache, then they can make the website say literally anything they want, regardless of whether it involves the execution of remote code. Basically, the Pickle vulnerability follows from a website already being extremely vulnerable (due to conditions 1 and 2 being met).

Remote code execution is definitely a bad thing, but in any situation where RCE can be triggered through this (extremely theoretical) vulnerability, there's already bigger problems with the website.

@aarti9
Copy link

aarti9 commented Sep 16, 2022

Hi all,

I am looking for solution to the vulnerability for flask-caching. Can you please let me know if there is a solution to use? This is a bit urgent so would appreciate any response. Thank you.

Thanks
Aarti

@ThiefMaster
Copy link
Contributor

This is not really urgent nor an actual vulnerability. The ONLY case where this can be exploited is if your infrastructure is already compromised and an attacker can write arbitrary data to your cache backend. Please see the comment right above yours as well.

@fluffy-critter
Copy link

I just submitted an update request to mitre.org with the following description:

The stated vulnerability is not a vulnerability in flask-caching itself, but only makes flask-caching one intermediate member in a chain of multiple security issues.

For an attack like this to work, an attacker must:

  1. Be able to write arbitrary values to the cache
  2. Be able to generate a cache key that will collide with a value being read by the application
  3. Cause the application to read a maliciously-injected value

Any situation where all 3 of those is true is a situation where the application has larger problems; for example, if someone's able to inject malicious cached rendered pages into a Flask app's cache, then they can make the website say literally anything they want, regardless of whether it involves the execution of remote code. Basically, the Pickle vulnerability follows from a website already being extremely vulnerable (due to conditions 1 and 2 being met).

There has not been a single confirmed case of this exploit having ever occurred, despite the description being public for two years; the attack is only theoretical, requires a very specific exploit path which requires there to be much worse security issues to be visible in the first place, and it boils down to "if you let people upload code to your server, they can run it."

This CVE should be rejected, or at least have its severity reduced, with the following base metrics:

Attack complexity: high (requires multiple failures in security infrastructure)
User interaction: required (the exploit requires a web request that reads from a poisoned cache item)
Confidentiality: none (this does not directly result in the disclosure of confidential data)
Availability: none (it's regarding data in a cache, which is inherently transitory/non-durable)

@NotoriousPyro
Copy link

Hi, are there any updates on how this pull request or the corresponding CVE will be handled in the library? Will other serializers (e.g. json) be offered as @subnix suggested?

I forked the repo and made a library extending this that replaces the serializers, though its been a while since I worked on it since we went with Next.js instead. I have no idea if it still works. I think it should even be able to serialize bytes to json, by base64 it... again no guarantees and no warranty... https://github.com/NotoriousPyro/flask-caching-json

oliverchang added a commit to pypa/advisory-database that referenced this pull request Jul 25, 2023
Per pallets-eco/flask-caching#209, this is not considered to be a vulnerability.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.