Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addons + Proxito: return X-RTD-Resolver-Filename and inject via CF #11100

Merged
merged 12 commits into from
Feb 27, 2024

Conversation

humitos
Copy link
Member

@humitos humitos commented Feb 8, 2024

Return X-RTD-Resolver-Filename from El Proxito and inject it as HTML meta tag from Cloudflare Worker.

Closes readthedocs/addons#211

@humitos humitos requested review from a team as code owners February 8, 2024 09:19
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak to the unresolve parts, but the JS side looks good. The JS parts would be benefit from some test cases later, when we're ready for that.

@humitos
Copy link
Member Author

humitos commented Feb 12, 2024

@stsewd can you review this PR, please? I think you mentioned something on Slack but I'm not sure to remember what it was 😅

Copy link
Member

@stsewd stsewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We should set this header from the serve views, since we have access to the unresolved object from there already, instead of executing the unresolved twice.
  • Do we want to always add that header, even on 404s?
  • Everything in the worker that we are inserting as HTML should be HTML escaped, project and version are sluggified, so there is no problem there, but filenames can be anything, the current code is probably vulnerable to XSS.

Base automatically changed from humitos/wrangler-locally to main February 13, 2024 11:03
@humitos
Copy link
Member Author

humitos commented Feb 13, 2024

@stsewd

We should set this header from the serve views, since we have access to the unresolved object from there already, instead of executing the unresolved twice.

I didn't find a good place to do this. _serve_docs, _serve_file and similar methods don't receive the filename and I didn't find a way to get access to it from there. So, I ended up injecting the UnresolvedURL already resolved in ServeDocsBase.serve_path into the HttpRequest object and then accessing it from the middleware. That way, we don't call the unresolver twice.

Do we want to always add that header, even on 404s?

Yes. Are you asking for something in particular? Is there any reason why we wouldn't want it on 404s? Maybe I'm missing something.

Everything in the worker that we are inserting as HTML should be HTML escaped, project and version are sluggified, so there is no problem there, but filenames can be anything, the current code is probably vulnerable to XSS.

👍🏼 -- I added django.html.utils.escape(...) to the filename.

@humitos humitos requested a review from stsewd February 13, 2024 11:55
readthedocs/proxito/middleware.py Outdated Show resolved Hide resolved
readthedocs/proxito/middleware.py Show resolved Hide resolved
readthedocs/proxito/tests/test_headers.py Show resolved Hide resolved
Comment on lines +269 to +270
# We could resolve it again from the middleware, but we would duplicating DB queries.
request.unresolved_url = unresolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be in the 404 view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 -- I put it there.

I found there could be cases where the unresolve_path could raise and exception making the request.unresolved_url=None but there would be a filename anyways, since it's got from exc.path.

I think we could eventually refactor the proxito/unresolver code to support UnresolvedURL without all the components. This is out of the scope of this PR, tho and probably also related to #10456

@stsewd
Copy link
Member

stsewd commented Feb 13, 2024

👍🏼 -- I added django.html.utils.escape(...) to the filename.

We should escape all values in the context from where they are used, in this case, the worker (where we should also escape the project and version out of caution). We should check if the worker exposes HTML objects that we can use, instead of building the HTML by hand.

@humitos
Copy link
Member Author

humitos commented Feb 14, 2024

We should escape all values in the context from where they are used, in this case, the worker (where we should also escape the project and version out of caution).

I'm not sure about this. Why we cannot trust our own code? If we escape the values in El Proxito there is no need to escape them from the worker. I prefer to put this logic on the server side (Python) and always return something we can trust from the worker. It's more likely to have security issues with JavaScript than with Python, given our expertise in the field 😃

We should check if the worker exposes HTML objects that we can use, instead of building the HTML by hand.

I'm not sure to understand what you mean here. Can you expand on this?

@stsewd
Copy link
Member

stsewd commented Feb 14, 2024

@humitos we can't trust the values, since they are reflected from the user, if we need to modify the values we receive for some reason, we would need to unescaped them and escape them back again, it's easy to forget to escape back the values, or if there is a bug in our code where the project or version slug are passed as is without being valid slugs, we would be exposed. Of course, we can try escaping all values we set in the headers, but it's easy to miss escaping a value.

Writing a function to escaping HTML isn't that hard, you can take a look at the Python's implementation https://github.com/python/cpython/blob/4b2d1786ccf913bc80ff571c32b196be1543ca54/Lib/html/__init__.py#L12-L25, it would be better if CF provided an API for creating HTML elements safely... probably worth opening an issue, they don't talk about this in their docs, but they do in their examples :D

https://github.com/cloudflare/workers-sdk/blob/89482d4475780c3cd3efbbd004c6b4f85c8daf53/fixtures/pages-plugin-example/functions/_middleware.ts#L17-L20

@humitos
Copy link
Member Author

humitos commented Feb 20, 2024

@stsewd thanks for the explanation 👍🏼

we can't trust the values, since they are reflected from the user

I get this, but that's why we are escaping these values before passing them as HTTP header values from Python. After escaping them in the server, we can trust these values from the client.

Of course, we can try escaping all values we set in the headers, but it's easy to miss escaping a value.

I think this is the correct solution. Always escape all the values we set in the headers from the server and trust on them. If we miss escaping a value is a bug and we should fix it. I don't think we should escape all the values everywhere each time we use them just in case we have missed escaping it in the previous layer.

Writing a function to escaping HTML isn't that hard, you can take a look at the Python's implementation python/cpython@4b2d178/Lib/html/init.py#L12-L25,

I'd prefer to avoid re-implementing this in JavaScript and maintain it over time. I prefer to rely on Django/Python standards to cover myself on this and keep it updated.

but they do in their examples :D
cloudflare/workers-sdk@89482d4/fixtures/pages-plugin-example/functions/_middleware.ts#L17-L20

Unfortunately, it doesn't explain why HTML shouldn't be set like that.

@humitos
Copy link
Member Author

humitos commented Feb 20, 2024

I opened #11129 to keep the discussion alive and unblock this PR. The current code is secure and safe, so we can move forward with this implementation and decide whether or not change the implementation in a future PR.

@stsewd
Copy link
Member

stsewd commented Feb 20, 2024

I don't think we should escape all the values everywhere each time we use them just in case we have missed escaping it in the previous layer.

What I'm proposing is escaping all values only the context they are used, we don't need to escape them twice (that would be wrong, the original string would be lost). Currently, we are only escaping the filename header, all other headers aren't.

I'd prefer to avoid re-implementing this in JavaScript and maintain it over time. I prefer to rely on Django/Python standards to cover myself on this and keep it updated.

There is also the option to use a JS library, but seems straight forward to just implement the escape function.

@humitos
Copy link
Member Author

humitos commented Feb 21, 2024

@stsewd are you happy moving forward here and merging this PR apart from the escape/unescape discussion? We can come back to that later once we finish our conversation in #11129 👍🏼

@stsewd
Copy link
Member

stsewd commented Feb 21, 2024

I still think we should stick to safe patterns from the beginning, building HTML by hand without escaping is not. And solving this problem just requires one small function. If anyone else feels fine with the current implementation I won't oppose to merging.

@humitos
Copy link
Member Author

humitos commented Feb 27, 2024

We talked a little about this in Slack and it seems we are fine moving forward with this approach for now and continue the conversation about the implementation details in #11129.

@humitos humitos merged commit 48203f2 into main Feb 27, 2024
7 checks passed
@humitos humitos deleted the humitos/resolver-filename-header branch February 27, 2024 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flyout: link to change version drops page name
3 participants