PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch() #1926
…yption error, though
I was considering splitting this PR into:
It would make things a bit clearer but I don't think it would help that much. The changes in this PR are already logically grouped using the file structure, e.g.:
At the same time, it would take me a few hours to split everything, get the tests to pass in each sub-PR, etc. I think I'll stick to a single PR here. It is an atomic-ish change, even if large.
Let's get it in 🎉 Next stop: Curl support in the browser.
Enables the CURL PHP extension on playground.wordpress.net when networking is enabled. The heavy lifting was done in #1926. All this PR does is:

* Enables the curl extension
* Rebuilds PHP.wasm for the web
* Enables curl_exec and curl_multi_exec functions in web browsers
* Unrelated – adds a JSPI vs Asyncify indication to the SAPI name so that we can easily learn which PHP.wasm build Playground is running.

Related to #85
Closes #1008

## Testing instructions

Confirm the new E2E tests are sound and that they work in CI. You could also try installing a CURL-reliant plugin such as Plausible and confirm it installs without the fatal errors reported in #1008
Enables the CURL PHP extension on [playground.wordpress.net](http://playground.wordpress.net/) when networking is enabled. This is made possible by the TLS 1.2 implementation merged in #1926. This PR:

* Enables the curl extension
* Rebuilds PHP.wasm for the web
* Enables curl_exec and curl_multi_exec functions in web browsers
* **Strips the response Content-Length and switches to Transfer-Encoding: chunked**
* Unrelated – adds a JSPI vs Asyncify indication to the SAPI name so that we can easily learn which PHP.wasm build Playground is running

Related to #85
Closes #1008

## Why use Transfer-Encoding: chunked?

Web servers often respond with a combination of Content-Length and Content-Encoding. For example, a 16KB text file may be compressed to 4KB with gzip and served with a Content-Encoding of `gzip` and a Content-Length of 4KB.

The web browser, however, exposes neither the Content-Encoding header nor the gzipped data stream. All we have access to is the original Content-Length value of the gzipped file and a decompressed data stream.

If we just pass that along to the PHP-side request handler, it would see a 16KB body stream with a Content-Length of 4KB. It would then truncate the body stream at 4KB and discard the rest of the data.

This is not what we want.

To correct that behavior, we're stripping the Content-Length entirely. We do that for every single response because we don't have any way of knowing whether any Content-Encoding was used. Furthermore, we can't just calculate the correct Content-Length value without consuming the entire content stream – and we want to pass each data chunk to PHP as we receive it.

Instead of a fixed Content-Length, this PR uses Transfer-Encoding: chunked, and then provides a per-chunk length.

## Testing instructions

Confirm the new E2E tests are sound and that they work in CI. You could also try installing a CURL-reliant plugin such as Plausible and confirm it installs without the fatal errors reported in #1008
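To make the Content-Length reasoning above concrete, here is a minimal TypeScript sketch of HTTP/1.1 chunked framing. It is an illustration only, not the code shipped in the PR: the names `rewriteResponseHeaders`, `encodeChunk`, and `LAST_CHUNK` are hypothetical. Each body chunk is prefixed with its own length in hexadecimal, so the total body size never has to be known up front.

```typescript
// Hypothetical helpers illustrating chunked framing; not taken from the PR.
const chunkTextEncoder = new TextEncoder();

// Drop Content-Length (it describes the compressed body the browser never
// exposes) and announce chunked framing instead.
function rewriteResponseHeaders(
  headers: Record<string, string>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower === 'content-length' || lower === 'transfer-encoding') {
      continue;
    }
    out[name] = value;
  }
  out['transfer-encoding'] = 'chunked';
  return out;
}

// Frame one body chunk as: <length in hex>\r\n<data>\r\n
function encodeChunk(data: Uint8Array): Uint8Array {
  const prefix = chunkTextEncoder.encode(data.length.toString(16) + '\r\n');
  const suffix = chunkTextEncoder.encode('\r\n');
  const out = new Uint8Array(prefix.length + data.length + suffix.length);
  out.set(prefix, 0);
  out.set(data, prefix.length);
  out.set(suffix, prefix.length + data.length);
  return out;
}

// A zero-length chunk terminates the body.
const LAST_CHUNK: Uint8Array = chunkTextEncoder.encode('0\r\n\r\n');
```

With this framing, each decompressed chunk received from `fetch()` can be forwarded as soon as it arrives, and the receiver learns the body has ended from the zero-length chunk rather than from a byte count.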
…2058)

## Description

Adds the Data Liberation WXR importer as an option in the `importWxr` step. The new importer is turned on by including the `"importer": "data-liberation"` option:

```json
{
  "steps": [
    {
      "step": "importWxr",
      "file": {
        "resource": "url",
        "url": "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml"
      },
      "importer": "data-liberation"
    }
  ]
}
```

When the `importer` option is missing or set to "default," nothing changes in the behavior of the step and it continues using the https://github.com/humanmade/WordPress-Importer importer.

The new importer:

* Rewrites links in the imported content
* Downloads assets through Playground's CORS proxy
* Parallelizes the downloads
* Communicates progress

This PR is a part of #1894

## Implementation details

This `importWxr` step fetches and includes the `data-liberation-core.phar` file. The phar file is built with [Box](https://box-project.github.io/box/configuration/) and contains the importer library with its dependencies, which is a subset of the Data Liberation library, a subset of the Blueprints library, and a few vendor libraries.

This, unfortunately, means that any changes in the PHP files require rebuilding the .phar file. Here's how you can do it:

```bash
nx build:phar playground-data-liberation
```

You can also build the entire Data Liberation package as a WordPress plugin complete with a wp-admin page:

```bash
nx build:plugin playground-data-liberation
```

Both commands will output the built files to `packages/playground/data-liberation/dist`

The progress updates are a first-class feature of the new importer. The updated `importer` step receives them in real-time via a `post_message_to_js()` call running after every import step. Then, it passes them on to the progress bar UI.
### Other changes

* **TLS traffic now goes through the CORS proxy.** Since the new importer uses `AsyncHTTP\Client` which deals with raw sockets, Playground's [TLS-based network bridge](#1926) runs the outbound traffic through a CORS proxy. Technically, `TCPOverFetchWebsocket` gets the `corsProxy` URL passed to the `playground.boot()` call.
* A few composer dependencies were forked, downgraded to PHP 7.2 using Rector, and bundled with this PR to keep the Data Liberation importer working.

## Remaining work

- [x] PHP 7.2 compatibility. Done by forking and Rector-downgrading dependencies that were incompatible with PHP 7.2.
- [x] Report the importer's progress on the overall Blueprint progress bar
- [x] Enqueue the data liberation plugin files for downloading at the blueprint compilation stage
- [x] Don't eagerly rewrite attachments URLs in `WP_Stream_Importer`. Exposing this information to the API consumer requires an explicit decision. Do we rewrite it? Or do we ignore it?
- [x] Fix the TLS errors at the intersection of Playground network transport and the async HTTP client library
- [x] Separate the markdown importer and its dependencies (md parser, frontmatter parser, Symfony libraries) from the core plugin
- [x] Ship the importer and its tree-shaken deps (URL parser) as a minified zip/phar

## Follow-up work

- [ ] Reconsider the `WP_Import_Session` API – do we need so many verbosely named methods? Can we achieve the same outcomes with fewer methods?
- [ ] Investigate why there's a significant delay before media downloads start on PHP 7.2 – 7.4. It's likely a PHP.wasm issue.
## Testing instructions

* Default importer – [Open this link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20}) and confirm it does what the current `importWxr` step does, that is, it stays at "Importing content" for a moment, fails to fetch media files (CORS issues in network tools), but inserts posts and pages.
* Data Liberation – [Open this link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22importer%22:%20%22data-liberation%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20}), confirm the import progress is visible and that the content and media indeed get imported:

![CleanShot 2024-12-08 at 14 54 49@2x](https://github.com/user-attachments/assets/a7da3244-a10f-43d2-8e94-43d305220a7e)

## Related issues

* #1211
* #2012
* #1477
* #1250
* #1780
Motivation for the change, related issues

Enables HTTPS requests from PHP via `file_get_contents()`, curl, and all other networking mechanisms. This PR effectively performs a MITM attack on the PHP instance to decrypt the outbound traffic, run the request using `fetch()`, and then provide an encrypted response – everything as if PHP was directly talking to the right server.

How is it implemented?

Emscripten can be configured to stream all network traffic through a WebSocket. `@php-wasm/node` and `wp-now` use that to access the internet via a local WebSocket->TCP proxy, but the in-browser version of WordPress Playground exposes no such proxy.

This PR ships a "fake" WebSocket class. Instead of starting a `ws://` connection, it translates the raw HTTP/HTTPS bytes into a `fetch()` call.

In case of HTTP, the raw request bytes are parsed into a Request object with a body stream, which is then passed to `fetch()`. Then, as the response status, headers, and the body arrive, they're stream-encoded as raw response bytes and exposed as incoming WebSocket data.

In case of HTTPS, the raw bytes are first piped through a custom TCPConnection class as follows:

* … `openssl.cafile` PHP.ini option

From there, the plaintext data is treated by the same HTTP<->fetch() machinery as described in the previous paragraph.
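The HTTP step described above, turning raw request bytes into something `fetch()` can consume, can be sketched roughly as follows. This is a simplified TypeScript illustration, not the PR's actual parser: `parseRequestHead` and `ParsedRequestHead` are hypothetical names, and body streaming, repeated headers, and ports are deliberately left out.

```typescript
// Hypothetical sketch: parse the head of a raw HTTP/1.1 request so it can
// be replayed with fetch(). Names and shape are assumptions, not the PR's API.
interface ParsedRequestHead {
  method: string;
  url: string;
  headers: Record<string, string>;
  // Index of the first body byte within the input buffer.
  bodyOffset: number;
}

function parseRequestHead(
  bytes: Uint8Array,
  scheme: 'http' | 'https'
): ParsedRequestHead | null {
  // The head ends at the first blank line (\r\n\r\n).
  let headEnd = -1;
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 &&
        bytes[i + 2] === 13 && bytes[i + 3] === 10) {
      headEnd = i;
      break;
    }
  }
  // Not enough bytes buffered yet; the caller should wait for more data.
  if (headEnd === -1) return null;

  const lines = new TextDecoder().decode(bytes.subarray(0, headEnd)).split('\r\n');
  const [method, target] = (lines[0] ?? '').split(' ');

  const headers: Record<string, string> = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }

  const host = headers['host'];
  if (!method || !target || !host) {
    throw new Error('Malformed HTTP request head');
  }
  return { method, url: `${scheme}://${host}${target}`, headers, bodyOffset: headEnd + 4 };
}
```

A caller could then issue `fetch(parsed.url, { method: parsed.method, headers: parsed.headers })` and encode the response status line, headers, and body chunks back into raw bytes for the socket. Returning `null` on an incomplete head lets the caller keep buffering instead of failing early.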
Implementation details

This PR ships:

* … `WebSocket <-> TLS <-> fetch()` pipeline.

TLS 1.2

* … `window.crypto()` for encryption.
* … `TLS1_CK_ECDHE_RSA_WITH_AES_128_GCM_SHA256` mode.
* … `ChangeCipherSpec` messages.

SSL certificate generator
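For readers unfamiliar with the AES-128-GCM cipher suite named above, the sketch below shows how TLS 1.2 (RFC 5246 and RFC 5288) derives the two per-record inputs an AES-GCM call needs: a 12-byte nonce and 13 bytes of additional authenticated data. It is written in TypeScript as an illustration; the function names are hypothetical and this is not the PR's code. It also assumes the explicit nonce equals the record sequence number, which is a common choice but not mandated.

```typescript
// Hypothetical helpers; the layout follows RFC 5246 / RFC 5288.
const TLS_1_2_VERSION = [0x03, 0x03];

// Encode a non-negative safe integer as 8 big-endian bytes.
function uint64BigEndian(value: number): Uint8Array {
  const out = new Uint8Array(8);
  let rest = value;
  for (let i = 7; i >= 0; i--) {
    out[i] = rest % 256;
    rest = Math.floor(rest / 256);
  }
  return out;
}

// GCM nonce = 4-byte implicit salt (from the key block) + 8-byte explicit nonce.
// Assumption: the explicit nonce is the record sequence number.
function buildGcmNonce(implicitSalt: Uint8Array, sequenceNumber: number) {
  if (implicitSalt.length !== 4) {
    throw new Error('The implicit salt must be 4 bytes long');
  }
  const nonce = new Uint8Array(12);
  nonce.set(implicitSalt, 0);
  nonce.set(uint64BigEndian(sequenceNumber), 4);
  return nonce;
}

// AAD = sequence number (8) + content type (1) + version (2) + plaintext length (2).
function buildAdditionalData(
  sequenceNumber: number,
  contentType: number,
  plaintextLength: number
) {
  const aad = new Uint8Array(13);
  aad.set(uint64BigEndian(sequenceNumber), 0);
  aad[8] = contentType;
  aad.set(TLS_1_2_VERSION, 9);
  aad[11] = (plaintextLength >> 8) & 0xff;
  aad[12] = plaintextLength & 0xff;
  return aad;
}

// These values would then be passed to the Web Crypto API, for example:
//   crypto.subtle.encrypt(
//     { name: 'AES-GCM', iv: nonce, additionalData: aad, tagLength: 128 },
//     key, plaintext)
```

On the wire, the 8-byte explicit nonce precedes the ciphertext and its 16-byte authentication tag, so the peer can rebuild the same 12-byte nonce from its own copy of the salt.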
Avenues explored but not pursued

This work supersedes #1093 where `node-forge` was used. Here's why I'm moving to a custom TLS implementation:

* `node-forge` runs everything synchronously and ships a lot of code. `window.crypto` is async, faster, bundles less code, and is more convenient than `node-forge`.
* … `node-forge`, every error made me question fundamentals like the RSA implementation. With `window.crypto()`, I feel confident assuming that encryption, hashing, signing etc. are implemented correctly.
* `node-forge` doesn't support TLS 1.3. Neither does this PR, but after implementing TLS 1.2 I think adding TLS 1.3 support would be reasonably easy.

Testing instructions

Go to the URL below and confirm you see "Hello-dolly.zip downloaded from https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this many bytes: int(1887)"

From there, you could manipulate the URL in the `file_get_contents()` call to fetch a different file, a file with no CORS headers, invalid URLs, etc. Confirm that each time PHP does something sensible, e.g. displays the length or the error message. It should never just hang.

Also, confirm the newly added CI tests work as expected.
Remaining work

* … `fetch()` exceptions
* … `httpRequestToFetch` about HTTP (plaintext) vs HTTPS (go through TLS) vs other protocols (reject connection). For example, check ports, pay attention to parsing errors, etc.
* … `fetch()` calls without the encrypt->decrypt->encrypt->decrypt overhead.

Follow up work
CC @brandonpayton @bgrgicak @dmsnell @mho22