PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch() #1926

Merged
merged 46 commits into from
Oct 23, 2024

@adamziel adamziel commented Oct 21, 2024

Motivation for the change, related issues

Enables HTTPS requests from PHP via file_get_contents(), curl, and all other networking mechanisms. This PR effectively performs a MITM attack on the PHP instance to decrypt the outbound traffic, run the request using fetch(), and then provide an encrypted response – everything as if PHP was directly talking to the right server.

How is it implemented?

Emscripten can be configured to stream all network traffic through a WebSocket. @php-wasm/node and wp-now use that to access the internet via a local WebSocket->TCP proxy, but the in-browser version of WordPress Playground exposes no such proxy.

This PR ships a "fake" WebSocket class. Instead of starting a ws:// connection, it translates the raw HTTP/HTTPS bytes into a fetch() call.

In the case of HTTP, the raw request bytes are parsed into a Request object with a body stream, which is then passed to fetch(). Then, as the response status, headers, and body arrive, they are stream-encoded as raw response bytes and exposed as incoming WebSocket data.
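
As a rough illustration of the first half of that translation, the sketch below parses the head of a raw HTTP request into the pieces a Request object needs. The names `parseRequestHead` and `toUrl` are hypothetical and not taken from this PR, and body streaming is left out entirely.

```typescript
// Hypothetical helper, not the PR's actual code: extracts the request line
// and headers from raw HTTP/1.1 request bytes.
interface ParsedRequestHead {
  method: string;
  path: string;
  headers: Record<string, string>;
  bodyStart: number; // byte offset of the first body byte
}

function parseRequestHead(bytes: Uint8Array): ParsedRequestHead | null {
  // Find the CRLF CRLF sequence that terminates the header block.
  let end = -1;
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 && bytes[i + 2] === 13 && bytes[i + 3] === 10) {
      end = i;
      break;
    }
  }
  if (end === -1) return null; // headers incomplete, wait for more bytes

  const head = new TextDecoder().decode(bytes.subarray(0, end));
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, path] = requestLine.split(' ');
  const headers: Record<string, string> = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip malformed header lines
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, path, headers, bodyStart: end + 4 };
}

// Rebuilds the absolute URL from the Host header and the request path.
function toUrl(head: ParsedRequestHead, protocol: 'http' | 'https'): string {
  return `${protocol}://${head.headers['host']}${head.path}`;
}
```

With a parsed head in hand, something like `new Request(toUrl(head, 'https'), { method: head.method, headers: head.headers })` could then be handed to fetch().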

In the case of HTTPS, the raw bytes are first piped through a custom TCPConnection class as follows:

  1. We generate a self-signed CA certificate and tell PHP to trust it using the openssl.cafile PHP.ini option.
  2. We create a domain-specific child certificate and sign it with the CA private key.
  3. We start accepting raw encrypted bytes, process them as structured TLS records, and perform the TLS handshake.
  4. An encrypted tunnel is established:
    • TLSConnection decrypts the encrypted outbound data sent by PHP
    • TLSConnection encrypts the unencrypted inbound data fed back to PHP

From there, the plaintext data is treated by the same HTTP<->fetch() machinery as described in the previous paragraph.
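
To make step 3 more concrete, here is a minimal sketch of splitting an incoming byte buffer into TLS records. The 5-byte record header layout (content type, protocol version, length) comes from RFC 5246; the function and type names are hypothetical and do not mirror the classes in this PR.

```typescript
// Content types defined by RFC 5246, section 6.2.1.
const ContentType = {
  ChangeCipherSpec: 20,
  Alert: 21,
  Handshake: 22,
  ApplicationData: 23,
} as const;

interface TLSRecord {
  type: number;
  version: number; // 0x0303 for TLS 1.2
  fragment: Uint8Array;
}

// Hypothetical helper: consumes as many complete records as the buffer holds
// and returns the unconsumed tail so the caller can append more bytes later.
function readRecords(buffer: Uint8Array): { records: TLSRecord[]; rest: Uint8Array } {
  const records: TLSRecord[] = [];
  let offset = 0;
  while (buffer.length - offset >= 5) {
    const type = buffer[offset];
    const version = (buffer[offset + 1] << 8) | buffer[offset + 2];
    const length = (buffer[offset + 3] << 8) | buffer[offset + 4];
    if (buffer.length - offset - 5 < length) break; // record not fully received yet
    records.push({ type, version, fragment: buffer.subarray(offset + 5, offset + 5 + length) });
    offset += 5 + length;
  }
  return { records, rest: buffer.subarray(offset) };
}
```

Returning the unconsumed tail matters because WebSocket messages from PHP do not have to align with record boundaries.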

Implementation details

This PR ships:

  • PHP.wasm bindings to pipe the outbound bytes through a WebSocket <-> TLS <-> fetch() pipeline.
  • A subset of TLS 1.2 protocol implementation (parts of RFC 5246, RFC 6066, RFC 4492, RFC 8446, RFC 6070)
  • SSL certificate generator supporting CA certs and CA-signed certs

TLS 1.2

  • Parses all TLS record types: handshakes, alerts, application data.
  • Performs the full TLS handshake required for ECDH encryption including the necessary TLS 1.2 extensions.
  • Correctly encrypts and decrypts all the post-handshake data.
  • Uses window.crypto (the Web Crypto API) for encryption.
  • Only supports the TLS1_CK_ECDHE_RSA_WITH_AES_128_GCM_SHA256 mode.
  • Doesn't support multiple ChangeCipherSpec messages.
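
For readers unfamiliar with AES-128-GCM in TLS 1.2, the sketch below shows how the per-record nonce and additional authenticated data are laid out according to RFC 5288 and RFC 5246. The function names are hypothetical, and this is not necessarily how the PR structures the code.

```typescript
// RFC 5288: the 12-byte GCM nonce is a 4-byte implicit part (the write IV
// derived during the handshake) followed by an 8-byte explicit part that is
// sent at the start of each encrypted record.
function buildGcmNonce(implicitIv: Uint8Array, explicitNonce: Uint8Array): Uint8Array {
  if (implicitIv.length !== 4 || explicitNonce.length !== 8) {
    throw new Error('Expected a 4-byte implicit IV and an 8-byte explicit nonce');
  }
  const nonce = new Uint8Array(12);
  nonce.set(implicitIv, 0);
  nonce.set(explicitNonce, 4);
  return nonce;
}

// RFC 5246, section 6.2.3.3: additional data is
// seq_num (8 bytes) + type (1 byte) + version (2 bytes) + plaintext length (2 bytes).
function buildAdditionalData(
  sequenceNumber: number,
  contentType: number,
  plaintextLength: number
): Uint8Array {
  const aad = new Uint8Array(13);
  const view = new DataView(aad.buffer);
  // Write the 64-bit sequence number as two big-endian 32-bit halves.
  view.setUint32(0, Math.floor(sequenceNumber / 0x100000000));
  view.setUint32(4, sequenceNumber % 0x100000000);
  view.setUint8(8, contentType);
  view.setUint16(9, 0x0303); // TLS 1.2
  view.setUint16(11, plaintextLength);
  return aad;
}
```

These two values map onto the `iv` and `additionalData` fields of the `AES-GCM` parameters accepted by `crypto.subtle.encrypt()` and `crypto.subtle.decrypt()`.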

SSL certificate generator

  • CA certificate is generated at WASM boot (if networking is enabled)
  • Host-specific certificate is generated at every request and signed with CA private key
  • Certificates are created using a custom ASN.1/DER encoder and a PEM exporter shipped in this PR
  • Only RSA 2048 with SHA-256 supported today
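
The core of any ASN.1/DER encoder is the tag-length-value triple. The sketch below shows that building block under the standard DER rules; the function names are made up for this example and are not the ones shipped in the PR.

```typescript
// DER length: one byte for values below 128, otherwise 0x80 | byteCount
// followed by the length in big-endian order.
function derLength(length: number): Uint8Array {
  if (length < 0x80) return Uint8Array.of(length);
  const bytes: number[] = [];
  for (let n = length; n > 0; n = Math.floor(n / 256)) bytes.unshift(n % 256);
  return Uint8Array.of(0x80 | bytes.length, ...bytes);
}

// Tag-length-value: the building block of every DER structure.
function derTlv(tag: number, value: Uint8Array): Uint8Array {
  const len = derLength(value.length);
  const out = new Uint8Array(1 + len.length + value.length);
  out[0] = tag;
  out.set(len, 1);
  out.set(value, 1 + len.length);
  return out;
}

// INTEGER (tag 0x02) for small non-negative numbers. A leading zero byte is
// added when the high bit is set so the value is not read as negative.
function derInteger(n: number): Uint8Array {
  const bytes: number[] = [];
  do {
    bytes.unshift(n % 256);
    n = Math.floor(n / 256);
  } while (n > 0);
  if (bytes[0] >= 0x80) bytes.unshift(0);
  return derTlv(0x02, Uint8Array.from(bytes));
}

// SEQUENCE (tag 0x30) wraps the concatenation of its already-encoded items.
function derSequence(...items: Uint8Array[]): Uint8Array {
  const total = items.reduce((sum, item) => sum + item.length, 0);
  const body = new Uint8Array(total);
  let offset = 0;
  for (const item of items) {
    body.set(item, offset);
    offset += item.length;
  }
  return derTlv(0x30, body);
}
```

A certificate is, in essence, nested calls like these, with the final bytes Base64-encoded and wrapped in `-----BEGIN CERTIFICATE-----` markers for PEM.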

Avenues explored but not pursued

This work supersedes #1093 where node-forge was used. Here's why I'm moving to a custom TLS implementation:

  • node-forge runs everything synchronously and ships a lot of code. window.crypto is async, faster, bundles less code, and is more convenient than node-forge.
  • With node-forge, every error made me question fundamentals like the RSA implementation. With window.crypto, I feel confident assuming that encryption, hashing, signing etc. are implemented correctly.
  • node-forge doesn't support TLS 1.3. Neither does this PR, but after implementing TLS 1.2 I think adding TLS 1.3 support would be reasonably easy.

Testing instructions

Go to the URL below and confirm you see "Hello-dolly.zip downloaded from https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this many bytes: int(1887)"

http://localhost:5400/website-server/?php=8.0&wp=6.6&networking=yes&language=&multisite=no&random=f1qv1twpssr#%7B%22landingPage%22:%22/network-test.php%22,%22preferredVersions%22:%7B%22php%22:%228.0%22,%22wp%22:%22latest%22%7D,%22phpExtensionBundles%22:%5B%22kitchen-sink%22%5D,%22steps%22:%5B%7B%22step%22:%22writeFile%22,%22path%22:%22/wordpress/network-test.php%22,%22data%22:%22%3C?php%20echo%20'Hello-dolly.zip%20downloaded%20from%20https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip%20has%20this%20many%20bytes:%20';%20var_dump(strlen(file_get_contents('https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip')));%22%7D%5D%7D

From there, you could manipulate the URL in the file_get_contents() call to fetch a different file, file with no CORS headers, invalid URLs etc. Confirm that each time PHP did something sensible, e.g. displayed the length, displayed the error message, etc. It should never just hang.

Also, confirm the newly added CI tests work as expected.

Remaining work

  • Add a solid unit and E2E test suite, especially for:
    • Streaming: bytes, pause, more bytes
    • fetch() exceptions
    • Slow servers
    • POST requests
  • Add abundant docstrings to explain what's happening at each stage
  • Core work
    • Be more strict in httpRequestToFetch about HTTP (plaintext) vs HTTPS (go through TLS) vs other protocols (reject connection). For example, check ports, pay attention to parsing errors, etc.
    • Rebuild all the in-browser PHP.wasm versions
    • Don't run any of this code when networking is disabled
    • Continue using the custom handler for the Requests library to enable direct fetch() calls without the encrypt->decrypt->encrypt->decrypt overhead.
  • Clean it up

Follow up work

  • Caching
    • Ship a precomputed CA cert and private key
    • Memoize host-specific certificates

CC @brandonpayton @bgrgicak @dmsnell @mho22

@adamziel adamziel marked this pull request as ready for review October 22, 2024 23:36
@adamziel adamziel requested a review from a team as a code owner October 22, 2024 23:36
@adamziel commented:

I was considering splitting this PR into:

  • TLS implementation.
  • Rebuilding PHP versions.
  • Glue code to connect the PHP instance to TLS.

It would make things a bit clearer but I don't think it would help that much. The changes in this PR are already logically grouped using the file structure, e.g.:

  • TLS implementation: packages/php-wasm/web/src/lib/tls
  • PHP versions: packages/php-wasm/web/public/php
  • Glue code: packages/playground/remote/src/lib/worker-thread.ts, packages/php-wasm/web/src/lib/load-runtime.ts

At the same time, it would take me a few hours to split everything, get the tests to pass in each sub-PR etc. I think I'll stick to a single PR here. It is an atomic-ish change, even if large.

@adamziel adamziel merged commit 7fa40be into trunk Oct 23, 2024
10 checks passed
@adamziel adamziel deleted the ssl-network-brodge-wrap branch October 23, 2024 21:25
@adamziel commented:

Let's get it in 🎉 Next stop: Curl support in the browser.

adamziel added a commit that referenced this pull request Oct 23, 2024
Enables the CURL PHP extension on playground.wordpress.net when
networking is enabled.

The heavy lifting was done in #1926. All this PR does is:

* Enables the curl extension
* Rebuilds PHP.wasm for the web
* Enables curl_exec and curl_multi_exec functions in web browsers
* Unrelated – adds a JSPI vs Asyncify indication to the SAPI name so
  that we can easily learn which PHP.wasm build Playground is running.

Related to #85
Closes #1008

## Testing instructions

Confirm the new E2E tests are sound and that they work in CI. You could
also try installing a CURL-reliant plugin such as Plausible and confirm
it installs without the fatal errors reported in #1008
@adamziel adamziel restored the ssl-network-brodge-wrap branch October 24, 2024 08:58
@adamziel adamziel mentioned this pull request Jul 1, 2024
adamziel added a commit that referenced this pull request Oct 24, 2024
Enables the CURL PHP extension on
[playground.wordpress.net](http://playground.wordpress.net/) when
networking is enabled. This is made possible by the TLS 1.2
implementation merged in #1926.

This PR:

* Enables the curl extension
* Rebuilds PHP.wasm for the web
* Enables curl_exec and curl_multi_exec functions in web browsers
* **Strips the response content-length and switches to
Transfer-Encoding: Chunked**
* Unrelated – adds a JSPI vs Asyncify indication to the SAPI name so
that we can easily learn which PHP.wasm build Playground is running

Related to #85
Closes #1008

## Why use Transfer-Encoding: chunked?

Web servers often respond with a combination of Content-Length
and Content-Encoding. For example, a 16KB text file may be compressed
to 4KB with gzip and served with a Content-Encoding of `gzip` and a
Content-Length of 4KB.

The web browser, however, exposes neither the Content-Encoding header
nor the gzipped data stream. All we have access to is the original
Content-Length value of the gzipped file and a decompressed data stream.

If we just pass that along to the PHP-side request handler, it would
see a 16KB body stream with a Content-Length of 4KB. It would then
truncate the body stream at 4KB and discard the rest of the data.

This is not what we want.

To correct that behavior, we're stripping the Content-Length entirely.
We do that for every single response because we don't have any way
of knowing whether any Content-Encoding was used. Furthermore, we can't
just calculate the correct Content-Length value without consuming the
entire content stream – and we want to pass each data chunk to PHP
as we receive it.

Instead of a fixed Content-Length, this PR uses Transfer-Encoding:
chunked, where every chunk is prefixed with its own size.
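
A minimal sketch of what that looks like on the wire, following the HTTP/1.1 chunked transfer coding: each chunk is its size in hexadecimal, CRLF, the data, and another CRLF, and a zero-sized chunk ends the body. The helper names are hypothetical and are not the ones used in this PR.

```typescript
const encoder = new TextEncoder();

// Frames one piece of the decompressed body as an HTTP/1.1 chunk.
function encodeChunk(data: Uint8Array): Uint8Array {
  // An empty chunk would be read as the end of the body, so emit nothing.
  if (data.length === 0) return new Uint8Array(0);
  const sizeLine = encoder.encode(data.length.toString(16) + '\r\n');
  const out = new Uint8Array(sizeLine.length + data.length + 2);
  out.set(sizeLine, 0);
  out.set(data, sizeLine.length);
  out.set([13, 10], sizeLine.length + data.length); // trailing CRLF
  return out;
}

// The zero-sized chunk that terminates a chunked body.
const LAST_CHUNK: Uint8Array = encoder.encode('0\r\n\r\n');

// Drops Content-Length (it may describe the compressed size) and
// announces chunked framing instead.
function rewriteResponseHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === 'content-length') continue;
    out[name] = headers[name];
  }
  out['transfer-encoding'] = 'chunked';
  return out;
}
```

Because each chunk carries its own size, the bytes can be forwarded to PHP as soon as fetch() delivers them, without knowing the total body length in advance.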

## Testing instructions

Confirm the new E2E tests are sound and that they work in CI. You could
also try installing a CURL-reliant plugin such as Plausible and confirm
it installs without the fatal errors reported in #1008
adamziel added a commit that referenced this pull request Dec 11, 2024
…2058)

## Description

Adds the Data Liberation WXR importer as an option in the `importWxr`
step. The new importer is turned on by including the `"importer":
"data-liberation"` option:

```json
{
  "steps": [
    {
      "step": "importWxr",
      "file": {
        "resource": "url",
        "url": "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml"
      },
      "importer": "data-liberation"
    }
  ]
}
```

When the `importer` option is missing or set to "default," nothing
changes in the behavior of the step and it continues using the
https://github.com/humanmade/WordPress-Importer importer.

The new importer:

* Rewrites links in the imported content
* Downloads assets through Playground's CORS proxy
* Parallelizes the downloads
* Communicates progress

This PR is a part of
#1894

## Implementation details

This `importWxr` step fetches and includes the
`data-liberation-core.phar` file. The phar file is built with
[Box](https://box-project.github.io/box/configuration/) and contains the
importer library with its dependencies, which is a subset of the Data
Liberation library, a subset of the Blueprints library, and a few vendor
libraries.

This, unfortunately, means that any changes in the PHP files require
rebuilding the .phar file. Here's how you can do it:

```bash
nx build:phar playground-data-liberation
```

You can also build the entire Data Liberation package as a WordPress
plugin complete with a wp-admin page:

```bash
nx build:plugin playground-data-liberation
```

Both commands will output the built files to
`packages/playground/data-liberation/dist`

The progress updates are a first-class feature of the new importer. The
updated `importWxr` step receives them in real time via a
`post_message_to_js()` call running after every import step. Then, it
passes them on to the progress bar UI.

### Other changes

* **TLS traffic now goes through the CORS proxy.** Since the new
importer uses `AsyncHTTP\Client` which deals with raw sockets,
Playground's [TLS-based network
bridge](#1926)
runs the outbound traffic through a CORS proxy. Technically,
`TCPOverFetchWebsocket` gets the `corsProxy` URL passed to the
`playground.boot()` call.
* A few composer dependencies were forked, downgraded to PHP 7.2 using
Rector, and bundled with this PR to keep the Data Liberation importer
working.

## Remaining work

- [x] PHP 7.2 compatibility. Done by forking and Rector-downgrading
dependencies that were incompatible with PHP 7.2.
- [x] Report the importer's progress on the overall Blueprint progress
bar
- [x] Enqueue the data liberation plugin files for downloading at the
blueprint compilation stage
- [x] Don't eagerly rewrite attachments URLs in `WP_Stream_Importer`.
Exposing this information to the API consumer requires an explicit
decision. Do we rewrite it? Or do we ignore it?
- [x] Fix the TLS errors at the intersection of Playground network
transport and the async HTTP client library
- [x] Separate the markdown importer and its dependencies (md parser,
frontmatter parser, Symfony libraries) from the core plugin
- [x] Ship the importer and its tree-shaken deps (URL parser) as a
minified zip/phar

## Follow-up work

- [ ] Reconsider the `WP_Import_Session` API – do we need so many
verbosely named methods? Can we achieve the same outcomes with fewer
methods?
- [ ] Investigate why there's a significant delay before media downloads
start on PHP 7.2 – 7.4. It's likely a PHP.wasm issue.

## Testing instructions

* Default importer – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20})
and confirm it does what the current `importWxr` step does, that is, it
stays at "Importing content" for a moment, fails to fetch media files
(CORS issues in network tools), but inserts posts and pages.
* Data Liberation – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22importer%22:%20%22data-liberation%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20}),
confirm the import progress is visible and that the content and media
indeed get imported:

![CleanShot 2024-12-08 at 14 54
49@2x](https://github.com/user-attachments/assets/a7da3244-a10f-43d2-8e94-43d305220a7e)

## Related issues

* #1211 
* #2012 
* #1477 
* #1250 
* #1780