Skip to content
This repository has been archived by the owner on Nov 29, 2023. It is now read-only.

Refactor transform to be isomorphic (Node/web compatibility) #171

Merged
merged 24 commits into from
Mar 15, 2023

Conversation

eyelidlessness
Copy link
Contributor

@eyelidlessness eyelidlessness commented Feb 21, 2023

This replaces #164, with many of the same goals and benefits, but taking a different approach. Rather than running the transform in a headless browser, this change introduces a partial DOM compatibility layer on top of libxmljs when building for Node. Most of the changes in transformer.ts are quite similar to #164, probably a bit safer as none of the XPath expressions have been converted to CSS query selectors. I also caught and fixed a couple of minor implementation mistakes.

As in #164, a demo app is included to demonstrate the web transform.

Performance

As discussed in #165, this change of approach seems more reasonable, because that PR brought such great performance improvements that the complexity of running a headless browser on the server no longer makes much sense.

This change may incur a small performance hit on Node, simply because the DOM compatibility layer introduces a bit of runtime indirection. However, the impact will depend on the specific XForm being transformed, and may even perform better for certain forms, i.e. in cases where parts of the XSLT now reimplemented as DOM/libxmljs might perform better than the equivalent libxslt behavior they replace.

Performance will also vary by environment. The current benchmark suite performs about the same in Chrome and Node, slightly faster in WebKit, and faster still in Firefox. The improved performance in Firefox also results in the VA WHO form being slightly faster than the SOAR form, which may provide good hints for where to look new optimization targets.

My hope is that the benchmark summaries will be reported in comments below once the current CI runs complete. If that's not the case I may add them manually.

Development and maintenance improvements

This change will have many of the same dev/maintenance benefits as #164, while adding less complexity to the runtime environment. And while this change does add some complexity to the build, it adds a great deal less complexity than #164 would have.

How this change has been tested and validated

As with #164 and #165, my goal making this change is to retain 100% feature/bug compatibility. The same snapshot tests added in #164 (merged in #165) continue to provide strong assurance to that end.

As with #164, for the portions of code converted from XSLT to DOM, the implementations are written to mirror the original XSLT as closely as possible, from naming parity to logical flow. As noted above, this change goes even further in that regard by preserving existing XPath expressions rather than converting them to DOM equivalents. We may want to go further in that direction in the future, but I believe for this change it's best to stick as closely as possible to existing semantics.

Unlike #164, this change runs all tests and benchmarks in CI in the following environments:

  • Node 14
  • Node 16
  • Firefox
  • Chromium
  • WebKit

The project has again been installed locally in enketo-core and enketo-express, to verify that all tests pass, and that flows known to depend on Transformer work as expected.

And again, the browser demo also serves as a validation that the new build setup works in a web project.

Known issues

I neglected to mention this in #164, but it's worth calling out now. The web build is currently fairly large: a little over 1.6 MB (just over 200 KB gzipped). The vast majority of that size comes from the language-tags dependency, which must be bundled to be used in a browser.

One low hanging fruit improvement for this might be to import the dependency dynamically, only for forms which actually need it (and I'd be happy to make that change in this PR if preferred). In the future we should consider:

  • Can we optimize our use of language-tags to reduce the overall size?
  • Can language-tags itself be optimized to improve downstream bundle size?
  • Is there a more suitable alternative to language-tags (whether another package, or possibly equivalent native browser functionality)?

Includes tests around API stability, as these expectations were all broken in a previous naive attempt at the same change
1. Introduces an abstract subset of DOM APIs corresponding to previous usage of semantically equivalent `libxmljs` APIs. It would have been nicer to import these types and export the subset we care about, but this is inconsistent with how TypeScript `lib` works (it always augments the global scope).

2. Provides a Node implementation of those DOM APIs by extending `libxmljs` prototypes. This kind of extension isn't ideal, but it's the most reasonable way to achieve such a compatibility layer without sacrificing performance. An ealirer alternative approach used `WeakMap`s to cross reference the DOM/`libxmljs` interfaces, but this had a significant impact on perf. This implementation is type checked by default.

3. Provides a web implementation by... just exporting the relevant globals. This implementation is type checked by `tsc` using the `tsconfig.web.json` project file. This ensures that the abstract DOM interfaces are actully consistent with the built in DOM `lib` types.

4. Refactors transformer.ts to use the abstract DOM APIs. This uses the Node DOM compatibility implementation by default, and the native web DOM implementation when the environment variable `ENV` is set to "web".

5. Also when `ENV` is web, tests and benchmarks call `transform` in the specified `BROWSER` environment variable (defaulting to "fiefox") via a simple `playwright` bridge. In CI, all tests and benchmarks are run in: Node 14, Node 16, Firefox, Chromium, Webkit.

6. Build is updated to produce both Node and web targets. The build config itself is fairly complex, but it's been consolidated in `vite.config.ts`. This also includes a corresponding change to build `app.ts` rather than the previous, much more complex `app.js` using a Vite dev server (and roughly restores its implementation to what it had been prior to the initial TypeScript migration).

7. There are several known issues at the point of this commit. These either correspond to XSLT extensions not supported by browser targets, or to differences in behavior between DOM/`libxmljs`, all of which will be addressed in separate commits discussing each in greater detail.
This addresses a slight difference between Node's and browsers' respective implementations of `URL`, namely that browsers do not implicitly escape special path characters for unknown schemes (i.e. `jr`). This change is a workaround, temporarily substituting the `jr` scheme to consistently take advantage of built-in escaping behavior, then restoring it for the final return value.
This addresses:

- an inconsistency between how `libxslt` and the native DOM `XSLTProcessor` handle XSLT transforms to HTML.`libxslt` (correctly, IMO) treats the XSLT document's expressed root element as the resulting document element, whereas browsers produce a document with an implicit `html > body > root` structure.

- a slight change in `renderMarkdown` which parses to a document rather than a fragment, for the purpose of a simpler DOM compatibiity API in this case. (That call to `correctHTMLDocHierarchy` was mistakenly included in an earlier commit, but I'm hoping this subsequent commit will be clear enough and we'll likely squash these commits anyhow.)
The functionality depending on them will be reimplemented with DOM APIs
It's possible there are other attribute cases I haven't identified, but these are the ones I was able to find in test failures, snapshot mismatches, and the spec.
These dynamic template calls are now injected into the XSL, allowing us to eliminate the use of `str:replace` and `dyn:evaluate` extensions.
This eliminates the need to use `str:tokenize` extension
Faster Firefox startup, improved logging
@eyelidlessness eyelidlessness force-pushed the refactor/dom-compat branch 4 times, most recently from a720aee to 739faa8 Compare February 22, 2023 20:46
Also bumps actions versions which have been warning about Node 12 for no apparent reason
- Add demo dev mode (very useful for manual testing during dev)
- Actually display errors
- Don't show Invalid state on errors

- The `openclinica` flag. This functionality is used by OpenClinica's fork of Enketo Express.

- The deprecated `preprocess` option. This functionality _may_ be used to update XForms with deprecated content, but its use is discouraged as users can achieve the same thing by preprocessing their XForms before calling `transform`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with deprecated content" -> "with preprocessed content"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom content? I don't think "deprecated" is right but I'm not entirely sure of the intent!

Copy link
Contributor

@lognaturel lognaturel Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to test that use preprocess and show an example of continuing to support deprecated form features. Remove the "what it's used for" part and just recommend doing any XForms preprocessing externally.

declare const document: DOM.Document | undefined;

/**
* This addresses a difference in how Firefox treats XML namespace declarations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enketo Core relies on namespaces being being copied onto the first child of the main instance.

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stating the obvious, there's a lot here.

@eyelidlessness has really carefully considered the process and I trust that the snapshot tests provide a lot of safety. I think the biggest risk is that the indirection adds complexity but I think it's separated well enough that it can be followed, especially with the brief narrative in the readme.

I've marked a couple of areas that I want to dig deeper into but I expect that this will be merged today.

src/transformer.ts Show resolved Hide resolved
</xsl:if>
</xsl:if>
</xsl:for-each>
<xsl:variable name="rows" select="./@rows" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully traced the rows logic. I'm counting on the test at

it('and outputs a textarea for rows="x" attribute on text input, with a rows appearance', async () => {
to be enough to support the refactor.

In general, this appearance logic feels like an area of risk because I assume there aren't a ton of test forms with appearances (I could be wrong). I read through it enough to convince myself it was likely correct but if we agree it's risky I can spend more time with it.

src/transformer.ts Show resolved Hide resolved
Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone through commit by commit looking for areas of risk. I've asked questions and looked for tests related to the biggest areas of risk. I'm satisfied that this is largely a safe change, especially given its scope!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants