-
Notifications
You must be signed in to change notification settings - Fork 114
RFC: How to prepare for a future merge of nokogumbo into the nokogiri gem #170
Comments
One idea is to have the next release of Nokogumbo check to see if Nokogiri has already defined these modules, methods, and constants, and to avoid loading itself. We could do this with a Nokogiri version check ( Another idea is for the next release of Nokogumbo require I'd love feedback on these ideas, and to hear other folks' ideas. |
I think having an empty gem as a final nokogumbo release is a great idea along with some release notes that it's a purely transitional release. For the penultimate release, I don't have an informed opinion on whether it is better to check if things are defined or do a version check. In many cases, functionality checks are better than version checks (I'm thinking of things like feature tests in compilers), but I'm not sure that applies here. Requiring nokogiri < 1.12 might be sufficient. Does Nokogiri follow semantic versioning? It might be best for the combined Nokogiri to have version 2. The situation I'm worried about is existing versions of Nokogumbo being used with Nokogiri that already defines the symbols. I believe that current Nokogumbo requires Nokogiri to be less than 2.0 s.add_runtime_dependency 'nokogiri', '~> 1.8', '>= 1.8.4' thus this situation would be avoided. Separately from this, have you thought about importing the tests? I wrote a lot of tests for gumbo, but there are also a lot of ruby tests, including tests for the html tree-construction. I had a bunch of pull requests for those tests fixing the number of errors that should be emitted, but they never got merged. (See this in particular.) So I've just been using my own fork of the tests which isn't ideal. |
|
Thanks for the thoughtful responses. @stevecheckoway I agree that I like having a (basically) empty gem at the end of Nokogumbo's lifecycle. @rubys I also agree that using For the purposes of this analysis, let's assume
If we use the
If we use the "requirements constraints" strategy starting in Nokogumbo next:
If we use both strategies, though, we could still end up in the desired final state regardless of the upgrade path followed by users. Imagine "next" using the
I think I like this last option best, because it gives us flexibility: the change between Nokogumbo "current" and "next" is small, and we can roll out the final release at our leisure. WDYT?
I don't have a strong opinion, but if we're depending on the tests and upstream isn't responsive, then we may as well keep them in-repo. If upstream merges and/or becomes more active, we can always periodically update the local files from upstream. WDYT? |
I may need an ELI5 :-) Here is an example application that currently uses nokogumbo: https://github.com/apache/whimsy/blob/5273b3067f939a98f7e40805e5fcf1b7b5c6e020/www/board/agenda/Gemfile#L20 If Nokogumbo were changed to add an Once Nokogiri ships At some future date, perhaps a warning could be added to Nokogumbo indicating that this gem is now unnecessary and obsolete, recommending that Gemfiles be changed to reference nokogiri instead. |
@rubys Sorry for using more words than necessary, writing helps me think but has the unfortunate side effect of being a wall of words. We're in complete agreement, that's exactly what I'm suggesting. |
Closes #170 A future version of Nokogiri will provide Nokogumbo's API (see sparklemotion/nokogiri#2204). This change will allow Nokogumbo to detect whether Nokogiri provides the HTML5 API and become a "shim" -- gracefully defer to Nokogiri by refusing to load itself. Some contractual assumptions I'm making about Nokogiri: - Nokogiri will faithfully reproduce the `::Nokogiri::HTML5` singleton method, module, and namespace (including classes `Nokogiri::HTML5::Node`, `Nokogiri::HTML5::Document`, and `Nokogiri::HTML5::DocumentFragment`) - Nokogiri will not provide a `::Nokogumbo` module/namespace, but will provide a similar `::Nokogiri::Gumbo` module which will provide the same constants and singleton methods as `::Nokogumbo`: - `Nokogumbo.parse()` will be provided as `Nokogiri::Gumbo.parse()` - `Nokogumbo.fragment()` → `Nokogiri::Gumbo.fragment()` - `Nokogumbo::DEFAULT_MAX_ATTRIBUTES` → `Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES` - `Nokogumbo::DEFAULT_MAX_ERRORS` → `Nokogiri::Gumbo::DEFAULT_MAX_ERRORS` - `Nokogumbo::DEFAULT_MAX_TREE_DEPTH` → `Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH` This change checks for the existence of `Nokogiri::HTML5`, `Nokogiri::Gumbo`, and an expected singleton method on each. We could do a more- or less-thorough check here. This change also provides an "escape hatch" using an environment variable `NOKOGUMBO_IGNORE_NOKOGIRI_HTML5` which can be set to avoid the "shim" behavior. This escape hatch might be unnecessary, but this change is invasive enough to make me want to be cautious. In "shim" mode, `Nokogumbo.parse()` and `.fragment()` will be forwarded to the Nokogiri implementation. The `Nokogumbo::DEFAULT*` constants will always be defined, but when in "shim" mode will be set to the `Nokogiri`-provided values. Nokogumbo will emit a single warning message at `require`-time when it is in "shim" mode. This message points users to sparklemotion/nokogiri#2205 which will explain what's going on and help people migrate their applications (but is an empty placeholder right now). I did not include deprecation warning messages in `Nokogumbo.parse` and `.fragment`. If you feel strongly that we should, let me know.
Closes #170 A future version of Nokogiri will provide Nokogumbo's API (see sparklemotion/nokogiri#2204). This change will allow Nokogumbo to detect whether Nokogiri provides the HTML5 API and whether to use Nokogiri's implementation or Nokogumbo's implementation. Some contractual assumptions I'm making about Nokogiri: - Nokogiri will faithfully reproduce the `::Nokogiri::HTML5` singleton method, module, and namespace (including classes `Nokogiri::HTML5::Node`, `Nokogiri::HTML5::Document`, and `Nokogiri::HTML5::DocumentFragment`) - Nokogiri will not provide a `::Nokogumbo` module/namespace, but will provide a similar `::Nokogiri::Gumbo` module which will provide the same public API as `::Nokogumbo`. This change checks for the existence of `Nokogiri::HTML5`, `Nokogiri::Gumbo`, and an expected singleton method on each. We could do a more- or less-thorough check here. This change also provides an "escape hatch" using an environment variable `NOKOGUMBO_IGNORE_NOKOGIRI_HTML5` which can be set to force Nokogumbo to use its own implementation. This escape hatch might be unnecessary, but this change is invasive enough to make me want to be cautious. Nokogumbo will emit a single warning message at `require`-time when it is uses Nokogiri's implementation. This message points users to sparklemotion/nokogiri#2205 which will explain what's going on and help people migrate their applications (but is an empty placeholder right now).
ci: nokogumbo test suite --- **What problem is this PR intended to solve?** Merge the remaining Nokogumbo tests into Nokogiri's test suite. - [x] submodule @stevecheckoway's fork of html5lib-tests (see see rubys/nokogumbo#170 (comment)) - [x] make sure these tests only run under CRuby - [x] gumbo tests **Have you included adequate test coverage?** Tests are the entire substance of this PR. **Does this change affect the behavior of either the C or the Java implementations?** Notably, these additional tests should only run under CRuby.
As discussed elsewhere, Nokogumbo and Nokogiri will be merging. The net result of this will be a single gem that provides both Nokogiri::HTML (nokogiri 1.11 and earlier) and Nokogiri::HTML5 (nokogumbo) in the same gem.
My current plan is simply to lift-and-shift the Nokogumbo code into the Nokogiri repository and gem in Nokogiri v1.12. Unless we do something, this will mean that users of Nokogumbo will see warning messages like this when they upgrade Nokogiri:
These warnings aren't just annoying; they tell us that Nokogumbo's implementation is clobbering the Nokogiri implementation, meaning that until the issue is resolved, updates to Nokogiri's code won't have any effect.
The text was updated successfully, but these errors were encountered: