How to Handle Breaking Changes in PML/PMLC #95

pml-lang (Maintainer) · 2022-11-21T05:25:27Z

Table of Contents

Introduction
People
Possible Approaches
Tool/Tool-Chain Maintainers
Breaking Changes in the CLI
Breaking Changes in the Public API
Practical Guidelines
Next Steps

Introduction

On one hand, breaking changes should be implemented whenever they improve PML.

On the other hand, breaking changes should be avoided because they can easily lead to update and maintenance nightmares.

The goal of this discussion is to agree on how breaking changes should be handled in PML/PMLC, and to establish guidelines to be applied in the future.

People

In the context of breaking changes it is useful to consider the following groups of people:

Existing users
New users
New tool/tool-chain developers
Tool/tool-chain maintainers
PMLC developers
PML/PMLC documentation writers/editors
Other people

Breaking changes are usually most troublesome for existing users and tool/tool-chain maintainers, because it can be time-consuming, error-prone, and frustrating to update PML documents, tools, and tool-chains.

Special consideration should be given to users (especially existing users) because:

They are more numerous than people in other groups.
They expect new PMLC versions to simply work with their existing documents.
They are often not tech-savvy and lack knowledge of the editor features and tools that help to mitigate the pain of updating their PML documents (e.g. they don't know how to use regexes).

Possible Approaches

Don't Introduce Breaking Changes

This is not an option in the case of PML, because it precludes progress and useful improvements.

Ensure Backwards Compatibility

This approach allows breaking changes to be introduced under the condition that old documents can still be used in new PMLC versions (without the need to update documents). That is, a new PMLC version still supports the old way of doing things.

At first this seems to be a good approach for existing users, because they can simply upgrade without having to adapt their documents.

The drawback of this approach is that complexity increases exponentially over time, because 'the old and new ways to do things' must be continuously supported in PMLC and third-party tools, and be well documented. There is a high risk of eventually ending up with software that is neither user-friendly (because new and existing users get confused) nor maintainable.

In the long term, this is not a viable approach for PML.

However, in some (rather exceptional) cases it might make sense to keep things backwards-compatible, for example if backwards-compatibility is easy to implement and many users would benefit from it.

Therefore, the decision to support backwards-compatibility should be taken case by case.

Introduce Breaking Changes Whenever They Improve PML

This approach is convenient for new users, new tool developers, PMLC developers, and PML/PMLC documentation writers, because complexity is kept at a minimum. There is no need to support old/new ways of doing things.

However, it makes life hard for existing users and tool maintainers, because they need to update their documents and tools each time a new major PML version is published.

Introduce Breaking Changes and Update Existing Documents Automatically

In this case, PMLC (or a dedicated tool) provides an update feature that converts old PML documents to make them compatible with new versions.

This can be done as follows:

Make a backup of the old PML document.
Read the old document into a PDML AST.
Transform the AST so that it complies with the new version (add, change, and delete nodes and attributes).
Write the new AST into a new PML document.

Note: In some cases it might be challenging (or even impossible) to reliably and perfectly convert PML documents. In such cases the update feature should inform the user about limitations, and provide useful instructions (maybe embedded in the new document) to guide the user.
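The transformation step and the note on limitations could be sketched in Java roughly as follows. This is a minimal illustration only: the AST classes, the node names (img, image, caption, blink), and the three conversion rules are all hypothetical, and the backup, parsing, and file-writing steps are left out (the tree is printed in a simplified bracket notation that is not exact PML syntax).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal AST node; a stand-in for whatever PDML AST classes PMLC uses.
final class AstNode {
    String name;   // node name, or null for a text leaf
    String text;   // text content, non-null only for text leaves
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<AstNode> children = new ArrayList<>();

    AstNode(String name) { this.name = name; }

    static AstNode textLeaf(String value) {
        AstNode leaf = new AstNode(null);
        leaf.text = value;
        return leaf;
    }

    // Writes the tree in a simplified bracket notation (not exact PML syntax).
    String toMarkup() {
        if (text != null) return text;
        StringBuilder sb = new StringBuilder("[").append(name);
        attributes.forEach((key, value) ->
                sb.append(' ').append(key).append("=\"").append(value).append('"'));
        for (AstNode child : children) sb.append(' ').append(child.toMarkup());
        return sb.append(']').toString();
    }
}

// Applies hypothetical "old version -> new version" rules to the AST in place
// and collects warnings for anything it cannot convert automatically.
final class DocumentUpdater {
    final List<String> warnings = new ArrayList<>();

    void update(AstNode node) {
        if (node.text != null) return; // text leaves need no conversion

        // Rule 1 (hypothetical): node "img" was renamed to "image".
        if ("img".equals(node.name)) node.name = "image";

        // Rule 2 (hypothetical): attribute "caption" of "image" became a child node.
        if ("image".equals(node.name)) {
            String caption = node.attributes.remove("caption");
            if (caption != null) {
                AstNode captionNode = new AstNode("caption");
                captionNode.children.add(AstNode.textLeaf(caption));
                node.children.add(0, captionNode);
            }
        }

        // Rule 3 (hypothetical): "blink" has no replacement; keep it and tell the user.
        if ("blink".equals(node.name)) {
            warnings.add("Node 'blink' is no longer supported and must be updated manually.");
        }

        for (AstNode child : node.children) update(child);
    }
}

class UpdaterDemo {
    public static void main(String[] args) {
        AstNode doc = new AstNode("doc");
        AstNode img = new AstNode("img");
        img.attributes.put("source", "cat.png");
        img.attributes.put("caption", "A cat");
        doc.children.add(img);
        doc.children.add(new AstNode("blink"));

        DocumentUpdater updater = new DocumentUpdater();
        updater.update(doc);
        // Prints: [doc [image source="cat.png" [caption A cat]] [blink]]
        System.out.println(doc.toMarkup());
        updater.warnings.forEach(w -> System.out.println("WARNING: " + w));
    }
}
```

Collecting warnings instead of failing lets the updater convert everything it can and then report the remaining manual work in one place, as the note suggests.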

Advantages of this approach:

PML/PMLC improves continuously over time.
Complexity is kept at a minimum - good for everybody.
The pain of breaking changes is reduced to a minimum (or even eliminated in some cases).

This is the best overall long-term approach for all groups of people.

Hence, this is the approach to apply in PML/PMLC.

Tool/Tool-Chain Maintainers

It is obviously not possible to provide a one-size-fits-all automatic update procedure for all existing third-party tools.

The best we can do is to provide structured PML meta-data that can be used by tool maintainers to simplify the update process. In some lucky cases it might even be possible to completely automate the tool update process. For example, an editor plugin that validates nodes and attributes could use these meta-data to auto-generate the code used to validate nodes and attributes. Hence, whenever nodes/attributes are added, deleted, or changed, the editor plugin can be updated quickly and reliably.

Note: A first step is done already. The PMLC command export_tags creates a JSON file containing PML meta-data about nodes and their attributes.
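A data-driven variant of the editor-plugin idea (reading the meta-data at run time instead of generating validation code) might look like the following Java sketch. It assumes the exported meta-data has already been loaded into a map from node names to allowed attribute names; the actual JSON layout written by export_tags is not reproduced here and may differ, and the node and attribute names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory form of the tag meta-data: node name -> allowed attribute names.
// The real JSON written by export_tags may be structured differently, and
// loading it (which needs a JSON library) is not shown here.
final class TagMetadata {
    private final Map<String, Set<String>> attributesByNode;

    TagMetadata(Map<String, Set<String>> attributesByNode) {
        this.attributesByNode = attributesByNode;
    }

    // Returns the problems found in one node occurrence; an empty list means it is valid.
    List<String> validate(String nodeName, Set<String> usedAttributes) {
        List<String> problems = new ArrayList<>();
        Set<String> allowed = attributesByNode.get(nodeName);
        if (allowed == null) {
            problems.add("Unknown node '" + nodeName + "'");
            return problems;
        }
        for (String attribute : usedAttributes) {
            if (!allowed.contains(attribute)) {
                problems.add("Unknown attribute '" + attribute + "' in node '" + nodeName + "'");
            }
        }
        return problems;
    }
}

class ValidatorDemo {
    public static void main(String[] args) {
        // Node and attribute names are illustrative only.
        TagMetadata metadata = new TagMetadata(Map.of(
                "image", Set.of("source", "width", "height"),
                "caption", Set.of()));
        System.out.println(metadata.validate("image", Set.of("source", "alt")));
        System.out.println(metadata.validate("img", Set.of()));
    }
}
```

With this design, updating the plugin for a new PML version would ideally mean replacing the meta-data file, without touching the validation code.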

Breaking Changes in the CLI

While a PML document updater helps to eliminate (or at least mitigate) the pain of updating PML documents manually, it does not help with breaking changes in the CLI.
The best we can do is to give clear and helpful instructions to users and tool/tool-chain maintainers.
Therefore, breaking changes in the CLI should be introduced carefully, because they require existing users to learn and apply new commands, and they require manual updates in tool chains (e.g. OS script files) and maybe in tools too.
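To illustrate what "clear and helpful instructions" could mean in practice, here is a hypothetical Java sketch of a CLI front end that recognizes a removed command and prints migration instructions instead of a generic error. The command names (compile, build, help) are made up for this example and are not PMLC's actual commands.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Sketch of a CLI front end that explains how to migrate instead of just
// reporting "unknown command". All command names here are hypothetical.
final class CommandChecker {
    private static final Set<String> CURRENT_COMMANDS = Set.of("build", "help");

    // Old command -> instructions shown to users and tool-chain maintainers.
    private static final Map<String, String> REMOVED_COMMANDS = Map.of(
            "compile", "Command 'compile' was renamed to 'build' in this major version. "
                    + "Update your scripts, e.g. replace 'compile doc.pml' with 'build doc.pml'.");

    // Returns an error message, or an empty Optional if the command is valid.
    static Optional<String> check(String command) {
        if (CURRENT_COMMANDS.contains(command)) return Optional.empty();
        String instructions = REMOVED_COMMANDS.get(command);
        if (instructions != null) return Optional.of(instructions);
        return Optional.of("Unknown command '" + command + "'. Run 'help' to list all commands.");
    }
}

class CliDemo {
    public static void main(String[] args) {
        String command = args.length > 0 ? args[0] : "compile";
        Optional<String> error = CommandChecker.check(command);
        if (error.isPresent()) {
            System.err.println(error.get());
            System.exit(1); // non-zero exit code so that scripts stop instead of silently continuing
        }
        System.out.println("Running command: " + command);
    }
}
```

Exiting with a non-zero status matters for tool chains: an OS script that still calls the old command fails visibly and shows the user exactly what to change.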

Breaking Changes in the Public API

Breaking changes in the public API (e.g. Java .jar files and OS library files) should be introduced carefully, because they require developers to update their source code manually.

However, it is often easy to keep backwards-compatibility as follows:

Keep the old functions in the new API and mark them as deprecated.
Change the implementation of the old functions to simply call the new functions.
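In Java, the two steps above could look like the sketch below. The classes and methods are hypothetical; the point is the pattern: the old signature stays, is marked as deprecated (annotation plus Javadoc @deprecated tag), and delegates to the new function.

```java
// Hypothetical API: the class and method names are illustrative, not PMLC's real API.
final class ConversionOptions {
    final boolean strict;

    ConversionOptions(boolean strict) { this.strict = strict; }

    static ConversionOptions defaults() { return new ConversionOptions(false); }
}

final class Converter {

    /**
     * New function: the breaking change is the additional {@code options} parameter.
     */
    static String convert(String source, ConversionOptions options) {
        // Placeholder standing in for the real conversion logic.
        return (options.strict ? "strict: " : "lenient: ") + source;
    }

    /**
     * Old function, kept so that existing client code still compiles.
     *
     * @deprecated use {@link #convert(String, ConversionOptions)} instead;
     *             planned for removal in the next major version.
     */
    @Deprecated
    static String convert(String source) {
        // Simply delegate to the new function, so only one implementation must be maintained.
        return convert(source, ConversionOptions.defaults());
    }
}

class ApiDemo {
    public static void main(String[] args) {
        System.out.println(Converter.convert("hello", new ConversionOptions(true)));
        System.out.println(Converter.convert("hello")); // still compiles; javac flags the deprecated API
    }
}
```

Existing client code keeps compiling and behaving as before, while the compiler's deprecation notice nudges developers toward the new function before the old one is removed in a major version.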

Practical Guidelines

This chapter contains a summary of suggestions to handle breaking changes in the future.

Basic Principles

Breaking changes should be avoided if possible, but they should also be accepted and embraced whenever they are necessary to improve PML/PMLC.
We should provide useful tools and instructions to minimize the maintenance pain for existing users and tool/tool-chain maintainers.
Breaking changes are only allowed in major versions, as required by Semantic Versioning.
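The last principle can be expressed as a simple check, for example in a tool that decides whether a PMLC upgrade can be done without planning a migration. This is a minimal sketch assuming plain MAJOR.MINOR.PATCH version strings; pre-release and build suffixes, which Semantic Versioning also defines, are not handled.

```java
// Minimal sketch of the Semantic Versioning rule that breaking changes require
// a new MAJOR version. Pre-release and build suffixes (e.g. "-beta.1") are not handled.
final class Version {
    final int major;
    final int minor;
    final int patch;

    Version(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    static Version parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got: " + text);
        }
        return new Version(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    // True if upgrading from this version to the target may involve breaking changes.
    // SemVer treats 0.y.z as initial development, where anything may change.
    boolean mayBreakWhenUpgradingTo(Version target) {
        return major == 0 || target.major == 0 || target.major != major;
    }
}

class VersionDemo {
    public static void main(String[] args) {
        Version installed = Version.parse("3.1.0");
        System.out.println(installed.mayBreakWhenUpgradingTo(Version.parse("3.2.5"))); // false
        System.out.println(installed.mayBreakWhenUpgradingTo(Version.parse("4.0.0"))); // true
    }
}
```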

Procedure

Before implementing breaking changes, they must be announced so that people are aware of them, can ask questions, make suggestions, and discuss them in a democratic way.
In each case we should consider the option of keeping PML backwards-compatible. However, to minimize long-term complexity, backwards-compatibility should be applied only if there are good reasons to do so.
Breaking changes must be well documented in the Changelog and helpful upgrade instructions must be provided for end-users and tool maintainers.
Breaking changes should not be postponed, unless there is a good reason to do so. After being approved, they should be implemented as soon as possible, and published in the next major version, in order to minimize the growing number of people affected (especially existing users and tool maintainers).

Support for Users and Maintainers

The PMLC download page should warn users about breaking changes in new major versions.
Features that will be removed in an upcoming major version should be marked as deprecated, and PMLC should emit a warning if a deprecated feature is used in a PML document.
There should be a PML document updater to convert old PML documents into new ones, so that they can be used in new major versions (preferably without the need for manual updates). This feature could be integrated in PMLC or provided as an external tool.
PMLC provides a CLI command to export structured PML meta-data that helps to create and maintain third-party tools.
Breaking changes in the CLI should be avoided, because these changes require manual or semi-automatic updates. However, if there are good reasons to implement them, then helpful update instructions must be provided for users and maintainers.
If breaking changes appear in the API, then backwards-compatibility should be ensured by keeping the old functions and marking them as deprecated.
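The deprecation warning for PML documents could work roughly as follows. In this hypothetical Java sketch, the compiler has already collected the nodes used in a document together with their line numbers, and a table maps deprecated node names to their replacements; neither the class names nor the node names come from PMLC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical occurrence of a node in a parsed document, with its source line.
final class NodeUse {
    final String name;
    final int line;

    NodeUse(String name, int line) {
        this.name = name;
        this.line = line;
    }
}

// Emits warnings for deprecated features. The deprecated node names and their
// replacements are hypothetical examples.
final class DeprecationChecker {
    private final Map<String, String> replacementByDeprecatedNode;

    DeprecationChecker(Map<String, String> replacementByDeprecatedNode) {
        this.replacementByDeprecatedNode = replacementByDeprecatedNode;
    }

    List<String> check(List<NodeUse> uses) {
        List<String> warnings = new ArrayList<>();
        for (NodeUse use : uses) {
            String replacement = replacementByDeprecatedNode.get(use.name);
            if (replacement != null) {
                warnings.add("Line " + use.line + ": node '" + use.name
                        + "' is deprecated and will be removed in the next major version; use '"
                        + replacement + "' instead.");
            }
        }
        return warnings;
    }
}

class DeprecationDemo {
    public static void main(String[] args) {
        DeprecationChecker checker = new DeprecationChecker(Map.of("img", "image"));
        List<NodeUse> uses = List.of(new NodeUse("doc", 1), new NodeUse("img", 3));
        checker.check(uses).forEach(w -> System.err.println("WARNING: " + w));
    }
}
```

The document is still processed as usual; the warnings only tell users, ahead of the next major version, which parts will need updating and what to use instead.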

Next Steps

After discussing and improving the above suggestions, we should:

Publish a WIP manual titled "PML/PMLC Development Guidelines".
Add chapter "How to Handle Breaking Changes" and include practical instructions derived from this discussion.

Feedback and other ideas are very welcome.

tajmone · 2022-11-22T12:07:55Z

SemVer vs PML Project(s)

Breaking Changes in the Public API

> Breaking changes in the public API (eg. Java .jar files and OS library files) should be introduced carefully, because they require developers to update their source code manually.

When it comes to the API, SemVer 2.0 is very strict about breaking changes only being allowed on MAJOR version bumps.

The problem with SemVer is that it was designed for APIs, and when it is applied to other types of products things get blurry. E.g. in PML, what should the versioning scheme apply to? The Java API, the PML syntax, or the PMLC interface?

Trying to apply the scheme to all of those things doesn't really work well, so in theory the best solution would be to have a separate versioning scheme for each; in that case, the SemVer version of the repository would apply to PMLC, since that's the product of the repository.

In my projects, I've often faced this dilemma, especially with projects that involve multiple dependencies. E.g. if a project depends on Lua for scripting, it's not easy to determine whether updating the Lua DLL justifies a change in the project's SemVer version. The same goes for dependencies like templates, external tools, etc., all of which could potentially break a project, since end users will need to update the dependencies to their new versions (which might indeed introduce breaking changes).

But since PMLC has a Java API, I would say that the API seems like the strongest candidate for SemVer version tracking, and that the rule that no breaking changes are ever allowed within the same MAJOR version should apply to it.

Trying to use a single version scheme as an umbrella for PMLC, the PML syntax and the PML API is challenging, and we might need to address this. After all, PML is also a standard in its own right (like Markdown and AsciiDoc), so it might make more sense for it to have its own SemVer, since other implementations would be allowed. A syntax is not an API, and although SemVer is used for all types of projects, we might need to give some thought to whether it really fits the versioning needs of PML, and consider coming up with our own dedicated scheme, tailored to how PML evolves.

On Breaking Changes

The important question is: what's the impact of breaking changes on users?

Let's focus on this, distinguishing between the different user categories when needed.

Regardless of whether a user is a hobbyist blogger, a part-time fiction writer, a project documentation maintainer, or a digital publishing company, breaking changes are always a pain, since they require adapting old document sources, build scripts, etc. Of course, along with the pain also come the benefits of new features and other improvements.

Consider the worst-case scenarios: those of a project documentation maintainer and of a digital publisher. Here we're dealing with multiple projects, all of considerable size, so a change in the syntax (e.g. an attribute becoming a node in its own right) would require fixing all the PML sources AND checking that the output is again as expected, which basically means carefully reading through all the documentation and books, especially when there are multiple syntax changes or changes affecting commonly used elements.

Documentation projects often share common assets, e.g. appendices or chapters that apply to more than one tool (such as a RegEx primer), which means that a breaking change in the syntax requires updating all projects at once, not just those currently being worked on. Since the purpose of shared assets is to avoid duplication, it would make little sense to keep one project stuck with an older PML version and another one on the latest, since this would require branching off the shared assets.

For a publisher, the nightmare would be having different publication projects (books, manuals, etc.) each using a different PML version, because they might focus on updating only those publications which are currently being worked on. Publishers tend to go back and revisit a work whenever there's a need for a new edition (printed or digital), usually an updated one, so the risk is having all these projects out of sync in terms of PML version.

This is why breaking changes within the same MAJOR version are seen as really bad in the publishing world, and why the usual expectation is that breaking changes only occur in new MAJOR versions, so that the migration from one version to the next can be carefully planned in terms of time and staff requirements. For a publisher, the transition would freeze all current editing work until completed, so it's a very delicate step. If hundreds of books are at stake, it would most likely be carried out during the Summer holidays, to ensure that by the time the new working year starts all titles have been migrated to the new version and thoroughly checked, and the old versions safely backed up and stored away (to avoid cross-contamination of sources), so that all editors will be working with the new standard henceforth.

Migrating a whole book AND checking it can be a very long process, since editors can't afford to let mistakes slip by. For a single author dealing only with his/her own books, we're talking about hundreds of hours; for a digital publisher with hundreds of books, we're most likely talking about a month of work, depending on how many staff members are dedicated to the task (which, BTW, is not your usual editing job).

This is why publishers simply don't adopt tools that don't offer strong and explicit guarantees of no breaking changes within the same MAJOR version: they simply can't afford unplanned migrations. It's also the reason why publishing-grade standards have always abided by this rule.

What I'm saying here is that when you propose tolerating some breaking changes within the syntax, you need to bear in mind that this will come at the cost of losing publishers as end users, and most likely also big documentation maintainers, because those who work daily with large document projects have a zero-tolerance policy for breaking changes. So, it's really about whether that's a price you're willing to pay in order to uphold your development strategy for PML (which of course is fine: your project, your vision).

I'm the first one to abide by these rules, and I never used markup syntaxes in my projects unless they guaranteed no breaking changes within a MAJOR version (like AsciiDoc). Even when I relied on Markdown (long before it was standardized) I would only use pandoc Markdown, since pandoc honors the no-breaking-changes contract. Having gone through the pains of MAJOR version migrations of multiple projects, I have a realistic idea of how stressful and time-consuming it is, and I simply couldn't handle working with a tool that imposes breaking changes along the way.

The proposal of announcing breaking changes with an "ample" margin of time is not really a solution, except that it allows end users to simply not update PMLC until they are ready for migration, which even for non-profit FOSS maintainers could mean "next Summer", since they need lots of time to migrate and check everything. And until then, what? No updates at all? Not even bug and security patches? A breaking change within the same MAJOR cycle means that some users won't be able to update beyond the breaking version, not even for important patches. Far from ideal.

Tools that abide by the no-breaking-changes rule are able to release important patches even for old MAJOR versions that are no longer actively maintained. Security should be a priority, so users shouldn't be cornered into a situation where they must accept a breaking change to get a security patch.

So, there aren't really any shortcuts or workarounds when it comes to breaking-changes, at least not in my experience.

As for breaking changes in the PMLC CLI or its API, that's another thing altogether; here I'm speaking of the PML syntax.

Breaking changes to the PMLC CLI probably have a lesser impact on big projects, since these usually rely on shared automation tools, so probably it boils down to changing some script or library, depending on the toolchain in question.

Then there's the impact of breaking changes on third-party tool developers, which also needs to be considered, and which I think is very important, since chances are that most PML tool developers will be FOSS volunteers working for free in their spare time (at least for a number of years to come).

For people working on templates, themes, scripts, editor packages, syntax highlighters, etc., any change (breaking or not) in PML and PMLC is likely to require fixing and updating work. If these changes are too frequent it might become stressful to keep up, but at least non-breaking changes can be postponed, since their assets and tools will probably still work to some extent, whereas breaking changes will break their projects too.

Bear in mind that editor packages are usually updated automatically whenever there's a new package version, so a user who doesn't want to update to the latest PMLC version due to breaking changes might find it hard to prevent their editor from updating the package. Packages also follow the MAJOR version rule: a new MAJOR version means a new package, so users can work with both versions by installing separate packages (or just the version they need).

Dual Version Approach

As I mentioned elsewhere, IMO the best solution to the breaking changes problem would be to keep two versions of PML/PMLC at all times:

Current stable — no breaking changes.
Upcoming beta — beta preview of next MAJOR version (breaking changes allowed).

New breaking features and changes should be implemented only in the upcoming beta, which allows beta testers to try them out and, if there are second thoughts, to apply changes or roll them back. Since no change is definitive until the release is official, breaking changes during the beta stage are allowed.

People in a rush to enjoy a certain feature could embrace the Beta version, knowing that it's not entirely tested, and subject to breaking changes.

Having a dual standard would allow a more thoughtful approach to features implementation, since instead of them being based just on discussions they can actually be tried out and tweaked accordingly, before becoming officially endorsed (these would be called "experimental features" within the Beta, use at your own risk).

IMO it would also lead to more disciplined ROADMAP planning (which PML really needs right now), but also a more relaxed one, without having to constantly face the drastic choice between giving up a good feature and accepting breaking changes.

Most tools eventually find their own development rhythm, so PML will ultimately find its own balance regarding when a stable cycle ends and the next one begins, which translates into the latest features becoming stable and projects being transitioned and migrated.

Ideally, PMLC (or a bundled tool) should be able to transpile sources from the latest PML version to the next MAJOR. That would already be a huge help, even if such a tool were designed only to deal with the first MAJOR.0.0 version and never updated to include newer nodes and features, as long as it ensured that previous documents are usable again. That should be easier to deal with than multiple breaking changes within the same cycle, since the threshold between the last release of the previous version and the first release of the new one is clearly defined by two specific release versions.

In any case, whether you're willing to adopt the dual standard or not is an essential point for further discussing this topic, since all definitions and solutions largely depend on this.

The dual-version approach is commonly seen when a tool is approaching its next MAJOR release, as a means to involve users in beta-testing it and give them a chance to raise objections and request last-minute changes before features are finalized, but also to prepare users for the transition and the migration of their projects.

From what I've seen in PML's life cycle so far, I think it would make sense for PML to adopt this dual approach at all times, not just when a MAJOR update is approaching. This is because PML is still growing and has to face a great deal of breaking changes before it reaches full maturity and becomes a publishing-grade standard, so it might make sense to have some fast MAJOR bump cycles right now (as opposed to other syntaxes, which bump their MAJOR version only every so many years).

pdml-lang Nov 30, 2022

> Trying to use a version scheme as an umbrella for both PMLC, the PML syntax and the PML API is challenging, and we might need to address this. After all, PML is also a standard in its own right (like Markdown and AsciiDoc), so it might make more sense that it has its own SemVer

I agree. I suggest keeping it simple for now and continuing with a single version scheme until it becomes obvious that this no longer works well. Maybe later we'll need separate version schemes for (1) PML (the specification) and (2) PMLC (our implementation in Java).

> a change in the syntax (e.g. an attribute becoming a node in its own right) would require fixing all the PML sources, AND having to check that the output is again as expected — which basically means carefully reading through the entire documentation and books

This is not necessary if we provide a PML Documents Updater, as suggested in my initial post.

> we're talking of hundreds of hours; but for a digital publisher with hundreds of books we're most likely talking of a month work, depending on how many staff members will be dedicated to the task

A digital publisher of that size (with hundreds of books) should IMO have technical staff (in-house or outsourced) able to automate the whole update process as far as possible, to minimize (or even eliminate) manual updates which are always error-prone, time-consuming and frustrating. Keep in mind also that PML/PDML is much easier to parse than other markup syntaxes that use whitespace to define structure. It is therefore easier to create very customized tools integrated in complex project workflows. PDML (and soon PML) documents can also be converted to XML and transformed with existing XML-tools, which might be useful in some cases.
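As a rough illustration of the XML route, the JDK's built-in XSLT support (javax.xml.transform) can apply a stylesheet that copies a document unchanged except for one rule. This sketch assumes the document has already been converted to XML, and the element and attribute names (image, caption) are hypothetical; the rule turns an attribute into a child node, the kind of syntax change discussed in this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlUpdater {
    // Identity transform (copies everything unchanged) plus one rule: the
    // hypothetical 'caption' attribute of 'image' becomes a child element.
    private static final String STYLESHEET = String.join("\n",
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
            "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
            "  <xsl:template match=\"@*|node()\">",
            "    <xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>",
            "  </xsl:template>",
            "  <xsl:template match=\"image[@caption]\">",
            "    <xsl:copy>",
            "      <xsl:apply-templates select=\"@*[name() != 'caption']\"/>",
            "      <caption><xsl:value-of select=\"@caption\"/></caption>",
            "      <xsl:apply-templates select=\"node()\"/>",
            "    </xsl:copy>",
            "  </xsl:template>",
            "</xsl:stylesheet>");

    static String update(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter result = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(result));
        return result.toString();
    }
}

class XmlUpdaterDemo {
    public static void main(String[] args) throws TransformerException {
        String input = "<doc><image source=\"cat.png\" caption=\"A cat\"/></doc>";
        System.out.println(XmlUpdater.update(input));
    }
}
```

The more specific template (image[@caption]) takes priority over the generic identity template, so only matching elements are rewritten and everything else is copied as-is.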

> This is why breaking changes within a same MAJOR version are seen as really bad in the publishing world, and why usually the expectation is that any breaking changes would only occur in new MAJOR updates, so that migration from one version to the next can be carefully planned for

As stated already, breaking changes in PML will only occur in major versions. The last non-major PML version that had breaking changes was version 1.4.0 (2021-04-16).

> Having a dual standard would allow a more thoughtful approach to features implementation, since instead of them being based just on discussions they can actually be tried out and tweaked accordingly, before becoming officially endorsed (these would be called "experimental features" within the Beta, use at your own risk).

Good idea! There should definitely be two versions: (1) current stable and (2) upcoming beta. We already use two branches (main and develop), but in the future we should also publish beta distributions for users who want to explore and send feedback about features under development.

> Ideally PMLC (or a bundled tool) should be able to transpile sources from the latest PML version to the next MAJOR, it would be already a huge help

The aforementioned PML Documents Updater will be able to do that.

> PML is still growing and has to face a great deal of braking changes before it reaches full maturity and become a publishing grade standard, so it might make sense to have some fast MAJOR bump cycles right now

True!

We should do our best to make breaking changes as painless as possible for users and maintainers of third-party software (breaking changes only allowed in major versions, an automatic PDML document updater, meta-data for software maintainers, etc.).

To avoid frequent breaking changes, all planned breaking changes will be included in the next major version 4.0.0.
