PML 2.0 and Attributes Lenient Parsing #56

tajmone · 2021-09-03T08:44:10Z

@pml-lang, I was looking into the syntax changes of PML v2.0, and how the attributes notation has changed.

Although the new syntax is way cooler on the end user's side, all the new optional features of the notation, and especially the new Lenient Parsing conventions, are going to make the creation of PML syntaxes for editors much harder.

Without a proper parser and context awareness, it's going to be very hard — if not impossible — to handle all these notation variants. With some luck (and a lot of hard work) it should be possible to cover most cases in Sublime Text 4, thanks to the new syntax branching features which can roll back a parsing context before enforcing it, but I think that there's simply no way that we could implement a PML 2.0 syntax in VSCode (or any TextMate-based grammar) at this point.

I think you really need to look into creating an official PML Language Server right now, so that PML syntax support can be achieved at least by editors that support LSP. Ideally, the PML language server should be part of the PML package itself, so that whoever has installed the PML converter will also have the matching language server on the machine, which would spare end users from having to install and update them separately.

I'll try to update the Sublime PML syntax to PML 2.0, which is going to take quite some time due to having to update all the syntax tests and completions, along with updating the syntax itself. But I'm still not 100% sure that I'll ultimately be able to fully cover the new syntax.

For example, new leniency rules like:

Some nodes have a default attribute. In that case, the attribute's name doesn't need to be specified, but only if it's the first attribute in the list of attribute assignments.

[...]

If a node in a PML document has no attributes, it is not necessary to explicitly state the absence of attributes by writing (). Hence, the following code:

[...] However, if the node's text starts with (, then () is required.

are really hard to cover with any RegEx-based syntax definition. So far, the ST syntax was defining attribute types, which would be handled differently, and were being captured thanks to the attribute tag. But now that some of these tags become optional, it's going to be very hard to handle them via guesswork: in syntax definitions we have no variables, and branching conditions are painfully emulated via context switches, which are dumb in terms of context awareness.
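To make the dependency on context concrete, here is a minimal Python sketch of what a parser has to decide right after a tag name, following only the two rules quoted above and handling parenthesized attribute lists only. The DEFAULT_ATTRIBUTE table, the ATTRIBUTE pattern and split_node are hypothetical names for illustration, not PMLC's implementation.

```python
import re

# Hypothetical table of default attributes (PML 2.0 exposed this kind of
# information as default_attribute_id in the JSON tags file).
DEFAULT_ATTRIBUTE = {"image": "source"}

# One attribute: an optional "name =" followed by a quoted or bare value.
ATTRIBUTE = re.compile(r'\s*(?:([A-Za-z_][\w.-]*)\s*=\s*)?("[^"]*"|[^\s")]+)')


def split_node(tag, rest):
    """Split the text after a tag name into (attributes, content).

    A leading "(" always opens the attribute list, which is why a node whose
    text starts with "(" needs an explicit "()". Inside the list, the first
    value may omit its name if the tag has a default attribute.
    Simplification: a ")" inside a quoted value is not supported.
    """
    stripped = rest.lstrip()
    if not stripped.startswith("("):
        return {}, stripped                  # no "()": everything is content
    close = stripped.find(")")
    if close < 0:
        raise ValueError("missing ')' after attributes")
    inner, content = stripped[1:close], stripped[close + 1:].lstrip()
    attrs, pos = {}, 0
    while inner[pos:].strip():
        m = ATTRIBUTE.match(inner, pos)
        if m is None:
            raise ValueError(f"cannot parse attributes at {inner[pos:]!r}")
        name, value = m.group(1), m.group(2).strip('"')
        if name is None:                     # value without a name
            if attrs or tag not in DEFAULT_ATTRIBUTE:
                raise ValueError(f"value {value!r} needs an attribute name")
            name = DEFAULT_ATTRIBUTE[tag]
        attrs[name] = value
        pos = m.end()
    return attrs, content
```

For example, split_node("i", " (note) text") raises, because "(note)" is read as an attribute list, so the author has to write [i ()(note) text]. The default-attribute decision also depends on a per-tag table, which is exactly what a line-oriented RegEx grammar lacks.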

Add to that the fact that editor syntaxes need to account for malformed markup too (and catch it as an invalid case), and things get even more complicated.

Hence I think that resorting to LSP is the only viable option at this point.

Did you manage to look into a PML Lang Server?

How hard would it be to integrate the Lang Server into the PML project and package, so that it can automatically mirror the latest PML version and automatically ship with every package?

pml-lang · 2021-09-03T10:49:26Z

Maintainer

especially the new Lenient Parsing conventions, are going to make the creation of PML syntaxes for editors much harder.

Lenient parsing was more lenient (and more difficult to parse) prior to version 2. For example, instead of:

[ch ( title = "Final Thoughts" id = thoughts )

... you could instead write this in the previous version 1.5.0:

[ch Final Thoughts id = thoughts

... and the parser would use some over-complex regex to figure out that "Final Thoughts" is the value for attribute title.
However, lenient parsing in version 1.5.0 did not work reliably in some corner cases.
Moreover, lenient parsing was not documented before version 2, so it might seem that the rules are more complicated now.
I actually simplified the rules in version 2.0.0, to keep the syntax reasonably concise, easier to parse, reliable and documented. But still ...
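Why such a heuristic breaks down can be shown with a hypothetical reconstruction in Python (not the actual 1.5.0 code): everything before the first "name =" pair is taken as the default attribute's value. guess_v1_attributes and the NAMED pattern are invented for this sketch.

```python
import re

# A name=value pair; values are either quoted or a run of non-space chars.
NAMED = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*("[^"]*"|\S+)')


def guess_v1_attributes(default_attribute, text):
    """Hypothetical v1.5-style heuristic (not the real PML 1.5.0 code).

    Everything before the first "name =" is taken as the value of the
    default attribute; the rest is read as name=value pairs.
    """
    first = NAMED.search(text)
    head = text[: first.start()] if first else text
    attrs = {default_attribute: head.strip()} if head.strip() else {}
    if first:
        for m in NAMED.finditer(text, first.start()):
            attrs[m.group(1)] = m.group(2).strip('"')
    return attrs
```

It handles [ch Final Thoughts id = thoughts, but a title such as "Why x = 1 matters" is split into the title "Why" plus a spurious attribute x, which is the kind of corner case that made the old rules unreliable.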

Without a proper parser and context awareness, it's going to be very hard — if not impossible — to handle all these notation variants.

Yes, I fully agree.
My suggestion would be to start with a plugin that doesn't support lenient parsing, and clearly state that in the project's readme, so that users are aware of it. Or, alternatively, just support the first two lenient parsing rules (and ignore the last two rules mentioned in your comment). I think that such a simplified version would still be very, very useful for end users.

I think you really need to look into creating an official PML Language Server right now

That would certainly be the ideal solution. Even more in future PML versions with extensions that power-users will love, but that will be even more difficult to support in editor plugins. For example: user-defined nodes, and embedded source code that generates PML markup.

How hard would it be to integrate the Lang Server into the PML project and package, so that it can automatically mirror the latest PML version and automatically ship with every package?

Very hard and time-consuming, I guess. At least for me, because I've no practical experience with language server implementations. However, as the new parser is no longer written in PPL, but entirely in Java (including lenient parsing, and all text processing rules and nodes such as !ins-file, !set, and !get), a language server could be written by any experienced Java programmer, using the pXML parser.

Add to that the fact that editor syntaxes need to account for malformed markup too (and catch it as an invalid case), things get even more complicated.

In my opinion, editor plugins should not try to implement error-tolerant parsing, because it's just too hard (and in some contexts even impossible) to make it work correctly in all cases. They should just stop parsing at the first error encountered.

The new PML parser distinguishes between 'cancelling errors' and 'non-cancelling errors'. In case of a 'cancelling error' (e.g. the final quote of a quoted attribute value is missing), parsing is simply cancelled, because IMO it's impossible to reliably guess how to continue.

Take IntelliJ IDEA, for example. Support for Java is really awesome in this IDE. However, when it comes to fault-tolerant parsing, it often fails miserably, and displays a whole avalanche of false positives that are just disturbing. Sometimes, even correct code that precedes the error is displayed as illegal. I would much prefer to have the first error displayed in red, and all subsequent code just displayed in grey.

In a nutshell, for editor plugins I suggest keeping it simple, and:

- not supporting (or only partially supporting) lenient parsing
- stopping parsing at the first error encountered
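A stop-at-first-error scanner is simple to write. This Python sketch assumes a heavily simplified PML token set (the RULES list is an assumption, not PML's grammar) and treats any position that no rule accepts, such as an unterminated quoted value, as a cancelling error.

```python
import re

# Token rules tried in order at each position (a simplified PML lexicon).
RULES = [
    ("tag", re.compile(r'\[!?[A-Za-z_][\w.-]*')),
    ("close", re.compile(r'\]')),
    ("string", re.compile(r'"[^"]*"')),
    ("punct", re.compile(r'[()=]')),
    ("space", re.compile(r'\s+')),
    ("text", re.compile(r'[^\s\[\]()="]+')),
]


def scan_until_first_error(source):
    """Return (tokens, error_offset), where error_offset is None if no error.

    Scanning is cancelled at the first position no rule accepts (e.g. an
    unterminated quoted value), so an editor could show everything from
    error_offset onwards as unparsed instead of guessing how to continue.
    """
    tokens, pos = [], 0
    while pos < len(source):
        for kind, rule in RULES:
            m = rule.match(source, pos)
            if m:
                if kind != "space":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:                       # no rule matched: cancel here
            return tokens, pos
    return tokens, None
```

For '[image (source="a.png)]' it returns the four tokens before the opening quote together with the offset 15; an editor would colour the remainder grey rather than reinterpret it.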

As you pointed out already, ideal editor support can only be achieved with a dedicated language server that uses the pXML parser.


tajmone Sep 3, 2021
Author

My suggestion would be to start with a plugin that doesn't support lenient parsing, and clearly state that in the project's readme, so that users are aware of it. Or, alternatively, just support the first two lenient parsing rules (and ignore the last two rules mentioned in your comment). I think that such a simplified version would still be very, very useful for end users.

I don't think that would work out nicely. It's not really a matter of syntax highlighting, it's about the editor being aware of semantic scopes so that end users can write plug-ins to act on specific syntax elements: global renaming, refactoring, scope-based search & replace, linting, etc. If the syntax is unable to catch some elements, the results could be disastrous.

The only viable option would be to just focus on non-semantic syntax highlighting, i.e. matching anything between double quotes as a string, matching any [\w+ as a tag, etc.
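Such purely lexical highlighting needs no context at all. A minimal Python sketch, assuming just the two rules mentioned (double-quoted strings and [\w+ tags) and using conventional TextMate scope names; the highlight function itself is hypothetical:

```python
import re

# Purely lexical rules: no distinction between attributes and contents.
RULES = [
    ("string.quoted.double", re.compile(r'"[^"\n]*"')),
    ("entity.name.tag", re.compile(r'\[\w+')),
]


def highlight(line):
    """Return (scope, text) spans for one line; unmatched chars are skipped."""
    spans, pos = [], 0
    while pos < len(line):
        for scope, rx in RULES:
            m = rx.match(line, pos)
            if m:
                spans.append((scope, m.group()))
                pos = m.end()
                break
        else:
            pos += 1                # plain text: leave unscoped
    return spans
```

Note that attribute names and values stay unscoped, which is precisely the semantic information plug-ins would be missing.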

How hard would it be to integrate the Lang Server into the PML project and package, so that it can automatically mirror the latest PML version and automatically ship with every package?

Very hard and time-consuming, I guess. At least for me, because I've no practical experience with language server implementations. However, as the new parser is no longer written in PPL, but entirely in Java (including lenient parsing, and all text processing rules and nodes such as !ins-file, !set, and !get), a language server could be written by any experienced Java programmer, using the pXML parser.

If the parser is now in Java, there should be various solutions to auto-generate a language server, but it might require defining the PML language via some BNF-like grammar, I guess.

I think that unless the Lang Server is twinned with the PML sources, it's unlikely that we'll see a PML Lang Server, because any change to the language would entail complex changes to the LSP package with each update.

In my opinion, editor plugins should not try to implement error-tolerant parsing, because it's just too hard (and in some contexts even impossible) to make it work correctly in all cases. They should just stop parsing at the first error encountered.

The problem is that, as far as I know, only Sublime Text 4 is able to handle this, by cancelling the current parsing and going back to the latest branching point. In all other editors, once a construct has started being parsed it can't be cancelled, so it's all about being 100% sure that what you're about to parse is what you expected, and the only instrument you have here is a single look-ahead RegEx to decide if the contents of the current line are a valid match for any syntax element (which is not much).

The new PML parser distinguishes between 'cancelling errors' and 'non-cancelling errors'. In case of a 'cancelling error' (e.g. the final quote of a quoted attribute value is missing), parsing is simply cancelled, because IMO it's impossible to reliably guess how to continue.

Being able to do this would already be a great achievement, but as mentioned in the previous paragraph, it might not be doable in most editors.

Take IntelliJ IDEA, for example. Support for Java is really awesome in this IDE. However, when it comes to fault-tolerant parsing, it often fails miserably, and displays a whole avalanche of false positives that are just disturbing.

That's a known problem, i.e. a broken construct can have an avalanche effect on the rest of the source. This is where compiler parsing and LSP parsing differ, for an LSP parser has to gracefully recover from such errors. The only documented experiment I've stumbled across so far is the PHP Lang Server, which has developed a methodology to handle graceful recovery (none of which seems simple to implement).

On the other hand, the nature of PML is closer to the HTML DOM than to a programming language, so things should be easier in this respect, for you only have to step back and carry on parsing the parent node. The only concrete risk here is mismatched closing ]s, which might lead to unbalanced nested constructs (since the closing brackets don't include the tag, except in raw tags).
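The weak spot, closing brackets that don't name their node, can be illustrated with a small stack-based checker in Python. It is a sketch with hypothetical names that deliberately ignores escapes, quoted values and raw text.

```python
import re

# An opening tag: "[" optionally followed by "!", then a node name.
OPEN = re.compile(r'\[!?([A-Za-z_][\w.-]*)')


def check_nesting(source):
    """Return (unclosed, stray) for a simplified PML text.

    unclosed lists node names still open at the end of input; stray lists
    offsets of "]" characters that close nothing. Escapes, quoted values
    and raw text blocks are deliberately ignored in this sketch.
    """
    stack, stray, pos = [], [], 0
    while pos < len(source):
        m = OPEN.match(source, pos)
        if m:
            stack.append(m.group(1))
            pos = m.end()
            continue
        if source[pos] == "]":
            if stack:
                stack.pop()          # "]" closes the innermost open node
            else:
                stray.append(pos)
        pos += 1
    return stack, stray
```

For "[doc [ch [p text] ]" it reports doc as unclosed, although the missing ] may just as well belong to ch or p: the imbalance can be detected, but not reliably located.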

Unfortunately, as I already mentioned in our very early discussions, most editors are built around the needs of mainstream languages (C, Python, etc.), which has always come at the cost of markup syntaxes. It's in the nature of lightweight markup syntaxes to develop alternative syntaxes as they grow, and except for very simple syntaxes, they are usually very hard to cover in editor syntax definitions.

I haven't yet found a decent editor syntax for AsciiDoc, and even when it comes to Markdown you'll find that syntax coverage is superficial at best, which leaves little room for creating smart plug-ins.

I also believe that it's hard (maybe impossible) to define such syntaxes via BNF-like grammars, at least not without adding some code in the grammar to handle edge-cases. Even Markdown turned out hard to parse with PEG parsers, without some manual interventions in the parser.

So I'm not sure how and if a PML Lang Server could be coupled to the PML source project in order to auto-generate/update the LSP package along with the language, but this would indeed be the best solution. Also, you mentioned:

Even more in future PML versions with extensions that power-users will love, but that will be even more difficult to support in editor plugins. For example: user-defined nodes, and embedded source code that generates PML markup.

which is probably something that can only be supported if the PML Lang Server is integrated into the PML package itself — which makes sense, especially if in the future users should be able to switch between different PML versions, so that the LSP server matches the correct PML version chosen.

Once you have a context-aware editor, the only limit to what you can do with it is your imagination — e.g. writing a plug-in to migrate PML 1.5 docs to PML 2.0 would be easy, if only the editor has full awareness of every single tag and attribute.

Productivity in any language is tightly bound to editors support. E.g. I work a lot with AsciiDoc, but some features are less productive to use due to the limits in editor support for the syntax.

Alternatively, you could always consider creating a dedicated PML IDE in Java, which might be easier to implement than an LSP package. Probably many users would be happy with it, and could make the most out of PML, having an IDE that directly feeds on their PML local package and custom settings and extensions. The IDE could be part of the package too, since I doubt it would make much difference in size (compared to the Java runtime binaries).

Realistically, right now I don't see an easy road ahead for PML syntax support in general purpose editors, because the syntax has become harder to implement (much harder than the average package allows for). But there are numerous FOSS editors designed with hacking in mind, which end users can tweak extensively to create an ad hoc editor for their favourite language.

tajmone Sep 7, 2021
Author

Please, see my new dedicated Discussion on the parsing strategies and problems for the new attributes syntax:

tajmone/Sublime-PML#30

tajmone · 2021-12-26T06:33:28Z

Author

I Think I've Nailed It!

@pml-lang, after various tests with Sublime PML I think I've now found a method to implement the new attributes system in a way that supports both lenient parsing and smart completions. I'm not 100% sure, but the tests so far seem to be working and are promising (the new approach can be viewed in the pml-2.0.0 dev branch).

And the good news is that I didn't have to exploit the new ST4 syntax features either (i.e. branching or multi-pop), which means that it should be possible to replicate this also in VSCode using the old TextMate syntax format (provided it supports meta scopes the way ST does).

The challenge was how to pinpoint the specific scopes following an opening tag where attributes could occur, i.e. determining when the valid zone for node attributes ends and node contents begin. This is a twofold requirement:

- The syntax scope needs to know when to pop from the attributes context to the node contents.
- Smart completions need to be aware of when to suggest specific attributes for any given tag.

Lenient parsing makes this harder because in some cases attributes might not be enclosed within parentheses. But I've found a way to switch context from lenient to enclosed attributes, which means that we'll be able to save smart completions (which was my main worry). The whole process adds a bit of overhead, but not as much as I had foreseen, for I've come up with a new approach that allows sharing attribute definitions among nodes without losing tag-specific contexts.
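One way to picture both requirements is a function that, for the node opening a line, reports where the attribute zone starts and ends, so that completions can check whether the cursor falls inside it. This Python sketch uses hypothetical names and patterns, and it assumes that a lenient zone ends at the first token that isn't a name=value pair, which is one of the open questions listed next.

```python
import re

NAME = r'[A-Za-z_][\w.-]*'
VALUE = r'(?:"[^"]*"|[^\s"()\[\]]+)'
TAG = re.compile(rf'\[({NAME})')
# "( name=value ... )": the enclosed form.
ENCLOSED = re.compile(rf'\s*\((?:\s*{NAME}\s*=\s*{VALUE})*\s*\)')
# " name=value name=value": the lenient form (assumed to stop at non-pairs).
LENIENT = re.compile(rf'(?:\s+{NAME}\s*=\s*{VALUE})+')


def attribute_zone(line):
    """Return (tag, mode, start, end) for the node opening at line[0].

    mode is "enclosed", "lenient" or "none"; line[start:end] is the span
    where attribute completions would make sense.
    """
    m = TAG.match(line)
    if m is None:
        raise ValueError("line does not start with a node")
    start = m.end()
    enclosed = ENCLOSED.match(line, start)
    if enclosed:
        return m.group(1), "enclosed", start, enclosed.end()
    lenient = LENIENT.match(line, start)
    if lenient:
        return m.group(1), "lenient", start, lenient.end()
    return m.group(1), "none", start, start
```

Trying the enclosed form first mirrors the idea of switching from one attribute context to the other: only when no parenthesized list follows the tag does the lenient reading get a chance.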

There are still some unexplored/undocumented questions left, which I'll have to work out by further trial and error:

What happens when PMLC encounters a node that:
- Contains an unsupported attribute in lenient parsing format? Does it treat it as if it was ordinary text (and ignore it), or does it see it as a misplaced attribute and raise an error?
- Contains a valid attribute in lenient parsing (no parentheses) followed by more attributes within parentheses? I.e. does the lenient attribute mark the end of the attributes space and the beginning of text contents, so that the parenthesized group is treated like text? Or does PMLC parse both as attributes?

The answers to the above questions need to be taken into account in PML editor syntaxes, especially with the new approach I've come up with, since these RegEx-based syntax definitions are all about handling expected tokens (valid or invalid) to determine when contexts start and end.

The new JSON Tags file really helped me out in finding this solution, because it allowed me to get a better picture of the different tag groups, which is why I've been updating the mustache templates at the PML Playground these days.

Documenting the New Method

Since the new approach is fairly intricate, I'm thinking of writing it out in a document first, so I have a reference doc to work with, which I can then use for the Rouge syntax, Sublime PML and the VSCode syntax. Documenting it would provide a better action plan to stick with as I go along.

That's something I have been planning anyway, since I believe that it would be very useful to have a guide for syntax developers (i.e. of syntax highlighters or editor syntaxes). The document will cover practical parsing details which are not to be found in the official PML docs, i.e. dealing with edge cases, context switching, etc., all of which are important to developers working on PML syntax support for third-party tools.

So my next step will be to start drafting this document, focusing on providing a list of all the nodes that require support for the different types of lenient parsing (they are not many, but one needs to know which they are and which kind of parsing leniency they support).

Once I have a clear reference to work with, the rest of the work on Sublime PML will be just writing out the rules, one node at a time, until the whole syntax is covered.

[ EDIT ]

The documentation can now be found at:

tajmone/pml-playground/syntax-guide/ — AsciiDoc source
PML-Syntax-Guide.html — HTML Live Preview.

VSCode Reference Links


pml-lang · 2021-12-27T06:56:47Z

Maintainer

I think I've now found a method to implement the new attributes system in a way that supports both lenient parsing and smart completions.
it should be possible to replicate this also in VSCode

Great! That's very very good news!

What happens when PMLC encounters a node that: Contains an unsupported attribute in lenient parsing format?

Whether in lenient parsing mode or not, an invalid attribute always raises an error.

For example, this code:

[image source=foo.png foo=bar]

... generates the following error:

Error      Parameter 'foo' doesn't exist.
Code       [image source=foo.png foo=bar]
                                 ^^^

What happens when PMLC encounters a node that: Contains a valid attribute in lenient parsing (no parentheses) followed by more attributes within parentheses?

That would raise an error too, because attributes must either all be enclosed in parentheses or not be enclosed at all.

For example, this code:

[image source=foo.png (border=yes)]

... generates:

Error      Expecting a valid name. A name cannot start with '('.
Code       [image source=foo.png (border=yes)]
                                 ^
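Both behaviours can be mirrored in an editor-side check. This Python sketch validates unparenthesized name=value attributes against a per-tag list of known names; KNOWN_ATTRIBUTES is a hypothetical excerpt (in practice it might be generated from the JSON tags file), and the messages only imitate the PMLC wording quoted above.

```python
import re

# Hypothetical excerpt of per-tag attribute names.
KNOWN_ATTRIBUTES = {"image": {"source", "border", "width", "height"}}

# One unparenthesized attribute: name=value, value quoted or bare.
PAIR = re.compile(r'\s*([A-Za-z_][\w.-]*)\s*=\s*("[^"]*"|[^\s"()\[\]]+)')


def check_lenient_attributes(tag, text):
    """Validate 'name=value ...' attributes written without parentheses.

    Returns a list of (offset, message) tuples; checking stops at the
    first position where no attribute name can be read.
    """
    errors, pos = [], 0
    known = KNOWN_ATTRIBUTES.get(tag, set())
    while text[pos:].strip():
        m = PAIR.match(text, pos)
        if m is None:
            rest = text[pos:]
            at = pos + len(rest) - len(rest.lstrip())   # first non-space
            errors.append((at, "Expecting a valid name. "
                               f"A name cannot start with '{text[at]}'."))
            break
        if m.group(1) not in known:
            errors.append((m.start(1), f"Parameter '{m.group(1)}' doesn't exist."))
        pos = m.end()
    return errors
```

Both example inputs from this reply produce one error at offset 15, matching the caret positions shown in the PMLC output.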

I believe that it would be very useful to have a guide for syntax developers

Yes, absolutely. And, besides being useful in the context of other plugins for PML, in the future it could also be useful for other projects that use the PDML syntax.


tajmone Dec 27, 2021
Author

Thanks for the answers, they were really useful.

I have another brief question for you...

Up to now, in Sublime PML I would use the RegEx [a-z][a-z_]* to match PML tags, since all native tags are always lowercase, and [a-zA-Z][a-zA-Z_]* for attributes, since some attributes also contain uppercase letters.

But now that custom nodes have been introduced, I think I should try to match tags and attributes using the custom ID RegEx (which I believe also allows digits and maybe another char), since instead of scoping unknown tags/attributes as invalid I should now assume they are custom tags/attributes. Hence, it might be worth using the ID pattern instead.

The only invalid tags and attributes should be native tags and attributes found in the wrong place (i.e. wrongly nested tags, or attributes unsupported by a specific node tag). All the rest should be assumed to be a custom tag/attribute.

Does this make sense?

pml-lang Dec 27, 2021
Maintainer

now that custom nodes have been introduced, I think I should try and match tags and attributes using the custom ID RegEx

Yes. To cover all cases, the plugin should use the regex shown in the PDML Specification: [a-zA-Z_][a-zA-Z1-9_\.-]*

BTW: Extension nodes must also be considered. They all have a ! between the [ and the node name (e.g. [!ins-file ...]). Currently there are just !get and !set for Parameters, as well as !ins-file. More extension nodes will be added in the future. Therefore it might be useful to include them in the JSON file too and add an is_extension_node field, so that they can be checked for errors like standard nodes (e.g. invalid extension node name or attribute). What do you think?

instead of scoping unknown tags/attributes as invalid I should now assume they are custom tags/attributes

Yes. They can be invalid or custom tags. But you can't know, because custom tags are read from config files at PML build-time. It would be good, however, to scope them differently (and thus use a different color), to make these tags distinguishable from standard PML tags. Extension nodes should also (IMO) be scoped differently.
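A classification along these lines could drive the different scopes. This Python sketch uses the ID pattern exactly as quoted above (digit range 1-9 included); the two tag sets are illustrative excerpts, with the extension names taken from the nodes listed above, and the category labels are made up for the example.

```python
import re

# ID pattern as quoted from the PDML Specification above.
ID = re.compile(r'[a-zA-Z_][a-zA-Z1-9_\.-]*')

# Illustrative excerpts only: real lists would come from the JSON tags file.
NATIVE_TAGS = {"doc", "ch", "p", "b", "i", "image"}
NATIVE_EXTENSIONS = {"ins-file", "get", "set"}


def classify_tag(name):
    """Classify the text between '[' and the first whitespace."""
    is_extension = name.startswith("!")
    bare = name[1:] if is_extension else name
    if not ID.fullmatch(bare):
        return "invalid"
    if is_extension:
        return "extension.native" if bare in NATIVE_EXTENSIONS else "extension.custom"
    return "native" if bare in NATIVE_TAGS else "custom"
```

Only malformed names end up as "invalid"; any well-formed unknown name is assumed to be custom, since custom tags are only known at build-time.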

The only invalid tags and attributes should be native tags and attributes found in the wrong place (i.e. wrongly nested tags, or attributes unsupported by a specific node tag). All the rest should be assumed to be a custom tag/attribute.

Yes.

pml-lang Dec 27, 2021
Maintainer

I forgot to mention this:

In the future, it will be possible to dynamically add extension nodes at build-time (like user-defined-nodes). Hence, the plugin should differentiate between native extension nodes (with known rules in the JSON file), as well as customized extension nodes (with no known rules).

tajmone Dec 27, 2021
Author

BTW: Extension nodes must also be considered. They all have a ! between the [ and the node name (e.g. [!ins-file ...]). [...] More extension nodes will be added in the future.

I see. I guess it depends on how each syntax is going to treat these extension nodes. From what I've seen so far, I have the impression they should be scoped as pre-processor directives, which usually get a colour of their own in most colour schemes. In this case, any unknown tag prefixed with ! should be scoped as an "unknown preprocessor directive", if the usage is going to be consistent.

Therefore it might be useful to include them in the JSON file too and add an is_extension_node field, so that they can be checked for errors like standard nodes (e.g. invalid extension node name or attribute). What do you think?

Probably the mechanism for detecting invalid nodes/attributes is not going to be affected by this, since the best approach seems to be: first include the valid nodes/attributes for any given context, and then include all the RegExs matching invalid nodes/attributes. Since after every match the scope-loop is reset and starts over again from the beginning, we're sure that invalid node-matching will never occur if a valid node was matched first. This approach has the advantage of reusing token definitions, with each token having a valid matching RegEx and another one that matches the same token but scopes it as invalid.
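The valid-before-invalid ordering translates directly into an ordered rule list where the first match wins. A Python sketch for one node's attribute context; make_context and scope_names are hypothetical helpers, and the scope names follow common TextMate conventions:

```python
import re


def make_context(valid_names):
    """Build an ordered rule list for one node's attribute context.

    Valid attribute names come first; a generic name pattern follows and
    scopes anything else as invalid. Both require a following "=".
    """
    alternatives = "|".join(re.escape(n) for n in sorted(valid_names))
    return [
        ("entity.other.attribute-name", re.compile(rf'(?:{alternatives})(?=\s*=)')),
        ("invalid.illegal.attribute-name", re.compile(r'[A-Za-z_][\w.-]*(?=\s*=)')),
    ]


def scope_names(text, rules):
    """Scan text, returning (scope, name) for each attribute name found."""
    found, pos = [], 0
    while pos < len(text):
        for scope, rx in rules:        # first matching rule wins
            m = rx.match(text, pos)
            if m:
                found.append((scope, m.group()))
                pos = m.end()
                break
        else:
            pos += 1
    return found
```

Because the generic rule is only tried when no valid name matched at that position, the same invalid-name pattern can be shared by every node while each node supplies its own list of valid names.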

But having the is_extension_node field is probably going to be useful in various contexts, including documentation, since it can be used via mustache to create a list of all the [! tags.

Yes. They can be invalid or custom tags. But you can't know, because custom tags are read from config files at PML build-time. It would be good, however, to scope them differently (and thus use a different color), to make these tags distinguishable from standard PML tags. Extension nodes should also (IMO) be scoped differently.

Unfortunately this is partly out of the syntax designer's control, due to the scoping guidelines for TextMate-like syntaxes. As mentioned before, [! tags will probably be coloured differently because they are scoped as pre-processor directives, but ensuring that other tags are coloured differently can only be achieved via custom colour schemes, since we can distinguish them only by extra scope suffixes, which are ignored by most colour schemes.

So far I've been using extra scopes to distinguish between inline and block nodes, but that's again done via extra trailing scopes, and only results in different colouring with our custom scheme. The problem is that we need to follow the general guidelines for most tags scoping in order to allow plug-ins to work properly.

For the same reasons, all markup syntax packages usually ship with some custom colour schemes designed exclusively for the syntax (Markdown and AsciiDoc both do that, otherwise they look poor with the default colour schemes).

In the future, it will be possible to dynamically add extension nodes at build-time (like user-defined-nodes). Hence, the plugin should differentiate between native extension nodes (with known rules in the JSON file), as well as customized extension nodes (with no known rules).

IIRC, this pertains to the possibility of overriding native PML nodes, right?

I'm afraid that without a PML language server it would be impossible to track these, since the only means to match tokens are dumb RegExs matching one source line at the time (or what's left after partial matches), but without any context awareness, just a stack machine that pushes/sets and pops in and out of syntax contexts (no variables, no tracing, nothing). There's no way to hook into the editor's API.

pml-lang · 2021-12-28T06:24:43Z

Maintainer

From what I've seen so far, I have the impression they should be scoped as pre-processor directives

Yes.

any unknown tag prefixed with ! should be scoped as an "unknown preprocessor directive", if the usage is going to be consistent.

Yes.

having the is_extension_node field is probably going to be useful in various contexts, including documentation, since it can be used via mustache to create a list of all the [! tags.

Rather than adding an is_extension_node field in the JSON file generated by PML, I now think that it would be much better to create another JSON file describing extension nodes. And this file should be created by PDML (not by PML), because:

- extension nodes are part of PDML (but they can be used in PML)
- a dedicated JSON file describing PDML extension nodes can be useful in other contexts not related to PML (e.g. Sublime-PDML)
- extension nodes might have other fields that don't exist in standard PML nodes
- the file could contain other information related to PDML (e.g. meta-data like what you suggested here)

Maybe that would complicate the PML plugin, because you then would have to consider two JSON files (if you want to include support for PDML extension nodes in PML). Alternatively, you could simply scope all extension nodes (i.e. nodes with !) as "preprocessor directive". What's your take on this?

this pertains to the possibility of overriding native PML nodes, right?

PDML extension nodes cannot directly be used to override native PML nodes. Overriding native PML nodes could later be achieved with "user-defined-nodes" (UDNs) in PML (which have been added in version 2.2). UDNs can currently only be used to add new nodes, not to override existing native PML nodes. The ability to override native nodes (by defining a UDN with the same name as a native node, and adding a field like [override_native yes]) could be added in a future version.

However, because UDNs are defined in PDML files, PDML extension nodes can be used to define UDNs. For example, you could use the PDML extension node [!ins-file ...] to use shared code in UDN definition files. I hope that all this will be more clear when PDML extension nodes are fully documented on PDML's website.


tajmone Dec 28, 2021
Author

Alternatively, you could simply scope all extension nodes (i.e. nodes with !) as "preprocessor directive". What's your take on this?

I think the safe approach right now is to treat all [! nodes as pre-processor directives (after all, user-defined macros are scoped as pre-processor too in many languages), and then we'll see as PDML/PML become more mature, along with their Sublime packages. Bear in mind that if the end user has installed both Sublime PDML and Sublime PML, the two syntaxes and packages can interact via syntax inheritance and shared assets.

Unfortunately PackageControl doesn't accept "compound packages", otherwise it would have been really nice to bundle together Sublime PML and PDML into a single package. Probably we could still do that, by bypassing PackageControl and finding a way to deliver the package independently — installation via Git doesn't work as expected (or as the documentation wants you to believe), but there are other ways to distribute packages, e.g. via servers, zipped archives, etc.

UDNs can currently only be used to add new nodes, not to override existing native PML nodes. The ability to override native nodes (by defining a UDN with the same name as a native node, and adding a field like [override_native yes]) could be added in a future version.

Except that due to case-sensitive tag IDs, a user could create a [DOC node, which PML would see as being entirely different from the native [doc, but an editor syntax might have trouble dealing with it, since the most sensible action would be to treat it as a "misspelled tag" rather than assume it's a custom defined node with a different-cased native node name.
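An editor could still flag such cases without rejecting them, e.g. with a distinct "possibly misspelled" scope. A minimal Python sketch of the check; NATIVE_TAGS is an illustrative excerpt and the function name is hypothetical:

```python
# Illustrative excerpt; a real list would come from the PML tags JSON file.
NATIVE_TAGS = {"doc", "ch", "p", "b", "i", "image"}
_NATIVE_BY_LOWER = {tag.lower(): tag for tag in NATIVE_TAGS}


def case_variant_of_native(name):
    """Return the native tag that name matches only case-insensitively.

    Returns None if name is itself native or unrelated to any native tag.
    """
    if name in NATIVE_TAGS:
        return None
    return _NATIVE_BY_LOWER.get(name.lower())
```

Since a custom node may legitimately be called DOC, such a result is better used as a hint (a warning scope or a pop-up) than as an error.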

I hope that all this will be more clear when PDML extension nodes are fully documented on PDML's website.

Hopefully. But then, if the Sublime PDML and PML packages are kept separate, sharing this kind of info between them is going to be hard (unless we duplicate functionality in the PML package, e.g. to detect UDN definitions, etc.).

In any case, without the possibility of hooking into the syntax parser, I don't think that having this info would be of great help in terms of syntax highlighting, although it could be used to provide "pop-up info" (IntelliSense).

The VSCode API is more powerful in this respect, since it's a FOSS tool and nothing is hidden from end users (unlike ST which exposes only API interfaces, and even keeps some internal API functions inside DLLs).

tajmone · 2022-08-21T06:35:04Z

Author

PML 3 & Lenient Parsing Attributes

@pml-lang, I noticed that in the PML Changelog for v3.0.0 it mentions:

The source attribute name for image nodes is now required.

Does this mean that the lenient parsing rule that allowed omitting the key for default attributes no longer applies to PML in general? I've noticed that the User Manual doesn't mention it any longer, but since the Changelog doesn't specifically mention that the rule has been dropped, I wanted to be sure whether that's the case.

Also, since now the default_attribute_id has been removed from the JSON tags file, replaced by position, I thought this might be yet another confirmation in that direction — but in terms of having to work on syntaxes, I'd like confirmation of this fact.

Furthermore, regarding the new position entry in JSON tags, since I couldn't find any entry that didn't have a null value, I wanted to ask what it's for? Is it currently used? or introduced for future uses?


pml-lang Aug 22, 2022
Maintainer

Does this mean that the lenient parsing rule that allowed omitting the key for default attributes no longer applies to PML in general?

This feature has been disabled in version 3.0.0, and will probably be re-implemented in a future version. Prior to version 3, default_attribute_id was only used for the source attribute of the image node. So, it's not a big deal for end users.

The reason for removing default_attribute_id is that I've started to implement positional arguments in the pp-parameters package. Positional parameters didn't exist in PPL, and default_attribute_id was therefore used as a simple replacement in PPL. In a future version, positional parameters will probably be supported in PML, so that it will be possible again to write:

[image ball.png]

... instead of:

[image source=ball.png]

The source attribute will then be defined as a positional attribute with position 1. Using the named syntax (source=ball.png) will still be valid after introducing positional attributes, because a positional attribute must still have a name, and users can choose whether to specify the (optional) name.
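To make the proposal concrete, here is a minimal sketch of how a parser might resolve such attribute lists. It is not PMLC code: the function `resolve_attributes`, the `positions` table (mirroring the `position` field of the JSON tags file, with `None` standing for `null`) and the rule that positional values must precede named ones are all assumptions for illustration.

```python
# Hypothetical sketch: resolving PML-style attribute lists in which some
# attributes may be given by position. This is NOT PMLC code; the function
# name, the `positions` table and the ordering rule are assumptions.


def resolve_attributes(tokens, positions):
    """Map attribute tokens to attribute names.

    tokens    -- strings such as "ball.png" (positional) or "source=ball.png"
                 (named); quoting and values containing "=" are not handled.
    positions -- attribute name -> 1-based position, or None when the
                 attribute can only be given by name (like a null `position`).
    """
    # Attributes that accept a positional value, sorted by their position.
    by_position = sorted(
        (pos, name) for name, pos in positions.items() if pos is not None
    )
    result = {}
    seen_named = False
    next_position = 0
    for token in tokens:
        name, sep, value = token.partition("=")
        if sep:
            # Named form: the name must exist and must not be assigned twice,
            # even if it was already filled in positionally.
            if name not in positions:
                raise ValueError(f"unknown attribute: {name}")
            if name in result:
                raise ValueError(f"attribute assigned twice: {name}")
            result[name] = value
            seen_named = True
        else:
            # Positional form: assumed to be allowed only before named ones.
            if seen_named:
                raise ValueError("positional value after a named attribute")
            if next_position >= len(by_position):
                raise ValueError(f"unexpected positional value: {token}")
            result[by_position[next_position][1]] = token
            next_position += 1
    return result


# Usage: an `image` node with `source` at position 1 and a named-only `width`.
image_positions = {"source": 1, "width": None}
print(resolve_attributes(["ball.png"], image_positions))
# {'source': 'ball.png'}
print(resolve_attributes(["source=ball.png", "width=100"], image_positions))
# {'source': 'ball.png', 'width': '100'}
```

Rejecting a name that was already filled positionally is one possible answer to the "clear usage rules" question; other rules are equally plausible.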

since now the default_attribute_id has been removed from the JSON tags file, replaced by position ...

Yes, that's part of the ongoing implementation of positional arguments.

I wanted to ask what it's for?

Currently, all position fields in the JSON file are set to null (including the source attribute of the image node), because positional arguments are not yet supported in PML. In the future, the source attribute of the image node will have a non-null value (probably 1), to state that its value can be defined at the first position, without the need to mention the attribute's name.

Is it currently used?

No

or introduced for future uses?

Yes.

tajmone Aug 24, 2022
Author

So, if I've understood correctly, this type of lenient parsing feature (which I refer to as «implicit default attribute», to distinguish it from other lenient parsing types in documentation) is not going away but actually will be extended to include all positional attributes?

Basically it's going to be like function/method calls in those languages that support the positional parameters notation along with named parameters, allowing callers to omit the parameter keys for all positional parameters passed in the exact same order as in the function/method definition, and to use named parameters for out-of-order parameters (e.g. when skipping optional parameters).
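For instance, in Python, which supports both notations, the three call styles look like this (the `image` function and its optional `width`/`height` parameters are stand-ins for illustration only, not PML definitions):

```python
# Python mixes positional and named parameters the way described above.
# `image`, `width` and `height` are illustrative stand-ins, not PML definitions.


def image(source, width=None, height=None):
    """Return the parameter values this call ended up with."""
    return {"source": source, "width": width, "height": height}


# Positional: `source` is inferred from its position in the definition.
a = image("ball.png")
# Named: the same value, with the parameter key spelled out.
b = image(source="ball.png")
# Mixed: positional first, then a named argument to skip the optional `width`.
c = image("ball.png", height=50)

print(a == b)  # True
print(c)       # {'source': 'ball.png', 'width': None, 'height': 50}
```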

I guess this means that, unlike the previous default-attribute version, the upcoming version will be used much more widely, since assigning ordinal positions to attributes could make source files less verbose via lenient parsing.

Surely, in terms of benefits for end users, the new version makes more sense than the previous one (which introduced lots of problems for editor syntaxes for the lenient benefit of a single node). At the same time, I'm worried about how editor syntaxes are going to handle this (i.e. whether they can handle it at all).

I think that you should strongly consider the idea of creating an official LSP language server for PML, which is auto-generated from PMLC sources so that it always mirrors the latest syntax. If you manage to achieve this using some of the LSP-related tools available for Java, it would be really great.

Without an LSP lang server, I'm afraid PML risks not having any editor/IDE with decent syntax support (definitely not one that supports refactoring operations). I also believe that maintaining an independent PML lang server would make little sense, since the frequent syntax changes call for an LSP package which is generated from the very same source files of PMLC, IMO.

I know that I've been pressing this point rather pedantically, but I'm aware of how a lack of good editor support can have a negative impact on any syntax, and how postponing such a project can too easily result in a huge gap that could prevent automating the creation of a lang server from the converter sources. It would be much easier (and better) to have the converter and the lang server evolve together along with the syntax. Most modern programming languages today are designed in tandem with their LSP lang server from the outset.

Seeing how this newer type of «implicit positional attributes» lenient parsing is going to impact editor support, along with the (recently discovered) complications of supporting alternative raw-text syntaxes, I can't help seeing how strongly PML depends on LSP for a good editing experience.

IMO, the priority of an official LSP server is high enough to take precedence over syntax development at this stage. The longer it's postponed, the harder it's going to be to integrate it into the PMLC source code.

References

PML Syntax Guide » implicit default attribute

pml-lang · 2022-08-24T08:21:48Z

pml-lang
Aug 24, 2022
Maintainer

So, if I've understood correctly, this type of lenient parsing feature ... is not going away but actually will be extended to include all positional attributes?

The current PMLC version 3.0.0 does not support positional attributes/parameters.

I'm not sure yet if it's a good idea to add positional attributes in a future version. Yes, positional attributes save keystrokes, but they can also make code less readable and more error-prone. Moreover, they make parsing PML/PDML documents more challenging, and (as you mentioned) it is not easy (or even impossible) to support them in editor plugins and other PML tools. Hence, I do not plan to add positional PML/PDML attributes in the near future.

Maybe we should even remove the position field from the JSON file, because its value is always null and therefore currently not needed (it just causes confusion).

The reason to have positional parameters in pp-libs is that they will be needed to parse CLI arguments. Positional parameters are quite common in CLIs (and used in PMLC commands as well), and some users expect them, because they save keystrokes. (Note: the Picocli dependency, currently used to parse PMLC CLI arguments, will probably be removed in the future, when pp-libs is ready to take over.)

Moreover, pp-libs is a general library that could be used in other projects (not related to PML), and therefore positional parameters have been added in pp-libs (although currently not used in PML/PDML).

Basically it's going to be like function/method calls in those languages that support the positional parameters notation along with named parameters

Yes, it's like that. However, if positional and named arguments are both supported, then we also need clear usage rules, because ideally, it should be possible to mix positional/named arguments, and to (optionally) use names for positional arguments (to increase readability). Another reason to think twice before adding positional arguments to PML/PDML.

consider the idea of creating an official LSP language server for PML, which is auto-generated from PMLC sources so that it always mirrors the latest syntax

Yes, that would be the best solution, as agreed already.

maintaining an independent PML lang server would make little sense, since the frequent syntax changes call for an LSP package which is generated from the very same source files of PMLC, IMO.

Yes, absolutely. PML and the PML lang server must evolve together seamlessly, and without code duplication.


tajmone Aug 24, 2022
Author

I'm not sure yet if it's a good idea to add positional attributes in a future version. Yes, positional attributes save keystrokes, but they can also make code less readable and more error-prone. Moreover, they make parsing PML/PDML documents more challenging, and (as you mentioned) it is not easy (or even impossible) to support them in editor plugins and other PML tools. Hence, I do not plan to add positional PML/PDML attributes in the near future.

I have mixed feelings about this. On the one hand, coming from Ruby (where positional and named parameters can be freely mixed and are commonplace), I do see the value of keeping attributes short by omitting the parameter names, especially in text documents, where the main focus should be the text and not its formatting (attributes are longer than node tags). On the other hand, I really foresee how this type of lenient parsing could damage editor support.

LSP aside, most modern editors today seem to adopt the TextMate approach of applying regexes against single lines of the source code, which has its well-known limits (no backtracking, etc.).

Personally, based on my experience, I think that big documentation projects could really benefit from proper semantic scoping, which would allow end users to carry out global operations on documentation source files (from renaming every occurrence of a constant, to quickly rounding up every use of a given attribute or every reference to a certain asset, etc.), all of which depend on the syntax highlighter being able to properly handle every syntax construct.

The fact that markup documents are not code in the strict sense doesn't mean that end users don't need "power tools" for refactoring projects. It's just that so far we haven't seen examples of full-fledged support for lightweight markup syntaxes due to their parsing complexity, but that doesn't mean it's not doable in principle, or undesirable. Personally, I'd love to see that type of power editing feature, rather than having to rely on multi-pass RegEx search-and-replace to achieve something that should be handled via syntax scopes.

A simple example would be converting straight quotes into curly quotes in one click, which requires the syntax to distinguish between in-text quotes and those occurring within attributes, verbatim/raw or code blocks, etc. Such operations need the syntax to be fully aware of the source contexts, with no exceptions allowed (or it could be a disaster). Other examples range from title casing to anything related to gathering word statistics, spell checking, etc. (you name it).
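Once every span carries a reliable scope, the operation itself becomes trivial, as this minimal sketch shows. The scope names `text`, `attribute` and `code`, and the pre-split segments, are assumptions; obtaining those scopes correctly is exactly the hard part discussed here.

```python
# Sketch: converting straight double quotes to curly ones only where the
# (hypothetical) scope is running text. Producing reliable scopes is the hard
# part and is assumed to have been done by a syntax or language server.


def curly_quotes(text):
    """Replace straight double quotes with alternating opening/closing quotes."""
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


def convert_quotes(segments):
    """Apply curly_quotes to "text" segments; leave every other scope untouched."""
    return [(scope, curly_quotes(s) if scope == "text" else s) for scope, s in segments]


segments = [
    ("text", 'She said "hello" '),
    ("attribute", 'title="Intro"'),
    ("code", 'print("hello")'),
]
for scope, s in convert_quotes(segments):
    print(scope, s)
```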

Obviously, an LSP lang server would settle the question: end users would only need an editor/IDE that supports LSP, and we could stop worrying about syntax constructs that might jeopardize syntax support in "modern editors". Without it, we're basically sacrificing language design to accommodate the limitations of "dumb" syntax definitions; but unfortunately a lack of editor support can have a bad impact on any syntax.

I've been working with AsciiDoc for years, on a daily basis, and even though the AsciiDoc standard has been around for ages, its editor support is far from optimal, ranging from very basic (non-semantic) support to broken packages that simply clog up halfway through any complex document. The situation is so bad that I had to take the existing "official" ST package and hack it down (by stripping syntax elements) until it stopped breaking document highlighting, so that I could benefit from the limited features it supports (e.g. Goto Symbol to quickly jump to any section title by typing a few words from it).

Having worked on that syntax package for quite some time, I've come to the conclusion that it will never fully support the AsciiDoc syntax (all the preprocessor and macro expansions, etc.), and that only an LSP lang server could enable the kind of power tools I'd love to see (e.g. to enforce consistent formatting styles). It's quite sad not to have the editing support that AsciiDoc deserves, but I guess that if in all those years no one has come up with a good solution, it's because it's just too hard to do. The recent introduction of LSP could change things in this respect, but so far I haven't got wind of any projects in that direction.

If I were ever to create my own lightweight markup syntax, I'd take these lessons into account as part of the design process: either keeping the syntax strict (sugar-free, no "lazy variants") and simple to parse, at the cost of sacrificing beauty and "human friendliness", or integrating an official lang server into the native project (the latter seems hard, though). It's always a delicate balance of choices, where something needs to be sacrificed to keep all elements within bounds.

I think that assuming decent editor support should have precedence over any syntactic sugar or "human friendly" lazy variants. After all, it's reasonable to assume that end users can access a good editor nowadays, and that the editor can handle all the repetitive tasks for us, so we don't actually have to type boilerplate when there's decent syntax support. Even syntax ugliness can be mitigated by good syntax support, with the highlighter pushing all the formatting nitty-gritty into the background by simply assigning it less prominent colours (and in some cases even hiding long constructs by folding them outright).

pml-lang Aug 25, 2022
Maintainer

Great comment!

I think that assuming decent editor support should have precedence over any syntactic sugar or "human friendly" lazy variants

I agree. I would therefore suggest proceeding as follows:

Now (to facilitate decent editor support):
- No support for positional attributes. Only named attributes.
- Officially only support the Delimited Text Syntax in PML. This is the most powerful of the three variations, but a bit more verbose.
  I will add a link to that chapter in the PML docs. The other two syntaxes will still work in PMLC, but they are not officially supported (in any case, the Text Block Syntax must keep working for backwards compatibility).
Later, when we have an LSP-server using the PDML parser:
- Add support for positional attributes (easy to do, because everything is already prepared in pp-libs).
- Officially support the three syntax variations for raw_text nodes.

tajmone Aug 26, 2022
Author

No support for positional attributes. Only named attributes.

Yes, that would greatly help in the meantime.

Officially only support the Delimited Text Syntax in PML. This is the most powerful of the three variations, but a bit more verbose.

I agree. Although both the Delimited Text Syntax and the Text Block Syntax can handle code indentation properly, only the former can preserve extra indentation when the code's minimum indentation is greater than the base indentation: e.g. two different code blocks extracted from the same source file, where one of them must preserve its extra indentation as found in the original source (e.g. in languages where indentation is meaningful). The Text Block Syntax always flushes the code to the smallest indentation found.

A simpler example would be representing a single-line markdown code block via indentation, which can't be done with the Text Block Syntax, since the indentation would be stripped away.
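Python's `textwrap.dedent`, which removes the whitespace prefix common to all lines, is a handy analogy for this flushing (it is not how PMLC implements the Text Block Syntax):

```python
import textwrap

# Analogy only: textwrap.dedent strips the whitespace prefix common to all
# lines, like the flushing described above for the Text Block Syntax.

# Lines taken from deep inside a source file (8 spaces of base indentation).
block = "        if x:\n            y()\n"
print(textwrap.dedent(block))  # relative indentation survives, the offset is lost

# A single indented line, e.g. an indented markdown code block.
single = "    code line\n"
print(repr(textwrap.dedent(single)))  # 'code line\n'
```

In both cases the leading indentation that mattered in the original context is gone, which is why a syntax with explicit delimiters is needed to preserve it.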

So IMO the extra verbosity added by the fenced delimiters is a price worth paying in this case. As far as I know, all TextMate syntax definitions should be able to handle this via RegEx backreferences, since these usually refer to the captures of the RegEx that opened the context. At least, that's how it works in ST, and since markdown fenced blocks are supported in most modern editors, I assume the same applies to other editors too.
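Here is a minimal sketch of the backreference technique using Python's `re` module; TextMate grammars apply the same idea by referring to a `begin` capture from the `end` pattern. The fence rule used here (a line of three or more `~` characters, closed by an identical line) is a simplifying assumption, not PML's exact Delimited Text Syntax definition.

```python
import re

# Capture the opening fence, then require the very same fence (via the \1
# backreference) on its own line to close the block. Assumed rule: a fence is
# a line of three or more "~" characters; PML's real rules may differ.
FENCED = re.compile(r"^(~{3,})\n(.*?)\n\1$", re.DOTALL | re.MULTILINE)

doc = "~~~~\nsome code\n~~~\nstill code\n~~~~"
match = FENCED.search(doc)
# The shorter inner "~~~" line does not close the 4-tilde block.
print(match.group(2))
```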

Later, when we have an LSP-server using the PDML parser:

Add support for positional attributes (easy to do, because everything is already prepared in pp-libs).

Officially support the three syntax variations for raw_text nodes.

Definitely. I know that sacrificing syntax power for editor support sucks, but at this early stage we might prioritize PML gaining momentum over getting trapped in a corner. Once you have a wide user base you can afford many changes that you simply can't when the community is small, and some development gaps tend to quickly grow into chasms that can't be filled.
