Get marker delimitation #96
Comments
Welcome @jokteur! 👋
If you want to work with raw events/tokens, rather than the AST, use the parse file/function.
And would there be any way to use the parse file/function without forking the project? This API is currently private, which doesn't allow me to implement my own compiler on top of markdown-rs.
There is also a JavaScript version of this project. On the JS side there is a lower-level package, micromark, that exposes this: https://github.com/micromark/micromark
No, it’s not exposed yet. IMO this project is currently at a stage where it has to get some traction before all the internals are exposed, to figure out how to expose things, and whether to expose them.
Hello,
I am writing a WYSIWYG Markdown editor focused on math and science, and I want to use Markdown as the base format. The problem I am going to describe is present in many other Markdown parsers. As a result, I decided to write a completely new parser from scratch (in C++) and to make some modifications to the Markdown standard to fit my own needs (this is the result).
The prototype I wrote was working okay, but I have now decided to rewrite the whole application in Rust and to stop maintaining my own parser, which is much more prone to bugs and crashes.
The marker delimitation problem
I am rewriting what I wrote here: https://github.com/jokteur/ab-parser#the-delimitation-marker-problem.
For my WYSIWYG application, I need to know where the markers of a specific block / span are, so that I can temporarily display them to the user, like in this demo: https://github.com/wooorm/markdown-rs/assets/25845695/420c1496-7306-4c69-b7ca-74059ec95886
Let's say that we have the following Markdown example:
This example would generate an abstract syntax tree (AST) like:
How do we attribute each non-text marker (like `-`, `>`, `[`, ...) to the correct block / span?
My parser was created to solve this specific problem, while keeping reasonable performance. To do this, each object (BLOCK or SPAN) is represented by a vector of boundaries. A boundary is defined as follows:
This struct designates offsets in the raw text which form its structure. `line_number` is the line number in the raw text on which the boundary is currently operating. Offsets between `pre` and `beg` are the pre-delimiters, and offsets between `end` and `post` are the post-delimiters. Everything between `beg` and `end` is the content of the block / span.
Here is a simple example. Suppose we have the following text:
`_italic_`, which starts at line 0 and offset 0. The boundary struct would then look like `{0, 0, 1, 7, 8}`.
Going back to the first example, we now use the following notation to illustrate ownership of markers: `x` indicates a delimiter, `_` indicates content, and `.` indicates a position outside the boundary. Here is the ownership for each block and span:
Is there any simple way to recover this kind of information?
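To make the boundary description concrete, here is a minimal Rust sketch of such a struct. The field names come from the description above; the `usize` types and the `ownership` helper are assumptions for illustration, not the original C++ definition.

```rust
/// Offsets on one line of raw text that delimit a block or span.
/// Field names follow the description in the text; `usize` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Boundary {
    line_number: usize,
    pre: usize,
    beg: usize,
    end: usize,
    post: usize,
}

impl Boundary {
    /// Hypothetical helper: render ownership for a line of `line_len` offsets,
    /// using `x` = delimiter, `_` = content, `.` = outside the boundary.
    fn ownership(&self, line_len: usize) -> String {
        (0..line_len)
            .map(|i| {
                if (self.pre..self.beg).contains(&i) || (self.end..self.post).contains(&i) {
                    'x'
                } else if (self.beg..self.end).contains(&i) {
                    '_'
                } else {
                    '.'
                }
            })
            .collect()
    }
}

fn main() {
    // `_italic_` starting at line 0, offset 0, i.e. {0, 0, 1, 7, 8}.
    let b = Boundary { line_number: 0, pre: 0, beg: 1, end: 7, post: 8 };
    println!("line {}: {}", b.line_number, b.ownership("_italic_".len()));
}
```

For `_italic_`, the two underscores at offsets 0 and 7 are marked as delimiters and the six letters in between as content.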
Currently, `markdown-rs` provides positional information like this:
I may have a workaround to recover this kind of information: after the text has been parsed, start from the leaf nodes, compare each node's text with the raw text, check which characters are part of the node and which are not, and attribute the remaining ones to the parent. This workaround may be slow, but that is okay for my usage because I only need marker delimitation information where the cursor is (not for the whole document).
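A rough sketch of that workaround, under stated assumptions: `Node` below is a hypothetical, simplified stand-in for a parsed node (a byte range plus children), not the `markdown-rs` AST type, and children are assumed to be sorted by start offset. Any part of a node's range that no child covers is treated as a candidate marker owned by that node.

```rust
use std::ops::Range;

/// Hypothetical, simplified node: a byte range in the raw text plus children.
/// Leaves (no children) are treated as pure content in this sketch.
struct Node {
    range: Range<usize>,
    children: Vec<Node>,
}

/// Return the byte ranges inside `node` that are not covered by any child.
/// Assumes `children` are sorted by start offset.
fn marker_ranges(node: &Node) -> Vec<Range<usize>> {
    let mut gaps = Vec::new();
    if node.children.is_empty() {
        return gaps; // leaf: everything is content
    }
    let mut cursor = node.range.start;
    for child in &node.children {
        if child.range.start > cursor {
            gaps.push(cursor..child.range.start);
        }
        cursor = cursor.max(child.range.end);
    }
    if cursor < node.range.end {
        gaps.push(cursor..node.range.end);
    }
    gaps
}

fn main() {
    // `_italic_`: an emphasis node spanning 0..8 with a text child at 1..7.
    let emphasis = Node {
        range: 0..8,
        children: vec![Node { range: 1..7, children: vec![] }],
    };
    println!("{:?}", marker_ranges(&emphasis));
}
```

The gaps found this way can also contain whitespace or indentation, and leaf nodes that carry their own markers (inline code, for example) would need separate handling, so this is only a starting point.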
I don't really know how `markdown-rs` works internally. How difficult would it be to have this information built into the parser?