rework the flattening process #31

silby · 2024-03-22T18:56:08Z

I don't know if my commit message or documentation comments for this change are useful enough so feedback on that is welcome as well as on the code.

Fixes #30, the added test fails before and succeeds after.

doclayout and relevant pandoc benchmarks do not seem to have regressed.

Introduce FlatDocs and use them for rendering

Doc was previously pulling double-duty as the data structure constructed by clients using the smart constructor/combinator functions (or directly) and as the elements of the flattened structure rendered by renderList. While this saved on duplication (FlatDoc looks a lot like Doc!), it meant that renderList had to account for Doc constructors that weren’t meant to occur after calling unfoldD by throwing a runtime error. The situation became more complicated with the introduction of ANSI styling: we neglected at first to account for Styled and Linked docs with inner Concats in unfoldD, which ultimately broke line-breaking in some situations when styled text appeared at the end of the line. Unfolding one Styled Concat into many Styled documents in the result was somewhat plausible, but unaesthetic and seemed like it would be hard to make correct.

Now we have FlatDoc, which is in effect an “intermediate representation” for the Doc “interpreter”. The general design is that any Doc can be turned into a list of FlatDocs that carry equivalent information. Doc constructors without an “inner” Doc that they modify have more or less direct equivalents. There are no FlatDoc Empty or Concat constructors, because emptiness and concatenation are expressed by the list itself. The equivalents to the constructors that have inner Docs instead have a NonEmpty of FlatDocs. So really these FlatDocs aren’t completely flat, they’re just flat enough for our purposes.

The main actual point of doing this is to replace the nested Styled and Linked Docs, which form a more complicated tree structure than previously existed in DocLayout, with FStyleOpen/FStyleClose and FLinkOpen/FLinkClose pairs, surrounding their flattened inner contents. This makes it much simpler to measure the next printable non-space span that follows a breaking space when that span happens to be styled.
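To make the design above concrete, here is a minimal sketch of the idea, not the code in this PR. The `Doc` and `FlatDoc` types below are hypothetical, heavily simplified stand-ins (the real types have more constructors and different fields), and `flatten` and `nextSpanWidth` are illustrative names. Dropping a style around empty content and treating `FPrefixed` as opaque when measuring are assumptions made for the sketch.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)

-- Hypothetical, simplified stand-in for the client-facing Doc type.
data Doc
  = Empty
  | Text String
  | BreakingSpace
  | Concat Doc Doc
  | Styled String Doc    -- style name and inner Doc
  | Prefixed String Doc  -- prefix and inner Doc
  deriving (Show, Eq)

-- Hypothetical, simplified FlatDoc: no Empty or Concat, because the
-- enclosing list plays both roles.
data FlatDoc
  = FText String
  | FBreakingSpace
  | FStyleOpen String
  | FStyleClose
  | FPrefixed String (NonEmpty FlatDoc)  -- kept nested: "flat enough"
  deriving (Show, Eq)

-- Turn a Doc into a list of FlatDocs carrying equivalent information.
flatten :: Doc -> [FlatDoc]
flatten Empty = []
flatten (Text s) = [FText s]
flatten BreakingSpace = [FBreakingSpace]
flatten (Concat a b) = flatten a ++ flatten b
flatten (Styled st d) = case flatten d of
  [] -> []  -- assumption: a style around nothing contributes nothing
  xs -> FStyleOpen st : xs ++ [FStyleClose]
flatten (Prefixed p d) = case nonEmpty (flatten d) of
  Nothing -> []  -- assumption: an empty inner Doc is dropped
  Just xs -> [FPrefixed p xs]

-- Width of the printable non-space span at the head of the list.
-- Style tags have zero width, so a styled span is measured exactly
-- like an unstyled one.
nextSpanWidth :: [FlatDoc] -> Int
nextSpanWidth [] = 0
nextSpanWidth (FText s : rest) = length s + nextSpanWidth rest
nextSpanWidth (FStyleOpen _ : rest) = nextSpanWidth rest
nextSpanWidth (FStyleClose : rest) = nextSpanWidth rest
nextSpanWidth (FBreakingSpace : _) = 0
nextSpanWidth (FPrefixed _ _ : _) = 0  -- assumption: opaque, not measured

-- "hi", a breaking space, then bold text built from two Concat'd pieces.
example :: Doc
example =
  Text "hi" `Concat` BreakingSpace
    `Concat` Styled "bold" (Text "mo" `Concat` Text "m")
```

In this sketch, `flatten example` yields `[FText "hi", FBreakingSpace, FStyleOpen "bold", FText "mo", FText "m", FStyleClose]`, and `nextSpanWidth` applied to everything after the breaking space gives 3: the styled span is measured as a whole even though it was built from an inner Concat.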

Since FlatDocs aren’t completely flat, just mostly flat, there are still some contrived situations that can be measured incorrectly, as has always been the case. For example:

ghci> let p = "hi" <+> (prefixed "x" "mom")
ghci> render (Just 2) p
"hi\nxmom"
ghci> render (Just 3) p
"hi mom"

This is an arbitrary outcome, and rendering Docs that don’t really make sense is not a design goal of the library. FlatDoc therefore doesn’t completely unfold Prefixed, BeforeNonBlank, or Flush docs using a tag-like idea: leaving them nested keeps refactoring of the rendering implementation to a minimum, and unfolding them isn’t necessary for the fix we need for styled text.


jgm · 2024-03-22T23:48:51Z

Looks good to me!

silby · 2024-03-22T23:53:48Z

sweet. One thing I didn't explicitly note here is that now unfoldD is unused internally, and it's not called in pandoc. I'd have removed it if it weren't part of the exported API. I'm not sure if the thing that it does is useful externally, especially since it might not do what people expect when Styled and Linked docs are present. My p.o.v. is you could line it up for deprecation. But it's no skin off my back either.

jgm · 2024-03-23T04:46:25Z

Deprecating unfoldD seems like the right thing to do.

silby added 2 commits March 22, 2024 10:36

Add test for breaking space in styled text

b83da04

jgm merged commit fc29e45 into jgm:master Mar 22, 2024
7 checks passed

silby mentioned this pull request Mar 23, 2024

add ANSI writer jgm/pandoc#9565

Merged
