Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework the flattening process #31

Merged
merged 2 commits into from
Mar 22, 2024
Merged

rework the flattening process #31

merged 2 commits into from
Mar 22, 2024

Conversation

silby
Copy link
Contributor

@silby silby commented Mar 22, 2024

I don't know if my commit message or documentation comments for this change are useful enough so feedback on that is welcome as well as on the code.

Fixes #30, the added test fails before and succeeds after.

doclayout and relevant pandoc benchmarks do not seem to have regressed.


Introduce FlatDocs and use them for rendering

Doc was previously pulling double-duty as the data structure constructed by clients using the smart constructor/combinator functions (or directly) and as the elements of the flattened structure rendered by renderList. While this saved on duplication (FlatDoc looks a lot like Doc!), it meant that renderList had to account for Doc constructors that weren’t meant to occur after calling unfoldD by throwing a runtime error. The situation became more complicated with the introduction of ANSI styling: we neglected at first to account for Styled and Linked docs with inner Concats in unfoldD, which ultimately broke line-breaking in some situations when styled text appeared at the end of the line. Unfolding one Styled Concat into many Styled documents in the result was somewhat plausible, but unaesthetic and seemed like it would be hard to make correct.

Now we have FlatDoc, which is in effect an “intermediate representation” for the Doc “interpreter”. The general design is that any Doc can be turned into a list of FlatDocs that carry equivalent information. Doc constructors without an “inner” Doc that they modify have more or less direct equivalents. There’s no FlatDoc Empty or Concat constructors, because these things are going to live in a list. The equivalents to the constructors that have inner Docs instead have a NonEmpty of FlatDocs. So really these FlatDocs aren’t completely flat, they’re just flat enough for our purposes.

The main actual point of doing this is to replace the nested Styled and Linked Docs, which form a more complicated tree structure than previously existed in DocLayout, with FStyleOpen/FStyleClose and FLinkOpen/FLinkClose pairs, surrounding their flattened inner contents. This makes it much simpler to measure the next printable non-space span that follows a breaking space when that span happens to be styled.

Since FlatDocs aren’t completely flat, just mostly flat, there’s still some contrived situations that can be measured incorrectly, which have always existed, for example:

ghci> let p = "hi" <+> (prefixed "x" "mom")
ghci> render (Just 2) p
"hi\nxmom"
ghci> render (Just 3) p
"hi mom"

This is an arbitrary outcome, and the rendering of Docs that don’t really make sense is not a design goal of the library. Thus FlatDoc doesn’t completely unfold Prefixed, BeforeNonBlank, or Flush docs using a tag-like idea, so we can keep refactoring of the rendering implementation to a minimum and because it’s not necessary to get the fix we need for styled text.

silby added 2 commits March 22, 2024 10:36
Doc was previously pulling double-duty as the data structure constructed
by clients using the smart constructor/combinator functions (or
directly) and as the elements of the flattened structure rendered by
renderList. While this saved on duplication (FlatDoc looks a lot like
Doc!), it meant that renderList had to account for Doc constructors that
weren't meant to occur after calling unfoldD by throwing a runtime
error. The situation became more complicated with the introduction of
ANSI styling: we neglected at first to account for Styled and Linked
docs with inner Concats in unfoldD, which ultimately broke line-breaking
in some situations when styled text appeared at the end of the line.
Unfolding one Styled Concat into many Styled documents in the result was
somewhat plausible, but unaesthetic and seemed like it would be hard to
make correct.

Now we have FlatDoc, which is in effect an "intermediate representation"
for the Doc "interpreter". The general design is that any Doc can be
turned into a list of FlatDocs that carry equivalent information. Doc
constructors without an "inner" Doc that they modify have more or less
direct equivalents.  There's no FlatDoc Empty or Concat constructors,
because these things are going to live in a list. The equivalents to the
constructors that have inner Docs instead have a NonEmpty of FlatDocs.
So really these FlatDocs aren't completely flat, they're just flat
enough for our purposes.

The main _actual_ point of doing this is to replace the nested Styled
and Linked Docs, which form a more complicated tree structure than
previously existed in DocLayout, with FStyleOpen/FStyleClose and
FLinkOpen/FLinkClose pairs, surrounding their flattened inner contents.
This makes it much simpler to measure the next printable non-space span
that follows a breaking space when that span happens to be styled.

Since FlatDocs aren't completely flat, just mostly flat, there's still
some contrived situations that can be measured incorrectly, which have
always existed, for example:

    ghci> let p = "hi" <+> (prefixed "x" "mom")
    ghci> render (Just 2) p
    "hi\nxmom"
    ghci> render (Just 3) p
    "hi mom"

This is an arbitrary outcome, and the rendering of Docs that don't
really make sense is not a design goal of the library. Thus FlatDoc
doesn't completely unfold Prefixed, BeforeNonBlank, or Flush docs using
a tag-like idea, so we can keep refactoring of the rendering
implementation to a minimum and because it's not necessary to get the
fix we need for styled text.
@jgm jgm merged commit fc29e45 into jgm:master Mar 22, 2024
7 checks passed
@jgm
Copy link
Owner

jgm commented Mar 22, 2024

Looks good to me!

@silby
Copy link
Contributor Author

silby commented Mar 22, 2024

sweet. One thing I didn't explicitly note here is that now unfoldD is unused internally, and it's not called in pandoc. I'd have removed it if it weren't part of the exported API. I'm not sure if the thing that it does is useful externally, especially since it might not do what people expect when Styled and Linked docs are present. My p.o.v. is you could line it up for deprecation. But it's no skin off my back either.

@silby silby mentioned this pull request Mar 23, 2024
@jgm
Copy link
Owner

jgm commented Mar 23, 2024

Deprecating unfoldD seems like the right thing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Styled documents interact poorly with line breaking.
2 participants