Could rustybuzz be ported to run on fontations? #956

Open
rsheeter opened this issue Jun 17, 2024 · 18 comments

@rsheeter
Collaborator

https://github.com/RazrFalcon/rustybuzz has done a lot of the work to port HarfBuzz. Some of the dependencies, most notably ttf-parser, would have to go (https://github.com/RazrFalcon/rustybuzz/blob/master/Cargo.toml).

Probably worth taking a swing at to see if we find a hard blocker.

@rsheeter rsheeter added the enhancement New feature or request label Jun 17, 2024
@LaurenzV
Contributor

I'm planning on taking a look at this, unless someone else already is.

@dfrg
Member

dfrg commented Jul 1, 2024

I’ve done a local exploration of this with most lookups (non-chaining contextuals aren’t done) running through fontations with all tests passing. It’s a bit of a Frankensteinish hack because the transition is incremental but it does work. I’ll find some time soon to bring my fork up to current and publish the code so you can take a look.

We’re still missing AAT support but I have a PR up for the initial pieces at #964

Also, thank you for taking the time to bring RustyBuzz up to date!

@LaurenzV
Contributor

LaurenzV commented Jul 1, 2024

Oh, that's awesome! I just looked at it very briefly, but something that was missing for me was access to phantom points as well as non-Unicode cmaps. But it's possible that I just missed the right API for them.

@dfrg
Member

dfrg commented Jul 1, 2024

So far I’ve only focused on the lookups so the rest of the logic still runs through ttf-parser. Skrifa should do the correct thing wrt phantom points and symbol cmaps but ideally, that logic would be moved to read-fonts.

@rsheeter
Collaborator Author

rsheeter commented Jul 1, 2024

I'm planning on taking a look at this, unless someone else already is.

That would be amazing! We have to finish landing Skrifa (https://chromestatus.com/feature/5717358869217280) before we put too much focus elsewhere.

@dfrg
Member

dfrg commented Jul 2, 2024

I pushed a branch here: https://github.com/dfrg/rustybuzz/tree/fontate

We don’t currently support cmap0 with the Mac Roman encoding so a few tests failed when I hooked up our cmap but that’s an easy fix.
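For readers unfamiliar with the table: a cmap format 0 subtable is a 256-entry array of one-byte glyph IDs indexed by a one-byte character code, which is why this is a small fix. The following is a minimal sketch only, not the read-fonts or ttf-parser API; `map_cmap0` is a hypothetical helper, and it assumes the subtable uses the Mac Roman encoding and handles only the ASCII range, where Mac Roman and Unicode agree.

```rust
/// Hypothetical helper (not a real read-fonts or ttf-parser function):
/// look up a character in a cmap format 0 subtable, given its
/// 256-entry glyph ID array.
fn map_cmap0(glyph_id_array: &[u8; 256], ch: char) -> Option<u16> {
    // Assumption: only ASCII is handled here. Mac Roman matches ASCII for
    // 0x00..=0x7F; a full implementation would also translate the upper
    // half (0x80..=0xFF) through a Mac Roman lookup table.
    let code = u32::from(ch);
    if code >= 0x80 {
        return None;
    }
    match glyph_id_array[code as usize] {
        // Design choice of this sketch: glyph 0 (.notdef) means "unmapped".
        0 => None,
        gid => Some(u16::from(gid)),
    }
}
```

Returning `None` for glyph 0 is a choice made for this sketch; a real implementation may prefer to surface glyph 0 to the caller.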

As Rod said, we don’t have any scheduled time to work on this yet but I’ll hack at it occasionally and any contributions would be appreciated!

@dfrg
Member

dfrg commented Jul 2, 2024

It now uses fontations for cmap and I merged your benchmarking changes. I’ll probably spend some time over the long weekend and isolate ttf-parser usage to just the AAT tables.

@LaurenzV
Contributor

LaurenzV commented Jul 3, 2024

Sounds good!

@RazrFalcon

RazrFalcon commented Jul 11, 2024

Not sure how to phrase it with less confrontation, but why?
What benefits would it bring? Is it faster? Is it safer? Is it more tested?
I would love to hear some concrete examples.

Even more, I would love to see an in-depth ttf-parser vs fontations comparison. I understand and appreciate the NIH, but I'm really curious to learn the reasoning behind fontations.

Am I interested in swapping ttf-parser with fontations? Not really. I've spent an absurd amount of time working on ttf-parser (~6 months full-time) and it would be really sad to deprecate it.
What I'm interested in is handing rustybuzz over to someone. Or even better, deprecating it in favor of a better solution. Which doesn't exist yet.

As for rustybuzz, TrueType parsing is just the tip of the iceberg. By replacing ttf-parser with fontations we wouldn't make it easier to maintain. The hard part is understanding the harfbuzz codebase and trying to replicate it in Rust. That's the fundamental problem.

TL;DR: show me benchmarks

@rsheeter
Collaborator Author

For context I'm the eng lead for Google Fonts and the ... sponsor is perhaps the right word ... of our efforts to migrate to Rust. The rationale for the overall effort is outlined in https://github.com/googlefonts/oxidize. Ultimately we would like text processing to not require a sandbox under the rule of 2.

Not sure how to phrase it with less confrontation, but why?

I appreciate the question and do not see it as confrontational. @behdad will confirm that I'm guilty of asking why myself :D

What benefits would it bring? Is it faster? Is it safer? Is it more tested?

We are working toward shipping Skrifa in Chrome (https://chromestatus.com/feature/5717358869217280). Skrifa is built on read-fonts, as is our new font compiler, as is our nascent rewrite of hb-subset.

If I ship rustybuzz on ttf-parser I now have two implementations of font parsing (read-fonts and ttf-parser) shipping to users. I have to maintain both, fuzz both, implement new features in both, etc. My hypothesis is that it's worth the effort to avoid this state so I have to pick a winner. Maybe I'm guilty of NIH but the winner I currently choose is read-fonts.

What I'm interested in is handing rustybuzz over to someone

We volunteer! Concretely, I suggest that we:

  • Migrate rustybuzz to the harfbuzz org
  • Port it to run on fontations
  • Commit to keeping it up to date with the C++ edition
    • In my fantasy world we migrate all users to the memory safe version but that will take a long time
  • Seek to guide users to prefer it as part of a memory safe text stack

Or even better, deprecating it in favor of a better solution. Which doesn't exist yet.

I would be very interested in your thoughts on what would be different in a better solution. Bear in mind that the more the api or results differ from harfbuzz the harder it will be to migrate clients.

The hard part is understanding the harfbuzz codebase and trying to replicate it in Rust. That's the fundamental problem.

I agree wrt this being the hard part. I think with @behdad on board we can do this. I hope we can make it easier to understand but that remains to be seen.

@LaurenzV
Contributor

LaurenzV commented Jul 12, 2024

What benefits would it bring? Is it faster? Is it safer? Is it more tested?

Maybe I can share some of the advantages I see in both crates.

Advantages of ttf-parser

I've spent an absurd amount of time working on ttf-parser (~6 months full-time) and it would be really sad to deprecate it.

That's naturally very understandable, but I don't think any of this would actually imply having to deprecate ttf-parser (if by deprecate you mean archiving it?). As far as I can tell, the considerable advantages of using ttf-parser over skrifa (since ttf-parser also provides outlining, I think it's better to compare those two) are:

  • Light-weightedness: It's a much more light-weight dependency, as it has zero dependencies. skrifa unfortunately indirectly relies on bytemuck_derive, which in turn relies on syn, so it is comparatively heavy unfortunately. And I don't think they would change that (see "Consider using bytemuck as replacement for read_fonts::FromBytes", #808). So people who care about that (which probably includes you) would probably still prefer ttf-parser.
  • Configurability: It's much more configurable since you can disable things like variable fonts, apple layout, opentype layout, etc. As far as I can tell, you can't configure that in skrifa/read_fonts. With that said, it seems like skrifa is still a more lightweight dependency in terms of binary size? Take this with a big grain of salt as maybe I'm doing something wrong; all I did was compile the following code in release mode:
fn main() {
    let font_data: Vec<u8> = vec![];
    // With ttf-parser
    // let face = ttf_parser::Face::parse(&font_data, 0).unwrap();
    // With skrifa
    // let face = skrifa::FontRef::from_index(&font_data, 0).unwrap();
}

With ttf-parser using ttf-parser = { path = "../../GitHub/ttf-parser", default-features = false, features = ["no-std-float"]} I get a binary of 406KB, while with skrifa I get 390KB. But perhaps I'm missing something obvious here or things change once you start to use more features of each crate, so take it with a grain of salt.

  • Tooling: At least as of right now, ttf-parser seems to have much more traction and tooling around it, while skrifa is mainly used by vello and a few others. Although lots of the ttf-parser dependents are presumably there because of rustybuzz. Although for fontations we also have write-fonts and fontc.
  • Flexibility: At least to me it seems you are always very active on GitHub and quick to respond to PRs, and I've seen a couple of instances where people asked for a new release and you were always willing to do that, which is much appreciated! I could be very wrong about this, but I imagine we might lose some of this flexibility if we switched to skrifa since there is probably some overhead to making new releases for them considering that it's used in Chrome and might need some additional vetting? And there are some other crates like write-fonts which probably need to be updated in sync. But this is just a guess from my side.

Advantages of skrifa/read-fonts

  • More active development: I mean, to be honest, if we look at the commit history of ttf-parser, there are barely any contributions outside of you and me in the last year. Sure, I guess you could argue that ttf-parser is already pretty much feature-complete, but there are still lots of things that could be done to improve the crate (more testing, more APIs, improvements to COLRv1, etc.) and it doesn't seem like there is a lot of interest in contributing to that. Because of this I'm also not sure if someone would implement the bigger changes that could be coming to fonts at some point from the boring expansion spec. fontations on the other hand is being actively developed and improved by multiple people working full-time on it, and sure, there is no guarantee that it will always stay this way, but it feels to me like this crate has higher odds of also supporting the future improvements, especially if the harfbuzz folks will be involved, too.
  • Features/API: I barely have any insight into all the APIs provided by skrifa/read-fonts, but some of the major features that are available are hinting and outline scaling, which maybe are not crucial to have for some, but definitely nice features. And sure, perhaps those could also be added to ttf-parser, but I doubt that anyone will, especially hinting because it's very complex to implement. And overall, while, as mentioned, I haven't used skrifa that much before, I must say I really enjoyed the provided API and data types and it feels very robust, but that's obviously very subjective.
  • Testing: We've discussed this before, ttf-parser by itself is to a large degree untested and to some degree relies on the rustybuzz test suite. read-fonts has pretty extensive tests itself, and is also constantly being fuzz-tested from what I gathered. I would argue that this is a pretty nice advantage, too. Perhaps in the future, rustybuzz could get that kind of fuzz-testing, too, if it's integrated into the fontations stack.
  • Potential for tooling: ttf-parser is basically a combination of skrifa and read-fonts, where it provides some general high-level API and also some mid-level APIs for the specific tables. From what I gathered, skrifa only provides a high-level API, while read-fonts provides a relatively low-level API (and maybe mid-level, too?). I think that this could make it easier to build additional tooling on top of it in the future. For example, a while ago I reworked a font subsetter for PDFs, and I basically ended up having to copy parts of the ttf-parser API because it didn't give me enough low-level access to some of the tables. I think it would've been possible when using read-fonts instead, although I haven't tried.

Conclusion

In principle, I think what the whole discussion boils down to is that I (and I think many other people too) would love to see the Rust ecosystem converge towards a single and unified set of crates with proper backing that can be used for different font-processing tasks, instead of having a fragmented set of crates that tackle different parts of the font stack, but can't really work together. It would also allow us to join efforts on improving that one single set of crates, instead of working on two different ones that more or less do the same.

And again, I really hope you don't interpret this as trying to discredit your work, the fact that you managed to build ttf-parser and rustybuzz (as well as all your other crates) nearly all by yourself is nothing short of amazing, having contributed to both of them in the last few months, I know how much work this must have been for you, and I'm happy that I've also been able to contribute at least a bit to making those crates better. :) As I mentioned, ttf-parser does have some nice advantages that people might care about, so it could be worth it to involve maintainers from some of the other major crates that depend on it in the discussion. But if you ask me whether ttf-parser or fontations has more potential to become the center of that unified font-processing stack, at least to me it seems like fontations is currently in a much better position to be that.

With that said, I don't want to try to force anything here, perhaps an option for now is just to create a fork of rustybuzz that runs on fontations so that crates using that font stack have a text-shaping solution, and see where things develop from there. But as I mentioned, it would be nice to avoid fragmentation, and I have the feeling that doing this might just result in rustybuzz becoming abandoned again (at least I don't feel like porting the same changes to two different crates...).

I hope that this can help a bit in the discussion.

P.S.: And as for speed, no idea whether one crate is faster than the other, the only way to find out would be to try the benchmarks once we have a working version running on fontations.

PP.S.: And if at some point you do think we should switch, I can definitely take care of trying to port it in the resvg stack of crates.

@RazrFalcon

@rsheeter as the man in charge from the Google side I will ask you directly, because I'm super curious: why not ttf-parser?
Again, I get the NIH, but I would like to know whether there were any technical reasons for this.
I know that ttf-parser has a couple questionable design decisions that some people do not like, but fontations went mostly the same route.
At a quick glance fontations is fundamentally identical to ttf-parser. The only two differences I see are:

  • code generation for some tables, which I'm very skeptical about
  • a bit more low-level access to internals; ttf-parser never exposes offsets and raw data by design

Maybe I'm missing something big? I would love to know.
The only thing I can think of is that read-fonts might be a bit more suitable for subsetting, which ttf-parser was never designed for. But I'm not sure if this was impossible to implement.

In the end, ttf-parser vs read-fonts fragmentation isn't great. I would even say harmful for the Rust ecosystem. And we all know who would win (Google).

The rationale for the overall effort is outlined in https://github.com/googlefonts/oxidize.

Well, C/C++ == bad isn't something you have to convince me of 😄

Ultimately we would like text processing to not require a sandbox under the rule of 2.

That was my goal with ttf-parser + rustybuzz as well. But unlike fontations you can use them for 4 years now and fontations is still years behind. Not only TrueType parsing isn't finished yet, shaping isn't even started to my knowledge.

Skrifa is built on read-fonts, as is our new font compiler, as is our nascent rewrite of hb-subset.
If I ship rustybuzz on ttf-parser I now have two implementations of font parsing (read-fonts and ttf-parser) shipping to users.

I'm sorry, but isn't this just NIH? If Skrifa was implemented on top of ttf-parser it wouldn't be a problem.
Also, aren't you still calling harfbuzz before subsetting? Meaning that each Rust subsetting call goes through C code anyway? How does it solve memory safety then?
Wouldn't the more logical order be to write parsing -> shaping -> subsetting in Rust? Aka effectively write just a subsetter, because the first two already exist.

We volunteer! Concretely, I suggest that we:

Once again, swapping ttf-parser with fontations and moving it to the harfbuzz org would not change much. That's not the hard part of maintaining rustybuzz.
The only solution is to rewrite harfbuzz itself in Rust.

Not to mention that fontations does not have feature parity with ttf-parser at the moment, afaik.
Also, benchmarks.

I would be very interested in your thoughts on what would be different in a better solution.

By a "better solution" I do not mean architecture wise. I'm not a shaping expert to criticize harfbuzz for its shaping logic.
My "better solution" is to have a pure Rust harfbuzz alternative which everyone uses (including outside the Rust ecosystem). Aka harfbuzz in Rust.
If Google switches to a new shaper, everyone else will.

Bear in mind that the more the api or results differ from harfbuzz the harder it will be to migrate clients.

Luckily for us the harfbuzz API is trivial. And no one really uses it directly, so only the wrapper libraries would have to adapt. Not an issue.


In the end, all I care about is to have a good shaping library written in Rust that I can use in resvg. And right now, my only option is unfortunately rustybuzz.

I would love to see a PR that would swap ttf-parser with read-fonts in rustybuzz. I'm curious what it will look like.

@RazrFalcon

@LaurenzV

skrifa unfortunately indirectly relies on bytemuck_derive, which in turn relies on syn, so it is comparatively heavy unfortunately.

Ugh... None of my projects will ever use or depend on proc-macros. I will die on this hill.

Also, one of the main reasons why many people use/prefer ttf-parser is because it has zero dependencies and near instant compile times.
It takes ~1.744s to build ttf-parser in Release on M1 Pro vs ~9.284s for skrifa.

With that said, it seems like skrifa is still a more lightweight dependency in terms of binary size?

Well, the difference is tiny and skrifa has fewer features. No AAT at the moment, afaik. So not an apples-to-apples comparison.

I mean, to be honest, if we look at the commit history of ttf-parser, there are barely any contributions outside of you and me in the last year. Sure, I guess you could argue that ttf-parser is already pretty much feature-complete, but there are still lots of things that could be done to improve the crate

Well, while I had basically zero free time in the past couple of years, ttf-parser is indeed an exception: it is complete.
Yes, performance tweaks, edge-cases, testing, sure. But it's done. Finished.
The only reason fontations is more active is because it isn't finished yet.
It's a bad metric.

I will not argue that a company-backed development is more favorable, but it can randomly halt as well.

especially hinting because it's very complex to implement

Is it? I genuinely have no idea how hinting works. And I do not know which one fontations implements. Because there are CFF hints and glyf assembly instructions. Maybe more.
Also, parsing and rendering are two separate steps to my knowledge. And rendering is supposed to be the hard part, which does affect us. No idea.


Again, this discussion boils down to: we have two implementations of a very complicated file format and we have no idea how to compare them, because one side doesn't know anything about the other one.
How can we compare two libs when I still have no idea what parts of the spec are supported by fontations? I would love to see a table like the one in the ttf-parser readme to have at least some high-level understanding.

@LaurenzV
Contributor

LaurenzV commented Jul 13, 2024

That was my goal with ttf-parser + rustybuzz as well. But unlike fontations you can use them for 4 years now and fontations is still years behind.

From what I understood it should support everything that ttf-parser supports except for AAT. Support for some AAT tables seems to have been merged recently, although I'm not sure whether it contains everything needed for rustybuzz. But still, it doesn't seem to me like read-fonts is years behind.

Not only TrueType parsing isn't finished yet, shaping isn't even started to my knowledge.

Shaping indeed hasn't started, but I think the whole point is to use rustybuzz as a basis for that?

The only solution is to rewrite harfbuzz itself in Rust.

Again, isn't that exactly what rustybuzz is? Or are you talking about a complete rewrite, i.e. only the core shaping logic stays the same but the implementation is written 100% with Rust in mind? I'm not exactly sure what you mean.

Also, one of the main reasons why many people use/prefer ttf-parser is because it has zero dependencies and near instant compile times.
It takes ~1.744s to build ttf-parser in Release on M1 Pro vs ~9.284s for skrifa.

I agree, that's why I pointed it out as a clear disadvantage for skrifa.

Well, the difference is tiny and skrifa has fewer features. No AAT at the moment, afaik. So not an apples-to-apples comparison.

That's why I passed no-default-features to ttf-parser, as this (should?) exclude AAT. But yes, it's possible I'm missing something else.

I will not argue that a company-backed development is more favorable, but it can randomly halt as well.

I know, that's why I wrote there is no guarantee it will stay this way.

Is it? I genuinely have no idea how hinting works. And I do not know which one fontations implements. Because there are CFF hints and glyf assembly instructions.

I don't know either, but considering the amount of code that was necessary to implement it, it does look pretty complex. But it seems to support both CFF and glyf hints.

But perhaps it's just best to defer more in-depth discussion until there is a working fork of rustybuzz running on fontations.

@RazrFalcon

But still, it doesn't seem to me like read-fonts is years behind.

I never said read-fonts. I was talking about fontations as a ttf-parser + rustybuzz alternative.

Shaping indeed hasn't started, but I think the whole point is to use rustybuzz as a basis for that?

No one will be swapping HB with RB in Chrome. Unless someone will be able to prove it has an identical output, which would be very hard.

Again, isn't that exactly what rustybuzz is?

Ok, "rewrite" was a vague word. I meant replace. Aka there should be no HB in C.

But yes, it's possible I'm missing something else.

Strange. There is no reason for ttf-parser to be bigger.

I don't know either, but considering the amount of code that was necessary to implement it, it does look pretty complex.

Damn... Half of it seems to be tests, but still a lot.

@rsheeter
Collaborator Author

FreeType is a bigger pain point for me than HarfBuzz. That is why FreeType => Skrifa is our first step, meant to ship in the not too distant future (https://chromestatus.com/feature/5717358869217280).

No one will be swapping HB with RB in Chrome. Unless someone will be able to prove it has an identical output, which would be very hard.

We specifically want to reach a position where this is plausible. We aren't there yet but we can get there. If we start from RustyBuzz we can get there significantly faster than if we start from scratch.

To make swapping HB for RB plausible I believe we need to:

  • Swap out ttf-parser for read-fonts
    • By the time we get serious about shaping read-fonts will be shipping to billions of users
    • I don't want to ship both to billions of users, I need to pick a winner
    • All our Rust font processing and generation (read-fonts has a write-fonts twin generated from the same definitions) sits on fontations; from my pov swapping out ttf-parser makes the most sense. I don't believe this will be a particularly large chunk of work.
    • We intend to work on this after Skrifa lands (unless @LaurenzV beats us to it)
  • Establish a test suite to demonstrate RB generally produces the same output as HB
    • As @RazrFalcon notes, this may be a significant chunk of work
  • Establish a perf suite to demonstrate RB is not significantly slower
    • Based on "At the moment, performance isn't that great. We're 1.5-2x slower than harfbuzz." (here) I assume we'll have to invest in trying to close the gap.
  • Establish a fuzzer suite to demonstrate RB is not a security risk
    • HB has been fuzzed for at least 7.5 years, presumably we can lean on its prior art
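One way to picture the "same output as HB" test suite above is a differential harness: shape each input with both shapers and collect the inputs where they disagree. What follows is a minimal sketch under stated assumptions, not the rustybuzz or HarfBuzz API. `ShapedGlyph`, `find_mismatches`, and the two toy shapers are hypothetical stand-ins; a real harness would call the actual shapers and compare their full glyph and position output.

```rust
/// Hypothetical, simplified shaping result: one entry per output glyph.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ShapedGlyph {
    glyph_id: u32,
    cluster: u32,
    x_advance: i32,
}

/// Runs two shapers over the same inputs and returns the inputs on which
/// their output differs.
fn find_mismatches<'a, A, B>(inputs: &[&'a str], reference: A, candidate: B) -> Vec<&'a str>
where
    A: Fn(&str) -> Vec<ShapedGlyph>,
    B: Fn(&str) -> Vec<ShapedGlyph>,
{
    inputs
        .iter()
        .copied()
        .filter(|text| reference(*text) != candidate(*text))
        .collect()
}

/// Toy stand-in for the reference shaper: one glyph per character.
fn toy_reference(text: &str) -> Vec<ShapedGlyph> {
    text.chars()
        .enumerate()
        .map(|(i, c)| ShapedGlyph {
            glyph_id: c as u32,
            cluster: i as u32,
            x_advance: 500,
        })
        .collect()
}

/// Toy stand-in for the candidate shaper, with a deliberate bug:
/// it reports a different advance for 'f'.
fn toy_candidate(text: &str) -> Vec<ShapedGlyph> {
    let mut glyphs = toy_reference(text);
    for glyph in &mut glyphs {
        if glyph.glyph_id == 'f' as u32 {
            glyph.x_advance = 450;
        }
    }
    glyphs
}
```

With these toy shapers, `find_mismatches(&["abc", "fi", "xyz"], toy_reference, toy_candidate)` reports only `"fi"`. Returning the failing inputs rather than a pass/fail flag makes each mismatch easy to reduce into a regression test.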

@RazrFalcon I appreciate your arguments wrt ttf-parser vs read-fonts. I suggest we simply agree to disagree, I don't really see either of us convincing the other.

IMHO testing and performance are likely to be where the bulk of the remaining work is. I'm optimistic we can get both done and intend to invest in it after Skrifa lands.

@LaurenzV
Contributor

For future reference, here is what I recently did to test correctness and find bugs, maybe it can serve as a source of inspiration for whatever you plan on trying once you get to it: harfbuzz/rustybuzz#126 (comment) :)

I was able to find quite a few bugs this way and there are still a few unfixed ones remaining, although I don't know when I'll have time to continue working on it.

@RazrFalcon

Will wait for fontations PR to rustybuzz then. Not much to discuss before that.
