Discussion: should we split the huge common crate? #11665

TennyZhuang · 2023-08-14T09:30:09Z

Background and motivation

In our early design, we chose to maintain “common” as a separate crate. This helped us avoid many potential circular dependency issues.

We tried to avoid adding code that takes a long time to compile to “common”, so a full compilation of “common” is not slow. However, the real problem occurs during incremental compilation. Because so many crates depend on “common”, any change to “common” triggers recompilation of a lot of crates. This problem becomes very evident when we develop and continuously debug a new “mod” added to “common”.

The slow incremental compilation increases our development cost, which leads us to rethink the "huge-common-crate" decision. At the very least, many mods in common don't really depend on other mods (except the truly common ones, like data_type). Should they suffer from these meaningless recompilations?
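To make the cost concrete, here is a minimal sketch (not our actual build graph) that computes which crates become dirty after an edit. The crate names and the `rebuild_set` helper are hypothetical, and the model is deliberately coarse: it assumes a crate is rebuilt whenever anything it depends on changes.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the crate that was edited plus all of its transitive dependents.
/// `deps` maps a crate name to the crates it depends on.
fn rebuild_set<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
) -> BTreeSet<&'a str> {
    let mut dirty = BTreeSet::new();
    dirty.insert(changed);
    // Keep marking dependents until no new crate becomes dirty.
    loop {
        let mut grew = false;
        for (krate, its_deps) in deps {
            if !dirty.contains(krate) && its_deps.iter().any(|d| dirty.contains(d)) {
                dirty.insert(*krate);
                grew = true;
            }
        }
        if !grew {
            break;
        }
    }
    dirty
}

fn main() {
    // Layout A (hypothetical): one big `common` that everything depends on.
    let mut big: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    big.insert("common", vec![]);
    big.insert("storage", vec!["common"]);
    big.insert("stream", vec!["common", "storage"]);
    big.insert("frontend", vec!["common"]);
    big.insert("meta", vec!["common"]);

    // Layout B (hypothetical): `format` split out into its own crate,
    // used only by `frontend`.
    let mut split = big.clone();
    split.insert("common_format", vec![]);
    split.insert("frontend", vec!["common", "common_format"]);

    // Editing anything inside the big crate dirties all 5 crates.
    println!("edit common: {:?}", rebuild_set(&big, "common"));
    // Editing the split-out crate dirties only itself and `frontend`.
    println!("edit common_format: {:?}", rebuild_set(&split, "common_format"));
}
```

In this toy graph, touching the formatting code goes from dirtying every crate to dirtying two. The real saving depends on how many dependents each split-out mod actually has.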

Step-by-step migration

A huge reorg would introduce many merge conflicts and make it hard to trace changes with git blame. Instead, we can do a step-by-step migration.

For newly added mods under common, we'd prefer to make each one a separate crate if that introduces no cyclic dependencies.
For single-file mods under common, like format.rs, we can just create a directory for each and make it a separate crate.
For directory-based mods under common, like field_generator, we can just add a new Cargo.toml file and specify the lib path without a nested src directory, so that we don't need to move files in the file system.
For crates with complicated and cyclic dependencies, we can leave them in the common crate.
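
For the directory-based case, a manifest along these lines might be enough. This is only a sketch: the crate name is hypothetical, and it assumes the module's root file is `mod.rs` and that the new Cargo.toml sits next to it inside the existing field_generator directory.

```toml
# Hypothetical manifest placed inside the existing field_generator directory.
[package]
name = "common_field_generator" # hypothetical crate name
version = "0.1.0"
edition = "2021"

[lib]
# Use the existing module root as the crate root, so no files move into src/.
# Assumes the module root is mod.rs; the path is relative to this Cargo.toml.
path = "mod.rs"

[dependencies]
# List only what this module actually uses.
```

The new directory would also need to be listed as a workspace member, and any `crate::` paths inside the module that point at other parts of common would have to refer to the external crate instead.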


liurenjie1024 · 2023-08-14T11:07:35Z

I think we had a discussion in the early days about "small crates vs large crates". Smaller crates speed up compilation, but may cause performance degradation, since compiler optimizations mostly operate within a single crate. Things may become more complicated when LTO is involved. Personally I prefer small crates, since they make things clearer and the effect of compiler optimizations is hard to predict.

xxchan · 2023-08-14T15:45:30Z

For reference, databend's organization is like

common/
  crate1/
  crate2/
meta/
  crate1/
  crate2/
query/
  crate1/
  crate2/

databendlabs/databend#6180

So we may also split other crates into smaller ones. But it doesn't necessarily help if a crate's modules depend on each other linearly. E.g., Jon tried to split frontend, but:

> The reason why this may not work however is that there may be a very strong linear dependency between frontend components.

> Actually, apart from perhaps batch and stream plannodes, the dependency is basically linear. So this project will be abandoned.
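
A rough way to see why a linear chain gains little: with unlimited parallelism, wall-clock build time is bounded below by the most expensive dependency path. The sketch below uses made-up crate names and costs, and `critical_path` is a hypothetical helper. It has no memoization, so it only suits tiny graphs.

```rust
use std::collections::HashMap;

/// Cost of the most expensive dependency path ending at `krate`.
/// `deps` maps a crate to the crates it depends on; crates with no
/// entry are treated as having no dependencies.
fn critical_path(
    krate: &str,
    cost: &HashMap<&str, u32>,
    deps: &HashMap<&str, Vec<&str>>,
) -> u32 {
    let longest_dep = deps
        .get(krate)
        .into_iter()
        .flatten()
        .map(|d| critical_path(d, cost, deps))
        .max()
        .unwrap_or(0);
    cost[krate] + longest_dep
}

fn main() {
    // Linear chain (made-up names and costs): plan -> bind -> parse.
    let cost: HashMap<&str, u32> = HashMap::from([("parse", 10), ("bind", 20), ("plan", 30)]);
    let deps: HashMap<&str, Vec<&str>> =
        HashMap::from([("bind", vec!["parse"]), ("plan", vec!["bind"])]);
    // 10 + 20 + 30 = 60: the same as compiling everything as one crate.
    println!("linear chain: {}", critical_path("plan", &cost, &deps));

    // Fan-out: top depends on three independent crates.
    let cost: HashMap<&str, u32> =
        HashMap::from([("top", 5), ("x", 10), ("y", 20), ("z", 30)]);
    let deps: HashMap<&str, Vec<&str>> = HashMap::from([("top", vec!["x", "y", "z"])]);
    // 5 + max(10, 20, 30) = 35, versus 65 if built serially.
    println!("fan-out: {}", critical_path("top", &cost, &deps));
}
```

In the linear case, a change near the bottom of the chain also still dirties everything above it, so splitting buys neither parallelism nor smaller rebuilds.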

But meta might be worth splitting, as it contains so many things and compiles so slowly. #9553

> However, the real problem occurs during incremental compilation. Due to the large number of dependents on “common”, any changes to “common” will trigger a lot of crate recompilation.

It's especially good if you can arrange for a common subcrate not to be depended on by meta.

> Step-by-step migration

Similar to #9878, I would recommend "if you feel painful about that you may consider refactoring a small part...". A whole refactoring doesn't necessarily help others if they never modify that code. 😇 But some developers might also be too tolerant of slow compilation, or not know it can be improved...

P.S., I've thought about this multiple times before. I think splitting common into smaller crates is purely beneficial. But I didn't change it, just because it's not painful for me. 🤪

github-actions · 2024-06-12T09:01:04Z

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

TennyZhuang added the type/feature label Aug 14, 2023

github-actions bot added this to the release-1.2 milestone Aug 14, 2023

TennyZhuang added type/refactor needs-discussion and removed type/feature labels Aug 14, 2023

TennyZhuang changed the title ~~Discussion: should we keep a huge common crate?~~ Discussion: should we split the huge common crate? Aug 14, 2023

TennyZhuang modified the milestones: release-1.2, future-release-1.4 Sep 11, 2023

xxchan mentioned this issue Oct 17, 2023

fix(java_binding): fix java binding TLS allocation failure #12862

Merged

8 tasks

fuyufjh removed this from the release-1.4 milestone Nov 8, 2023

github-actions bot added the no-issue-activity label Jun 12, 2024
