Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: should we split the huge common crate? #11665

Open
TennyZhuang opened this issue Aug 14, 2023 · 3 comments
Open

Discussion: should we split the huge common crate? #11665

TennyZhuang opened this issue Aug 14, 2023 · 3 comments

Comments

@TennyZhuang
Copy link
Contributor

Background and motivation

In our early design, we chose to maintain “common” as a separate crate. This helped us avoid many potential circular dependency issues.

We tried to avoid adding codes that take much compile time in “common”, so the full compilation of “common” is not slow. However, the real problem occurs during incremental compilation. Due to the large number of dependents on “common”, any changes to “common” will trigger a lot of crate recompilation. This problem becomes very evident when we develop and continuously debug by adding a new “mod” in “common”.

The low incremental compilation increased our development cost, which guide us to rethink about the "huge-common-crate" decision. At least, there are many mods in common don't really depends other mods (except the real common ones, like data_type). Should they suffer from these meaningless re-compilation?

Step-by-step migration

A huge reorg will introduce many conflicts, and make it hard to blame via git history. We can propose a step-by-step migration.

  1. For new-added mods under common, we'd prefer make it a single crate if there are no cyclic dependencies.
  2. For single-file mods under common, like format.rs, we can just create a directory for them, and make them a single crate.
  3. For directory-based mods under common, like field_generator, we can just create a new Cargo.toml file, and specify the lib path without nested src directory, so that we don't need to move files in file system.
  4. For crates with complicated and cyclic dependencies, we can leave them in the common crate.
@github-actions github-actions bot added this to the release-1.2 milestone Aug 14, 2023
@TennyZhuang TennyZhuang changed the title Discussion: should we keep a huge common crate? Discussion: should we split the huge common crate? Aug 14, 2023
@liurenjie1024
Copy link
Contributor

I think we had a discussion in early days about "small crates vs large crates". Smaller crates increase compilation speeds, but may cause performance degration since compiler optimization performs mostly on crates. Things may become more complicated when lto is involved. Personally I prefer small crates since it makes things clearer and the optimization from compiler is hard to predict.

@xxchan
Copy link
Member

xxchan commented Aug 14, 2023

For reference, databend's organization is like

common/
  crate1/
  crate2/
meta/
  crate1/
  crate2/
query/
  crate1/
  crate2/

databendlabs/databend#6180

So we may also split other crates into smaller ones. But it doesn't necessarily help, if a crate's modules are linearly depended. e.g., Jon tried to split frontend, but

The reason why this may not work however is that there may be a very strong >> linear dependency between frontend components.

Actually, apart from perhaps batch and stream plannodes, the dependency is > basically linear. So this project will be abandoned.

But meta might be worth splitting, as it contains so many things, and compiles so slow. #9553

image

However, the real problem occurs during incremental compilation. Due to the large number of dependents on “common”, any changes to “common” will trigger a lot of crate recompilation.

It's especially good if you can avoid the common subcrate is not depended by meta.

Step-by-step migration

Similar to #9878, I would recommend "if you feel painful about that you may consider refactoring a small part...". A whole refactoring doesn't necessarily help others if they never modify those code. 😇 But some developers might also be too tolerable about slow compilation, or don't know it can be improved...


P.S., I've thought about this multiple times before. I think splitting common into smaller creates is just purely beneficial. But I didn't change it just because it's not painful for me. 🤪

Copy link
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants