[Feature] Lightweight schema change of add/drop column #10135

Closed
3 tasks done
Lchangliang opened this issue Jun 14, 2022 · 1 comment · Fixed by #10136
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@Lchangliang
Contributor

Lchangliang commented Jun 14, 2022

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Background

Adding or dropping a column is a heavyweight operation: it either goes through linkedSchemaChange or, when the data is stored in S3, copies the data. When columns are added or dropped frequently, a lot of time is wasted waiting, so we need a new way to optimize the process.

Improvement

This improvement touches three paths: read, write, and compaction. In the original implementation, BE holds the tablet schema and sets the unique ID for each column; during read, write, and compaction, BE gets the schema from the tablet meta. The core of the change is that BE gets the schema from FE on read and write, every rowset holds its own schema, and compaction uses the newest schema.
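
To illustrate why per-column unique IDs make this possible, here is a minimal sketch of reading a row written under an older rowset schema with the newest schema sent by FE. It is not the Doris implementation: `Column`, `read_row`, the dict-based row layout, and the rule that a column missing from the rowset is filled with its default value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    unique_id: int       # assigned by FE; assumed never reused after a drop
    name: str
    default: Any = None  # hypothetical: value used when the rowset lacks the column


def read_row(stored: Dict[int, Any], read_schema: List[Column]) -> Dict[str, Any]:
    """Project a stored row (keyed by column unique ID) onto the schema sent by FE.

    Columns dropped since the rowset was written are simply not requested;
    columns added since then are absent from `stored` and take their default.
    """
    return {col.name: stored.get(col.unique_id, col.default) for col in read_schema}


# Rowset written when the table had k1 (id 0), v1 (id 1), v2 (id 2).
old_row = {0: 10, 1: "a", 2: 3.5}

# Newest schema from FE: v1 was dropped, v3 (id 3) was added with default 0.
new_schema = [Column(0, "k1"), Column(2, "v2"), Column(3, "v3", default=0)]

result = read_row(old_row, new_schema)
```

Because columns are matched by ID rather than by position or name, the old data does not have to be rewritten when the schema changes.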

Modification

  1. FE generates the unique ID of each column.
  2. When reading or inserting data, FE sends the newest schema to BE.
  3. When inserting, BE persists the schema in the rowset meta.
  4. When doing compaction, BE chooses the newest schema among the input rowsets and persists it in the new rowset meta after compaction.
  5. The improvement applies only to adding or dropping value columns. Adding or dropping a key column is still done the old way.
  6. It is compatible with old tables, meaning tables that already existed before the upgrade. However, an old table always uses the old way, even when the column is a value column.
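
Steps 3 and 4 can be sketched roughly as follows. This is only an illustration, not the BE code: `RowsetMeta`, `compact`, and the `schema_version` field used to decide which schema is "newest" are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RowsetMeta:
    rowset_id: str
    schema_version: int          # hypothetical: increases with every schema change
    column_ids: Tuple[int, ...]  # unique IDs of the columns in this rowset's schema


def compact(inputs: List[RowsetMeta], output_id: str) -> RowsetMeta:
    """Build the output rowset meta using the newest schema among the inputs."""
    if not inputs:
        raise ValueError("compaction needs at least one input rowset")
    newest = max(inputs, key=lambda r: r.schema_version)
    # The chosen schema is stored with the new rowset meta.
    return RowsetMeta(output_id, newest.schema_version, newest.column_ids)


inputs = [
    RowsetMeta("rs-1", schema_version=1, column_ids=(0, 1, 2)),
    RowsetMeta("rs-2", schema_version=3, column_ids=(0, 2, 3)),
    RowsetMeta("rs-3", schema_version=2, column_ids=(0, 1, 2, 3)),
]
merged = compact(inputs, "rs-4")
```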

Result

Adding or dropping a value column becomes a lightweight operation: it does not need to rewrite the data and completes quickly.
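
The conditions under which a change takes the lightweight path can be summarized in a small decision function. This is a sketch of the rules stated in this issue only; `SchemaChangePath` and `choose_path` are made-up names, not FE APIs.

```python
from enum import Enum


class SchemaChangePath(Enum):
    LIGHT = "light"  # metadata-only change, no data rewrite
    HEAVY = "heavy"  # the old way


def choose_path(is_key_column: bool, table_existed_before_upgrade: bool) -> SchemaChangePath:
    """Pick the path for an add/drop column request, following the rules in this issue."""
    if is_key_column or table_existed_before_upgrade:
        return SchemaChangePath.HEAVY
    return SchemaChangePath.LIGHT


path_new_value = choose_path(is_key_column=False, table_existed_before_upgrade=False)
path_new_key = choose_path(is_key_column=True, table_existed_before_upgrade=False)
path_old_value = choose_path(is_key_column=False, table_existed_before_upgrade=True)
```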

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Lchangliang Lchangliang added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 14, 2022
@Lchangliang
Contributor Author

TODO:

  1. FE synchronizes unique_id from BE for old tables.
  2. Reduce memory usage when too many tablet_schemas are held in memory.
  3. Flink connector: support Light Schema Change, and optimize the case where removing data through stream load requires a header.
