Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-5829: Support field comment. #5952

Merged
merged 6 commits into from
Jun 17, 2022
Merged

ISSUE-5829: Support field comment. #5952

merged 6 commits into from
Jun 17, 2022

Conversation

RinChanNOWWW
Copy link
Contributor

@RinChanNOWWW RinChanNOWWW commented Jun 13, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  • Table comment is already supported.
  • Support field comment for table under the new planner.

image

Currently, the comment is stored in DataField:

pub struct DataField {
    name: String,
    /// default_expr is serialized representation from PlanExpression
    default_expr: Option<String>,
    #[ignore_malloc_size_of = "insignificant"]
    data_type: DataTypeImpl,
    comment: Option<String>,
}

@youngsofun suggested:

just store column comments in tableInfos in catelog . don`t need to carry it in Schema. #5939 (reply in thread)

I don't know which one is better. I think the size in storage and the retrieving efficiency is the same.

And I only implemented it under the new parser and planner, should I implement it in the old one? If so, it seems need to modify sqlparser-rs.

==================

2022-06-14 12:54 Update:

Put field comments in TableMeta:

pub struct TableMeta {
    pub schema: Arc<DataSchema>,
    pub engine: String,
    pub engine_options: BTreeMap<String, String>,
    pub options: BTreeMap<String, String>,
    pub cluster_key: Option<String>,
    // The vector of cluster keys.
    pub cluster_keys: Vec<String>,
    // The default cluster keys id.
    pub default_cluster_key_id: Option<u32>,
    pub created_on: DateTime<Utc>,
    pub updated_on: DateTime<Utc>,
    pub comment: String,
    pub field_comments: Vec<String>,

    // if used in CreateTableReq, this field MUST set to None.
    pub drop_on: Option<DateTime<Utc>>,
    pub statistics: TableStatistics,
}

Like MySQL, if comment is an empty string(''), ignore it:

image

Changelog

  • New Feature

Related Issues

Fixes #5829

@vercel
Copy link

vercel bot commented Jun 13, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) Jun 17, 2022 at 6:18AM (UTC)

@mergify
Copy link
Contributor

mergify bot commented Jun 13, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Jun 13, 2022
@youngsofun
Copy link
Member

youngsofun commented Jun 13, 2022

for example, a schema is used not only for tables but also for results and intermedia results of query and used in the plan, block etc.
but only the user-defined tables need comments.
even for user-defined tables, comments are not related to any computing.
it is only used when desc table.

@BohuTANG BohuTANG requested review from leiysky and andylokandy June 13, 2022 13:37
@RinChanNOWWW
Copy link
Contributor Author

for example, a schema is used not only for tables but also for results and intermedia results of query and used in the plan, block etc. but the user-defined tables need comments. even for user-defined tables, comments are not related to any computing. it is only used when desc table.

I see. I will change it later.

@RinChanNOWWW RinChanNOWWW marked this pull request as draft June 13, 2022 14:41
@RinChanNOWWW RinChanNOWWW marked this pull request as ready for review June 14, 2022 06:00

-- create table with column comment
-- SELECT '====CREATE TABLE WITH COLUMN COMMENT====';
-- CREATE TABLE column_comment_test (a INT COMMENT 'comment for a', b FLOAT NULL DEFAULT 0 COMMENT 'comment for b');
Copy link
Member

@BohuTANG BohuTANG Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why disable the test here?

Copy link
Contributor Author

@RinChanNOWWW RinChanNOWWW Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the test_stateless_cluster_* will fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should deep dive why the cluster has errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is because #4544 is not finished.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why the create table need cluster, what's the error is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR 1105 (HY000) at line 62: Code: 1022, displayText = COMMENT 'comment for a' column option is not supported, please do not specify them in the CREATE TABLE statement.
ERROR 1105 (HY000) at line 63: Code: 1025, displayText = Unknown table 'column_comment_test'.

This will not happen in standalone test.

Do standalone and cluster use different parsers?

Copy link
Contributor Author

@RinChanNOWWW RinChanNOWWW Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is because the sqls will not be parsed and executed on the same node, and set enable_planner_v2 = 1 don't take effect on every node?

And CREATE TABLE column_comment_test (a INT COMMENT 'comment for a', b FLOAT NULL DEFAULT 0 COMMENT 'comment for b') is parsed on a node using planner_v1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RinChanNOWWW I think the reason is that we won't enable new planner in cluster mode for now, see https://github.com/datafuselabs/databend/blob/04a6c54fb958e424d2cacdb06eec09e0360525eb/query/src/servers/mysql/mysql_interactive_worker.rs#L301-L302

It is because new planner haven't implemented distributed query yet, which we are working on recently.

To resolve the test issue, you can create a particular result file for cluster mode, just as https://github.com/datafuselabs/databend/blob/main/tests/suites/0_stateless/04_explain/04_0000_explain_cluster.result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip this like ./databend-test '^0[^4]_' --mode 'cluster' --run-dir 0_stateless --skip '02_0057_function_nullif'

@RinChanNOWWW
Copy link
Contributor Author

RinChanNOWWW commented Jun 15, 2022

image

It seems that the locations of error messages are different between test_stateless_cluster_linux and test_stateless_cluster_macos...

If it is better to comment the tests at this time...

@leiysky
Copy link
Contributor

leiysky commented Jun 15, 2022

If it is better to comment the tests at this time...

I think so. @BohuTANG What do you think of this?

@BohuTANG
Copy link
Member

There are conflicts files

If it is better to comment the tests at this time...

I think so. @BohuTANG What do you think of this?

Ok.
Conflict files :) @RinChanNOWWW

@RinChanNOWWW
Copy link
Contributor Author

Conflicts solved.

@BohuTANG BohuTANG requested a review from drmingdrmer June 16, 2022 05:22
@@ -183,6 +183,7 @@ pub struct TableMeta {
pub created_on: DateTime<Utc>,
pub updated_on: DateTime<Utc>,
pub comment: String,
pub field_comments: Vec<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is field_comments a vec of exactly the same number of elements as schema?

If it is, does it mean it does not distinguish between an empty string comment and no comment at all?

Copy link
Contributor Author

@RinChanNOWWW RinChanNOWWW Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is field_comments a vec of exactly the same number of elements as schema?

Yes.

If it is, does it mean it does not distinguish between an empty string comment and no comment at all?

Empty string comment will be considered as no comment, just like MySQL.

image

// compatibility: creating table in the old planner will not have `fields_comments`
let comment = if field_comments.len() == n_fields && !field_comments[idx].is_empty()
{
format!(" COMMENT '{}'", &field_comments[idx])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the field_comments[idx] need to be escaped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to be escaped

What does this mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. when there is a single quote ' in the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't consider this. I'll take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create table t (a int comment 'aaa\'bbb\'ccc');
show create table t;

MySQL's result:

+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                         |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` int DEFAULT NULL COMMENT 'aaa''bbb''ccc'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+

MySQL split the string.

This PR's result:

+-------+------------------------------------------------------------------+
| Table | Create Table                                                     |
+-------+------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` INT COMMENT 'aaa'bbb'ccc'
) ENGINE=FUSE |
+-------+------------------------------------------------------------------+

Should we be the same with MySQL's behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do not need to follow MySQL style, 'aaa''bbb''ccc' is hard to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we should be able to create a new table with this format SQL.

Copy link
Contributor Author

@RinChanNOWWW RinChanNOWWW Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean create table t (a int comment 'aaa''bbb''ccc'); ? This will fail during the parse phase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so I suggest to display as create table t (a int comment 'aaa\'bbb\'ccc');, thus we can use this SQL to create table in another instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the expected show-create-table result is:

+-------+------------------------------------------------------------------+
| Table | Create Table                                                     |
+-------+------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` INT COMMENT 'aaa\'bbb\'ccc'
) ENGINE=FUSE |
+-------+------------------------------------------------------------------+

right?

I'll update tomorrow.

@@ -114,6 +114,7 @@ fn new_table_info() -> mt::TableInfo {
created_on: Utc.ymd(2014, 11, 28).and_hms(12, 0, 9),
updated_on: Utc.ymd(2014, 11, 29).and_hms(12, 0, 10),
comment: s("table_comment"),
field_comments: vec![],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test has to add some field comments and check if it encodes/decodes the table meta correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

48, 57, 32, 85, 84, 67, 170, 1, 23, 50, 48, 49, 52, 45, 49, 49, 45, 50, 57, 32, 49, 50,
58, 48, 48, 58, 49, 48, 32, 85, 84, 67, 178, 1, 13, 116, 97, 98, 108, 101, 95, 99, 111,
109, 109, 101, 110, 116, 160, 6, 1, 160, 6, 1,
10, 10, 8, 5, 16, 6, 160, 6, 1, 168, 6, 1, 18, 3, 102, 111, 111, 26, 3, 98, 97, 114,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These data is for compatibility test that new program can load old data. Do not modify it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meta data part looks good to me.

BTW, you can increase the VER to indicate it's a new version of meta data. It is not mandatory: not changing it means it has to be compatible with all changes happened in this VER=1:
https://github.com/datafuselabs/databend/blob/400fd52134cba2cc09eacad4d6a8d18f914b8410/common/proto-conv/src/util.rs#L17

Signed-off-by: RinChanNOWWW <hzy427@gmail.com>
Signed-off-by: RinChanNOWWW <hzy427@gmail.com>
Signed-off-by: RinChanNOWWW <hzy427@gmail.com>
Signed-off-by: RinChanNOWWW <hzy427@gmail.com>
Signed-off-by: RinChanNOWWW <hzy427@gmail.com>
Signed-off-by: RinChanNOWWW <hzy427@gmail.com>
@BohuTANG BohuTANG merged commit 02a7b11 into databendlabs:main Jun 17, 2022
@RinChanNOWWW RinChanNOWWW deleted the ISSUE-5829-FIELD-COMMENT branch June 17, 2022 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: Support table & field comment
7 participants