RFC + Support of BACKUP and RESTORE statements #15274

Merged 10 commits on Apr 30, 2020

Conversation

kennytm
Contributor

@kennytm kennytm commented Mar 11, 2020

What problem does this PR solve?

Support running BR inside TiDB directly.

What is changed and how it works?

Recognize the new *ast.BRIEStmt introduced in pingcap/parser#746 and forward it to the library functions in BR. When we execute

BACKUP DATABASE `tpcc` TO 'local:///tmp/storage/';

TiDB will spawn a new BR manager, which backs up the database tpcc into the provided storage. The query blocks until the backup completes. It returns an empty set on success:

MySQL [tpcc]> backup database tpcc to 'local:///tmp/br_tpcc_32';
Empty set (58.453 sec)

and returns an error on failure:

MySQL [tpcc]> backup table tpcc.stock to 'local:///tmp/br_tpcc_30';
ERROR 8124 (HY000): Backup failed: backup meta exists, may be some backup files in the path already
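
The failure above suggests a pre-flight check on the destination. The following is only a minimal sketch of such a check for local storage, not BR's actual code. It assumes the metadata file is named `backupmeta`; `checkDestination`, `errMetaExists`, and `demoCheck` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// metaFileName is an ASSUMED name for the metadata file left by a finished backup.
const metaFileName = "backupmeta"

// errMetaExists mirrors the message shown in the PR description.
var errMetaExists = errors.New("backup meta exists, may be some backup files in the path already")

// checkDestination (hypothetical) returns errMetaExists when dir already
// contains a metadata file, nil when it does not, and any other I/O error as-is.
func checkDestination(dir string) error {
	_, err := os.Stat(filepath.Join(dir, metaFileName))
	switch {
	case err == nil:
		return errMetaExists
	case errors.Is(err, os.ErrNotExist):
		return nil
	default:
		return err
	}
}

// demoCheck runs the check on an empty temporary directory, writes a
// metadata file into it, and runs the check again.
func demoCheck() (before, after error) {
	dir, err := os.MkdirTemp("", "br_demo")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)

	before = checkDestination(dir)
	if err := os.WriteFile(filepath.Join(dir, metaFileName), []byte("x"), 0o600); err != nil {
		return before, err
	}
	after = checkDestination(dir)
	return before, after
}

func main() {
	before, after := demoCheck()
	fmt.Println("empty destination:", before)
	fmt.Println("reused destination:", after)
}
```

Checking before any data is written keeps a second backup from mixing its files into an existing archive.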

BRIE tasks must be executed sequentially. Currently, for simplicity, tasks are queued in the local server only. In the future, we will make the entire cluster share the same queue.
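
The local, sequential queue described above can be approximated with a single mutex. This is a sketch under assumptions, not the code in this PR: `taskQueue`, `run`, and `peakConcurrency` are hypothetical names, and a plain mutex does not guarantee first-in-first-out ordering of waiting tasks.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// taskQueue is a hypothetical per-server queue. The mutex serializes tasks,
// so at most one task runs at a time on this server.
type taskQueue struct {
	mu sync.Mutex
}

// run blocks until no other task is running, then executes fn.
func (q *taskQueue) run(fn func()) {
	q.mu.Lock()
	defer q.mu.Unlock()
	fn()
}

// peakConcurrency submits n tasks from n goroutines and reports the largest
// number of tasks that were observed running at the same time.
func peakConcurrency(q *taskQueue, n int) int {
	var (
		wg      sync.WaitGroup
		running int32
		peak    int32
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			q.run(func() {
				cur := atomic.AddInt32(&running, 1)
				if cur > atomic.LoadInt32(&peak) {
					atomic.StoreInt32(&peak, cur)
				}
				time.Sleep(time.Millisecond) // stand-in for real work
				atomic.AddInt32(&running, -1)
			})
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt32(&peak))
}

func main() {
	fmt.Println("peak concurrent tasks:", peakConcurrency(&taskQueue{}, 8))
}
```

Because the lock lives in one process, two TiDB servers could still run tasks at the same time, which is the limitation a cluster-wide queue would remove.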

Use SHOW BACKUP / SHOW RESTORE in another session to list the tasks:

MySQL [(none)]> show backup;
+-------------------------+---------+-------------------+---------------------+---------------------+------+
| Storage                 | State   | Progress          | Init_time           | Step_start_time     | ID   |
+-------------------------+---------+-------------------+---------------------+---------------------+------+
| local:///tmp/br_tpcc_30 | Backup  | 98.38709677419355 | 2020-04-12 23:09:03 | 2020-04-12 23:09:25 |    3 |
| local:///tmp/br_tpcc_30 | Wait    |                 0 | 2020-04-12 23:09:48 | 2020-04-12 23:09:48 |    4 |
+-------------------------+---------+-------------------+---------------------+---------------------+------+

Use KILL TIDB QUERY n to cancel a task.
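
One common way to make a long-running task cancellable by ID is to keep a `context.CancelFunc` per task. The sketch below illustrates that pattern only; it is not how TiDB wires KILL to BRIE tasks. `registry`, `start`, `kill`, `longTask`, and `killedTaskCancelled` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// registry (hypothetical) maps a task ID to the function that cancels it.
type registry struct {
	mu      sync.Mutex
	nextID  uint64
	cancels map[uint64]context.CancelFunc
}

func newRegistry() *registry {
	return &registry{cancels: make(map[uint64]context.CancelFunc)}
}

// start registers a new task and returns its ID and a cancellable context.
func (r *registry) start(parent context.Context) (uint64, context.Context) {
	ctx, cancel := context.WithCancel(parent)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	r.cancels[r.nextID] = cancel
	return r.nextID, ctx
}

// kill cancels the task with the given ID and reports whether the ID was known.
func (r *registry) kill(id uint64) bool {
	r.mu.Lock()
	cancel, ok := r.cancels[id]
	delete(r.cancels, id)
	r.mu.Unlock()
	if ok {
		cancel()
	}
	return ok
}

// longTask stands in for a backup: it works in small steps and stops early
// once ctx is cancelled.
func longTask(ctx context.Context, steps int) error {
	for i := 0; i < steps; i++ {
		if err := ctx.Err(); err != nil {
			return err
		}
		time.Sleep(time.Millisecond) // stand-in for one unit of work
	}
	return nil
}

// killedTaskCancelled starts a task, kills it by ID, and reports whether the
// task returned context.Canceled.
func killedTaskCancelled() bool {
	r := newRegistry()
	id, ctx := r.start(context.Background())
	done := make(chan error, 1)
	go func() { done <- longTask(ctx, 10000) }()
	if !r.kill(id) {
		return false
	}
	return errors.Is(<-done, context.Canceled)
}

func main() {
	fmt.Println("task stopped by kill:", killedTaskCancelled())
}
```

The task has to check its context between steps; cancellation only takes effect at those check points.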

Note: Currently, running RESTORE may leave the tables in a "non-ACID" state in which the backup archives are only partially ingested. Maybe we need to pessimistically lock the entire database?

Note: No test cases yet. What to do?

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • Run a backup of a simple W=30 TPC-C database (2 GB), drop it, and restore it from the archive.

Code changes

Side effects

Related changes

  • Need to update the documentation

Release note

  • Added the BACKUP statement to create a logical backup archive.
  • Added the RESTORE statement to restore from the backup archive. (Don't include this in the release notes yet; do so after the entire feature is complete.)

@kennytm kennytm added the status/DNM, sig/execution, release-note, and type/new-feature labels Mar 11, 2020
@kennytm kennytm force-pushed the for-backup branch 6 times, most recently from d35194b to 19fb519 Compare March 17, 2020 19:13
@kennytm kennytm removed the release-note label Mar 17, 2020
@kennytm kennytm changed the title [DNM] Support BACKUP and RESTORE statements Support BACKUP and RESTORE statements Mar 17, 2020
@kennytm kennytm force-pushed the for-backup branch 5 times, most recently from 99e3e22 to 87fb787 Compare March 17, 2020 21:50
@kennytm kennytm changed the title Support BACKUP and RESTORE statements Preliminary support of BACKUP and RESTORE statements Mar 17, 2020
@kennytm kennytm marked this pull request as ready for review March 17, 2020 21:56
@kennytm kennytm requested review from a team as code owners March 17, 2020 21:56
@ghost ghost requested review from SunRunAway, wshwsh12, francis0407 and lzmhhh123 and removed request for a team March 17, 2020 21:56
@kennytm
Contributor Author

kennytm commented Mar 17, 2020

We have a dependency problem that prevents the plugins from being run, which blocks the required "idc-jenkins-ci-tidb/build" CI.

  1. BR imports zap 1.14.0, which also means TiDB's zap version is increased to 1.14.0 too.
  2. But the plugins still use zap 1.9.1.
  3. And thus we get the "plugin was built with a different version of package go.uber.org/multierr" error (multierr is a dependency of zap).

OTOH, we can't upgrade the plugin's dependency to 1.14.0 before this PR is merged, because that would cause the version mismatch error in the other direction and block other PRs.

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

@kennytm merge failed.

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

@kennytm merge failed.

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

"The page you are looking for is temporarily unavailable. Please try again later."

Take a rest.

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

/merge

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

@kennytm merge failed.

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

/merge

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

@kennytm merge failed.

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

https://internal.pingcap.net/idc-jenkins/blue/rest/organizations/jenkins/pipelines/tidb_ghpr_unit_test/runs/35187/nodes/69/steps/321/log/?start=0

[2020-04-30T10:44:18.199Z] ----------------------------------------------------------------------
[2020-04-30T10:44:18.199Z] FAIL: point_get_test.go:470: testPointGetSuite.TestSelectCheckVisibility
[2020-04-30T10:44:18.199Z] 
[2020-04-30T10:44:18.199Z] point_get_test.go:491:
[2020-04-30T10:44:18.199Z]     // Test point get.
[2020-04-30T10:44:18.199Z]     checkSelectResultError("select * from t where a='1'", tikv.ErrGCTooEarly)
[2020-04-30T10:44:18.199Z] point_get_test.go:487:
[2020-04-30T10:44:18.199Z]     c.Assert(err, NotNil)
[2020-04-30T10:44:18.199Z] ... value = nil
[2020-04-30T10:44:18.199Z] 

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

/merge

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

@kennytm merge failed.

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/tidb_ghpr_integration_copr_test/detail/tidb_ghpr_integration_copr_test/4446/pipeline

[2020-04-30T11:28:53.945Z] Statement: #813 -  SELECT SHA1( `col_smallint_key` ) AS field1, ENCODE( `col_bit_key`, `col_bit` ) AS field2 FROM `table1_int_autoinc` WHERE ! `col_float_key` ORDER BY field1, field2 LIMIT 4 /* QNO 815 CON_ID 152 */ ;

[2020-04-30T11:28:53.945Z] NoPushDown Output: 

[2020-04-30T11:28:53.945Z] field1	field2

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] WithPushDown Output: 

[2020-04-30T11:28:53.945Z] field1	field2

[2020-04-30T11:28:53.945Z] fcaf7c134f49c65f2c5765d81c181f3334cefe18	

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] NoPushDown Plan: 

[2020-04-30T11:28:53.945Z] id	estRows	task	access object	operator info

[2020-04-30T11:28:53.945Z] Projection_7	0.00	root		sha1(cast(push_down_test_db.table1_int_autoinc.col_smallint_key, var_string(20)))->Column#62, encode(cast(push_down_test_db.table1_int_autoinc.col_bit_key, var_string(20)), cast(push_down_test_db.table1_int_autoinc.col_bit, var_string(20)))->Column#63

[2020-04-30T11:28:53.945Z] └─Projection_23	0.00	root		push_down_test_db.table1_int_autoinc.col_smallint_key, push_down_test_db.table1_int_autoinc.col_bit, push_down_test_db.table1_int_autoinc.col_bit_key, push_down_test_db.table1_int_autoinc.col_float_key

[2020-04-30T11:28:53.945Z]   └─TopN_8	0.00	root		Column#66:asc, Column#67:asc, offset:0, count:4

[2020-04-30T11:28:53.945Z]     └─Projection_24	0.00	root		push_down_test_db.table1_int_autoinc.col_smallint_key, push_down_test_db.table1_int_autoinc.col_bit, push_down_test_db.table1_int_autoinc.col_bit_key, push_down_test_db.table1_int_autoinc.col_float_key, sha1(cast(push_down_test_db.table1_int_autoinc.col_smallint_key, var_string(20)))->Column#66, encode(cast(push_down_test_db.table1_int_autoinc.col_bit_key, var_string(20)), cast(push_down_test_db.table1_int_autoinc.col_bit, var_string(20)))->Column#67

[2020-04-30T11:28:53.945Z]       └─TableReader_13	0.00	root		data:Selection_12

[2020-04-30T11:28:53.945Z]         └─Selection_12	0.00	cop[tikv]		not(istrue(push_down_test_db.table1_int_autoinc.col_float_key))

[2020-04-30T11:28:53.945Z]           └─TableFullScan_11	1.00	cop[tikv]	table:table1_int_autoinc	keep order:false, stats:pseudo

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] 

[2020-04-30T11:28:53.945Z] WithPushDown Plan: 

[2020-04-30T11:28:53.945Z] id	estRows	task	access object	operator info

[2020-04-30T11:28:53.945Z] Projection_7	4.00	root		sha1(cast(push_down_test_db.table1_int_autoinc.col_smallint_key, var_string(20)))->Column#62, encode(cast(push_down_test_db.table1_int_autoinc.col_bit_key, var_string(20)), cast(push_down_test_db.table1_int_autoinc.col_bit, var_string(20)))->Column#63

[2020-04-30T11:28:53.945Z] └─Projection_23	4.00	root		push_down_test_db.table1_int_autoinc.col_smallint_key, push_down_test_db.table1_int_autoinc.col_bit, push_down_test_db.table1_int_autoinc.col_bit_key, push_down_test_db.table1_int_autoinc.col_float_key

[2020-04-30T11:28:53.945Z]   └─TopN_9	4.00	root		Column#66:asc, Column#67:asc, offset:0, count:4

[2020-04-30T11:28:53.945Z]     └─Projection_24	20.00	root		push_down_test_db.table1_int_autoinc.col_smallint_key, push_down_test_db.table1_int_autoinc.col_bit, push_down_test_db.table1_int_autoinc.col_bit_key, push_down_test_db.table1_int_autoinc.col_float_key, sha1(cast(push_down_test_db.table1_int_autoinc.col_smallint_key, var_string(20)))->Column#66, encode(cast(push_down_test_db.table1_int_autoinc.col_bit_key, var_string(20)), cast(push_down_test_db.table1_int_autoinc.col_bit, var_string(20)))->Column#67

[2020-04-30T11:28:53.945Z]       └─IndexLookUp_16	20.00	root		

[2020-04-30T11:28:53.945Z]         ├─IndexRangeScan_14(Build)	20.00	cop[tikv]	table:table1_int_autoinc, index:col_float_key(col_float_key)	range:[NULL,NULL], [0,0], keep order:false, stats:pseudo

[2020-04-30T11:28:53.945Z]         └─TableRowIDScan_15(Probe)	20.00	cop[tikv]	table:table1_int_autoinc	keep order:false, stats:pseudo

@kennytm
Contributor Author

kennytm commented Apr 30, 2020

/merge

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

/run-all-tests

@sre-bot sre-bot merged commit 43764a5 into pingcap:master Apr 30, 2020
@kennytm
Contributor Author

kennytm commented Apr 30, 2020

/run-cherry-picker

@sre-bot
Contributor

sre-bot commented Apr 30, 2020

cherry pick to release-4.0 in PR #16960

Labels
priority/release-blocker, sig/execution, status/can-merge, status/LGT2, type/new-feature
10 participants