This repository has been archived by the owner on Aug 21, 2023. It is now read-only.

dumpling may dump extra data if --where is specified #371

Closed
lichunzhu opened this issue Oct 13, 2021 · 1 comment · Fixed by #372
Labels
severity/critical This issue is a critical bug type/bug This issue is a bug

Comments

@lichunzhu
Contributor

What did you do?

Use dumpling to dump TiDB v5.x data using the --where condition.

What did you expect to see?

Dumpling only dumps data that matches the --where condition.

What did you see instead?

Dumpling also dumps some data that does not match the --where condition.

Versions of the cluster

Dumpling version (run `dumpling -V`):

```console
Release version: v5.3.0-alpha-5-gb2388dd-dev
Git commit hash: b2388dd8658bae58a5b8c533ae60aecd577eec08
Git branch:      master
Build timestamp: 2021-10-13 08:46:58Z
Go version:      go version go1.16.4 darwin/amd64
```

Source database version (execute `SELECT version();` in a MySQL client):

```console
5.7.25-TiDB-v5.0.0
```

Other interesting information (system version, hardware config, etc):

@lichunzhu lichunzhu added type/bug This issue is a bug severity/critical This issue is a critical bug labels Oct 13, 2021
@lichunzhu
Contributor Author

lichunzhu commented Oct 13, 2021

Root cause

func buildWhereCondition(conf *Config, where string) string {

When Dumpling builds a WHERE condition and both --where and --rows are specified, it generates a condition in the following format:

... WHERE ${where} AND ${chunks_split_sql}

which actually should be:

... WHERE (${where}) AND (${chunks_split_sql})
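
The parenthesized form can be sketched as follows. This is only an illustration, not Dumpling's actual code: `joinConditions` is a hypothetical helper, and the two condition strings stand in for `${where}` and `${chunks_split_sql}`.

```go
package main

import (
	"fmt"
	"strings"
)

// joinConditions is a hypothetical helper (not part of Dumpling).
// It wraps every non-empty condition in parentheses before joining
// them with AND, so an OR inside one condition cannot absorb part
// of another condition.
func joinConditions(conds ...string) string {
	parts := make([]string, 0, len(conds))
	for _, c := range conds {
		c = strings.TrimSpace(c)
		if c == "" {
			continue // skip conditions that were not specified
		}
		parts = append(parts, "("+c+")")
	}
	if len(parts) == 0 {
		return ""
	}
	return "WHERE " + strings.Join(parts, " AND ")
}

func main() {
	// Example values only: a user filter with OR, and a chunk range.
	fmt.Println(joinConditions("a = 1 OR b = 2", "id >= 0 AND id < 100"))
	// prints: WHERE (a = 1 OR b = 2) AND (id >= 0 AND id < 100)
}
```

Wrapping both sides unconditionally is the simplest choice here, since it avoids having to parse either condition to decide whether parentheses are needed.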

As a result, Dumpling may dump incorrect data in either of the following cases:

  1. --where contains an OR operator. Dumpling turns the intended (A∨B)∧C into A∨B∧C, which is parsed as A∨(B∧C) because AND binds more tightly than OR.
  2. Dumpling (>=v4.0.9 or >=v5.0.0) dumps a TiDB v5.0+ table that has a CLUSTERED PRIMARY KEY with at least two columns. In this case ${chunks_split_sql} contains OR operators generated in
    func buildWhereClauses(handleColNames []string, handleVals [][]string) []string {

    so the combined condition is regrouped in the same incorrect way.
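
Both cases come down to AND binding more tightly than OR, so A∨B∧C is read as A∨(B∧C). The small check below is a sketch that uses Go booleans in place of SQL predicates (Go's `&&` and `||` have the same relative precedence as SQL's AND and OR). It enumerates every truth assignment and collects those where the two readings disagree; `mismatches` is a name made up for this example.

```go
package main

import "fmt"

// mismatches returns the (a, b, c) assignments for which the
// unparenthesized reading a OR (b AND c) differs from the intended
// (a OR b) AND c.
func mismatches() [][3]bool {
	var out [][3]bool
	vals := []bool{false, true}
	for _, a := range vals {
		for _, b := range vals {
			for _, c := range vals {
				// && binds tighter than ||, so this is a || (b && c).
				unparenthesized := a || b && c
				intended := (a || b) && c
				if unparenthesized != intended {
					out = append(out, [3]bool{a, b, c})
				}
			}
		}
	}
	return out
}

func main() {
	for _, m := range mismatches() {
		fmt.Printf("A=%v B=%v C=%v\n", m[0], m[1], m[2])
	}
}
```

The two readings differ exactly when A is true and C is false. In case 1, where C is the chunk condition, that means a row matching A is selected regardless of which chunk is being dumped.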

Comment

(A∨B)∧C = (A∧C)∨(B∧C)
