Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dumpling & Lightning supports S3 #3331

Open
DanielZhangQD opened this issue Sep 30, 2020 · 11 comments
Open

Dumpling & Lightning supports S3 #3331

DanielZhangQD opened this issue Sep 30, 2020 · 11 comments
Assignees
Labels
area/tools enhancement New feature or request Hacktoberfest status/help-wanted Extra attention is needed
Milestone

Comments

@DanielZhangQD
Copy link
Contributor

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:

Dumpling has supported S3 in pingcap/dumpling#155, Lightning also supports S3 so we should support this in Operator.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@DanielZhangQD DanielZhangQD added this to the v1.2.0 milestone Sep 30, 2020
@DanielZhangQD DanielZhangQD added the status/help-wanted Extra attention is needed label Sep 30, 2020
@lichunzhu lichunzhu self-assigned this Sep 30, 2020
@lichunzhu lichunzhu removed their assignment Nov 25, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0, v1.2.0-rc.1 Jan 5, 2021
@DanielZhangQD DanielZhangQD added the enhancement New feature or request label Mar 26, 2021
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0-rc.1, v1.3.0 Apr 6, 2021
@zwpaper
Copy link

zwpaper commented Oct 10, 2021

/assign

@zwpaper
Copy link

zwpaper commented Oct 21, 2021

hi @DanielZhangQD, I was working on this feature, but I found that dumpling will dump backup to s3 use in dir, not the tgz format like the rclone we used before.

Is this ok to break this behavior, or should I raise a PR to support saving a tgz to s3?

@DanielZhangQD
Copy link
Contributor Author

@zwpaper Sorry for the late response!
Thanks for contributing to this issue!
I think we can:

  1. If lightning support restoring from the dir directly, we can go ahead with this.
  2. We also need to support the original backup with dumpling and restore with lightning using rclone because
  • We need to support the previous versions of dumpling and lightning that cannot access S3 directly
  • Keep the backward compatibility in case the user wants to restore the previous tag file after upgrading the TiDB Operator, so we may have to add options in the CR and only use the S3 direct access way when the option is enabled explicitly.
  • GCS support

@zwpaper
Copy link

zwpaper commented Oct 22, 2021

Hi @DanielZhangQD, I have checked both lightning and dumpling, both of them supported both s3 and gcs, so that we could drop the rclone for both s3 and gcs.

as for the compatibility part, you point out a good point! I will keep the rclone, and add an option in backup to indicate the compress.

how about an option spec.compress: true, true to use the original rclone with tgz while false for the native dumpling with dir.

yaml like this:

apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3
spec:
  compress: true
  from:
    host: advanced-tidb-tidb
    port: 4000
    user: root
    secretName: backup-demo1-tidb-secret
  s3:
    provider: aws
    secretName: s3-secret
    bucket: tidb
    # prefix: ${prefix}
    storageClass: STANDARD
    # acl: private

PS, I found an issue in dumpling may fix this compress issue, I will try to work on it after finish this one pingcap/dumpling#220

@DanielZhangQD
Copy link
Contributor Author

@csuzhangxc WDYT?

@csuzhangxc
Copy link
Member

csuzhangxc commented Oct 26, 2021

I agree we can add a new option for this dir behavior, but we may keep the original/tgz as the default behavior. so, this new option has a default false (not spec.compress: true) may be better? but I have no idea about this new option's name now.

@zwpaper
Copy link

zwpaper commented Oct 26, 2021

if the spec.compress named is our choice, then the default option spec.compress: true would keep the original/tgz behavior.

btw, I am trying to add this compress output in dumpling, I may look back to this after the dumpling try.

@DanielZhangQD
Copy link
Contributor Author

I think spec.compress is not a good name, the key is whether the dumpling/lightning accesses the remote storage directly, what about the name spec.accessRemoteStorageDirectly (just a suggestion) and default to false to use the current way. And we have to make it clear in the doc that this option only applies to Dumpling and Lightning.
BTW, it seems that even if the process accesses the remote storage directly, we still have to mount a PV for Lightning to sort the data, right? Does Dumpling also need a PV? @csuzhangxc @july2993

@csuzhangxc
Copy link
Member

we still have to mount a PV for Lightning to sort the data, right?

YES

Does Dumpling also need a PV?

NO

@zwpaper
Copy link

zwpaper commented Oct 27, 2021

Hi @DanielZhangQD, the reason I proposed spec.compress is that I thought we should encapsulate the implementation inside the operator, and expose the user interfaces to users.

users should care about neither the dumpling nor the rclone part, what they should care about is how their backups are stored, which is compressed, right?

spec.accessRemoteStorageDirectly indicates how the backups are transferred, but did not tell users their backups are stored in directory or tgz.

@DanielZhangQD
Copy link
Contributor Author

@zwpaper Sorry that I missed the above comment.
With spec.compress, what if the dumpling supports compressing the files and then uploading to S3 as you have shown the issue pingcap/dumpling#220?
I agree that spec.accessRemoteStorageDirectly is not a good name.
Any other names for suggestions? (I cannot find a good one myself...)

@DanielZhangQD DanielZhangQD modified the milestones: v1.3.0, v1.4.0 Dec 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tools enhancement New feature or request Hacktoberfest status/help-wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants