Native compress support #70

Closed
3 tasks
Xuanwo opened this issue Feb 25, 2022 · 8 comments · Fixed by #227

Xuanwo commented Feb 25, 2022

Make opendal work well with zstd, zip and so on.

In the first stage, we will focus on the read side of compression, which means:

  • decompressing a single compressed file such as xxx.zstd or yyy.gz
  • unarchiving an archive file such as aaa.tar or bbb.zip

It's obvious that unarchive support will depend on decompress support.

As requested by our community, we will support zip first to make sure our design is in the right direction.

Promise

  • No API breakage (existing code will work as usual)
  • Zero cost (please send a PR if you can write a better implementation)
  • Feature gated (compress and archive will be gated by cargo features)
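
As a rough illustration of the feature-gating promise, the sketch below reports which capabilities a build was compiled with. The feature names `compress` and `archive` and the function `enabled_capabilities` are assumptions made for this example; the issue only states that both will sit behind cargo features.

```rust
// Assumed Cargo.toml fragment (hypothetical feature names):
//   [features]
//   compress = []
//   archive = ["compress"]   # unarchive builds on decompress

/// Reports which optional capabilities were compiled into this build.
/// Hypothetical helper, not part of OpenDAL.
pub fn enabled_capabilities() -> Vec<&'static str> {
    // Plain reads are always available, so existing code is unaffected.
    let mut caps = vec!["read"];
    if cfg!(feature = "compress") {
        caps.push("decompress");
    }
    if cfg!(feature = "archive") {
        caps.push("unarchive");
    }
    caps
}
```

With neither feature enabled, only `read` is reported, which matches the "no API breakage" promise: nothing changes for users who do not opt in.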

Future

The specific API is subject to RFC

Once this feature is supported, OpenDAL users will be able to read compressed files like this:

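// Proposed API sketch; the final shape is subject to the RFC.
let o = op.object("abc.zip");
let meta = o.stat().await?;
// Use the decompress action if the object supports it; otherwise read it as-is.
let r = if o.actions().decompress() {
    o.decompress()
} else {
    o.reader()
};
// Read the data as usual.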
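
For the `if`/`else` above to type-check, both branches have to produce the same type. A minimal, std-only sketch of one way to do that is shown here. Everything in it is hypothetical: `Object`, `Actions`, `open`, the extension-based capability check, and the `PassThroughDecoder` stand-in (which does no real decompression) are illustrative only and are not OpenDAL's API.

```rust
use std::io::{Cursor, Read};

/// Hypothetical capability set for an object.
pub struct Actions {
    decompress: bool,
}

impl Actions {
    pub fn decompress(&self) -> bool {
        self.decompress
    }
}

/// Stand-in decoder that passes bytes through unchanged. A real
/// implementation would wrap the inner reader with a gz/zstd/... decoder.
pub struct PassThroughDecoder<R: Read>(R);

impl<R: Read> Read for PassThroughDecoder<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.0.read(buf)
    }
}

/// Toy in-memory object used only for illustration.
pub struct Object {
    path: String,
    data: Vec<u8>,
}

impl Object {
    pub fn new(path: &str, data: &[u8]) -> Self {
        Object { path: path.to_string(), data: data.to_vec() }
    }

    /// Assumption: support is inferred from the file extension.
    pub fn actions(&self) -> Actions {
        let known = [".gz", ".zst", ".zstd", ".xz", ".zip"];
        Actions { decompress: known.iter().any(|ext| self.path.ends_with(*ext)) }
    }

    /// Reads the stored bytes as-is.
    pub fn reader(&self) -> Box<dyn Read + '_> {
        Box::new(Cursor::new(self.data.as_slice()))
    }

    /// Wraps the plain reader with the (stand-in) decoder.
    pub fn decompress(&self) -> Box<dyn Read + '_> {
        Box::new(PassThroughDecoder(self.reader()))
    }
}

/// Both branches return the same boxed trait object, so callers can
/// read the data as usual regardless of which path was taken.
pub fn open(o: &Object) -> Box<dyn Read + '_> {
    if o.actions().decompress() {
        o.decompress()
    } else {
        o.reader()
    }
}
```

Boxing costs one allocation and dynamic dispatch per read; an enum over the concrete readers would avoid that if the zero-cost promise has to hold strictly.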

OpenDAL will take care of the details around gz, zstd, zip, xz and similar formats, so everything just works!

One more thing: docker image / CD / DVD are also archived files, so we can ...

Unresolved Tasks

  • Introduce object actions (so that we can treat decompress/unarchive as an object action)
  • Support decompress action
  • Support unarchive action

wubx commented Mar 29, 2022

We will support unloading table data to the stage. Without ZIP support, the ontime table in CSV format is 60G+; zipped, it is 6G+.
If we can read ZIP files, that would also be a great option.

@BohuTANG

I think compression suits many small compressed files (/a.zip, b.zip ...), but is not a good fit for one big file.


wubx commented Mar 29, 2022

Yeah, for unloading table data and COPY INTO, the best practice is to support multi-file zip.


Xuanwo commented Apr 1, 2022

I'm working on this feature now!


Xuanwo commented Apr 5, 2022

@BohuTANG @wubx I came up with a basic design and have updated the description, PTAL


BohuTANG commented Apr 5, 2022

Looks great to me 👍


BohuTANG commented Apr 5, 2022

In Databend, it will add:
COMPRESSION = AUTO | GZIP | ZSTD | NONE (Default is NONE)
to formatTypeOptions where COMPRESSION:

AUTO: Compression algorithm detected automatically
NONE: Data files to load have not been compressed.
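
One way the AUTO option could be resolved is sketched below, assuming detection is done by inspecting the leading magic bytes of the data. That detection method is an assumption for illustration (Databend may instead rely on file extensions or another signal), and the `Compression` enum and `resolve` function are hypothetical names.

```rust
/// Mirrors the COMPRESSION options quoted above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Auto,
    Gzip,
    Zstd,
    None,
}

/// Gzip streams start with the bytes 1f 8b.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
/// Zstandard frames start with the bytes 28 b5 2f fd.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xb5, 0x2f, 0xfd];

/// Resolves `Auto` to a concrete algorithm by looking at the first bytes
/// of the file; explicit choices are returned unchanged.
pub fn resolve(requested: Compression, head: &[u8]) -> Compression {
    match requested {
        Compression::Auto if head.starts_with(&GZIP_MAGIC) => Compression::Gzip,
        Compression::Auto if head.starts_with(&ZSTD_MAGIC) => Compression::Zstd,
        // No known signature: treat the data as uncompressed.
        Compression::Auto => Compression::None,
        explicit => explicit,
    }
}
```

An explicit setting always wins over detection, so a caller who passes GZIP or NONE gets exactly that behavior even if the bytes look like something else.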


Xuanwo commented Apr 7, 2022

This feature has been implemented and is planned for release in opendal 0.5.

Xuanwo moved this from 🔨 In Progress to 📦 Done in Xuanwo's Work on Apr 7, 2022
Xuanwo moved this to 📦 Done in Databend Storage Layer on May 20, 2022