New formats support #84

Open
7 of 17 tasks
aonez opened this issue Oct 17, 2017 · 46 comments

@aonez
Owner

aonez commented Oct 17, 2017

Some compression formats that could be added. Being on the list does not mean they will be added, just that they are taken into account:

Just extraction:

aonez added this to the Future milestone Oct 17, 2017
aonez self-assigned this Oct 17, 2017
@dezzeus

dezzeus commented Mar 12, 2018

It would be nice to also have Zstandard.

@MaxPower85

Since you have lrzip on the list, add zpaq too, since lrzip can optionally use zpaq for its second stage... but you can also use zpaq independently.

You can also add rzip... lrzip is similar, but it's not the same format.

Maybe add Apple's lzfse too, although I'm not sure whether Apple meant it to be used on its own as an archive format or only within other formats. I can't find any information about which extension should be used for archives compressed with lzfse. Still, you can compress a file or a tar archive with lzfse, and the results seem pretty good for a format that doesn't use multithreading; people also say lzfse is supposed to be energy efficient. However, even the file command on Sierra doesn't seem to recognize the archive type if you compress files with lzfse.

https://github.com/lzfse/lzfse

If you look at a file compressed with lzfse in a hex editor, it says "bvx2" at the beginning, and here's a clue about what that means: https://github.com/lzfse/lzfse/blob/497c5c176732769abf36ccc71a31c06bad93a84d/src/lzfse_internal.h#L276-L281

So it doesn't seem that it would be difficult to recognize lzfse-compressed archives. The question is whether Apple intended it to be used on its own, like bzip2 or gzip.
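Going by the block magic constants in the linked `lzfse_internal.h` (as I read them: `bvx-` for an uncompressed block, `bvx1` and `bvx2` for compressed blocks, `bvxn` for an LZVN block, and `bvx$` for the end-of-stream marker), recognizing an lzfse stream could be as simple as checking its first four bytes. This is a minimal sketch with a hypothetical `sniff_lzfse` helper; it only looks at the first block header and does not validate the rest of the stream.

```python
from typing import Optional

# First-block magics as 4 ASCII bytes (assumed from lzfse_internal.h).
LZFSE_BLOCK_MAGICS = {
    b"bvx-": "uncompressed block",
    b"bvx1": "compressed block (v1)",
    b"bvx2": "compressed block (v2)",
    b"bvxn": "LZVN compressed block",
    b"bvx$": "end-of-stream marker",
}


def sniff_lzfse(head: bytes) -> Optional[str]:
    """Describe the first lzfse block in `head`, or return None if it doesn't match."""
    return LZFSE_BLOCK_MAGICS.get(head[:4])


# Usage (hypothetical file name):
# with open("archive.tar.lzfse", "rb") as f:
#     print(sniff_lzfse(f.read(4)))
```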

It can also be used for compressed .dmg images: when you create a compressed .dmg with hdiutil, use -format ULFO, e.g. hdiutil create -volname vol_name -srcfolder source_folder -ov -format ULFO new_dmg_image.dmg

I'm reading that 7z beta for Windows has added support for .dmg images that use lzfse compression... but 7z for macOS or Linux doesn't seem to recognize them yet.

@yetisyny

The .WIM format (Windows Imaging Format) has been supported for both compression and decompression by 7-Zip for Windows for several years. Since it is part of the relatively short list of filetypes 7-Zip for Windows supports not only reading but also writing to, even in the GUI, it ought to be included in Keka for feature parity with the Windows version of 7-Zip. There is also already a library and utility for the .WIM format that is cross-platform, at https://wimlib.net/, although this library is under GNU GPL version 3 so you cannot use it legally unless you start using that license too which I doubt you would want to do.

So using the 7-Zip implementation would probably work better license-wise. The 7-Zip implementation of the .WIM format is already included in the p7zip ports to UNIX-based operating systems (including macOS, Linux, etc.), so directly using p7zip is probably the easiest way to do this; in fact, you already use p7zip for other things. As for the virtues of the .WIM format, or why anyone would want to use it: it is a file-based imaging format that can archive advanced filesystem features, can be used with several different compression algorithms, and is in widespread use, especially by Microsoft, which uses it for everything. Plus, it is the ONLY format that the GUI and command-line versions of 7-Zip for Windows can compress to which Keka does not also support, so adding it would bring Keka to feature parity with 7-Zip regarding supported formats to compress to, and of course it is also in p7zip. The other formats 7-Zip advertises on its website as being able to compress besides .WIM are 7Z, XZ, BZIP2, GZIP, TAR, and ZIP, and you already support all of those! (I think 7-Zip also supports a few more, such as ISO, but those are not mentioned there; anyway, you already support ISO too.)

@aonez
Owner Author

aonez commented Mar 29, 2018

this library is under GNU GPL version 3 so you cannot use it legally

If that is true, then lbzip2 also can't be bundled within Keka.

@d235j

d235j commented May 3, 2018

Regarding bundling, please see https://www.gnu.org/licenses/gpl-faq.en.html#MereAggregation. If the proprietary components are not linking to the GPL components, then you should be OK; however, you need to provide source code to the GPL components.

@aonez
Owner Author

aonez commented May 4, 2018

Thanks @d235j, you're right. Already started pushing the GPL code here 😊

@magitk

magitk commented Jun 18, 2018

+1 for zpaq

@dh1337

dh1337 commented Jan 14, 2019

any news on brotli?

@gingerbeardman
Contributor

@denishamann1337 out of interest what is your use case for Brotli? What would it achieve that other existing formats or schemes can not?

@p2k

p2k commented Jan 14, 2019

+1 for zpaq

It is the best archive format I know of, combining deduplication with strong compression that outperforms every competitor. It actually allows multiple versions of the same file(s), so it is suitable for incremental backups. Needless to say, it offers industry-standard encryption.

Having a GUI for zpaq would be bliss, but it is considerably harder to do than for all the other formats, since it has some unique features (like the aforementioned multi-version capability).

More information on zpaq: http://mattmahoney.net/dc/zpaq.html

@dh1337

dh1337 commented Jan 14, 2019

@denishamann1337 out of interest what is your use case for Brotli? What would it achieve that other existing formats or schemes can not?

I read some interesting benchmarks lately (e.g. http://www.instantshift.com/2018/03/02/gzip-vs-brotli-compression/).
On top of that, Brotli is supported by 7-Zip-zstd (see: https://github.com/mcmilk/7-Zip-zstd), and I would love to have the same "compatibility" in Keka as 7-Zip has on Windows.
I feel that having better performance for some use cases, and being supported alongside gzip in all major browsers, makes it a de facto standard (see: https://caniuse.com/#search=brotli).

@gingerbeardman
Contributor

@p2k is this not a problem?

zpaq is for user-level backups. Do not use it to back up the operating system or any software that requires a password to install. zpaq saves regular files and directories, last-modified dates (to the nearest second), and (optionally) Windows attributes or Linux permissions. It does not follow or save symbolic links or junctions. It unknowingly follows hard links. It does not save owner or group IDs, ACLs, extended attributes, the registry, or special file types like devices, sockets, or named pipes.

@p2k

p2k commented Jan 14, 2019

@gingerbeardman not for me. I don't use it to back up an operating system or things like an .app bundle on macOS (which often contain symlinks). But if I wanted to, I could always resort to piping a tar archive to zpaq.

It might be an idea to do a pre-check when archiving stuff with zpaq, though, and warn the user. That's a good point.
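A pre-check along those lines could walk the selection before archiving and flag whatever the zpaq documentation quoted above says it won't preserve. This is a rough sketch with a hypothetical `zpaq_precheck` helper; it covers symlinks, hard links, and special files, but not owner/group IDs, ACLs, or extended attributes, which would need further checks.

```python
import os
import stat
from typing import List, Tuple


def zpaq_precheck(root: str) -> List[Tuple[str, str]]:
    """Flag entries under `root` that zpaq (per its docs) would not preserve faithfully."""
    problems = []
    # os.walk does not descend into symlinked directories by default,
    # but they still show up in `dirnames`, so they get checked below.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect the link itself, not its target
            mode = st.st_mode
            if stat.S_ISLNK(mode):
                problems.append((path, "symbolic link (not saved)"))
            elif stat.S_ISREG(mode):
                if st.st_nlink > 1:
                    problems.append((path, "hard link (followed, stored once per path)"))
            elif not stat.S_ISDIR(mode):
                problems.append((path, "special file such as a device, socket or FIFO (not saved)"))
    return problems
```

A GUI could show this list as a warning and suggest the tar-then-zpaq route mentioned above when it is not empty.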

@aonez
Owner Author

aonez commented Jan 14, 2019

@denishamann1337 I checked again, and Brotli still does not even have a magic number. So it is still focused on data streams over the network, for use in browsers. That is why it is compared with gzip, which is also used in browsers.

That said, as it is fairly easy to add support for Brotli, here is a test build:
https://github.com/aonez/Keka/releases/tag/dev-test-builds

@dh1337

dh1337 commented Jan 14, 2019

@aonez I see, I assumed a magic number existed by now. Thanks for taking the effort to check :)

@jamie-arcc

+1 for Zstd and zpaq!

@systemcrash

+1 for Zstd / Zstandard

dual BSD and GPLv2 licensed C library

@aonez
Owner Author

aonez commented Jun 25, 2019

@jamie-arcc and @systemcrash check out the latest v1.2.0-dev.3494 test build, it has Zstandard support 😊

@systemcrash

First thoughts on
https://github.com/aonez/Keka/releases/tag/v1.2.0-dev.3417

What Zstd compression numbers correspond to the slider? (Store, Fastest, Fast...) - could this info be hinted in the GUI?

-# : # compression level (1-19, default: 3)
Store = 1
Fastest = 4
Fast = 7
Normal = 10
Slow = 14
Slowest = 19
?

@aonez
Owner Author

aonez commented Jun 26, 2019

@systemcrash it goes 1, 2, 3, 6, 8 and 9. The method (level) slider should be enhanced to accommodate #112. Most formats use 0-9; this case and RAR (0-5) are different. A dynamic slider is also much needed for a finer selection.

@systemcrash

Forget everything above 15 - tradeoffs are rarely worth it for the diminishing gains above level 15.

Why make things static? Look at the library's range, then draw the slider based on it. For 6 stops on the slider (with floating-point division, so the stops land on levels 2, 5, 7, 10, 12 and 15):
(int)floor(15.0/6 * 1)
(int)floor(15.0/6 * 2)
(int)floor(15.0/6 * 3)
(int)floor(15.0/6 * 4)
(int)floor(15.0/6 * 5)
(int)floor(15.0/6 * 6)
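The same idea, generalized: read the level range the compressor supports and spread the slider stops across it, using floating-point division so the steps aren't truncated. This is a sketch with a hypothetical `slider_levels` helper, which also pins the first stop to the lowest level rather than starting one step in.

```python
from typing import List


def slider_levels(min_level: int, max_level: int, stops: int) -> List[int]:
    """Spread `stops` slider positions evenly over [min_level, max_level].

    Hypothetical helper: the range would come from the compressor's own
    limits instead of being hard-coded per format.
    """
    if stops < 2:
        return [max_level]
    step = (max_level - min_level) / (stops - 1)  # true division, not integer division
    return [round(min_level + i * step) for i in range(stops)]
```

For example, `slider_levels(1, 19, 6)` covers zstd's regular range, and a maximum of 22 would reach the `--ultra` levels.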

You closed the source because of all the copy-cats in the App Store, yah?

@aonez
Owner Author

aonez commented Jun 27, 2019

Made a quick test and 15-19 resulted in 13% more savings. So if the next dev build does not have the dynamic slider yet, it will use 1, 2, 3, 4, 15 and 19. So far I'm impressed with Zstd, although 7z still has a better speed/ratio.


You closed the source because of all the copy-cats in the App Store, yah?

It was the trigger, yep.

@systemcrash

7z is a format, not an algorithm. Which algorithm was used, LZMA?

@gingerbeardman
Contributor

How is the support for Zstd across platforms?

@akrabu

akrabu commented Sep 16, 2019

Made a quick test and 15-19 resulted in 13% more savings. So If the next dev build does not have the dynamic slider yet, it will use 1, 2, 3, 4, 15 and 19. So far I'm impressed with Zstd, although 7z still has better speed/ratio.


For what it's worth, I ran the latest build (1.2.0.3542) at the highest compression level for Zstd on an old Outlook PST file I was intending to archive, and achieved the following:

Original: 7.34GB
Brotli: 5.84GB (Keka, slowest method)
Zstd: 5.11GB (Keka, slowest method)
7z: 4.86GB (Keka, slowest method)
ZPAQ: 4.84GB (zpaq a mailbox.pst.zpaq mailbox.pst -m5)
XZ: 4.53GB (xz -e --lzma2=preset=9,dict=1610612736,nice=273 --memory=90% mailbox.pst)
Zstd: 4.46GB (zstd -22 --ultra --long=31 --single-thread mailbox.pst)
Lrzip (LZMA): 4.44GB (lrzip --lzma -L 9 -U mailbox.pst)
Lrzip (ZPAQ): 4.34GB (lrzip -z -L 9 -U mailbox.pst)
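To compare the results above on a common scale, here is a small sketch that converts them to percentage savings relative to the 7.34GB original. The sizes are copied from the list (labels shortened); nothing new is measured.

```python
ORIGINAL_GB = 7.34

# Compressed sizes from the list above, in GB.
RESULTS_GB = {
    "Brotli (Keka, slowest)": 5.84,
    "Zstd (Keka, slowest)": 5.11,
    "7z (Keka, slowest)": 4.86,
    "ZPAQ -m5": 4.84,
    "XZ -e, 1.5 GiB dictionary": 4.53,
    "Zstd -22 --ultra --long=31": 4.46,
    "Lrzip (LZMA)": 4.44,
    "Lrzip (ZPAQ)": 4.34,
}


def savings_pct(original: float, compressed: float) -> float:
    """Percentage by which `compressed` is smaller than `original`."""
    return 100.0 * (1.0 - compressed / original)


# Print from largest to smallest result.
for name, size in sorted(RESULTS_GB.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {size:5.2f} GB  {savings_pct(ORIGINAL_GB, size):5.1f}% smaller")
```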

The "long range mode" in Zstd is rather impressive. The only thing that seems to beat it is Lrzip (aka Long Range ZIP, not LZIP), which takes significantly longer (and the ZPAQ method takes the same amount of time to DEcompress as well - in this case, 10 hours).

With that in mind, could we...

  • Leave the current Zstd compression slider as-is.
  • Create a slider for the window size
  • Create a checkmark for "--ultra" that would go straight to level 22 and grey out the compression level slider (but not the window slider)

Apologies if I'm over-complicating the UI, but I thought I'd throw it out there. I just really love using Zstd's long range option for very large files with redundant data (archiving mailboxes, for instance). It works WAY faster than Lrzip, which tries to do something somewhat similar. Zstd appears to use a window of 2147483648 bytes (~2GB) to look for patterns, at least on this specific test file, which isn't quite as effective as Lrzip's "sliding window" but it sure performs faster.

Note: Zstd will throw an error during testing or extraction if you don't use a large enough window for an archive that was compressed with a larger than normal window. Example:

akrabu-macbook-air:~ akrabu$ zstd --test mailbox.pst.zst
mailbox.pst.zst : Decoding error (36) : Frame requires too much memory for decoding
mailbox.pst.zst : Window size larger than maximum : 2147483648 > 134217728
mailbox.pst.zst : Use --long=31 or --memory=2048MB

This also means that, presently, Keka will fail to extract files made with large windows:

(Screenshot: Keka's error when extracting the archive.)

P.S. I also tried Brotli's --large-window option, but it was unremarkable in this case and resulted in a size comparable to what Keka's maximum setting had already achieved.
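Whether a .zst file will hit the "Frame requires too much memory for decoding" error above can be read from its frame header before decompressing. The following is a sketch based on the Zstandard frame format (RFC 8878): the window descriptor byte encodes the window as an exponent and a mantissa, and 2^27 bytes (134217728, the limit quoted in the error message) is assumed to be the decoder's default cap. The helper names are hypothetical; single-segment and skippable frames are not handled.

```python
from typing import Optional

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"   # 0xFD2FB528, stored little-endian
DEFAULT_WINDOW_LOG_LIMIT = 27      # 134217728 bytes, as in the error above


def zstd_window_size(header: bytes) -> Optional[int]:
    """Window size declared in the first zstd frame header, in bytes.

    Returns None for single-segment frames, whose window equals the frame
    content size (not parsed in this sketch).
    """
    if header[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame (skippable frames are not handled)")
    frame_header_descriptor = header[4]
    if frame_header_descriptor & 0x20:   # Single_Segment_Flag
        return None
    window_descriptor = header[5]
    exponent = window_descriptor >> 3    # upper 5 bits
    mantissa = window_descriptor & 0x07  # lower 3 bits
    window_base = 1 << (10 + exponent)
    return window_base + (window_base // 8) * mantissa


def decode_hint(header: bytes) -> Optional[str]:
    """Suggest zstd CLI options when the window exceeds the default limit."""
    size = zstd_window_size(header)
    if size is None or size <= 1 << DEFAULT_WINDOW_LOG_LIMIT:
        return None
    window_log = (size - 1).bit_length()       # smallest n with 2**n >= size
    megabytes = (size + (1 << 20) - 1) >> 20   # round up to whole MB
    return f"--long={window_log} or --memory={megabytes}MB"


# Usage (file name from the example above):
# with open("mailbox.pst.zst", "rb") as f:
#     print(decode_hint(f.read(6)))
```

For the --long=31 archive above, this would suggest the same options the zstd CLI prints.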

@MaxPower85

  • zpaq (LRZIP as suggested by @MaxPower85) -> 1.2.0r 3806+ LRZIP in slow method

This needs a correction...

LRZIP can use various compression algorithms on parts of the archive, but it's a separate format. It can use ZPAQ, but ZPAQ is its own archive format, which can be quite useful to have on its own too, especially when files that share a lot of the same data are added to an existing archive later: it does not compress that data again and just reuses what is already stored. The archive can also be "rolled back" to retrieve an earlier version of a file.
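The deduplication and roll-back idea can be illustrated with a toy store that keeps each unique chunk once and records every version as a list of chunk hashes. This is a simplified sketch, not zpaq's format: as I understand it, zpaq picks fragment boundaries from the content and compresses new fragments, whereas this uses fixed-size, uncompressed chunks. `DedupStore` and `CHUNK_SIZE` are hypothetical names.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, for illustration only


class DedupStore:
    """Toy append-only store: unique chunks are kept once; versions are lists of chunk hashes."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}
        self.versions: Dict[str, List[List[bytes]]] = {}

    def add(self, name: str, data: bytes) -> int:
        """Store a new version of `name`; return how many bytes were actually new."""
        refs: List[bytes] = []
        new_bytes = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).digest()
            if digest not in self.chunks:  # only unseen chunks consume space
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            refs.append(digest)
        self.versions.setdefault(name, []).append(refs)
        return new_bytes

    def restore(self, name: str, version: int = -1) -> bytes:
        """Rebuild any stored version ("roll back") from its chunk list."""
        return b"".join(self.chunks[d] for d in self.versions[name][version])
```

With fixed-size chunks, inserting a single byte near the start shifts every later chunk and defeats deduplication, which is why content-defined boundaries are preferred in practice.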

@gingerbeardman
Contributor

gingerbeardman commented Aug 30, 2021

DAR (Disk ARchive)
https://dar.sourceforge.io

@akrabu

akrabu commented Aug 30, 2021

Oh I'd love to have DAR support. It can do SO much. I just thought it might be too much to support in such a little Keka window, you know? It's SO configurable, though I guess basic support would be fine.

I use it to make 50GB archives with par2 files and burn them all to Blu-rays to back up my pictures.

@gingerbeardman
Contributor

Another odd LZH, from Atari ST

http://discmaster.textfiles.com/file/11869/www.umich.edu.archive.2014.03.zip/www.umich.edu/~archive/atari/Games/Puzzle/nanjin11.lzh

@akrabu

akrabu commented May 4, 2023

Another odd LZH, from Atari ST

http://discmaster.textfiles.com/file/11869/www.umich.edu.archive.2014.03.zip/www.umich.edu/~archive/atari/Games/Puzzle/nanjin11.lzh

Wow. That brings back memories. Been a long time since I came across an LHA/LZH archive!

@gingerbeardman
Contributor

gingerbeardman commented May 4, 2023

Wow. That brings back memories. Been a long time since I came across an LHA/LZH archive!

I dive into old software often and this was also a wow moment for me @akrabu !

LZH files are encountered frequently on classic Mac, especially with Japanese software. Atari ST was my first computer!

@Sytten

Sytten commented Aug 2, 2023

The latest version doesn't seem to be able to decompress ZIP files that use zstd compression.
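One way to confirm what such an archive contains before filing a report is to list each entry's compression method from the ZIP central directory. This is a diagnostic sketch using Python's `zipfile`, which can read the entry list even for methods it cannot decompress. The method IDs are a subset of those listed in PKWARE's APPNOTE.TXT (93 is Zstandard), and `entry_methods` is a hypothetical helper.

```python
import zipfile
from typing import Dict

# Subset of compression method IDs from PKWARE's APPNOTE.TXT.
ZIP_METHODS: Dict[int, str] = {
    0: "Stored",
    8: "Deflate",
    9: "Deflate64",
    12: "BZIP2",
    14: "LZMA",
    93: "Zstandard",
    95: "XZ",
    98: "PPMd",
    99: "AES-encrypted (AE-x marker)",
}


def entry_methods(path_or_file) -> Dict[str, str]:
    """Map each entry name to its compression method without decompressing anything."""
    with zipfile.ZipFile(path_or_file) as zf:
        return {
            info.filename: ZIP_METHODS.get(info.compress_type, f"unknown ({info.compress_type})")
            for info in zf.infolist()
        }
```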

@aonez
Owner Author

aonez commented Aug 7, 2023

@Sytten can you open a new issue including a test file that meets those conditions?

@gingerbeardman
Contributor

gingerbeardman commented Feb 15, 2024

https://web.archive.org/web/20040318005247/http://www.mars.dti.ne.jp:80/~odaki/sounds/crutch.mdz

.mdz is a zip file containing a "MOD" (module music file, but can be many types: .mod, .xm, .it, etc)

A simple workaround is to rename it to .zip and extract it.

  • not recognised by any other GUI unzipping apps
  • command line 7z extracts it without question
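Checking the content instead of the extension is what makes the rename unnecessary. This is a sketch with a hypothetical `extract_if_zip` helper; `zipfile.is_zipfile` looks for the ZIP end-of-central-directory record, so it behaves the same for .mdz or any other renamed ZIP.

```python
import zipfile


def extract_if_zip(path: str, dest: str) -> bool:
    """Extract `path` into `dest` if its content is a ZIP, whatever its extension."""
    if not zipfile.is_zipfile(path):  # content check, not an extension check
        return False
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)  # creates `dest` as needed
    return True
```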

@aonez
Owner Author

aonez commented Feb 15, 2024

It is indeed a ZIP. You can extract it with Keka (using the alternate option or the extract action in the contextual menu). I'll add it to the supported formats :)

@gingerbeardman
Contributor

gingerbeardman commented Apr 18, 2024

  • iOS Emulator Delta has support for themes/skins, which are zip files with the extension .deltaskin

https://delta-skins.github.io/nds.html

@gingerbeardman
Contributor

gingerbeardman commented Jun 13, 2024

@gingerbeardman
Contributor

LOL, just came here to request Alfred Workflow (again)

@gingerbeardman
Contributor

.deb #1498

@akrabu

akrabu commented Aug 12, 2024

Hmm, does it really need official support, when Keka already does zip files, though?

You can right (control) click a .alfredworkflow file, and change the app it opens with to Keka (then click "change all" if you want all .alfredworkflow files to open with Keka)

"Support" as in associating it with Keka by default for everyone doesn't really make sense, does it? Wouldn't you want to open that with Alfred most of the time?

And creating one yourself is as easy as renaming .zip to .alfredworkflow. I just don't understand why this should be a "format" baked into Keka. It's not like we're going to start sharing any kind of file to each other in .alfredworkflow format, you know?

@gingerbeardman
Contributor

gingerbeardman commented Aug 12, 2024

Hmm, does it really need official support, when Keka already does zip files, though?

Yes, it does. That's what this thread is for.

A previous example was Delta skins added to the supported extraction list #84 (comment)

Search the changelog for "supported extraction list": https://changelog.keka.io

You can right (control) click a .alfredworkflow file, and change the app it opens with to Keka (then click "change all" if you want all .alfredworkflow files to open with Keka)

This results in the "unrecognised" file being archived, when the goal is for Keka to unarchive it.

"Support" as in associating it with Keka by default for everyone doesn't really make sense, does it? Wouldn't you want to open that with Alfred most of the time?

This is not what this issue is doing, and not what we ask for when posting here.

This issue is for:

  • adding entirely new format support (e.g. ace, cpio, brotli, etc.)
  • adding aliases of existing formats to the supported extraction list (e.g. I'm asking for Keka to know that a .workflow is a .zip and unarchive it by default)
  • sometimes it's a combination of the two (e.g. .deb)

I just don't understand why...

See above 😬 @akrabu we're not asking for support as a format, but rather for it to be added to the supported extraction list. I guess the issue title could be clearer. So it goes!

@akrabu

akrabu commented Aug 12, 2024

See above 😬 @akrabu we're not asking for support as a format, but rather for it to be added to the supported extraction list. I guess the issue title could be clearer. So it goes!

Ahh, I get what you mean... although, it seems to work perfectly fine for me already?? See here:

https://v.usetapes.com/M5rgv0FXcK

EDIT: As of v1.4.0, Keka supports "Detecting format on unrecognized extension files to extract by default". It was originally limited to 10 files at a time, but in the most recent version, that limit has been removed.

So that should alleviate the need for requests like the Alfred workflows, and any other "format" that is just a renamed .zip file. 🙂
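For illustration only (this is not Keka's implementation), detecting a format by content usually means comparing the first bytes against known signatures. Below is a sketch with a hypothetical `detect_format` helper covering a few formats from this thread; raw Brotli streams have no magic number, so they cannot be recognized this way.

```python
from typing import Optional

# Leading signatures (at offset 0) for a few formats mentioned in this thread.
MAGIC_SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip (empty)"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
    (b"!<arch>\n", "ar (e.g. .deb)"),
]


def detect_format(head: bytes) -> Optional[str]:
    """Return the format whose signature `head` starts with, or None if unknown."""
    for signature, name in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return name
    return None
```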

@gingerbeardman
Contributor

gingerbeardman commented Aug 12, 2024

That's great! Thanks

I think I found that setting 😎


@magitk

magitk commented Sep 11, 2024 via email

@aonez
Owner Author

aonez commented Sep 11, 2024

@magitk please open an issue with some more information and some test files. I suppose you're asking for extraction support.

@p2k

p2k commented Sep 13, 2024

Any chance Keka would support https://github.com/hxim/paq8px in the future?

When it comes to PAQ-style compression formats: while paq8px is the "longest living branch of the PAQ series", it is still considered experimental, whereas zpaq is considered stable, so I'd hope to at least see zpaq added before other PAQ variants. Just my opinion, though. I've been patiently waiting since 2019 :D
