Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add svb compression to slow5 #328

Closed
wants to merge 3 commits into from
Closed

Add svb compression to slow5 #328

wants to merge 3 commits into from

Conversation

Koeng101
Copy link
Contributor

@Koeng101 Koeng101 commented Aug 15, 2023

This PR adds a simple way to use StreamVByte compression with slow5. This is the primary way that slow5 files are compressed, so if you are using poly to io between different file stores or pipe between different programs, StreamVByte greatly helps - I've seen reduction in total database size (storing fastq and slow5 reads) from 12Gb to 7.1Gb using just this one weird trick (database engineers hate him)

The code changes have 100% code coverage, and I'm pretty sure I wrote more comments than actual code (not including tests).

@@ -4,6 +4,7 @@ go 1.18

require (
github.com/google/go-cmp v0.5.8
github.com/koeng101/svb v0.0.0-20230815034912-d6737f9ed8b8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dependency has assembly in it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It make it go zoom!

@Koeng101
Copy link
Contributor Author

@TimothyStiles Any thoughts?

Copy link
Collaborator

@carreter carreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a few fixes here and there.

Most importantly, needs either clearer documentation on how to use the compression functions or to have compression functions be methods that convert between a Read and a new CompressedRead struct. See comments for more details.

io/slow5/slow5.go Show resolved Hide resolved

// SvbCompressRawSignal takes a read and converts its raw signal field to two
// arrays: a mask array and a data array. Both are needed for decompression.
func SvbCompressRawSignal(rawSignal []int16) (mask, data []byte) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have these functions be methods of Read/another type? As it stands it's not clear to me how a client is supposed to take advantage of them.

Maybe use struct inheritance to define a new CompressedRead class and have a method that goes from CompressedRead -> Read and vice versa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using these functions for taking the rawSignal in and out of an SQL database, so having it be a method doesn't work as well as a raw function.

CompressedRead I think would just be a blow5 read rather than slow5 read, which we SHOULD implement at some point. (there is a whole binary format specifically for doing those compressed reads well, with some real performance improvements)

io/slow5/slow5_test.go Show resolved Hide resolved
io/slow5/slow5_test.go Show resolved Hide resolved
@Koeng101
Copy link
Contributor Author

Moving to #341

@Koeng101 Koeng101 closed this Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants