-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add svb compression to slow5 #328
Conversation
@@ -4,6 +4,7 @@ go 1.18 | |||
|
|||
require ( | |||
github.com/google/go-cmp v0.5.8 | |||
github.com/koeng101/svb v0.0.0-20230815034912-d6737f9ed8b8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this dependency has assembly in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It make it go zoom!
@TimothyStiles Any thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a few fixes here and there.
Most importantly, needs either clearer documentation on how to use the compression functions or to have compression functions be methods that convert between a Read
and a new CompressedRead
struct. See comments for more details.
|
||
// SvbCompressRawSignal takes a read and converts its raw signal field to two | ||
// arrays: a mask array and a data array. Both are needed for decompression. | ||
func SvbCompressRawSignal(rawSignal []int16) (mask, data []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have these functions be methods of Read/another type? As it stands it's not clear to me how a client is supposed to take advantage of them.
Maybe use struct inheritance to define a new CompressedRead class and have a method that goes from CompressedRead -> Read and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using these functions for taking the rawSignal in and out of an SQL database, so having it be a method doesn't work as well as a raw function.
CompressedRead
I think would just be a blow5 read rather than slow5 read, which we SHOULD implement at some point. (there is a whole binary format specifically for doing those compressed reads well, with some real performance improvements)
Moving to #341 |
This PR adds a simple way to use StreamVByte compression with slow5. This is the primary way that slow5 files are compressed, so if you are using poly to io between different file stores or pipe between different programs, StreamVByte greatly helps - I've seen reduction in total database size (storing fastq and slow5 reads) from 12Gb to 7.1Gb using just this one weird trick (database engineers hate him)
The code changes have 100% code coverage, and I'm pretty sure I wrote more comments than actual code (not including tests).