Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc CRAM cleanup #1253

Merged
merged 19 commits into from
Jan 17, 2019
Merged

Misc CRAM cleanup #1253

merged 19 commits into from
Jan 17, 2019

Conversation

jmthibault79
Copy link
Contributor

@jmthibault79 jmthibault79 commented Jan 7, 2019

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@jmthibault79
Copy link
Contributor Author

jmthibault79 commented Jan 7, 2019

Removed lots of throws and ExposedByteArrayOutputStream.

Review with https://github.com/samtools/htsjdk/pull/1253/files?w=1 to ignore whitespace changes.

@@ -80,11 +85,11 @@ public Boundary(final long start, final long end) {
if (start >= end) throw new RuntimeException("Boundary start is greater than end.");
}

boolean hasNext() throws IOException {
private boolean hasNext() throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these throwing methods explicitly private

{
def.setLevel(GZIP_COMPRESSION_LEVEL);
}
};
IOUtil.copyStream(new ByteArrayInputStream(data), gos);
gos.close();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now auto-closes

@codecov-io
Copy link

codecov-io commented Jan 7, 2019

Codecov Report

Merging #1253 into master will increase coverage by 0.537%.
The diff coverage is 46.008%.

@@               Coverage Diff               @@
##              master     #1253       +/-   ##
===============================================
+ Coverage     69.356%   69.893%   +0.537%     
- Complexity      8164      8628      +464     
===============================================
  Files            548       555        +7     
  Lines          32675     34440     +1765     
  Branches        5520      5980      +460     
===============================================
+ Hits           22662     24071     +1409     
- Misses          7795      8035      +240     
- Partials        2218      2334      +116
Impacted Files Coverage Δ Complexity Δ
...m/encoding/core/huffmanUtils/HuffmanIntHelper.java 91.011% <ø> (ø) 18 <0> (ø) ⬇️
...amtools/cram/encoding/writer/CramRecordWriter.java 89.063% <ø> (ø) 26 <0> (ø) ⬇️
...ols/cram/encoding/external/ByteArrayStopCodec.java 73.333% <ø> (ø) 4 <0> (ø) ⬇️
...mtools/cram/ref/InMemoryReferenceSequenceFile.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...mtools/cram/build/CramContainerHeaderIterator.java 100% <ø> (ø) 2 <0> (ø) ⬇️
...a/htsjdk/samtools/cram/build/ContainerFactory.java 93.478% <ø> (ø) 20 <0> (ø) ⬇️
...n/java/htsjdk/samtools/cram/ref/EnaRefService.java 46% <ø> (+4.929%) 4 <0> (ø) ⬇️
...va/htsjdk/samtools/cram/build/ContainerParser.java 91.525% <ø> (ø) 19 <0> (ø) ⬇️
...jdk/samtools/cram/structure/CompressionHeader.java 90.244% <ø> (ø) 28 <0> (ø) ⬇️
src/main/java/htsjdk/samtools/cram/CRAIIndex.java 82.5% <0%> (-2.115%) 27 <0> (ø)
... and 57 more

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmthibault79 Some comments. Looks good otherwise. I don't know if @cmnbroad has comments as well.

src/main/java/htsjdk/samtools/CRAMIterator.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/cram/CRAIEntry.java Outdated Show resolved Hide resolved
@@ -235,15 +242,15 @@ public static CramHeader readCramHeader(final InputStream inputStream) throws IO
final ByteArrayOutputStream headerOS = new ByteArrayOutputStream();
try {
headerOS.write(bytes);
headerOS.write(headerBodyOS.getBuffer(), 0, headerBodyOS.size());
headerOS.write(headerBodyOS.toByteArray(), 0, headerBodyOS.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces an additional redundant array copy, but it's the header so it's probably not a bottleneck at all. If it is a bottleneck then headerOs should probably be an exposed one as well...

@@ -28,15 +28,17 @@
* @param data byte array to compress
* @return compressed blob
*/
public static byte[] gzip(final byte[] data) throws IOException {
public static byte[] gzip(final byte[] data) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these operations that copy between byteArrayOutputStreams are doing a lot of unnecessary array copies which might be worth looking into as performance problems in future profiling.


import java.io.ByteArrayOutputStream;

public class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class does server the purpose of reducing unnecessary array copies. I don't know if any of the places you removed if it from are bottlenecks, but if they are I would be sure that changing this isn't impacting performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I've been removing it in other PRs over the last few months too, so we should also check those.

@cmnbroad cmnbroad self-assigned this Jan 9, 2019
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmthibault79 One additional comment, you can choose what to do with it... Looks good to me. 👍

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one or two minor comments (in addition to sharing Louis' concern around perf when removing ExposedByteArray). Otherwise this looks good - I've wanted to do this for a long time.

src/main/java/htsjdk/samtools/CRAMIterator.java Outdated Show resolved Hide resolved
@jmthibault79
Copy link
Contributor Author

rebased to resolve conflict

@jmthibault79 jmthibault79 merged commit 5217fe4 into master Jan 17, 2019
@jmthibault79 jmthibault79 deleted the jt_general_cram branch January 17, 2019 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants