Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yf add interval list codec #1327

Merged
merged 9 commits into from
Apr 3, 2019
Merged

Yf add interval list codec #1327

merged 9 commits into from
Apr 3, 2019

Conversation

yfarjoun
Copy link
Contributor

  • moved the business part of the dec from IntervalList to the Tribble dec.
  • testing the Tribble directly.

@codecov-io
Copy link

codecov-io commented Mar 15, 2019

Codecov Report

Merging #1327 into master will increase coverage by 0.307%.
The diff coverage is 75%.

@@               Coverage Diff               @@
##              master     #1327       +/-   ##
===============================================
+ Coverage     67.822%   68.129%   +0.307%     
- Complexity      8253      8406      +153     
===============================================
  Files            562       563        +1     
  Lines          33644     34260      +616     
  Branches        5640      5778      +138     
===============================================
+ Hits           22818     23341      +523     
- Misses          8654      8720       +66     
- Partials        2172      2199       +27
Impacted Files Coverage Δ Complexity Δ
.../java/htsjdk/samtools/util/BufferedLineReader.java 68.421% <ø> (ø) 6 <0> (ø) ⬇️
...c/main/java/htsjdk/samtools/util/IntervalList.java 73.897% <100%> (-0.613%) 69 <0> (-6)
src/main/java/htsjdk/samtools/util/Interval.java 61.29% <33.333%> (-2.995%) 28 <1> (+1)
...main/java/htsjdk/variant/vcf/AbstractVCFCodec.java 73.846% <60%> (+0.081%) 86 <1> (ø) ⬇️
...htsjdk/tribble/IntervalList/IntervalListCodec.java 79.31% <79.31%> (ø) 14 <14> (?)
src/main/java/htsjdk/samtools/cram/CRAIIndex.java 86.813% <0%> (-1.593%) 39% <0%> (+13%)
...a/htsjdk/samtools/cram/build/ContainerFactory.java 95.122% <0%> (-1.545%) 6% <0%> (+1%)
...jdk/samtools/cram/build/CramContainerIterator.java 88.889% <0%> (-1.111%) 16% <0%> (+6%)
...java/htsjdk/samtools/cram/structure/Container.java 90.411% <0%> (-0.065%) 42% <0%> (+20%)
...mtools/cram/build/CramContainerHeaderIterator.java 100% <0%> (ø) 4% <0%> (+2%) ⬆️
... and 13 more

@cmnbroad cmnbroad self-assigned this Mar 15, 2019
Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why this was painful. A few suggestions and some cleanup comments.

* @param <T> The feature type this codec reads
*/
public abstract class AsciiSamFeatureCodec<T extends Feature> extends AbstractFeatureCodec<T, LineReader> {
private static final Log log = Log.getInstance(AsciiSamFeatureCodec.class);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

public interface LineReader extends Closeable{
public interface LineReader extends Closeable {

int EOF_VALUE = -1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs javadoc, and peek's javadoc should reference it.

*
* @param <T> The feature type this codec reads
*/
public abstract class AsciiSamFeatureCodec<T extends Feature> extends AbstractFeatureCodec<T, LineReader> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sam is in the name but I'm not sure what is particularly Sam about it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps I'll use LineReaderFeatureCodec?

}

@Override
public boolean isDone(final LineReader lineIterator) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These arguments arguments aren' iterators would be better called lineReader rather than lineIterator here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 leftover cruft.


static final Log log = Log.getInstance(IntervalListCodec.class);

final char[] shortArray;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be write-only...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more cruft

// Make sure we have the right number of fields
final String[] fields = line.split("\t");
if (fields.length != 5) {
throw new SAMException("Invalid interval record contains " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm - maybe we've already crossed this line elsewhere (?) but throwing a SAMException from tribble might not be great. Maybe this should be IllegalArgumentException ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. using TribbleException

if (seq.equals(lastSeq)) {
seq = lastSeq;
}
lastSeq = seq;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you just moved this code from elsewhere, but I assume this is just cheap interning ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in master already.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, was just curious what is was doing.

* Also contains the parseing code for the non-tribble parsing of IntervalLists
*/

public class IntervalListCodec extends AsciiSamFeatureCodec<Interval> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably eliminate the need for AsciiSamFeatureCodec altogether by using AsciiFeatureCodec as the base class, and writing an anonymous class implementation of LineReader inside of readActualHeader (see below) as a just-in-time bridge to `SamTextCodec'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I understood...


final SAMFileHeader header = headerCodec.decode(lineReader, "");
dictionary = header.getSequenceDictionary();
return header;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return header;
@Override
public Object readActualHeader(LineIterator lineIterator) {
final SAMTextHeaderCodec headerCodec = new SAMTextHeaderCodec();
final SAMFileHeader header = headerCodec.decode(new LineReader() {
int lineNo = 0;
@Override
public String readLine() {
lineNo++;
return lineIterator.next();
}
@Override
public int getLineNumber() {
return lineNo;
}
@Override
public int peek() {
return lineIterator.hasNext() ?
lineIterator.peek().charAt(0) :
LineReader.EOF_VALUE;
}
@Override
public void close() { }
}, "IntervalListCodec");
dictionary = header.getSequenceDictionary();
return header;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I tried that first and failed...this is better.

if (line == null) {
return list;
}

final IntervalListCodec intervalListCodec = new IntervalListCodec(dict);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally IntervalListCodec wouldn't have to provide this constructor, since the codec should really get the dictionary from it's input stream directly. Also, the code above that parses out the header lines is redundant with code thats already in the codec; the codec implementation is better because it doesn't require reading the entire header into a string that is then re-parsed. But, I don't see a way to bridge from the BufferedReader to the codec. We might want to think about deprecating fromReader(final BufferedReader in); it would be easy to replace it with a (stream/file/path) overload that could use the codec directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that there's an action item here...please correct me if I'm wrong :-)

Yossi Farjoun added 4 commits March 19, 2019 14:16
- moved the business part of the dec from IntervalList to the Tribble dec.
- testing the Tribble directly.
- using Strand class to parse the strand
@yfarjoun yfarjoun force-pushed the yf_add_intervalListCodec branch from d71944e to d5e317f Compare March 19, 2019 18:20
}

public IntervalListCodec(final SAMSequenceDictionary dict) {
this();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally when chaining constructors you want to chain from fewer arguments to more arguments, as the chaining lets you provide default values (or computed values) for arguments that can also be specified. In this case it would be better IMO to make the no-arg constructor call this constructor using this(null); and make this constructor call super(Interval.class);


public class IntervalListCodec extends AsciiFeatureCodec<Interval> {

static final Log log = Log.getInstance(IntervalListCodec.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the log field is private since no other class has a need to refer to it.

@yfarjoun
Copy link
Contributor Author

put in a indexed test

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 1, 2019

back to you @cmnbroad

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yfarjoun Only one thing missing that (I think) we discussed last time - can you add an index test, maybe one in TribbleIndexFeatureReaderTest, similar to the others that are there to make sure the header is properly consumed from the stream when querying the new codec with a FeatureReader. Otherwise looks good.

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Apr 2, 2019

@cmnbroad better?

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 when tests pass. thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants