-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yf add interval list codec #1327
Conversation
yfarjoun
commented
Mar 15, 2019
- moved the business part of the dec from IntervalList to the Tribble dec.
- testing the Tribble directly.
Codecov Report
@@ Coverage Diff @@
## master #1327 +/- ##
===============================================
+ Coverage 67.822% 68.129% +0.307%
- Complexity 8253 8406 +153
===============================================
Files 562 563 +1
Lines 33644 34260 +616
Branches 5640 5778 +138
===============================================
+ Hits 22818 23341 +523
- Misses 8654 8720 +66
- Partials 2172 2199 +27
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this was painful. A few suggestions and some cleanup comments.
* @param <T> The feature type this codec reads | ||
*/ | ||
public abstract class AsciiSamFeatureCodec<T extends Feature> extends AbstractFeatureCodec<T, LineReader> { | ||
private static final Log log = Log.getInstance(AsciiSamFeatureCodec.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused.
public interface LineReader extends Closeable{ | ||
public interface LineReader extends Closeable { | ||
|
||
int EOF_VALUE = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs javadoc, and peek
's javadoc should reference it.
* | ||
* @param <T> The feature type this codec reads | ||
*/ | ||
public abstract class AsciiSamFeatureCodec<T extends Feature> extends AbstractFeatureCodec<T, LineReader> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sam
is in the name but I'm not sure what is particularly Sam
about it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps I'll use LineReaderFeatureCodec?
} | ||
|
||
@Override | ||
public boolean isDone(final LineReader lineIterator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These arguments arguments aren' iterators would be better called lineReader
rather than lineIterator
here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 leftover cruft.
|
||
static final Log log = Log.getInstance(IntervalListCodec.class); | ||
|
||
final char[] shortArray; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be write-only...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more cruft
// Make sure we have the right number of fields | ||
final String[] fields = line.split("\t"); | ||
if (fields.length != 5) { | ||
throw new SAMException("Invalid interval record contains " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm - maybe we've already crossed this line elsewhere (?) but throwing a SAMException from tribble might not be great. Maybe this should be IllegalArgumentException ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. using TribbleException
if (seq.equals(lastSeq)) { | ||
seq = lastSeq; | ||
} | ||
lastSeq = seq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you just moved this code from elsewhere, but I assume this is just cheap interning ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in master already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, was just curious what is was doing.
* Also contains the parseing code for the non-tribble parsing of IntervalLists | ||
*/ | ||
|
||
public class IntervalListCodec extends AsciiSamFeatureCodec<Interval> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably eliminate the need for AsciiSamFeatureCodec
altogether by using AsciiFeatureCodec
as the base class, and writing an anonymous class implementation of LineReader
inside of readActualHeader
(see below) as a just-in-time bridge to `SamTextCodec'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I understood...
|
||
final SAMFileHeader header = headerCodec.decode(lineReader, ""); | ||
dictionary = header.getSequenceDictionary(); | ||
return header; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return header; | |
@Override | |
public Object readActualHeader(LineIterator lineIterator) { | |
final SAMTextHeaderCodec headerCodec = new SAMTextHeaderCodec(); | |
final SAMFileHeader header = headerCodec.decode(new LineReader() { | |
int lineNo = 0; | |
@Override | |
public String readLine() { | |
lineNo++; | |
return lineIterator.next(); | |
} | |
@Override | |
public int getLineNumber() { | |
return lineNo; | |
} | |
@Override | |
public int peek() { | |
return lineIterator.hasNext() ? | |
lineIterator.peek().charAt(0) : | |
LineReader.EOF_VALUE; | |
} | |
@Override | |
public void close() { } | |
}, "IntervalListCodec"); | |
dictionary = header.getSequenceDictionary(); | |
return header; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. I tried that first and failed...this is better.
if (line == null) { | ||
return list; | ||
} | ||
|
||
final IntervalListCodec intervalListCodec = new IntervalListCodec(dict); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally IntervalListCodec
wouldn't have to provide this constructor, since the codec should really get the dictionary from it's input stream directly. Also, the code above that parses out the header lines is redundant with code thats already in the codec; the codec implementation is better because it doesn't require reading the entire header into a string that is then re-parsed. But, I don't see a way to bridge from the BufferedReader
to the codec. We might want to think about deprecating fromReader(final BufferedReader in)
; it would be easy to replace it with a (stream/file/path) overload that could use the codec directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that there's an action item here...please correct me if I'm wrong :-)
- moved the business part of the dec from IntervalList to the Tribble dec. - testing the Tribble directly.
- using Strand class to parse the strand
d71944e
to
d5e317f
Compare
} | ||
|
||
public IntervalListCodec(final SAMSequenceDictionary dict) { | ||
this(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally when chaining constructors you want to chain from fewer arguments to more arguments, as the chaining lets you provide default values (or computed values) for arguments that can also be specified. In this case it would be better IMO to make the no-arg constructor call this constructor using this(null);
and make this constructor call super(Interval.class);
|
||
public class IntervalListCodec extends AsciiFeatureCodec<Interval> { | ||
|
||
static final Log log = Log.getInstance(IntervalListCodec.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually the log
field is private
since no other class has a need to refer to it.
put in a indexed test |
back to you @cmnbroad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yfarjoun Only one thing missing that (I think) we discussed last time - can you add an index test, maybe one in TribbleIndexFeatureReaderTest, similar to the others that are there to make sure the header is properly consumed from the stream when querying the new codec with a FeatureReader. Otherwise looks good.
@cmnbroad better? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 when tests pass. thx.