Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML tag not correctly parsed? #142

Closed
jrobinso opened this issue Oct 2, 2024 · 4 comments
Closed

ML tag not correctly parsed? #142

jrobinso opened this issue Oct 2, 2024 · 4 comments

Comments

@jrobinso
Copy link
Contributor

jrobinso commented Oct 2, 2024

Hi @cmdcolin , this is a test case for the issue alluded to in igvteam/igv.js#1895. I'm not super familiar with either Typescript or Node but the code itself looks correct (parseTagValueArray), I don't see anything obviously wrong, so this is a mystery.

The attached CRAM file has a single alignment on genome hg38 at ~ location chr15:23,627,837-23,654,650. The ML tag should contain a byte array of size 656. However the value in record.tags is an array of size 144.

Tested with cram-js v3.0.5

ML_Test.zip

The value of the ML tag, which can be verified with samtools view, should be

[3,4,3,17,5,7,2,5,5,3,5,9,2,5,7,3,1,100,1,3,2,5,2,0,2,5,24,1,4,136,8,2,2,5,7,8,78,4,4,54,9,10,4,26,7,1,9,3,6,1,8,1,3,43,153,1,9,1,157,47,52,20,1,3,35,124,1,84,35,8,3,7,56,4,46,2,4,96,3,8,7,1,1,1,34,4,45,235,7,9,81,10,1,7,178,1,4,1,81,3,6,14,10,5,10,2,13,0,0,1,3,5,147,127,1,9,4,104,164,11,204,36,79,167,55,3,19,2,2,19,3,4,7,5,7,2,2,11,150,30,13,26,3,11,5,18,3,2,52,9,5,252,144,0,110,10,208,92,129,1,6,7,2,251,7,3,3,22,1,55,6,2,11,2,3,3,9,24,15,1,88,3,5,153,144,12,1,2,1,4,92,0,1,2,131,3,2,10,5,6,4,6,2,3,3,5,17,6,78,7,5,3,7,3,3,3,4,45,11,4,3,34,31,226,2,10,2,4,2,2,4,34,5,1,11,2,180,54,3,6,21,3,5,3,7,4,13,49,6,3,3,6,12,2,8,9,3,3,4,4,33,2,81,3,1,0,15,4,1,2,142,1,6,43,1,2,78,7,1,1,39,12,16,9,194,4,12,4,3,5,6,2,3,7,4,6,4,3,2,11,7,4,32,8,3,2,4,52,5,12,1,165,9,4,0,16,3,17,3,34,13,1,1,19,4,4,28,18,0,0,1,0,0,1,0,1,1,1,1,12,0,1,0,0,254,151,1,0,0,0,1,0,0,1,2,0,0,1,1,1,0,0,3,2,167,1,251,20,246,245,251,229,2,0,246,252,248,254,247,254,252,212,102,254,1,254,97,208,202,235,254,252,219,131,253,168,219,245,251,248,198,251,209,253,0,158,0,0,248,254,254,253,221,0,210,20,248,246,173,245,254,248,73,254,251,254,173,252,249,241,245,249,244,253,242,255,255,254,252,249,107,128,254,246,1,81,85,244,51,219,176,80,199,252,235,253,253,236,1,2,1,1,248,253,0,244,105,224,242,226,0,0,249,237,0,253,203,246,1,3,111,255,120,245,5,155,126,254,0,248,0,0,1,0,1,233,254,200,249,0,244,253,252,252,1,231,240,254,167,252,2,102,52,238,254,252,0,0,163,255,254,0,0,0,1,1,2,0,0,1,0,0,0,1,3,1,1,1,0,1,1,0,1,0,1,210,244,1,1,221,224,28,253,1,1,0,0,0,0,221,0,254,244,0,75,200,0,249,4,0,249,252,1,0,1,1,1,1,0,0,243,0,1,0,1,252,1,1,222,0,0,251,254,253,240,251,254,253,113,254,249,208,254,253,177,248,254,254,216,232,239,246,45,1,243,1,0,0,249,253,1,1,1,1,0,1,0,21,248,1,9,0,1,253,251,203,1,243,254,90,246,251,254,239,1,10,2,221,242,254,254,236,1,0,227,237]
@jrobinso
Copy link
Contributor Author

jrobinso commented Oct 2, 2024

Here's some information. The problem appears to be in the parsing of length here, at ine 35 of decodeRecord.ts.

  const length = Int32Array.from(buffer.slice(1))[0]!

It returns the incorrect value of 144. I'm sure its no coincidence that the first byte of the correct value of 656 is 144, but I don't know why this fails. The complete set of bytes BTW is [144, 2, 0, 0]

Replacing line 35 with the following fixes the problem, length is now the correct value of 656

  const dataView = new DataView(buffer.buffer)
  const littleEndian = true
  const length = dataView.getUint32(1, littleEndian)

I can submit a PR for this if you like, but that is the only change.

@ihh
Copy link
Member

ihh commented Oct 2, 2024

Thanks for this Jim. I am guessing the bug arises because buffer is a Uint8Array and the conversion to Int32Array converts each byte to a 32-bit word rather than batching them in groups of 4 bytes, which was what was presumably intended. Looks like your fix should work. Thank you! @cmdcolin is away this week, but if you can submit a PR that would be great, I will let him merge it when he gets back.

@jrobinso
Copy link
Contributor Author

jrobinso commented Oct 2, 2024

I'll submit a PR.

Your guess would make sense, since the value returned was the value of the first entry in the Uint8Array.

@cmdcolin
Copy link
Contributor

fixed in v3.0.6. thanks again for pushing the fix :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants