Question about twoBit.c / java #22

Closed
lindenb opened this issue Sep 10, 2019 · 2 comments

@lindenb

lindenb commented Sep 10, 2019

Hi UCSC team,

I'm currently writing a PR for the htsjdk project, the "Java API for high-throughput sequencing data (HTS) formats".

The goal of my PR samtools/htsjdk#1417 is to add Java code that handles the '.2bit' format. My Java code is largely inspired by your C code, twoBit.c.

  1. Are you OK with my code being included in the htsjdk project? Should I add a specific license (it is currently MIT) or credit any author in my code?

  2. A technical question: I need to build a SequenceDictionary in which the contigs are in the same order as in the input FASTA.
    When faToTwoBit builds a '.2bit' file, is the order of the sequences in the '.2bit' file always the same as in the original FASTA file (at this position, when reading: https://github.com/ucscGenomeBrowser/kent/blob/master/src/lib/twoBit.c#L658 ), or is there any reordering by a hash table?

Thank you,

Pierre

@NullModel
Contributor

Good Morning Pierre:

Your proposed license is fine. Yes, you can use your code elsewhere.
Yes, it appears that the order you put fasta into the 2bit is the order you
will get out if you simply read it all:

faCount sequences.fa
faToTwoBit sequences.fa test.2bit
twoBitToFa test.2bit stdout | faCount stdin

This produces the same faCount output as the first command.
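
On the Java side, the same property can be kept by reading the index into an ordered list rather than a hash map. The following is only a minimal sketch, not the htsjdk or kent implementation. It assumes the commonly documented .2bit layout (a 16-byte header of signature, version, sequenceCount and reserved, followed by one index entry per sequence made of a one-byte name length, the name, and a 32-bit offset). The class name `TwoBitIndexOrder` and the helper `fakeIndex` are hypothetical and exist only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TwoBitIndexOrder {
    /** Signature value, assumed from the published .2bit format description. */
    static final int SIGNATURE = 0x1A412743;

    /**
     * Reads the sequence names from the header and index, in file order.
     * The caller is assumed to have set the buffer to the file's byte order.
     */
    static List<String> readNames(ByteBuffer buf) {
        if (buf.getInt() != SIGNATURE) {
            throw new IllegalArgumentException("bad signature or wrong byte order");
        }
        buf.getInt();                      // version (ignored in this sketch)
        int sequenceCount = buf.getInt();
        buf.getInt();                      // reserved
        List<String> names = new ArrayList<>();
        for (int i = 0; i < sequenceCount; i++) {
            int nameSize = buf.get() & 0xFF;   // unsigned one-byte length
            byte[] name = new byte[nameSize];
            buf.get(name);
            buf.getInt();                  // offset of the sequence record, unused here
            names.add(new String(name, StandardCharsets.US_ASCII));
        }
        return names;                      // an ArrayList keeps index order
    }

    /** Hypothetical helper: builds a header and index only (no sequence data). */
    static ByteBuffer fakeIndex(ByteOrder order, String... names) {
        int size = 16;
        for (String n : names) {
            size += 1 + n.length() + 4;
        }
        ByteBuffer buf = ByteBuffer.allocate(size).order(order);
        buf.putInt(SIGNATURE).putInt(0).putInt(names.length).putInt(0);
        for (String n : names) {
            buf.put((byte) n.length());
            buf.put(n.getBytes(StandardCharsets.US_ASCII));
            buf.putInt(0);                 // placeholder offset
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fakeIndex(ByteOrder.LITTLE_ENDIAN, "chr2", "chr1", "chrM");
        System.out.println(readNames(buf));
    }
}
```

The list returned by `readNames` could then be used to fill a SequenceDictionary entry by entry. Whether that order matches the input FASTA depends on the writer; the faCount comparison above is the evidence for faToTwoBit.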

Please be aware of the byte swapping issue. The kent C code will write out
files in the native byte order of the machine it is running on. There is a tag
in the file to indicate the byte order so that the reader can adjust to any
file encountered regardless of where it was produced. If you are also
creating a writer function, it should tag the file appropriately.
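
For a Java reader and writer, the tag check could look roughly like the sketch below. It is an illustration under assumptions, not the kent or htsjdk code: the signature value 0x1A412743 is taken from the published .2bit format description, and the class and method names (`TwoBitByteOrder`, `detect`, `signatureBytes`) are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TwoBitByteOrder {
    /** Signature value, assumed from the published .2bit format description. */
    static final int SIGNATURE = 0x1A412743;

    /** Returns the byte order a file was written in, given its first four bytes. */
    static ByteOrder detect(byte[] first4) {
        if (first4.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes");
        }
        int asBig = ByteBuffer.wrap(first4, 0, 4).order(ByteOrder.BIG_ENDIAN).getInt();
        if (asBig == SIGNATURE) {
            return ByteOrder.BIG_ENDIAN;
        }
        if (Integer.reverseBytes(asBig) == SIGNATURE) {
            return ByteOrder.LITTLE_ENDIAN;   // bytes are swapped relative to big-endian
        }
        throw new IllegalArgumentException("not a .2bit file: bad signature");
    }

    /** Produces the four signature bytes a writer would emit for the given order. */
    static byte[] signatureBytes(ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(SIGNATURE).array();
    }

    public static void main(String[] args) {
        byte[] tag = signatureBytes(ByteOrder.LITTLE_ENDIAN);
        System.out.println(detect(tag));
    }
}
```

A reader would call `detect` once on the first four bytes and then apply the returned order to every later multi-byte read. A writer only has to emit the signature in the same order it uses for the rest of the file, which is what tags the file.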

@lindenb
Author

lindenb commented Sep 11, 2019

Many thanks for your answer.

@lindenb lindenb closed this as completed Sep 11, 2019