max chrom size too small #66

Open
dmalzl opened this issue Jan 31, 2021 · 8 comments


@dmalzl

dmalzl commented Jan 31, 2021

Hi there,

I am using pairix in conjunction with cooler to generate cool files for my Hi-C data. Pairix generation seemed to work without an issue using cooler csort pairix. However, I am now generating the base resolution with cooler cload pairix and getting maximum chromosome size warnings, since some of the chromosomes are larger than 2^30.
Does this affect my results, and could this be resolved in your implementation so that larger chromosomes are supported as well?

These are the sizes:

chr1p   1416415443
chr1q   1471228731
chr2p   1381227519
chr2q   1413870220
chr3p   867469519
chr3q   1525846468
chr4p   1204678091
chr4q   1215118724
chr5p   1279525745
chr5q   1285306171
chr6p   1462196291
chr6q   1625855492
chr7p   907274393
chr7q   1083033678
chr8p   745249056
chr8q   885924281
chr9p   454801983
chr9q   1001906603
chr10p  1081352037
chr10q  525520881
chr11p  305453894
chr11q  1078813816
chr12p  288882840
chr12q  857146890
chr13p  241582323
chr13q  482900735
chr14p  184541253
chr14q  436048769
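
For reference, a minimal sketch of checking which of the sizes listed above exceed the limit. It assumes the limit is exactly 2^30, as quoted from the warning; the constant name `ASSUMED_MAX_CHROM_SIZE` and the helper `chroms_over_limit` are made up for illustration and are not part of pairix or cooler.

```python
# Sizes copied from the list above (chromosome arm -> length in bp).
CHROM_SIZES = {
    "chr1p": 1416415443, "chr1q": 1471228731,
    "chr2p": 1381227519, "chr2q": 1413870220,
    "chr3p": 867469519, "chr3q": 1525846468,
    "chr4p": 1204678091, "chr4q": 1215118724,
    "chr5p": 1279525745, "chr5q": 1285306171,
    "chr6p": 1462196291, "chr6q": 1625855492,
    "chr7p": 907274393, "chr7q": 1083033678,
    "chr8p": 745249056, "chr8q": 885924281,
    "chr9p": 454801983, "chr9q": 1001906603,
    "chr10p": 1081352037, "chr10q": 525520881,
    "chr11p": 305453894, "chr11q": 1078813816,
    "chr12p": 288882840, "chr12q": 857146890,
    "chr13p": 241582323, "chr13q": 482900735,
    "chr14p": 184541253, "chr14q": 436048769,
}

# Assumption: the limit behind the warning is exactly 2**30 bp.
ASSUMED_MAX_CHROM_SIZE = 2 ** 30


def chroms_over_limit(sizes, limit=ASSUMED_MAX_CHROM_SIZE):
    """Return {name: bp beyond the limit} for chromosomes longer than `limit`."""
    return {name: length - limit for name, length in sizes.items() if length > limit}


over = chroms_over_limit(CHROM_SIZES)
for name, excess in over.items():
    print(f"{name}\t{excess} bp beyond the assumed limit")
```

The excess column gives a rough idea of how much of each arm would fall into the region above the limit.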

Thanks

@dmalzl
Author

dmalzl commented Jan 31, 2021

To make it a bit more concrete: cooler cload pairix calls the query2D method of the pairix file instance with the size of the whole chromosome, and later uses the returned iterator to fetch pairs from up to 5 chunks that the chromosome is divided into.
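
To illustrate why this matters, here is a rough sketch of how a chromosome split into up to 5 spans interacts with a 2^30 limit. This is not cooler's actual chunking code: the equal-width split, the function names, and the limit value are all assumptions made for the example.

```python
# Assumption: the limit is exactly 2**30 bp, as quoted in the warning.
ASSUMED_LIMIT = 2 ** 30


def split_into_chunks(length, n_chunks=5):
    """Split [0, length) into at most `n_chunks` contiguous, near-equal spans."""
    if length <= 0:
        return []
    step = -(-length // n_chunks)  # ceiling division
    return [(start, min(start + step, length)) for start in range(0, length, step)]


def chunks_beyond_limit(length, limit=ASSUMED_LIMIT, n_chunks=5):
    """Return the spans whose end coordinate lies above `limit`."""
    return [(s, e) for s, e in split_into_chunks(length, n_chunks) if e > limit]


chr6q = 1625855492  # largest arm in the list above
for start, end in split_into_chunks(chr6q):
    status = "reaches past the assumed limit" if end > ASSUMED_LIMIT else "within the assumed limit"
    print(f"{start}-{end}: {status}")
```

Under these assumptions, any span that reaches past the limit is one where pairs could be missed if the index cannot address those coordinates.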

@nvictus
Contributor

nvictus commented Feb 1, 2021

Honestly, I'm not sure what the repercussions are. I would try querying the pairs file directly with pairix using out-of-bounds coordinates and see what it does or wait to hear back from the pairix maintainers.
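
A minimal sketch of such a probe, assuming an already opened pypairix handle whose `query2D` method takes `(chrom, start, end, chrom2, start2, end2)`. The helper names, the 2^30 limit, and the file path in the commented usage are placeholders, not taken from the actual data.

```python
def count_pairs(handle, chrom, start, end):
    """Count pairs with both mates on `chrom` inside the window start..end."""
    # Assumes the handle exposes query2D(chrom, start, end, chrom2, start2, end2)
    # and returns an iterable of records.
    return sum(1 for _ in handle.query2D(chrom, start, end, chrom, start, end))


def probe_limit(handle, chrom, chrom_size, limit=2 ** 30):
    """Return (pairs below the limit, pairs above it, or None if nothing lies above)."""
    below = count_pairs(handle, chrom, 0, min(chrom_size, limit))
    above = count_pairs(handle, chrom, limit, chrom_size) if chrom_size > limit else None
    return below, above


# Hypothetical usage (file name is a placeholder):
#   import pypairix
#   handle = pypairix.open("sample.pairs.gz")
#   print(probe_limit(handle, "chr6q", 1625855492))
```

If the second count comes back as zero for an arm that is known to have contacts in that region, the out-of-bounds part is most likely not being returned.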

I'm curious, would plain old cload pairs (not pairix) not work in your case? It does a two-pass ingestion using mergesort without requiring an index or even sorted pairs. Or do you run into performance issues?

@nvictus
Contributor

nvictus commented Feb 1, 2021

Meant to post that in the cooler issue, but whatevs...

@dmalzl
Author

dmalzl commented Feb 1, 2021

Yes, I already thought about the out-of-bounds querying and wanted to try it today, but didn't manage to. However, I will do so tomorrow and let you know.

As far as I can remember, I never tried cload pairs because we decided to go with pairix for its convenient querying via pypairix, but I will look into this as well.

Thanks anyway

@SooLee
Member

SooLee commented Feb 1, 2021

I'll have to modify pairix to add an option to increase the max chrom size. Just curious, what species is it?

@dmalzl
Author

dmalzl commented Feb 1, 2021

Thanks for moving so quickly. It's axolotl and it defies every heuristic you have learned for processing genomic data ^^

@SooLee
Member

SooLee commented Feb 2, 2021

That's cool! I'll take a look at it in the next couple of days and let you know.

@dmalzl
Author

dmalzl commented Feb 2, 2021

I'll also leave this here for reference:
querying ranges above the MAX_CHR threshold does not return any pairs from the file, so the resulting matrices are empty in regions above MAX_CHR. For now, cooler cload pairs is a convenient workaround. Thanks for the help.
