Human vs mouse synteny demo #3444

cmdcolin · 2023-01-09T19:33:35Z

This PR adds a human vs mouse synteny demo to our example data. I wanted to try this to test some of our scalability limits, which I elaborate on below.

It uses the T2T-CHM13 (aka hs1) genome vs mm39, with all data taken from the UCSC Genome Browser.

Scalability brainstorming

The hs1 vs mm39 liftover.chain.gz file that we use for the SyntenyTrack is 69MB gzipped and 219MB ungzipped. The maximum size of ungzipped data we support is 512MB, because that is the maximum string size in Chrome, so this file comes fairly close to our limit. I can certainly imagine species (plant genomes, etc.) that would exceed it.
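As a rough way to check a candidate file against that limit before loading it, here is a minimal sketch in Python (not the browser code path). It stream-decompresses the file and counts bytes, so the check itself never holds the whole file in memory. The 512MB constant is taken from the figure above; the function names are hypothetical, and comparing bytes to string length assumes mostly ASCII content, which is typical for chain files.

```python
import gzip

# Assumption: the ~512MB string limit quoted above, expressed in bytes.
MAX_STRING_BYTES = 512 * 1024 * 1024


def uncompressed_size(path, chunk_size=1 << 20):
    """Stream-decompress a gzip file and return its decompressed size in bytes."""
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def fits_in_one_string(path, limit=MAX_STRING_BYTES):
    """Return True if the decompressed data would fit under the given limit."""
    return uncompressed_size(path) <= limit
```

For the file described above, this kind of check would report roughly 219MB decompressed, which is under the limit but leaves little headroom for larger genomes.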

An indexed file format could help us in some cases. We have not focused on indexed file formats so far, because we were using fairly small PAF files that could be loaded into memory, but scalability concerns are tracked in #2788. With indexing, we may not need to download the entire file when accessing a local region on the LGV synteny track. (Currently, synteny track adapters generally download the entire file; this is adapter behavior that could be adjusted.)
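To illustrate what the unindexed case costs, here is a minimal Python sketch (not the actual adapter code) of finding the chains that overlap a local region. It relies on the UCSC chain header layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`); the function name is hypothetical. Note that every line has to be visited even when only a small region is requested.

```python
def chains_overlapping(lines, t_name, start, end):
    """Yield chain headers whose target span overlaps [start, end).

    Without an index, the whole file must be scanned to answer this.
    """
    for line in lines:
        # Only header lines start with "chain"; alignment block lines are skipped.
        if not line.startswith("chain "):
            continue
        f = line.split()
        rec = {
            "score": int(f[1]),
            "tName": f[2],
            "tStart": int(f[5]),
            "tEnd": int(f[6]),
            "qName": f[7],
            "qStart": int(f[10]),
            "qEnd": int(f[11]),
            "id": f[12],
        }
        if rec["tName"] == t_name and rec["tStart"] < end and rec["tEnd"] > start:
            yield rec
```

With an index, the scan over `lines` would be replaced by a seek to only the relevant part of the file.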

The bigChain format from UCSC could possibly help as an example of an indexed file format, but it is only indexed in "one dimension", i.e. for the query genome and not the target genome, so accessing the data from the target genome would be unindexed. A custom tabix-style chain format could probably be made as well, similar to mafviewer. "2-D" indexed formats would be cool, but may not be available. Also, "biologically", it may be better to have two tracks, "hs1 (query) vs mm39 (target)" and "mm39 (query) vs hs1 (target)", in which case the 1D indexing is fine.
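The "one dimension" point can be sketched with a toy in-memory bin index in Python. This is only an illustration under assumptions: the bin size, the tuple layout, and the function names are all hypothetical, and it is not how bigChain or tabix lay out data on disk. Lookups on the indexed (query) axis touch only a few bins, while lookups on the other (target) axis fall back to a full scan.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # hypothetical fixed bin width


def build_index(blocks, bin_size=BIN_SIZE):
    """Index blocks by (query name, bin) on the query axis only.

    Each block is (q_name, q_start, q_end, t_name, t_start, t_end).
    """
    index = defaultdict(list)
    for i, (q_name, q_start, q_end, _t_name, _t_start, _t_end) in enumerate(blocks):
        for b in range(q_start // bin_size, (q_end - 1) // bin_size + 1):
            index[(q_name, b)].append(i)
    return index


def query_indexed(blocks, index, q_name, start, end, bin_size=BIN_SIZE):
    """Find blocks overlapping [start, end) on the query axis using the index."""
    seen = set()
    hits = []
    for b in range(start // bin_size, (end - 1) // bin_size + 1):
        for i in index.get((q_name, b), ()):
            if i in seen:
                continue  # a block spanning several bins is reported once
            seen.add(i)
            q_start, q_end = blocks[i][1], blocks[i][2]
            if q_start < end and q_end > start:
                hits.append(blocks[i])
    return hits


def query_unindexed(blocks, t_name, start, end):
    """Find blocks overlapping [start, end) on the target axis by full scan."""
    return [b for b in blocks if b[3] == t_name and b[4] < end and b[5] > start]
```

Having two tracks, one per direction, amounts to building `build_index` once for each genome as the query, so both views get the fast path.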

github-actions bot added the "needs label triage" label Jan 9, 2023

cmdcolin removed the "needs label triage" label Jan 9, 2023

cmdcolin force-pushed the human_vs_mouse branch from cf5522e to 9c3051a Compare January 9, 2023 20:45

github-actions bot added the "needs label triage" label Jan 9, 2023

cmdcolin added the "documentation" label and removed the "needs label triage" label Jan 9, 2023

[skip ci] Human vs mouse synteny

bd71cd3

cmdcolin force-pushed the human_vs_mouse branch 2 times, most recently from 4c0a998 to 9ee2c94 Compare January 9, 2023 21:50

Optimize drawing long alignments with useCallback

9a231d3

cmdcolin force-pushed the human_vs_mouse branch from 9ee2c94 to 9a231d3 Compare January 9, 2023 22:07

cmdcolin merged commit f0c493a into main Jan 9, 2023

cmdcolin deleted the human_vs_mouse branch January 9, 2023 22:08

This was referenced Jan 9, 2023

Improve scalability of synteny datasets #2788

Closed

v2.3.3 release announcement #3450

Merged
