How is MatrixTable Entry data partitioned? #3715

iris-garden · 2024-05-11T01:36:25Z

iris-garden
May 11, 2024
Maintainer

Note

The following post was exported from discuss.hail.is, a forum for asking questions about Hail which has since been deprecated.

(Oct 19, 2023 at 01:16) roise3000 said:

Hi there!
I’m wondering how exactly Hail partitions data. Does each Spark partition contain the data for a portion of the entries (a portion of the markers/variants and a portion of the samples)? Let’s say we have a MatrixTable with 40M SNPs and 1000 samples. Could it happen that each Spark partition contains the data for 10k SNPs and 100 samples, or is it guaranteed that each partition will contain all samples?

And a follow-up: whatever the answer is, has that changed in recent years, or has it always been that way since 2015?

iris-garden · 2024-05-11T01:36:26Z

iris-garden
May 11, 2024
Maintainer Author

(Oct 19, 2023 at 15:39) danking said:

Hail Tables and MatrixTables are only partitioned along the row axis. Tables and MTs created from VCFs always have variants for rows, so: Hail partitions contain all the samples at a contiguous interval of rows. These intervals are always non-overlapping.
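To make the row-axis layout concrete, here is a minimal pure-Python sketch using the numbers from the question (40M variants, 1000 samples). It is not Hail's implementation: the helper `row_partitions` and the partition count of 4,000 are hypothetical, and it splits by row index into equal-sized chunks, whereas Hail's actual partitions are intervals over the row key and need not be equal in size. It only illustrates that each partition covers a contiguous, non-overlapping range of rows and every sample column.

```python
from typing import List, Tuple


def row_partitions(n_rows: int, n_partitions: int) -> List[Tuple[int, int]]:
    """Split row indices [0, n_rows) into contiguous, non-overlapping
    half-open intervals (start, stop). Hypothetical helper, not a Hail API."""
    base, extra = divmod(n_rows, n_partitions)
    bounds = []
    start = 0
    for i in range(n_partitions):
        # Spread any remainder over the first `extra` partitions.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


n_variants, n_samples = 40_000_000, 1_000
parts = row_partitions(n_variants, 4_000)  # assumed partition count

# A partition holds the entries for rows [start, stop) across ALL samples,
# so the column axis is never split.
entries_per_partition = [(stop - start) * n_samples for start, stop in parts]
```

Under these assumptions each partition covers 10k variants by all 1000 samples; a "10k SNPs by 100 samples" block never occurs because the sample axis is not divided.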

This has not changed since 2015; however, we’re actively designing a “blocked” matrix table because we anticipate that memory requirements will make rows spanning 2M samples impractical.

Why do you ask?
