How is MatrixTable Entry data partitioned? #3715
iris-garden
asked this question in
Support Requests
Replies: 1 comment
-
Note
The following post was exported from discuss.hail.is, a forum for asking questions about Hail which has since been deprecated.
(Oct 19, 2023 at 15:39) danking said:
Hail Tables and MatrixTables are only partitioned along the row axis. Tables and MTs created from VCFs always have variants for rows, so Hail partitions contain all the samples at a contiguous interval of rows. These intervals are always non-overlapping.
This has not changed since 2015; however, we’re actively designing a “blocked” matrix table because we anticipate memory needs making 2M sample rows impractical.
Why do you ask?
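To make the row-axis partitioning described above concrete, here is a toy sketch in plain Python. It does not use Hail and is not Hail's actual partitioner; the function name `partition_rows`, the equal-size split, and the tiny 10 x 4 matrix are all assumptions chosen for illustration (real partitions cover key intervals and need not be equal in size). The point it shows is only the invariant from the answer: each partition holds a contiguous, non-overlapping interval of rows, and every row in it carries all samples.

```python
from typing import List, Tuple


def partition_rows(n_rows: int, n_partitions: int) -> List[Tuple[int, int]]:
    """Split row indices [0, n_rows) into contiguous, non-overlapping
    half-open intervals (start, stop). Hypothetical helper, not a Hail API."""
    if n_rows < 0 or n_partitions <= 0:
        raise ValueError("n_rows must be >= 0 and n_partitions must be > 0")
    base, extra = divmod(n_rows, n_partitions)
    intervals = []
    start = 0
    for i in range(n_partitions):
        # The first `extra` partitions take one additional row.
        size = base + (1 if i < extra else 0)
        intervals.append((start, start + size))
        start += size
    return intervals


# Toy "matrix table": 10 variants (rows) x 4 samples (columns).
n_variants, n_samples = 10, 4
entries = [[(v, s) for s in range(n_samples)] for v in range(n_variants)]

# Partition along the row axis only; columns are never split.
intervals = partition_rows(n_variants, 3)
partitions = [entries[start:stop] for start, stop in intervals]

for (start, stop), part in zip(intervals, partitions):
    samples_per_row = {len(row) for row in part}
    print(f"rows [{start}, {stop}): {len(part)} variants, "
          f"samples per row = {samples_per_row}")
```

Under this model, a partition holding "10k SNPs and 100 of the 1000 samples" cannot occur: a partition may hold 10k SNPs, but each of those rows always includes all 1000 samples.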
-
Note
The following post was exported from discuss.hail.is, a forum for asking questions about Hail which has since been deprecated.
(Oct 19, 2023 at 01:16) roise3000 said:
Hi there!
I’m wondering how exactly Hail partitions data. Does each Spark partition contain the data for a portion of the entries (a portion of the markers/variants and a portion of the samples)? Let’s say we have a MatrixTable with 40M SNPs and 1000 samples. Could it happen that a Spark partition contains the data for 10k SNPs and only 100 samples, or is it guaranteed that each partition will contain all samples?
And a follow-up: whatever the answer is, has that changed in recent years, or has it always been like that since 2015?