Do reference partitioners restrict a partition to contain keys from a single contig? #573

Closed
fnothaft opened this issue Feb 8, 2015 · 1 comment

@fnothaft
Member

fnothaft commented Feb 8, 2015

I wasn't sure about this and couldn't tell for certain from reading the code: when the two reference partitioners map keys to partitions, do they ensure that each partition contains keys from only a single reference contig?

This matters to me because I have code that currently assumes that after:

rdd.keyBy(ReferencePosition(_))
  .repartitionAndSortWithinPartitions(new GenomicPositionPartitioner(...))

the _.pos of all keys within a partition will be monotonically increasing (i.e., I'll never wrap from the end of one contig to the start of the next contig). I don't think it's a big deal either way (it would be easy to make my code handle wrapping), but this behavior isn't documented in the partitioners file, and it determines whether my code works.
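For reference, handling the wrap could look roughly like the sketch below. It is plain Scala with no Spark or ADAM dependency, meant to run as a script or in the REPL. `Pos` and `splitByContig` are hypothetical names (`Pos` stands in for ReferencePosition), and the input is assumed to be one partition's keys, already sorted by contig and then by position.

```scala
// Hypothetical stand-in for ReferencePosition: a contig name plus an offset.
final case class Pos(contig: String, pos: Long)

// Splits one partition's sorted keys into runs that each stay on one contig.
// Assumes the input is sorted by contig, then by position.
def splitByContig(sorted: Seq[Pos]): Vector[Vector[Pos]] =
  sorted.foldLeft(Vector.empty[Vector[Pos]]) { (runs, p) =>
    runs.lastOption match {
      case Some(run) if run.last.contig == p.contig =>
        runs.init :+ (run :+ p) // same contig: extend the current run
      case _ =>
        runs :+ Vector(p)       // contig changed (or first key): start a new run
    }
  }

// Example: a partition whose keys wrap from the end of chr1 to the start of chr2.
val keys = Seq(Pos("chr1", 900L), Pos("chr1", 950L), Pos("chr2", 10L), Pos("chr2", 40L))
val runs = splitByContig(keys)
```

Within each returned run the positions are non-decreasing, so code that relies on monotonic `_.pos` could process the runs one at a time.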

@laserson
Contributor

It looks to me like a single partition can contain multiple contigs. I believe GenomicRegionPartitioner will not, though.
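To illustrate how a position-based partitioner could end up mixing contigs, here is a toy model in plain Scala. It is an assumption made for illustration, not ADAM's actual code: the contig names, lengths, and `toyPartition` are all hypothetical. The contigs are laid end to end on one coordinate axis, which is then cut into equally sized partitions.

```scala
// Toy model only: NOT ADAM's implementation. All names and lengths are made up.
val contigLengths = Vector("chr1" -> 900L, "chr2" -> 700L, "chr3" -> 400L)
val numPartitions = 4

val totalLength: Long = contigLengths.map(_._2).sum

// Offset of each contig's first base on the concatenated axis.
val offsets: Map[String, Long] =
  contigLengths.map(_._1).zip(contigLengths.scanLeft(0L)(_ + _._2)).toMap

// Maps a (contig, position) key to a partition by its global coordinate.
def toyPartition(contig: String, pos: Long): Int =
  ((offsets(contig) + pos) * numPartitions / totalLength).toInt

val endOfChr1 = toyPartition("chr1", 899L)
val startOfChr2 = toyPartition("chr2", 0L)
```

In this model the last base of chr1 and the first base of chr2 land in the same partition, because the partition boundaries fall at multiples of 500 on the concatenated axis while the contig boundary falls at 900.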
