-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[workshop] Add necessary infrastructure for IBG 2023 workshop #12769
Conversation
tpoterba
commented
Mar 8, 2023
- add qq_plot and manhattan built on plotly. It's a snarly problem to add support for all the customization necessary to build these out of ggplot directly, but this lets us use ggplot in the tutorial notebooks without having to use a separate plotting lib.
- Fix relatedness estimation for mating simulation
It's a snarly problem to add support for all the customization necessary to build these out of ggplot directly, but this lets us use ggplot in the tutorial notebooks without having to use a separate plotting lib.
@@ -3905,7 +3905,7 @@ def group_by(f: Callable, collection) -> DictExpression: | |||
@typecheck(f=func_spec(2, expr_any), | |||
zero=expr_any, | |||
collection=expr_oneof(expr_set(), expr_array())) | |||
def fold(f: Callable, zero, collection) -> Expression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by fix. Hinting the superclass of the returned value causes lint issues (in my IDE at least), leaving it out and using duck typing doesn't. Can remove if you want, but I've been stripping these out when I see them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Pyright complains about our use of Callable
, but I don't understand the error message [1]. With or without this change, pyright says the type of
fold(lambda _: 3, literal(0), literal([1]))
is Expression
. What type do you get?
Hmm, doesn't seem like there's integration for Pyright with IntelliJ unfortunately. The Python community seems to unifying on it since it's remarkably good.
[1]: Illegal type annotation: variable not allowed unless it is a type alias (lsp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh, I bet this is because your IDE thinks the return type is the return type of the function after modification by @typecheck
@@ -44,6 +45,8 @@ def aes(**kwargs): | |||
hail_field_properties = {} | |||
|
|||
for k, v in kwargs.items(): | |||
if isinstance(v, (tuple, dict)): | |||
v = hail.str(v) | |||
if not isinstance(v, Expression): | |||
v = literal(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's an example of using this new behavior? I feel a bit concerned that aes(foo=(ht.a, ht.b))
is different from:
ht = ht.annotate(c=(ht.a, ht.b))
aes(foo=ht.c)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove this, I think I added this in the process of trying to build the manhattan plot in ggplot. The issue I was trying to solve is that passing hl.tuple([something1, something2])
produces a tuple label, but if you pass (something1, something2)
, you get an exception after it tries to take the literal path. This is a consequence of trying to support both expressions and non-expr sequences. This is sort of a hack, but catches more cases of things that are probably meant to go the expr route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove it from this PR and revisit separately. I seems to me that if you get an error from hl.literal((ht.a, ht.b))
you should also get an error from my example above. If that's true, we should fix it for both use cases.
|
||
Returns | ||
------- | ||
:class:`bokeh.plotting.figure.Figure` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a plotly figure
|
||
Returns | ||
------- | ||
:class:`bokeh.plotting.figure.Figure` if no label or a single label was given, otherwise :class:`bokeh.models.layouts.Column` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type is wrong
return new_samples | ||
ns = mt.count_cols() | ||
|
||
# dict of true nonzero relatedness. indeed by tuples of (id1, id2) where a pair is stored with the larger (later) id first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indexed
|
||
new_pairs = int(mating_generation_size * pairs_per_generation_multiplier) | ||
|
||
curr_gen_fwd = defaultdict(dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this for? It gets updated but, afaict, is not returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the primary relatedness graph stores edges from higher-number sample to lower-number sample. However, we need to be able to look up the edges within a generation for any sample, and that requires a graph in the other direction. We create this graph for each generation and use it in the next, overwriting last_gen_fwd
each generation.
set_rel(curr_sample_idx, k, v, curr_gen_fwd, last_generation_start_idx) | ||
|
||
mother_sibs = parent_to_child[mother] | ||
father_sibs = parent_to_child[father] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but these are more like mothers_kids
and fathers_kids
, right? I get that you mean siblings by way o mother, but I think mothers_kids
is a bit easier to parse.
rel = sibling_rel | ||
else: | ||
_, _, sib_mom, sib_dad = samples[sib] | ||
other_parent = sib_mom if sib_mom != mother else sib_dad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFICT, there's no sex involved here, otherwise it would obviously be the sib's dad that is the other parent, right? Since we know this sib is the mother's child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct!
hl.simulate_random_mating(mt, n_rounds=2, pairs_per_generation_multiplier=0.5, | ||
children_per_pair=2)._force_count_rows() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test where we simulate two rounds with just a few people and manually verify the relatednesses?
|
||
curr_sample_idx = len(samples) | ||
parent_to_child = defaultdict(set) | ||
for pair in range(new_pairs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not for this PR, but I think you've basically built a pedigree simulator. We should pull that out as a separately useful thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, curious what other applications might be.
@@ -44,6 +45,8 @@ def aes(**kwargs): | |||
hail_field_properties = {} | |||
|
|||
for k, v in kwargs.items(): | |||
if isinstance(v, (tuple, dict)): | |||
v = hail.str(v) | |||
if not isinstance(v, Expression): | |||
v = literal(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove it from this PR and revisit separately. I seems to me that if you get an error from hl.literal((ht.a, ht.b))
you should also get an error from my example above. If that's true, we should fix it for both use cases.
Closing abandoned PRs. If someone wants to adopt please start a new PR. cc: @iris-garden this added som ggplot functionality that would be great to resurrect some day. |