Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[workshop] Add necessary infrastructure for IBG 2023 workshop #12769

Closed
wants to merge 11 commits into from

Conversation

tpoterba
Copy link
Contributor

@tpoterba tpoterba commented Mar 8, 2023

  • add qq_plot and manhattan built on plotly. It's a snarly problem to add support for all the customization necessary to build these out of ggplot directly, but this lets us use ggplot in the tutorial notebooks without having to use a separate plotting lib.
  • Fix relatedness estimation for mating simulation

It's a snarly problem to add support for all the customization necessary
to build these out of ggplot directly, but this lets us use ggplot in the
tutorial notebooks without having to use a separate plotting lib.
@@ -3905,7 +3905,7 @@ def group_by(f: Callable, collection) -> DictExpression:
@typecheck(f=func_spec(2, expr_any),
zero=expr_any,
collection=expr_oneof(expr_set(), expr_array()))
def fold(f: Callable, zero, collection) -> Expression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by fix. Hinting the superclass of the returned value causes lint issues (in my IDE at least), leaving it out and using duck typing doesn't. Can remove if you want, but I've been stripping these out when I see them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Pyright complains about our use of Callable, but I don't understand the error message [1]. With or without this change, pyright says the type of

fold(lambda _: 3, literal(0), literal([1]))

is Expression. What type do you get?

Hmm, doesn't seem like there's integration for Pyright with IntelliJ unfortunately. The Python community seems to unifying on it since it's remarkably good.

[1]: Illegal type annotation: variable not allowed unless it is a type alias (lsp)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh, I bet this is because your IDE thinks the return type is the return type of the function after modification by @typecheck

@@ -44,6 +45,8 @@ def aes(**kwargs):
hail_field_properties = {}

for k, v in kwargs.items():
if isinstance(v, (tuple, dict)):
v = hail.str(v)
if not isinstance(v, Expression):
v = literal(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an example of using this new behavior? I feel a bit concerned that aes(foo=(ht.a, ht.b)) is different from:

ht = ht.annotate(c=(ht.a, ht.b))
aes(foo=ht.c)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove this, I think I added this in the process of trying to build the manhattan plot in ggplot. The issue I was trying to solve is that passing hl.tuple([something1, something2]) produces a tuple label, but if you pass (something1, something2), you get an exception after it tries to take the literal path. This is a consequence of trying to support both expressions and non-expr sequences. This is sort of a hack, but catches more cases of things that are probably meant to go the expr route.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it from this PR and revisit separately. I seems to me that if you get an error from hl.literal((ht.a, ht.b)) you should also get an error from my example above. If that's true, we should fix it for both use cases.


Returns
-------
:class:`bokeh.plotting.figure.Figure`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a plotly figure


Returns
-------
:class:`bokeh.plotting.figure.Figure` if no label or a single label was given, otherwise :class:`bokeh.models.layouts.Column`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return type is wrong

return new_samples
ns = mt.count_cols()

# dict of true nonzero relatedness. indeed by tuples of (id1, id2) where a pair is stored with the larger (later) id first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexed


new_pairs = int(mating_generation_size * pairs_per_generation_multiplier)

curr_gen_fwd = defaultdict(dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this for? It gets updated but, afaict, is not returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the primary relatedness graph stores edges from higher-number sample to lower-number sample. However, we need to be able to look up the edges within a generation for any sample, and that requires a graph in the other direction. We create this graph for each generation and use it in the next, overwriting last_gen_fwd each generation.

set_rel(curr_sample_idx, k, v, curr_gen_fwd, last_generation_start_idx)

mother_sibs = parent_to_child[mother]
father_sibs = parent_to_child[father]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but these are more like mothers_kids and fathers_kids, right? I get that you mean siblings by way o mother, but I think mothers_kids is a bit easier to parse.

rel = sibling_rel
else:
_, _, sib_mom, sib_dad = samples[sib]
other_parent = sib_mom if sib_mom != mother else sib_dad
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFICT, there's no sex involved here, otherwise it would obviously be the sib's dad that is the other parent, right? Since we know this sib is the mother's child.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct!

hl.simulate_random_mating(mt, n_rounds=2, pairs_per_generation_multiplier=0.5,
children_per_pair=2)._force_count_rows()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test where we simulate two rounds with just a few people and manually verify the relatednesses?


curr_sample_idx = len(samples)
parent_to_child = defaultdict(set)
for pair in range(new_pairs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this PR, but I think you've basically built a pedigree simulator. We should pull that out as a separately useful thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, curious what other applications might be.

@@ -44,6 +45,8 @@ def aes(**kwargs):
hail_field_properties = {}

for k, v in kwargs.items():
if isinstance(v, (tuple, dict)):
v = hail.str(v)
if not isinstance(v, Expression):
v = literal(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it from this PR and revisit separately. I seems to me that if you get an error from hl.literal((ht.a, ht.b)) you should also get an error from my example above. If that's true, we should fix it for both use cases.

@danking
Copy link
Contributor

danking commented Aug 24, 2023

Closing abandoned PRs. If someone wants to adopt please start a new PR.

cc: @iris-garden this added som ggplot functionality that would be great to resurrect some day.

@danking danking closed this Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants