[workshop] Add necessary infrastructure for IBG 2023 workshop #12769

tpoterba · 2023-03-08T13:01:53Z

Add qq_plot and manhattan plots built on plotly. It's a snarly problem to add support for all the customization needed to build these directly out of ggplot, but this lets us use ggplot in the tutorial notebooks without a separate plotting library.
Fix relatedness estimation for the mating simulation.
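For reference, the coordinates both plots display can be computed without any plotting library. This plain-Python sketch uses the (i - 0.5)/n plotting position for the expected QQ quantiles and cumulative per-chromosome offsets for the Manhattan x axis. Both are common conventions assumed here, not necessarily what the PR's plotly code does, and `qq_points` and `cumulative_offsets` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def qq_points(pvalues: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    Assumption: expected quantiles use the (rank - 0.5) / n plotting position.
    All p-values must be in (0, 1].
    """
    observed = sorted(pvalues)
    n = len(observed)
    # i is 0-based, so (i + 0.5) / n equals (rank - 0.5) / n for rank = i + 1.
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(observed)]


def cumulative_offsets(chrom_order: Sequence[str], chrom_lengths: Dict[str, int]) -> Dict[str, int]:
    """Map each contig to the x offset where it starts on a Manhattan plot.

    A variant's x coordinate is then offsets[contig] + position.
    """
    offsets: Dict[str, int] = {}
    total = 0
    for contig in chrom_order:
        offsets[contig] = total
        total += chrom_lengths[contig]
    return offsets
```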


danking · 2023-03-08T16:18:31Z

hail/python/hail/expr/functions.py

@@ -3905,7 +3905,7 @@ def group_by(f: Callable, collection) -> DictExpression:
 @typecheck(f=func_spec(2, expr_any),
           zero=expr_any,
           collection=expr_oneof(expr_set(), expr_array()))
-def fold(f: Callable, zero, collection) -> Expression:


why this change?

Drive-by fix. Hinting the superclass of the returned value causes lint issues (in my IDE, at least), while leaving it out and relying on duck typing doesn't. I can revert it if you want, but I've been stripping these out when I see them.

Hmm. Pyright complains about our use of Callable, but I don't understand the error message [1]. With or without this change, pyright says the type of

fold(lambda acc, _: 3, literal(0), literal([1]))

is Expression. What type do you get?

Hmm, unfortunately there doesn't seem to be Pyright integration for IntelliJ. The Python community seems to be unifying on it, since it's remarkably good.

[1]: Illegal type annotation: variable not allowed unless it is a type alias (lsp)

Ohhhh, I bet this is because your IDE takes the return type to be that of the function after it has been wrapped by @typecheck.
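If that's the cause, a common remedy is to annotate the decorator so static checkers see it return the same callable type it was given. The sketch below uses a hypothetical `checked` decorator as a stand-in; it is not how hail's `@typecheck` is actually implemented.

```python
import functools
from typing import Any, Callable, TypeVar, cast

# F is "some callable"; annotating the decorator as F -> F tells static
# checkers that the decorated function keeps its original signature.
F = TypeVar("F", bound=Callable[..., Any])


def checked(f: F) -> F:
    """Hypothetical stand-in for a runtime argument-checking decorator."""

    @functools.wraps(f)  # copies __name__, __doc__, __annotations__, sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # A real decorator would validate or coerce arguments here.
        return f(*args, **kwargs)

    return cast(F, wrapper)


@checked
def add_one(x: int) -> int:
    return x + 1
```

With this pattern, a checker infers `add_one(1)` as `int` rather than the decorator's own return type.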

danking · 2023-03-08T16:22:21Z

hail/python/hail/ggplot/aes.py

@@ -44,6 +45,8 @@ def aes(**kwargs):
    hail_field_properties = {}

    for k, v in kwargs.items():
+        if isinstance(v, (tuple, dict)):
+            v = hail.str(v)
        if not isinstance(v, Expression):
            v = literal(v)


What's an example of using this new behavior? I feel a bit concerned that aes(foo=(ht.a, ht.b)) is different from:

ht = ht.annotate(c=(ht.a, ht.b))
aes(foo=ht.c)

I can remove this; I think I added it while trying to build the manhattan plot in ggplot. The problem I was trying to solve is that passing hl.tuple([something1, something2]) produces a tuple label, but passing (something1, something2) raises an exception once it takes the literal path. This is a consequence of trying to support both expressions and non-expression sequences. It's a bit of a hack, but it catches more of the cases that are probably meant to go the expression route.

Let's remove it from this PR and revisit separately. It seems to me that if you get an error from hl.literal((ht.a, ht.b)), you should also get an error from my example above. If that's true, we should fix it for both use cases.
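A sketch of the "same behavior for both paths" idea: a single coercion helper that sends any tuple containing an expression down the expression route and everything else to a literal. `Expression`, `TupleExpression`, `Literal`, and `to_expression` are hypothetical stand-ins, not hail's classes or its actual coercion logic.

```python
from typing import Any


class Expression:
    """Hypothetical stand-in for a hail expression."""


class TupleExpression(Expression):
    """Hypothetical tuple expression built from element expressions."""

    def __init__(self, elements: Any) -> None:
        self.elements = tuple(elements)


class Literal(Expression):
    """Hypothetical wrapper for a plain Python value."""

    def __init__(self, value: Any) -> None:
        self.value = value


def contains_expression(v: Any) -> bool:
    """True if v is an expression or a (possibly nested) tuple holding one."""
    if isinstance(v, Expression):
        return True
    if isinstance(v, tuple):
        return any(contains_expression(x) for x in v)
    return False


def to_expression(v: Any) -> Expression:
    """Coerce v, preferring the expression route whenever an expression is inside."""
    if isinstance(v, Expression):
        return v
    if isinstance(v, tuple) and contains_expression(v):
        # Build element-wise so (ht.a, ht.b) behaves like a field holding that tuple.
        return TupleExpression(to_expression(x) for x in v)
    # Pure Python values (no expressions anywhere) take the literal path.
    return Literal(v)
```

If both `aes` and the annotate path went through one helper like this, `aes(foo=(ht.a, ht.b))` and the annotate-then-reference spelling could not diverge.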

danking · 2023-03-08T16:23:22Z

hail/python/hail/ggplot/premade.py

+
+    Returns
+    -------
+    :class:`bokeh.plotting.figure.Figure`


should be a plotly figure

danking · 2023-03-08T16:47:59Z

hail/python/hail/ggplot/premade.py

+
+    Returns
+    -------
+    :class:`bokeh.plotting.figure.Figure` if no label or a single label was given, otherwise :class:`bokeh.models.layouts.Column`


return type is wrong

danking · 2023-03-08T16:49:53Z

hail/python/hail/methods/relatedness/mating_simulation.py

-            return new_samples
+    ns = mt.count_cols()
+
+    # dict of true nonzero relatedness, indexed by tuples of (id1, id2) where a pair is stored with the larger (later) id first.


danking · 2023-03-08T17:04:58Z

hail/python/hail/methods/relatedness/mating_simulation.py

+
+        new_pairs = int(mating_generation_size * pairs_per_generation_multiplier)
+
+        curr_gen_fwd = defaultdict(dict)


what's this for? It gets updated but, afaict, is not returned.

The primary relatedness graph stores edges from the higher-numbered sample to the lower-numbered sample. However, we need to be able to look up the within-generation edges for any sample, which requires a graph in the other direction. We build this graph for each generation and use it in the next, overwriting last_gen_fwd each generation.
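The two-direction bookkeeping can be sketched with two dict-of-dicts: `back`, keyed larger id to smaller id (mirroring the primary graph), and `fwd`, keyed the other way. `add_pair`, `relatives`, `back`, and `fwd` are hypothetical names; this is not the PR's `set_rel` signature.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

Graph = DefaultDict[int, Dict[int, float]]


def add_pair(back: Graph, fwd: Graph, a: int, b: int, value: float) -> None:
    """Record relatedness `value` between samples a and b in both directions."""
    hi, lo = max(a, b), min(a, b)
    back[hi][lo] = value  # primary direction: later (larger) id first
    fwd[lo][hi] = value   # reverse direction, so lookups work from either end


def relatives(back: Graph, fwd: Graph, sample: int) -> Dict[int, float]:
    """All recorded relatives of `sample`, regardless of edge direction."""
    # .get avoids inserting empty entries into the defaultdicts.
    return {**back.get(sample, {}), **fwd.get(sample, {})}
```

Resetting the reverse graph per generation would then be `last_gen_fwd, curr_gen_fwd = curr_gen_fwd, defaultdict(dict)` at the end of each round.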

danking · 2023-03-08T17:06:36Z

hail/python/hail/methods/relatedness/mating_simulation.py

+                    set_rel(curr_sample_idx, k, v, curr_gen_fwd, last_generation_start_idx)
+
+                mother_sibs = parent_to_child[mother]
+                father_sibs = parent_to_child[father]


Nit, but these are more like mothers_kids and fathers_kids, right? I get that you mean siblings by way of the mother, but I think mothers_kids is a bit easier to parse.

danking · 2023-03-08T17:07:43Z

hail/python/hail/methods/relatedness/mating_simulation.py

+                        rel = sibling_rel
+                    else:
+                        _, _, sib_mom, sib_dad = samples[sib]
+                        other_parent = sib_mom if sib_mom != mother else sib_dad


AFAICT, there's no sex involved here; otherwise the other parent would obviously be the sib's dad, right? We already know this sib is the mother's child.
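Because no sex is modeled, the shared parent can sit in either slot of a sibling's parent pair, which is why both slots have to be checked. A small sketch of that lookup plus full/half sibling classification; `other_parent` and `sibling_kind` are hypothetical helpers, not functions from the PR.

```python
from typing import Optional, Tuple

Parents = Tuple[int, int]


def other_parent(parents: Parents, shared: int) -> int:
    """Return the parent in `parents` that is not `shared`, whichever slot it is in."""
    first, second = parents
    if shared == first:
        return second
    if shared == second:
        return first
    raise ValueError(f"{shared} is not a parent in {parents}")


def sibling_kind(a: Parents, b: Parents) -> Optional[str]:
    """Classify two samples by their parent pairs: 'full', 'half', or None."""
    shared = set(a) & set(b)
    if len(shared) == 2:
        return "full"
    if len(shared) == 1:
        return "half"
    return None
```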

danking · 2023-03-08T17:08:41Z

hail/python/test/hail/methods/test_simulation.py

+    hl.simulate_random_mating(mt, n_rounds=2, pairs_per_generation_multiplier=0.5,
+                              children_per_pair=2)._force_count_rows()


Can we add a test where we simulate two rounds with just a few people and manually verify the relatednesses?
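One way to get hand-checkable expectations for such a test is the standard recursive kinship coefficient on a tiny hand-built pedigree. This plain-Python sketch assumes unrelated, non-inbred founders and ids assigned so parents precede children; `PEDIGREE` and `kinship` are hypothetical. Kinship is half the coefficient of relationship for non-inbred samples, and which scale the simulator's `sibling_rel` uses is not stated here.

```python
from functools import lru_cache
from typing import Dict, Optional, Tuple

# Hypothetical pedigree: sample id -> (parent1, parent2), or None for founders.
PEDIGREE: Dict[int, Optional[Tuple[int, int]]] = {
    0: None, 1: None, 2: None, 3: None, 4: None,  # unrelated founders
    5: (0, 1), 6: (0, 1),  # full siblings
    7: (0, 2),             # half-sibling of 5 and 6 through 0
    8: (5, 3), 9: (6, 4),  # first cousins
    10: (5, 6),            # child of two full siblings (inbred)
}


@lru_cache(maxsize=None)
def kinship(i: int, j: int) -> float:
    """Kinship coefficient between samples i and j via the standard recursion."""
    if i == j:
        parents = PEDIGREE[i]
        if parents is None:
            return 0.5
        # Self-kinship rises with the kinship between the sample's parents.
        return 0.5 * (1.0 + kinship(*parents))
    if i < j:
        i, j = j, i  # recurse through the later-born sample's parents
    parents = PEDIGREE[i]
    if parents is None:
        return 0.0  # a founder is unrelated to everyone born before it
    p1, p2 = parents
    return 0.5 * (kinship(p1, j) + kinship(p2, j))
```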

danking · 2023-03-08T17:10:12Z

hail/python/hail/methods/relatedness/mating_simulation.py

+
+        curr_sample_idx = len(samples)
+        parent_to_child = defaultdict(set)
+        for pair in range(new_pairs):


not for this PR, but I think you've basically built a pedigree simulator. We should pull that out as a separately useful thing.

Yeah, curious what other applications might be.
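As a starting point for pulling that out, here is a minimal pedigree-only simulator in plain Python, with no genotypes. It assumes each round pairs up samples from the previous generation only (a sample can land in several pairs), takes `int(generation_size * pairs_per_generation_multiplier)` pairs, and gives every pair `children_per_pair` children. `simulate_pedigree` and its parameters are hypothetical and only loosely modeled on `simulate_random_mating`.

```python
import random
from typing import List, Optional, Tuple

Parents = Optional[Tuple[int, int]]


def simulate_pedigree(n_founders: int, n_rounds: int,
                      pairs_per_generation_multiplier: float,
                      children_per_pair: int, seed: int = 0) -> List[Parents]:
    """Return a list indexed by sample id: None for founders, else (parent1, parent2)."""
    rng = random.Random(seed)
    parents: List[Parents] = [None] * n_founders
    gen_start, gen_size = 0, n_founders
    for _ in range(n_rounds):
        if gen_size < 2:
            break  # not enough samples left to form a pair
        new_pairs = int(gen_size * pairs_per_generation_multiplier)
        generation = range(gen_start, gen_start + gen_size)
        next_start = len(parents)
        for _ in range(new_pairs):
            # Two distinct parents per pair; the same sample may appear in other pairs.
            p1, p2 = rng.sample(generation, 2)
            parents.extend([(p1, p2)] * children_per_pair)
        gen_start, gen_size = next_start, len(parents) - next_start
    return parents
```

Ids stay in topological order (parents before children), which is what recursive relatedness computations typically need.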

danking · 2023-03-14T19:04:33Z

hail/python/hail/ggplot/aes.py

@@ -44,6 +45,8 @@ def aes(**kwargs):
    hail_field_properties = {}

    for k, v in kwargs.items():
+        if isinstance(v, (tuple, dict)):
+            v = hail.str(v)
        if not isinstance(v, Expression):
            v = literal(v)


Let's remove it from this PR and revisit separately. It seems to me that if you get an error from hl.literal((ht.a, ht.b)), you should also get an error from my example above. If that's true, we should fix it for both use cases.

danking · 2023-08-24T18:43:46Z

Closing abandoned PRs. If someone wants to adopt please start a new PR.

cc: @iris-garden this added some ggplot functionality that would be great to resurrect some day.

tpoterba added 6 commits March 8, 2023 08:00

[plotting] Add manhattan and ggplot methods built on plotly

0d1d737

It's a snarly problem to add support for all the customization necessary to build these out of ggplot directly, but this lets us use ggplot in the tutorial notebooks without having to use a separate plotting lib.

fix aes

c890305

add arg to ibd

98ce2fd

Additional changes used for workshop

42b1e63

Additional changes used for workshop

5fb4408

fix relatedness estimation

1ab4690

tpoterba assigned danking Mar 8, 2023

danking suggested changes Mar 8, 2023


tpoterba added 5 commits March 9, 2023 07:22

add test

b123465

logging improvements

b3dfdc9

add max to show

8ff98ba

col-warning

37f305e

fix log

e406e9c

danking suggested changes Mar 14, 2023


danking closed this Aug 24, 2023
