Commit

fixes
patrick-schultz committed Oct 21, 2022
1 parent d5bf722 commit 799ec45
Showing 14 changed files with 115 additions and 133 deletions.
1 change: 1 addition & 0 deletions hail/python/hail/conftest.py
@@ -35,6 +35,7 @@ def init(doctest_namespace):
 "docs"))

 hl.init(global_seed=0)
+hl.reset_global_randomness()

 try:
 generate_datasets(doctest_namespace)
87 changes: 38 additions & 49 deletions hail/python/hail/docs/functions/random.rst
@@ -13,28 +13,25 @@ Evaluating the same expression will yield the same value every time, but multipl
 calls of the same function will have different results. For example, let `x` be
 a random number generated with the function :func:`.rand_unif`:

-.. testsetup::
-hl.reset_global_randomness()
-
 >>> x = hl.rand_unif(0, 1)

 The value of `x` will not change, although other calls to :func:`.rand_unif`
 will generate different values:

 >>> hl.eval(x)
-0.7769696130603699
+0.9828239225846387

 >>> hl.eval(x)
-0.5562065047992025
+0.9828239225846387

 >>> hl.eval(hl.rand_unif(0, 1))
-0.4678132874101748
+0.49094525115847415

 >>> hl.eval(hl.rand_unif(0, 1))
-0.9097632224065403
+0.3972543766997359

 >>> hl.eval(hl.array([x, x, x]))
-[0.5562065047992025, 0.5562065047992025, 0.5562065047992025]
+[0.9828239225846387, 0.9828239225846387, 0.9828239225846387]

 If the three values in the last expression should be distinct, three separate
 calls to :func:`.rand_unif` should be made:
@@ -43,25 +40,26 @@ calls to :func:`.rand_unif` should be made:
 >>> b = hl.rand_unif(0, 1)
 >>> c = hl.rand_unif(0, 1)
 >>> hl.eval(hl.array([a, b, c]))
-[0.8846327207915881, 0.14415148553468504, 0.8202677741734825]
+[0.992090957001768, 0.9564448098124774, 0.3905029525642664]

 Within the rows of a :class:`.Table`, the same expression will yield a
 consistent value within each row, but different (random) values across rows:

 >>> table = hl.utils.range_table(5, 1)
 >>> table = table.annotate(x1=x, x2=x, rand=hl.rand_unif(0, 1))
 >>> table.show()
-+-------+-------------+-------------+-------------+
-| idx | x1 | x2 | rand |
-+-------+-------------+-------------+-------------+
-| int32 | float64 | float64 | float64 |
-+-------+-------------+-------------+-------------+
-| 0 | 8.50369e-01 | 8.50369e-01 | 9.64129e-02 |
-| 1 | 5.15437e-01 | 5.15437e-01 | 8.60843e-02 |
-| 2 | 5.42493e-01 | 5.42493e-01 | 1.69816e-01 |
-| 3 | 5.51289e-01 | 5.51289e-01 | 6.48706e-01 |
-| 4 | 6.40977e-01 | 6.40977e-01 | 8.22508e-01 |
-+-------+-------------+-------------+-------------+
++-------+----------+----------+----------+
+| idx | x1 | x2 | rand |
++-------+----------+----------+----------+
+| int32 | float64 | float64 | float64 |
++-------+----------+----------+----------+
+| 0 | 4.68e-01 | 4.68e-01 | 6.36e-01 |
+| 1 | 8.24e-01 | 8.24e-01 | 9.72e-01 |
+| 2 | 7.33e-01 | 7.33e-01 | 1.43e-01 |
+| 3 | 8.99e-01 | 8.99e-01 | 5.52e-01 |
+| 4 | 4.03e-01 | 4.03e-01 | 3.50e-01 |
++-------+----------+----------+----------+
+

 The same is true of the rows, columns, and entries of a :class:`.MatrixTable`.
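
The semantics documented in the two hunks above (a bound expression keeps its value, while every new call draws a fresh one, and a table gets one draw per row) can be mimicked in plain Python. The sketch below is only an analogy built on the standard library: `RandUnif`, `evaluate`, `GLOBAL_SEED`, and the way the per-call id is derived are all hypothetical, and nothing here is meant to describe how Hail generates random numbers internally.

```python
import itertools
import random

GLOBAL_SEED = 0                # hypothetical stand-in for a session-wide seed
_call_ids = itertools.count()  # every construction receives a distinct id


class RandUnif:
    """Toy expression whose randomness is tied to the call, not to evaluation."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.call_id = next(_call_ids)

    def evaluate(self, row=0):
        # The value depends only on (global seed, call id, row index), so
        # evaluating the same expression on the same row is deterministic.
        rng = random.Random(f"{GLOBAL_SEED}:{self.call_id}:{row}")
        return rng.uniform(self.low, self.high)


x = RandUnif(0, 1)
same = [x.evaluate(), x.evaluate(), x.evaluate()]      # one call, reused

a, b, c = RandUnif(0, 1), RandUnif(0, 1), RandUnif(0, 1)
distinct = [a.evaluate(), b.evaluate(), c.evaluate()]  # three separate calls

# Analogue of annotate(x1=x, x2=x): equal within a row, different across rows.
per_row = [(x.evaluate(row), x.evaluate(row)) for row in range(5)]
```

Reusing `x` gives three identical numbers, while `a`, `b`, and `c` each carry their own id and therefore differ, which mirrors the `hl.array([x, x, x])` versus `hl.array([a, b, c])` doctests.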

@@ -72,44 +70,35 @@ All random functions can take a specified seed as an argument. This guarantees
 that multiple invocations of the same function within the same context will
 return the same result, e.g.

-.. testsetup::
-hl.reset_global_randomness()
-
 >>> hl.eval(hl.rand_unif(0, 1, seed=0))
-0.5488135008937808
+0.2664972565962568

 >>> hl.eval(hl.rand_unif(0, 1, seed=0))
-0.5488135008937808
-
-This does not guarantee the same behavior across different contexts; e.g., the
-rows may have different values if the expression is applied to different tables:
-
-.. testsetup::
-hl.reset_global_randomness()
+0.2664972565962568

->>> table = hl.utils.range_table(5, 1).annotate(x=hl.rand_bool(0.5, seed=0))
+>>> table = hl.utils.range_table(5, 1).annotate(x=hl.rand_unif(0, 1, seed=0))
 >>> table.x.collect()
-[0.5488135008937808,
-0.7151893652121089,
-0.6027633824638369,
-0.5448831893094143,
-0.42365480398481625]
+[0.5820244750020055,
+0.33150686392731943,
+0.20526631289173847,
+0.6964416913998893,
+0.6092952493383876]

->>> table = hl.utils.range_table(5, 1).annotate(x=hl.rand_bool(0.5, seed=0))
+>>> table = hl.utils.range_table(5, 1).annotate(x=hl.rand_unif(0, 1, seed=0))
 >>> table.x.collect()
-[0.5488135008937808,
-0.7151893652121089,
-0.6027633824638369,
-0.5448831893094143,
-0.42365480398481625]
+[0.5820244750020055,
+0.33150686392731943,
+0.20526631289173847,
+0.6964416913998893,
+0.6092952493383876]

->>> table = hl.utils.range_table(5, 5).annotate(x=hl.rand_bool(0.5, seed=0))
+>>> table = hl.utils.range_table(5, 5).annotate(x=hl.rand_unif(0, 1, seed=0))
 >>> table.x.collect()
-[0.5488135008937808,
-0.9595974306263271,
-0.42205690070893265,
-0.828743805759555,
-0.6414977904324134]
+[0.5820244750020055,
+0.33150686392731943,
+0.20526631289173847,
+0.6964416913998893,
+0.6092952493383876]

 The seed can also be set globally using :func:`.set_global_seed`. This sets the
 seed globally for all subsequent Hail operations, and a pipeline will be
8 changes: 4 additions & 4 deletions hail/python/hail/docs/guides/agg.rst
@@ -114,7 +114,7 @@ Multiple aggregations
 >>> mt.aggregate_cols(hl.struct(
 ... fraction_female=hl.agg.fraction(mt.pheno.is_female),
 ... case_ratio=hl.agg.count_where(mt.is_case) / hl.agg.count()))
-Struct(fraction_female=0.48, case_ratio=1.0)
+Struct(fraction_female=0.44, case_ratio=1.0)

 :**dependencies**: :meth:`.MatrixTable.aggregate_cols`, :func:`.aggregators.fraction`, :func:`.aggregators.count_where`, :class:`.StructExpression`

@@ -129,7 +129,7 @@ One aggregation
 :**code**:

 >>> mt.aggregate_rows(hl.agg.mean(mt.qual))
-544323.8915384616
+140054.73333333334

 :**dependencies**: :meth:`.MatrixTable.aggregate_rows`, :func:`.aggregators.mean`

@@ -148,7 +148,7 @@ Multiple aggregations
 >>> mt.aggregate_rows(
 ... hl.struct(n_high_quality=hl.agg.count_where(mt.qual > 40),
 ... mean_qual=hl.agg.mean(mt.qual)))
-Struct(n_high_quality=13, mean_qual=544323.8915384616)
+Struct(n_high_quality=9, mean_qual=140054.73333333334)

 :**dependencies**: :meth:`.MatrixTable.aggregate_rows`, :func:`.aggregators.count_where`, :func:`.aggregators.mean`, :class:`.StructExpression`

@@ -167,7 +167,7 @@ Aggregate Entry Values Into A Local Value
 >>> mt.aggregate_entries(
 ... hl.struct(global_gq_mean=hl.agg.mean(mt.GQ),
 ... call_rate=hl.agg.fraction(hl.is_defined(mt.GT))))
-Struct(global_gq_mean=64.01841473178543, call_rate=0.9607692307692308)
+Struct(global_gq_mean=69.60514541387025, call_rate=0.9933333333333333)

 :**dependencies**: :meth:`.MatrixTable.aggregate_entries`, :func:`.aggregators.mean`, :func:`.aggregators.fraction`, :class:`.StructExpression`
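
For readers unfamiliar with the two aggregators in this last hunk, the following plain-Python sketch shows what a call rate and a mean over defined values amount to on a small list. The data are made up for illustration, `None` stands in for a missing value, and the assumption that the mean skips missing values should be checked against the aggregator documentation rather than taken from this sketch.

```python
from statistics import fmean

# Hypothetical entries: one genotype call and one GQ value per entry.
gt = ["0/1", None, "1/1", "0/0"]
gq = [60, None, 80, 70]

# Fraction of entries whose genotype is defined (the "call rate").
call_rate = sum(g is not None for g in gt) / len(gt)

# Mean over the defined GQ values only (assumed: missing values are skipped).
gq_mean = fmean(q for q in gq if q is not None)

print(call_rate, gq_mean)
```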

15 changes: 7 additions & 8 deletions hail/python/hail/docs/scans.rst
@@ -45,14 +45,13 @@ along the genome:
 +---------------+------------+-----------+---------------+
 | locus<GRCh37> | array<str> | int64 | int64 |
 +---------------+------------+-----------+---------------+
-| 20:10579373 | ["C","T"] | 1 | 0 |
-| 20:10579398 | ["C","T"] | 1 | 1 |
-| 20:10633237 | ["G","A"] | 69 | 2 |
-| 20:10636995 | ["C","T"] | 2 | 71 |
-| 20:10639222 | ["G","A"] | 22 | 73 |
-| 20:13763601 | ["A","G"] | 2 | 95 |
-| 20:16223922 | ["T","C"] | 66 | 97 |
-| 20:17479617 | ["G","A"] | 9 | 163 |
+| 20:10627772 | ["C","T"] | 2 | 2 |
+| 20:10633237 | ["G","A"] | 69 | 4 |
+| 20:10636995 | ["C","T"] | 2 | 73 |
+| 20:10639222 | ["G","A"] | 22 | 75 |
+| 20:13763601 | ["A","G"] | 2 | 97 |
+| 20:16223922 | ["T","C"] | 66 | 99 |
+| 20:17479617 | ["G","A"] | 9 | 165 |
 +---------------+------------+-----------+---------------+
 <BLANKLINE>
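
In the updated table, each value in the last column equals the running total of the third column over all preceding rows. The plain-Python sketch below reproduces that relationship. The column headers lie outside the hunk, so the variable names are illustrative, and the starting total of 2 is simply read off the first visible row on the assumption that it comes from rows not shown.

```python
from itertools import accumulate

per_row = [2, 69, 2, 22, 2, 66, 9]  # third column of the updated table
carry_in = 2                        # assumed total from rows outside the hunk

# Exclusive scan: row i sees the sum of everything strictly before it.
running_before = list(accumulate(per_row, initial=carry_in))[:-1]

print(running_before)  # [2, 4, 73, 75, 97, 99, 165]
```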

12 changes: 6 additions & 6 deletions hail/python/hail/experimental/full_outer_join_mt.py
@@ -44,8 +44,8 @@ def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.Matrix
 +---------------+------------+------+------+
 | locus<GRCh37> | array<str> | call | call |
 +---------------+------------+------+------+
-| 1:3 | ["A","C"] | 1/1 | 1/1 |
-| 1:4 | ["A","C"] | 0/1 | 1/1 |
+| 1:3 | ["A","C"] | 0/0 | 0/1 |
+| 1:4 | ["A","C"] | 1/1 | 0/1 |
 | 1:5 | ["A","C"] | 0/0 | 0/0 |
 +---------------+------------+------+------+
 <BLANKLINE>
@@ -57,10 +57,10 @@ def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.Matrix
 +---------------+------------+------+------+------+------+
 | locus<GRCh37> | array<str> | call | call | call | call |
 +---------------+------------+------+------+------+------+
-| 1:1 | ["A","C"] | 0/0 | 0/0 | NA | NA |
-| 1:2 | ["A","C"] | 1/1 | 0/0 | NA | NA |
-| 1:3 | ["A","C"] | 0/1 | 0/0 | 1/1 | 1/1 |
-| 1:4 | ["A","C"] | NA | NA | 0/1 | 1/1 |
+| 1:1 | ["A","C"] | 0/1 | 0/1 | NA | NA |
+| 1:2 | ["A","C"] | 0/0 | 1/1 | NA | NA |
+| 1:3 | ["A","C"] | 0/0 | 0/0 | 0/0 | 0/1 |
+| 1:4 | ["A","C"] | NA | NA | 1/1 | 0/1 |
 | 1:5 | ["A","C"] | NA | NA | 0/0 | 0/0 |
 +---------------+------------+------+------+------+------+
 <BLANKLINE>
21 changes: 11 additions & 10 deletions hail/python/hail/expr/aggregators/aggregators.py
@@ -1139,16 +1139,16 @@ def inbreeding(expr, prior) -> StructExpression:
 +------------------+-----------+-------------+------------------+------------------+
 | str | float64 | int64 | float64 | int64 |
 +------------------+-----------+-------------+------------------+------------------+
-| "C1046::HG02024" | 2.69e-01 | 8 | 6.63e+00 | 7 |
-| "C1046::HG02025" | -4.62e-01 | 8 | 6.63e+00 | 6 |
-| "C1046::HG02026" | -4.62e-01 | 8 | 6.63e+00 | 6 |
-| "C1047::HG00731" | 2.69e-01 | 8 | 6.63e+00 | 7 |
-| "C1047::HG00732" | 2.69e-01 | 8 | 6.63e+00 | 7 |
-| "C1047::HG00733" | 2.69e-01 | 8 | 6.63e+00 | 7 |
-| "C1048::HG02024" | -4.62e-01 | 8 | 6.63e+00 | 6 |
-| "C1048::HG02025" | -4.62e-01 | 8 | 6.63e+00 | 6 |
-| "C1048::HG02026" | -4.62e-01 | 8 | 6.63e+00 | 6 |
-| "C1049::HG00731" | 2.69e-01 | 8 | 6.63e+00 | 7 |
+| "C1046::HG02024" | 2.79e-01 | 9 | 7.61e+00 | 8 |
+| "C1046::HG02025" | -4.41e-01 | 9 | 7.61e+00 | 7 |
+| "C1046::HG02026" | -4.41e-01 | 9 | 7.61e+00 | 7 |
+| "C1047::HG00731" | 2.79e-01 | 9 | 7.61e+00 | 8 |
+| "C1047::HG00732" | 2.79e-01 | 9 | 7.61e+00 | 8 |
+| "C1047::HG00733" | 2.79e-01 | 9 | 7.61e+00 | 8 |
+| "C1048::HG02024" | -4.41e-01 | 9 | 7.61e+00 | 7 |
+| "C1048::HG02025" | -4.41e-01 | 9 | 7.61e+00 | 7 |
+| "C1048::HG02026" | -4.41e-01 | 9 | 7.61e+00 | 7 |
+| "C1049::HG00731" | 2.79e-01 | 9 | 7.61e+00 | 8 |
 +------------------+-----------+-------------+------------------+------------------+
 showing top 10 rows
 <BLANKLINE>
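
The updated numbers are consistent with the usual inbreeding coefficient F = (O - E) / (N - E), where O is the observed number of homozygotes, E the expected number, and N the number of called genotypes. The header row lies outside the hunk, so the check below assumes the four numeric columns are, in order, the F statistic, the number of called genotypes, the expected homozygotes, and the observed homozygotes. `f_stat` is a hypothetical helper written for this check, not a library function.

```python
def f_stat(observed_homs, expected_homs, n_called):
    """Inbreeding coefficient F = (O - E) / (N - E)."""
    return (observed_homs - expected_homs) / (n_called - expected_homs)


# Values copied from the updated rows. The table rounds E to three significant
# figures, so the result agrees with the displayed F only to about two digits.
print(round(f_stat(8, 7.61, 9), 2))  # 0.28, table shows 2.79e-01
print(round(f_stat(7, 7.61, 9), 2))  # -0.44, table shows -4.41e-01
```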
@@ -1219,6 +1219,7 @@ def call_stats(call, alleles) -> StructExpression:
 +---------------+--------------+---------------------+-------------+---------------------------+
 | 20:10579373 | [199,1] | [9.95e-01,5.00e-03] | 200 | [99,0] |
 | 20:10579398 | [198,2] | [9.90e-01,1.00e-02] | 200 | [99,1] |
+| 20:10627772 | [198,2] | [9.90e-01,1.00e-02] | 200 | [98,0]
 | 20:10633237 | [108,92] | [5.40e-01,4.60e-01] | 200 | [31,23] |
 | 20:10636995 | [198,2] | [9.90e-01,1.00e-02] | 200 | [98,0] |
 | 20:10639222 | [175,25] | [8.75e-01,1.25e-01] | 200 | [78,3] |
34 changes: 16 additions & 18 deletions hail/python/hail/expr/expressions/base_expression.py
@@ -972,35 +972,33 @@ def export(self, path, delimiter='\t', missing='NA', header=True):
 >>> with open('output/gt-no-header.tsv', 'r') as f:
 ... for line in f:
 ... print(line, end='')
-1:1 ["A","C"] 0/1 0/1 0/0 0/0
-1:2 ["A","C"] 1/1 0/1 1/1 1/1
-1:3 ["A","C"] 1/1 0/1 0/1 0/0
-1:4 ["A","C"] 1/1 0/1 1/1 1/1
-<BLANKLINE>
+1:1 ["A","C"] 1/1 1/1 0/1 0/1
+1:2 ["A","C"] 1/1 1/1 0/0 1/1
+1:3 ["A","C"] 0/0 0/0 0/1 0/0
+1:4 ["A","C"] 1/1 0/1 1/1 0/1
 >>> small_mt.pop.export('output/pops.tsv')
 >>> with open('output/pops.tsv', 'r') as f:
 ... for line in f:
 ... print(line, end='')
 sample_idx pop
-0 2
-1 2
-2 0
-3 2
+0 0
+1 0
+2 2
+3 0
 <BLANKLINE>
 >>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
 >>> with open('output/ancestral_af.tsv', 'r') as f:
 ... for line in f:
 ... print(line, end='')
 locus alleles ancestral_af
-1:1 ["A","C"] 5.3905e-01
-1:2 ["A","C"] 8.6768e-01
-1:3 ["A","C"] 4.3765e-01
-1:4 ["A","C"] 7.6300e-01
+1:1 ["A","C"] 5.6562e-01
+1:2 ["A","C"] 3.6521e-01
+1:3 ["A","C"] 2.6421e-01
+1:4 ["A","C"] 6.5715e-01
 <BLANKLINE>
->>> mt = small_mt
 >>> small_mt.bn.export('output/bn.tsv')
 >>> with open('output/bn.tsv', 'r') as f:
 ... for line in f:
@@ -1024,10 +1022,10 @@ def export(self, path, delimiter='\t', missing='NA', header=True):
 ... for line in f:
 ... print(line, end='')
 locus alleles {"s":0,"family":"fam1"} {"s":1,"family":"fam1"} {"s":2,"family":"fam1"} {"s":3,"family":"fam1"}
-1:1 ["A","C"] 0/1 0/1 0/0 0/0
-1:2 ["A","C"] 1/1 0/1 1/1 1/1
-1:3 ["A","C"] 1/1 0/1 0/1 0/0
-1:4 ["A","C"] 1/1 0/1 1/1 1/1
+1:1 ["A","C"] 1/1 1/1 0/1 0/1
+1:2 ["A","C"] 1/1 1/1 0/0 1/1
+1:3 ["A","C"] 0/0 0/0 0/1 0/0
+1:4 ["A","C"] 1/1 0/1 1/1 0/1
 <BLANKLINE>
