-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement annotate/drop in terms of select_entries #3200
Merged
patrick-schultz
merged 9 commits into
hail-is:master
from
catoverdrive:select-entries-for-real
Mar 22, 2018
Merged
Changes from 8 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
2bd08ee
rewrite MatrixTable.selectEntries to take single expr
5bbb3b5
remove annotateEntriesExpr and dropEntries from scala MatrixTable
76fbaea
python wip
813958e
wip
25db511
wip
aaa1b35
fixed selectEntries bug
bab6aa1
cleanup
91e4f5c
fix
2db0a72
fix select_entries
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
import itertools | ||
from typing import * | ||
from collections import OrderedDict | ||
|
||
import hail | ||
import hail as hl | ||
|
@@ -940,16 +941,12 @@ def annotate_entries(self, **named_exprs: NamedExprs) -> 'MatrixTable': | |
:class:`.MatrixTable` | ||
Matrix table with new row-and-column-indexed field(s). | ||
""" | ||
exprs = [] | ||
named_exprs = {k: to_expr(v) for k, v in named_exprs.items()} | ||
base, cleanup = self._process_joins(*named_exprs.values()) | ||
|
||
for k, v in named_exprs.items(): | ||
analyze('MatrixTable.annotate_entries', v, self._entry_indices) | ||
exprs.append('g.{k} = {v}'.format(k=escape_id(k), v=v._ast.to_hql())) | ||
check_collisions(self._fields, k, self._entry_indices) | ||
m = MatrixTable(base._jvds.annotateEntriesExpr(",\n".join(exprs))) | ||
return cleanup(m) | ||
|
||
return self._select_entries("MatrixTable.annotate_entries", self.entry.annotate(**named_exprs)) | ||
|
||
def select_globals(self, *exprs: FieldRefArgs, **named_exprs: NamedExprs) -> 'MatrixTable': | ||
"""Select existing global fields or create new fields by name, dropping the rest. | ||
|
@@ -1188,31 +1185,20 @@ def select_entries(self, *exprs: FieldRefArgs, **named_exprs: NamedExprs) -> 'Ma | |
""" | ||
exprs = [to_expr(e) if not isinstance(e, str) else self[e] for e in exprs] | ||
named_exprs = {k: to_expr(v) for k, v in named_exprs.items()} | ||
strs = [] | ||
all_exprs = [] | ||
base, cleanup = self._process_joins(*itertools.chain(exprs, named_exprs.values())) | ||
assignments = OrderedDict() | ||
|
||
ids = [] | ||
for e in exprs: | ||
all_exprs.append(e) | ||
analyze('MatrixTable.select_entries', e, self._entry_indices) | ||
if not e._indices == self._entry_indices: | ||
# detect row or col fields here | ||
raise ExpressionException("method 'select_entries' parameter 'exprs' expects entry-indexed fields," | ||
" found indices {}".format(list(e._indices.axes))) | ||
if e._ast.search(lambda ast: not isinstance(ast, TopLevelReference) and not isinstance(ast, Select)): | ||
raise ExpressionException("method 'select_entries' expects keyword arguments for complex expressions") | ||
strs.append(e._ast.to_hql()) | ||
ids.append(e._ast.name) | ||
for k, e in named_exprs.items(): | ||
all_exprs.append(e) | ||
analyze('MatrixTable.select_entries', e, self._entry_indices) | ||
check_collisions(self._fields, k, self._entry_indices) | ||
strs.append('{} = {}'.format(escape_id(k), e._ast.to_hql())) | ||
ids.append(k) | ||
check_field_uniqueness(ids) | ||
m = MatrixTable(base._jvds.selectEntries(strs)) | ||
return cleanup(m) | ||
assignments[k] = e | ||
check_field_uniqueness(assignments.keys()) | ||
return self._select_entries("MatrixTable.select_entries", hl.struct(**assignments)) | ||
|
||
@typecheck_method(exprs=oneof(str, Expression)) | ||
def drop(self, *exprs: FieldRefArgs) -> 'MatrixTable': | ||
|
@@ -1290,10 +1276,11 @@ def drop(self, *exprs: FieldRefArgs) -> 'MatrixTable': | |
new_col_fields = [f for f in m.col if f not in fields_to_drop] | ||
m = m.select_cols(*new_col_fields) | ||
|
||
entry_fields = [x for x in fields_to_drop if self._fields[x]._indices == self._entry_indices] | ||
if any(self._fields[field]._indices == self._entry_indices for field in fields_to_drop): | ||
# need to drop entry fields | ||
m = MatrixTable(m._jvds.dropEntries(entry_fields)) | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why so much space? |
||
entry_fields = [field for field in fields_to_drop if self._fields[field]._indices == self._entry_indices] | ||
if entry_fields: | ||
m = m._select_entries("MatrixTable.drop_entries", m.entry.drop(*entry_fields)) | ||
|
||
return m | ||
|
||
|
@@ -2199,7 +2186,6 @@ def joiner(left: MatrixTable): | |
src_cols_indexed = src_cols_indexed.annotate(**{col_uid: hl.int32(src_cols_indexed[col_uid])}) | ||
left = left._annotate_all(row_exprs = {row_uid: localized.index(*row_exprs)[row_uid]}, | ||
col_exprs = {col_uid: src_cols_indexed.index(*col_exprs)[col_uid]}) | ||
|
||
return left.annotate_entries(**{uid: left[row_uid][left[col_uid]]}) | ||
|
||
return construct_expr(Select(TopLevelReference('g', self._entry_indices), uid), | ||
|
@@ -2238,12 +2224,9 @@ def _annotate_all(self, | |
check_collisions(self._fields, k, self._col_indices) | ||
jmt = jmt.annotateColsExpr(",\n".join(col_strs)) | ||
if entry_exprs: | ||
entry_strs = [] | ||
for k, v in entry_exprs.items(): | ||
analyze('MatrixTable.annotate_entries', v, self._entry_indices) | ||
entry_strs.append('g.{k} = {v}'.format(k=escape_id(k), v=v._ast.to_hql())) | ||
check_collisions(self._fields, k, self._entry_indices) | ||
jmt = jmt.annotateEntriesExpr(",\n".join(entry_strs)) | ||
entry_struct = self.entry.annotate(**entry_exprs) | ||
analyze("MatrixTable.annotate_entries", entry_struct, self._entry_indices) | ||
jmt = jmt.selectEntries(entry_struct._ast.to_hql()) | ||
if global_exprs: | ||
global_strs = [] | ||
for k, v in global_exprs.items(): | ||
|
@@ -2613,6 +2596,12 @@ def add_col_index(self, name: str = 'col_idx') -> 'MatrixTable': | |
def _same(self, other, tolerance=1e-6): | ||
return self._jvds.same(other._jvds, tolerance) | ||
|
||
@typecheck_method(caller=str, s=expr_struct()) | ||
def _select_entries(self, caller, s): | ||
base, cleanup = self._process_joins(s) | ||
analyze(caller, s, self._entry_indices) | ||
return cleanup(MatrixTable(base._jvds.selectEntries(s._ast.to_hql()))) | ||
|
||
@typecheck(datasets=matrix_table_type) | ||
def union_rows(*datasets: Tuple['MatrixTable']) -> 'MatrixTable': | ||
"""Take the union of dataset rows. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like anything in
exprs
is getting passed to_select_entries
. I might be missing something, but if not you should add a test that fails, and then fixselect_entries
to make it pass.