Support tuples as sample IDs in Bootstrapping #362

Open: wants to merge 4 commits into base: master

fuse/eval/metrics/metrics_common.py: 29 changes (27 additions, 2 deletions)
@@ -887,6 +887,23 @@ def reset(self) -> None:
         self._metric.reset()
         return super().reset()

+    @staticmethod
+    def _convert_tuples(ids: List[Tuple[Any, ...]]) -> np.ndarray:
+
+        sample_tuple = ids[0]
+        dtype_tuple = []
+
+        for i, tuple_elem in enumerate(sample_tuple):
+            if isinstance(tuple_elem, str):
+                max_len = max(len(str(el[i])) for el in ids)
+                dtype_tuple.append((f"field{i}", f"U{max_len}"))
+            else:
+                dtype_tuple.append((f"field{i}", type(tuple_elem)))
+
+        ids = np.array(ids, dtype=dtype_tuple)
+        ids = [tuple(x) for x in ids]
+        return ids

Collaborator:

Are you returning a list of tuples or a NumPy array here?

Collaborator (author):

When I applied only line 903 (ids = np.array(ids, dtype=dtype_tuple)), the resulting NumPy array held the IDs as structured records (np.void scalars) instead of tuples, and those records are not hashable. I got an error on this line:

permutation = [original_ids_pos[sample_id] for sample_id in required_ids]
TypeError: unhashable type: 'writeable void-scalar'

The fix I found was to convert each record back to a plain Python tuple, which is hashable. Tuples can be kept in a list, but not in a NumPy array: if I convert the list back to np.array, the tuples are split into their separate elements and we end up with a 2D array.
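The behavior described in this reply can be reproduced in isolation. This is a minimal sketch with made-up sample IDs (the values are assumptions, not taken from the PR); the dtype is the one _convert_tuples would build for these IDs:

```python
import numpy as np

# Hypothetical (name, visit) sample IDs; real IDs in the PR may differ.
ids = [("patient_a", 1), ("patient_b", 2)]

# Structured dtype built the same way as in _convert_tuples: one field per tuple element.
dtype = [("field0", "U9"), ("field1", int)]
structured = np.array(ids, dtype=dtype)

# Indexing a structured array returns a writeable np.void record, which cannot be hashed.
try:
    hash(structured[0])
    record_is_hashable = True
except TypeError:  # "unhashable type: 'writeable void-scalar'"
    record_is_hashable = False

# Converting each record back to a plain tuple makes it usable as a dict key again.
as_tuples = [tuple(record) for record in structured]
original_ids_pos = {sample_id: pos for pos, sample_id in enumerate(as_tuples)}

# Without a structured dtype, np.array treats the tuples as rows of a 2D string array.
plain = np.array(ids)

print(record_is_hashable)                  # False
print(original_ids_pos[("patient_b", 2)])  # 1
print(plain.shape)                         # (2, 2)
```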


     def eval(
         self, results: Dict[str, Any] = None, ids: Optional[Sequence[Hashable]] = None
     ) -> Dict[str, Any]:
@@ -902,7 +919,11 @@ def eval(
                 raise Exception(
                     "Error: confidence interval is supported only when a unique identifier is specified. Add key 'id' to your data"
                 )
-        ids = np.array(ids)
+
+        if isinstance(ids[0], tuple):
+            ids = self._convert_tuples(ids)
+        else:
+            ids = np.array(ids)

         rnd = np.random.RandomState(self._rnd_seed)
         original_sample_results = self._metric.eval(results, ids=ids)
@@ -920,7 +941,11 @@ def eval(
                 stratum_filter = stratum_id == stratum
                 n_stratum = sum(stratum_filter)
                 random_sample = rnd.randint(0, n_stratum, size=n_stratum)
-                sampled_ids[stratum_filter] = ids[stratum_filter][random_sample]
+
+                flt_indx = np.where(stratum_filter)[0]

Collaborator:

Did you need to switch to a for-loop implementation even though you converted ids to a NumPy array?
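On the for-loop question: one possible alternative (not what this PR does) is to store tuple IDs in a 1-D NumPy array with dtype=object. Each element stays a hashable Python tuple and boolean/fancy indexing still works, so the removed vectorized line could be kept. A minimal sketch under those assumptions; the helper to_object_array, the sample IDs, the strata and the seed are all hypothetical:

```python
import numpy as np


def to_object_array(ids):
    """Hypothetical helper: pack tuple IDs into a 1-D object array without splitting them."""
    arr = np.empty(len(ids), dtype=object)
    # Assign element by element; np.array(ids, dtype=object) or arr[:] = ids would
    # treat equal-length tuples as rows of a 2D array.
    for i, sample_id in enumerate(ids):
        arr[i] = sample_id
    return arr


ids = to_object_array([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
stratum_id = np.array([0, 0, 1, 1])  # hypothetical stratum label per sample

rnd = np.random.RandomState(1234)
sampled_ids = ids.copy()
for stratum in np.unique(stratum_id):
    stratum_filter = stratum_id == stratum
    n_stratum = int(stratum_filter.sum())
    random_sample = rnd.randint(0, n_stratum, size=n_stratum)
    # Same vectorized resampling as the removed line; works because ids is a 1-D object array.
    sampled_ids[stratum_filter] = ids[stratum_filter][random_sample]

# Elements are still plain tuples, so they can be used as dict keys.
original_ids_pos = {sample_id: pos for pos, sample_id in enumerate(ids)}
permutation = [original_ids_pos[sample_id] for sample_id in sampled_ids]
```

Whether this fits better than the list-of-tuples approach depends on how self._metric.eval consumes ids, which this diff does not show.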

+                for i, idx in enumerate(random_sample):
+                    sampled_ids[flt_indx[i]] = ids[flt_indx[idx]]
+
             boot_results.append(self._metric.eval(results, sampled_ids))

         # results can be either a list of floats or a list of dictionaries