Support tuples as sample IDs in Bootstrapping #362

ellabarkan · 2024-08-11T13:02:28Z

The following changes add support for sample IDs given as tuples:

Added a static method, "_convert_tuples", that converts an input list of tuples to np.array format.
Updated the handling of bootstrapping indexes to support indexing with tuples.

mosheraboh · 2024-08-15T14:10:58Z

fuse/eval/metrics/metrics_common.py

@@ -887,6 +887,23 @@ def reset(self) -> None:
        self._metric.reset()
        return super().reset()

+    @staticmethod
+    def _convert_tuples(ids: List[Tuple[str, int]]) -> np.ndarray:


Are you assuming a tuple of str and int?
Can we do something more general?

ellabarkan · 2024-08-15

Thanks. The conversion method supports tuples of any element types and any length; I need to change the interface definition.
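Since the IDs can be tuples of any element types and any length, one way to keep each tuple intact is a 1D NumPy array with dtype=object that stores every tuple as a single element. The sketch below is an assumption, not the PR's "_convert_tuples"; the name convert_tuples and the Sequence[Tuple[Any, ...]] signature are hypothetical. Such an array stays one-dimensional, accepts boolean masks and integer indexing, and hands back ordinary, hashable Python tuples.

```python
from typing import Any, Sequence, Tuple

import numpy as np


def convert_tuples(ids: Sequence[Tuple[Any, ...]]) -> np.ndarray:
    """Pack tuple sample IDs (any element types, any length) into a 1D object array.

    Hypothetical helper for illustration; not the PR's _convert_tuples.
    """
    out = np.empty(len(ids), dtype=object)
    for pos, sample_id in enumerate(ids):
        # Assigning one element at a time stores the whole tuple in a single
        # slot; np.array(ids) would instead unpack each tuple into a row.
        out[pos] = tuple(sample_id)
    return out


ids = convert_tuples([("ds_a", 1), ("ds_b", 2, "extra"), ("ds_a", 3)])
print(ids.shape)  # (3,)

# Boolean masks work, and the selected elements are still plain tuples.
selected = ids[np.array([True, False, True])]

# Elements are ordinary Python tuples, so they can be used as dict keys.
original_ids_pos = {sample_id: pos for pos, sample_id in enumerate(ids)}
print(original_ids_pos[("ds_b", 2, "extra")])  # 1
```

The trade-off is that an object array gives up vectorized comparisons on individual tuple fields; lookups and masking go through Python objects.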

mosheraboh · 2024-09-05T11:58:42Z

fuse/eval/metrics/metrics_common.py

+
+        ids = np.array(ids, dtype=dtype_tuple)
+        ids = [tuple(x) for x in ids]
+        return ids


Are you returning a list of tuples or a numpy array here?

ellabarkan · 2024-09-08

When applying only line 903, the elements of the resulting numpy array look like tuples but are not hashable, and I got an error on this line:
permutation = [original_ids_pos[sample_id] for sample_id in required_ids]
TypeError: unhashable type: 'writeable void-scalar'
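A minimal reproduction of this error, assuming the IDs were packed into a structured array (example_dtype below is a hypothetical stand-in, not the dtype_tuple used in the PR): indexing a structured array returns a numpy.void scalar that views the array's memory, and NumPy refuses to hash writeable void scalars. Copying each element into a plain tuple, as the diff above does, makes dictionary lookup work again.

```python
import numpy as np

# Hypothetical structured dtype for (dataset_name, id) pairs.
example_dtype = [("dataset_name", "U16"), ("id", "i8")]
ids = np.array([("ds_a", 1), ("ds_b", 2)], dtype=example_dtype)
original_ids_pos = {("ds_a", 1): 0, ("ds_b", 2): 1}

element = ids[0]  # a numpy.void that views the array's memory, so it is writeable
print(type(element))  # <class 'numpy.void'>
try:
    original_ids_pos[element]  # dict lookup must hash the key first
except TypeError as err:
    print("lookup failed:", err)

# Copying each element into a plain tuple makes it hashable again.
required_ids = [tuple(x) for x in ids]
permutation = [original_ids_pos[sample_id] for sample_id in required_ids]
print(permutation)  # [0, 1]
```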

The fix I found was to convert each element to a Python tuple, which can be stored in a list but not in an array: if I convert them back to np.array, the tuples are split into separate elements and we get a 2D array.
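The splitting described here can be seen with plain np.array() on (dataset_name, id) pairs. The sketch uses made-up IDs; it shows that without a structured or object dtype, NumPy treats each tuple as a row and picks a single common dtype, so the integer part of each ID also turns into a string.

```python
import numpy as np

ids = [("ds_a", 1), ("ds_b", 2)]

# Without an explicit dtype, np.array() treats every tuple as a row and picks
# one common dtype for all values, so the integer ids become strings.
flat = np.array(ids)
print(flat.shape)       # (2, 2)
print(flat.dtype.kind)  # U
print(flat[0])          # ['ds_a' '1']

# Turning a row back into a tuple does not restore the original id.
print(tuple(flat[0]) == ("ds_a", 1))    # False
print(tuple(flat[0]) == ("ds_a", "1"))  # True
```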

mosheraboh · 2024-09-05T12:03:53Z

fuse/eval/metrics/metrics_common.py

@@ -920,7 +941,11 @@ def eval(
                stratum_filter = stratum_id == stratum
                n_stratum = sum(stratum_filter)
                random_sample = rnd.randint(0, n_stratum, size=n_stratum)
-                sampled_ids[stratum_filter] = ids[stratum_filter][random_sample]
+
+                flt_indx = np.where(stratum_filter)[0]


Did you need to move to a for-loop implementation even though you converted the ids to a numpy array?
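One way to sidestep both the boolean filter and the element-wise loop is to resample integer positions per stratum and only then look up the IDs. This is a sketch under assumptions, not the PR's eval(): stratified_bootstrap is a hypothetical helper, and it reuses the names stratum_id, flt_indx and random_sample and the RandomState.randint call from the diff. Because only positions go through NumPy, the IDs can stay a plain list of tuples.

```python
from typing import Any, List, Sequence

import numpy as np


def stratified_bootstrap(ids: Sequence[Any], stratum_id: np.ndarray,
                         rnd: np.random.RandomState) -> List[Any]:
    """Hypothetical helper: resample ids with replacement inside each stratum."""
    sampled_pos = np.arange(len(ids))
    for stratum in np.unique(stratum_id):
        # Integer positions of the samples that belong to this stratum.
        flt_indx = np.where(stratum_id == stratum)[0]
        n_stratum = len(flt_indx)
        random_sample = rnd.randint(0, n_stratum, size=n_stratum)
        # Each slot of the stratum gets a randomly drawn member of the same stratum.
        sampled_pos[flt_indx] = flt_indx[random_sample]
    # Only integer positions touched NumPy; the ids themselves are never converted.
    return [ids[pos] for pos in sampled_pos]


ids = [("ds_a", 1), ("ds_a", 2), ("ds_b", 1), ("ds_b", 2), ("ds_b", 3)]
strata = np.array([0, 0, 1, 1, 1])
sampled = stratified_bootstrap(ids, strata, np.random.RandomState(0))
```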

sivanravidos · 2024-11-21T15:51:57Z

@ellabarkan @mosheraboh
I think I have the same issue when calling GroupAnalysis or Filter, which use the Collector, at line 410:
permutation = [original_ids_pos[sample_id] for sample_id in required_ids]

original_ids_pos has tuples and required_ids has ndarrays.

I added this workaround so I can use the code in the meantime:
permutation = [original_ids_pos[(sample_id[0], int(sample_id[1]))] for sample_id in required_ids]
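A slightly more general version of this workaround normalizes any ID that has passed through NumPy back into a hashable tuple of Python scalars before the lookup. to_hashable is a hypothetical helper, not part of fuse-med-ml. It handles array rows, NumPy scalars such as np.int64, and structured void scalars, but it cannot undo integers that were already coerced to strings by a plain 2D string array; that case still needs an explicit cast such as int(...).

```python
from typing import Any

import numpy as np


def to_hashable(sample_id: Any) -> Any:
    """Hypothetical helper: map an id that passed through numpy back to a hashable key."""
    if isinstance(sample_id, np.ndarray):
        sample_id = sample_id.tolist()  # array row -> list of stored objects/scalars
    elif isinstance(sample_id, np.generic):
        sample_id = sample_id.item()    # np.int64 -> int, structured np.void -> tuple
    if isinstance(sample_id, (list, tuple)):
        return tuple(to_hashable(x) for x in sample_id)
    return sample_id


original_ids_pos = {("ds_a", 1): 0, ("ds_b", 2): 1}
# Rows of a 2D object array, standing in for ids that were stacked into numpy.
required_ids = np.array([["ds_b", np.int64(2)], ["ds_a", np.int64(1)]], dtype=object)
permutation = [original_ids_pos[to_hashable(sample_id)] for sample_id in required_ids]
print(permutation)  # [1, 0]
```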

Can we also fix this in this PR and finalize it?

sivanravidos · 2024-11-25T16:41:54Z

@mosheraboh I checked, and on this branch I get the same exception on line 410; this solution doesn't fix the get method.

sivanravidos · 2024-11-26T15:37:08Z

@mosheraboh Ella and I discussed it again; the problem is as follows:

We use a Fuse wrapper that wraps dataset IDs as tuples (dataset_name, id)
(dataset_wrap_seq_to_dict.py, line 104).
Arrays of tuples are problematic for some operations: you can't apply boolean filters to them.
This happens in several places, specifically in CI, Filter and GroupAnalysis, so we need to fix it everywhere.
See an alternative fix in branch sample_id_fix that uses a list comprehension instead of boolean filtering (I reverted my previous suggestion in favor of it). It fixes GroupAnalysis only:
https://github.com/BiomedSciAI/fuse-med-ml/compare/group_analyis_configuration..sample_id_fix
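The list-comprehension idea can be sketched as follows. A plain Python list does not accept a boolean mask (ids[mask] raises TypeError), so zipping the IDs with the mask keeps the IDs as ordinary tuples throughout. filter_ids is a hypothetical helper for illustration, not the code in the sample_id_fix branch.

```python
from typing import Any, List, Sequence

import numpy as np


def filter_ids(ids: Sequence[Any], mask: Sequence[bool]) -> List[Any]:
    """Hypothetical helper: keep ids[i] wherever mask[i] is true."""
    if len(ids) != len(mask):
        raise ValueError("ids and mask must have the same length")
    # zip() pairs each id with its flag, so the ids never have to become an array.
    return [sample_id for sample_id, keep in zip(ids, mask) if keep]


ids = [("ds_a", 1), ("ds_b", 2), ("ds_a", 3)]
mask = np.array([sample_id[0] == "ds_a" for sample_id in ids])
print(filter_ids(ids, mask))  # [('ds_a', 1), ('ds_a', 3)]
```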

ellabarkan added 2 commits August 11, 2024 09:00

added fix for tupples

34d3aad

Merge branch 'master' into fix_bootstrapping_for_tuple_sample_id

6f74634

ellabarkan requested a review from mosheraboh August 11, 2024 16:02

mosheraboh reviewed Aug 15, 2024

ellabarkan added 2 commits August 19, 2024 08:18

fixed the signature of tupple convert method

f549140

Merge branch 'master' into fix_bootstrapping_for_tuple_sample_id

96e9795

ellabarkan requested a review from mosheraboh August 20, 2024 06:13

mosheraboh reviewed Sep 5, 2024

ellabarkan requested a review from mosheraboh September 15, 2024 11:36

sivanravidos mentioned this pull request Dec 5, 2024 in "Sample id fix in GroupAnalysis #384" (Merged)
