Ss test mapie #41

ssorou1 · 2025-01-31T21:26:19Z

Implementation of MAPIE for rf and mlp models.


ssorou1 · 2025-01-31T21:27:04Z

Changes are made in fs_algo_train_eval.py

glitt13 · 2025-02-18T18:00:13Z

pkg/fs_algo/fs_algo/fs_algo_train_eval.py

+            algo = algo_data['algo']
+            mapie = MapieRegressor(algo, cv="prefit", agg_function="median")  
+            mapie.fit(self.X_train, self.y_train)  
+            algo_data['mapie'] = mapie


@ssorou1, the mapie object needs to be returned somehow. I suggest adding it to the self.algs_dict object.

Done!
https://github.com/ssorou1/formulation-selector/blob/ef57c391bf4bd9a5fbcb143aea9ff6ee8bcc445f/pkg/fs_algo/fs_algo/fs_algo_train_eval.py#L1052
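
For readers following this thread, here is a minimal sketch of what a "prefit" conformal wrapper does and of the suggestion to keep the wrapper in the per-algorithm dict. It is not MAPIE's implementation: `PrefitConformalSketch`, `toy_model`, and the toy data are hypothetical, and a standard-library split-conformal quantile stands in for the library. Only the `'algo'` and `'mapie'` keys are taken from the diff above.

```python
import math


class PrefitConformalSketch:
    """Hypothetical stand-in for a prefit conformal wrapper (not MAPIE)."""

    def __init__(self, predict_fn):
        # predict_fn is assumed to be an already-fitted model's predict.
        self.predict_fn = predict_fn
        self.scores = []

    def fit(self, X_calib, y_calib):
        # No model training happens here: only absolute residuals are kept.
        self.scores = sorted(
            abs(y - self.predict_fn(x)) for x, y in zip(X_calib, y_calib)
        )
        return self

    def predict(self, X, alpha):
        # Split-conformal rank: ceil((n + 1) * (1 - alpha)), clamped to n
        # so the sketch never indexes past the stored residuals.
        n = len(self.scores)
        k = min(math.ceil((n + 1) * (1 - alpha)), n)
        q = self.scores[k - 1]
        return [(self.predict_fn(x) - q, self.predict_fn(x) + q) for x in X]


def toy_model(x):
    # Pretend this is the fitted 'rf' predictor.
    return 2.0 * x


# Keep the wrapper next to the fitted algorithm so callers can reach it later.
algs_dict = {"rf": {"algo": toy_model}}
wrapper = PrefitConformalSketch(algs_dict["rf"]["algo"])
wrapper.fit([1.0, 2.0, 3.0, 4.0], [2.5, 3.0, 6.5, 8.0])
algs_dict["rf"]["mapie"] = wrapper

intervals = algs_dict["rf"]["mapie"].predict([10.0], alpha=0.5)
```

Because a prefit wrapper does not retrain the model, whatever is passed to `fit()` acts purely as calibration data. Split-conformal coverage guarantees assume that data was not also used to train the model, which may be worth checking for the `mapie.fit(self.X_train, self.y_train)` call.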

glitt13 · 2025-02-18T18:04:36Z

pkg/fs_algo/fs_algo/fs_algo_train_eval.py

+            lower_bound, upper_bound = np.percentile(predictions, [(100 - cl) / 2, 100 - (100 - cl) / 2], axis=0)
+            confidence_intervals[cl] = (lower_bound, upper_bound)
+
+        return mean_pred, std_pred, confidence_intervals


@ssorou1, like mapie, I also suggest creating these as objects inside the class (e.g. another sub-dict inside self.algs_dict[algo_str]) rather than something that gets returned. The user could then access those data easily when running fs_proc_algo, e.g. train_eval.algs_dict['rf']['name_of_object_for_bagging_here']['mean_pred']

Sure. The calculate_bagging_ci() is updated:
https://github.com/ssorou1/formulation-selector/blob/ef57c391bf4bd9a5fbcb143aea9ff6ee8bcc445f/pkg/fs_algo/fs_algo/fs_algo_train_eval.py#L1042-L1044

https://github.com/ssorou1/formulation-selector/blob/ef57c391bf4bd9a5fbcb143aea9ff6ee8bcc445f/pkg/fs_algo/fs_algo/fs_algo_train_eval.py#L1295-L1298
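
A rough, standard-library sketch of the pattern discussed in this thread: percentile bounds per confidence level, written into `self.algs_dict[algo_str]` instead of being returned. `TrainEvalSketch`, `percentile`, and the keys `bagging_std_pred` and `bagging_confidence_intervals` are hypothetical names; only `Uncertainty` and `bagging_mean_pred` appear in the PR diff. The linked commit may well differ, and the population standard deviation used here is an assumption.

```python
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


class TrainEvalSketch:
    def __init__(self, confidence_levels):
        self.confidence_levels = confidence_levels
        self.algs_dict = {}

    def calculate_bagging_ci(self, algo_str, predictions):
        # predictions: one list of predictions per bootstrapped model,
        # i.e. n_algos rows by n_samples columns.
        per_sample = [sorted(col) for col in zip(*predictions)]
        intervals = {}
        for cl in self.confidence_levels:
            lower_q = (100 - cl) / 2       # e.g. cl=90 -> 5.0
            upper_q = 100 - lower_q        # e.g. cl=90 -> 95.0
            intervals[cl] = (
                [percentile(col, lower_q) for col in per_sample],
                [percentile(col, upper_q) for col in per_sample],
            )
        # Store on the object rather than returning a tuple.
        self.algs_dict.setdefault(algo_str, {})["Uncertainty"] = {
            "bagging_mean_pred": [statistics.mean(c) for c in per_sample],
            "bagging_std_pred": [statistics.pstdev(c) for c in per_sample],
            "bagging_confidence_intervals": intervals,
        }


train_eval = TrainEvalSketch(confidence_levels=[90, 95])
boot_preds = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
train_eval.calculate_bagging_ci("rf", boot_preds)
unc = train_eval.algs_dict["rf"]["Uncertainty"]
```

With this layout a caller reads results as `train_eval.algs_dict['rf']['Uncertainty']['bagging_mean_pred']`, and adding a confidence level does not change any function signature.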

glitt13 · 2025-02-18T18:08:12Z

pkg/fs_algo/fs_algo/fs_algo_train_eval.py

+            if algo_str in self.algo_config and self.bagging_ci_params.get('n_algos', None):
+                n_algos = self.bagging_ci_params.get('n_algos', None)
+                mean_pred, _, confidence_intervals = self.calculate_bagging_ci(algo_str,n_algos)
+                self.algs_dict[algo_str]['Uncertainty']['bagging_mean_pred'] = mean_pred


@ssorou1, see previous comment - you could move this self assignment into calculate_bagging_ci rather than doing it here, to simplify the code.

A return in calculate_bagging_ci would make sense if we anticipated running this function independently, but since all these data are so closely tied together, we can just store these data as class objects, as you already end up doing in lines 1330 and 1331.

Implemented. Please refer to the comment above.

glitt13

I'm continuing the discussion on the random state/resampling after skimming over the new work.

pkg/fs_algo/fs_algo/fs_algo_train_eval.py

ssorou1 · 2025-02-25T16:57:25Z

Hi Guy,
Regarding your last comment, I updated the code as follows:
random.seed(self.rs) seeds the generator, so the sequence of random numbers is the same each time the function runs.
random_states = [random.randint(1, 10000) for _ in range(n_algos)] pre-generates n_algos random states from that seeded sequence.
Each of these states is then passed to resample(), which makes the resampling deterministic:
resample(self.X_train, self.y_train, random_state=rand_state) gives consistent bootstrap samples across runs.
algo_tmp = type(base_algo)(**{**base_algo.get_params(), "random_state": rand_state}) gives each model instance its own reproducible random_state.
https://github.com/ssorou1/formulation-selector/blob/5b357359289a83fb1c6abc6562838bd25ad48382/pkg/fs_algo/fs_algo/fs_algo_train_eval.py#L1059-L1072
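
The seeding scheme described above can be sketched with the standard library alone. `make_random_states` and `bootstrap_indices` are hypothetical helpers: the second stands in for scikit-learn's `resample()`, and a local `random.Random` is used instead of the global `random.seed()` call from the PR so that the sketch does not touch global state. Note that `randint` can repeat values; the `unique=True` branch uses `sample` when distinct states are required.

```python
import random


def make_random_states(seed, n_algos, unique=False):
    """Derive n_algos reproducible random states from one master seed."""
    rng = random.Random(seed)  # local generator, global state untouched
    if unique:
        # sample() draws without replacement, so states are distinct.
        return rng.sample(range(1, 10001), n_algos)
    # randint() draws with replacement, so repeats are possible.
    return [rng.randint(1, 10000) for _ in range(n_algos)]


def bootstrap_indices(n_rows, random_state):
    """Row indices for one bootstrap draw (with replacement)."""
    rng = random.Random(random_state)
    return rng.choices(range(n_rows), k=n_rows)


states = make_random_states(seed=32, n_algos=5)
draws = [bootstrap_indices(10, state) for state in states]
```

Calling `make_random_states` again with the same seed yields the same list, so every bootstrap draw and every model built from those states is repeatable.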

Soroush Sorourian and others added 19 commits January 22, 2025 15:49

bring in the ci for rf in fs_algo_train_eval.py

31791cb

bring in ci to fs_proc_algo.py

dcc6c5a

bring in ci to fs_pred_algo.py

1805ec0

apply only one n_estimators (grid selection bug)

93ede73

fix a syntax error in fs_algo_train_eval

2d3dbfa

clean fs_algo_train_eval.py

1154eef

added unit test for std_Xtrain_path function

50e790c

Revert "Merge remote-tracking branch 'upstream/dev' into ss_test_fci_dev for parquet column features" This reverts commit 7faad8a, reversing changes made to bcd50c8.

ffbbc62

added unit test for fci function

bdb7d32

Update fs_pred_algo.py

cb0efe1

fix: add back in accidental removal of read_type argument.

brought back list of values in n_estimators in xssa_algo_config.yaml

5eb00e4

Merge branch 'ss_test_fci_dev3' of https://github.com/ssorou1/formulation-selector into ss_test_fci_dev3

c310236

Incorporate Bagging into mlp in fs_algo_train_eval

737c36a

rf n_estimators=400

88661ad

Incorporate Bagging into rf in fs_algo_train_eval

ea914f4

incorporated mapie for rf model

ad24727

update the MAPIE to use the same fit as rf

983a662

incorporated mapie for mlp model

710038a

deleted unsed pred_rf variable inside tran_algos function

d24e19f

ssorou1 requested a review from glitt13 January 31, 2025 21:26

glitt13 changed the base branch from main to dev February 3, 2025 23:31

Soroush Sorourian added 8 commits February 6, 2025 13:17

add number of bootstrap runs as a parameter to the yaml file

08deb3c

Implemented multiple confidence intervals (90, 95 & 99%) for rf Bagging calculation. Convert rf Bagging ci as a separate function

6476710

Implemented multiple confidence intervals (90, 95 & 99%) for mlp Bagging calculation. Convert mlp Bagging ci as a separate function

e94b32d

Update rf_Bagging_ci function to dynamically read the number of bootstrapping runs from the yaml file

746b172

Update mlp_Bagging_ci function to dynamically read the number of bootstrapping runs from the yaml file

6fdb8d6

develop a separate function for MAPIE and update fs_algo_train_eval accordingly

52f088f

Rename ci for rf model for clarification

048ca30

Update fs_pred to consider forestci only for rf model. Update saving prediction algorithms pipeline to remove ci

5c9e14b

Soroush Sorourian added 2 commits February 14, 2025 15:59

revert back the COMID retrieval section as pynhd is updated to >=0.19

9825210

add additional comment for MAPIE_alpha

8129667

ssorou1 requested a review from glitt13 February 14, 2025 22:34

resolve merge conflicts and apply fixes by Guy

9263a49

ssorou1 force-pushed the ss_test_mapie branch from fa1c2a1 to 9263a49 Compare February 18, 2025 15:38

glitt13 reviewed Feb 18, 2025

Soroush Sorourian added 3 commits February 18, 2025 13:48

simplify the calculate_bagging_ci function

2390b90

update calculate_mapie() to write mapie to self.algs_dict

4ac5ce7

clean up calculate_bagging_ci()

ef57c39

ssorou1 requested a review from glitt13 February 18, 2025 20:16

Soroush Sorourian added 14 commits February 18, 2025 16:24

update train_algos_grid_search to calculate forestci with grid search

b3cbc62

Merge remote-tracking branch 'upstream/dev' into ss_test_mapie

3336544

add save_Xtrain_to_csv function to the model and update the code and fs_proc

e263848

update the save_algos() to output the X_train shape to the joblib file

36d6a62

Write forestci parameters in terms of numpy arrays instead of dataframe

2563199

updated fs_pred to read the X_train dimension from joblib file rather than the csv file

d3c6624

clean up unnecessary Xtrain related functions in fs_algo_train_eval after joblib update

1b3def2

clean up save_algos function

3142576

fix the forestci with grid search while keeping the forestci calculation under train_eval()

5913aed

implement calculate_bagging_ci to work with grid search

982ec5b

update the keys and indices of tuples inside Bagging ci

6fee686

update the output format for mapie

80620a0

bring confidence_levels out of Bagging_uncertainty in the config file and update fs_algo_train_eval accordingly

6b3c5b8

implement forestci with user-defined confidence intervals

c6f4554

glitt13 reviewed Feb 24, 2025

Soroush Sorourian added 2 commits February 24, 2025 17:48

update the code to save uncertainty dictionary

b17010b

Generate n_algos unique random states using self.rs as the seed, ensuring reproducibility.

5b35735
