Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ss test mapie #41

Open
wants to merge 79 commits into
base: dev
Choose a base branch
from
Open

Ss test mapie #41

wants to merge 79 commits into from

Conversation

ssorou1
Copy link
Collaborator

@ssorou1 ssorou1 commented Jan 31, 2025

Implementation of MAPIE for rf and mlp models.

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

@ssorou1 ssorou1 requested a review from glitt13 January 31, 2025 21:26
@ssorou1
Copy link
Collaborator Author

ssorou1 commented Jan 31, 2025

Changes are made in fs_algo_train_eval.py

@glitt13 glitt13 changed the base branch from main to dev February 3, 2025 23:31
@ssorou1 ssorou1 requested a review from glitt13 February 14, 2025 22:34
algo = algo_data['algo']
mapie = MapieRegressor(algo, cv="prefit", agg_function="median")
mapie.fit(self.X_train, self.y_train)
algo_data['mapie'] = mapie
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssorou1, the mapie object needs to be returned somehow. I suggest adding it to the self.algs_dict object.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower_bound, upper_bound = np.percentile(predictions, [(100 - cl) / 2, 100 - (100 - cl) / 2], axis=0)
confidence_intervals[cl] = (lower_bound, upper_bound)

return mean_pred, std_pred, confidence_intervals
Copy link
Collaborator

@glitt13 glitt13 Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssorou1, Like mapie, I also suggest creating these as objects inside the class (e.g. another sub-dict inside self.algs_dict[algo_str] rather than something that gets returned. Then the user could access those data easily within the when running fs_proc_algo, e.g. train_eval.algs_dict['rf'].name_of_object_for_bagging_here['mean_pred']

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if algo_str in self.algo_config and self.bagging_ci_params.get('n_algos', None):
n_algos = self.bagging_ci_params.get('n_algos', None)
mean_pred, _, confidence_intervals = self.calculate_bagging_ci(algo_str,n_algos)
self.algs_dict[algo_str]['Uncertainty']['bagging_mean_pred'] = mean_pred
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssorou1, see previous comment - you could substitute this self assignment in to the calculate_bagging_ci rather than here to simplify the code.

We would use a return in calculate_bagging_ci if we anticipated running this function independently, but since all these data are so closely tied together, we can just these data as class objects, as you already end up doing in these lines 1330 and 1331.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented. Please refer to the comment above.

@ssorou1 ssorou1 requested a review from glitt13 February 18, 2025 20:16
Copy link
Collaborator

@glitt13 glitt13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm continuing the discussion on the random state/resampling after skimming over the new work.

@ssorou1
Copy link
Collaborator Author

ssorou1 commented Feb 25, 2025

Hi Guy,
Regarding your last comment, I updated the code as follows:
random.seed(self.rs) is implemented to ensure that the sequence of random numbers is the same each time the function runs.
random_states = [random.randint(1, 10000) for _ in range(n_algos)] pre-generates n_algos unique random states.
These random states are then passed to resample(), making resampling deterministic.

resample(self.X_train, self.y_train, random_state=rand_state) ensures consistent bootstrap sampling across runs.

algo_tmp = type(base_algo)(**{**base_algo.get_params(), "random_state": rand_state}) ensures each model instance gets a unique but reproducible random_state.
https://github.com/ssorou1/formulation-selector/blob/5b357359289a83fb1c6abc6562838bd25ad48382/pkg/fs_algo/fs_algo/fs_algo_train_eval.py#L1059-L1072

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants