Conformal prediction with conditional guarantees #455
base: master
Conversation
Very good first PR, thank you a lot! I have left some suggestions (format, style, code, content, ...). To sum up:
- Transform PhiFunction into an abstract class.
- Make the comparison with MapieRegressor follow the same steps (the checks, for instance).
- Remove the verbose warning and the duplicate checks.
Hey @Damien-Bouet,
Thank you and well done!
I have made some initial comments in the ccp_regression.py file. The main thing I'm noticing is a lot of cast() calls and functions that seem to already exist in other classes. Please address these initial comments; I will have a further look at the other files.
but any conformal prediction method can be implemented by the user as
a subclass of :class:`~mapie.calibrators.base.BaseCalibrator`.

Example of naive Split CP:
Why take the example of the naive split CP? (naive here means that you don't have a coverage guarantee, so it's not actually a CP method)
You are right, I changed it.
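For context, the standard split CP procedure being contrasted here can be sketched in a few lines of numpy. This is a minimal illustration of the general method, not MAPIE's implementation; the toy data and model are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration set: y = 2x plus noise, and a predictor yhat = 2x
X_calib = rng.uniform(0, 1, 200)
y_calib = 2 * X_calib + rng.normal(0, 0.1, 200)
y_pred_calib = 2 * X_calib

# Conformity scores: absolute residuals on the calibration fold
scores = np.abs(y_calib - y_pred_calib)

# Conformal quantile at miscoverage level alpha (finite-sample correction)
alpha = 0.1
n = len(scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, q_level)

# Prediction interval for a new point: constant width 2 * q_hat
x_new = 0.5
y_new_pred = 2 * x_new
interval = (y_new_pred - q_hat, y_new_pred + q_hat)
```

The key point of the review comment is that this interval width is the same everywhere, which is why a constant-width baseline is called "standard" rather than "naive".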
doc/theoretical_description_ccp.rst
Outdated
Method's intuition
------------------

We recall that the `naive` method estimates the absolute residuals by a constant :math:`\hat{q}_{n, \alpha}^+`
Prefer `standard` instead of `naive`.
def fit(
    self,
    X_calib: ArrayLike,
If you give X_calib, is it truly equivalent to MapieRegressor?
I wanted to do the splitting in the main class (SplitCPRegressor) and not in the calibrators, so that the main class does as much as possible and the calibrators stay as simple as possible. However, some calibrators may need the training or calibration data (CQR, for example, would need both). So here I specify X_calib, but no worries, it is indeed equivalent to MapieRegressor.
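To illustrate the equivalence being discussed: whether the estimator performs the split internally or hands a pre-split calibration fold to the calibrator, the same calibration residuals are produced. A minimal scikit-learn sketch of the "split in the main class" design (illustrative only, not the PR's code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, (500, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, 500)

# The main class performs the split itself ...
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_fit, y_fit)

# ... then hands only the calibration fold to the calibrator.
# The calibrator never needs to know how the split was made:
calib_scores = np.abs(y_calib - model.predict(X_calib))
```

Since the calibrator only ever sees `X_calib` and the corresponding residuals, passing the fold explicitly yields the same scores as an internal split with the same random state.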
- It can create very adaptive intervals (with a varying width which truly reflects the model uncertainty)
- while providing a coverage guarantee on all sub-groups of interest (avoiding biases)
- with the possibility to inject prior knowledge about the data or the model
I would also mention the disadvantages here!
I removed the advantages from the theoretical explanation and added them, with some disadvantages, in the tutorial.
mapie_ccp.fit(X_train, y_train)
y_pred_ccp, y_pi_ccp = mapie_ccp.predict(X_test)

# ================== PLOT ==================
Why don't we plot all the methods on the same graph? The colors are clearly different.
For the next plots, where six methods are compared together, plotting all of them on the same figure was too much and made it difficult to understand. So I decided to plot them three by three (in this case, the first three, then the fourth alone).
calibrator2 = PolynomialCCP(1)
calibrator3 = PolynomialCCP([0, 3])
calibrator2 = PolynomialCCP(1)  # degree=1 is equivalent to degree=[0, 1]
calibrator3 = PolynomialCCP([1], variable="y_pred")
Could you give a bit of an explanation of the intuition behind these different calibrators?
I explained just above. Tell me if it is not clear enough:

- f : X -> (1) will try to estimate the absolute residual with a constant, and will result in a prediction interval of constant width (like the basic split CP)
- f : X -> (1, X) will result in a prediction interval whose width is a constant plus a value proportional to X (it seems a good idea here, as the uncertainty increases with X)
- f : X, y_pred -> (y_pred) will result in a prediction interval of width proportional to the prediction (like the basic split CP with a gamma conformity score)
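A tiny numeric illustration of those three feature maps. The weight vectors below are made-up placeholders standing in for the coefficients the method would fit; the point is only how the interval width varies with each map:

```python
import numpy as np

X = np.array([0.5, 1.0, 2.0])   # inputs
y_pred = 3.0 * X                # some model's predictions

# f1: X -> (1,)  -> constant width (like standard split CP)
beta1 = np.array([0.4])
width1 = np.abs(np.column_stack([np.ones_like(X)]) @ beta1)

# f2: X -> (1, X)  -> width = constant + term proportional to X
beta2 = np.array([0.1, 0.3])
width2 = np.abs(np.column_stack([np.ones_like(X), X]) @ beta2)

# f3: (X, y_pred) -> (y_pred,)  -> width proportional to the prediction
beta3 = np.array([0.2])
width3 = np.abs(np.column_stack([y_pred]) @ beta3)

print(width1)  # constant: [0.4 0.4 0.4]
print(width2)  # increasing with X
print(width3)  # proportional to y_pred
```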
calibrator1 = CustomCCP([lambda X: X < 0, lambda X: X >= 0])
# To improve the results, we need to analyse the data
# and the conformity scoreswe chose (here, the absolute residuals).
Typo: `scoreswe` should be `scores we`.
##############################################################################
# Using gaussian distances from randomly sampled points is a good solution
# to have an overall good adaptivity.
# The most adaptive interval is this last brown one, with the two groups
What's your intuition here? Why?
I updated the conclusion, also adding some disadvantages, to be more impartial. Tell me if you like it:

Conclusion:
The goal is to get prediction intervals which are as adaptive as possible. Perfect adaptivity would result in a perfectly constant conditional coverage.
Considering this adaptivity criterion, the most adaptive interval is the last brown one, with the two groups and the gaussian calibrators. In this example, the polynomial calibrator (in purple) also worked well, but the gaussian one is more generic (it usually works with any dataset, assuming we use the correct parameters, whereas the polynomial features are not always adapted).
This is the power of the CCP method: combining prior knowledge and generic features (gaussian kernels) to get a great overall adaptivity.
However, it can be difficult to find the best calibrator and parameters. Sometimes, a simpler method (standard split with GammaConformityScore, for example) can be enough. Don't forget to try the simpler method first, and move on to the more advanced one only if necessary.
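The "gaussian distances from randomly sampled points" idea discussed above can be sketched with plain numpy. This is an illustrative RBF-style construction under assumed parameter values (5 centers, bandwidth 0.5), not the `GaussianCCP` implementation itself:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.uniform(-1, 1, (100, 1))                    # data
centers = X[rng.choice(len(X), 5, replace=False)]   # 5 randomly sampled points
sigma = 0.5                                         # assumed bandwidth

# Each feature is a gaussian kernel centered on one sampled point:
#   phi_k(x) = exp(-||x - c_k||^2 / (2 * sigma^2))
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
phi = np.exp(-dists ** 2 / (2 * sigma ** 2))        # shape (100, 5)
```

An interval width of the form |phi @ beta| can then adapt locally: each point is mostly described by the features of the centers nearest to it, which is why this basis tends to work on many datasets, provided the number of centers and the bandwidth are chosen sensibly.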
- ``X``: Input dataset, of shape (n_samples, ``n_in``)
- ``y_pred``: estimator prediction, of shape (n_samples,)
- ``z``: exogenous variable, of shape (n_samples, n_features).
  It should be given in the ``fit`` and ``predict`` methods.
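To make the ``z`` discussion below concrete, a custom feature function taking all three arguments could look like this. The function name and signature are hypothetical, used only to show that the same exogenous variable must be supplied at both fit and predict time:

```python
import numpy as np

def feature(X, y_pred, z):
    # Uses all three arguments: input, prediction, exogenous variable.
    return np.column_stack([np.ones(len(X)), y_pred, z])

X = np.linspace(0, 1, 4).reshape(-1, 1)
y_pred = 2 * X[:, 0]
z = np.array([0.0, 1.0, 0.0, 1.0])   # exogenous variable

phi_fit = feature(X, y_pred, z)      # at fit time
phi_pred = feature(X, y_pred, z)     # at predict time: same z required

# Omitting z at either stage raises a TypeError immediately, so a
# forgotten z cannot silently affect only one of the two stages.
```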
Do we perform a test for this? By that, I mean that we provide the same combination for the fit and predict.
I am not sure I understand correctly, but if the calibrator needs a z value, it will work in neither fit nor predict if z is not given. So there can't be an issue of z being forgotten only in fit or only in predict.
mapie/calibrators/ccp/base.py
Outdated
cs_features = concatenate_functions(self.functions_, params_mapping,
                                    self._multipliers)
cs_features = concatenate_functions(self.functions_, params_mapping)
# Normalize
Why do we add this comment?
Removed
mapie/calibrators/ccp/polynomial.py
Outdated
@@ -10,7 +10,8 @@

class PolynomialCCP(CCPCalibrator):
    """
    Calibrator used for the in ``SplitCPRegressor`` or ``SplitCPClassifier``
    Calibrator based on :class:`~mapie.calibrators.ccp.CCPCalibrator`,
    used for the in ``SplitCPRegressor`` or ``SplitCPClassifier``
Why do we not use :class: here?
I added it in the classes docstrings 👍
HISTORY.rst
Outdated
@@ -17,6 +17,10 @@ History

* Building unit tests for different `Subsample` and `BlockBooststrap` instances
* Change the sign of C_k in the `Kolmogorov-Smirnov` test documentation
* Building a training set with a fraction between 0 and 1 with `n_samples` attribute when using `split` method from `Subsample` class.
* Add `SplitCPRegressor`, bsaed on new `SplitCP` abstract class, to support the new CCP method
Typo: `bsaed` instead of `based` :-)
Thank you !
HISTORY.rst
Outdated
@@ -17,6 +17,10 @@ History

* Building unit tests for different `Subsample` and `BlockBooststrap` instances
* Change the sign of C_k in the `Kolmogorov-Smirnov` test documentation
* Building a training set with a fraction between 0 and 1 with `n_samples` attribute when using `split` method from `Subsample` class.
* Add `SplitCPRegressor`, based on new `SplitCP` abstract class, to support the new CCP method
* Add `GaussianCCP`, `PolynomialCCP` and `CustomCCP` based on `CCPCalibrator` to implement the Conditional CP method
* Add the `StandardCalibrator`, to reproduce standard CP and make sur that the `SplitCPRegressor` is implemented correctly.
Typo: `sure` instead of `sur` :-)
Description
Implementation of new classes SplitCPRegressor and CCPCalibrator (and other subclasses) to implement the method proposed by Gibbs et al. (2023), as described in issue #449.
Fixes #449
Type of change
Please remove options that are irrelevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Checklist
make lint
make type-check
make tests
make coverage
make doc