API Reference

Class APIs

The Sklearn Class Wrappers

TargetPermutationImportancesWrapper

TargetPermutationImportancesWrapper(
    model_cls: Any,
    model_cls_params: Dict,
    num_actual_runs: PositiveInt = 2,
    num_random_runs: PositiveInt = 10,
    shuffle_feature_order: bool = False,
    permutation_importance_calculator: PermutationImportanceCalculatorType = compute_permutation_importance_by_subtraction,
)

Compute the permutation importance of a model given a dataset.

Parameters:

Name	Type	Description	Default
`model_cls`	`Any`	The constructor/class of the model.	required
`model_cls_params`	`Dict`	The parameters to pass to the model constructor.	required
`model_fit_params`		Not part of the constructor signature above. Pass the model's fit parameters as keyword arguments to `fit` instead.	n/a
`num_actual_runs`	`PositiveInt`	Number of runs fitted on the original (unshuffled) target.	`2`
`num_random_runs`	`PositiveInt`	Number of runs fitted on a shuffled target.	`10`
`shuffle_feature_order`	`bool`	Whether to shuffle the column order on each run (supported only when `X` is a `pd.DataFrame`).	`False`
`permutation_importance_calculator`	`PermutationImportanceCalculatorType`	The function to compute the final importance. Defaults to compute_permutation_importance_by_subtraction.	`compute_permutation_importance_by_subtraction`

Example

# Import the package
import target_permutation_importances as tpi

# Prepare a dataset
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer

# Models
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()

# Convert to a pandas dataframe
Xpd = pd.DataFrame(data.data, columns=data.feature_names)

# Compute permutation importances with default settings
wrapped_model = tpi.TargetPermutationImportancesWrapper(
    model_cls=RandomForestClassifier, # The constructor/class of the model.
    model_cls_params={ # The parameters to pass to the model constructor. Update this based on your needs.
        "n_jobs": -1,
    },
    num_actual_runs=2,
    num_random_runs=10,
    # Options: {compute_permutation_importance_by_subtraction, compute_permutation_importance_by_division}
    # Or use your own function to calculate.
    permutation_importance_calculator=tpi.compute_permutation_importance_by_subtraction,
)
wrapped_model.fit(
    X=Xpd, # pd.DataFrame, np.ndarray
    y=data.target, # pd.Series, np.ndarray
    # And other fit parameters for the model.
)
# Get the feature importances as a pandas dataframe
result_df = wrapped_model.feature_importances_df
print(result_df[["feature", "importance"]].sort_values("importance", ascending=False).head())


# Select top-5 features with sklearn `SelectFromModel`
selector = SelectFromModel(
    estimator=wrapped_model, prefit=True, max_features=5, threshold=-np.inf
).fit(Xpd, data.target)
selected_x = selector.transform(Xpd)
print(selected_x.shape)
print(selector.get_feature_names_out())

Source code in target_permutation_importances/sklearn_wrapper.py

def __init__(
    self,
    model_cls: Any,
    model_cls_params: Dict,
    num_actual_runs: PositiveInt = 2,
    num_random_runs: PositiveInt = 10,
    shuffle_feature_order: bool = False,
    permutation_importance_calculator: PermutationImportanceCalculatorType = compute_permutation_importance_by_subtraction,
):
    """
    Compute the permutation importance of a model given a dataset.

    Args:
        model_cls: The constructor/class of the model.
        model_cls_params: The parameters to pass to the model constructor.
        model_fit_params: The parameters to pass to the model fit method.
        num_actual_runs: Number of actual runs. Defaults to 2.
        num_random_runs: Number of random runs. Defaults to 10.
        shuffle_feature_order: Whether to shuffle the feature order for each run (only for X being pd.DataFrame). Defaults to False.
        permutation_importance_calculator: The function to compute the final importance. Defaults to compute_permutation_importance_by_subtraction.

    Example:
        ```python
        # Import the package
        import target_permutation_importances as tpi

        # Prepare a dataset
        import pandas as pd
        import numpy as np
        from sklearn.datasets import load_breast_cancer

        # Models
        from sklearn.feature_selection import SelectFromModel
        from sklearn.ensemble import RandomForestClassifier

        data = load_breast_cancer()

        # Convert to a pandas dataframe
        Xpd = pd.DataFrame(data.data, columns=data.feature_names)

        # Compute permutation importances with default settings
        wrapped_model = tpi.TargetPermutationImportancesWrapper(
            model_cls=RandomForestClassifier, # The constructor/class of the model.
            model_cls_params={ # The parameters to pass to the model constructor. Update this based on your needs.
                "n_jobs": -1,
            },
            num_actual_runs=2,
            num_random_runs=10,
            # Options: {compute_permutation_importance_by_subtraction, compute_permutation_importance_by_division}
            # Or use your own function to calculate.
            permutation_importance_calculator=tpi.compute_permutation_importance_by_subtraction,
        )
        wrapped_model.fit(
            X=Xpd, # pd.DataFrame, np.ndarray
            y=data.target, # pd.Series, np.ndarray
            # And other fit parameters for the model.
        )
        # Get the feature importances as a pandas dataframe
        result_df = wrapped_model.feature_importances_df
        print(result_df[["feature", "importance"]].sort_values("importance", ascending=False).head())


        # Select top-5 features with sklearn `SelectFromModel`
        selector = SelectFromModel(
            estimator=wrapped_model, prefit=True, max_features=5, threshold=-np.inf
        ).fit(Xpd, data.target)
        selected_x = selector.transform(Xpd)
        print(selected_x.shape)
        print(selector.get_feature_names_out())
        ```
    """
    self.model_cls = model_cls
    self.model_cls_params = model_cls_params
    self.model = self.model_cls(**self.model_cls_params)
    self.num_actual_runs = num_actual_runs
    self.num_random_runs = num_random_runs
    self.shuffle_feature_order = shuffle_feature_order
    self.permutation_importance_calculator = permutation_importance_calculator

fit

fit(
    X: XType, y: YType, **fit_params
) -> TargetPermutationImportancesWrapper

Compute the permutation importance of a model given a dataset.

Parameters:

Name	Type	Description	Default
`X`	`XType`	The input data.	required
`y`	`YType`	The target vector.	required
`fit_params`		The parameters to pass to the model fit method.	`{}`

Source code in target_permutation_importances/sklearn_wrapper.py

def fit(
    self,
    X: XType,
    y: YType,
    **fit_params,
) -> "TargetPermutationImportancesWrapper":
    """
    Compute the permutation importance of a model given a dataset.

    Args:
        X: The input data.
        y: The target vector.
        fit_params: The parameters to pass to the model fit method.
    """
    result = compute(
        model_cls=self.model_cls,
        model_cls_params=self.model_cls_params,
        model_fit_params=lambda _: fit_params,
        X=X,
        y=y,
        num_actual_runs=self.num_actual_runs,
        num_random_runs=self.num_random_runs,
        shuffle_feature_order=self.shuffle_feature_order,
        permutation_importance_calculator=self.permutation_importance_calculator,
    )
    if isinstance(result, list):  # pragma: no cover
        result = result[0]

    if isinstance(X, pd.DataFrame):
        self._process_feature_importances_df(result, X.columns.to_list())
    else:
        self._process_feature_importances_df(result, list(range(X.shape[1])))
    return self

Functional APIs

The core APIs of this library.

compute

compute(
    model_cls: Any,
    model_cls_params: Dict,
    model_fit_params: Union[
        ModelFitParamsBuilderType, Dict
    ],
    X: XType,
    y: YType,
    num_actual_runs: PositiveInt = 2,
    num_random_runs: PositiveInt = 10,
    shuffle_feature_order: bool = False,
    permutation_importance_calculator: Union[
        PermutationImportanceCalculatorType,
        List[PermutationImportanceCalculatorType],
    ] = compute_permutation_importance_by_subtraction,
) -> Union[pd.DataFrame, List[pd.DataFrame]]

Compute the permutation importance of a model given a dataset.

Parameters:

Name	Type	Description	Default
`model_cls`	`Any`	The constructor/class of the model.	required
`model_cls_params`	`Dict`	The parameters to pass to the model constructor.	required
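`model_fit_params`	`Union[ModelFitParamsBuilderType, Dict]`	A dict, or a function that returns the parameters to pass to the model fit method. The function receives the list of column names when `X` is a `pd.DataFrame`, otherwise `None`.	required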
`X`	`XType`	The input data.	required
`y`	`YType`	The target vector.	required
`num_actual_runs`	`PositiveInt`	Number of runs fitted on the original (unshuffled) target.	`2`
`num_random_runs`	`PositiveInt`	Number of runs fitted on a shuffled target.	`10`
`shuffle_feature_order`	`bool`	Whether to shuffle the column order on each run (supported only when `X` is a `pd.DataFrame`).	`False`
`permutation_importance_calculator`	`Union[PermutationImportanceCalculatorType, List[PermutationImportanceCalculatorType]]`	The function, or list of functions, that computes the final importance. A list returns one DataFrame per function.	`compute_permutation_importance_by_subtraction`

Returns:

Type	Description
`Union[DataFrame, List[DataFrame]]`	The returned DataFrame(s), each with the columns ["feature", "importance"].
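
When `model_fit_params` is a function rather than a dict, the source listing for `compute` shows it being called with the list of column names (or `None` when `X` is a NumPy array). The sketch below shows one possible builder of that shape. The function name, the `cat_` prefix convention, and the `categorical_feature` key are all assumptions made for illustration; return whatever keyword arguments your model's `fit` method accepts.

```python
from typing import Dict, List, Optional


def fit_params_builder(feature_names: Optional[List[str]]) -> Dict:
    """Hypothetical builder: flag columns whose name starts with "cat_"."""
    if feature_names is None:
        # X was a NumPy array, so there are no column names to inspect.
        return {}
    cat_columns = [name for name in feature_names if name.startswith("cat_")]
    # "categorical_feature" is only an example key, not something compute requires.
    return {"categorical_feature": cat_columns}


print(fit_params_builder(["cat_city", "age"]))
print(fit_params_builder(None))
```

Such a function would be passed as `model_fit_params=fit_params_builder` in place of the dict.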

Example

# import the package
import target_permutation_importances as tpi

# Prepare a dataset
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Models
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()

# Convert to a pandas dataframe
Xpd = pd.DataFrame(data.data, columns=data.feature_names)

# Compute permutation importances with default settings
result_df = tpi.compute(
    model_cls=RandomForestClassifier, # The constructor/class of the model.
    model_cls_params={ # The parameters to pass to the model constructor. Update this based on your needs.
        "n_jobs": -1,
    },
    model_fit_params={}, # The parameters to pass to the model fit method. Update this based on your needs.
    X=Xpd, # pd.DataFrame, np.ndarray
    y=data.target, # pd.Series, np.ndarray
    num_actual_runs=2,
    num_random_runs=10,
    # Options: {compute_permutation_importance_by_subtraction, compute_permutation_importance_by_division}
    # Or use your own function to calculate.
    permutation_importance_calculator=tpi.compute_permutation_importance_by_subtraction,
)

print(result_df[["feature", "importance"]].sort_values("importance", ascending=False).head())

Source code in target_permutation_importances/functional.py

@beartype
def compute(
    model_cls: Any,
    model_cls_params: Dict,
    model_fit_params: Union[ModelFitParamsBuilderType, Dict],
    X: XType,
    y: YType,
    num_actual_runs: PositiveInt = 2,
    num_random_runs: PositiveInt = 10,
    shuffle_feature_order: bool = False,
    permutation_importance_calculator: Union[
        PermutationImportanceCalculatorType, List[PermutationImportanceCalculatorType]
    ] = compute_permutation_importance_by_subtraction,
) -> Union[pd.DataFrame, List[pd.DataFrame]]:
    """
    Compute the permutation importance of a model given a dataset.

    Args:
        model_cls: The constructor/class of the model.
        model_cls_params: The parameters to pass to the model constructor.
        model_fit_params: A dict, or a function that returns the parameters to pass to the model fit method.
        X: The input data.
        y: The target vector.
        num_actual_runs: Number of actual runs. Defaults to 2.
        num_random_runs: Number of random runs. Defaults to 10.
        shuffle_feature_order: Whether to shuffle the feature order for each run (only for X being pd.DataFrame). Defaults to False.
        permutation_importance_calculator: The function to compute the final importance. Defaults to compute_permutation_importance_by_subtraction.

    Returns:
        The return DataFrame(s) contain columns ["feature", "importance"]

    Example:
        ```python
        # import the package
        import target_permutation_importances as tpi

        # Prepare a dataset
        import pandas as pd
        from sklearn.datasets import load_breast_cancer

        # Models
        from sklearn.ensemble import RandomForestClassifier

        data = load_breast_cancer()

        # Convert to a pandas dataframe
        Xpd = pd.DataFrame(data.data, columns=data.feature_names)

        # Compute permutation importances with default settings
        result_df = tpi.compute(
            model_cls=RandomForestClassifier, # The constructor/class of the model.
            model_cls_params={ # The parameters to pass to the model constructor. Update this based on your needs.
                "n_jobs": -1,
            },
            model_fit_params={}, # The parameters to pass to the model fit method. Update this based on your needs.
            X=Xpd, # pd.DataFrame, np.ndarray
            y=data.target, # pd.Series, np.ndarray
            num_actual_runs=2,
            num_random_runs=10,
            # Options: {compute_permutation_importance_by_subtraction, compute_permutation_importance_by_division}
            # Or use your own function to calculate.
            permutation_importance_calculator=tpi.compute_permutation_importance_by_subtraction,
        )

        print(result_df[["feature", "importance"]].sort_values("importance", ascending=False).head())
        ```
    """

    def _x_builder(is_random_run: bool, run_idx: int) -> XType:
        if shuffle_feature_order:
            if isinstance(X, pd.DataFrame):
                # Shuffle the columns
                rng = np.random.default_rng(seed=run_idx)
                shuffled_columns = rng.permutation(X.columns)
                return X[shuffled_columns]
            raise NotImplementedError(  # pragma: no cover
                "Only support pd.DataFrame when shuffle_feature_order=True"
            )
        return X

    def _y_builder(is_random_run: bool, run_idx: int) -> YType:
        rng = np.random.default_rng(seed=run_idx)
        if is_random_run:
            # Only shuffle the target for random runs
            return rng.permutation(y)
        return y

    def _model_builder(is_random_run: bool, run_idx: int) -> Any:
        # Model random state should be different for each run for both
        # actual and random runs
        _model_cls_params = model_cls_params.copy()
        if "MultiOutput" not in model_cls.__name__:
            _model_cls_params["random_state"] = run_idx
        else:
            _model_cls_params["estimator"].random_state = run_idx

        return model_cls(**_model_cls_params)

    def _model_fitter(model: Any, X: XType, y: YType) -> Any:
        if isinstance(model_fit_params, dict):  # pragma: no cover
            _model_fit_params = model_fit_params.copy()
        else:
            # Assume it is a function
            _model_fit_params = model_fit_params(
                list(X.columns) if isinstance(X, pd.DataFrame) else None,
            )
        if "Cat" in str(model.__class__):
            _model_fit_params["verbose"] = False
        return model.fit(X, y, **_model_fit_params)

    def _importance_getter(model: Any, X: XType, y: YType) -> pd.DataFrame:
        feature_names_attr = _get_feature_names_attr(model)
        is_pd = isinstance(X, pd.DataFrame)

        if "MultiOutput" not in str(model.__class__):
            if is_pd:
                features = getattr(model, feature_names_attr)
            else:
                features = list(range(0, X.shape[1]))

            model_importances_attr = _get_model_importances_attr(model)
            importances = np.abs(getattr(model, model_importances_attr))
            if len(importances.shape) > 1:
                importances = importances.mean(axis=0)
            return pd.DataFrame(
                {
                    "feature": features,
                    "importance": importances,
                }
            )

        features = []
        feature_importances = np.zeros(X.shape[1])
        for est in model.estimators_:
            if is_pd:
                feature_names_attr = _get_feature_names_attr(est)
                features = getattr(est, feature_names_attr)
            else:
                features = list(range(0, X.shape[1]))

            model_importances_attr = _get_model_importances_attr(est)
            importances = np.abs(getattr(est, model_importances_attr))
            if len(importances.shape) > 1:  # pragma: no cover
                importances = importances.mean(axis=0)
            feature_importances += importances
        return pd.DataFrame(
            {
                "feature": features,
                "importance": feature_importances / len(model.estimators_),
            }
        )

    return generic_compute(
        model_builder=_model_builder,
        model_fitter=_model_fitter,
        importance_getter=_importance_getter,
        permutation_importance_calculator=permutation_importance_calculator,
        X_builder=_x_builder,
        y_builder=_y_builder,
        num_actual_runs=num_actual_runs,
        num_random_runs=num_random_runs,
    )

generic_compute

generic_compute(
    model_builder: ModelBuilderType,
    model_fitter: ModelFitterType,
    importance_getter: ModelImportanceGetter,
    permutation_importance_calculator: Union[
        PermutationImportanceCalculatorType,
        List[PermutationImportanceCalculatorType],
    ],
    X_builder: XBuilderType,
    y_builder: YBuilderType,
    num_actual_runs: PositiveInt = 2,
    num_random_runs: PositiveInt = 10,
) -> Union[pd.DataFrame, List[pd.DataFrame]]

The generic compute function allows customization of the computation. It is used by the compute function.

Parameters:

Name	Type	Description	Default
`model_builder`	`ModelBuilderType`	A function that returns a model.	required
`model_fitter`	`ModelFitterType`	A function that fits a model.	required
`importance_getter`	`ModelImportanceGetter`	A function that computes the feature importances of a fitted model.	required
`permutation_importance_calculator`	`Union[PermutationImportanceCalculatorType, List[PermutationImportanceCalculatorType]]`	A function, or list of functions, that computes the final permutation importance.	required
`X_builder`	`XBuilderType`	A function that returns the X data.	required
`y_builder`	`YBuilderType`	A function that returns the y data.	required
`num_actual_runs`	`PositiveInt`	Number of runs for which the builders are called with `is_random_run=False`.	`2`
`num_random_runs`	`PositiveInt`	Number of runs for which the builders are called with `is_random_run=True`.	`10`

Returns:

Type	Description
`Union[DataFrame, List[DataFrame]]`	The returned DataFrame(s), each with the columns ["feature", "importance"].
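
The flow that `generic_compute` orchestrates (a set of actual runs, a set of random runs, then one final calculation) can be sketched without the library. This is a simplified stand-in, not the library's implementation: `toy_importance` and `one_run` are hypothetical names, absolute correlation replaces fitting a real model, and the final step is a plain mean difference chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: one informative column and one pure-noise column.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
X = pd.DataFrame({"signal": signal, "noise": rng.normal(size=200)})
y = signal + 0.1 * rng.normal(size=200)


def toy_importance(X_run: pd.DataFrame, y_run: np.ndarray) -> pd.DataFrame:
    """Stand-in for fit + importance extraction: absolute correlation with the target."""
    scores = [abs(np.corrcoef(X_run[col], y_run)[0, 1]) for col in X_run.columns]
    return pd.DataFrame({"feature": list(X_run.columns), "importance": scores})


def one_run(is_random_run: bool, run_idx: int) -> pd.DataFrame:
    """Shuffle the target only on random runs, seeded by the run index."""
    y_run = np.random.default_rng(run_idx).permutation(y) if is_random_run else y
    return toy_importance(X, y_run)


actual_dfs = [one_run(False, i) for i in range(2)]   # num_actual_runs=2
random_dfs = [one_run(True, i) for i in range(10)]   # num_random_runs=10

# Illustrative final step: mean actual importance minus mean random importance.
actual_mean = pd.concat(actual_dfs).groupby("feature")["importance"].mean()
random_mean = pd.concat(random_dfs).groupby("feature")["importance"].mean()
result = (actual_mean - random_mean).rename("importance").reset_index()
print(result.sort_values("importance", ascending=False))
```

The informative column keeps a large score after the random baseline is subtracted, while the noise column ends up close to zero.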

Source code in target_permutation_importances/functional.py

@beartype
def generic_compute(
    model_builder: ModelBuilderType,
    model_fitter: ModelFitterType,
    importance_getter: ModelImportanceGetter,
    permutation_importance_calculator: Union[
        PermutationImportanceCalculatorType, List[PermutationImportanceCalculatorType]
    ],
    X_builder: XBuilderType,
    y_builder: YBuilderType,
    num_actual_runs: PositiveInt = 2,
    num_random_runs: PositiveInt = 10,
) -> Union[pd.DataFrame, List[pd.DataFrame]]:
    """
    The generic compute function allows customization of the computation. It is used by the `compute` function.

    Args:
        model_builder (ModelBuilderType): A function that returns a model.
        model_fitter (ModelFitterType): A function that fits a model.
        importance_getter (ModelImportanceGetter): A function that computes the feature importances of a fitted model.
        permutation_importance_calculator (Union[ PermutationImportanceCalculatorType, List[PermutationImportanceCalculatorType] ]):
            A function, or list of functions, that computes the final permutation importance.
        X_builder (XBuilderType): A function that returns the X data.
        y_builder (YBuilderType): A function that returns the y data.
        num_actual_runs (PositiveInt, optional): Number of actual runs. Defaults to 2.
        num_random_runs (PositiveInt, optional): Number of random runs. Defaults to 10.

    Returns:
        The return DataFrame(s) contain columns ["feature", "importance"]
    """
    run_params = {
        "model_builder": model_builder,
        "model_fitter": model_fitter,
        "importance_getter": importance_getter,
        "X_builder": X_builder,
        "y_builder": y_builder,
    }
    partial_compute_one_run = partial(_compute_one_run, **run_params)
    # Run the base runs
    print(f"Running {num_actual_runs} actual runs and {num_random_runs} random runs")
    actual_importance_dfs = []
    for run_idx in tqdm(range(num_actual_runs)):
        actual_importance_dfs.append(
            partial_compute_one_run(
                is_random_run=False,
                run_idx=run_idx,
            )
        )

    # Run the random runs
    random_importance_dfs = []
    for run_idx in tqdm(range(num_random_runs)):
        random_importance_dfs.append(
            partial_compute_one_run(
                is_random_run=True,
                run_idx=run_idx,
            )
        )

    # Calculate the permutation importance
    if isinstance(permutation_importance_calculator, list):
        return [
            calc(actual_importance_dfs, random_importance_dfs)
            for calc in permutation_importance_calculator
        ]
    return permutation_importance_calculator(
        actual_importance_dfs, random_importance_dfs
    )

Type Definitions

XType `module-attribute`

XType = Union[np.ndarray, pd.DataFrame]

YType `module-attribute`

YType = Union[np.ndarray, pd.Series]

XBuilderType

A function/callable that returns the X data. It is called once per run (actual and random).

Parameters:

Name	Type	Description	Default
`is_random_run`	`bool`	Indicate if this is a random run	required
`run_idx`	`int`	The run index	required

Returns: return (XType): The X data
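
A minimal sketch of a callable with this shape, modelled on the column-shuffling behaviour visible in the `compute` source listing. The DataFrame `X` and the name `x_builder` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})


def x_builder(is_random_run: bool, run_idx: int) -> pd.DataFrame:
    """Return X with its columns shuffled, seeded by the run index.

    is_random_run is accepted to match the expected signature but is not
    used here: the features are the same for actual and random runs.
    """
    rng = np.random.default_rng(seed=run_idx)
    shuffled_columns = rng.permutation(X.columns)
    return X[shuffled_columns]


print(list(x_builder(is_random_run=False, run_idx=1).columns))
```

Seeding with `run_idx` makes the column order reproducible for a given run while still varying it between runs.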

YBuilderType

A function/callable that returns the y data. It is called once per run (actual and random).

Parameters:

Name	Type	Description	Default
`is_random_run`	`bool`	Indicate if this is a random run	required
`run_idx`	`int`	The run index	required

Returns: return (YType): The y data
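
A minimal sketch of a callable with this shape, following the pattern in the `compute` source listing: the target is returned untouched on actual runs and shuffled on random runs. The array `y` and the name `y_builder` are illustrative assumptions.

```python
import numpy as np

y = np.array([0, 1, 0, 1, 1, 0])


def y_builder(is_random_run: bool, run_idx: int) -> np.ndarray:
    """Return the true target on actual runs and a seeded shuffle on random runs."""
    if is_random_run:
        rng = np.random.default_rng(seed=run_idx)
        return rng.permutation(y)
    return y


actual_y = y_builder(is_random_run=False, run_idx=0)
shuffled_a = y_builder(is_random_run=True, run_idx=3)
shuffled_b = y_builder(is_random_run=True, run_idx=3)
print(actual_y, shuffled_a)
```

Because the seed is the run index, calling the builder twice with the same `run_idx` yields the same permutation.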

ModelBuilderType

A function/callable that returns a newly created model. It is called once per run (actual and random).

Parameters:

Name	Type	Description	Default
`is_random_run`	`bool`	Indicate if this is a random run	required
`run_idx`	`int`	The run index	required

Returns: return (Any): The newly created model

ModelFitterType

A function/callable that fits a model. It is called once per run (actual and random).

Parameters:

Name	Type	Description	Default
`model`	`Any`	The model to fit	required
`X`	`XType`	The X data	required
`y`	`YType`	The y data	required

Returns: return (Any): The fitted model

ModelImportanceGetter

A function/callable that computes the feature importances of a fitted model. It is called once per run (actual and random).

Parameters:

Name	Type	Description	Default
`model`	`Any`	The fitted model	required
`X`	`XType`	The X data	required
`y`	`YType`	The y data	required

Returns: return (pd.DataFrame): The returned DataFrame with columns ["feature", "importance"]
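
A minimal sketch of a getter with this shape for models that expose a `coef_` attribute. It mirrors the absolute-value and per-output averaging steps seen in the `compute` source listing. The name `coef_importance_getter` and the `SimpleNamespace` stand-in for a fitted model are assumptions made so the example runs on its own.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def coef_importance_getter(model, X, y) -> pd.DataFrame:
    """Use absolute coefficients as importances, averaged over outputs if 2-D."""
    importances = np.abs(np.asarray(model.coef_))
    if importances.ndim > 1:
        importances = importances.mean(axis=0)
    if isinstance(X, pd.DataFrame):
        features = list(X.columns)
    else:
        # NumPy input has no column names, so fall back to positional indices.
        features = list(range(X.shape[1]))
    return pd.DataFrame({"feature": features, "importance": importances})


# Stand-in for a fitted two-output linear model (not a real estimator).
fake_model = SimpleNamespace(coef_=np.array([[1.0, -2.0], [3.0, 0.0]]))
X = pd.DataFrame({"f0": [0.0, 1.0], "f1": [1.0, 0.0]})
importance_df = coef_importance_getter(fake_model, X, y=None)
print(importance_df)
```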

PermutationImportanceCalculatorType

A function/callable that takes a list of actual importance DataFrames and a list of random importance DataFrames and returns a single DataFrame.

Parameters:

Name	Type	Description	Default
`actual_importance_dfs`	`List[DataFrame]`	list of actual importance DataFrames with columns ["feature", "importance"]	required
`random_importance_dfs`	`List[DataFrame]`	list of random importance DataFrames with columns ["feature", "importance"]	required

Returns:

Name	Type	Description
`return`	`DataFrame`	The returned DataFrame with columns ["feature", "importance"]
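
Any function that follows this contract can be supplied as a calculator. The sketch below is a hypothetical example, not one of the library's built-in calculators: for each feature it reports the fraction of random runs whose importance falls below the mean actual importance. The name `exceedance_rate` and the sample data are made up for illustration.

```python
from typing import List

import pandas as pd


def exceedance_rate(
    actual_importance_dfs: List[pd.DataFrame],
    random_importance_dfs: List[pd.DataFrame],
) -> pd.DataFrame:
    """Hypothetical calculator: share of random runs beaten by the mean actual importance."""
    actual_mean = (
        pd.concat(actual_importance_dfs, ignore_index=True)
        .groupby("feature")["importance"]
        .mean()
    )
    random_all = pd.concat(random_importance_dfs, ignore_index=True)
    # Compare every random importance with its feature's mean actual importance.
    beaten = random_all["importance"] < random_all["feature"].map(actual_mean)
    rate = beaten.groupby(random_all["feature"]).mean()
    return rate.rename("importance").reset_index()


actual = [
    pd.DataFrame({"feature": ["a", "b"], "importance": [0.5, 0.25]}),
    pd.DataFrame({"feature": ["a", "b"], "importance": [1.0, 0.25]}),
]
random_runs = [
    pd.DataFrame({"feature": ["a", "b"], "importance": [0.25, 0.25]}),
    pd.DataFrame({"feature": ["a", "b"], "importance": [0.5, 0.5]}),
    pd.DataFrame({"feature": ["a", "b"], "importance": [1.0, 0.125]}),
    pd.DataFrame({"feature": ["a", "b"], "importance": [0.5, 0.125]}),
]
result = exceedance_rate(actual, random_runs)
print(result)
```

The output keeps the required ["feature", "importance"] columns, so it can be consumed the same way as the result of a built-in calculator.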
