
Longitudinal Pipeline

LongitudinalPipeline

Bases: Pipeline

Machine Learning-based Longitudinal Pipeline for handling and processing longitudinal techniques (preprocessors, classifier, etc.).

The LongitudinalPipeline extends scikit-learn's Pipeline to provide specialised methods and attributes for working with longitudinal data. It ensures that the structure of longitudinal features is updated and maintained throughout the pipeline's transformations, making it ideal for longitudinal classification tasks.

Extension of scikit-learn's Pipeline

While maintaining the interface of scikit-learn's Pipeline, this class includes additional validations and methods to ensure the correct processing of longitudinal data. It integrates seamlessly with scikit-learn's ecosystem, allowing for the use of standard transformers and estimators as well.

You are not limited to Sklong components: any scikit-learn compatible transformer or estimator can be used.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `steps` | `List[Tuple[str, Any]]` | List of (name, transform) tuples that are chained in the order they are provided. The last object should be an estimator. | required |
| `features_group` | `List[List[int]]` | A temporal matrix where each sublist contains the indices of one longitudinal attribute's waves. | required |
| `non_longitudinal_features` | `List[Union[int, str]]` | List of indices or names of non-longitudinal features. | `None` |
| `update_feature_groups_callback` | `Union[Callable, str]` | Callback function to update feature groups during transformations. Can be a string for built-in callbacks or a custom function. If left as `None`, the constructor sets it to `"default"`. | `None` |
| `feature_list_names` | `List[str]` | List of feature names corresponding to the dataset columns. | `None` |
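
To make the shape of `features_group` concrete, here is a minimal sketch that derives it from column names. It assumes a hypothetical naming convention in which every wave of an attribute is named `<attribute>_w<wave>`; the helper `build_features_group` is illustrative only and is not part of the library.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def build_features_group(columns: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Group column indices by attribute, ordered by wave number.

    Assumes (hypothetically) that longitudinal columns are named
    "<attribute>_w<wave>"; every other column is treated as non-longitudinal.
    """
    waves: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    non_longitudinal: List[int] = []
    for idx, col in enumerate(columns):
        match = re.fullmatch(r"(.+)_w(\d+)", col)
        if match:
            # Store (wave number, column index) so waves can be sorted later.
            waves[match.group(1)].append((int(match.group(2)), idx))
        else:
            non_longitudinal.append(idx)
    # One sublist per attribute, with indices sorted from earliest to latest wave.
    features_group = [[idx for _, idx in sorted(pairs)] for pairs in waves.values()]
    return features_group, non_longitudinal


columns = ["age", "smoke_w1", "smoke_w2", "bmi_w1", "bmi_w2", "sex"]
features_group, non_longitudinal = build_features_group(columns)
print(features_group)    # [[1, 2], [3, 4]]
print(non_longitudinal)  # [0, 5]
```

Each sublist corresponds to one attribute (here `smoke` and `bmi`), and the two lists together are the kind of values passed as `features_group` and `non_longitudinal_features`.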

Attributes:

| Name | Type | Description |
|---|---|---|
| `_longitudinal_data` | `ndarray` | The longitudinal data being processed. |
| `selected_feature_indices_` | `ndarray` | Indices of the selected features after transformations. |
| `final_estimator` | `Any` | The final estimator in the pipeline. |

What is the custom callback function all about?

The update_feature_groups_callback parameter allows users to customise how feature groups and non-longitudinal features are updated after each transformation in the pipeline. This is crucial for maintaining the temporal structure of longitudinal data as it passes through various preprocessing steps.

Not sure what to pass? Use the default, which is literally "default". The pipeline then handles the updates for you, but you can also define your own logic to handle specific cases or transformations that may alter the structure of the data. This flexibility is particularly useful when dealing with complex datasets or when using custom transformers that may not conform to the standard behaviour expected by the pipeline.

In a nutshell:

  • Dynamic Updates: The callback ensures that features_group and non_longitudinal_features are updated after each transformation, preserving the temporal relationships in the data.
  • Flexibility: It provides a mechanism for users to inject custom logic tailored to their specific dataset or preprocessing needs.
Custom Implementation
  • Users can define their own callback function to handle specialised requirements. The function must follow this signature:

    def callback(
        step_idx: int,
        dummy_longitudinal_dataset: LongitudinalDataset,
        y: Union[pd.Series, np.ndarray],
        name: str,
        transformer: TransformerMixin
    ) -> Tuple[np.ndarray, List[List[int]], List[Union[int, str]], List[str]]:
        # Custom logic here
        pass
    

  • Parameters:

    • step_idx: The index of the current step in the pipeline.
    • dummy_longitudinal_dataset: A LongitudinalDataset instance representing the current state of the data.
    • y: The target variable (if provided).
    • name: The name of the current transformer.
    • transformer: The transformer being applied at this step.
  • Returns: A tuple containing:

    • Updated longitudinal data (np.ndarray).
    • Updated feature groups (List[List[int]]).
    • Updated non-longitudinal features (List[Union[int, str]]).
    • Updated feature names (List[str]).
Usage Example
  • You can pass a custom function or even a lambda for quick adjustments:

    def custom_callback(step_idx, dataset, y, name, transformer):
        updated_data = transformer.transform(dataset.data)
        updated_groups = dataset.feature_groups()  # Custom logic can modify this
        updated_non_long = dataset.non_longitudinal_features()
        updated_names = dataset.data.columns.tolist()
        return updated_data, updated_groups, updated_non_long, updated_names

    pipeline = LongitudinalPipeline(
        steps=[('transformer', SomeTransformer()), ('classifier', SomeClassifier())],
        features_group=[[0, 1, 2], [3, 4, 5]],
        update_feature_groups_callback=custom_callback
    )

  • Or use a lambda for simplicity:

    pipeline = LongitudinalPipeline(
        steps=[...],
        features_group=[...],
        update_feature_groups_callback=lambda step_idx, dataset, y, name, transformer: (
            transformer.transform(dataset.data),
            dataset.feature_groups(),
            dataset.non_longitudinal_features(),
            dataset.data.columns.tolist()
        )
    )
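
When a transformer removes columns, the indices stored in `features_group` no longer point at the right columns, which is the situation a custom callback is meant to handle. The helper below is a minimal sketch of the index remapping such a callback might perform. The names `remap_after_selection` and `kept_indices` are hypothetical, non-longitudinal features are assumed to be given as integer indices, and how the kept columns are obtained from a particular transformer is left open.

```python
from typing import List, Tuple


def remap_after_selection(
    features_group: List[List[int]],
    non_longitudinal: List[int],
    kept_indices: List[int],
) -> Tuple[List[List[int]], List[int]]:
    """Re-express feature groups in terms of the columns that survived selection.

    kept_indices lists the original column indices that remain, in their new order.
    """
    # Map each surviving original index to its position in the reduced matrix.
    new_position = {old: new for new, old in enumerate(kept_indices)}
    remapped_groups = []
    for group in features_group:
        surviving = [new_position[i] for i in group if i in new_position]
        if surviving:  # drop groups whose waves were all removed
            remapped_groups.append(surviving)
    remapped_non_long = [new_position[i] for i in non_longitudinal if i in new_position]
    return remapped_groups, remapped_non_long


groups, non_long = remap_after_selection(
    features_group=[[0, 1, 2], [3, 4, 5]],
    non_longitudinal=[6],
    kept_indices=[0, 2, 3, 6],
)
print(groups)    # [[0, 1], [2]]
print(non_long)  # [3]
```

Inside a callback, the two returned lists would take the place of `updated_groups` and `updated_non_long` in the tuple shown above.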

Examples:

Below are examples demonstrating the usage of the LongitudinalPipeline class.

Basic Usage

from scikit_longitudinal.pipeline import LongitudinalPipeline
from scikit_longitudinal.data_preparation import LongitudinalDataset
from scikit_longitudinal.data_preparation import MerWavTimePlus
from scikit_longitudinal.estimators.trees import LexicoDecisionTreeClassifier

# Load dataset
dataset = LongitudinalDataset('./stroke_longitudinal.csv')
dataset.load_data()
dataset.load_target(target_column="stroke_w2")
dataset.setup_features_group("elsa")
dataset.load_train_test_split(test_size=0.2, random_state=42)

# Define pipeline steps with LexicoDecisionTreeClassifier.
# A pipeline needs at least two steps, and the first must be a data transformation step.
# Because the classifier here is longitudinal, MerWavTimePlus is used to retain the temporal dependency.
steps = [
    ('MerWavTime Plus', MerWavTimePlus()),
    # Feel free to add more steps, such as a feature selection step.
    ('classifier', LexicoDecisionTreeClassifier(features_group=dataset.feature_groups()))
]

# Note: for a non-longitudinal classifier such as RandomForestClassifier
# (rather than LexicoRandomForestClassifier), you can use a scikit-learn pipeline directly:
# from sklearn.ensemble import RandomForestClassifier
# steps = [
#     ('AggrFunc', AggrFunc()),
#     ('classifier', RandomForestClassifier())
# ]

# Initialize pipeline
pipeline = LongitudinalPipeline(
    steps=steps,
    features_group=dataset.feature_groups(),
    non_longitudinal_features=dataset.non_longitudinal_features(),
    feature_list_names=dataset.data.columns.tolist(),
    update_feature_groups_callback="default"
)

# Fit and predict
pipeline.fit(dataset.X_train, dataset.y_train)
y_pred = pipeline.predict(dataset.X_test)
print(f"Predictions: {y_pred}")

Advanced: custom callback

from scikit_longitudinal.pipeline import LongitudinalPipeline

# Define a custom callback function
def custom_callback(step_idx, dataset, y, name, transformer):
    # Custom logic to update feature groups
    updated_data = transformer.transform(dataset.data)
    updated_groups = dataset.feature_groups()
    updated_non_long = dataset.non_longitudinal_features()
    updated_names = dataset.data.columns.tolist()
    return updated_data, updated_groups, updated_non_long, updated_names

# Initialize pipeline with custom callback
pipeline = LongitudinalPipeline(
    steps=[...],
    features_group=[...],
    update_feature_groups_callback=custom_callback
)
Source code in scikit_longitudinal/pipeline.py
class LongitudinalPipeline(Pipeline):
    """Machine Learning-based Longitudinal Pipeline for handling and processing longitudinal techniques (preprocessors, classifier, etc.).

    The `LongitudinalPipeline` extends scikit-learn's `Pipeline` to provide specialised methods and attributes for working
    with longitudinal data. It ensures that the structure of longitudinal features is updated and maintained throughout
    the pipeline's transformations, making it ideal for longitudinal classification tasks.

    !!! note "Extension of scikit-learn's Pipeline"
        While maintaining the interface of scikit-learn's `Pipeline`, this class includes additional validations and
        methods to ensure the correct processing of longitudinal data. It integrates seamlessly with scikit-learn's
        ecosystem, allowing for the use of standard transformers and estimators as well.

        You are not limited to `Sklong` components: any scikit-learn compatible transformer or estimator can be used.

    Args:
        steps (List[Tuple[str, Any]]): List of (name, transform) tuples that are chained in the order they are provided.
            The last object should be an estimator.
        features_group (List[List[int]]): A temporal matrix where each sublist contains indices of a longitudinal
            attribute's waves.
        non_longitudinal_features (List[Union[int, str]], optional): List of indices or names of non-longitudinal
            features. Defaults to None.
        update_feature_groups_callback (Union[Callable, str], optional): Callback function to update feature groups
            during transformations. Can be a string for built-in callbacks or a custom function. Defaults to None.
        feature_list_names (List[str], optional): List of feature names corresponding to the dataset columns. Defaults
            to None.

    Attributes:
        _longitudinal_data (np.ndarray): The longitudinal data being processed.
        selected_feature_indices_ (np.ndarray): Indices of the selected features after transformations.
        final_estimator (Any): The final estimator in the pipeline.


    ??? question "What is the custom callback function all about?"
        The `update_feature_groups_callback` parameter allows users to customise how feature groups and non-longitudinal
        features are updated after each transformation in the pipeline. This is crucial for maintaining the temporal
        structure of longitudinal data as it passes through various preprocessing steps.

        Not sure what to pass? Use the default, which is literally `"default"`. The pipeline then handles the updates
        for you, but you can also define your own logic to handle specific cases or transformations that may alter the
        structure of the data. This flexibility is particularly useful when dealing with complex datasets or when using
        custom transformers that may not conform to the standard behaviour expected by the pipeline.

        In a nutshell:

        - [x] **Dynamic Updates**: The callback ensures that `features_group` and `non_longitudinal_features` are updated after
          each transformation, preserving the temporal relationships in the data.
        - [x] **Flexibility**: It provides a mechanism for users to inject custom logic tailored to their specific dataset or
          preprocessing needs.

        #### Custom Implementation
        - Users can define their own callback function to handle specialised requirements. The function must follow this signature:
            ```python
            def callback(
                step_idx: int,
                dummy_longitudinal_dataset: LongitudinalDataset,
                y: Union[pd.Series, np.ndarray],
                name: str,
                transformer: TransformerMixin
            ) -> Tuple[np.ndarray, List[List[int]], List[Union[int, str]], List[str]]:
                # Custom logic here
                pass
            ```

        - **Parameters**:

              - `step_idx`: The index of the current step in the pipeline.
              - `dummy_longitudinal_dataset`: A `LongitudinalDataset` instance representing the current state of the data.
              - `y`: The target variable (if provided).
              - `name`: The name of the current transformer.
              - `transformer`: The transformer being applied at this step.

        - **Returns**: A tuple containing:

              - Updated longitudinal data (`np.ndarray`).
              - Updated feature groups (`List[List[int]]`).
              - Updated non-longitudinal features (`List[Union[int, str]]`).
              - Updated feature names (`List[str]`).

        #### Usage Example

        - You can pass a custom function or even a lambda for quick adjustments:

          ```python
          def custom_callback(step_idx, dataset, y, name, transformer):
              updated_data = transformer.transform(dataset.data)
              updated_groups = dataset.feature_groups()  # Custom logic can modify this
              updated_non_long = dataset.non_longitudinal_features()
              updated_names = dataset.data.columns.tolist()
              return updated_data, updated_groups, updated_non_long, updated_names

          pipeline = LongitudinalPipeline(
              steps=[('transformer', SomeTransformer()), ('classifier', SomeClassifier())],
              features_group=[[0, 1, 2], [3, 4, 5]],
              update_feature_groups_callback=custom_callback
          )
          ```

        - Or use a lambda for simplicity:

          ```python
          pipeline = LongitudinalPipeline(
              steps=[...],
              features_group=[...],
              update_feature_groups_callback=lambda step_idx, dataset, y, name, transformer: (
                  transformer.transform(dataset.data),
                  dataset.feature_groups(),
                  dataset.non_longitudinal_features(),
                  dataset.data.columns.tolist()
              )
          )
          ```


    Examples:
        Below are examples demonstrating the usage of the `LongitudinalPipeline` class.

        !!! example "Basic Usage"
            ```python
            from scikit_longitudinal.pipeline import LongitudinalPipeline
            from scikit_longitudinal.data_preparation import LongitudinalDataset
            from scikit_longitudinal.data_preparation import MerWavTimePlus
            from scikit_longitudinal.estimators.trees import LexicoDecisionTreeClassifier

            # Load dataset
            dataset = LongitudinalDataset('./stroke_longitudinal.csv')
            dataset.load_data()
            dataset.load_target(target_column="stroke_w2")
            dataset.setup_features_group("elsa")
            dataset.load_train_test_split(test_size=0.2, random_state=42)

            # Define pipeline steps with LexicoDecisionTreeClassifier.
            # A pipeline needs at least two steps, and the first must be a data transformation step.
            # Because the classifier here is longitudinal, MerWavTimePlus is used to retain the temporal dependency.
            steps = [
                ('MerWavTime Plus', MerWavTimePlus()),
                # Feel free to add more steps, such as a feature selection step.
                ('classifier', LexicoDecisionTreeClassifier(features_group=dataset.feature_groups()))
            ]

            # Note: for a non-longitudinal classifier such as RandomForestClassifier
            # (rather than LexicoRandomForestClassifier), you can use a scikit-learn pipeline directly:
            # from sklearn.ensemble import RandomForestClassifier
            # steps = [
            #     ('AggrFunc', AggrFunc()),
            #     ('classifier', RandomForestClassifier())
            # ]

            # Initialize pipeline
            pipeline = LongitudinalPipeline(
                steps=steps,
                features_group=dataset.feature_groups(),
                non_longitudinal_features=dataset.non_longitudinal_features(),
                feature_list_names=dataset.data.columns.tolist(),
                update_feature_groups_callback="default"
            )

            # Fit and predict
            pipeline.fit(dataset.X_train, dataset.y_train)
            y_pred = pipeline.predict(dataset.X_test)
            print(f"Predictions: {y_pred}")
            ```

        !!! example "Advanced: custom callback"
            ```python
            from scikit_longitudinal.pipeline import LongitudinalPipeline

            # Define a custom callback function
            def custom_callback(step_idx, dataset, y, name, transformer):
                # Custom logic to update feature groups
                updated_data = transformer.transform(dataset.data)
                updated_groups = dataset.feature_groups()
                updated_non_long = dataset.non_longitudinal_features()
                updated_names = dataset.data.columns.tolist()
                return updated_data, updated_groups, updated_non_long, updated_names

            # Initialize pipeline with custom callback
            pipeline = LongitudinalPipeline(
                steps=[...],
                features_group=[...],
                update_feature_groups_callback=custom_callback
            )
            ```
    """

    def __init__(
        self,
        steps: List[Tuple[str, Any]],
        features_group: List[List[int]],
        non_longitudinal_features: List[Union[int, str]] = None,
        update_feature_groups_callback: Union[Callable, str] = None,
        feature_list_names: List[str] = None,
    ) -> None:
        super().__init__(steps=steps)
        self._longitudinal_data: np.ndarray = np.array([])
        self.features_group: List[List[int]] = features_group
        self.non_longitudinal_features: List[Union[int, str]] = (
            non_longitudinal_features or []
        )
        self.feature_list_names: List[str] = feature_list_names
        self.selected_feature_indices_: np.ndarray = np.array([])
        self.final_estimator = self.steps[-1][1]

        if update_feature_groups_callback is not None:
            self.update_feature_groups_callback = (
                validate_update_feature_groups_callback(update_feature_groups_callback)
            )
        else:
            self.update_feature_groups_callback = "default"

    @handle_errors
    @validate_input
    def fit(
        self,
        X: np.ndarray,
        y: Optional[Union[pd.Series, np.ndarray]] = None,
        **fit_params: Dict[str, Any],
    ) -> "LongitudinalPipeline":
        """Fit the transformers and the final estimator in the pipeline.

        This method iterates through each transformer in the pipeline, configuring and fitting them while updating the
        longitudinal data and feature groups. The final estimator is then fitted using the transformed data.

        Args:
            X (np.ndarray): Input data.
            y (Optional[Union[pd.Series, np.ndarray]]): Target variable.
            **fit_params (Dict[str, Any]): Additional fitting parameters.

        Returns:
            LongitudinalPipeline: The fitted pipeline.
        """
        self._longitudinal_data = X.copy()
        self.selected_feature_indices_ = np.arange(X.shape[1])

        if y is not None:
            y = y.copy()

        filtered_steps = [
            (name, transformer)
            for name, transformer in self.steps[:-1]
            if transformer is not None
        ]

        def is_sep_wav(transformer):
            return isinstance(transformer, SepWav)

        if any(is_sep_wav(transformer) for _, transformer in filtered_steps):
            filtered_steps = [
                (name, transformer)
                for name, transformer in filtered_steps
                if not is_sep_wav(transformer)
            ]
            sep_wav_transformers = [
                (name, transformer)
                for name, transformer in self.steps[:-1]
                if is_sep_wav(transformer)
            ]
            filtered_steps.extend(sep_wav_transformers)

        for step_idx, (name, transformer) in enumerate(filtered_steps):
            (
                transformer,
                self._longitudinal_data,
                y,
                self.selected_feature_indices_,
                self.feature_list_names,
            ) = configure_and_fit_transformer(
                transformer,
                name,
                self._longitudinal_data,
                y,
                fit_params,
                self.selected_feature_indices_,
                self.feature_list_names,
                self.features_group,
                self.non_longitudinal_features,
                self.update_feature_groups_callback,
            )
            (
                self._longitudinal_data,
                self.features_group,
                self.non_longitudinal_features,
                self.feature_list_names,
            ) = self._update_longitudinal_data_callback(name, step_idx, transformer, y)

        self.steps = filtered_steps + [self.steps[-1]]
        if self._final_estimator is not None:
            self._final_estimator = handle_final_estimator(
                self._final_estimator,
                self.steps,
                self.features_group,
                self.non_longitudinal_features,
                self.feature_list_names,
                self._longitudinal_data,
                y,
                fit_params,
            )

        return self

    def _update_longitudinal_data_callback(
        self,
        name: str,
        step_idx: int,
        transformer: TransformerMixin,
        y: Optional[Union[pd.Series, np.ndarray]],
    ) -> Tuple[np.ndarray, List[List[int]], List[Union[int, str]], List[str]]:
        df = pd.DataFrame(self._longitudinal_data, columns=self.feature_list_names)

        dummy_longitudinal_dataset = LongitudinalDataset(file_path=None, data_frame=df)
        dummy_longitudinal_dataset._feature_groups = (
            self.features_group
        )  # pylint: disable=W0212
        dummy_longitudinal_dataset._non_longitudinal_features = (
            self.non_longitudinal_features
        )

        (
            updated_longitudinal_data,
            updated_features_group,
            updated_non_longitudinal_features,
            updated_feature_list_names,
        ) = self.update_feature_groups_callback(
            step_idx, dummy_longitudinal_dataset, y, name, transformer
        )

        return (
            updated_longitudinal_data,
            updated_features_group,
            updated_non_longitudinal_features,
            updated_feature_list_names,
        )

    @validate_input
    def predict(self, X: np.ndarray, **predict_params: Dict[str, Any]) -> np.ndarray:
        """Predict target values using the final estimator.

        Applies the selected feature indices to the input data and uses the final estimator to make predictions.

        Args:
            X (np.ndarray): Input data.
            **predict_params (Dict[str, Any]): Additional prediction parameters.

        Returns:
            np.ndarray: Predicted values.

        Raises:
            NotImplementedError: If the final estimator does not implement `predict`.
        """
        X = X[:, self.selected_feature_indices_]

        if hasattr(self._final_estimator, "predict"):
            return self._final_estimator.predict(X, **predict_params)
        raise NotImplementedError(
            f"predict is not implemented for this estimator: {type(self._final_estimator)}"
        )

    @available_if(_final_estimator_has("predict_proba"))
    @validate_input
    def predict_proba(
        self, X: np.ndarray, **predict_params: Dict[str, Any]
    ) -> np.ndarray:
        """Predict class probabilities using the final estimator.

        Applies the selected feature indices to the input data and uses the final estimator to predict probabilities.

        Args:
            X (np.ndarray): Input data.
            **predict_params (Dict[str, Any]): Additional prediction parameters.

        Returns:
            np.ndarray: Predicted probabilities.

        Raises:
            NotImplementedError: If the final estimator does not implement `predict_proba`.
        """
        X = X[:, self.selected_feature_indices_]

        if hasattr(self._final_estimator, "predict_proba"):
            return self._final_estimator.predict_proba(X, **predict_params)
        raise NotImplementedError(
            f"predict_proba is not implemented for this estimator: {type(self._final_estimator)}"
        )

    @available_if(_final_estimator_has("transform"))
    @validate_input
    def transform(
        self, X: np.ndarray, **transform_params: Dict[str, Any]
    ) -> np.ndarray:
        """Transform the input data using the final estimator.

        Applies the selected feature indices and transforms the data using the final estimator's `transform` method.

        Args:
            X (np.ndarray): Input data.
            **transform_params (Dict[str, Any]): Additional transformation parameters.

        Returns:
            np.ndarray: Transformed data.
        """
        if (
            self.selected_feature_indices_ is None
            or len(self.selected_feature_indices_) == 0
        ):
            print("No feature selection was performed. Returning the original data.")
            return X
        X = X[:, self.selected_feature_indices_]
        return self._final_estimator.transform(X, **transform_params)

    @available_if(_final_estimator_has("decision_function"))
    @validate_input
    def decision_function(self, X, **params):
        """Compute the decision function of the final estimator.

        Applies the selected feature indices and computes the decision function using the final estimator's `decision_function` method.

        Args:
            X (np.ndarray): Input data.
            **params (Dict[str, Any]): Additional parameters for the decision function.

        Returns:
            np.ndarray: Decision function values.
        """
        X = X[:, self.selected_feature_indices_]
        return self._final_estimator.decision_function(X, **params)

    @available_if(_final_estimator_has("score"))
    @validate_input
    def score(
        self,
        X: np.ndarray,
        y: Union[pd.Series, np.ndarray],
        **score_params: Dict[str, Any],
    ) -> float:
        """Compute the score of the final estimator.

        Applies the selected feature indices and computes the score using the final estimator's `score` method.

        Args:
            X (np.ndarray): Input data.
            y (Union[pd.Series, np.ndarray]): True target values.
            **score_params (Dict[str, Any]): Additional scoring parameters.

        Returns:
            float: Computed score.
        """
        X = X[:, self.selected_feature_indices_]
        return self._final_estimator.score(X, y, **score_params)

    @property
    def _final_estimator(self):
        return self.final_estimator

    @_final_estimator.setter
    def _final_estimator(self, value):
        self.final_estimator = value

fit(X, y=None, **fit_params)

Fit the transformers and the final estimator in the pipeline.

This method iterates through each transformer in the pipeline, configuring and fitting them while updating the longitudinal data and feature groups. The final estimator is then fitted using the transformed data.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Input data. | required |
| `y` | `Optional[Union[Series, ndarray]]` | Target variable. | `None` |
| `**fit_params` | `Dict[str, Any]` | Additional fitting parameters. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LongitudinalPipeline` | `LongitudinalPipeline` | The fitted pipeline. |
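
One detail visible in the source of `fit` is that steps set to `None` are dropped and any `SepWav` transformer is moved to the end of the preprocessing steps before fitting. The sketch below reproduces only that ordering logic. It uses a stand-in `SepWav` class and a hypothetical `order_steps` helper so that it runs without the library; it is not the library's implementation.

```python
from typing import Any, List, Tuple


class SepWav:
    """Stand-in for the library's SepWav transformer (illustration only)."""


class OtherTransformer:
    """Stand-in for any other preprocessing step."""


def order_steps(steps: List[Tuple[str, Any]]) -> List[Tuple[str, Any]]:
    """Drop None preprocessors and move SepWav steps just before the final estimator."""
    *preprocessors, final_step = steps
    active = [(n, t) for n, t in preprocessors if t is not None]
    regular = [(n, t) for n, t in active if not isinstance(t, SepWav)]
    sep_wav = [(n, t) for n, t in active if isinstance(t, SepWav)]
    return regular + sep_wav + [final_step]


steps = [
    ("sep_wav", SepWav()),
    ("skipped", None),
    ("selector", OtherTransformer()),
    ("classifier", object()),
]
print([name for name, _ in order_steps(steps)])
# ['selector', 'sep_wav', 'classifier']
```

The practical consequence is that the order of steps after fitting may differ from the order passed to the constructor when a `SepWav` step is present.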

Source code in scikit_longitudinal/pipeline.py
@handle_errors
@validate_input
def fit(
    self,
    X: np.ndarray,
    y: Optional[Union[pd.Series, np.ndarray]] = None,
    **fit_params: Dict[str, Any],
) -> "LongitudinalPipeline":
    """Fit the transformers and the final estimator in the pipeline.

    This method iterates through each transformer in the pipeline, configuring and fitting them while updating the
    longitudinal data and feature groups. The final estimator is then fitted using the transformed data.

    Args:
        X (np.ndarray): Input data.
        y (Optional[Union[pd.Series, np.ndarray]]): Target variable.
        **fit_params (Dict[str, Any]): Additional fitting parameters.

    Returns:
        LongitudinalPipeline: The fitted pipeline.
    """
    self._longitudinal_data = X.copy()
    self.selected_feature_indices_ = np.arange(X.shape[1])

    if y is not None:
        y = y.copy()

    filtered_steps = [
        (name, transformer)
        for name, transformer in self.steps[:-1]
        if transformer is not None
    ]

    def is_sep_wav(transformer):
        return isinstance(transformer, SepWav)

    if any(is_sep_wav(transformer) for _, transformer in filtered_steps):
        filtered_steps = [
            (name, transformer)
            for name, transformer in filtered_steps
            if not is_sep_wav(transformer)
        ]
        sep_wav_transformers = [
            (name, transformer)
            for name, transformer in self.steps[:-1]
            if is_sep_wav(transformer)
        ]
        filtered_steps.extend(sep_wav_transformers)

    for step_idx, (name, transformer) in enumerate(filtered_steps):
        (
            transformer,
            self._longitudinal_data,
            y,
            self.selected_feature_indices_,
            self.feature_list_names,
        ) = configure_and_fit_transformer(
            transformer,
            name,
            self._longitudinal_data,
            y,
            fit_params,
            self.selected_feature_indices_,
            self.feature_list_names,
            self.features_group,
            self.non_longitudinal_features,
            self.update_feature_groups_callback,
        )
        (
            self._longitudinal_data,
            self.features_group,
            self.non_longitudinal_features,
            self.feature_list_names,
        ) = self._update_longitudinal_data_callback(name, step_idx, transformer, y)

    self.steps = filtered_steps + [self.steps[-1]]
    if self._final_estimator is not None:
        self._final_estimator = handle_final_estimator(
            self._final_estimator,
            self.steps,
            self.features_group,
            self.non_longitudinal_features,
            self.feature_list_names,
            self._longitudinal_data,
            y,
            fit_params,
        )

    return self
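The step reordering performed at the start of fit can be sketched in isolation. This is a minimal illustration, not the library's code: SepWavStandIn and OtherStep are hypothetical placeholder classes, and reorder_steps is an invented helper name that mirrors the filtering logic shown above (drop None steps, move SepWav steps after the other transformers, keep the final estimator last).

```python
from typing import Any, List, Tuple


class SepWavStandIn:
    """Hypothetical placeholder for the library's SepWav transformer."""


class OtherStep:
    """Hypothetical placeholder for any other transformer or estimator."""


def reorder_steps(steps: List[Tuple[str, Any]]) -> List[Tuple[str, Any]]:
    """Drop None steps and move SepWav-like steps after the other transformers."""
    *transformers, final = steps
    # Steps disabled with None are skipped entirely.
    active = [(name, t) for name, t in transformers if t is not None]
    regular = [(name, t) for name, t in active if not isinstance(t, SepWavStandIn)]
    sep_wav = [(name, t) for name, t in active if isinstance(t, SepWavStandIn)]
    # The final estimator always stays in the last position.
    return regular + sep_wav + [final]


steps = [
    ("sep_wav", SepWavStandIn()),
    ("skipped", None),
    ("scaler", OtherStep()),
    ("clf", OtherStep()),
]
ordered_names = [name for name, _ in reorder_steps(steps)]
print(ordered_names)  # ['scaler', 'sep_wav', 'clf']
```

Note that, as in the source above, the reordered list replaces the original steps, so a step set to None no longer appears after fitting.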

predict(X, **predict_params)

Predict target values using the final estimator.

Selects the columns listed in selected_feature_indices_ from the input data and passes them to the final estimator's predict method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | ndarray | Input data. | required |
| `**predict_params` | Dict[str, Any] | Additional prediction parameters. | {} |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Predicted values. |

Raises:

| Type | Description |
| --- | --- |
| NotImplementedError | If the final estimator does not implement predict. |

Source code in scikit_longitudinal/pipeline.py
@validate_input
def predict(self, X: np.ndarray, **predict_params: Dict[str, Any]) -> np.ndarray:
    """Predict target values using the final estimator.

    Applies the selected feature indices to the input data and uses the final estimator to make predictions.

    Args:
        X (np.ndarray): Input data.
        **predict_params (Dict[str, Any]): Additional prediction parameters.

    Returns:
        np.ndarray: Predicted values.

    Raises:
        NotImplementedError: If the final estimator does not implement `predict`.
    """
    X = X[:, self.selected_feature_indices_]

    if hasattr(self._final_estimator, "predict"):
        return self._final_estimator.predict(X, **predict_params)
    raise NotImplementedError(
        f"predict is not implemented for this estimator: {type(self._final_estimator)}"
    )
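The column selection followed by delegation can be reproduced outside the pipeline. The sketch below assumes a hypothetical final estimator (MeanThresholdModel) and a hypothetical free function (predict_with_selection) that mirrors the body of predict; neither is part of the library.

```python
import numpy as np


class MeanThresholdModel:
    """Hypothetical final estimator: predicts 1 when the row mean exceeds 0.5."""

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (X.mean(axis=1) > 0.5).astype(int)


def predict_with_selection(estimator, X: np.ndarray, selected_feature_indices: np.ndarray):
    """Keep only the selected columns, then delegate to the estimator's predict."""
    X = X[:, selected_feature_indices]
    if hasattr(estimator, "predict"):
        return estimator.predict(X)
    raise NotImplementedError(
        f"predict is not implemented for this estimator: {type(estimator)}"
    )


X_new = np.array([[0.9, 0.0, 0.8], [0.1, 1.0, 0.2]])
selected = np.array([0, 2])  # column 1 was dropped during fitting
preds = predict_with_selection(MeanThresholdModel(), X_new, selected)
print(preds)  # [1 0]
```

Because the middle column is ignored, the row means are 0.85 and 0.15, which is why the two rows fall on opposite sides of the threshold.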

predict_proba(X, **predict_params)

Predict class probabilities using the final estimator.

Selects the columns listed in selected_feature_indices_ from the input data and passes them to the final estimator's predict_proba method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | ndarray | Input data. | required |
| `**predict_params` | Dict[str, Any] | Additional prediction parameters. | {} |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Predicted probabilities. |

Raises:

| Type | Description |
| --- | --- |
| NotImplementedError | If the final estimator does not implement predict_proba. |

Source code in scikit_longitudinal/pipeline.py
@available_if(_final_estimator_has("predict_proba"))
@validate_input
def predict_proba(
    self, X: np.ndarray, **predict_params: Dict[str, Any]
) -> np.ndarray:
    """Predict class probabilities using the final estimator.

    Applies the selected feature indices to the input data and uses the final estimator to predict probabilities.

    Args:
        X (np.ndarray): Input data.
        **predict_params (Dict[str, Any]): Additional prediction parameters.

    Returns:
        np.ndarray: Predicted probabilities.

    Raises:
        NotImplementedError: If the final estimator does not implement `predict_proba`.
    """
    X = X[:, self.selected_feature_indices_]

    if hasattr(self._final_estimator, "predict_proba"):
        return self._final_estimator.predict_proba(X, **predict_params)
    raise NotImplementedError(
        f"predict_proba is not implemented for this estimator: {type(self._final_estimator)}"
    )

transform(X, **transform_params)

Transform the input data using the final estimator.

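Selects the columns listed in selected_feature_indices_ and passes them to the final estimator's transform method. If selected_feature_indices_ is None or empty, a message is printed and the input data is returned unchanged.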

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | ndarray | Input data. | required |
| `**transform_params` | Dict[str, Any] | Additional transformation parameters. | {} |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Transformed data. |

Source code in scikit_longitudinal/pipeline.py
@available_if(_final_estimator_has("transform"))
@validate_input
def transform(
    self, X: np.ndarray, **transform_params: Dict[str, Any]
) -> np.ndarray:
    """Transform the input data using the final estimator.

    Applies the selected feature indices and transforms the data using the final estimator's `transform` method.

    Args:
        X (np.ndarray): Input data.
        **transform_params (Dict[str, Any]): Additional transformation parameters.

    Returns:
        np.ndarray: Transformed data.
    """
    if (
        self.selected_feature_indices_ is None
        or len(self.selected_feature_indices_) == 0
    ):
        print("No feature selection was performed. Returning the original data.")
        return X
    X = X[:, self.selected_feature_indices_]
    return self._final_estimator.transform(X, **transform_params)
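The two branches of transform (pass-through when no indices are recorded, column selection plus delegation otherwise) can be sketched as follows. Doubler and transform_with_selection are hypothetical names used only for illustration; the function mirrors the body shown above without the printed message.

```python
import numpy as np


class Doubler:
    """Hypothetical final estimator exposing a transform method."""

    def transform(self, X: np.ndarray) -> np.ndarray:
        return X * 2


def transform_with_selection(final_estimator, X: np.ndarray, selected_feature_indices):
    """Return X unchanged when no indices are recorded, else select and transform."""
    if selected_feature_indices is None or len(selected_feature_indices) == 0:
        return X
    return final_estimator.transform(X[:, selected_feature_indices])


X_new = np.array([[1, 2, 3], [4, 5, 6]])

# Only column 1 is kept, then doubled.
selected_out = transform_with_selection(Doubler(), X_new, np.array([1]))
print(selected_out.tolist())  # [[4], [10]]

# With an empty selection the input comes back untouched.
passthrough_out = transform_with_selection(Doubler(), X_new, np.array([], dtype=int))
print(passthrough_out is X_new)  # True
```

In the pass-through branch the final estimator's transform is never called, so the returned array has the original number of columns.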