Merging Waves and Keeping Time Indices

What is the MerWavTimePlus module?

The MerWavTimePlus module transforms longitudinal data by merging all features across waves into a single set while preserving their time indices. This maintains the temporal structure, enabling longitudinal machine learning methods to leverage temporal dependencies and patterns. It provides methods for data preparation and transformation, including prepare_data and transform.

What are features_group and non_longitudinal_features?

Two key attributes, features_group and non_longitudinal_features, enable algorithms to interpret the temporal structure of longitudinal data.

  • features_group: A list of lists where each sublist contains indices of a longitudinal attribute's waves, ordered from oldest to most recent. This captures temporal dependencies.
  • non_longitudinal_features: A list of indices for static, non-temporal features excluded from the temporal matrix.

Proper setup of these attributes is critical for leveraging temporal patterns effectively.
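
As a rough illustration of the two attributes, the sketch below builds them by hand for a toy list of columns. The column names, the `_w<number>` suffix convention, and the helper `build_temporal_attributes` are assumptions made for this example; they are not part of the library's API.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def build_temporal_attributes(columns: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Group column indices by attribute, oldest wave first.

    Hypothetical helper: assumes longitudinal columns end in `_w<number>`.
    """
    waves: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    non_longitudinal: List[int] = []
    for index, name in enumerate(columns):
        match = re.fullmatch(r"(.+)_w(\d+)", name)
        if match:
            # Record (wave number, column index) under the attribute's base name.
            waves[match.group(1)].append((int(match.group(2)), index))
        else:
            non_longitudinal.append(index)
    # Sort each attribute's columns by wave number so the oldest wave comes first.
    features_group = [[index for _, index in sorted(pairs)] for pairs in waves.values()]
    return features_group, non_longitudinal


columns = ["age", "smoke_w2", "smoke_w1", "bmi_w1", "bmi_w2", "sex"]
features_group, non_longitudinal_features = build_temporal_attributes(columns)
print(features_group)             # [[2, 1], [3, 4]]
print(non_longitudinal_features)  # [0, 5]
```

Note that `smoke_w1` sits at index 2 and `smoke_w2` at index 1, so the group is `[2, 1]`: the order inside each sublist follows time, not column position.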

See more in the Temporal Dependency Guide.

MerWavTimePlus

Bases: DataPreparationMixin

MerWavTimePlus stands for Merge waves while keeping time indices in longitudinal datasets.

The MerWavTimePlus class transforms longitudinal data by merging all features across waves into a single set while preserving their time indices. This maintains the temporal structure, enabling longitudinal machine learning methods to leverage temporal dependencies and patterns. See all longitudinal-data-aware machine learning estimators.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| features_group | List[List[int]] | A temporal matrix representing the temporal dependency of a longitudinal dataset. Each sublist contains indices of a longitudinal attribute's waves. | None |
| non_longitudinal_features | List[Union[int, str]] | A list of indices or names of non-longitudinal features. | None |
| feature_list_names | List[str] | A list of feature names in the dataset. | None |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| features_group | List[List[int]] | The temporal matrix of feature groups. |
| non_longitudinal_features | List[Union[int, str]] | The non-longitudinal features. |
| feature_list_names | List[str] | The feature names in the dataset. |

Examples:

The example below uses "./stroke_longitudinal.csv" as a placeholder path to demonstrate the MerWavTimePlus class. Replace it with the actual path to your dataset.

Basic Usage

from scikit_longitudinal.data_preparation import LongitudinalDataset
from scikit_longitudinal.data_preparation import MerWavTimePlus

# Load dataset
dataset = LongitudinalDataset('./stroke_longitudinal.csv')
dataset.load_data()
dataset.load_target(target_column="stroke_w2")
dataset.setup_features_group("elsa")
dataset.load_train_test_split(test_size=0.2, random_state=42)

# Initialize MerWavTimePlus
mer_wav_plus = MerWavTimePlus(
    features_group=dataset.feature_groups(),
    non_longitudinal_features=dataset.non_longitudinal_features(),
    feature_list_names=dataset.data.columns.tolist()
)

# No transformation is needed: MerWavTimePlus takes the dataset as it is,
# which keeps the temporal dependency intact.

# Later on, primitives understand this temporal dependency via the `features_group` attribute.
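
To make the last comment concrete, here is a minimal sketch of how a downstream consumer could read each attribute's waves in temporal order from an untouched row. The helper `waves_in_order`, the column layout, and the toy values are hypothetical; the sketch does not reproduce how any library estimator is implemented.

```python
from typing import List, Sequence


def waves_in_order(row: Sequence[float], features_group: List[List[int]]) -> List[List[float]]:
    """Return, for each longitudinal attribute, its values from oldest to most recent wave."""
    return [[row[index] for index in group] for group in features_group]


# Toy row laid out as: age, smoke_w1, smoke_w2, bmi_w1, bmi_w2 (hypothetical columns).
row = [63.0, 0.0, 1.0, 24.5, 26.1]
features_group = [[1, 2], [3, 4]]
non_longitudinal_features = [0]

# The row itself is never reordered; the index lists carry the temporal structure.
print(waves_in_order(row, features_group))          # [[0.0, 1.0], [24.5, 26.1]]
print([row[i] for i in non_longitudinal_features])  # [63.0]
```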
Source code in scikit_longitudinal/data_preparation/merwav_time_plus.py
class MerWavTimePlus(DataPreparationMixin):
    """MerWavTimePlus stands for Merge waves while keeping time indices in longitudinal datasets.

    The `MerWavTimePlus` class transforms longitudinal data by merging all features across waves into a single set
    while preserving their time indices. This maintains the temporal structure, enabling longitudinal machine learning
    methods to leverage temporal dependencies and patterns. See all
    [longitudinal-data-aware machine learning estimators](../estimators/trees/lexico_decision_tree_classifier.md).

    Args:
        features_group (List[List[int]], optional): A temporal matrix representing the temporal dependency of a
            longitudinal dataset. Each sublist contains indices of a longitudinal attribute's waves. Defaults to None.
        non_longitudinal_features (List[Union[int, str]], optional): A list of indices or names of non-longitudinal
            features. Defaults to None.
        feature_list_names (List[str], optional): A list of feature names in the dataset. Defaults to None.

    Attributes:
        features_group (List[List[int]]): The temporal matrix of feature groups.
        non_longitudinal_features (List[Union[int, str]]): The non-longitudinal features.
        feature_list_names (List[str]): The feature names in the dataset.

    Examples:
        The example below uses "./stroke_longitudinal.csv" as a placeholder path to demonstrate the `MerWavTimePlus`
        class. Replace it with the actual path to your dataset.

        !!! example "Basic Usage"
            ```python
            from scikit_longitudinal.data_preparation import LongitudinalDataset
            from scikit_longitudinal.data_preparation import MerWavTimePlus

            # Load dataset
            dataset = LongitudinalDataset('./stroke_longitudinal.csv')
            dataset.load_data()
            dataset.load_target(target_column="stroke_w2")
            dataset.setup_features_group("elsa")
            dataset.load_train_test_split(test_size=0.2, random_state=42)

            # Initialize MerWavTimePlus
            mer_wav_plus = MerWavTimePlus(
                features_group=dataset.feature_groups(),
                non_longitudinal_features=dataset.non_longitudinal_features(),
                feature_list_names=dataset.data.columns.tolist()
            )

            # No transformation is needed: MerWavTimePlus takes the dataset as it is,
            # which keeps the temporal dependency intact.

            # Later on, primitives understand this temporal dependency via the `features_group` attribute.
            ```
    """

    def __init__(
        self,
        features_group: List[List[int]] = None,
        non_longitudinal_features: List[Union[int, str]] = None,
        feature_list_names: List[str] = None,
    ):
        self.features_group = features_group
        self.non_longitudinal_features = non_longitudinal_features
        self.feature_list_names = feature_list_names

    def get_params(self, deep: bool = True):  # pylint: disable=W0613
        """Get the parameters of the MerWavTimePlus instance.

        Retrieves the configuration parameters of the instance, useful for inspection or integration with scikit-learn
        pipelines.

        Args:
            deep (bool, optional): Unused parameter but kept for consistency with the scikit-learn API.

        Returns:
            dict: The parameters of the MerWavTimePlus instance.
        """
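        return {
            "features_group": self.features_group,
            "non_longitudinal_features": self.non_longitudinal_features,
            "feature_list_names": self.feature_list_names,
        }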

    @override
    def _prepare_data(self, X: np.ndarray, y: np.ndarray = None) -> "MerWavTimePlus":
        """Prepare the data for transformation.

        Overridden from `DataPreparationMixin`.

        Args:
            X (np.ndarray): The input data.
            y (np.ndarray, optional): The target data, stored but not used in transformation. Defaults to None.

        Returns:
            MerWavTimePlus: The instance with prepared data.
        """
        return self

get_params(deep=True)

Get the parameters of the MerWavTimePlus instance.

Retrieves the configuration parameters of the instance, useful for inspection or integration with scikit-learn pipelines.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| deep | bool | Unused parameter but kept for consistency with the scikit-learn API. | True |

Returns:

| Type | Description |
| --- | --- |
| dict | The parameters of the MerWavTimePlus instance. |
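
In the scikit-learn convention, the dictionary returned by `get_params` can be passed back to the constructor to rebuild an equivalent, unfitted object. The sketch below shows that round trip with a hypothetical stand-in class, `ParamsHolder`, which mirrors the three constructor arguments documented above. It is an illustration of the convention, not the library's implementation.

```python
from typing import Any, Dict, List, Optional, Union


class ParamsHolder:
    """Hypothetical stand-in with the same three constructor arguments."""

    def __init__(
        self,
        features_group: Optional[List[List[int]]] = None,
        non_longitudinal_features: Optional[List[Union[int, str]]] = None,
        feature_list_names: Optional[List[str]] = None,
    ):
        self.features_group = features_group
        self.non_longitudinal_features = non_longitudinal_features
        self.feature_list_names = feature_list_names

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # One key per constructor argument, so the dict can be splatted back in.
        return {
            "features_group": self.features_group,
            "non_longitudinal_features": self.non_longitudinal_features,
            "feature_list_names": self.feature_list_names,
        }


original = ParamsHolder(
    features_group=[[0, 1]],
    non_longitudinal_features=[2],
    feature_list_names=["smoke_w1", "smoke_w2", "sex"],
)
# Rebuild an equivalent object from the parameters alone.
rebuilt = ParamsHolder(**original.get_params())
print(rebuilt.get_params() == original.get_params())  # True
```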


_prepare_data(X, y=None)

Prepare the data for transformation.

Overridden from DataPreparationMixin.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | ndarray | The input data. | required |
| y | ndarray | The target data, stored but not used in transformation. | None |

Returns:

Name Type Description
MerWavTimePlus MerWavTimePlus

The instance with prepared data.
