
Data Preparation: Flatten Temporal Dependency for Scikit-Learn Estimators

Dataset Used in Tutorials

Use the shared synthetic dataset defined in the tutorials overview. Generate it once there and reuse it here.

Data-preparation workflows flatten the longitudinal structure so that the output can be passed directly to standard scikit-learn estimators. Follow this step-by-step path with AggrFunc (mean aggregation) and LogisticRegression; no longitudinal-specific pipeline is required.

The animation below shows the intuition: each longitudinal group (e.g. all smoke_* columns) collapses into a single static column, so the output is a plain tabular matrix ready for any scikit-learn estimator.

[Animation: AggrFunc flattens each longitudinal group]
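
To make the intuition concrete, here is a minimal pandas sketch of mean aggregation over longitudinal groups. It is not AggrFunc's implementation: the toy columns, the flatten_by_mean helper, and the agg_mean_ output naming are all assumptions made for illustration.

```python
import pandas as pd

# Toy wide-format data (hypothetical): two waves each of smoke_* and bmi_*,
# plus one static column, "sex".
df = pd.DataFrame({
    "sex": [0, 1, 1],
    "smoke_w1": [0, 1, 1],
    "smoke_w2": [1, 1, 0],
    "bmi_w1": [22.0, 30.0, 27.0],
    "bmi_w2": [24.0, 32.0, 25.0],
})

# Each inner list holds the column positions of one longitudinal group.
features_group = [[1, 2], [3, 4]]


def flatten_by_mean(frame, groups):
    """Collapse each group of wave columns into a single row-wise mean column."""
    grouped_positions = {i for group in groups for i in group}
    # Static (non-longitudinal) columns pass through unchanged.
    static_cols = [c for i, c in enumerate(frame.columns) if i not in grouped_positions]
    out = frame[static_cols].copy()
    for group in groups:
        cols = frame.columns[group]
        # Hypothetical naming scheme: strip the "_w<N>" suffix of the first wave.
        base = cols[0].rsplit("_w", 1)[0]
        out["agg_mean_" + base] = frame[cols].mean(axis=1)
    return out


flat = flatten_by_mean(df, features_group)
print(flat)
```

The result has one column per longitudinal group plus the static columns, which is the plain tabular shape that any scikit-learn estimator expects.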

Step 1: Load data and define temporal dependencies

from scikit_longitudinal.data_preparation import LongitudinalDataset

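dataset = LongitudinalDataset('./extended_stroke_longitudinal.csv')
# Hold out 20% of the rows for testing; stroke_w2 is the prediction target.
dataset.load_data_target_train_test_split(target_column='stroke_w2', test_size=0.2, random_state=42)
# Each inner list holds the column indices of one longitudinal feature, one index per wave.
dataset.setup_features_group([[2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13]])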
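
Hard-coded index pairs are easy to get wrong when the column order changes. As a sketch, assuming wave columns follow a name_w<N> convention, the groups could be derived from the column names instead. The groups_from_wave_suffix helper and the column list below are hypothetical and are not part of scikit-longitudinal; check the resulting indices against your own data before passing them to setup_features_group.

```python
import re
from collections import defaultdict


def groups_from_wave_suffix(columns, target=None):
    """Group column positions by base name for columns ending in _w<N>."""
    pattern = re.compile(r"^(?P<base>.+)_w(?P<wave>\d+)$")
    buckets = defaultdict(list)
    for idx, name in enumerate(columns):
        if name == target:
            continue  # the target column is not a feature
        match = pattern.match(name)
        if match:
            buckets[match.group("base")].append((int(match.group("wave")), idx))
    # Sort each group by wave number; keep only features observed in 2+ waves.
    return [
        [idx for _, idx in sorted(waves)]
        for waves in buckets.values()
        if len(waves) > 1
    ]


# Hypothetical column layout, for illustration only.
columns = ["id", "age", "smoke_w1", "smoke_w2", "bmi_w1", "bmi_w2", "stroke_w2"]
groups = groups_from_wave_suffix(columns, target="stroke_w2")
print(groups)
```

Columns that appear in only one wave, such as id and age here, are left out of the groups and would be treated as non-longitudinal features.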

Step 2: Flatten with AggrFunc

from scikit_longitudinal.data_preparation import AggrFunc

aggregator = AggrFunc(
    features_group=dataset.feature_groups(),
    non_longitudinal_features=dataset.non_longitudinal_features(),
    feature_list_names=dataset.data.columns.tolist(),
    aggregation_func="mean",
)

# Fit on the training split, then apply the same transformation to the test split.
X_train_flat = aggregator.fit_transform(dataset.X_train)
X_test_flat = aggregator.transform(dataset.X_test)

Step 3: Train and evaluate a scikit-learn estimator

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

clf = LogisticRegression(max_iter=500)
clf.fit(X_train_flat, dataset.y_train)
y_pred = clf.predict(X_test_flat)

print(classification_report(dataset.y_test, y_pred))

AggrFunc outputs a static tabular matrix, so LogisticRegression can train on it through the familiar fit/predict API. Once the flattening step is in place, you can swap in any other standard estimator (e.g., RandomForestClassifier).
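
The swap can be sketched as follows. To keep the example self-contained, a random matrix stands in for the flattened output; the array shapes, the synthetic labels, and the estimator settings are assumptions, not values from the tutorial dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the flattened matrices (assumed shapes, not real data).
rng = np.random.default_rng(42)
X_train_flat = rng.normal(size=(80, 6))
y_train = (X_train_flat[:, 0] + X_train_flat[:, 1] > 0).astype(int)
X_test_flat = rng.normal(size=(20, 6))

# Both estimators share the same fit/predict interface, so they are interchangeable.
estimators = {
    "logreg": LogisticRegression(max_iter=500),
    "forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

predictions = {}
for name, estimator in estimators.items():
    estimator.fit(X_train_flat, y_train)
    predictions[name] = estimator.predict(X_test_flat)
    print(name, predictions[name][:5])
```

In the tutorial itself, the real X_train_flat, X_test_flat, and dataset.y_train would replace the synthetic arrays; nothing else in the loop needs to change.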

Explore more data-preparation options

Find additional flattening strategies and parameters in the API reference.