Algorithm Adaptation: Preserve Temporal Dependency for Sklong Estimators

Dataset Used in Tutorials

Use the shared synthetic dataset defined in the tutorials overview. Generate it once there and reuse it here.

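Algorithm-adaptation workflows keep temporal structure intact. Instead of flattening the waves into unrelated columns, the estimator itself is told which columns belong to the same attribute across waves. This walkthrough uses LexicoDecisionTreeClassifier, which prefers features from more recent waves when competing splits are nearly equally informative, while still considering every wave in the sequence.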

The animation below gives the intuition behind the split rule. When several candidate waves of the same attribute yield near-identical information gains (within threshold_gain), the lexicographic tree picks the most recent one, rather than applying the classical rule of always taking the candidate with the largest gain.

Lexicographic split: recency breaks ties
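The tie-break can be sketched in plain Python. This is a simplified reading of the rule described above, not the library's splitter. It assumes each candidate split is summarised as a (wave, gain) pair for one attribute, that a larger wave number means a more recent wave, and that "near-identical" means within threshold_gain of the best gain. pick_split and pick_split_classical are hypothetical helpers introduced only for illustration.

```python
from typing import List, Tuple

Candidate = Tuple[int, float]  # (wave_index, information_gain); larger wave = more recent (assumption)


def pick_split(candidates: List[Candidate], threshold_gain: float) -> Candidate:
    """Hypothetical sketch of a recency-aware tie-break (not scikit-longitudinal's code)."""
    if not candidates:
        raise ValueError("need at least one candidate split")
    best_gain = max(gain for _, gain in candidates)
    # Candidates whose gain is within threshold_gain of the best are treated as ties.
    near_ties = [c for c in candidates if best_gain - c[1] <= threshold_gain]
    # Among the ties, prefer the most recent wave.
    return max(near_ties, key=lambda c: c[0])


def pick_split_classical(candidates: List[Candidate]) -> Candidate:
    # Classical rule: the largest gain wins, regardless of wave.
    return max(candidates, key=lambda c: c[1])


# Illustrative (made-up) gains for one attribute measured in waves 1, 2 and 3.
candidates = [(1, 0.120), (2, 0.117), (3, 0.105)]
print(pick_split_classical(candidates))               # (1, 0.12)
print(pick_split(candidates, threshold_gain=0.001))   # (1, 0.12): no ties, largest gain wins
print(pick_split(candidates, threshold_gain=0.01))    # (2, 0.117): waves 1 and 2 tie, wave 2 is newer
print(pick_split(candidates, threshold_gain=0.02))    # (3, 0.105): all three tie, wave 3 is newest
```

The three calls show why threshold_gain matters: the wider the margin, the more candidates count as ties and the more often recency decides the split.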

Step 1: Load and prepare data

from scikit_longitudinal.data_preparation import LongitudinalDataset

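dataset = LongitudinalDataset('./extended_stroke_longitudinal.csv')
# Separate the features from the target (stroke at wave 2) and hold out 20% of rows for testing
dataset.load_data_target_train_test_split(target_column='stroke_w2', test_size=0.2, random_state=42)
# One inner list per longitudinal attribute: the column indices of that attribute across waves
dataset.setup_features_group([[2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13]])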
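Writing the index lists for setup_features_group by hand is error-prone once a dataset has many attributes. Below is a hypothetical helper, build_feature_groups, which is not part of scikit-longitudinal, that derives the groups from column names. It assumes a `<attribute>_w<wave>` naming convention (as in `stroke_w2`), that each group should run from the oldest to the most recent wave, and that you pass the feature column names in the same order as the training matrix with the target column already removed.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed naming convention: "<attribute>_w<wave>", e.g. "smoke_w1", "smoke_w2".
_WAVE_SUFFIX = re.compile(r"^(?P<attr>.+)_w(?P<wave>\d+)$")


def build_feature_groups(columns: Sequence[str]) -> List[List[int]]:
    """Hypothetical helper: group column indices by attribute, oldest wave first."""
    by_attr: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for idx, name in enumerate(columns):
        match = _WAVE_SUFFIX.match(name)
        if match is None:
            continue  # non-longitudinal column (e.g. "age", "sex"): belongs to no group
        by_attr[match["attr"]].append((int(match["wave"]), idx))

    groups = []
    for waves in by_attr.values():
        if len(waves) < 2:
            continue  # attribute observed in a single wave: nothing temporal to group
        waves.sort()  # numeric wave order, so wave 10 sorts after wave 2
        groups.append([idx for _, idx in waves])
    return groups


columns = ["id", "age", "smoke_w1", "smoke_w2", "bmi_w1", "bmi_w2", "sex"]
print(build_feature_groups(columns))  # [[2, 3], [4, 5]]
```

Under those assumptions, the returned list can be passed to dataset.setup_features_group in place of a hand-written one.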

Step 2: Initialize and fit the estimator

from scikit_longitudinal.estimators.trees import LexicoDecisionTreeClassifier

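clf = LexicoDecisionTreeClassifier(
    features_group=dataset.feature_groups(),  # temporal groups defined in Step 1
    threshold_gain=0.01,  # gains within this margin are treated as ties, resolved by recency
    random_state=42,
)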

clf.fit(dataset.X_train, dataset.y_train)

Step 3: Predict and evaluate

from sklearn.metrics import accuracy_score

y_pred = clf.predict(dataset.X_test)
print(y_pred)  # predicted class label for each test row

print(f"Accuracy: {accuracy_score(dataset.y_test, y_pred):.3f}")

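This covers basic estimator usage. Next, experiment with hyperparameters such as threshold_gain: a larger value treats more candidate splits as ties and therefore defers to recent waves more often, while a smaller value makes the tree behave more like a standard decision tree that always takes the largest gain.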

Explore more longitudinal-aware estimators

Review the estimator catalog and parameters in the API reference for additional options.