Algorithm Adaptation: Preserve Temporal Dependency for Sklong Estimators
Dataset Used in Tutorials
Use the shared synthetic dataset defined in the tutorials overview. Generate it once there and reuse it here.
Algorithm-adaptation workflows keep temporal structure intact. This walkthrough uses LexicoDecisionTreeClassifier, which prioritises recent waves while respecting the full sequence.
The animation below gives the intuition behind the split rule: when several candidate waves of the same attribute yield near-identical information gains (within threshold_gain), the lexicographic tree picks the most recent one rather than the classical "largest gain wins" tie-break.
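The tie-break described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: the function name `pick_most_recent_wave` is hypothetical, the gains are made-up numbers, and it assumes "within threshold_gain" means "within threshold_gain of the best gain among the candidate waves".

```python
def pick_most_recent_wave(gains, threshold_gain):
    """Return the index of the wave to split on.

    `gains` holds the information gain of each wave of ONE longitudinal
    attribute, ordered from the oldest wave (index 0) to the most recent.
    Hypothetical helper for illustration only.
    """
    best = max(gains)
    # Waves whose gain is close enough to the best one count as near-ties
    # (assumption: closeness is measured against the best gain).
    candidates = [i for i, g in enumerate(gains) if best - g <= threshold_gain]
    # Among near-ties, prefer the most recent wave (largest index).
    return max(candidates)


# Made-up gains for waves 1, 2 and 3 of a single attribute.
gains = [0.120, 0.118, 0.115]

print(pick_most_recent_wave(gains, threshold_gain=0.01))   # 2: all three waves are near-ties
print(pick_most_recent_wave(gains, threshold_gain=0.001))  # 0: only the best wave qualifies
```

With a loose threshold the most recent wave wins even though its gain is slightly lower; with a tight threshold the rule falls back to the largest gain.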
Step 1: Load and prepare data
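from scikit_longitudinal.data_preparation import LongitudinalDataset
# Load the CSV, separate the target column, and hold out 20% of the rows for testing.
dataset = LongitudinalDataset('./extended_stroke_longitudinal.csv')
dataset.load_data_target_train_test_split(target_column='stroke_w2', test_size=0.2, random_state=42)
# Each inner list holds the column indices of one longitudinal attribute's waves.
dataset.setup_features_group([[2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13]])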
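Hard-coded index pairs are easy to get wrong, so it can help to derive them from the column names. The sketch below is one possible way to do that, assuming columns follow a `<name>_w<wave>` naming pattern like the `stroke_w2` target. The helper `build_features_group` and the example column names are hypothetical, and the returned indices are positions in the list passed in; check the API reference for which frame the library expects the indices to refer to.

```python
import re
from collections import defaultdict


def build_features_group(columns, target):
    """Group column positions by longitudinal attribute, oldest wave first.

    Hypothetical helper: assumes wave columns are named `<name>_w<wave>`.
    Columns without that suffix are treated as non-longitudinal and skipped.
    """
    pattern = re.compile(r"^(?P<name>.+)_w(?P<wave>\d+)$")
    waves = defaultdict(list)
    for idx, col in enumerate(columns):
        if col == target:
            continue  # the target is not a feature
        match = pattern.match(col)
        if match:
            waves[match["name"]].append((int(match["wave"]), idx))
    # Sort each attribute's columns by wave number and keep only the positions.
    return [[idx for _, idx in sorted(pairs)] for pairs in waves.values()]


# Made-up column layout for illustration.
columns = ["age", "sex", "smoke_w1", "smoke_w2", "bmi_w1", "bmi_w2", "stroke_w2"]
print(build_features_group(columns, target="stroke_w2"))  # [[2, 3], [4, 5]]
```

The resulting list has the same shape as the argument passed to `setup_features_group` above: one inner list per longitudinal attribute, one index per wave.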
Step 2: Initialize and fit the estimator
from scikit_longitudinal.estimators.trees import LexicoDecisionTreeClassifier
clf = LexicoDecisionTreeClassifier(
    features_group=dataset.feature_groups(),
    threshold_gain=0.01,
    random_state=42
)
clf.fit(dataset.X_train, dataset.y_train)
Step 3: Predict and evaluate
from sklearn.metrics import accuracy_score
y_pred = clf.predict(dataset.X_test)
print(y_pred)  # predicted class labels for the test set
print(f"Accuracy: {accuracy_score(dataset.y_test, y_pred)}")
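This covers basic estimator usage. From here, experiment with hyperparameters such as threshold_gain, which sets how close the information gains of two waves must be before the tree prefers the more recent wave.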
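One way to experiment with threshold_gain is a small sweep that refits the estimator for each candidate value and records test accuracy. The helper below is a minimal sketch, not part of the library: `sweep_threshold_gain` is a hypothetical name, and it only assumes that the estimator exposes `fit` and `predict`, as the classifier in this tutorial does.

```python
def sweep_threshold_gain(make_clf, thresholds, X_train, y_train, X_test, y_test):
    """Fit one estimator per threshold and return {threshold: test accuracy}.

    `make_clf` is a callable that takes a threshold value and returns an
    unfitted estimator with `fit` and `predict` methods (hypothetical helper).
    """
    results = {}
    for threshold in thresholds:
        clf = make_clf(threshold)
        clf.fit(X_train, y_train)
        predictions = clf.predict(X_test)
        # Accuracy is the fraction of test rows predicted correctly.
        correct = sum(p == y for p, y in zip(predictions, y_test))
        results[threshold] = correct / len(y_test)
    return results


# Usage with the objects from this tutorial (not executed here):
# results = sweep_threshold_gain(
#     lambda t: LexicoDecisionTreeClassifier(
#         features_group=dataset.feature_groups(),
#         threshold_gain=t,
#         random_state=42,
#     ),
#     [0.001, 0.01, 0.05],
#     dataset.X_train, dataset.y_train,
#     dataset.X_test, dataset.y_test,
# )
```

Scoring every candidate on the test split is only suitable for quick exploration. For model selection, a separate validation split or cross-validation avoids tuning against the test set.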
Explore more longitudinal-aware estimators
Review the estimator catalog and parameters in the API reference for additional options.