Hyperparameter Tuning: Grid vs. Random Search
Dataset Used in Tutorials
Use the shared synthetic dataset defined in the tutorials overview. Generate it once there and reuse it here.
Tune longitudinal-aware models to squeeze out extra performance. This guide compares grid search and random search for LexicoRandomForestClassifier, focusing on threshold_gain plus common random-forest hyperparameters.
The animation below summarises the contrast: grid search sweeps a regular lattice of hyperparameter combinations (thorough but expensive), while random search scatters samples across the same plane and, in practice, often lands inside high-performing regions that a coarse grid would miss.
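The lattice-versus-scatter contrast can be sketched in plain Python. This is a minimal illustration only: the two-parameter space, the value ranges, and the helper `distinct_per_axis` are assumptions made for the example and are not part of scikit-longitudinal or scikit-learn. With the same budget of nine trials, the grid revisits the same three values on each axis, while random draws probe many more distinct values per hyperparameter.

```python
import itertools
import random

# Hypothetical two-parameter space, used only to illustrate the sampling patterns.
threshold_gain_values = [0.0001, 0.001, 0.01]
n_estimators_values = [50, 100, 200]

# Grid search: every combination of the listed values (a regular lattice).
grid_points = list(itertools.product(threshold_gain_values, n_estimators_values))

# Random search: same number of trials, but each trial draws fresh values.
rng = random.Random(42)
random_points = [
    (10 ** rng.uniform(-4, -2), rng.randint(50, 200))  # log-scale gain, integer trees
    for _ in range(len(grid_points))
]


def distinct_per_axis(points):
    """Count how many different values were tried for each hyperparameter."""
    return tuple(len({point[i] for point in points}) for i in range(2))


print("grid  :", len(grid_points), "trials, distinct per axis:", distinct_per_axis(grid_points))
print("random:", len(random_points), "trials, distinct per axis:", distinct_per_axis(random_points))
```

If only one of the two hyperparameters matters much for performance, the grid has effectively tested three settings of it, whereas the random draws have tested close to nine.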
Step 1: Load data and define temporal dependencies
from scikit_longitudinal.data_preparation import LongitudinalDataset
dataset = LongitudinalDataset('./extended_stroke_longitudinal.csv')
dataset.load_data_target_train_test_split(target_column='stroke_w2', test_size=0.2, random_state=42)
dataset.setup_features_group([[2,3], [4,5], [6,7], [8,9], [10,11], [12,13]])
features_group = dataset.feature_groups()
Step 2: Grid search key hyperparameters
from sklearn.model_selection import GridSearchCV
from scikit_longitudinal.estimators.ensemble import LexicoRandomForestClassifier
# Every combination of these values is evaluated.
param_grid = {
    'threshold_gain': [0.0001, 0.001, 0.01],
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
}
grid = GridSearchCV(
    estimator=LexicoRandomForestClassifier(features_group=features_group, random_state=42),
    param_grid=param_grid,
    cv=3,        # 3-fold cross-validation
    n_jobs=-1,   # use all available CPU cores
)
grid.fit(dataset.X_train, dataset.y_train)
print(f"Best params (grid search): {grid.best_params_}")
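To gauge the cost of the grid before launching it, the number of model fits can be counted directly. The sketch below uses only the standard library and repeats the grid and fold count from this step; it assumes scikit-learn's default behaviour of refitting the best configuration once on the full training set.

```python
from itertools import product

# Same grid and fold count as in the GridSearchCV call above.
param_grid = {
    'threshold_gain': [0.0001, 0.001, 0.01],
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
}
cv = 3

# Enumerate every combination, one dict per candidate configuration.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_candidates = len(candidates)
n_fits = n_candidates * cv  # one fit per candidate per fold

print(f"{n_candidates} candidates x {cv} folds = {n_fits} fits (plus one final refit)")
# prints: 27 candidates x 3 folds = 81 fits (plus one final refit)
```

Adding a fourth hyperparameter with three values would triple this count, which is why grids are best kept small.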
Step 3: Random search broader spaces
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform, randint
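# Distributions are sampled; lists are sampled uniformly from their entries.
param_distributions = {
    'threshold_gain': loguniform(1e-4, 1e-1),  # uniform on a log scale
    'n_estimators': randint(50, 300),          # integers 50..299 (upper bound exclusive)
    'max_depth': [None, 5, 10, 15],
    'max_features': ['sqrt', 'log2', 0.8],
}
random_search = RandomizedSearchCV(
    estimator=LexicoRandomForestClassifier(features_group=features_group, random_state=42),
    param_distributions=param_distributions,
    n_iter=12,        # number of sampled configurations
    cv=3,             # 3-fold cross-validation
    n_jobs=-1,        # use all available CPU cores
    random_state=42,  # reproducible sampling
)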
random_search.fit(dataset.X_train, dataset.y_train)
print(f"Best params (random search): {random_search.best_params_}")
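Why a log-uniform distribution suits threshold_gain can be seen with a small standard-library sketch. The helper `sample_loguniform` is a hypothetical stand-in written for this illustration, not the scipy implementation; it draws values whose logarithm is uniform over the same range as above, so each order of magnitude receives roughly the same share of samples.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_loguniform(1e-4, 1e-1, rng) for _ in range(3000)]

# Count how many samples fall into each decade of the range.
decade_counts = {"1e-4..1e-3": 0, "1e-3..1e-2": 0, "1e-2..1e-1": 0}
for value in samples:
    if value < 1e-3:
        decade_counts["1e-4..1e-3"] += 1
    elif value < 1e-2:
        decade_counts["1e-3..1e-2"] += 1
    else:
        decade_counts["1e-2..1e-1"] += 1

for decade, count in decade_counts.items():
    print(f"{decade}: {count} samples")
```

A plain uniform draw over the same interval would place about 90% of samples between 0.01 and 0.1, leaving the small thresholds almost unexplored.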
- Grid search: exhaustive within a small, carefully chosen grid; a good fit when you have strong priors on useful values.
- Random search: samples diverse combinations quickly; useful when exploring larger spaces or when some parameters benefit from logarithmic sampling (e.g., threshold_gain).