Examples

This page contains detailed examples of how to use Lazy Predict in various scenarios.

Classification Example

Basic Classification

Here’s a basic example using the breast cancer dataset:

from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Create classifier
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)

# Fit and get models
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)

Classification with Custom Metric

You can pass a custom metric function, which is evaluated for every model:

from sklearn.metrics import f1_score

def custom_f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average='weighted')

clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=custom_f1)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

Advanced Classification Options

Use advanced options like categorical encoding, timeout, and cross-validation:

clf = LazyClassifier(
    verbose=1,                          # Show progress
    ignore_warnings=True,               # Suppress warnings
    custom_metric=None,                 # Use default metrics
    predictions=True,                   # Return predictions
    classifiers='all',                  # Use all available classifiers
    categorical_encoder='onehot',       # Encoding strategy
    timeout=60,                         # Max time per model in seconds
    cv=5                                # Cross-validation folds
)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

Regression Example

Basic Regression

Here’s an example using the diabetes dataset:

from lazypredict.Supervised import LazyRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load data
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Create and fit regressor
reg = LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
print(models)

Working with Pandas DataFrames

Lazy Predict works seamlessly with pandas DataFrames:

import pandas as pd

# Wrap the feature matrix and target in pandas objects
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.Series(diabetes.target)

# Rest remains the same
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
reg = LazyRegressor(verbose=0, ignore_warnings=True)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

Categorical Feature Encoding

Lazy Predict supports multiple categorical encoding strategies:

OneHot Encoding (Default)

import pandas as pd
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import train_test_split

# Load data with categorical features
df = pd.read_csv('data_with_categories.csv')
X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Default onehot encoding
clf = LazyClassifier(categorical_encoder='onehot', verbose=0)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

Ordinal Encoding

Useful for ordered categorical features or when one-hot encoding creates too many features:

clf = LazyClassifier(categorical_encoder='ordinal', verbose=0)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
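To see the difference in output width, here is a small standalone scikit-learn comparison (illustrative only; Lazy Predict applies its encoders internally):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# One categorical column with three distinct values
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# One-hot: one output column per category
onehot = OneHotEncoder().fit_transform(colors).toarray()

# Ordinal: a single column of integer codes
ordinal = OrdinalEncoder().fit_transform(colors)

print(onehot.shape)   # (4, 3) - grows with the number of categories
print(ordinal.shape)  # (4, 1) - one column regardless of cardinality
```

With high-cardinality columns, the one-hot matrix width equals the number of distinct values, which is where ordinal encoding saves memory.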

Target Encoding

Target encoding requires the category-encoders package:

pip install category-encoders

clf = LazyClassifier(categorical_encoder='target', verbose=0)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

Binary Encoding

Binary encoding is efficient for high-cardinality features:

# Requires category-encoders package
clf = LazyClassifier(categorical_encoder='binary', verbose=0)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

Comparing Encoders

import pandas as pd

results = {}
for encoder in ['onehot', 'ordinal', 'target', 'binary']:
    try:
        clf = LazyClassifier(
            categorical_encoder=encoder,
            verbose=0,
            ignore_warnings=True
        )
        models, predictions = clf.fit(X_train, X_test, y_train, y_test)
        results[encoder] = models.head(3)
        print(f"\n{encoder.upper()} Encoding Results:")
        print(models.head(3))
    except Exception as e:
        print(f"{encoder}: {e}")

Using with MLflow

Lazy Predict has built-in MLflow integration for experiment tracking. You can enable it by setting the MLflow tracking URI:

import os
os.environ['MLFLOW_TRACKING_URI'] = 'sqlite:///mlflow.db'  # Local SQLite tracking
# Or for remote tracking:
# os.environ['MLFLOW_TRACKING_URI'] = 'http://your-mlflow-server:5000'

# MLflow tracking will be automatically enabled
reg = LazyRegressor(verbose=0, ignore_warnings=True)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

The following metrics and artifacts will be automatically logged to MLflow:

  • Model metrics (R-squared, RMSE, etc.)

  • Training time

  • Model parameters

  • Model signatures

  • Custom metrics (if provided)

  • Model artifacts for each trained model

You can view the results in the MLflow UI:

mlflow ui
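Besides the UI, logged runs can also be queried programmatically with MLflow's search API. A minimal sketch, assuming the local SQLite store from the example above:

```python
import os

# Assumes the local SQLite tracking store configured earlier
os.environ.setdefault("MLFLOW_TRACKING_URI", "sqlite:///mlflow.db")

def list_logged_runs():
    """Return logged runs as a pandas DataFrame, or None if mlflow isn't installed."""
    try:
        import mlflow
    except ImportError:
        return None
    # search_runs() returns one row per run; metrics appear as 'metrics.<name>' columns
    return mlflow.search_runs()

runs = list_logged_runs()
if runs is not None:
    print(runs.filter(like="metrics.").head())
```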

For Databricks Users

If you’re using Databricks, MLflow tracking is automatically configured:

# MLflow tracking will use Databricks tracking URI automatically
reg = LazyRegressor(verbose=0, ignore_warnings=True)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

Getting Model Objects

You can access the trained model objects:

# Get all trained models
model_dictionary = reg.provide_models(X_train, X_test, y_train, y_test)

# Access specific model
random_forest = model_dictionary['RandomForestRegressor']

# Make predictions with specific model
predictions = random_forest.predict(X_test)

Model Timeout

Set a maximum time limit for each model to prevent long-running models from blocking:

# Limit each model to 60 seconds
clf = LazyClassifier(timeout=60, verbose=1)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

# Models that exceed the timeout will be skipped
# Check for skipped models in the verbose output

This is particularly useful when:

  • Working with large datasets where some models might take very long

  • Running experiments with time constraints

  • Preventing specific slow models from blocking the entire pipeline

GPU Acceleration

Lazy Predict supports GPU acceleration for models that support it. Enable GPU with the use_gpu=True parameter:

from lazypredict.Supervised import LazyClassifier, LazyRegressor

# Classification with GPU
clf = LazyClassifier(verbose=0, ignore_warnings=True, use_gpu=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

# Regression with GPU
reg = LazyRegressor(verbose=0, ignore_warnings=True, use_gpu=True)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

Supported GPU backends:

  • XGBoost — uses device="cuda"

  • LightGBM — uses device="gpu"

  • CatBoost — uses task_type="GPU"

  • cuML (RAPIDS) — GPU-native scikit-learn replacements added automatically

  • LSTM / GRU — PyTorch models use CUDA device

  • TimesFM — placed on CUDA device for inference

cuML (RAPIDS) GPU Acceleration

When cuML is installed and use_gpu=True, GPU-accelerated versions of common scikit-learn models are automatically added to the benchmark:

# Install cuML (requires NVIDIA GPU + CUDA)
pip install cuml-cu12  # for CUDA 12

# cuML models are added automatically when use_gpu=True
clf = LazyClassifier(use_gpu=True, verbose=0, ignore_warnings=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
# Results will include cuML_LogisticRegression, cuML_RandomForestClassifier, etc.

cuML GPU models added:

  • Classifiers: LogisticRegression, RandomForestClassifier, KNeighborsClassifier, SVC

  • Regressors: LinearRegression, Ridge, Lasso, ElasticNet, RandomForestRegressor, KNeighborsRegressor, SVR

Time Series Forecasting with GPU

from lazypredict.TimeSeriesForecasting import LazyForecaster

fcst = LazyForecaster(
    verbose=0,
    ignore_warnings=True,
    use_gpu=True,       # GPU for XGBoost, LightGBM, CatBoost, LSTM, GRU, TimesFM
)
scores, predictions = fcst.fit(y_train, y_test)

Note

GPU acceleration requires a CUDA-capable GPU and PyTorch installed with CUDA support. If CUDA is not available, models automatically fall back to CPU with a warning.
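If you prefer to check up front rather than rely on the fallback, a small helper (plain PyTorch, not part of the Lazy Predict API) can probe for CUDA:

```python
def cuda_available():
    """Return True only if PyTorch is installed and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

use_gpu = cuda_available()
print(f"Requesting GPU: {use_gpu}")
# clf = LazyClassifier(use_gpu=use_gpu, verbose=0, ignore_warnings=True)
```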

Using Local Foundation Model Weights

If you are offline, behind a firewall, or in an air-gapped environment, you can point to a local directory containing pre-downloaded model weights instead of downloading from Hugging Face:

from lazypredict.TimeSeriesForecasting import LazyForecaster

fcst = LazyForecaster(
    verbose=0,
    ignore_warnings=True,
    foundation_model_path="/path/to/timesfm-2.5-200m-pytorch",
)
scores, predictions = fcst.fit(y_train, y_test)

To download the model for later offline use:

# Download once (requires internet)
python -c "
from huggingface_hub import snapshot_download
snapshot_download('google/timesfm-2.5-200m-pytorch', local_dir='./timesfm-local')
"
# Then use the local path in air-gapped environments
fcst = LazyForecaster(foundation_model_path="./timesfm-local")

Intel Extension Acceleration

For improved performance on Intel CPUs, install Intel Extension for Scikit-learn:

pip install scikit-learn-intelex

Lazy Predict will automatically detect and use it for acceleration:

# No code changes needed - acceleration is automatic
clf = LazyClassifier(verbose=0)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

# You'll see "Intel(R) Extension for Scikit-learn enabled" in verbose output

Time Series Forecasting

Basic Forecasting

Benchmark 20+ forecasting models with a single call:

import numpy as np
from lazypredict.TimeSeriesForecasting import LazyForecaster

# Generate sample data with trend + seasonality
np.random.seed(42)
t = np.arange(200)
y = 10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 12) + np.random.normal(0, 1, 200)

y_train, y_test = y[:180], y[180:]

fcst = LazyForecaster(verbose=0, ignore_warnings=True, predictions=True)
scores, predictions = fcst.fit(y_train, y_test)
print(scores)

Forecasting with Seasonal Period

By default the seasonal period is auto-detected via ACF. Override it manually:

fcst = LazyForecaster(
    seasonal_period=12,    # monthly data with yearly cycle
    verbose=0,
    ignore_warnings=True,
)
scores, predictions = fcst.fit(y_train, y_test)
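For intuition on what the ACF-based auto-detection does, here is a rough standalone sketch (not LazyForecaster's exact internals): compute the autocorrelation of the series and pick the lag where it peaks.

```python
import numpy as np

def detect_period(y, max_lag=50):
    """Rough ACF-based period guess: the lag (>= 2) with the highest autocorrelation."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    acf = np.correlate(y, y, mode="full")[len(y) - 1:]
    acf = acf / acf[0]  # normalize so lag 0 == 1
    lags = np.arange(2, min(max_lag, len(acf)))
    return int(lags[np.argmax(acf[lags])])

# Noisy seasonal signal with period 12
t = np.arange(200)
y = 3 * np.sin(2 * np.pi * t / 12) + np.random.default_rng(0).normal(0, 0.3, 200)
print(detect_period(y))  # expect a lag close to 12
```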

Forecasting with Exogenous Variables

Pass optional exogenous features to models that support them (SARIMAX, AutoARIMA, ML models):

# Create exogenous features
X_train = np.column_stack([np.sin(t[:180]), np.cos(t[:180])])
X_test  = np.column_stack([np.sin(t[180:]), np.cos(t[180:])])

fcst = LazyForecaster(verbose=0, ignore_warnings=True)
scores, predictions = fcst.fit(y_train, y_test, X_train, X_test)

Forecasting with Cross-Validation

Use expanding-window time series cross-validation:

fcst = LazyForecaster(
    cv=5,                  # 5-fold TimeSeriesSplit
    verbose=0,
    ignore_warnings=True,
)
scores, predictions = fcst.fit(y_train, y_test)
# scores will contain CV Mean and CV Std columns for each metric
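To see what these expanding-window folds look like, here is a toy run of scikit-learn's TimeSeriesSplit directly (illustrative; LazyForecaster drives the splitting for you):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(20)
tscv = TimeSeriesSplit(n_splits=5)
splits = list(tscv.split(series))

for fold, (train_idx, test_idx) in enumerate(splits):
    # Each training window extends the previous one; the test fold always comes after it
    print(f"fold {fold}: train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

Unlike ordinary k-fold, no fold ever trains on observations that come after its test window, which avoids leakage from the future.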

Selecting Specific Forecasters

Run only a subset of models:

fcst = LazyForecaster(
    forecasters=["Holt", "AutoARIMA", "Ridge_TS", "LSTM_TS"],
    verbose=0,
    ignore_warnings=True,
)
scores, predictions = fcst.fit(y_train, y_test)

Custom Forecasting Metric

Add a custom metric alongside the defaults:

def median_absolute_error(y_true, y_pred):
    return float(np.median(np.abs(y_true - y_pred)))

fcst = LazyForecaster(
    custom_metric=median_absolute_error,
    verbose=0,
    ignore_warnings=True,
)
scores, predictions = fcst.fit(y_train, y_test)
# scores will include a 'median_absolute_error' column

Saving and Loading Forecaster Models

# Save all fitted models
fcst.save_models("./my_forecasters")

# Load them back
fcst2 = LazyForecaster()
fcst2.load_models("./my_forecasters")

# Use loaded models to forecast
new_forecasts = fcst2.predict(y_history=y_train, horizon=20)