PyCaret is a low-code Python library that automates much of the ML pipeline. This guide uses the classic Iris dataset for classification, showing how to preprocess, compare 15+ models, tune one of them, and save it for deployment, all in about 10 lines of code.
Start by installing: pip install pycaret[full] (quote it as "pycaret[full]" in shells such as zsh that treat square brackets as glob patterns).
The default install, pip install pycaret, is a slim version with only the hard dependencies such as pandas, scikit-learn, and numpy. The [full] extra pulls in all optional dependencies.
Load and prep data with these imports:
from pycaret.classification import *
from pycaret.datasets import get_data
data = get_data('iris')
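Initialize the experiment: setup(data, target='species')
By default, setup imputes missing values, encodes categorical features, and creates the train-test split. Feature scaling is opt-in (normalize=True), and passing session_id fixes the random seed so runs are reproducible.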
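A minimal sketch of calling setup with its main options spelled out instead of left at their defaults. The parameters train_size, normalize, and session_id are real setup arguments; the values chosen here and the helper name run_setup are assumptions for illustration.

```python
# Explicit setup options; the values are illustrative assumptions.
SETUP_OPTIONS = {
    "target": "species",
    "train_size": 0.8,   # hold out 20% of rows as the test set
    "normalize": True,   # opt in to feature scaling
    "session_id": 42,    # fixed seed for reproducible runs
}


def run_setup(data, **overrides):
    """Initialize a PyCaret classification experiment with explicit options."""
    # Imported lazily because PyCaret is a heavy dependency.
    from pycaret.classification import setup

    options = {**SETUP_OPTIONS, **overrides}
    return setup(data, **options)
```

Calling run_setup(data, normalize=False) would keep the same split and seed while turning scaling back off.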
Next, benchmark models:
best = compare_models()
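It cross-validates every available estimator, from Logistic Regression to XGBoost (when that optional package is installed), and prints a leaderboard sorted by Accuracy by default. Pass sort='AUC' or sort='Recall' to rank by another metric.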
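The ranking step can be sketched as a small helper that picks the sort metric and keeps more than one model. The sort and n_select arguments and the pull function are part of PyCaret's classification API; the helper name benchmark and the short metric list (limited to the three metrics this guide names) are assumptions.

```python
# Metrics named in this guide; PyCaret's leaderboard has more columns.
KNOWN_METRICS = ("Accuracy", "AUC", "Recall")


def benchmark(sort_metric="Accuracy", top_n=3):
    """Rank models by `sort_metric`; return (top models, leaderboard).

    Assumes setup() has already been called in this session.
    """
    if sort_metric not in KNOWN_METRICS:
        raise ValueError(
            f"unsupported metric {sort_metric!r}; choose from {KNOWN_METRICS}"
        )
    if top_n < 1:
        raise ValueError("top_n must be at least 1")

    # Imported lazily because PyCaret is a heavy dependency.
    from pycaret.classification import compare_models, pull

    top_models = compare_models(sort=sort_metric, n_select=top_n)
    leaderboard = pull()  # most recently displayed score grid, as a DataFrame
    if top_n == 1:
        top_models = [top_models]  # n_select=1 returns a bare model, not a list
    return top_models, leaderboard
```

Validating the metric name before the import keeps typos from costing a full benchmarking run.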
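compare_models() already returns the top-ranked model as best. To train a specific estimator instead, pass its ID, e.g. dt = create_model('dt') for a Decision Tree,
then tune its hyperparameters: tuned_dt = tune_model(dt)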
Visualize results effortlessly:
plot_model(tuned_dt, plot='confusion_matrix') shows which classes the model confuses with each other;
plot_model(tuned_dt, plot='feature') plots feature importances, highlighting top predictors such as petal length.
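Predict on new data: predictions = predict_model(tuned_dt, data=data) appends the predicted class and its probability score as new columns, named prediction_label and prediction_score in PyCaret 3.x ('Label' and 'Score' in 2.x). Here data is the training set, reused only for illustration; pass unseen rows in practice.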
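Because the output column names differ between PyCaret 2.x and 3.x, downstream code can look them up instead of hard-coding one pair. The helper below is a hypothetical convenience, not part of PyCaret, and works on plain column names.

```python
def find_prediction_columns(columns):
    """Return (label_column, score_column) from predict_model output columns.

    score_column is None when no score column is present.
    """
    naming_schemes = [
        ("prediction_label", "prediction_score"),  # PyCaret 3.x
        ("Label", "Score"),                        # PyCaret 2.x
    ]
    columns = list(columns)
    for label_col, score_col in naming_schemes:
        if label_col in columns:
            return label_col, (score_col if score_col in columns else None)
    raise KeyError("no prediction label column found")


# Example with column names only (no PyCaret needed):
iris_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(find_prediction_columns(iris_cols + ["prediction_label", "prediction_score"]))
# -> ('prediction_label', 'prediction_score')
```

With a real result, find_prediction_columns(predictions.columns) would return the names to index with.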
Persist the whole pipeline (preprocessing plus model) with save_model(tuned_dt, 'iris_model'), which writes iris_model.pkl, and reload it anytime via load_model('iris_model').
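Put together, the steps above can be sketched as one function. This is a consolidated sketch rather than a canonical recipe: the function name run_iris_workflow, the seed value, and the choice to score the feature columns of the same dataset are assumptions, and the plotting calls are left out.

```python
def run_iris_workflow(model_id="dt", model_name="iris_model", seed=42):
    """Train, tune, save, and reload a classifier on Iris; return predictions."""
    # Imported lazily because PyCaret is a heavy dependency.
    from pycaret.classification import (
        setup, create_model, tune_model, predict_model, save_model, load_model,
    )
    from pycaret.datasets import get_data

    data = get_data("iris")
    setup(data, target="species", session_id=seed)

    model = create_model(model_id)      # cross-validated baseline
    tuned = tune_model(model)           # hyperparameter search

    save_model(tuned, model_name)       # writes <model_name>.pkl
    pipeline = load_model(model_name)   # preprocessing + model

    unseen = data.drop(columns=["species"])  # features only, as at inference time
    return predict_model(pipeline, data=unseen)
```

Calling run_iris_workflow() would return a DataFrame with the prediction columns appended; run_iris_workflow(model_id="lr") would swap in Logistic Regression.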
| Step | Code | Key Benefit |
|---|---|---|
| Setup | setup(data, target='species') | Auto-preprocessing |
| Compare | compare_models() | Ranks 15+ models on one leaderboard |
| Tune & Plot | tune_model(create_model('dt')); plot_model() | Optimizes + visualizes |
| Predict/Deploy | predict_model(); save_model() | Scores new data; saves the full pipeline |
Note: For ETA prediction, swap Iris for your trip data and set target='delay_minutes'. Because that target is numeric, import from pycaret.regression instead of pycaret.classification.
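A rough sketch of that ETA variant follows. The trips DataFrame and its delay_minutes column are hypothetical; pycaret.regression, its setup and compare_models functions, and the 'MAE' sort key are real, though whether MAE is the right metric for your data is an assumption.

```python
def run_eta_workflow(trips, target="delay_minutes", seed=42):
    """Benchmark regressors on trip data; return the model with the lowest MAE.

    `trips` is assumed to be a pandas DataFrame containing a numeric
    `target` column (hypothetical data, not shipped with PyCaret).
    """
    if target not in trips.columns:
        raise KeyError(f"target column {target!r} not found in trip data")

    # Imported lazily because PyCaret is a heavy dependency.
    from pycaret.regression import setup, compare_models

    setup(trips, target=target, session_id=seed)
    return compare_models(sort="MAE")  # mean absolute error, in minutes here
```

MAE is used because it reads directly in the target's units, so a score of 3.2 would mean predictions are off by about 3.2 minutes on average.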