How to Save Random Forest Model to File in Python

2024-09-12

After you fit a random forest model in Python, you can save the fitted model to a file and reuse it later to make predictions on new data.

Saving a random forest model (or any other machine learning model) to a file means you do not have to retrain it each time, which matters most when training takes significant time or resources.

In Python, you can use the dump function from either the pickle or the joblib package to save a random forest model to a file.

Method 1: Using pickle

# import package
import pickle

# save model (the with block closes the file after writing)
with open("rf_model.pkl", "wb") as f:
    pickle.dump(model, f)

This will save the model as a binary file. You can use any file extension to save the model.

Method 2: Using joblib

# import package
import joblib

# save model
joblib.dump(model, "rf_model.joblib")

The following examples explain in detail how to save the fitted random forest model to a file in Python.

Example 1: Using pickle

Let’s first develop a random forest model.

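We will use the iris dataset from sklearn to fit the random forest model.

Load and split the dataset, then fit the model:

# import packages
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split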

# load dataset
data = load_iris()

# split into training (70%) and testing (30%)
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3)

# Fit random forest model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# prediction on test dataset
y_pred = model.predict(X_test)

Now, you can save this fitted random forest model (model) to a file using the dump function from pickle.

# import package
import pickle

# save model (the with block closes the file after writing)
with open("rf_model.pkl", "wb") as f:
    pickle.dump(model, f)

We have saved the fitted random forest model to the rf_model.pkl file.

If you want to load this model for future use, you can use the load function from pickle.

Load the model:

# load model
with open("rf_model.pkl", "rb") as f:
    model = pickle.load(f)
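A quick way to confirm that the save and load steps worked is to compare predictions from the original and the reloaded model. The following is a minimal sketch, not part of the original example: the fixed random_state values, the temporary directory, and the names loaded_model and same_predictions are assumptions added for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# fit a model as in the example above (random_state is fixed here
# only to make the sketch reproducible)
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# save and reload inside a temporary directory so no file is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

# the reloaded model should give the same predictions as the original
same_predictions = np.array_equal(
    model.predict(X_test), loaded_model.predict(X_test)
)
print("Predictions match:", same_predictions)
```

If the two prediction arrays differ, the file was most likely written from or read into a different model object than intended.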

Example 2: Using joblib

Let’s first develop a random forest model.

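We will use the iris dataset from sklearn to fit the random forest model.

Load and split the dataset, then fit the model:

# import packages
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split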

# load dataset
data = load_iris()

# split into training (70%) and testing (30%)
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3)

# Fit random forest model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# prediction on test dataset
y_pred = model.predict(X_test)

Now, you can save this fitted random forest model (model) to a file using the dump function from joblib.

# import package
import joblib

# save model
joblib.dump(model, "rf_model.joblib")

We have saved the fitted random forest model to the rf_model.joblib file.

If you want to load this model for future use, you can use the load function from joblib.

Load the model:

# load model
model = joblib.load("rf_model.joblib")
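The joblib round trip can be checked the same way. The sketch below also passes the optional compress argument of joblib.dump, which the example above does not use; the compression level of 3, the temporary directory, the fixed random_state values, and the variable names are assumptions chosen for illustration. A compressed file is usually smaller, at the cost of slightly slower saving and loading.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# fit a model as in the example above (random_state is fixed here
# only to make the sketch reproducible)
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "rf_model.joblib")
    compressed_path = os.path.join(tmp_dir, "rf_model_compressed.joblib")

    # default: no compression
    joblib.dump(model, plain_path)
    # optional: compress=3 is an illustrative level, not a recommendation
    joblib.dump(model, compressed_path, compress=3)

    plain_size = os.path.getsize(plain_path)
    compressed_size = os.path.getsize(compressed_path)

    # joblib.load reads compressed and uncompressed files alike
    loaded_model = joblib.load(compressed_path)

print("Uncompressed size (bytes):", plain_size)
print("Compressed size (bytes):", compressed_size)

# the reloaded model should give the same predictions as the original
same_predictions = np.array_equal(
    model.predict(X_test), loaded_model.predict(X_test)
)
print("Predictions match:", same_predictions)
```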