Enhancing Neural Network Performance with Early Stopping in Python
Chapter 1: Introduction to Early Stopping
Early stopping is a crucial technique employed to enhance the performance of neural networks. In Keras, various hyperparameters must be tuned to achieve optimal accuracy. Key hyperparameters include the number of hidden layers, neurons per layer, learning rate, optimizer type, batch size, activation functions, and the number of epochs.
Early stopping is an intelligent strategy that halts training automatically through Keras's callback mechanism. During training, the number of epochs dictates how many passes over the data are used to update the weights, and the way those updates happen depends on the gradient descent variant. In mini-batch gradient descent, the batch size is an additional hyperparameter that determines how many weight updates occur per epoch. For a deeper understanding of these hyperparameters, continue reading.
Setting a high epoch count may lead to overfitting, causing the model to perform well on training data but poorly on unseen test data.
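As a quick illustration of how these hyperparameters interact, the short sketch below shows how the batch size determines the number of weight updates per epoch. The sample count of 88 simply mirrors the training split used later in this article; the other numbers are illustrative.
import math
# Illustrative numbers: 88 training samples, Keras's default batch size of 32, 100 epochs
n_samples = 88
batch_size = 32
epochs = 100
updates_per_epoch = math.ceil(n_samples / batch_size)  # 3 mini-batches per epoch
total_updates = updates_per_epoch * epochs  # 300 weight updates in total
print(f"{updates_per_epoch} updates per epoch, {total_updates} updates in total")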
Python Example:
# Suppress warnings so the output stays readable
import warnings
warnings.filterwarnings('ignore')
# Importing necessary libraries for visualization and model building
from mlxtend.plotting import plot_decision_regions
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_circles
import seaborn as sns
import matplotlib.pyplot as plt
We will create synthetic data using the sklearn library:
X, y = make_circles(n_samples=110, noise=0.1, random_state=1)
Next, we visualize the raw data for binary classification using a scatter plot:
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)
plt.show()
[Image by the Author: scatter plot of the two-class synthetic circles data]
Subsection 1.1: Preparing the Data
We split the data into training and test sets so that the model is evaluated on examples it has never seen, avoiding any data leakage:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=2)
To add hidden layers, we utilize the add() method. In this case, we create a fully connected layer with 256 neurons and employ the ReLU activation function. For binary classification, we use the sigmoid activation for the output layer:
model = Sequential()
model.add(Dense(256, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Next, we configure the model for compilation, specifying the optimizer, loss function, and metrics:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
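The string 'adam' uses the optimizer's default settings. To tune the learning rate mentioned earlier, one optional variant (not required for the rest of this example) is to pass an optimizer instance instead; the 0.001 shown here is simply Adam's default value:
from tensorflow.keras.optimizers import Adam
# Equivalent compile call with an explicit learning rate
model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])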
Now, we train the model for the defined number of epochs using the fit() method (since we do not set a batch size, Keras uses its default of 32), allowing the model to learn from the training data:
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=4000)
We can visualize the training and validation loss to assess the model's generalization ability. This plot is essential for identifying underfitting and overfitting:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
[Image by the Author: training vs. validation loss over 4000 epochs]
As the plot illustrates, the training loss keeps decreasing while the validation loss drifts upward: the model is clearly overfitting, which is exactly the situation early stopping is designed to address.
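To put a rough number on the gap visible in the plot, we can compare the final training and validation losses recorded in the history object; the exact values will differ from run to run:
# Final-epoch losses from the history object (values vary between runs)
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]
print(f"train loss: {final_train_loss:.4f}, validation loss: {final_val_loss:.4f}")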
Subsection 1.2: Implementing Early Stopping
We can leverage early stopping to improve generalization and save training time by halting the process as soon as overfitting sets in. We rebuild the same sequential model from scratch, a straightforward feed-forward stack of layers:
model = Sequential()
model.add(Dense(256, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Here, we configure the early stopping mechanism. By monitoring the validation loss, training is stopped once the loss fails to improve for too long: min_delta defines the minimum change that counts as an improvement, while patience specifies how many consecutive epochs to wait for an improvement before stopping:
callback = EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    min_delta=0.001,             # minimum change that counts as an improvement
    patience=20,                 # epochs to wait for an improvement before stopping
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=False   # keep the final weights rather than the best ones
)
We fit the model again, this time passing the early stopping callback, and keep the returned history for visualization:
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=4000, callbacks=[callback])
To analyze the training and validation loss, we plot the results:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
[Image by the Author: training vs. validation loss with early stopping]
The early stopping callback halted training at epoch 454, after the validation loss had gone 20 consecutive epochs (the patience window) without improving by at least min_delta, signalling the onset of overfitting.
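To confirm where training stopped in your own run, you can inspect the callback's stopped_epoch attribute and the length of the recorded history; the epoch number will vary between runs:
# Number of epochs actually trained and the epoch at which the callback fired
epochs_run = len(history.history['loss'])
print(f"Trained for {epochs_run} epochs; early stopping fired at epoch {callback.stopped_epoch}")
Finally, we plot the decision regions learned by the model on the test set: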
plot_decision_regions(X_test, y_test.ravel(), clf=model, legend=2)
plt.show()
[Image by the Author: decision regions on the test set]
The learned decision boundary separates the two classes well enough for effective binary classification.
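Note that mlxtend's plot_decision_regions calls clf.predict() and expects discrete class labels, whereas a Keras model with a sigmoid output returns probabilities. Depending on your mlxtend and TensorFlow versions, the call above may therefore need a thin wrapper that thresholds the probabilities at 0.5; the KerasBinaryWrapper class below is a hypothetical helper written for that purpose, not part of either library:
class KerasBinaryWrapper:
    # Hypothetical helper: turns sigmoid probabilities into 0/1 labels for mlxtend
    def __init__(self, keras_model):
        self.model = keras_model
    def predict(self, X):
        # Threshold the predicted probabilities at 0.5 and flatten to a 1-D label array
        return (self.model.predict(X) > 0.5).astype(int).ravel()

plot_decision_regions(X_test, y_test.ravel(), clf=KerasBinaryWrapper(model), legend=2)
plt.show()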
I hope you found this article insightful. Feel free to connect with me on LinkedIn and Twitter.
Recommended Articles:
- Most Usable NumPy Methods with Python
- NumPy: Linear Algebra on Images
- Exception Handling Concepts in Python
- Pandas: Dealing with Categorical Data
- Hyperparameters: RandomSearchCV and GridSearchCV in Machine Learning
- Fully Explained Linear Regression with Python
- Fully Explained Logistic Regression with Python
- Data Distribution using Numpy with Python
- 40 Most Insanely Usable Methods in Python
- 20 Most Usable Pandas Shortcut Methods in Python
Chapter 2: Understanding Callbacks in Deep Learning
The first video explores callbacks, early stopping, and live loss plotting in deep learning using Keras, TensorFlow, and Python.
Chapter 3: Deep Dive into Callbacks, Checkpoints, and Early Stopping
The second video discusses callbacks, checkpoints, and early stopping in the context of deep learning with Keras and TensorFlow.