Mean Absolute Error (MAE)

Learning Guide for Mean Absolute Error (MAE) with 5 Code Examples in Python

Understanding Mean Absolute Error (MAE) is crucial for anyone working with regression models in data science and machine learning. This guide will explain what MAE is, how to calculate it, and provide five practical Python code examples to help you grasp its implementation.

What is Mean Absolute Error (MAE)?

Mean Absolute Error (MAE) is a metric for evaluating the accuracy of a regression model. It is the average of the absolute differences between the predicted and actual values.

Why Use MAE?

MAE is widely used because it is easy to interpret: it is the typical size of a prediction error, in the same units as the target. Unlike MSE and RMSE, it does not square the error terms, so a single large error influences it less.
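
To make the outlier point concrete, here is a minimal sketch that scores the same predictions with MAE and RMSE, once without and once with a single large error. The helper names and the numbers are illustrative assumptions, not data from a real model.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Illustrative values, not real data
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0, 13.0])    # every error has size 1
y_pred_outlier = np.array([11.0, 11.0, 12.0, 12.0, 32.0])  # last error has size 20

print(f"clean:   MAE={mae(y_true, y_pred_clean):.2f}  RMSE={rmse(y_true, y_pred_clean):.2f}")
print(f"outlier: MAE={mae(y_true, y_pred_outlier):.2f}  RMSE={rmse(y_true, y_pred_outlier):.2f}")
```

When every error has size 1, both metrics equal 1.00. A single error of 20 moves MAE to 4.80 but RMSE to about 8.99, because squaring gives the large error far more weight.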

Calculating MAE: The Formula

The formula to calculate MAE is:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

Where:

\( y_i \) is the actual value
\( \hat{y}_i \) is the predicted value
\( n \) is the number of observations
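
As a worked illustration of the formula, take four illustrative observations with actual values 3, -0.5, 2, 7 and predicted values 2.5, 0, 2, 8:

\[ \text{MAE} = \frac{|3 - 2.5| + |-0.5 - 0| + |2 - 2| + |7 - 8|}{4} = \frac{0.5 + 0.5 + 0 + 1}{4} = 0.5 \]

On average, these predictions miss the actual value by 0.5 units of the target.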

Implementing MAE in Python

Before diving into the examples, let’s see a basic implementation of MAE in Python:

import numpy as np

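def mae(y_true, y_pred):
    # Convert to float arrays so plain Python lists work as well as NumPy arrays
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))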

# Example usage
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
print(mae(y_true, y_pred))  # Output: 0.5
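
As a quick sanity check, a hand-written MAE can be compared with scikit-learn's mean_absolute_error on the same inputs. This is a small sketch; the variable names manual and library are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Hand-written MAE: mean of the absolute element-wise differences
manual = float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))

# Library MAE from scikit-learn
library = float(mean_absolute_error(y_true, y_pred))

print(manual, library)
```

Both values should agree up to floating-point rounding, which confirms that the manual function implements the same definition as the library.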

Example 1: MAE with Simple Linear Regression

This example shows how to compute MAE for a simple linear regression model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Generating sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Fitting the model
lin_reg = LinearRegression()
lin_reg.fit(X, y)
y_pred = lin_reg.predict(X)

# Calculating MAE
mae_value = mean_absolute_error(y, y_pred)
print(f'MAE: {mae_value}')

Example 2: MAE with Polynomial Regression

Polynomial regression fits curved relationships by adding powers of the input as extra features to a linear model, so it can capture patterns that a straight line cannot.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Generating sample data
np.random.seed(0)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

# Transforming the data
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

# Fitting the model
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)
y_pred = poly_reg.predict(X_poly)

# Calculating MAE
mae_value = mean_absolute_error(y, y_pred)
print(f'MAE: {mae_value}')
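
To see what the polynomial features buy, the sketch below fits a straight line and a degree-2 polynomial to the same kind of quadratic data and compares their MAE. It assumes a NumPy Generator for the random data (so the numbers differ from the example above), and the names mae_linear and mae_poly are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data (illustrative, same shape of relationship as above)
rng = np.random.default_rng(0)
X = 6 * rng.random((100, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=100)

# Straight-line fit on the raw feature
linear_pred = LinearRegression().fit(X, y).predict(X)

# Degree-2 fit on the expanded features [x, x^2]
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
poly_pred = LinearRegression().fit(X_poly, y).predict(X_poly)

mae_linear = mean_absolute_error(y, linear_pred)
mae_poly = mean_absolute_error(y, poly_pred)
print(f"Linear MAE: {mae_linear:.3f}")
print(f"Degree-2 MAE: {mae_poly:.3f}")
```

On data with a clear quadratic component, the degree-2 model should show a noticeably lower MAE. Both scores here are computed on the training data, so they describe fit rather than performance on new data.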

Example 3: MAE with Multiple Linear Regression

Multiple linear regression uses more than one predictor. In this example the target depends on three features.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Generating sample data
np.random.seed(0)
X = np.random.rand(100, 3)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + 4 * X[:, 2] + np.random.randn(100)

# Fitting the model
multi_lin_reg = LinearRegression()
multi_lin_reg.fit(X, y)
y_pred = multi_lin_reg.predict(X)

# Calculating MAE
mae_value = mean_absolute_error(y, y_pred)
print(f'MAE: {mae_value}')

Example 4: MAE with Decision Tree Regressor

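A decision tree is a non-linear model that predicts by splitting the feature space into regions. With default settings the tree grows until it fits the training data almost exactly, so the training-set MAE computed here will be close to 0 and says little about performance on new data.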

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Generating sample data
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 * X.squeeze() + np.random.randn(100)

# Fitting the model
tree_reg = DecisionTreeRegressor()
tree_reg.fit(X, y)
y_pred = tree_reg.predict(X)

# Calculating MAE
mae_value = mean_absolute_error(y, y_pred)
print(f'MAE: {mae_value}')
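
Scoring a model on the rows it was trained on gives an optimistic MAE. The following is a minimal sketch of held-out evaluation for the same tree, assuming a 70/30 split with train_test_split; the split ratio and random_state values are illustrative choices.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data as in the example above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 * X.squeeze() + np.random.randn(100)

# Hold out 30% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree_reg = DecisionTreeRegressor(random_state=0)
tree_reg.fit(X_train, y_train)

train_mae = mean_absolute_error(y_train, tree_reg.predict(X_train))
test_mae = mean_absolute_error(y_test, tree_reg.predict(X_test))
print(f"Train MAE: {train_mae:.3f}")
print(f"Test MAE:  {test_mae:.3f}")
```

The gap between the two numbers is a rough indication of overfitting. Limiting the tree, for example with max_depth, usually narrows it.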

Example 5: MAE with Random Forest Regressor

Random Forest is an ensemble learning method that builds many decision trees on random subsets of the data and, for regression, averages their predictions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Generating sample data
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 * X.squeeze() + np.random.randn(100)

# Fitting the model
forest_reg = RandomForestRegressor(n_estimators=100)
forest_reg.fit(X, y)
y_pred = forest_reg.predict(X)

# Calculating MAE
mae_value = mean_absolute_error(y, y_pred)
print(f'MAE: {mae_value}')
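
A single training-set score can be replaced with cross-validated MAE. The sketch below assumes 5-fold cross-validation through scikit-learn's cross_val_score with the "neg_mean_absolute_error" scorer. The scorer returns negated errors, so the sign is flipped back. The fold count and random_state are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Same synthetic data as in the example above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 * X.squeeze() + np.random.randn(100)

forest_reg = RandomForestRegressor(n_estimators=100, random_state=0)

# scikit-learn scorers follow "higher is better", so MAE is reported negated
neg_mae_scores = cross_val_score(
    forest_reg, X, y, cv=5, scoring="neg_mean_absolute_error"
)
cv_mae = -neg_mae_scores

print(f"MAE per fold: {np.round(cv_mae, 3)}")
print(f"Mean MAE: {cv_mae.mean():.3f}")
```

Each fold's MAE is computed on data the forest did not see during fitting, so the mean is a more honest estimate than the training-set value.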

Conclusion

MAE is a core metric for evaluating regression models. It gives a directly interpretable measure of prediction error: the average absolute difference between predicted and actual values, in the units of the target. Computing it across different regression techniques, as in the examples above, lets you compare models on a common scale and choose the one that predicts most accurately.

FAQs

What is MAE used for? MAE is used to measure the accuracy of regression models by calculating the average absolute differences between predicted and actual values.
Why is MAE preferred over other metrics? MAE is often preferred because it is easy to interpret, being expressed in the units of the target, and because it is less sensitive to outliers than metrics like RMSE.
Can MAE be used for classification problems? No, MAE is specifically designed for regression problems.
How do you interpret MAE values? Lower MAE values indicate better model performance, while higher values indicate poorer fit.
Is MAE sensitive to outliers? Less so than metrics like RMSE, because it does not square the error terms. An outlier still raises MAE, but only in proportion to the size of its error.
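
On interpreting MAE values: "lower is better" is easier to judge against a reference point. One common approach, sketched here under illustrative assumptions (synthetic data, a 70/30 split), is to compare the model's MAE with that of a constant baseline. scikit-learn's DummyRegressor with strategy="median" always predicts the training median.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic linear data (illustrative)
rng = np.random.default_rng(0)
X = 2 * rng.random((200, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: ignore the features and always predict the training median
baseline = DummyRegressor(strategy="median").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
model_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Baseline MAE: {baseline_mae:.3f}")
print(f"Model MAE:    {model_mae:.3f}")
```

A model whose MAE is not clearly below the baseline's has learned little from the features, whatever the absolute value of its MAE.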

Evaluating the Model