Measuring the performance of a model is a core task in Machine Learning. It helps us understand how well our model works and makes it easier to present the model to the relevant stakeholders. There are many different performance metrics out there, but only some of them are suitable for regression.
In another article, What is Confusion Matrix in Machine Learning, I showed how to measure the performance of a classification model. In this article I will cover how to measure the performance of a regression model in Machine Learning.
In regression, it is impossible to predict the exact value; instead, we measure how close our predictions are to the real values. For regression models, the most commonly used evaluation metrics include:
- R-squared (R2), which is the proportion of variation in the outcome that is explained by the predictor variables. In multiple regression models, R2 corresponds to the squared correlation between the observed outcome values and the values predicted by the model. The higher the R-squared, the better the model.
- Root Mean Squared Error (RMSE), which measures the average error made by the model in predicting the outcome for an observation. Mathematically, the RMSE is the square root of the mean squared error (MSE), which is the average squared difference between the observed outcome values and the values predicted by the model. So,
MSE = mean((observed - predicted)^2) and
RMSE = sqrt(MSE). The lower the RMSE, the better the model.
- Residual Standard Error (RSE), also known as the model sigma, is a variant of the RMSE adjusted for the number of predictors in the model. The lower the RSE, the better the model. In practice, the difference between RMSE and RSE is very small, particularly for large multivariate data.
- Mean Absolute Error (MAE). Like the RMSE, the MAE measures the prediction error. Mathematically, it is the average absolute difference between observed and predicted outcomes:
MAE = mean(abs(observed - predicted)). MAE is less sensitive to outliers than RMSE.
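The three error-based metrics above can be computed directly from paired observed and predicted values. Below is a minimal sketch in plain Python (no external libraries); the function name `regression_metrics` and the example numbers are my own for illustration:

```python
import math

def regression_metrics(observed, predicted):
    """Compute MSE, RMSE, MAE, and R-squared from paired observations."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mse = sum(r ** 2 for r in residuals) / n      # mean squared error
    rmse = math.sqrt(mse)                         # root of MSE
    mae = sum(abs(r) for r in residuals) / n      # mean absolute error
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - (mse * n) / ss_tot                   # proportion of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

metrics = regression_metrics([3, 5, 7, 9], [2.8, 5.4, 7.1, 8.7])
```

In practice you would typically use `mean_squared_error`, `mean_absolute_error`, and `r2_score` from scikit-learn, which implement the same formulas.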
There is a problem with the above metrics: they are sensitive to the inclusion of additional variables in the model, even if those variables do not contribute significantly to explaining the outcome. Including additional variables will always increase the R2 and reduce the RMSE. So we need a more robust metric to guide model choice.
Concerning R2, there is an adjusted version, called Adjusted R-squared, which penalizes the R2 for having too many variables in the model.
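The standard adjustment uses the sample size n and the number of predictors p; the sketch below implements that formula (the function name is my own):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: penalizes R2 for the number of predictors p,
    given n observations. Unlike R2, it can decrease when a useless
    variable is added to the model."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

For example, with the same raw R2 of 0.9 on 100 observations, a model with 10 predictors gets a lower adjusted R2 than a model with 5, reflecting the penalty for extra variables.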
Additionally, there are four other important metrics – AIC, AICc, BIC and Mallows Cp – that are commonly used for measuring model performance and for model selection. Each estimates the model's prediction error. The lower the values of these metrics, the better the model.
- AIC stands for Akaike's Information Criterion, a metric developed by the Japanese statistician Hirotugu Akaike in the early 1970s. The basic idea of AIC is to penalize the inclusion of additional variables in a model: it adds a penalty that grows as additional terms are included. The lower the AIC, the better the model.
- AICc is a version of AIC corrected for small sample sizes.
- BIC (the Bayesian Information Criterion) is a variant of AIC with a stronger penalty for including additional variables in the model.
- Mallows Cp: A variant of AIC developed by Colin Mallows.
Generally, the most commonly used metrics, for measuring regression model quality and for comparing models, are: Adjusted R2, AIC, BIC and Cp.
In this tutorial, I gave a brief overview of the techniques for measuring the performance of a regression model in Machine Learning. I hope you have enjoyed it. To get updates, like my Facebook page https://www.facebook.com/LearningBigDataAnalytics and stay connected.