What Is a Confusion Matrix in Machine Learning
In machine learning, a confusion matrix is an N x N matrix used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the model, so it both summarizes and visualizes how the classifier performs. From its entries you can compute metrics such as recall, precision, and accuracy; the AUC-ROC curve builds on the same counts, evaluated across a range of decision thresholds.
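To make the N x N idea concrete, here is a minimal sketch in plain Python that counts actual versus predicted labels. The function name `confusion_matrix`, the three class names, and the sample data are all hypothetical, and the layout (rows for actual classes, columns for predicted classes) is an assumption chosen for this illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Return an N x N list of lists: rows are actual classes, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical three-class data, for illustration only.
labels = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat", "bird"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")
```

Each cell `matrix[i][j]` holds the number of samples whose actual class is `labels[i]` and whose predicted class is `labels[j]`.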
The diagonal elements denote correctly classified outcomes, and the misclassified outcomes appear off the diagonal. Hence, a perfect classifier has a confusion matrix in which only the diagonal elements are non-zero. The matrix is built by comparing the actual values with the predicted values after classification, and the effectiveness of the system is judged from the counts it contains. For a two-class classifier, the confusion matrix has the following form.
Confusion Matrix Table:
Actual class | Predicted: Positive | Predicted: Negative |
---|---|---|
Positive | True positive (TP) | False negative (FN) |
Negative | False positive (FP) | True negative (TN) |
The entries in the confusion matrix are defined as follows:
- True Positive (TP) is the number of predictions where the model predicted positive and the actual class was positive.
- False Positive (FP) is the number of predictions where the model predicted positive but the actual class was negative. This is also known as the Type 1 error.
- True Negative (TN) is the number of predictions where the model predicted negative and the actual class was negative.
- False Negative (FN) is the number of predictions where the model predicted negative but the actual class was positive. This is also known as the Type 2 error.
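The four entries can be counted directly from a list of actual labels and a list of predictions. The sketch below assumes labels are encoded as 1 for positive and 0 for negative; the function name `binary_counts` and the sample data are hypothetical. Here a false positive is a negative sample predicted as positive, and a false negative is a positive sample predicted as negative.

```python
def binary_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        elif a == positive:
            fn += 1  # predicted negative, actually positive (Type 2 error)
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


# Hypothetical labels, for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = binary_counts(actual, predicted)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```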
Accuracy (AC) measures the overall correctness of the system: it is the proportion of all predictions that the classifier got right. It is calculated by the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
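As a quick worked example, accuracy is the number of correct predictions (TP + TN) divided by the total number of predictions. The function name `accuracy` and the counts passed to it are hypothetical values chosen for illustration.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts: 8 correct predictions out of 10.
acc = accuracy(tp=3, fp=1, tn=5, fn=1)
print(acc)  # 0.8
```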
Recall is the proportion of actual positive inputs that are correctly identified. It is also called the true positive (TP) rate and is given by the equation:

$$\text{Recall} = \frac{TP}{TP + FN}$$
Precision is the proportion of cases predicted positive by the classifier that are actually positive. It is given by the equation:

$$\text{Precision} = \frac{TP}{TP + FP}$$
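Recall and precision share the same numerator (TP) and differ only in the denominator: recall divides by all actual positives (TP + FN), precision by all predicted positives (TP + FP). The sketch below uses hypothetical counts, and returning 0.0 when a denominator is zero is an assumption made here for convenience, not a universal convention.

```python
def recall(tp, fn):
    """TP rate: share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 3, 1, 2
print(recall(tp, fn))     # 0.6
print(precision(tp, fp))  # 0.75
```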
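The F1 score, or F-measure, is also a measure of the test's accuracy. It is defined as the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

It has its best value at 1 and its worst at 0.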
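The F1 score can be sketched as the harmonic mean of precision and recall, computed from the same counts. The function name `f1_score` and the counts are hypothetical, and returning 0.0 when precision and recall are both zero is an assumption made to avoid division by zero.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision is 0.75 and recall is 0.6.
score = f1_score(tp=3, fp=1, fn=2)
print(round(score, 3))  # 0.667
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot get a high F1 score by doing well on only one of precision or recall.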
In another article, How to Measure Performance of a Regression Type Model in Machine Learning, I show how to measure the performance of a regression model. I hope you enjoy that article as well.
In this tutorial, I gave a brief overview of the confusion matrix in machine learning. I hope you enjoyed it. To get updates, like my Facebook page https://www.facebook.com/LearningBigDataAnalytics and stay connected.