Are you confused by the Confusion Matrix?

No worries!!!! By the end of this blog you will be super confident about the Confusion Matrix.

Let's get started with a problem statement: the Social Network Ads dataset. Using this dataset, we have to develop a model that predicts whether a person will purchase or not after clicking the ad. I assume you have already created the model; now the crucial and challenging part is validating it. That is where the "Confusion Matrix" comes in.

Dataset of the Problem Statement

I have created the model using Random Forest to explain the confusion matrix. Kindly see the image below and read the parameters shown there.

Confusion Matrix and Classification Report of the model

In[15]: Confusion Matrix of the created model

In[17]: Classification Report of the created model
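Before we interpret those numbers, here is a minimal sketch of how such a notebook cell might look, assuming a scikit-learn workflow. The file name, the feature columns ('Age', 'EstimatedSalary'), the target column ('Purchased') and the Random Forest hyperparameters are my assumptions, not necessarily the exact ones used in this post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report

df = pd.read_csv("Social_Network_Ads.csv")        # hypothetical file name
X = df[["Age", "EstimatedSalary"]]                # assumed feature columns
y = df["Purchased"]                               # target: 0 = No, 1 = Yes

# Train on the training set, evaluate on the test set (roughly a 67/33 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))           # the 2x2 matrix discussed below
print(classification_report(y_test, y_pred))      # precision, recall, f1-score, support
```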

I hope you have read the parameters in the image. Your brain might be wondering: what are these parameters trying to say? What are Precision, Recall, F1-score, support, accuracy, macro average and weighted average? So many parameters!
To understand them, you first have to know: what is the "Evaluation Metric" for classification?

Yeah, your guess is right!!!! "Accuracy". OK, great, but how do you calculate it? That is where the Confusion Matrix comes in.

Let's see the syntax of the Confusion Matrix first. To understand the core concept, I have renamed "No" as "Apple" and "Yes" as "Orange".

Confusion Matrix-Syntax

To construct the confusion matrix, we need two lists: one is the "Actual answer" (the labels of the Test set from the original dataset) and the other is the "Predicted answer" (predicted by the created model for the same inputs in the Test set).

Important Note: evaluation metrics should be computed on the Test set, and the model should be built on the Training set.
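To make the idea concrete, here is a tiny sketch with two hand-made lists of actual and predicted labels (toy values, not the real test set) fed to scikit-learn's confusion_matrix:

```python
from sklearn.metrics import confusion_matrix

# Toy example (made-up labels) just to show the mechanics
actual    = ["Apple", "Apple", "Orange", "Apple", "Orange", "Orange"]
predicted = ["Apple", "Orange", "Orange", "Apple", "Apple",  "Orange"]

# Rows = actual class, columns = predicted class, in the order given by `labels`
cm = confusion_matrix(actual, predicted, labels=["Apple", "Orange"])
print(cm)
# [[2 1]    -> 2 True Apples, 1 Apple misclassified as Orange
#  [1 2]]   -> 1 Orange misclassified as Apple, 2 True Oranges
```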

In the dataset the total count is 400:
Training set ≈ 67 %
Test set ≈ 33 % => 134 rows.

Now look at the syntax of the confusion matrix in the image for a couple of minutes and try to understand it by yourself.
I hope you are looking at the image. OK, great. Did you find any logic over there? Like, what are True Apple, False Apple, False Orange and True Orange?

If you understood it from the image, that's great. Just cross-check your understanding below:

1) True Apple: the model has correctly classified an Apple as an Apple (correct classification).

2) False Apple: the model has wrongly classified an Orange as an Apple (the actual class is Orange, but it was predicted as Apple).

3) False Orange: the model has wrongly classified an Apple as an Orange (the actual class is Apple, but it was predicted as Orange).

4) True Orange: the model has correctly classified an Orange as an Orange (correct classification).
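The four cell counts for this model can be reconstructed from the numbers quoted later in this post (78 True Apples out of 85 actual Apples, 43 True Oranges, 6 Oranges predicted as Apples, 134 test rows). A quick sketch, assuming that reconstruction is right:

```python
import numpy as np

# Reconstructed confusion matrix for the article's run (rows = actual, columns = predicted)
cm = np.array([[78,  7],    # actual Apple:  78 True Apple, 7 False Orange
               [ 6, 43]])   # actual Orange:  6 False Apple, 43 True Orange

true_apple, false_orange, false_apple, true_orange = cm.ravel()
print(true_apple, false_apple, false_orange, true_orange)   # 78 6 7 43
print(cm.sum())                                             # 134 test samples in total
```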

Great, you have crossed the first milestone of the concept!

Using these four cells, you can answer quite a number of questions.

Let's see them one by one:
1) Accuracy
2) Recall
3) Precision
4) F1-Score
5) Macro Average
6) Weighted Average

1) Accuracy

Accuracy tells how often the model classified Apple and Orange correctly.

In the given problem statement it is the 78 samples correctly classified as Apple plus the 43 correctly classified as Orange, divided by the total count of the test set: (78 + 43) / 134 ≈ 90 %.

It indirectly tells the performance of the model: with the help of accuracy you can see the overall picture.
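A quick sketch of that arithmetic, using the counts quoted above:

```python
# Accuracy = (True Apple + True Orange) / total test samples
true_apple, true_orange, total = 78, 43, 134
accuracy = (true_apple + true_orange) / total
print(round(accuracy, 3))   # 0.903 -> about 90%
```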

2) Recall

Recall tells how many samples of each class the model classified correctly, out of the total count of that class in the test set.

In the given image you can see the pattern: Recall is calculated separately for each class.

OK, when should I look at Recall?

Good question: when the dataset is imbalanced.

Recall: Formula

In the above image you can find the formula. Let's see how to interpret Recall.

Recall for Apple = 78 (total number correctly classified as Apple) / 85 (total count of Apple in the test set)

So the model correctly classifies about 92 % of the Apples.

Likewise, if you want to know how the model is classifying Orange alone, kindly do the calculation by yourself for understanding (don't forget to visualize it).

I hope you have visualised it for Orange. That's great!!

Important: Recall should be high for a good model.
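Here is the arithmetic for both classes, assuming the class counts (85 Apples, 49 Oranges) reconstructed from the numbers above:

```python
# Recall = correctly classified samples of a class / all actual samples of that class
recall_apple  = 78 / 85          # 78 True Apples out of 85 actual Apples
recall_orange = 43 / (43 + 6)    # 43 True Oranges out of 49 actual Oranges
print(round(recall_apple, 2), round(recall_orange, 2))   # 0.92 0.88
```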

3) Precision

In the above image you can see the question Precision answers: out of everything the model predicted as a given class, how much was actually that class?

In other words, Precision compares the correct and wrong classifications among the model's predictions for that class.

Precision: Formula

In the above image you can find the formula. Let's see how to interpret Precision.

Precision for Apple = 78 (total number correctly classified as Apple) / (78 (correctly classified as Apple) + 6 (Oranges wrongly classified as Apple))

So when the model predicts Apple, it is right about 93 % of the time. Likewise, if you want to know how precisely the model classifies Orange alone, kindly do the calculation by yourself for understanding (don't forget to visualize it). I hope you have visualised it for Orange. That's great!!

Important: Precision should be high for a good model.
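And the same arithmetic for both classes, again using the reconstructed cell counts:

```python
# Precision = correctly classified samples of a class / all samples PREDICTED as that class
precision_apple  = 78 / (78 + 6)   # 6 Oranges were wrongly predicted as Apple
precision_orange = 43 / (43 + 7)   # 7 Apples were wrongly predicted as Orange
print(round(precision_apple, 2), round(precision_orange, 2))   # 0.93 0.86
```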

4) F1-Score: Question

What if the model shows high precision and low recall? How will you judge the model's performance then?

The F1-Score solves this issue.

F1-Score: Formula

F1-Score is the harmonic mean of Precision and Recall. It gives the overall performance of the model (especially useful for imbalanced data).
F1-Score should be high for a good model.
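A small sketch of the standard formula, applied to the per-class values computed above:

```python
# F1 = harmonic mean of Precision and Recall: 2 * P * R / (P + R)
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(78 / 84, 78 / 85), 2))    # Apple:  ~0.92
print(round(f1(43 / 50, 43 / 49), 2))    # Orange: ~0.87
```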

5) Macro Average

It gives the simple (unweighted) average of each metric across the classes: every class counts equally.

6) Weighted Average

The Weighted Average weights each class's score by its support, i.e. the proportion of that class in the test set.
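A short sketch of the difference, using Recall and the reconstructed class supports (85 Apples, 49 Oranges):

```python
# Macro average:    simple mean of the per-class scores (every class counts equally)
# Weighted average: mean of the per-class scores weighted by each class's support
recall_apple, recall_orange = 78 / 85, 43 / 49
support_apple, support_orange = 85, 49

macro_recall    = (recall_apple + recall_orange) / 2
weighted_recall = (support_apple * recall_apple + support_orange * recall_orange) / (support_apple + support_orange)

print(round(macro_recall, 2))      # ~0.90
print(round(weighted_recall, 2))   # ~0.90 (for recall, this equals the overall accuracy)
```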

Now look back at the classification report: wow, you can see the insight behind each term. Right?!

See you in the next post!!!!!

Upcoming post on

What is AI?

What is ML? and so on

Stay Tuned!!!!!!!!

If you liked it, do not forget to comment!