Confusion Matrix Calculator

Calculate and visualize classification performance metrics from confusion matrix values. Analyze accuracy, precision, recall, F1 score, and more for machine learning models.

Calculate Your Confusion Matrix Metrics

Enter the values for your confusion matrix. The example below uses TP = 80, FN = 30, FP = 20, TN = 70.

Confusion Matrix

                    Predicted Positive    Predicted Negative
Actual Positive     80 (TP)               30 (FN)
Actual Negative     20 (FP)               70 (TN)

What is a Confusion Matrix?

A confusion matrix is a table used to evaluate the performance of a classification model. It presents a summary of prediction results, showing the counts of true positives, false positives, true negatives, and false negatives. This visualization makes it easy to see not just how many predictions were correct, but also what types of errors were made.

Understanding the Components

A standard 2×2 confusion matrix contains four key components:

  • True Positives (TP): Cases correctly identified as positive (e.g., correctly diagnosed patients with a disease).
  • False Positives (FP): Negative cases incorrectly identified as positive (e.g., healthy patients incorrectly diagnosed with a disease). These are also known as Type I errors.
  • False Negatives (FN): Positive cases incorrectly identified as negative (e.g., sick patients incorrectly diagnosed as healthy). These are also known as Type II errors.
  • True Negatives (TN): Cases correctly identified as negative (e.g., correctly identified healthy patients).

Key Performance Metrics

From the confusion matrix, several metrics can be calculated to evaluate different aspects of model performance:

Basic Metrics

  • Accuracy: The proportion of all predictions that were correct.

    Accuracy = (TP + TN) / (TP + FP + FN + TN)

  • Precision (Positive Predictive Value): The proportion of positive identifications that were actually correct.

    Precision = TP / (TP + FP)

  • Recall (Sensitivity): The proportion of actual positives that were correctly identified.

    Recall = TP / (TP + FN)

  • Specificity: The proportion of actual negatives that were correctly identified.

    Specificity = TN / (TN + FP)
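
To make the formulas above concrete, here is a minimal Python sketch that plugs in the example matrix shown at the top of the page (TP = 80, FN = 30, FP = 20, TN = 70); the variable names are illustrative and not tied to any particular library:

    # Example counts from the confusion matrix shown above.
    tp, fn = 80, 30
    fp, tn = 20, 70

    accuracy    = (tp + tn) / (tp + fp + fn + tn)   # (80 + 70) / 200 = 0.75
    precision   = tp / (tp + fp)                    # 80 / 100 = 0.80
    recall      = tp / (tp + fn)                    # 80 / 110 ≈ 0.727
    specificity = tn / (tn + fp)                    # 70 / 90 ≈ 0.778

    print(f"Accuracy:    {accuracy:.3f}")
    print(f"Precision:   {precision:.3f}")
    print(f"Recall:      {recall:.3f}")
    print(f"Specificity: {specificity:.3f}")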

Advanced Metrics

  • F1 Score: The harmonic mean of precision and recall, providing a single measure that balances both concerns.

    F1 = 2 * (Precision * Recall) / (Precision + Recall)

  • Matthews Correlation Coefficient (MCC): A correlation coefficient between predicted and observed classifications, ranging from -1 to +1, with +1 representing perfect prediction.

    MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

  • Balanced Accuracy: The average of sensitivity and specificity, useful when classes are imbalanced.

    Balanced Accuracy = (Sensitivity + Specificity) / 2
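
Continuing the same example, a short sketch of the advanced metrics; math.sqrt handles the MCC denominator, and precision, recall, and specificity are recomputed so the block is self-contained:

    import math

    tp, fn, fp, tn = 80, 30, 20, 70

    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)
    specificity = tn / (tn + fp)

    f1 = 2 * precision * recall / (precision + recall)            # ≈ 0.762
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )                                                             # ≈ 0.503
    balanced_accuracy = (recall + specificity) / 2                # ≈ 0.753

    print(f"F1 score:          {f1:.3f}")
    print(f"MCC:               {mcc:.3f}")
    print(f"Balanced accuracy: {balanced_accuracy:.3f}")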

Choosing the Right Metric

The importance of different metrics varies depending on the specific application:

  • When false positives are costly: Prioritize precision (e.g., spam detection, where falsely marking legitimate emails as spam is problematic).
  • When false negatives are costly: Prioritize recall (e.g., cancer screening, where missing a cancer diagnosis is more serious than a false alarm).
  • When classes are balanced: Accuracy or F1 score may suffice.
  • When classes are imbalanced: Consider balanced accuracy, MCC, or area under the ROC curve instead of simple accuracy.

Practical Applications

Confusion matrices and their derived metrics are widely used in:

  • Medical Diagnostics: Evaluating the performance of diagnostic tests or predictive models for disease detection.
  • Machine Learning: Assessing and comparing classification algorithms in tasks like image recognition, natural language processing, and predictive maintenance.
  • Information Retrieval: Evaluating search engines and recommendation systems.
  • Quality Control: Evaluating inspection processes that classify products as defective or non-defective.
  • Biometrics: Evaluating systems for facial recognition, fingerprint matching, and other identity verification methods.

Using the Confusion Matrix Calculator

Our calculator makes it easy to evaluate classification model performance:

  1. Enter the counts for each confusion matrix component (TP, FP, FN, TN).
  2. Click "Calculate Metrics" to compute performance statistics.
  3. Switch between the "Matrix" and "Metrics" tabs to see the confusion matrix visualization and detailed performance metrics.
  4. Use the results to assess model performance and identify areas for improvement.

Whether you're evaluating a medical diagnostic test, a machine learning model, or any other binary classifier, our confusion matrix calculator provides the insights you need to understand and improve performance.
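
If your labels and predictions live in code rather than in a table, the same counts and metrics can also be computed programmatically. The sketch below assumes scikit-learn is installed and uses made-up labels purely for illustration:

    from sklearn.metrics import confusion_matrix, classification_report, matthews_corrcoef

    # Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

    # For binary 0/1 labels, ravel() unpacks the 2x2 matrix as tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")

    print(classification_report(y_true, y_pred))     # precision, recall, F1 per class
    print("MCC:", matthews_corrcoef(y_true, y_pred))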

Frequently Asked Questions

What is a confusion matrix?

A confusion matrix is a table used to evaluate the performance of a classification model. It compares predicted classifications against actual classifications, breaking down results into four categories:
  • True Positives (TP): Correctly predicted positive cases
  • False Positives (FP): Incorrectly predicted positive cases (Type I error)
  • True Negatives (TN): Correctly predicted negative cases
  • False Negatives (FN): Incorrectly predicted negative cases (Type II error)

How do I interpret a confusion matrix?

A confusion matrix shows how well your model is performing by breaking down its predictions:
  • The diagonal elements (TP and TN) represent correct predictions
  • The off-diagonal elements (FP and FN) represent errors
Higher numbers in the diagonal and lower numbers in the off-diagonal indicate better model performance.

What metrics can be calculated from a confusion matrix?

A confusion matrix enables calculation of several performance metrics:
  • Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • Precision: TP / (TP + FP)
  • Recall (Sensitivity): TP / (TP + FN)
  • Specificity: TN / (TN + FP)
  • F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
  • MCC: (TP × TN - FP × FN) / √((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN))

What is the difference between precision and recall?

Precision answers: "Of all instances predicted as positive, how many were actually positive?" It focuses on false positives. Formula: TP / (TP + FP)

Recall answers: "Of all actual positive instances, how many did we correctly predict?" It focuses on false negatives. Formula: TP / (TP + FN)

They represent a trade-off: improving one typically reduces the other.
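
One way to see this trade-off is to classify the same set of model scores at two different decision thresholds. The scores and labels below are made up purely for illustration:

    # Hypothetical model scores (sorted high to low) and true labels (1 = positive).
    scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
    labels = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

    def precision_recall(threshold):
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
        return tp / (tp + fp), tp / (tp + fn)

    # A high threshold favors precision; a low threshold favors recall.
    for t in (0.75, 0.35):
        p, r = precision_recall(t)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")

With this data, a threshold of 0.75 yields precision 1.00 but recall 0.60, while lowering it to 0.35 recovers every positive (recall 1.00) at the cost of precision (0.71).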

When should I use the F1 score?

Use the F1 score when you need to balance precision and recall, especially with imbalanced datasets. It's the harmonic mean of precision and recall, giving more weight to lower values. This makes it particularly useful when false positives and false negatives have similar costs and you can't afford to let either precision or recall collapse.

What is the Matthews Correlation Coefficient (MCC)?

The MCC produces a value between -1 and +1, where +1 represents perfect prediction, 0 represents random prediction, and -1 indicates total disagreement. Unlike accuracy, MCC works well even with imbalanced datasets. It's calculated using all four confusion matrix values: MCC = (TP×TN - FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

Why can accuracy be misleading?

Accuracy can be misleading with imbalanced datasets. For example, in a dataset with 95% negative cases, simply predicting 'negative' for every instance would give 95% accuracy without any real learning. In such cases, metrics like precision, recall, F1 score, or MCC give more informative evaluations of model performance.
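
A tiny worked example of this failure mode, with made-up counts: an "always negative" classifier on 1,000 cases of which only 50 are positive.

    # Always-negative classifier on an imbalanced dataset (hypothetical counts).
    tp, fp = 0, 0        # it never predicts positive
    fn, tn = 50, 950     # so it misses all 50 actual positives

    accuracy = (tp + tn) / (tp + fp + fn + tn)       # 950 / 1000 = 0.95
    recall = tp / (tp + fn)                          # 0 / 50 = 0.0
    specificity = tn / (tn + fp)                     # 950 / 950 = 1.0
    balanced_accuracy = (recall + specificity) / 2   # 0.5 -- no better than chance

    print(accuracy, recall, balanced_accuracy)       # 0.95 0.0 0.5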

How do ROC curves relate to the confusion matrix?

ROC (Receiver Operating Characteristic) curves plot the true positive rate (recall) against the false positive rate (1 - specificity) at various classification thresholds. These rates come directly from the confusion matrix. The area under the ROC curve (AUC) measures the model's ability to distinguish between classes, with 1.0 being perfect and 0.5 being no better than random.
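
A minimal sketch of tracing an ROC curve with scikit-learn (assuming it is installed); the labels and scores are made up for illustration, and each (FPR, TPR) point corresponds to the confusion matrix at one threshold:

    from sklearn.metrics import roc_curve, roc_auc_score

    # Hypothetical true labels and model scores for the positive class.
    y_true  = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
    y_score = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print("AUC:", roc_auc_score(y_true, y_score))

    # Each point on the curve is one confusion matrix evaluated at one threshold.
    for f, t, th in zip(fpr, tpr, thresholds):
        print(f"threshold={th:.2f}: TPR={t:.2f}, FPR={f:.2f}")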

How do I choose the right metric for my problem?

Choose your primary metric based on the costs of different errors in your specific application. Use precision when false positives are more costly (e.g., spam detection), recall when false negatives are more costly (e.g., disease detection), F1 score when both error types matter equally, accuracy for balanced datasets, and MCC for imbalanced datasets where all four confusion matrix categories matter.

Can a confusion matrix be used for multi-class classification?

Yes, for multi-class problems, a confusion matrix becomes an N×N table where N is the number of classes. Each row represents actual classes and each column represents predicted classes. The diagonal elements represent correct predictions for each class, while off-diagonal elements show misclassifications. Class-specific metrics can be calculated for each class or averaged across all classes.
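
A short sketch of a 3×3 case with scikit-learn; the class names are purely illustrative:

    from sklearn.metrics import confusion_matrix, classification_report

    # Hypothetical three-class labels.
    classes = ["cat", "dog", "bird"]
    y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "cat", "dog", "bird"]
    y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "cat", "dog", "bird"]

    # Rows are actual classes, columns are predicted classes.
    print(confusion_matrix(y_true, y_pred, labels=classes))

    # Per-class precision, recall, and F1, plus macro and weighted averages.
    print(classification_report(y_true, y_pred, labels=classes))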
