Model Evaluation Techniques

Model evaluation plays a crucial role in developing a machine learning model. A predictive model that has not been checked for errors cannot be considered a fit model. To build a good model, you need to check its metrics, find where its errors come from, and keep improving it until it reaches the desired level of performance. In this blog, we jot down some important metrics that are helpful when evaluating machine learning models.

Various Performance Metrics

  1. Mean Absolute Error

An error can be defined as the difference between the actual value and the predicted value. The Mean Absolute Error, commonly known as the MAE, is the average of the absolute values of these errors. It measures how far, on average, the predictions are from the actual values of a continuous variable.

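Formula of MAE:

MAE = (1 / n) * ∑ |y – ŷ|

Here, n is the number of data points, y is the actual value, and ŷ is the predicted value. The MAE is calculated in the following steps: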


  • Subtract the predicted values from the actual y values. This gives you the errors.
  • Now, take the absolute value of each error.
  • Next, sum the absolute errors.
  • Finally, divide the sum by the number of data points to get the average.

In machine learning, the MAE is often used with regression models.
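The MAE calculation can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name mean_absolute_error and the sample values are made up for this example and are not taken from any particular library.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Subtract each predicted value from the actual value.
    errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    # Take the absolute value of each error.
    absolute_errors = [abs(e) for e in errors]
    # Sum the absolute errors and divide by the number of data points.
    return sum(absolute_errors) / len(absolute_errors)


# Made-up values, for illustration only.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.75
```

Because every error counts in proportion to its size, a single large miss does not dominate the MAE the way it does with squared-error metrics.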

  1. Mean Squared Error

Mean Squared Error, commonly known as the MSE, is the average of the squared errors. It is the loss function that least squares regression minimizes. To compute it, take the sum of the squares of the differences between the predicted target values and the actual target values, then divide by the number of data points.

The Mean Squared Error tells you how close a regression line is to a set of points. The smaller the MSE, the closer the line is to the points.

Steps for calculating MSE for a set of X and Y values:

  • Firstly, find the regression line.
  • Insert the values of X into the linear regression equation. This gives you the new (predicted) values of Y.
  • Now, subtract the new values of Y from the original values. This gives you the errors.
  • Now, square the errors.
  • After squaring, add up the squared errors.
  • Finally, divide the sum by the number of data points to find the mean.
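
The steps above can be sketched in plain Python as follows. This is a minimal sketch that assumes a simple one-variable least squares line; the helper names fit_line and mean_squared_error and the X and Y values are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Made-up values, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)            # find the regression line
new_ys = [slope * x + intercept for x in xs]   # insert X into the equation
mse = mean_squared_error(ys, new_ys)           # square, sum, and average the errors
print(round(slope, 3), round(intercept, 3), round(mse, 3))
```

Squaring the errors penalizes large misses much more heavily than small ones, which is why the MSE is more sensitive to outliers than the MAE.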
  1. Root Mean Squared Error

Commonly known as RMSE, the Root Mean Squared Error is the square root of the Mean Squared Error of a set of predictions. It measures how well a regression line fits the data points. It is quite similar to the MSE, but because the square root is taken, the error is expressed in the same units as the target variable. RMSE serves a double purpose:

  • It serves as a heuristic for training models.
  • It helps in evaluating trained models for usefulness and accuracy.

root_mean_squared_error = sqrt(mean_squared_error)
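
As a rough illustration of this relationship, the sketch below computes the MSE and then takes its square root. The function name and the sample values are made up for this example.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    mean_squared_error = sum(
        (y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)
    ) / len(y_true)
    return math.sqrt(mean_squared_error)


# Made-up values, for illustration only: every prediction is off by exactly 2.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [1.0, 7.0, 4.0, 5.0]
rmse = root_mean_squared_error(actual, predicted)
print(rmse)  # 2.0
```

Here the MSE is 4 (in squared units), while the RMSE is 2, which matches the size of the individual errors and is easier to interpret.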

  1. R-Squared

R-Squared can be defined as the statistical measure that represents the goodness of fit of a regression model. The ideal value of R-Squared is 1. The closer the value of R-Squared is to 1, the better the fitted model. R-Squared compares the residual sum of squares with the total sum of squares: R-Squared = 1 - (residual sum of squares / total sum of squares). The residual sum of squares is the sum of the squared distances between the data points and the fitted values, and the total sum of squares is the sum of the squared distances between the data points and the average line, that is, the horizontal line at the mean of the actual values.
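
A minimal sketch of this comparison, assuming the usual definition R-Squared = 1 - (residual sum of squares / total sum of squares). The function name r_squared and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """1 minus the ratio of the residual sum of squares to the total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared distances between data points and predictions.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared distances between data points and their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R-Squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only.
actual = [2.0, 4.0, 5.0, 7.0]
predicted = [2.1, 3.7, 5.3, 6.9]
r2 = r_squared(actual, predicted)
print(round(r2, 4))
```

A model that always predicts the mean of the actual values gets an R-Squared of 0 under this definition, and a model that predicts every point exactly gets 1.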

  1. Adjusted R-Squared

Adjusted R-Squared differs from R-Squared in that it takes into account the number of independent variables used for predicting the target variable. In this way, one can determine whether the addition of new variables actually improves the model fit. For a least squares model evaluated on its training data, R-Squared never decreases when a variable is added, even if that variable is useless. Adjusted R-Squared penalizes each extra variable, so if R-Squared does not increase enough with the addition of new variables, the value of Adjusted R-Squared will decrease.
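
The sketch below uses the commonly quoted formula Adjusted R-Squared = 1 - (1 - R-Squared) * (n - 1) / (n - p - 1), where n is the number of data points and p is the number of independent variables. The function name and the R-Squared values are made up to show the effect; they do not come from a real model.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjust R-Squared for the number of independent variables in the model."""
    degrees_of_freedom = n_samples - n_features - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more data points than independent variables plus one")
    return 1 - (1 - r2) * (n_samples - 1) / degrees_of_freedom


# Hypothetical scenario: a fourth variable raises R-Squared only slightly.
before = adjusted_r_squared(0.900, n_samples=50, n_features=3)
after = adjusted_r_squared(0.901, n_samples=50, n_features=4)
print(round(before, 4), round(after, 4))
print(after < before)  # True: the tiny gain does not justify the extra variable
```

In this made-up scenario R-Squared went up, yet Adjusted R-Squared went down, which is the signal that the new variable is not pulling its weight.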

  1. Accuracy Score

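Once your machine learning model has been built, the main task is to evaluate its performance. The accuracy score is the fraction of correct predictions out of the total number of predictions. In binary classification, it is calculated from the positives and negatives: accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives. For example, if a classifier that identifies malignancies has an accuracy of 91 percent, or 0.91, then 91 out of every 100 of its predictions are correct. This looks fine, but when malignant cases are rare, a high accuracy can hide the fact that most of them are missed, so accuracy should be read together with other metrics.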
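
As a small illustration, accuracy can be computed by counting matching labels. The function name and the labels below are made up for this example, with 1 standing for the positive class and 0 for the negative class.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Made-up labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
accuracy = accuracy_score(actual, predicted)
print(accuracy)  # 0.8
```

Eight of the ten predictions match, so the accuracy is 0.8. The score alone does not say whether the two mistakes were missed positives or false alarms.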

  1. Confusion Matrix

A confusion matrix can be defined as an N*N matrix that is used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with the values predicted by the machine learning model. Each cell counts how many data points of one actual class were predicted as a given class, so correct predictions lie on the diagonal.
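
A minimal sketch of building such a matrix in plain Python. The layout (rows for actual classes, columns for predicted classes) is an assumption made here; some references use the transpose. The function name, class names, and labels are made up for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build an N*N count matrix: rows are actual classes, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Made-up three-class example, for illustration only.
labels = ["bird", "cat", "dog"]
actual = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
predicted = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

In this example the diagonal holds the four correct predictions, and the off-diagonal cells show which classes the made-up model mixes up, for instance one bird predicted as a cat.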

  1. Precision and Recall

In layman's terms, precision is the ratio of true positives to all predicted positives: precision = TP / (TP + FP). Precision tells us how many of the data points that the model flagged as relevant are actually relevant. Recall is the ratio of true positives to all actual positives: recall = TP / (TP + FN). Recall gives an estimate of how many of the relevant data points your model manages to identify.
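
A short sketch of both ratios for a binary problem, using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name and labels are made up, and returning 0.0 when a denominator is zero is a convention assumed here, not a universal rule.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Assumed convention: report 0.0 when the ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Made-up labels, for illustration only: 4 actual positives, 3 predicted positives.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
precision, recall = precision_recall(actual, predicted)
print(round(precision, 3), recall)
```

Here two of the three predicted positives are correct (precision of about 0.667), but only two of the four actual positives are found (recall of 0.5).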

  1. F1-Score

In machine learning, the F1-Score can be defined as a measure of a test’s accuracy that combines the test’s precision and recall. It is the harmonic mean of the two: F1 = 2 * (precision * recall) / (precision + recall). The highest value of the F1-score is 1.0 and the lowest is 0.
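
A minimal sketch of the harmonic mean, F1 = 2 * (precision * recall) / (precision + recall). The function name and input values are made up, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 * precision * recall / (precision + recall)


# Made-up values, for illustration only.
balanced = f1_score(0.8, 0.8)
lopsided = f1_score(1.0, 0.1)
arithmetic_mean = (1.0 + 0.1) / 2

print(round(balanced, 3))
print(round(lopsided, 3), round(arithmetic_mean, 3))
```

The lopsided case shows why the harmonic mean is used: with perfect precision but very low recall, the arithmetic mean would still be 0.55, while the F1-score stays below 0.2.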

  1. AUC-ROC Curve

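AUC, that is, Area Under the Curve, and ROC, that is, Receiver Operating Characteristic, help you to determine and visualize how well a machine learning classifier is performing. The ROC curve plots the true positive rate against the false positive rate at different classification thresholds, and the AUC summarizes the curve as a single number: 1.0 for a classifier that separates the classes perfectly and about 0.5 for one that does no better than random guessing. In this sense, it shows how well the model separates the signal from the noise.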
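
As a rough sketch, the ROC curve can be traced by sorting the predicted scores and lowering the threshold step by step, and the AUC can then be estimated with the trapezoidal rule. The function names, labels, and scores below are made up for illustration; 1 marks the positive class.

```python
def roc_curve_points(y_true, scores):
    """Return (false positive rate, true positive rate) points for all thresholds."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by descending score; each distinct score acts as a threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and scores, for illustration only.
actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_curve_points(actual, scores)
auc = area_under_curve(points)
print(points)
print(auc)  # 0.75
```

An AUC of 0.75 here means that a randomly chosen positive example receives a higher score than a randomly chosen negative one in three out of four such pairs.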