Python for Machine Learning

Machine Learning Classifiers Comparison with Python

Evaluating and Comparing Classifier Performance

Roberto Salazar
Towards Data Science
6 min read · Jun 4, 2020


Image by Kevin Ku available at Unsplash

Machine Learning Classifiers

Machine learning classifiers are models that predict the category (class) of a data point when labeled training data is available, i.e., in supervised learning. Some of the most widely used algorithms are logistic regression, Naïve Bayes, linear classifiers trained with stochastic gradient descent (SGD), k-nearest neighbors, decision trees, random forests, and support vector machines.
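As a minimal sketch, assuming scikit-learn is installed, each of these algorithms maps to a scikit-learn estimator class. The example below fits all seven on the Iris dataset bundled with scikit-learn and reports held-out accuracy; the dataset, the 70/30 split and the hyperparameters are illustrative choices, not ones taken from this article. GaussianNB stands in for Naïve Bayes, and SGDClassifier (a linear model trained with stochastic gradient descent) stands in for the SGD entry.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One scikit-learn estimator per algorithm named in the text.
# GaussianNB is only one of several Naive Bayes variants in scikit-learn.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Stochastic Gradient Descent": SGDClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": SVC(),
}

# Labeled data (supervised learning): features X and known categories y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Every estimator shares the same fit/score interface.
test_accuracy = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)  # mean accuracy

for name, acc in test_accuracy.items():
    print(f"{name:<28} {acc:.3f}")
```

Because every estimator exposes the same interface, trying another algorithm only means changing one dictionary entry.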

Choosing the Right Estimator

Choosing the right estimator for a given task is one of the most critical and difficult parts of solving a machine learning problem, because each estimator suits a particular type of data and problem. Scikit-learn, one of the most popular Python libraries for machine learning, provides the following chart to guide users through the process of choosing the most appropriate estimator.

Image by scikit-learn available at scikit-learn.org
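The chart narrows down the candidates, but the final choice is usually confirmed empirically. A minimal sketch, assuming scikit-learn is installed: 5-fold stratified cross-validation scores a few candidate estimators on the breast cancer dataset bundled with scikit-learn, with feature scaling placed in a pipeline for the scale-sensitive models. The candidate list, dataset and fold count are illustrative assumptions, not recommendations from the chart.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Illustrative shortlist; scale-sensitive models get a StandardScaler step.
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same folds for every candidate so the scores are directly comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())

for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<20} {mean:.3f} +/- {std:.3f}")

best = max(results, key=lambda name: results[name][0])
print("Highest mean accuracy:", best)
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into preprocessing.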

Performance Evaluation Metrics

Classification models must be evaluated to determine how effectively they perform a specific task. A good classification model produces reliable predictions, whereas a poor one leads to unreliable outcomes and is therefore of little use.

Performance evaluation metrics are computed from the counts of the following four outcomes:

  • True Positives: outcome correctly predicted as positive class
  • True Negatives: outcome correctly predicted as negative class
  • False Positives: outcome incorrectly predicted as positive class
  • False Negatives: outcome incorrectly predicted as negative class

These counts are arranged in a matrix known as the confusion matrix, where one axis represents the label predicted by the machine learning model and the other represents the actual label.
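The four counts can be tallied directly from actual and predicted labels. The sketch below uses plain Python; confusion_counts and confusion_matrix_2x2 are hypothetical helper names, and the 2x2 layout (rows = actual label, columns = predicted label, negative class first) mirrors the one scikit-learn's confusion_matrix returns for binary labels 0 and 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # correctly predicted as positive class
            else:
                fp += 1  # incorrectly predicted as positive class
        else:
            if actual == positive:
                fn += 1  # incorrectly predicted as negative class
            else:
                tn += 1  # correctly predicted as negative class
    return tp, tn, fp, fn


def confusion_matrix_2x2(y_true, y_pred, positive=1):
    """Rows = actual label, columns = predicted label, negative class first."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return [[tn, fp],
            [fn, tp]]


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                        # → 3 2 2 1
print(confusion_matrix_2x2(y_true, y_pred))  # → [[2, 2], [1, 3]]
print((tp + tn) / len(y_true))               # accuracy → 0.625
```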
