Python for Machine Learning
Machine Learning Classifiers Comparison with Python
Evaluating and Comparing Classifiers Performance
Machine Learning Classifiers
Machine learning classifiers are models that predict the category of a data point after being trained on labeled data (that is, supervised learning). Some of the most widely used algorithms are logistic regression, Naïve Bayes, linear models trained with stochastic gradient descent, k-nearest neighbors, decision trees, random forests, and support vector machines.
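As a rough illustration of how these algorithms can be tried side by side, the sketch below fits one scikit-learn estimator per algorithm on a synthetic dataset and reports test accuracy. The dataset (generated with `make_classification`), the split size, and the hyperparameters are assumptions chosen only for demonstration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One estimator per algorithm named in the text, mostly default settings.
classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Stochastic gradient descent": SGDClassifier(random_state=0),
    "K-nearest neighbors": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Support vector machine": SVC(),
}

# Fit each model on the training split and record accuracy on the test split.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Because every scikit-learn classifier exposes the same `fit` and `score` methods, the comparison loop stays identical no matter which estimators are placed in the dictionary.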
Choosing the Right Estimator
Determining the right estimator for a given job is one of the most critical and hardest parts of solving a machine learning problem. Each estimator is suited to a specific type of data and problem. Scikit-learn, one of the most popular Python libraries for machine learning, provides the following chart to guide the user through the process of choosing the most appropriate estimator.
Performance Evaluation Metrics
Classification models must be evaluated to determine how effectively they perform a specific task. Good classification models are useful for prediction, whereas poor ones produce unreliable outcomes and are therefore of little use to the user.
Performance evaluation metrics are based on the counts of the following four outcomes:
- True Positives: outcome correctly predicted as positive class
- True Negatives: outcome correctly predicted as negative class
- False Positives: outcome incorrectly predicted as positive class
- False Negatives: outcome incorrectly predicted as negative class
These counts are arranged in a matrix, known as the confusion matrix, in which one axis is the label predicted by the machine learning model and the other is the actual label.
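The four outcomes above can be counted directly from a pair of label lists. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the example labels are hypothetical and chosen only for illustration. Rows are laid out as actual labels and columns as predicted labels, which is the same orientation that scikit-learn's `sklearn.metrics.confusion_matrix` uses.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # correctly predicted as positive
            else:
                fp += 1  # incorrectly predicted as positive
        else:
            if actual == positive:
                fn += 1  # incorrectly predicted as negative
            else:
                tn += 1  # correctly predicted as negative
    return tp, tn, fp, fn


# Made-up labels for illustration (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Rows: actual label (0, 1); columns: predicted label (0, 1).
matrix = [[tn, fp],
          [fn, tp]]

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
for row in matrix:
    print(row)
```

With these labels, three positives and three negatives are predicted correctly, one negative is mistaken for a positive, and one positive is missed.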