
Types of Supervised Machine Learning

By Anthony Lui


Machine learning is a branch of artificial intelligence. It focuses on using datasets and algorithms to imitate the way humans learn, identifying patterns and trends on its own. There are two main subcategories of machine learning: supervised and unsupervised. In supervised machine learning, the model is given a dataset labeled with the corresponding inputs and outputs, and it learns to classify data and predict outputs from that training dataset.

On the other hand, unsupervised learning uses unlabeled datasets, identifying hidden features and patterns without human intervention. So what's the use of these two types of machine learning? With unsupervised machine learning, the goal is to gain insight into the patterns and features of a large set of data, analyzing existing data to recognize defining attributes. In supervised machine learning, the objective is to train the model so that it can predict the output of new data. There are two types of supervised machine learning: classification and regression. In classification, the goal is to assign input data to a specific category, meaning the model predicts discrete values or labels. Regression, on the other hand, aims to predict a continuous numerical value or quantity based on the input variables, which can take on any value within a specific range.
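To make the distinction concrete, here is a minimal sketch using scikit-learn with made-up toy data (the feature values and targets below are purely illustrative, not from any real dataset): a classifier predicts a discrete label, while a regressor predicts a continuous number.

# A minimal sketch contrasting classification and regression (toy, made-up data).
from sklearn.linear_model import LogisticRegression, LinearRegression

# Single feature (e.g. hours studied); values are purely illustrative.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]

# Classification target: discrete labels (0 = fail, 1 = pass).
y_class = [0, 0, 0, 1, 1, 1]
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[5.0]]))   # -> a discrete label, e.g. [1]

# Regression target: continuous values (e.g. exam score).
y_reg = [52.0, 58.0, 65.0, 71.0, 78.0, 85.0]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[3.5]]))   # -> a continuous number, roughly 68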


Some examples of supervised machine learning models for classification are logistic regression, decision trees, random forests, and MLPs.



Logistic regression models are used when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a form of predictive analysis: it describes and dissects the relationship between the dependent variable and the independent variables.


This model uses the logistic function to form an equation between x and y. If plotted, an S-shaped curve is formed between 0 and 1 on the y-axis, meaning the function can only return values between 0 and 1 for the dependent variable. In my model, the output values are rounded to the nearest class: answers below 0.5 are rounded to 0, and answers above 0.5 are rounded to 1, so the logistic function returns a binary outcome.
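As a rough sketch of this rounding step (a simplified illustration, not the exact model described above), the logistic function can be applied to a linear combination of the inputs and the result thresholded at 0.5; the weights and inputs below are made up.

import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (made-up) weight and bias for a single feature.
w, b = 1.7, -4.0
x = np.array([1.0, 2.0, 3.0, 4.0])                 # toy input values

probabilities = sigmoid(w * x + b)                  # S-curve outputs between 0 and 1
predictions = (probabilities >= 0.5).astype(int)    # round: below 0.5 -> 0, otherwise -> 1

print(probabilities)   # roughly [0.09 0.35 0.75 0.94]
print(predictions)     # [0 0 1 1]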




A decision tree is a graph that forms a tree containing multiple choices and results. The nodes in the graph represent choices, while the edges represent the decision rules or conditions. Using only yes/no or true/false answers to certain questions, the model can combine multiple features to predict the final output. However, no single feature can perfectly predict the final output, so we use measures such as Gini and entropy to quantify the impurity of the identified features (Dash, 2022).


Entropy measures the disorder of a node, providing a value between 0 and 1, where 1 is the highest level of disorder and impurity of the variable and 0 is the minimum. This value is calculated through the formula:

Entropy = −Σᵢ pᵢ log₂(pᵢ)

where pᵢ is the probability of randomly selecting an example in class i.


Through the calculated entropy for each variable and its potential splits, the model is able to determine the root node. For each variable's potential split, the weighted average entropy of the resulting child nodes is calculated and subtracted from the parent node's entropy. This change in entropy is called Information Gain, and it represents how much information a certain feature provides about the target variable. The variable with the highest Information Gain is then selected as the root node.
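Here is a small sketch of how that calculation might look in code, using toy class counts rather than the article's actual data: compute the entropy of the parent node, the weighted average entropy of the child nodes produced by a candidate split, and take the difference as the Information Gain.

import math

def entropy(counts):
    # counts: number of examples of each class in a node, e.g. [on, off]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Toy parent node: 10 "On" and 10 "Off" examples (made-up counts).
parent = [10, 10]

# A candidate split produces two child nodes (also made-up counts).
left, right = [8, 2], [2, 8]

n = sum(parent)
weighted_child_entropy = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
information_gain = entropy(parent) - weighted_child_entropy

print(round(entropy(parent), 3))    # 1.0 -> maximum impurity for two classes
print(round(information_gain, 3))   # about 0.278 for this particular split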


However, some of the nodes split from the root node will still be impure, containing a mixture of both outputs. Therefore, we calculate the Information Gain for each remaining variable at those child nodes and expand the tree with the variable that has the highest Information Gain, increasing the purity of the outputs as the tree continues to split through the same process.


Another way of splitting the tree is through the Gini Index. While the entropy and Information Gain method focuses on the purity of each node, the Gini Index calculates the likelihood that a randomly selected case would be misclassified: the lower the Gini Index, the lower the probability of misclassification. The Gini Index is given by the formula:

Gini = 1 − Σⱼ P(j)²

where the sum runs over the classes j of the target variable (in our case, On and Off), and P(j) is the proportion of observations in the node that belong to class j.


This formula provides a value between 0 and 0.5, where 0 is maximum purity and 0.5 is maximum impurity. So, just like with the entropy and Information Gain criterion, we choose the variable with the greatest purity for the root node. In a similar way, we continue to progress down the tree, performing splits in nodes with lower purity.
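The Gini calculation follows the same pattern. The sketch below computes the Gini Index for toy class counts and shows how scikit-learn's DecisionTreeClassifier lets you choose either impurity measure through its criterion parameter; the training data is again made up for illustration.

from sklearn.tree import DecisionTreeClassifier

def gini(counts):
    # counts: number of examples of each class in a node, e.g. [on, off]
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([10, 10]))   # 0.5 -> maximum impurity for two classes
print(gini([20, 0]))    # 0.0 -> completely pure node

# Toy training data: two made-up features, target is On (1) / Off (0).
X = [[22, 1], [25, 0], [30, 1], [18, 0], [28, 1], [20, 0]]
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(criterion="gini")   # or criterion="entropy"
tree.fit(X, y)
print(tree.predict([[26, 1]]))                    # predicted class label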



Random forest is a more complex and robust version of the decision tree. By combining multiple trees that are built in parallel, meaning there is no interaction between the separate trees, the model can make more accurate predictions. At the end, the predictions of the individual trees are aggregated, for example by averaging them or taking the most common class, creating the combined prediction of all trees.
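As a minimal sketch of that idea, scikit-learn's RandomForestClassifier trains many independent trees and aggregates their predictions into a single output; the data below is made up for illustration.

from sklearn.ensemble import RandomForestClassifier

# Toy data: two made-up features, target is On (1) / Off (0).
X = [[22, 1], [25, 0], [30, 1], [18, 0], [28, 1], [20, 0]]
y = [1, 0, 1, 0, 1, 0]

# 100 independent trees; each tree sees a bootstrap sample of the data
# and a random subset of features at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict([[26, 1]]))        # combined prediction of all trees
print(forest.predict_proba([[26, 1]]))  # class probabilities averaged across the trees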

