Machine Learning

30.05.2021 — Machine Learning, Beginners, Cliff Notes — 3 min read

Definition

Machine Learning (ML) is a field of computer science that gives computers the ability to learn without being explicitly programmed (Wikipedia). Tom M. Mitchell of Carnegie Mellon describes it this way: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."

ML is a big buzzword these days, but you've likely used it before. Have you heard of a line of best fit, or linear regression? Maybe you stumbled across it in Excel or during a statistics class in school. Guess what? That's machine learning. Wow! While this is a simple case, ML can get complicated rather quickly.
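To make the line-of-best-fit idea concrete, here is a minimal sketch of ordinary least squares in plain Python. The function name `fit_line` and the hours/scores data are made up for illustration; they are not from any real dataset.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]

slope, intercept = fit_line(hours, scores)   # roughly 5.8 and 46.6
predicted = slope * 6 + intercept            # roughly 81.4 for 6 hours
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

The "learning" here is just computing the slope and intercept from examples, which is the same thing Excel does when you add a trendline.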

Introduction

Machine Learning begins with a pre-existing training set. Training sets can be found all over the internet, on websites like Kaggle.com. Or you can make your own dataset: say, take a pen and paper and record the temperature outside once a day for two years. At the end of two years, you will have a paper with 730 examples (365 days * 2).

Next, you define the hypothesis, the function you will use to make predictions. Let's look at our weather example. We have two years' worth of data on the temperature recorded at a certain location every day. The hypothesis could be: the temperature at said location can be predicted using [whichever model we decide to use]. You decide on the algorithm and cost function to use, train the algorithm, and voila, it should spit out predicted temperatures for any year after. This is an example of a supervised learning algorithm that predicts (computes) a continuous value, in this case the temperature in the following years.

Now, what if we wanted to record more than just the temperature on any given day? We could also have recorded whether it was sunny or cloudy. Instead of having just the temperature to train the model on, we would have an additional feature that could make the temperature predictions more accurate.
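The weather example could be sketched roughly as follows. Everything here is an assumption for illustration: the 730 records are synthetic rather than handwritten, the hypothesis is a simple seasonal curve plus a sunny/cloudy feature, the cost function is mean squared error, and the training algorithm is plain gradient descent.

```python
import math
import random


def make_features(day_of_year, sunny):
    # Hypothetical feature choice: a bias term, a seasonal sine/cosine pair,
    # and the extra sunny/cloudy feature.
    angle = 2 * math.pi * day_of_year / 365
    return [1.0, math.sin(angle), math.cos(angle), 1.0 if sunny else 0.0]


def predict(weights, features):
    # The hypothesis: a weighted sum of the features.
    return sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights, X, y):
    # The cost function: average squared prediction error.
    return sum((predict(weights, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def train(X, y, learning_rate=0.1, steps=300):
    # Gradient descent on the mean squared error, starting from all zeros.
    weights = [0.0] * len(X[0])
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            error = predict(weights, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2 * error * xj / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


# Synthetic stand-in for two years of daily records (730 examples).
rng = random.Random(0)
X, y = [], []
for day in range(730):
    sunny = rng.random() < 0.5
    temperature = (10 - 12 * math.cos(2 * math.pi * day / 365)
                   + (3 if sunny else 0) + rng.gauss(0, 1.5))
    X.append(make_features(day % 365, sunny))
    y.append(temperature)

weights = train(X, y)
cost_before = mean_squared_error([0.0] * 4, X, y)
cost_after = mean_squared_error(weights, X, y)

# Predictions for days in a later year.
july_sunny = predict(weights, make_features(200, True))
january_cloudy = predict(weights, make_features(20, False))
print(f"cost before: {cost_before:.1f}, after: {cost_after:.1f}")
```

Dropping the last feature from `make_features` gives the temperature-only version of the model, which is one way to see how much the sunny/cloudy feature helps.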

**Supervised Learning** is used when we are given a dataset of inputs and outputs, and we create a rule for predicting the output for an unknown input. There are two types of supervised learning algorithms:

Regression algorithms: when the value to be predicted is continuous or quantitative (temperature or stock prices). These include Linear Regression, Poisson Regression, and Support Vector Regression.
Classification algorithms: when the value to be predicted is categorical or qualitative (is an email spam or not spam?). These include Logistic Regression, Neural Networks, Decision Trees, and the Naive Bayes Classifier.
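For the classification side, here is a minimal sketch of logistic regression trained with gradient descent on a toy spam problem. The two features (number of exclamation marks and number of links per email), the eight emails, and the function names are all invented for illustration.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def predict_probability(weights, features):
    # weights[0] is the bias; the rest line up with the features.
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train(examples, labels, learning_rate=0.1, steps=5000):
    # Gradient descent on the logistic (cross-entropy) loss.
    weights = [0.0] * (len(examples[0]) + 1)
    n = len(examples)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for features, label in zip(examples, labels):
            error = predict_probability(weights, features) - label
            grad[0] += error / n
            for j, x in enumerate(features, start=1):
                grad[j] += error * x / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


def is_spam(weights, features):
    return predict_probability(weights, features) >= 0.5


# Made-up training set: (exclamation marks, links) per email; 1 = spam.
emails = [(8, 3), (6, 4), (7, 2), (9, 5), (1, 0), (0, 1), (2, 0), (1, 1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = train(emails, labels)
training_predictions = [is_spam(weights, e) for e in emails]
print(is_spam(weights, (8, 4)), is_spam(weights, (0, 0)))
```

The only real difference from the regression case is the output: a probability squeezed between 0 and 1, which is then turned into a category by a threshold.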

**Unsupervised Learning** is used when we are given a dataset of inputs without outputs, and we want to uncover hidden trends in it. There are two types of unsupervised learning algorithms:

Clustering: Taking a dataset of examples and grouping them into clusters. These include K-means Clustering, Hierarchical Clustering, and Probabilistic Clustering. (K-nearest neighbors (KNN), despite the similar name, is a supervised algorithm.)
Compression: Reducing the number of dimensions. These include Principal Component Analysis (PCA) and Singular-Value Decomposition (SVD).
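Both ideas can be sketched on a tiny 2-D dataset. This is a simplified illustration, not a production implementation: the K-means version starts from the first k points instead of random centroids, and the PCA part uses a closed-form angle that only works for two dimensions. The points and function names are made up.

```python
import math


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, steps=10):
    # Simplification: use the first k points as the starting centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(steps):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return labels, centroids


def first_principal_axis(points):
    # 2-D only: direction of maximum variance from the covariance entries.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))


def compress_to_1d(points):
    # Project each centered point onto the first principal axis.
    (mean_x, mean_y), (ax, ay) = first_principal_axis(points)
    return [(x - mean_x) * ax + (y - mean_y) * ay for x, y in points]


# Made-up data: two obvious groups, no labels given.
points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (8.0, 8.2), (7.7, 7.9), (8.4, 8.1)]

labels, centroids = k_means(points, k=2)
center, axis = first_principal_axis(points)
compressed = compress_to_1d(points)
print(labels)
print([round(value, 2) for value in compressed])
```

Notice that neither function is ever told which group a point belongs to. The clusters and the single compressed coordinate both come from the structure of the inputs alone.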

Big Picture

Machine Learning is all about having the right features to train on. The agent/program chooses the optimal features to train one of the algorithms mentioned above, either to uncover hidden trends (unsupervised learning) or to predict outputs for unseen inputs (supervised learning).

If you came from the Predicting Fake News Blog post, you can go back here: Fake News Blog Post

Thank you for reading!