Inferring information from visualized data

Photo by Carlos Muza on Unsplash

As announced at the end of Data Story 1, this Data Story discusses a new stance about the same data set, namely the FIFA 19 complete player dataset. The stance taken in this article is as follows:

With data from approximately 18,000 soccer players, this stance is supported by various graphs.

Choropleth Worldmap

First, an overall picture is sketched of the market value per country around the world using a choropleth world map. For this visualization the value…
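The map itself needs a plotting library, but the quantity a choropleth colors each country by is a plain aggregation. A minimal sketch of that step, with made-up numbers and assumed column names (`Nationality`, `Value` in millions of euros) standing in for the real FIFA 19 data:

```python
import pandas as pd

# Hypothetical subset of the FIFA 19 dataset: player nationality and
# market value in millions of euros (column names are assumptions).
players = pd.DataFrame({
    "Nationality": ["Argentina", "Argentina", "Portugal", "Brazil"],
    "Value": [110.5, 77.0, 77.0, 118.5],
})

# Aggregate to one market value per country: the number a choropleth
# world map would shade each country by.
value_per_country = players.groupby("Nationality")["Value"].mean()
```

The resulting per-country series is exactly the kind of input a choropleth layer consumes.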

Inferring information from visualized data

Photo by Lukas Blazek on Unsplash

Data, data, data, so cheap and easy to obtain, yet so valuable in the eyes of a Data Scientist, Machine Learning Engineer or Computer Scientist. The reason is that a whole story can be told from this data, a very valuable story in fact, from which useful conclusions can be drawn.

Depending on the kind of data set, a data story provides insights about, for example, sales. In this article I will discuss a data story I made, running you through it to show how much information can be extracted from data.

Getting familiar with the data

In this…

Implementing an Artificial Multilayer Neural Network using Linear Regression Analysis

Photo by Robina Weermeijer on Unsplash

In part I, one of the simplest Neural Networks was introduced, as well as some widely used methods and algorithms. However, to tackle the optimization problem, a more complex model is needed. This model is the Multilayer Neural Network, which uses the same methods and algorithms as a basis, but this article elaborates on them further.

Multilayer Neural Network

By combining the functions already discussed in part I, an actual NN computation model can be built. Note that, for now, this model will only be able to compute an output given some input together with a matrix of weights…
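As a sketch of what such a forward computation could look like (the network shape, the random weights and the `sigmoid` activation here are illustrative assumptions, not the article's exact model):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Propagate input x through one weight matrix per layer.

    `weights` is a list of matrices; each layer multiplies the current
    activations by its matrix and applies the activation function.
    """
    a = x
    for W in weights:
        a = sigmoid(W @ a)
    return a

# Tiny 2-3-1 network with illustrative (not trained) weights.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))  # input layer -> hidden layer
W2 = rng.standard_normal((1, 3))  # hidden layer -> output layer
output = forward(np.array([1.0, 0.5]), [W1, W2])
```

With sigmoid activations the single output always lands between 0 and 1, which is what makes it usable as a class probability later on.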

Implementing a basic Artificial Neural Network using Logistic Regression Analysis

Photo by Robina Weermeijer on Unsplash

Artificial Neural Networks (ANN), simply known as Neural Networks (NN), are a set of algorithms that mimic the processes of the human brain. This is a form of machine learning which, at the basic level, consists of inputs, weights, a threshold and an output. This way the NN learns to improve its performance by itself.

The most popular and widely used learning algorithm for NNs is Logistic Regression, which solves classification problems (huh, regression for classification? Yes, the naming of this algorithm is confusing, but the name was kept for historical reasons). …
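To make the classification idea concrete, here is a minimal sketch of how a logistic model turns a weighted sum into a class label; the weights and bias are made-up values, not trained ones:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Classify input x: probability of at least 0.5 means class 1."""
    probability = sigmoid(np.dot(w, x) + b)
    return 1 if probability >= 0.5 else 0

# Illustrative weights for a two-feature problem (values are assumptions).
w = np.array([2.0, -1.0])
b = 0.5

prediction = predict(np.array([1.0, 0.0]), w, b)  # classified as 1
```

So despite the "regression" in its name, the sigmoid plus a threshold is what turns the regression output into a discrete class.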

The Numerical Decision Tree + Random Forest method

Photo by Vladislav Babienko on Unsplash

In the previous part of this series the Categorical Decision Tree (and its implementation) was discussed, but there are also Numerical Decision Trees, and that is what this article covers!

The Numerical Decision Tree

The definition of ID3 described in part I of this series is restricted to attributes taking on a discrete set of values. The target value that is learned by the decision tree, as well as the attributes tested in the decision nodes, must be discrete-valued. But it is possible to extend the Categorical Decision Tree with numerical boundaries, so that continuous-valued decision attributes can also…
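One way such a numerical boundary could be found is by trying the midpoints between consecutive sorted values and keeping the split with the highest information gain. A small sketch under that assumption (the temperature data is invented):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Try the midpoint between each pair of consecutive sorted values
    and keep the split with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no boundary between equal values
        t = (v1 + v2) / 2
        left = [label for v, label in pairs if v <= t]
        right = [label for v, label in pairs if v > t]
        gain = (base
                - len(left) / len(pairs) * entropy(left)
                - len(right) / len(pairs) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Invented example: temperatures with a yes/no target.
threshold, gain = best_threshold([48, 60, 72, 80, 90],
                                 ["no", "no", "yes", "yes", "no"])
```

The chosen threshold turns a continuous attribute into a binary test (value <= t or value > t), which the categorical machinery from part I can then handle unchanged.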

Building a Categorical Decision Tree using ID3 algorithm

Photo by Jens Lelie on Unsplash

Decision tree learning is a supervised learning method for both classification and regression tasks. A decision tree visualizes decision making in the form of a tree, which can also be represented as if-then rules for the sake of readability.
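As a small illustration of the tree-to-rules idea (the attributes and values are invented, and the nested-dict representation is just one possible encoding):

```python
# A tiny decision tree as nested dicts: each inner node maps one
# attribute to its branches, each leaf is a class label.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": "No",
    }
}

def to_rules(node, conditions=()):
    """Flatten a nested-dict tree into (conditions, label) rules,
    one rule per root-to-leaf path."""
    if not isinstance(node, dict):
        return [(conditions, node)]
    (attribute, branches), = node.items()
    rules = []
    for value, child in branches.items():
        rules.extend(to_rules(child, conditions + ((attribute, value),)))
    return rules

rules = to_rules(tree)
```

Each entry reads as an if-then rule, e.g. "if Outlook is Overcast then Yes", which is exactly the readable form mentioned above.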

This method seeks to approximate discrete-valued functions, is able to work with messy data and is capable of handling disjunctive expressions. Decision trees are widely used for inductive inference, which makes the decision tree method one of the most popular inductive inference algorithms.

There are a lot of different decision tree algorithms like C4.5, ASSISTANT and ID3, which are all…

K-means and Elbow Method + implementation in Python

Photo by Oskar Yildiz on Unsplash

After covering the main types of supervised Machine Learning algorithms, it is time to discuss an unsupervised Machine Learning algorithm. A classic example of unsupervised learning is the K-means algorithm, which I will discuss in this final article of the series, along with the Elbow Method.

For the K-means algorithm we don't have a target output to model; instead we try to extract a pattern from the whole dataset. The goal of the K-means algorithm is to find k clusters within the dataset, each represented by its center. …
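A minimal sketch of the cluster-finding loop (plain Lloyd's algorithm in NumPy; the two-blob data is invented, and the Elbow Method for choosing k is left out here):

```python
import numpy as np

def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's algorithm: alternate between assigning every point
    to its nearest center and moving each center to its cluster's mean."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Distance of every point to every center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :],
                                   axis=2)
        labels = distances.argmin(axis=1)
        centers = np.array([points[labels == i].mean(axis=0)
                            for i in range(k)])
    return centers, labels

# Two obvious blobs; k = 2 should recover them.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
centers, labels = kmeans(points, k=2)
```

The centers returned at the end are the cluster representatives mentioned above; the Elbow Method would simply rerun this for several values of k and compare the total within-cluster distances.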

Multivariate and full bivariate distributions + implementations in Python

Photo by Dlanor S on Unsplash

Part IV of this series discussed a classification algorithm using a univariate model, but in this article we take it one step further by building an even better classification algorithm. I will introduce multivariate distributions, which work with more than one variable and therefore result in better models.

Multivariate distributions

We will still be using the probability density function for normal distributions, but now we need the multivariate version, which looks like this:
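That density can also be written out in code directly from the textbook formula; this sketch is a NumPy translation of the standard expression, not the article's exact implementation:

```python
import numpy as np

def multivariate_normal_pdf(x, mean, cov):
    """Density of the multivariate normal distribution:
    (2*pi)^(-d/2) * |Sigma|^(-1/2) * exp(-0.5 (x-mu)^T Sigma^-1 (x-mu))."""
    d = len(mean)
    diff = x - mean
    normalizer = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return normalizer * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

# With an identity covariance the bivariate density at the mean equals
# the product of two independent standard normals, i.e. 1 / (2*pi).
density = multivariate_normal_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```

The covariance matrix is what the univariate model from part IV lacked: its off-diagonal entries capture how the variables vary together.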

Bayes Classifier using Normal Gaussian distribution + implementation in Python

Photo by Blake Connally on Unsplash

In this article I will cover another main type of supervised learning algorithm, namely classification. Specifically, I will show how I built a Bayes Classifier using the Normal (Gaussian) distribution to estimate the likelihood of continuous variables.

The dataset used for this is the classic Fisher's Iris dataset, which contains the measurements of the lengths and widths of the sepals and petals of 150 flowers. Fisher's Iris dataset is such a classic example that it is included in Machine Learning libraries, so everyone can retrieve it by importing it from the scikit-learn library.
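Retrieving it really is a one-liner; a minimal sketch using scikit-learn's `load_iris`:

```python
from sklearn.datasets import load_iris

# The classic Fisher's Iris dataset ships with scikit-learn:
# 150 flowers, 4 measurements each (sepal/petal length and width),
# and a species label (0, 1 or 2) per flower.
iris = load_iris()
X, y = iris.data, iris.target
```

From here, `X` holds the continuous measurements whose likelihoods the Gaussian models estimate, and `y` holds the three species to classify.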

Separating and plotting the data

The first step…

Cross validation + implementation in Python

Photo by James Harrison on Unsplash

As promised in the previous part of this series, I will discuss a method that helps choose a generic model that is less prone to overfitting than the polynomial model of order 2. This method is called Cross Validation.

Cross Validation

The Cross Validation method splits the data into a training set and a validation set, given a ratio. So, for example, with a ratio of 0.6, 60% of the data is used as a training set to train the model and 40% is used as a validation set…
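A minimal sketch of such a ratio-based split (shuffling before cutting is an assumption here; the article's exact procedure may differ):

```python
import numpy as np

def train_validation_split(data, ratio, seed=0):
    """Shuffle the data, then use the first `ratio` fraction for
    training and the remainder for validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    cut = int(ratio * len(data))
    return data[indices[:cut]], data[indices[cut:]]

# With ratio 0.6, ten samples split into 6 training and 4 validation.
data = np.arange(10)
train, validation = train_validation_split(data, ratio=0.6)
```

Because the validation set is never seen during training, its error is an honest estimate of how well the model generalizes, which is what exposes an overfitted polynomial.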

Mina Suntea

I am an AI student who loves to conduct research and learn new things. I also have a fascination for the criminal mind as well as for culture.
