So, what is Artificial Intelligence? Firstly, it’s not as hard as it sounds

About The Author

Liutauras Medžiūnas is a Software Architect and Team Lead working in Visma Lietuva since 2014, designing and developing software systems and applications. He is passionate about new software technologies and modern development practices.

In this article I will demystify the term Artificial Intelligence and show where and how it is used. Then, using basic programming techniques, I will provide a simple proof of concept showing that AI can be applied to uncomplicated business processes. Do not fear: you do not have to be a tech guru to understand this article.

Artificial Intelligence (AI). It seems like every article on technology has to mention AI at least once or twice. It’s associated with self-driving cars, Alexa from Amazon, and of course it will steal your job. For better or worse, Artificial Intelligence has become a buzzword. Yet it seems that a lot of people do not truly understand what it means, so let me break it down for you. The term Artificial Intelligence (AI) is based on the idea that computers or systems can learn from data, identify patterns, and make decisions with minimal human intervention. The method of learning from data is called Machine Learning (ML). It automates analytical model building and is a subset of the broader Artificial Intelligence concept.

Photo by Franck V. // Robot playing a piano

The aim of this article is to provide a better understanding of what AI actually is. Today it plays an important role in many industries, where it is used to help the human workforce achieve more while spending less. Most of us associate the term Artificial Intelligence with complex technology. The word intelligence suggests something or someone that is smarter and able to perform better. And no one wants to be less smart than others, right?

So, how exactly does AI contribute to the human workforce? For example, a variety of publications are using AI for news writing and distribution. It’s no surprise that Bloomberg is one of them. Just last year their program, Cyborg, churned out thousands of articles, taking financial reports and turning them into news stories the way a business reporter would. The education system is using AI to prepare specialized learning programs. Several companies, such as Content Technologies and Carnegie Learning, are currently developing intelligent instruction design and digital platforms that use AI to provide learning, testing and feedback to students from pre-K to college level. These platforms give students the challenges they are ready for, identify gaps in knowledge and redirect to new topics when appropriate. Artificial intelligence in U.S. education is even expected to grow by 47.5% from 2017 to 2021, according to the Artificial Intelligence Market in the US Education Sector report. These few fields are just the beginning of the AI journey into the human workforce. In the upcoming years AI will emerge in more and more industries, changing the way we work entirely.

So, is it true that the terms Artificial Intelligence and Machine Learning are too hard for most of us to comprehend? To make things clearer, let’s work through an example where some concepts of Machine Learning can help a business achieve more by spending less.

Imagine a company that works in sales and consists of 50 sales representatives who work hard to contribute to the company’s revenue growth. They definitely want to be treated fairly and earn a salary which corresponds to their results. The sales manager of this company works hard to treat all employees justly. Years of experience in the industry helped him develop a general key performance indicator (KPI) for the sales team. The manager believes that this indicator not only helps in making better decisions, but also shows which employees are more experienced and therefore able to guide colleagues with less experience. He also introduced revenue as a KPI, since it is one of the most important factors in the success of the whole company and can be directly related to employee salary. New customers onboarded, their satisfaction score, and the deals made by each sales representative are other important factors to include when defining successful performance.

To sum up, these were the five key performance indicators, which the sales manager defined:

  • Years of experience
  • Revenue
  • New customers (new brands)
  • Customer satisfaction score
  • Deals made

The sales manager created a spreadsheet where he listed all yearly results of the employees. A few sleepless nights and countless energy drinks later, the manager had collected all employee performance information from the company’s electronic journals and filled in the sheet below (thanks, Greg). After filling in the values of the key performance indicators, he went through each row and set a salary. Lastly, he revisited all the salary cells to make sure that every employee was aligned well in comparison with the others.

Data Picture 1

To make this easier, let’s start by working out how the salary depends on the revenue. We will use the Python programming language to train the algorithm. To illustrate this example we will use linear regression, an algorithm commonly used in predictive analysis. Regression models a target prediction value based on independent variables. It is mostly used for finding out the relationship between the variables and the outcome. In this example we will use historical revenue and other data as independent variables to predict what salary is most appropriate for each employee. Linear regression makes it easy to present the concept of Machine Learning to people who are not familiar with it.
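To make the idea of a straight-line relationship concrete, here is a minimal sketch that fits a line through a handful of points with NumPy. The revenue and salary figures are invented purely for illustration and are not taken from the manager’s sheet.

```python
import numpy as np

# Invented figures for illustration only (not from the manager's spreadsheet)
revenue = np.array([100.0, 200.0, 300.0, 400.0])
salary = np.array([20.0, 30.0, 40.0, 50.0])  # lies exactly on 0.1 * revenue + 10

# Least-squares fit of a straight line: salary = slope * revenue + intercept
slope, intercept = np.polyfit(revenue, salary, deg=1)

# Use the fitted line to estimate the salary for a new revenue value
estimate = slope * 250.0 + intercept
print(slope, intercept, estimate)
```

Real data never sits exactly on a line, so in practice the fitted line is the one that keeps the overall distance to the points as small as possible.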

Step 1 – Download and install tools

Let’s download and install Anaconda, a widely used Python/R data science platform.

Step 2 – Import the libraries

import numpy as np # fundamental package for scientific computing with Python

import matplotlib.pyplot as plt # Python 2D plotting library

import pandas as pd # High-performance, easy-to-use data structures and data analysis tools for Python

import requests as r # Requests is a Python HTTP library

import io # Python’s main facilities for dealing with various types of input and output

Step 3 – Get the salary data from the web

RAW, Data Sheet

 

# these lines of code get the data from the web url and fill the dataset

content = r.get('https://raw.githubusercontent.com/liumedz/simple-ai/master/50_Salaries.csv').content

dataset = pd.read_csv(io.StringIO(content.decode('utf-8')))
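The same decode-and-parse pattern can be tried without a network connection. The sketch below uses an invented two-row sample in place of the downloaded file; the column names are hypothetical and may not match the real CSV.

```python
import io
import pandas as pd

# Invented two-row sample; the column names are hypothetical, not the real file's
raw_bytes = b"Name,Revenue,Salary\nA,1000,20000\nB,2000,30000\n"

# Same pattern as in the step: decode the bytes, wrap the text, let pandas parse it
sample = pd.read_csv(io.StringIO(raw_bytes.decode("utf-8")))

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # header names read from the first line
```

Checking the shape and the column names right after loading is a cheap way to confirm that the file was read the way you expect.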

Step 4 – Select the data subset to train the algorithm

It’s quite similar to Excel or other spreadsheet programs, where data can be selected by rows and columns. To do this in Python we use the pandas iloc[<rows>, <columns>] indexer. In the brackets we give the row positions and the column positions, separated by a comma. Positions start at 0, and a colon on its own selects everything.

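X = dataset.iloc[:, 2:3].values

y = dataset.iloc[:, 6].values

First, let’s use the revenue as the algorithm input X. To do that we select the third column, which has position 2 because counting starts at 0, and assign it to the variable X. Writing the selection as the slice 2:3 keeps X as a table with a single column, which is the shape scikit-learn expects for its inputs.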

Data Picture 2

Secondly, we use the salary as the result by assigning it to the variable y. The values of X and y are presented in the table.
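scikit-learn expects its inputs as a two-dimensional array (rows by features), so it matters whether a column is selected by a single position or by a slice. A small sketch of the difference, using a tiny invented table with hypothetical column names:

```python
import pandas as pd

# Tiny invented table; the column names are hypothetical
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "experience": [1, 5, 10],
    "revenue": [1000, 2000, 3000],
})

one_d = df.iloc[:, 2].values    # single position -> 1-D array
two_d = df.iloc[:, 2:3].values  # slice -> 2-D array with one column

print(one_d.shape)
print(two_d.shape)
```

The slice form is the one that can be passed directly as X to the fit method; the target y can stay one-dimensional.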

Step 5 – Splitting the data set into the Training set and the Test set

To train a computer to pick up the tendencies in the data, we have a set of 50 sales team salary records. We also want to test whether the computer is as good at predicting as most humans are. To verify how well the algorithm performs, we randomly set aside 20 percent of the data records (test_size = 0.2, which is 10 of the 50 records). We will use them later to confirm or reject our assumptions about the algorithm’s performance. We will use the other 80 percent of the records to train the algorithm.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Data Picture 3
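To see what the split does to the record counts, here is a minimal sketch with 50 placeholder records standing in for the salary rows. The variable names are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 placeholder records standing in for the 50 salary rows
X_demo = np.arange(50).reshape(-1, 1)
y_demo = np.arange(50)

# test_size=0.2 holds out a fifth of the records; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(len(X_tr), len(X_te))
```

Fixing random_state means the same records land in the test set on every run, which makes results reproducible.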

Step 5 – Fit Linear Regression to the Training set

To fit linear regression to the training set we will use the LinearRegression class from the linear_model module of the sklearn library. The fit method trains the model on the training set.

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)
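After fitting, the learned line can be inspected through the model’s coef_ and intercept_ attributes. The sketch below uses invented points that lie exactly on a known line, so it is easy to check what the model has learned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented points lying exactly on the line y = 2x + 1
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_toy, y_toy)

print(model.coef_[0])    # slope learned from the data
print(model.intercept_)  # value predicted when x is 0
```

On the salary data, the slope would read as the change in predicted salary per unit of revenue.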

Step 6 – Predicting the Test set results

To predict the test set results we will call the predict method of the trained LinearRegression model, passing it the X_test data in order to estimate the y_test results.

y_pred = regressor.predict(X_test)

Data Picture 4

Step 7 – Visualizing the Training set results

To visualize the training set results we will use the matplotlib.pyplot library. The red dots are the salaries from the training set, and the blue line shows the salary that would be predicted for any given revenue. To put it simply, the algorithm is called linear regression because the relation between the revenue and the predicted salary is linear.

 

plt.scatter(X_train, y_train, color = 'red')

plt.plot(X_train, regressor.predict(X_train), color = 'blue')

plt.title('Salary vs Revenue (Training set)')

plt.xlabel('Revenue')

plt.ylabel('Salary')

plt.show()

Data Picture 5

Step 8 – Visualizing the Test set results

plt.scatter(X_test, y_test, color = 'red')

plt.plot(X_train, regressor.predict(X_train), color = 'blue')

plt.title('Salary vs Revenue (Test set)')

plt.xlabel('Revenue')

plt.ylabel('Salary')

plt.show()

The test set helps verify how well the algorithm performs. We use the records held back in the test set to check the algorithm’s prediction accuracy. The blue line represents the salary that would be predicted for a given revenue. The red dots represent the real salaries set by the sales manager. Basically, the shorter the distance between a red dot and the blue line, the more accurately the algorithm predicted that salary. The longer the distance between the line and the dot, the less accurate the prediction.

Data Picture 6
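The distance between a dot and the line can also be expressed as a number. A minimal sketch, using invented values rather than the article’s data, that computes the per-record differences and their average size with scikit-learn’s mean_absolute_error:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Invented values for illustration only
actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 190.0, 300.0])

# Vertical distance between each dot (actual) and the line (predicted)
residuals = actual - predicted

# Average size of those distances, ignoring their sign
mae = mean_absolute_error(actual, predicted)

print(residuals)
print(mae)
```

A single summary number like this makes it easier to compare two models than judging the plots by eye.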

The code

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

import requests as r

import io

 

# Importing the dataset

content = r.get('https://raw.githubusercontent.com/liumedz/simple-ai/master/50_Salaries.csv').content

dataset = pd.read_csv(io.StringIO(content.decode('utf-8')))

X = dataset.iloc[:, 2:3].values

y = dataset.iloc[:, 6].values

 

# Splitting the dataset into the Training set and the Test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Fitting Simple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

 

# Predicting the Test set results

y_pred = regressor.predict(X_test)

 

# Visualising the Training set results

plt.scatter(X_train, y_train, color = 'red')

plt.plot(X_train, regressor.predict(X_train), color = 'blue')

plt.title('Salary vs Revenue (Training set)')

plt.xlabel('Revenue')

plt.ylabel('Salary')

plt.show()

 

# Visualising the Test set results

plt.scatter(X_test, y_test, color = 'red')

plt.plot(X_train, regressor.predict(X_train), color = 'blue')

plt.title('Salary vs Revenue (Test set)')

plt.xlabel('Revenue')

plt.ylabel('Salary')

plt.show()

Step 9 – Let us use multiple parameters

In step 4 we used the revenue as the input parameter X and the salary as the output parameter y. Let’s expand the number of performance indicators and use all of the parameters that the sales manager uses.

Data Picture 7

To start using all of these parameters we need to adjust the code.

 

X = dataset.iloc[:, 1:6].values

y = dataset.iloc[:, 6].values

Now we assign the KPI data from the columns at positions 1 to 5 (the slice 1:6 excludes position 6) to the variable X, and the salary data from the column at position 6 to the variable y.

Data Picture 8

 

From the results in the table we see that the y_pred values 28902, 29287, 14927, 8770, 64167, 80742, 50027, 53469, 27705, 53827 correspond to the 20 percent of salaries (10 records) randomly selected from the sales manager’s data sheet: 28485, 30112, 9275, 8216, 63365, 83383, 47949, 50149, 29659, 51282. We see that the algorithm predicted most of the salaries quite closely. The differences, in percent, are 1, 3, 38, 6, 1, 3, 4, 6, 7, 5. Only the salary of 9275 stands out from the trend, with a difference of 38 percent. In this case, we can conclude that the salary of 9275 EUR is not set so well and does not fit the multiple linear regression trend.
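The percentages quoted above can be reproduced from the two lists of numbers. The sketch below assumes that each difference is divided by the predicted value and then rounded; that choice of denominator is inferred from the figures, not stated in the text.

```python
# Predicted and manager-set salaries quoted in the paragraph above
y_pred_quoted = [28902, 29287, 14927, 8770, 64167, 80742, 50027, 53469, 27705, 53827]
y_test_quoted = [28485, 30112, 9275, 8216, 63365, 83383, 47949, 50149, 29659, 51282]

# Assumption: the difference is divided by the predicted value, then rounded
percent_diff = [
    round(abs(p - a) / p * 100)
    for p, a in zip(y_pred_quoted, y_test_quoted)
]
print(percent_diff)
```

Dividing by the actual salary instead would give a noticeably larger figure for the 9275 EUR record, so it is worth stating which denominator is used whenever such percentages are reported.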

In real data science applications, scientists use techniques such as independent variable elimination to make sure that only variables with a real impact on the result stay in the model. If, for example, the new customers variable has little impact on the salary, removing it from the model should be considered. It is also important to use cross-validation techniques to assess how the results of the statistical analysis will generalize to an independent data set. Testing is important to verify the behaviour of the model: it should generate the expected results with satisfying accuracy across the scenarios it will face.
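As a rough illustration of cross-validation, the sketch below scores a linear regression on five different train/test splits with scikit-learn’s cross_val_score. The data is synthetic and generated on the spot; it only stands in for a real data set such as the salary sheet.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: 50 rows, 2 features, target is a noisy linear combination
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 2))
y_syn = 3.0 * X_syn[:, 0] + 0.5 * X_syn[:, 1] + rng.normal(scale=0.1, size=50)

# 5-fold cross-validation; each score is R^2 measured on one held-out fold
scores = cross_val_score(LinearRegression(), X_syn, y_syn, cv=5, scoring="r2")

print(scores.mean())
```

If the scores vary a lot from fold to fold, the model’s quality depends heavily on which records it happened to see, which is a warning sign with a data set as small as 50 rows.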

The code

# Multiple Linear Regression

 

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

import requests as r

import io

 

# Importing the dataset

content = r.get('https://raw.githubusercontent.com/liumedz/simple-ai/master/50_Salaries.csv').content

dataset = pd.read_csv(io.StringIO(content.decode('utf-8')))

X = dataset.iloc[:, 1:6].values

y = dataset.iloc[:, 6].values

 

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

 

# Fitting Multiple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

 

# Predicting the Test set results

y_pred = regressor.predict(X_test)

Conclusion

Machine learning is an application of Artificial Intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. Generally, the more data we have, the better results we can expect. Machine learning focuses on the development of computer programs that can access data and use that data to learn. In this article, we have analyzed one of the main Machine Learning algorithms. The sales manager’s salary history was used as data to train a simple multiple linear regression algorithm. This article shows that algorithms can not only predict future results but also help us simulate the future, plan, and make better decisions in such situations.

In this article, we have learned the principles of the linear regression algorithm, which is one of the most used Machine Learning algorithms in the world of Artificial Intelligence. The algorithm itself is just a set of instructions that lets a computer use the data and apply patterns to make our work easier. If specialists from different fields understood the principles of Machine Learning algorithms, they could think of ways in which these algorithms can contribute to the automation of daily business processes.