Machine Learning Crash Course-2 Hours | Learn Machine Learning | Machine Learning Tutorial | Edureka


Hello everyone and welcome to this
interesting session on machine learning. So before we move forward with our
session, let’s have a quick look at the agenda. So first of all I’ll be starting
with an introduction to data science wherein I’ll discuss how the growth of
data are led to the introduction of data science and how machine learning became
a very important part of data science so then again we’ll discuss what exactly is
machine learning and the various application of machine learning in the
day-to-day life and in the industry as well and moving forward we’ll discuss
the various types of machine learning which are namely the supervised
unsupervised and the reinforcement learning and we’ll discuss all of these
three types of machine learning in depth wherein he’ll discuss the various
algorithms which are present in these supervised and supervised and
unsupervised learning and we’ll discuss how what is the math behind all of these
algorithm how these algorithms works I will also have quite a few demos in this
session so stay tuned to learn more about machine learning so let’s get
started now data as we know is increasing at a very alarming rate and
we are generating 2.5 quintillion bytes of it every day we are living in an era
of technological transformation that is bringing about changes in the way we
take decisions aspect data is becoming pervasive across all the industries use
of machines to find patterns and predict futures is gaining a lot of prominence
in the market now what better way to discuss the growth operator generation
and subsequent workplace demand than with some sound statistics so let me
start with a few lines or the data points that were published in a McKinsey
report so the report stated that the United States alone faces a shortage of
140,000 to 190,000 people with article expertise and 1.5 million managers and
analysts with the skills to understand and make decisions based on the analysis
of picked aid now pay attention to the words with the skills to understand and
make decisions based on the analysis of big data the industry will require a lot
of big data and machine learning experts and needs even more about 10% people who
can make decisions based on the analysis even though they might not
exports on machine learning so what exactly is data science so read a
science is also known as data-driven science and it is an interdisciplinary
field about scientific methods processes and systems to extract knowledge or
insights from the data in various forms either structured or unstructured it is
the study where information comes from what it represents and how it can be
turned into a valuable resource in the creation of business and IT strategies
related science employs many techniques and theories from fees like mathematics
statistics information science and computer science it can be applied to
small data sets also yet most people think the error science is when you are
dealing with big data or large or soft data now if you have a look at the
peripherals of the data science we have statistics we had different programming
languages we have our Python we have SS then again we have software
which we deal with in which we bundle up our whole model and present the findings
to the customer or wherever it is required then again we have machine
learning which is the main topic of our session today and then again finally we
have Big Data so let’s focus on machine learning today and understand what
exactly is machine learning so machine learning is an application of artificial
intelligence that provides systems the ability to automatically learn and
improve from experience without being explicitly programmed now getting
computers to program themselves and also teaching them to make decisions using
data where writing software is a bottleneck let the data do the work
instead now much–he learning is a class of
algorithms which is data driven that is unlike normal algorithms it is the data
that does what the good answer is so if we have a look at the various features
of machine learning so first of all it uses the data to detect patterns in a
data set and adjust a program actions accordingly it focuses on the
development of computer programs that can teach themselves to grow and change
when exposed to new data so it’s not just the old data on which it has been
trained so whenever a new data is entered the program changes accordingly
it enables computers to find hidden insights using it
algorithms without being explicitly programmed either so machine learning is
a method of data analysis that automates analytical model building now let’s
understand how exactly it was so if we have a look at the diagram which is
given here we have traditional programming on one side we have machine
learning on the other so first of all in traditional program what we used to do
was provide the data provide the program and the computer used to generate the
output so things have changed now so in machine learning what we do is provide
the data and we provide a predicted output to the machine now what the
machine does is learns from the data find hidden insights and creates a model
now it takes the output data also again and it reiterates and trains and grows
accordingly so that the model gets better every time it’s strained with the
new data or the new output so the first and the foremost application of machine
learning in the industry I would like to get your attention towards is the
navigation or the Google Maps so Google Maps is probably the app we use whenever
we go out and require assistant in directions and traffic right the other
day I was traveling to another city and took the expressway and the map
suggested despite the havoc traffic you are on the fastest route no but how does
it know that well it’s a combination of people currently using the services the
historic data of that fruit collected over time and a few tricks acquired from
the other companies everyone using maps is providing their location their
average speed the route in which they are traveling which in turn helps Google
collect massive data about the traffic which may accelerate the upcoming
traffic and it adjust your route according to it which is pretty amazing
right now coming to the second application which is the social media if
we talk about Facebook so one of the most common application is automatic
friend tanks suggestion in Facebook and I’m sure you might have gotten this so
it’s present in all the other social media platform as well so Facebook uses
face detection and image recognition to automatically find the face of the
person which matches its database and hence it suggests us to tag that person
based on deep face deface is Facebook’s machine learning
project which is responsible for recognition of faces and define which
person is in picture and it also provides alternative tags to the images
already uploaded on Facebook so for example if we have a look at this image
and we introspect the following image on Facebook we get the alt tag which has a
particular description so in our case what we get here is the image may
contain sky grass outdoor and nature now transportation and commuting is another
industry where machine learning is used heavily so if you have used an app to
book a cab recently then you are already using machine learning to an extent and
what happens is that it provides a personalized application which is unique
to you it automatically detects your location and provides option to either
go home or office or any other frequent basis based on your history and patterns
it uses machine learning algorithm layered on top of historic trip date had
to make more accurate ETA predictions now uber with the implementation of
machine learning on their app and their website saw 26% accuracy in delivery and
pickup that’s a huge a point now coming to the virtual person assistant as the
name suggests virtual person assistant assist in
finding useful information when asked why a voice or text if you have the
major applications of machine learning here a speech recognition speech to text
conversion natural language processing and text-to-speech conversion all you
need to do is ask a simple question like what is my schedule for tomorrow or show
my upcoming flights now for answering your personal assistant searches for
information or recalls your related queries to collect the information
recently personal assistants are being used in chat pods which are being
implemented in various food ordering apps online training web sites and also
in commuting apps as well again product recommendation now this is one of the
area where machine learning is absolutely necessary and it was one of
the few areas which emerged the need for machine learning now suppose you check
an item on Amazon but you do not buy it then and there but the next day you are
watching videos on YouTube and suddenly you see an ad for the same item you
which to Facebook they’re also you see the same ad and again you go back to any
other side and you see the ad for the same sort of items so how does this
happen well this happens because Google tracks your search history and
recommends asked based on your search history this is one of the coolest
application of machine learning and in fact 35% of Amazon’s revenue is
generated by the products recommendation and now coming to the cool and highly
technological side of machine learning we have self-driving cars if we talk
about self-driving car it’s here and people are already using it now machine
learning plays a very important role in self-driving cars and I’m sure you guys
might have heard about Tesla the leader in this business and the excurrent
artificial intelligence is driven by the high rare manufacturer Nvidia which is
based on unsupervised learning algorithm which is a type of machine learning
algorithm now in media state that they did not train their model to detect
people or any of the objects as such the model works on deep learning and trout
sources it’s data from the other vehicles and drivers it uses a lot of
sensors which are a part of IOT and according to the data gathered by
McKenzie the automatic data will hold a tremendous value of 750 billion dollars
but that’s a lot of dollars we are talking about in now next again we have
Google Translate now remember the time when you travel to the new place and you
find it difficult to communicate with the locals or finding local spots where
everything is written in a different languages well those days are gone
Google’s G and MT which is the Google neural machine translation is a neural
machine learning that works on thousands of languages and dictionary it uses
natural language processing to provide the most accurate translation of any
sentence of words since the tone of the word also matters it uses other
techniques like POS tagging named entity recognition and chunking and it is one
of the most used applications of machine learning now if we talk about dynamic
pricing setting the rice price for a good or a service is an old problem in
economic theory there are a vast amount of pricing strategies that depend on the
objective sort a movie ticket a plane ticket or a
Kaffir everything is dynamically priced now in recent year machine learning has
enabled pricing solution to track buying trends and determine more competitive
product prices now if we talk about uber how does Oberer determine the price of
your right Uber’s biggest use of machine learning comes in the form of surge
pricing a machine learning model named as geo search if you are getting late
for a meeting and you need to book an uber in a crowded area get ready to pay
twice the normal fear even from flights if you are traveling in the festive
season the chances are that prices will be twice as much as the original price
now coming to the final application of machine learning we have is the online
video streaming we have Netflix Hulu and Amazon Prime video now here I’m going to
explain the application using the Netflix example so with over 100 million
subscribers there is no doubt that Netflix is the daddy of the online
streaming world when Netflix Petey rice has all the moving industrialists taken
aback forcing them to ask how on earth could one single website take on
Hollywood now the answer is machine learning the Netflix algorithm
constantly gathers massive amounts of data about user activities like when you
pause rewind fast-forward what do you want the content TV shows on weekdays
movies on weekend the date you watch the time you watch whenever you pause and
leave a content so that if you ever come back they would suggest the same video
the rating events which are about 4 million per day the searches which are
about 3 million per day the browsing and the scrolling behavior and a lot more
now they collect this data for each subscriber they have and use the
recommender system and a lot of machine learning applications and that is why
they have such a huge customer retention rate so I hope these applications are
enough for you to understand how exactly machine learning is changing the way we
are interacting with the society and how fast it is affecting the world in which
we live in so if you have a look at the market trend of the machine learning
here so as you can see initially it wasn’t much in the market but if
you have a look at the 2016 site there was an enormous growth in machine
learning and this happened mostly because you know earlier we had the idea
of machine learning but then again we did not had the amount of big data so as
you can see the red line we have in the histogram and the bar plot is that of
the big data so big data also increased during the years and which led to the
increase in the amount of data generated and recently we had that power or I
should say the underlying technology and the hardware to support that power that
makes us create machine learning programs that will work on the spec data
so that is why you see very high inclination during the 2016 period time
as compared to 2012 so because during 2016 we got new
hardware and we were able to find insights using those hardware and
program and create models which would walk on heavy data now let’s have a look
at the life cycle of machine learning so a typical machine learning life cycle
has six steps so the first step is collecting data second is video
wrangling then we have the third step per be analyze the data
fourth step where we train the algorithm the fifth step is when we test the
algorithm and the sixth step is when we deploy that particular algorithm for
industrial uses so when we talk about the fourth step which is collecting data
so here data is being collected from various sources and this stage involves
the collection of all the relevant data from various sources now if we talk
about data wrangling so data wrangling is the process of cleaning and
converting raw data into a format that colors can we need consumption now this
is a very important part in the machine learning lifecycle as it’s not every
time that we receive a data which is clean and is in a proper format
sometimes there are values missing sometimes there are wrong values
sometimes data format is different so a major part in a machinery lifecycle goes
in data wrangling and data cleaning so if we talk about the next step which is
data analysis so data is analyzed to select and filter the data require to
the model so in this step we take the data use machine learning algorithms to
create a particular model now next again when we have a model what we do is train
the model now here we use the data sets and the algorithm is trained on between
data set through which algorithm understand the pattern and the rules
which govern the particular data once we have trained our algorithm
next comes testing so the testing data set determines the accuracy of our model
so what we do is provide the test data set to the model and which tells us the
accuracy of the particular model whether it’s 60% 70% 80% depending upon the
requirement of the company and finally we have the operation and optimization
so if the speed and accuracy of the model is acceptable then that model
should be deployed in the real system the model that is used in the production
should be made with all the available data models improve with the amount of
available data used to create them all the result of the model needs to be
incorporated in the business strategy now after the model is deployed based
upon its performance the model is updated and improved if
there is a dip in the performance the moral is retrained so all of these
happen in the operation and optimization stage now before we move forward
since machine learning is mostly done in Python and us so and if we have a look
at the difference between Python ah I’m pretty sure most of the people would go
for Python and the major reason why people go for python is because python
has more number of libraries and pythons being used in just more than data
analysis and machine learning so some of the important Python libraries here
which I want to discuss here so first of all I’ll talk about matplotlib now what
matte lip does is that it enables you to make bar chart scatter plots the line
charts histogram basically what it does is helps in the visualization aspect as
data analyst and machine learning ingenious what one needs to represent
the data in such a format that it is used that it can be understood by
non-technical people such as people from marketing
people from sales and other departments as well so another important by the
library here we have a seaborne which is focused on the visuals of statistical
models which includes heat maps and depict the overall distributions
sometimes people work on data which are more geographically aligned and I would
say in those cases he traps are very much required now next we come to
scikit-learn and scikit-learn is the one of the most
famous libraries of Python I would say it’s simple and efficient or data mining
and for data analysis it is built on numpy and my product lab and it is open
source next on our list we have pandas it is the perfect tool for data
wrangling which is designed for quick and easy data manipulation aggregation
and visualization and finally we have numpy now numpy stands for a numerical
Python provides an abundance of useful features for operation on n arrays which
has an umpire race and mattresses in spite and mostly it is used for
mathematical purposes so which gives a plus point to any machine learning
algorithm so as these were the important part in larry’s which one must know in
order to do any price and programming for machine learning or as such if you
are doing Python programming you need to know about all of these libraries so
guys next what we are going to discuss other types of machine learning so then
again we have three types of machine learning which are supervised
reinforcement and unsupervised machine learning so if we talk about supervised
machine learning so supervised learning is where you have the input variable X
and the output variable Y and you use and uncoil know to learn the mapping
function from the input to the output so if we take the case of object detection
here so or face detection I rather say so first of all what we do is input the
raw data in the form of labelled faces and again it’s not necessary that we
just input faces to train the model what we do is input a mixture of faces and
non faces images so as you can see here we have labeled face and labeled on
faces what we do is provide the with the algorithm the algorithm creates
a model it uses the training dataset to understand what exactly is in a face
what exactly is in a picture which is an order phase and after the model is done
with the training and processing so to test it what we do is provide particular
input of a face or an on face what we know see the major part of supervised
learning here is that we exactly know the output so when we are providing a
face we ourselves know that it’s a face so to test that particular model and get
the accuracy we use the labelled input raw data so next when we talk about
unsupervised learning unsupervised learning is the training of a model
using information that is neither classified nor labeled now this model
can be used to cluster the input data in classes or the basis of the statistical
properties for example for a basket full of vegetables we can cluster different
vegetables based upon their color or sizes so if I have a look at this
particular example here we have what we are doing is we are inputting the raw
data which can be either apple banana or mango what we don’t have here which was
previously there in supervised learning are the labels so what the algorithm
does is that it visually gets the features of a particular set of data
it makes clusters so what will happen is that it will make a cluster of red
looking fruits which are Apple yellow looking fruits which are banana and
based upon the shape also it determines what exactly the fruit is and
categorizes it as mango banana or apple so this is unsupervised learning now the
third type of learning which we have here is reinforcement learning so
reinforcement learning is the learning by interacting with a space or an
environment it selects the action on the basis of its past experience the
exploration and also by new choices a reinforcement learning agent learns from
the consequences of its action rather than from being taught explicitly so if
we have a look at the example here the input data we have what it does is goes
to the training goes to the agent the agent selects the in Gotham it takes
the best action from the environment who gets the reward and the morale is
strange so if you provide a picture of a green apple
although the Apple which it particularly nose is red what it will do is it will
try to get an answer and with the past experience what it has and it will
reiterate the alcohol time and then finally provide an output which is
according to our requirements so now these were the three major types of
machine learning algorithms next what we’re gonna do is take deep into all of
these types of machine learning one by one so let’s get started with supervised
learning first and understand what exactly supervised learning and what are
the different algorithms inside it how it works the algorithms the working and
we’ll have a look at the various algorithm demos now which will make you
understand it in a much better way so let’s go ahead and understand what
exactly is supervised learning so supervised learning is where you have
the input variable X and the output variable Y and using algorithm to learn
the mapping function from the input to the output as I mentioned earlier with
the example of face detection so it is cos subbu is learning because the
process of an algorithm learning from the training data set can be thought of
as a teacher supervising the learning process so if we have a look at the
supervised learning steps or what will rather say the workflow so the model is
used as you can see here we have the historic data then we again we have the
random sampling we split the data enter training error set and the testing data
set using the training data set we with the help of machine learning which is
supervised machine learning we create statistical model now after we have a
model which is being generated with the help of the training data set what we do
is use the testing data set for prediction and testing what we do is get
the output and finally we have the model validation outcome that was third
training and testing so if we have a look at the prediction part of any
particular supervised learning algorithm so the model is used for operating
outcome of a new dataset so whenever performance of the model degraded the
model is retrained or if there are any performance issues
the model is retrained with the help of the new data now when we talk about
Super Western in there not just one but quite a few algorithms here so we have
linear regression logistic regression this is entry we have random forests we
have made biased classifiers so linear regression is used to estimate real
values for example the cost of houses the number of calls the total sales
based on the continuous variables so that is what ringing a regression is now
when we talk about logistic regression it is used to estimate discrete values
for example which are binary values like 0 & 1 yes or no true and false based on
the given set of independent way so for example when you are talking about
something like the chance of winning or if we talk about winning which can be
the true or false if will it train today which it can be the yes or no so it
cannot be like when the output of a particular algorithm or the particular
question is either yes/no or binary then only we use a logic regression now next
we have decision trees so so these are used for classification problems it
works for both categorical and continuous dependent variables and if
you talk about random forest so random forest is an N symbol of a decision tree
it gives better prediction and accuracy that decision tree so that is another
type of supervised learning algorithm and finally we have the Nate passed
classifier so it is a classification technique based on the Bayes theorem
with an assumption of independence between predictors so we’ll get more
into the details of all of these algorithms one by one so let’s get
started with linear regression so first of all let us understand what exactly
linear regression is so linear regression analysis is a powerful
technique you operating the unknown value of a variable which is the
dependent variable from the known value of another variable which is the
independent variable so a dependent variable is the variable to be predicted
or explained in a regression model whereas an independent variable it’s a
variable rate the dependent variable in a regression
equation so if you have a look here at a simple linear regression so it’s
basically equivalent to a simple line which is with a slope which is y equals
a plus BX where Y is the dependent variable a is the y-intercept we have B
which is the slope of the line and X which is the independent variable
so intercept is the value of the dependent variable Y when the value of
the independent variable X is 0 it is the point at which the line cuts the
y-axis whereas slope is the change in the
dependent variable for a unit increase in the independent variable it is the
tangent of the angle made by the line with the x-axis now when we talk about
the relation between the variables we have a particular term which is known as
correlation so correlation is an important factor to check the
dependencies when there are multiple variables what it does is it gives us an
insight of the mutual relationship among variables and it is used for creating a
correlation plot with the help of the Seabourn library which I mentioned
earlier which is one of the most important libraries in Python so
correlation is very important term to know about now if we talk about
regression lines so linear regression analysis is a powerful technique used
for predicting the unknown value of a variable which is the dependent variable
from the regression line which is simply a single line that best fits the data in
terms of having the smallest overall distance from the line to the points so
as you can see in the plot here we have the different points or the data points
so these are known as the fitted points then again we have the regression line
which has the smallest overall distance from the line to the points so you will
have a look at the distance between the point to the regression line so what
this line shows is the deviation from the regression line so exactly how far
the point is from the regression line so let’s understand a simple use case of
linear regression with the help of a demo so first of all there is a real
state company use case which I’m going to talk about so first of all here we
have John he has some baseline for pricing the villa’s and the independent
he has in Boston so here we have the dataset description which we’re going to
use so this data set has different columns such as the crime rate per
capita which is CRI M it has proportional residential residential
land zone for the Lots proportion of non retail business the river the United
Rock side concentration average number of rooms and the proportion of the owner
occupying the built prior to 1940 the distance of the five Boston employment
centers in excess of accessibility to radial highways and much more so first
of all let’s have a look at the data set we have here so one number I don’t thing
here guys is that I’m going to be using Jupiter notebook to execute all my
practicals you are free to use the spiral notebook or the console either so
it basically comes down to your preference so for my preference I’m
going to use the trippity notebook so for this use case we’re gonna use the
Boston housing data set so as you can see here we have the data set which has
the CRI MZ and in desc CAS NO x the different variables and we have the data
set of almost I would say like 500 houses so what John needs to do is plan
the pricing of the housing depending upon all of these different variables so
that it’s profitable for him to sell the house and it’s easier for the customers
also to buy the house so first of all let me open the code here for you
so first of all what we’re gonna do is import the library is necessary for this
project so we’re gonna use the numpy you’re gonna import numpy as NP import
pandas at PD then we’re gonna also import the matplotlib
and then we are going to do is read the Boston housing data set into the BOS one
variable so now what we are gonna do is create two variables x and y so what
we’re gonna do is take 0 to 13 I’ll say is from CR I am – LS dat in 1x because
that’s the independent variable and Y here is dependent variable which is the
MATV which is the final price so first of all
what we need to do is plot a correlation so what we’re gonna do is import the
Seabourn library as SNS we’re going to use the correlations to plot the
correlation between the different 0 to 13 variables what we going to do is also
use MATV here also so what we’re going to do is SNS dot heatmap correlations to
be gonna use the square to differentiate usually it comes up in square only or
circles so you don’t know so we’re going to use square you want to see you see
map with the Y as GNP you this is the color so there’s no rotation in the y
axis and we’re going to rotate the excesses to the 90 degree and let’s we
gonna plot it now so this is what the plot looks like so as you can see here
the more thicker or the more darker the color gets the more is the correlation
between the variables so for example if you have a look at CRI M&M ADV right so
as you can see here the color is very less so the correlation is very low so
one thing important what we can see here is the tax and our ad which is the full
value of the property and RIT is the index of accessibility to the radial
highways now these things are highly correlated and that is natural because
the more it is connected to the highway and more closer it is to the highway the
more easier it is for people to travel and hence the tax on it is more as it is
closer to the highways now what we’re going to do is from SQL urn
validation we’re going to import the train test split and we’re gonna split
the data set now so what we are going to do is create four variables which are
the extreme X test y train white tests and we’re going to use a train test
split function to split the x and y and here we’re going to use the test size
0.3 tree which will split the data set into the test size will be 33% well as
the training size will be 67% now this is dependent on you usually it is either
60/40 70/30 this depends on your use case your data you have the kind of
output you are getting the model you are creating and much more then again from
SQL learn dot linear model we’re going to import linear regression now this is
the major function we’re going to use is linear regression function which is
present in SQL which is a scikit-learn so we’re going to create our linear
regression model into LM and the model which are going to create it and we’re
going to fit the training videos which has the X train and the white rain and
then we’re going to create a prediction underscore Y which is the LM dot credit
and you take the X test variables which will provide the predicted Y variables
so now finally if we plot the scatter plot of the Y test and the y predicted
what we can see is that and we give the X label as white test and the Y label
has Y predicted we can see the regression line which we have plotted at
the scatter plot and if you want to draw a regression line it’s usually it will
go through all of these points excluding the extremities which are here present
at the endpoints so this is how a normal linear regression works in Python what
you do is create a correlation you find out you split the dataset into training
and testing variables then again you define what is going to be your test
size import the reintegration moral use the
training data set into the model fitted use the test data set to create the
predictions and then use the wireless core test and
the predicted why and plot the scatter plot and see how close your model is
doing with the original data it had and check the accuracy of that model not
typically you use these steps which was collecting data what we did data
wrangling analyzed the data we trained the algorithm we use the test algorithm
and then we deployed so fitting a model means that you are making your algorithm
learn the relationship between predictors and the outcomes so that you
can predict the future values of the outcome so the best fitted model has a
specific set of parameters which best defines the problem at hand since this
is a linear model with the equation y equals MX plus C so in this case the
parameters of the model learns from the data that are M and C so this is what
more fitting now if it have a look at the types of fitting which are available
so first of all machine learning algorithm first attempt to solve the
problem of underfitting that is of taking a line that does not approximate
the data well and making it approximate to the data better so machine does not
know where to stop in order to solve the problem and it can go ahead from
appropriate to overfit moral sometimes when we say a mall or fits a data set we
mean that it may have a low error rate for training data but it made not
generalize well to the overall population of the data we are interested
in so we have under fact appropriate and over fit these are the types of fitting
now guys this was linear regression which is a type of supervised learning
algorithm in machine learning so next what we’re going to do is understand the
need for logistic regression so let’s consider a use case as in political
elections are being contested in our country and suppose that we are
interested to know which candidate will probably win now the outcome variables
result in binary either win or lose the predictor variables are the amount of
money spent the age the popularity rank and etc etcetera now here the best fit
line in the regression mower is going below 0 and above 1 and since the value
of y will be discrete that is between 0 and 1 the linear I in
to be clipped at zero and one no linear regression gives us only a single line
to classify the output with linear regression our resulting curve cannot be
formulated into a single formula as you obtain three different straight lines
what we need is a new way to solve this problem so hence people came up with
logistic regression so let’s understand what exactly is logic regression so
logistic regression is a statistical method for analyzing a data set in which
there are one or more independent variables that determine an outcome and
the outcome is a binary class type so example a patient goes a followed a teen
checkup in the hospital and his interest is to know whether the cancer is benign
or malignant now a patient’s data such as sugar level blood pressure eight skin
width and the previous medical history are recorded and a doctor checks the
patient data and determines the outcome of his illness and severity of illness
the outcome will result in binary that is zero if the cancer is malignant and
one if it’s been eying non-autistic progression is a statistical method used
for analyzing a dataset there were say one or more dependent variables like we
discuss like the sugar level blood pressure each skin with the previous
medical history and the output is binary class type so now let’s have a look at
the lowest aggression curve now the law disintegration code is also called a
sigmoid curve or the S curve the sigmoid function converts any value from minus
infinity to infinity to the discrete value 0 or 1 now how to decide whether
the value is 0 or 1 from this curve so let’s take an example what we do is
provide a threshold value we set it we decide the output from that function so
let’s take an example with the threshold value of 0.4 so any value above 0.4 will
be rounded off to 1 and anyone below 0.4 we really reduce to zero so similarly we
have polynomial regression also so when we have nonlinear data which cannot be
predicted with a linear model we switch to the polynomial regression
now such a scenario is shown in the below graph so as you can see here we
have the equation y was three X cubed plus 4x squared minus
5x plus 2 now here we cannot perform this linearly so we need polynomial
regression to solve these kind of problems now when we talk about logistic
regression there is an important term which is decision tree and this is one
of the most used algorithms in supervised learning now let us
understand what exactly is a decision tree so our decision tree is a tree like
structure in which internal load represent tests on an attribute now each
attribute represents outcome of test and each leaf node represents the class
label which is a decision taken after computing all attributes
apart from root to the leaf represents classification rules and a decision tree
is made from our data by analyzing the variables from the decision tree now
from the tree we can easily find out whether there will be came tomorrow if
the conditions are rainy and less windy now let’s see how we can implement the
same so suppose here we have a dataset in which we have the outlook so what we
can do is from each of the outlooks we can divide the data as sunny overcast
and rainy so as you can see in the sunny side we get two yeses and three noes
because the outlook is sunny the humidity is normal and the wind is weak
and strong so it’s a fully sunny day what we have is that it’s not a pure
subset so what we’re going to do is split it further so if you have a look
at the overcast we have humidity high normal week so yes
during overcast we can play and if you have a look at the Raney’s area we have
three SS and – no so again what we’re going to do is split it further so when
we talk about sunny then we have humidity in humidity we have high and
normal so when the humidity is normal we’re going to play which is the pure
subset and if the humidity is high we are not going to play which is also a
pure subset now so let’s do the same for the rainy day so during rainy day we
have the vent classifier so if the wind is to be it becomes a pure subset we’re
going to play and if the vent is strong it’s a pure substance we’re not going to
play so the final decision tree looks like
so first of all we check if the outlook is sunny overcast or rain if it’s
overcast we will play if it’s sunny we then again check the humidity if the
humidity is high we will not play if the humidity is normal we will play when
again in the case of rainy if we check the vent if the wind is weak the play
will go on and similarly if the wind is strong the play must stop so this is how
exactly a decision tree works so let’s go ahead and see how we can implement
logisitics regression in decision trees now for logistic regression we’re going
to use the Casa dataset so this is how the dataset looks like so here we have
the ie diagnosis radius mean texture mean parameter mean these are the stats
of particular cancer cells or the cyst which are present in the body so we have
like total 33 columns on the way started from IDE to unnamed 32 so our main goal
here is to define whether or I’ll say predict whether the cancer is been
eyeing on mannequin so first of all what winning or two is from scikit-learn
small selection we’re gonna import cross-validation score and again we’re
gonna use numpy for linear algebra we’re gonna use pandas as speedy because for
data processing the CSV file input for data manipulation in sequel and most of
the stuff then we’re going to import the matplotlib it is used for plotting the
graph we’re going to import Seabourn which is used to plot interactive graph
like in the last example we saw we plotted a heatmap correlation so from SK
learn we’re going to import the logistic regression which is the major model or
the algorithm behind the whole logic regression when I import the Train
pressed split so as to split 38 I into two paths training and testing data set
we’re going to import metrics to check the error and the accuracy of the model
and we’re going to import decision tree classifier
so first of all what we’re gonna do is create a variable data and use the
pandas PD to read the data from the data set so here the header zero means that
the zeroth row is our column name and if you have a look at the data or the top
six part of the data we’re going to use the print data dot head and get the data
dot info so as you can see here we have so many data columns such as highly
diagnosis radius means extra main parameter main area means smoothness
mean we have texture worst symmetry worst we have fractal dimension worse
and lastly we have the unnamed so first of all we can see we have six rows and
33 columns and if you have a look at all of these columns here right we get the
total number which is the 569 which is the total number of observation we have
and we check whether it’s non null and then again we check the type of the
particular column so it’s integer it’s object float mostly most of them are
float some are integer so now again we’re going to drop the unnamed column
which is the column thirty-second 0 233 which is the 32nd column so in this
process we will change it in our data itself so if you want to save the old
data you can also see if that but then again that’s of no use so later our
columns will give us all of these columns when we remove that so as you
can see here in the output we do not have the final one which was the unnamed
the last one we have is the type which is float so latex we also don’t want the
ID column for our analysis so what we’re going to do is we’re going to drop the
ID again so as I said above the data can be divided into three paths so let’s
divide the features according to their category now as you know our diagnosis
column is an object type so we can map it to the integer value so we what we
want to do is use the data diagnosis and we’re going to map it to M 1 and B 0 so
that the output is either M or B now if we use a rated or describe so you can
see here we have eight rows and 1 columns because we dropped two of the
columns and in the diagnose we have the values
here let’s get the frequency of the cancer stages so here we’re going to use
the Seabourn SNS dart current plot data with diagnosis and lis will come and if
we use the PLT dart show so here you can see the diagnosis for zero is more and
for one is less if you plot the correlation among this data so we went
to use the PLT dot figure SNS start heat map we’re going to use a heat map we’re
going to plot the correlation c by true we’re going to use square true and we
were use the cold warm technique so as you can see here the correlation of the
radius worst with the area worst and the parameter worst is more whereas the
radius worst has high correlation to the parameter mean and the area mean because
if the radius is more the parameter is more the area is more so based on the
core plot let’s select some features from the model now the decision is made
in order to remove the colon arity so we will have a prediction variable in which
we have the texture mean the perimeter mean the smoothness mean the compactors
mean and the symmetry mean but these are the variables which we will use for the
prediction now we’re going to split the data into the training and testing data
set now in this our main data is splitted into training at data set with
0.3 test size that is 30 to 70 ratio next what we’re going to do is check the
dimension of that training and the testing data set so what we’re going to
do is use the print command and pass the parameter train dot shape and test our
shape so what we can see here is that we have almost like 400 398 observations
were 31 columns in the training dataset whereas 171 rows and 31 columns in the
testing data set so then again what we’re going to do is take the training
data input what we’re going to do is create a train underscore X with the
prediction and a score rat and train is for y is for the diagnosis now this is
the output of our training data same as we did for the test so we’re going to
use test underscore X for the test prediction variable and test underscore
Y for the test diagnosis which is the output of the test data now we’re going
to create a large stick regress method and create a model logistic dot
fit in which we’re going to fit the training dataset which is strain X
entering Y and then we’re going to use a TEM P which is temporary variable in
which you can operate X and then what we’re going to do is we’re going to
compare TEM P which is a test X with the test y to check the accuracy so the
accuracy here we get is zero point nine one then again what we need to do this
was lowest activation normal distribution are we going to use
classifier so we’re going to create a decision tree classifier with random
state given as zero now what next we’re going to do is create the
cross-validation score which is the CLF we take the moral we take that train x3
and Y and C V equals 10 the cross-validation score
now if we fit the training test and the sample weight we have not defined here
check the input F is true and X ID X sorted is none so if we get the
parameters true we predict using the test X and then predict the long
probability of test X and if we compare the score of the test X to test Y with
the sample weight none we get the same result as a decision tree so this is how
you implement a decision tree classifier and check the accuracy of the particular
model so that was it so next on our list is random forests so let’s understand
what exactly is a random forest so random forest is an N symbol classifier
made using many decision tree models so so what exactly are in symbol malls so n
symbol malls combines the results from different models the result from an
unstable model is usually better than the result of the one of the individual
model because every tree votes for one class the final decision is based upon
the majority of votes and it is better than decision tree because compared to
decision tree it can be much more accurate it R as if efficiently on large
data set it can handle thousands of input variables without variable
deletion and what it does is it gives an estimate of what variables I important
in the classification so let’s take the example of whether
so let’s understand I know for us with the help of the hurricanes and typhoons
data set so we have the data about hurricanes and typhoons from 1851 to
2014 and the data comprises off location when the pressure of tropical cyclones
in the Pacific Ocean the based on the data we have to classify the storms into
hurricanes typhoons and the subcategories as for that to
predefined classes mentioned so the predefined classes are TD tropical
cyclone of tropical depression intensity which is less than 34 knots if it’s
between thirty four to six to 18 knots it’s D s greater than 64 knots it’s a
cheer which is a hurricane intensity e^x is extra tropical cyclone s T is less
than 34 it’s a sub tropical cyclone or subtropical depression s s is greater
than 34 which is a sub tropical cyclone of subtropical storm intensity and then
again we have ello which is a low that is neither a tropical cyclone a tropical
subtropical cyclone or non an extraterrestrial cyclone and then again
finally we have DB which is disturbance of any intensity now these were the
predefined classes description so as you can see this is the data in which we
have the ID name date event say this line it’s you longitude maximum when
minimum when there are so many variables so let’s start with importing the pandas
then again we import the matplotlib then we’re going to use the aggregate method
in matplotlib we’re going to use the matplotlib
in line which is used for plotting interactive graph and I like it most for
plots so next what we’re going to do is import Seabourn as SN s now this is used
to plot the graph again and we’re going to import the model selection which is
the Train test split so we’re going to import it from a scaler and the
scikit-learn we have to import metrics watching the
accuracy then we have to import sq learn and then again from SQL and we have to
import tree from SQL or dot and symbol we’re going to import the random forest
classifier from SQL and Road metrics we’re going to import
confusion matrix so as to check the accuracy and from SQL and on message
we’re going to also import the accuracy score so let’s import random and
let’s read the data set and print the first six rows of the data sets you can
see here we have the ID we have the name date time II well stay with latitude
longitude so in total we have twenty two columns here so as you can see here we
have a column name status which is TS TS TS for the four six so what we’re gonna
do is data at our state as is equal P dot categorical data the state so what
we can do is make it a categorical data with quotes so that it’s easier for the
machine to understand it rather than having certain categories as means we’re
gonna use the categories as numbers so it’s easier for the computer to do the
analysis so let’s get the frequency of different typhoons so what we’re going
to do is random dot seed then again what are we gonna do is if we have to drop
the status we have to drop the event because these are unnecessary we’re
gonna drop latitude longitude we’re gonna drop ID then name the date and the
time it occurred so if we print the prediction list so ignore the error here
so that’s not necessary so we have the maximum when many event pressure low
went any low went s e low when s top blue and these are the parameters on
which we’re going to do the predictions so now we’ll split the data into
training and testing data sets so then again we have the trained comma test and
we’re going to use our trained test split half split in the 70s of thirty
industrial standard ratio now important thing here to note is that you can split
it in any form you want can be either 60/40 70/30 80/20 it all depends upon
the model which you have our the industrial requirement which you have so
then again if after printing let’s check the dimensions so the training dataset
comprised of eighteen thousand two hundred and ninety five rows were twenty
two columns whereas the testing dataset comprised of eight thousand rows with
twenty two columns we have the training data input train X we have trained Y so
status is the final output of the training data which will tell us the
status whether it’s at est D it’s an H u which kind of a hurricane or typhoon or
any kind of subcategories which I defined which were
like subtropical cyclone the subtropical typhoon and much more so our prediction
or the output variable will be status so so this is these are the list of the
training columns which we have here now same we have to do for the test variable
so we have the test X with the prediction error score rad with a test
wife with the status so now what we’re going to do is build a random forest
classifier so in the model we have the random forest classifier with estimators
as 100 a simple random for small and then we fit the training data set which
is a training X and green by then we again make the prediction which is the
world or predict that with the test underscore X then that and this will
predict for the test data and prediction will contain the predicted value by our
model predicted values of the diagnosis column for the test inputs so if you
print the metrics of the accuracy score between the prediction and the test
let’s go by to check the accuracy we get 95% accuracy now the same if we’re going
to do with decision tree so again we gonna use the model tree dot decision
tree classifier we’re going to use the train X and train Y which other two
training data sets and new prediction is smaller or predator score text we’re
going to create a data frame which is the Parador data frame and if we have a
look at the prediction and the test underscore Y you can see the status 10
10 3 3 10 10 11 and 5 5 3 11 and 3 3 so it goes on and on so it has 7800 for you
two rows and one column and if you print the accuracy we get a ninety-five point
five seven percent of accuracy and if you have a look at the accuracy of the
random false we get 95 point six six percent which is more than 95 point five
seven so as I mentioned earlier usually random forest gives a better output or
creates a better more than the decision tree classifier because as I mentioned
earlier it combines the result from different models you know so the final
decision is based upon the majority of votes and is usually higher than
decision tree morals so let’s let’s move ahead with our knee by selca rhythm and
let’s see what exactly is neat by it so nave bias is a simple but surprisingly
powerful algorithm for predictive modeling now it is a classification
technique based on the base theorem with an assumption of Independence among
predictors it comprises of two parts which are the Nev and the bias so in
simple terms an a bias classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other
feature even if these features depend on each other or upon the existence of the
other features all of these properties independently contribute to the
probability that a fruit it’s an apple or an orange and that is why it is known
as a noun a base model is easy to build and particularly useful for a large data
sets in probability theory and statistics Bayes theorem which is
alternatively known as the base law or the Bayes rule also melted as Bayes
theorem describes the probability of an event based on the prior knowledge of
conditions that might be related to the event so Bayes theorem is a way to
figure out the conditional probability now conditional probability is the
probability of an event happening given that it has some relationship to one or
more other events for example your probability of getting a parking space
is connected to the time that a you park where you park and what conventions are
going on at the same time so base Yoram is slightly more nuanced and a nutshell
it gives us the actual probability of an event given information about tests so
let’s talk about the base hyrum now so now given any I party sees edge and
evidence II Bayes theorem states that the relationship between the property of
the hypothesis before getting the evidence pH and the probability of the
hypothesis after getting the evidence which is P H bar e is PE bar edge into
probability of H divided by a probability of e which means it’s the
probability of even after getting the hypothesis into priority of the
hypothesis divided by the probability of the evidence so let’s understand it with
So let's understand it with a simple example here. Now for example, if a single card is drawn from a standard deck of playing cards, the probability of that card being a king is 4 out of 52, since there are 4 kings in a standard deck of 52 cards. Rewording this, if "King" is the event "this card is a king", the prior probability of king, that is P(King), equals 4/52, which in turn is 1/13. Now if the evidence is, for instance, that someone looks at the card and tells us that the single card is a face card, then the posterior probability, which is P(King | Face), can be calculated using Bayes theorem: P(King | Face) = P(Face | King) * P(King) / P(Face). Since every king is also a face card, P(Face | King) = 1, and since there are 3 face cards in each suit, that is jack, king and queen, the probability of a face card is 12 out of 52, which is 3/13. Combining these likelihoods using Bayes theorem, we get P(King | Face) = 1 out of 3.
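To make that arithmetic concrete, here is a tiny sketch of the same card calculation using Python's fractions module; the variable names are mine, not from the video.

# Sketch of the playing-card example: P(King | Face) via Bayes theorem.
from fractions import Fraction

p_king = Fraction(4, 52)          # prior: 4 kings in 52 cards = 1/13
p_face_given_king = Fraction(1)   # every king is a face card
p_face = Fraction(12, 52)         # 12 face cards (J, Q, K in 4 suits) = 3/13

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)          # 1/3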
So for a joint probability distribution over the events A and B, the conditional probability of A given B is defined as P(A|B) = P(A ∩ B) / P(B), and this is how we get Bayes theorem. Now that we know the basic derivation of Bayes theorem, let's have a look at the working of naive Bayes with the
help of an example here. So let's take the same example of the weather data set used for the forecast, in which we had the sunny, rainy and overcast outlooks. So first of all, what we're going to do is create a frequency table using each attribute of the data set, and as you can see here we have the frequency tables for the outlook, the humidity and the wind. So next what we're going to do is find the probability of sunny given yes, which is 3 out of 10, and find the probability of sunny, which is 5 out of 14, and this denominator comes from the total number of observations across yes and no. Similarly we're going to find the probability of yes, which is 10 out of 14, which is 0.71. For each frequency table we will generate these kinds of likelihood tables, so the likelihood of yes given it's sunny comes out to about 0.51, and similarly the likelihood of no given sunny comes out to about 0.40. So here you can see that using Bayes theorem we have found out the likelihood of yes given it's sunny and of no given it's sunny. Similarly we're going to build the same likelihood table for humidity and the same for wind, so for humidity we're going to check the probability of yes given the humidity is high and the probability of no given the humidity is high, and we're going to calculate them using the same Bayes theorem.
Now suppose we have a day with the following values, in which we have the outlook as rain, the humidity as high and the wind as weak. Since we discussed the same example earlier with the decision tree, we know the answer, so let's not get ahead of ourselves and let's try to find out the answer using Bayes theorem and understand how naive Bayes actually works. So first of all we're going to compute the likelihood of yes on that day, which equals the probability of the outlook being rain given yes, into the probability of the humidity being high given yes, into the probability of the wind being weak given yes, into the probability of yes. Okay, so that gives us 0.019. Similarly, the likelihood of no on that day is the probability of the outlook being rain given no, into the humidity being high given no, into the wind being weak given no, into the probability of no, and that equals 0.016. Now what we're going to do is find the probabilities of yes and no, and for that we take each likelihood and divide it by the sum of the likelihoods of yes and no, and that way we get the overall probability. So using that formula we get the probability of yes as 0.55 and the probability of no as 0.45, and our model predicts that there is a 55% chance that we will play the game tomorrow if it's rainy, the humidity is high and the wind is weak.
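To see where the 0.55 comes from, here is a small sketch that just normalizes the two class likelihoods quoted above; because those likelihoods are rounded to 0.019 and 0.016, the sketch prints roughly 0.54 and 0.46 rather than the exact 0.55 and 0.45 from the unrounded values.

# Normalizing the two naive Bayes class likelihoods from the weather example.
likelihood_yes = 0.019   # rounded likelihood of yes on that day, as quoted above
likelihood_no = 0.016    # rounded likelihood of no on that day

p_yes = likelihood_yes / (likelihood_yes + likelihood_no)
p_no = likelihood_no / (likelihood_yes + likelihood_no)
print(round(p_yes, 2), round(p_no, 2))  # about 0.54 and 0.46 with these rounded inputs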
Now let's have a look at the industrial use cases of naive Bayes. We have news categorization: what happens is that the news comes in with a lot of tags and it has to be categorized so that the user gets the information he needs in a particular format. Then again we have spam filtering, which is one of the major use cases of the naive Bayes classifier, as it classifies an email as spam or ham. Then finally we have weather prediction also, as we saw just now with the example where we predict whether we are going to play or not; that sort of prediction is always there. So guys, this was all about supervised learning. We discussed linear regression, logistic regression, we discussed naive Bayes, we discussed random forest and decision tree, and we understood how the random forest is better than a decision tree; in some cases it might be equal to the decision tree, but nonetheless it usually provides a better result. So guys, that was all about
supervised learning. But before we move on, let's go ahead and see how exactly we are going to implement naive Bayes. So here we have another data set, run or walk; it's a kinematic data set and it has been measured using a mobile sensor. So we let the target variable be y and all the columns after it be X. Using a scikit-learn naive Bayes model we're going to observe the accuracy and generate a classification report using scikit-learn, and then we can repeat the model using only the acceleration values as predictors, and then using only the gyro values as predictors, and comment on the difference in accuracy between the two models. So here we have the data set, which is run or walk, so let me open that for you. As you can see we have the date, time, username, wrist, activity, acceleration x, y and z, and gyro x, y and z, so based on it let's see how we can implement the naive Bayes classifier.
So first of all what we're going to do is import pandas as pd, then we're going to import matplotlib for plotting, and we're going to read the run-or-walk data file with pandas read_csv. Let's have a look at the info: first of all we see that we have 88,588 rows with 11 columns, so we have the date, time, username, wrist, activity, acceleration x, y, z, gyro x, y, z, and the memory usage is around 7.4 MB. This is how you look at the columns, with df.columns. Now again we're going to split the data set into training and testing data sets, so we're going to use the train_test_split function, and what we're going to do is split it into X_train, X_test, y_train, y_test, and we're going to use a test size of 0.2 here; again, I am saying it depends on you what the test size is. So let's print the shape of the training set, and we see it's about 70,000 observations with six columns. Now what we're going to do is, from sklearn.naive_bayes, import the GaussianNB, which is the Gaussian naive Bayes, and we're going to set the classifier as GaussianNB. Then we'll pass the X_train and y_train variables to the classifier, and again we have y_pred, which is the classifier's prediction on X_test, and we're going to compare y_pred with y_test to see the accuracy. For that we're going to import the accuracy score from sklearn.metrics. Now let's compare both of these, and the accuracy we get is 95.54 percent. Another way is to build a confusion matrix, so from sklearn.metrics we're going to import the confusion matrix and compute it for y_pred and y_test, and as you can see here the counts on the diagonal are very high, which is a very good number. So now what we're going to do is create a classification report, so from metrics we're going to import the classification report, we're going to put the target names as walk and run, and print the report using y_test and y_pred with the target names we have. So for walk we get a precision of 0.92, a recall of 0.99 and an f1 score of 0.96, with a support of 8,673, and for run we get a precision of about 0.99 with a recall of 0.92 and an f1 score of 0.95. So guys, this is how you use the GaussianNB, the Gaussian naive Bayes
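Here is a condensed, runnable sketch of that workflow; the file name and the column names are assumptions based on the walkthrough, so adjust them to the actual CSV.

# Sketch of the run-or-walk naive Bayes workflow described above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

df = pd.read_csv("run_or_walk.csv")  # assumed file name

# Target is the activity flag; features are the accelerometer and gyroscope columns.
y = df["activity"]
X = df[["acceleration_x", "acceleration_y", "acceleration_z",
        "gyro_x", "gyro_y", "gyro_z"]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = GaussianNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["walk", "run"]))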
classifier on this data set. And all of these types of algorithms, whether they belong to supervised, unsupervised or reinforcement learning, are present in the scikit-learn library. So as I said, scikit-learn is a very important library when you are dealing with machine learning, because you do not have to hard-code any algorithm; every algorithm is present. All you have to do is pass the data, split the data set into training and testing sets, find the predictions and then compare the predicted y with the test y. That is exactly what we do every time we work on a machine learning algorithm.
Now guys, that was all about supervised learning. Let's go ahead and understand what exactly unsupervised learning is. So sometimes the given data is unstructured and unlabeled, so it becomes difficult to classify the data into different categories, and unsupervised learning helps to solve this problem. This learning is used to cluster the input data into classes on the basis of its statistical properties. For example, we can cluster different bikes based upon their top speed, their acceleration or the mileage that they are giving. So unsupervised learning is a type of machine learning algorithm used to draw inferences from data sets consisting of input data without labeled responses. If you have a look at the workflow or the process flow of unsupervised learning, the training data is a collection of information without any label, we have the machine learning algorithm, and then we get the clustering model. What it does is distribute the data into different clusters, and again, if you provide any unlabeled new data, it will make a prediction and find out to which cluster that particular data point belongs. So one of the most important techniques in unsupervised learning is
clustering. So let's understand exactly what clustering is. Clustering basically is the process of dividing the data set into groups consisting of similar data points. It means grouping objects based on the information found in the data describing the objects or their relationships. So clustering models focus on identifying groups of similar records and labeling records according to the group to which they belong. Now this is done without the benefit of prior knowledge about the groups and their characteristics, and in fact we may not even know exactly how many groups there are to look for. These models are often referred to as unsupervised learning models, since there is no external standard by which to judge the model's classification performance; there are no right or wrong answers for these models. And if we talk about why clustering is used, the goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Sometimes the partitioning itself is the goal, or the purpose of the clustering algorithm is to make sense of and extract value from large sets of structured and unstructured data; that's why clustering is used in the industry. And if you have a look at the various use cases of clustering in the industry: first of all it's being used in marketing, for discovering distinct groups in customer databases, such as customers who make a lot of long-distance calls or customers who use the internet more than calls. It's also used by insurance companies, for example for identifying groups of insurance policy holders with a high average claim rate, and for identifying which crops are profitable for farmers. It's used in seismic studies to define probable areas of oil or gas exploration based on seismic data, and it's also used in the recommendation of movies, in grouping Flickr photos, and by Amazon for recommending products and deciding which category a product lies in.
So basically, if we talk about clustering, there are three types of clustering. First of all we have exclusive clustering, which is hard clustering; here an item belongs exclusively to one cluster, not several clusters, so each data point belongs to exactly one cluster. An example of this is k-means clustering, so k-means does this exclusive kind of clustering. Secondly we have overlapping clustering, which is also known as soft clustering; in this an item can belong to multiple clusters, as its degree of association with each cluster is known, and for example we have fuzzy C-means clustering, which is used for overlapping clustering. And finally we have hierarchical clustering: when two clusters have a parent-child relationship or a tree-like structure, then it is known as hierarchical clustering, and as you can see here from the example, we have a parent-child kind of relationship in the clusters given here.
So let's understand what exactly k-means clustering is. K-means clustering is an algorithm whose main goal is to group similar elements or data points into a cluster, and it is the process by which objects are classified into a predefined number of groups, so that they are as dissimilar as possible from one group to another, but as similar as possible within each group. Now if you have a look at the algorithm working here on the right: first of all it starts with identifying the number of clusters, which is k, then we find the centroids, then we find the distance of the objects to the centroids, then we do the grouping based on the minimum distance, and we check whether the centroids have converged; if true, we have our clusters, and if false, we find the centroids again and repeat all of these steps again and again.
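Here is a compact sketch of that loop written from scratch with numpy, just to make the assign-and-update cycle concrete; it is a teaching sketch, not the scikit-learn implementation used later, and empty-cluster handling is omitted for brevity.

# From-scratch k-means: assign points to the nearest centroid, recompute
# centroids, and stop once the centroids barely move.
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance of every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.linalg.norm(new_centroids - centroids) < tol:  # converged
            break
        centroids = new_centroids
    return labels, centroids

X = np.random.default_rng(1).normal(size=(300, 2))
labels, centroids = kmeans(X, k=3)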
So let me show you how exactly k-means clustering works with an example here. First we need to decide the number of clusters to be made; now another important task here is how to decide the right number of clusters, and we'll get into that later. So for now let's assume that the number of clusters we have decided is three. After that we provide the centroids for all the clusters, which is basically guessing, and the algorithm calculates the Euclidean distance of each point from each centroid and assigns the data point to the closest cluster; the Euclidean distance, as you know, is the square root of the sum of the squared differences. Next, the centroids are calculated again and we have our new clusters: for each data point the distance to the new centroids is calculated, the points are again assigned to the closest cluster, and then again we have the new centroids calculated. Now these steps are repeated until we have a repetition in the centroids, or the new centroids are very close to the previous ones. So unless our output gets repeated or the outputs are very close, we do not stop this process; we keep on calculating the Euclidean distance of all the points to the centroids, then we calculate the new centroids, and that is how k-means clustering basically works. An important part here is to understand how to decide the value of k, or the number of clusters, because it does not make any sense if you do not know how many clusters you are going to make.
So to decide the number of clusters we have the elbow method. First of all we compute the sum of squared errors, which is the SSE, for some values of k, for example 2, 4, 6 and 8. Now the SSE is defined as the sum of the squared distances between each member of a cluster and its centroid; mathematically it is given by the equation provided here. And if you plot k against the SSE, you will see that the error decreases as k gets larger; this is because as the number of clusters increases, the clusters get smaller, so the distortion is also smaller. The idea of the elbow method is to choose the k at which the SSE decreases abruptly. So for example, if we have a look at the figure given here, we see that the best number of clusters is at the elbow; as you can see, the graph changes abruptly after number four, so for this particular example we're going to use four as the number of clusters.
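Here is a small sketch of the elbow method using scikit-learn's KMeans, where the SSE is exposed as the inertia_ attribute; make_blobs stands in for the real data.

# Elbow method sketch: plot the SSE (inertia) for a range of k values.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

ks = range(1, 10)
sse = [KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("SSE (inertia)")
plt.show()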
So first of all, while working with k-means clustering, there are two key points to know. First, be careful about where you start: one idea is to choose the first center at random, choose the second center to be far away from the first center, and in general choose the nth center as far away as possible from the closest of all the other centers. The second idea is to do many runs of k-means, each with different random starting points, so that you get an idea of where exactly and how many clusters you need to make, where exactly the centroids lie and how the data is converging. Now k-means is not exactly a perfect method, so let's understand the pros and cons of k-means clustering. We know that k-means is simple and understandable, everyone learns it at the first go, and the items are automatically assigned to clusters. Now if we have a look at the cons: first of all, one needs to define the number of clusters, and this is a very heavy task, because if we have 3, 4 or 10 categories and you do not know what the number of clusters is going to be, it's very difficult to guess it. Next, all the items are forced into clusters: even if an item does not really belong to any cluster, it is forced to lie in whichever cluster it is closest to, and this again happens because of not defining, or not being able to guess, the correct number of clusters. And most of all, it's unable to handle noisy data and outliers, because even though machine learning engineers and data scientists have to clean the data, it comes down to the analysis they are doing and the method they are using; typically people do not clean the data thoroughly for k-means clustering, or even if they clean it, there is sometimes noisy and outlier data left which affects the whole model. So that was all for k-means clustering.
Now what we're going to do is use k-means clustering on the movie data set; we have to find out the number of clusters and divide the movies accordingly. So the use case is that we have a data set of around 5,000 movies, and what we want to do is group the movies into clusters based on the Facebook likes. So guys, let's have a look at the demo here. First of all what we're going to do is import deepcopy, numpy, pandas and seaborn, the various libraries which we're going to use, and from matplotlib we're going to use pyplot with the ggplot style. Next we import the data set and look at its shape, and we can see that it has 5,043 rows with 28 columns, and if you have a look at the head of the data set we can see it has 5,043 data points. What we're going to do is place the data points on a plot: if we have a look at the data columns, we have face number in poster, cast total Facebook likes, director Facebook likes and so on, and what we have done here is take the director Facebook likes and the actor 3 Facebook likes, so we have 5,043 rows and two columns. Now, using the KMeans from sklearn, what we're going to do is import it first, so we import KMeans from sklearn.cluster; remember guys, scikit-learn is a very important library in Python for machine learning. We provide the number of clusters as five; again, the number of clusters depends upon the SSE, the sum of squared errors, or we can use the elbow method, so I'm not going to go into the details of that again. We're going to fit the data with kmeans.fit, and if we find the cluster centers for the k-means and print them, what we find is an array of five cluster centers, and we can also print the labels of the k-means clusters. Next what we're going to do is plot the data with the new cluster assignments which we have found, and for this we're going to use seaborn, and as you can see here we have plotted the data onto the grid and you can see we have five clusters. Now what I would say is that cluster 3 and cluster 0 are very, very close, and that's exactly what I was going to say: the main challenge in k-means clustering is to define the number of centers, which is the k. As you can see here, the third cluster and the zeroth cluster are very, very close to each other, so they probably could have been one single cluster. And another disadvantage is that we do not exactly know how the points are to be arranged, so when the data gets forced into one cluster or another it makes our analysis a little difficult. It works fine, but sometimes it might be difficult to get the k-means clustering right.
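Here is a condensed sketch of that movie demo; the CSV name and the two Facebook-likes columns are assumptions based on the walkthrough.

# Sketch of clustering movies on two Facebook-likes columns with KMeans.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

movies = pd.read_csv("movie_metadata.csv")  # assumed file name
X = movies[["director_facebook_likes", "actor_3_facebook_likes"]].dropna()

kmeans = KMeans(n_clusters=5, random_state=42, n_init=10).fit(X)
print(kmeans.cluster_centers_)   # five centroids in the two-feature space
print(kmeans.labels_[:20])       # cluster assignment of the first few movies

# Quick visual check of the clusters.
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=kmeans.labels_, s=10)
plt.xlabel("director_facebook_likes")
plt.ylabel("actor_3_facebook_likes")
plt.show()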
Now let's understand what exactly fuzzy C-means clustering is. Fuzzy C-means is an extension of k-means, the popular simple clustering technique. Fuzzy clustering, also referred to as soft clustering, is a form of clustering in which each data point can belong to more than one cluster. So k-means tries to find hard clusters, where each point belongs to one cluster, whereas fuzzy C-means discovers soft clusters; in a soft cluster any point can belong to more than one cluster at a time, with a certain affinity value towards each. Fuzzy C-means assigns a degree of membership, which ranges from 0 to 1, to an object for a given cluster, and there is a stipulation that the sum of the fuzzy memberships of an object over all the clusters it belongs to must be equal to 1. So if the degrees of membership of a particular point to two of these clusters are 0.6 and 0.4, if you add them up you get 1; that is the logic behind fuzzy C-means, and this affinity is inversely related to the distance from the point to the center of the cluster. Then again we have the pros and cons of fuzzy C-means. First of all, it allows a data point to be in multiple clusters, which is a pro, and it's a more natural representation of the behavior of genes, since genes are usually involved in multiple functions, so it is a very good type of clustering when we are talking about genes. If we talk about the cons: again we have to define c, which is the number of clusters, same as k; next we need to determine the membership cutoff value as well, which takes a lot of effort and is time-consuming; and the clusters are sensitive to the initial assignment of centroids, so a slight change or deviation in the centers is going to result in a very different kind of output from the fuzzy C-means. And one of the major disadvantages of C-means clustering is that it is a non-deterministic algorithm, so it does not always give you one particular output as such. So that's that.
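Here is a toy sketch of the membership idea only, not a full fuzzy C-means (which would also re-estimate the centroids from these memberships): each point gets an affinity to every centroid that is inversely related to distance and normalized so the memberships sum to 1.

# Toy fuzzy memberships: closer centroid gets a larger, normalized membership.
import numpy as np

points = np.array([[1.0, 1.0], [4.0, 4.0], [2.5, 2.0]])
centroids = np.array([[1.0, 1.0], [4.0, 4.0]])

dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2) + 1e-9
inv = 1.0 / dists**2                                # closer centroid, larger affinity
memberships = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
print(memberships)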
Now let's have a look at the third type of clustering, which is hierarchical clustering. Hierarchical clustering is an alternative approach which builds a hierarchy from the bottom up or from the top down, and does not require us to specify the number of clusters beforehand. Now the algorithm works as follows: first of all we put each data point in its own cluster, then we identify the two closest clusters and combine them into one cluster, and we repeat this step till all the data points are in a single cluster. There are two types of hierarchical clustering: one is agglomerative clustering and the other one is divisive clustering. Agglomerative clustering builds the dendrogram from the bottom level up, while divisive clustering starts with all the data points in one root cluster. Now again, hierarchical clustering also has some pros and cons. On the pros side, no assumption of a particular number of clusters is required, and it may correspond to meaningful taxonomies. If we talk about the cons, once a decision is made to combine two clusters it cannot be undone, and one of the major disadvantages of hierarchical clustering is that it becomes very slow when we talk about very, very large data sets; and nowadays I think every industry is using large data sets and collecting large amounts of data, so hierarchical clustering is not always the apt or the best method someone might want to go for. So there's that.
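Here is a small sketch of agglomerative (bottom-up) hierarchical clustering with scipy's linkage and dendrogram on synthetic data; the "ward" linkage is just one common choice.

# Agglomerative hierarchical clustering sketch: build and plot a dendrogram.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

Z = linkage(X, method="ward")   # merge the two closest clusters step by step
dendrogram(Z)
plt.xlabel("data points")
plt.ylabel("merge distance")
plt.show()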
Now, when we talk about unsupervised learning, we have k-means clustering, and again there's another important topic which people usually miss while talking about unsupervised learning, and that is the very important concept of market basket analysis. It is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions; to put it another way, it allows retailers to analyze the relationships between the items that people buy. For example, people who buy bread also tend to buy butter, so the marketing team at the retail store should target customers who buy bread and butter and provide them an offer so that they buy a third item, like eggs. If a customer buys bread and butter and sees a discount or an offer on eggs, he will be encouraged to spend more money and buy the eggs. Now this is what market basket analysis is all about. To find the associations between items and make predictions about what the customers will buy, there are two approaches, which are association rule mining and the apriori algorithm. So let's discuss each of these with an example.
First of all, if we have a look at association rule mining, it's a technique that shows how items are associated with each other. For example, customers who purchase bread have a 60% likelihood of also purchasing jam, and customers who purchase a laptop are more likely to purchase laptop bags. Now if you take an example of an association rule, say A → B, it means that if a person buys item A then he will also buy item B. There are three common ways to measure a particular association, because we have to judge these rules on the basis of some statistics, right? So what we use is support, confidence and lift; these are the three common measures used in association rule mining to know exactly how good a rule is. First of all we have support: support gives the fraction of transactions which contain items A and B, so it's basically the frequency of the item set in all the transactions. Confidence gives how often the items A and B occur together, given the number of times A occurs, so it's frequency(A, B) divided by frequency(A). Now lift indicates the strength of the rule over the random occurrence of A and B; if you have a close look at the denominator of the lift formula, we have support(A) into support(B), and a major thing to note from this is that A and B are treated as independent there. So if the denominator of the lift is large, it means that the items are selling more independently, not together, and that in turn will decrease the value of lift. So if the value of lift is high, it implies that the rule is strong and it can be used for later purposes, because in that case the support(A) into support(B) value, which is the denominator of lift, is low relative to the joint support, which in turn means that there is a relationship between the items A and B.
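Here is a toy sketch that computes support, confidence and lift for one rule over a handful of made-up transactions, just to tie the three formulas together.

# Support, confidence and lift for the rule bread -> butter over toy transactions.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(*items):
    return sum(all(i in t for i in items) for t in transactions) / n

sup_ab = support("bread", "butter")
confidence = sup_ab / support("bread")
lift = sup_ab / (support("bread") * support("butter"))
print(f"support={sup_ab:.2f} confidence={confidence:.2f} lift={lift:.2f}")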
So let's take an example of association rule mining and understand how exactly it works. Let's suppose we have a set of items A, B, C, D and E, and we have a set of transactions which are T1, T2, T3, T4 and T5, and what we need to do is create some rules. For example, you can see A → D, which means that if a person buys A he buys D; C → A means if a person buys C he buys A; A → C means if a person buys A he buys C; and the fourth one is B, C → A, which means if a person buys B and C he will in turn buy A. Now what we need to do is calculate the support, confidence and lift of these rules. Here again we talk about the apriori algorithm, because the apriori algorithm and association rule mining go hand in hand. What the apriori algorithm does is use the frequent item sets to generate the association rules, and it is based on the concept that a subset of a frequent item set must also be a frequent item set. So let's understand what a frequent item set is and how all of these work together. If we take the following transactions of items, we have transactions T1 through T5, and the item sets are {1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5} and {1, 3, 5}.
Now another important thing about support which I forgot to mention is that when talking about association rule mining there is a minimum support count. The first step is to build a list of item sets of size 1 using this transaction data set, using a minimum support count of 2. Now let's see how we do that. If you have a close look at table C1, we have the item set {1}, which has a support of 3 because it appears in transactions T1, T3 and T5. Similarly, if you have a look at the single item {3}, it has a support of 4, as it appears in T1, T2, T3 and T5. But if we have a look at the item {4}, it only appears in one transaction, so its support value is 1. Now the item sets with a support value less than the minimum support value of 2 have to be eliminated, so the final table, which is table F1, has 1, 2, 3 and 5; it does not contain 4. Next we create the item sets of size 2, and all the combinations of the item sets in F1 are used in this iteration. So we've left 4 behind and we just have 1, 2, 3 and 5, so the possible item sets are {1,2}, {1,3}, {1,5}, {2,3}, {2,5} and {3,5}, and again we calculate their supports. In this case, if you have a closer look at table C2, we see that the item set {1,2} has a support value of 1, which has to be eliminated, so the final table F2 does not contain {1,2}. Similarly we create the item sets of size 3 and calculate their support values, but before calculating the support let's perform pruning on the candidates. Now what's pruning? After all the combinations are made, we go through the candidate table C3 to check whether any candidate has a subset whose support is less than the minimum support value; this is the core idea of the apriori algorithm. So in the item set {1,2,3} we can see that we have {1,2}, and in {1,2,5} again we have {1,2}, so we'll discard both of these item sets and we'll be left with {1,3,5} and {2,3,5}. With {1,3,5} we have the subsets {1,3}, {1,5} and {3,5}, which are all present in table F2, and then again we have {2,3}, {2,5} and {3,5}, which are also present in table F2, so we only had to remove the candidates containing {1,2} from table C3 to create table F3. Now if we use the item sets of C3 to create the candidates of C4, what we find is the item set {1,2,3,5} with a support value of 1, which is less than the minimum support value of 2, so we stop here and return to the previous item sets, that is, table C3. So the final table F3 has {1,3,5} with a support value of 2 and {2,3,5} with a support value of 2. Now what we do is generate all the subsets of each frequent item set. Let's assume that our minimum confidence value is 60%, so for every subset S of a frequent item set I, the output rule is S → (I − S), that is, S recommends the rest of I, and only if the support of I divided by the support of S is greater than or equal to the minimum confidence value do we keep the rule. Keep in mind that we have not used lift till now; we are only working with support and confidence. So applying this to the item sets of F3, we get rule 1, which is {1,3} → {5} with a confidence of 2/3, meaning if you buy 1 and 3 there is a 66% chance that you'll buy item 5 also. Similarly, the rule {1,5} → {3} means that if you buy 1 and 5 there's a 100% chance that you will buy 3 also. And if we have a look at rules 5 and 6 here, the confidence value is less than 60%, which was the assumed minimum confidence value, so we'll reject those rules. Now an important thing to note here is that if you have a closer look at rule 5 and rule 3, you see they involve the same items, 1, 5 and 3, just arranged differently, which looks confusing, so one thing to keep in mind is that the order of the item sets, that is, which items sit on the left-hand side and which on the right, is also very important; that will help us create good rules and avoid any kind of confusion. So that's done.
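As a quick check of that hand-worked example, here is a short sketch that recomputes the rule confidences from the same five transactions; it reproduces the 66%, 100% and rejected 50% values.

# Confidence of the candidate rules from the worked apriori example.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]

def count(itemset):
    return sum(itemset <= t for t in transactions)  # how many transactions contain it

for antecedent, consequent in [({1, 3}, {5}), ({1, 5}, {3}), ({3}, {1, 5}), ({5}, {1, 3})]:
    conf = count(antecedent | consequent) / count(antecedent)
    print(antecedent, "->", consequent, f"confidence={conf:.2f}")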
So now let's learn how association rules are used in a market basket analysis problem. What we'll do is use the online transactional data of a retail store for generating association rules. First of all you need to import the pandas and mlxtend libraries and read the data: from mlxtend.frequent_patterns we're going to import apriori and association_rules. As you can see here in the head of the data, we have the invoice number, stock code, the description, quantity, the invoice date, unit price, customer ID and the country. In the next step we do the data cleanup, which includes removing spaces from some of the descriptions, dropping the rows that do not have invoice numbers and removing the credit transactions. So what we're going to do is remove the rows which do not have an invoice number, and if the invoice number string contains a C then we're going to remove that row, because those are the credits, and we strip any extra spaces from the descriptions. As you can see here, we're left with around 532,000 rows with 8 columns. Next, after the cleanup, we need to consolidate the items into one transaction per row, with one column per product. For the sake of keeping the data set small, we're only going to look at the sales for France, so we filter on France and group by invoice number and description with the quantity summed up, which leaves us with 392 rows and 1,563 columns. Now there are a lot of zeros in the data, but we also need to make sure any positive value is converted to a 1 and anything less than or equal to 0 is set to 0. For that we're going to use this code defining encode_units: if x is less than or equal to 0 it returns 0, and if x is greater than or equal to 1 it returns 1, and then we apply it to the whole data set we have here. So now that we have structured the data properly, the next step is to generate the frequent item sets that have a support of at least 7%; this number is chosen so that we get enough item sets to work with. Then we generate the rules with the corresponding support, confidence and lift, so we have given a minimum support of 0.07, the metric is lift and the threshold is 1, and these are the resulting rules. Now a few rules have a high lift value, which means that the combination occurs more frequently than would be expected given the individual product frequencies and the number of transactions, and in most of these cases the confidence is high as well; these are a few of the observations we get here. If we filter the data frame using standard pandas code for a large lift of at least 6 and a high confidence of at least 0.8, this is what the output is going to look like: as you can see here we have the eight rules which are the final rules given by the association rule mining. And that is how the industries we've talked about, the large retailers, get to know how their products are bought together and how exactly they should rearrange the shelves and provide offers on the products, so that people spend more and more money and time in the shop. So that was all about association rule mining.
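Here is a condensed sketch of that France basket analysis using mlxtend; the Excel file name and the column names follow the usual online-retail data set and are assumptions here.

# Market basket analysis sketch with mlxtend apriori and association_rules.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_excel("Online_Retail.xlsx")            # assumed file name
df["Description"] = df["Description"].str.strip()
df = df.dropna(subset=["InvoiceNo"])
df["InvoiceNo"] = df["InvoiceNo"].astype(str)
df = df[~df["InvoiceNo"].str.contains("C")]         # drop credit transactions

basket = (df[df["Country"] == "France"]
          .groupby(["InvoiceNo", "Description"])["Quantity"].sum()
          .unstack().fillna(0))
basket = (basket > 0).astype(int)                   # one-hot encode quantities

frequent = apriori(basket, min_support=0.07, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1)
print(rules[(rules["lift"] >= 6) & (rules["confidence"] >= 0.8)])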
So guys, that's all for unsupervised learning. I hope you got to know the different formulas and how unsupervised learning works, because you know we did not provide any label to the data; all we did was create some rules without knowing what the data is, and we did different types of clustering: k-means, C-means and hierarchical clustering. So now we come to the third and last type of learning, which is reinforcement learning. What reinforcement learning is, is a type of machine learning where an agent is put in an environment and it learns to behave in this environment by performing certain actions and observing the rewards which it gets from those actions. So reinforcement learning is all about taking an appropriate action in order to maximize the reward in a particular situation. In supervised learning the training data comprises the input and the expected output, so the model is trained with the expected output itself, but when it comes to reinforcement learning there is no expected output; the reinforcement agent decides what actions to take in order to perform a given task, and in the absence of a training data set it is bound to learn from its own experience.
So let's understand reinforcement learning with an analogy. Consider a scenario where a baby is learning how to walk. Now this scenario can go in two ways. First, the baby starts walking and makes it to the candy; since the candy is the end goal, the baby is happy, so it's positive, a positive reward. Now coming to the second scenario: the baby starts walking but falls due to some hurdles in between; the baby gets hurt and does not get to the candy, so it's negative, the baby is sad, a negative reward. Just like we humans learn from our mistakes by trial and error, reinforcement learning is similar: we have an agent, which is the baby, a reward, which is the candy, and many hurdles in between, and the agent is supposed to find the best possible path to reach the reward. So guys, if you have a look at some of the important reinforcement learning definitions: first of all we have the agent, the reinforcement learning algorithm that learns from trial and error. If we talk about the environment, it is the world through which the agent moves, or the obstacles which the agent has to conquer. Actions are all the possible steps that the agent can take. The state S is the current condition returned by the environment. Then we have the reward R, an instant return from the environment to appraise the last action. Then we have the policy, pi, which is the approach that the agent uses to determine the next action based on the current state. We have the value V, which is the expected long-term return with discount, as opposed to the short-term reward. And then we have the action value Q, which is similar to the value, except it takes an extra parameter, the current action A.
Now let's talk about reward maximization for a moment. A reinforcement learning agent works based on the theory of reward maximization; this is exactly why the RL agent must be trained in such a way that it takes the best action, so that the reward is maximum. Now the collective reward at a particular time, given the respective actions, is written as G_t = R_(t+1) + R_(t+2) + R_(t+3) + ... and so on. Now this equation is an ideal representation of rewards; generally things do not work out like this while summing up the cumulative rewards. Let me explain this with a small game. In the figure you see a fox, some meat and a tiger. Our reinforcement learning agent is the fox, and his end goal is to eat the maximum amount of meat before being eaten by the tiger. Since this fox is a clever fellow, he eats the meat that is closer to him rather than the meat which is close to the tiger, because the closer he goes to the tiger, the higher are his chances of getting killed. As a result, the rewards near the tiger, even if they are bigger meat chunks, will be discounted; this is done because of the uncertainty factor that the tiger might kill the fox. Now the next thing to understand is how discounting of rewards works. To do this we define a discount rate called gamma; the value of gamma is between 0 and 1, and the smaller the gamma, the larger the discount, and vice versa. So our cumulative discounted reward is G_t = sum over k from 0 to infinity of gamma^k * R_(t+k+1), where gamma lies between 0 and 1.
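Here is a tiny sketch of that discounted return for a short, made-up reward sequence, just to show how gamma shrinks rewards that are further away.

# Discounted return G_t for an illustrative reward sequence.
gamma = 0.9
rewards = [1, 0, 0, 5]  # R_{t+1}, R_{t+2}, ... (made-up values)

g_t = sum(gamma**k * r for k, r in enumerate(rewards))
print(round(g_t, 3))    # 1 + 0 + 0 + 0.9**3 * 5 = 4.645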
But if the fox decides to explore a bit, it can find bigger rewards, that is, the big chunk of meat; this is called exploration. So reinforcement learning basically works on the basis of exploration and exploitation. Exploitation is about using the already known information to heighten the rewards, whereas exploration is all about exploring and capturing more information about the environment. There is another problem which is known as the k-armed bandit problem. The k-armed bandit is a metaphor representing a casino slot machine with k levers or arms; the user or customer pulls any one of the levers to win a predicted reward, and the objective is to select the lever that will provide the user with the highest reward. Now here comes the epsilon-greedy algorithm. It tries to be fair to the two opposite goals of exploration and exploitation by using a mechanism a bit like flipping a coin: if you flip a coin and it comes up heads you should explore, and if it comes up tails you should exploit, taking whatever action seems best at the present moment. So with probability 1 − epsilon the epsilon-greedy algorithm exploits the best known option, with probability epsilon/2 the algorithm explores the best known option, and with probability epsilon/2 the epsilon-greedy algorithm explores the worst known option.
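Here is a small sketch of an epsilon-greedy choice; note it uses the common variant that explores uniformly at random with probability epsilon, rather than the exact epsilon-by-two split described above.

# Epsilon-greedy action selection over k levers with made-up value estimates.
import random

def epsilon_greedy(value_estimates, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(value_estimates))   # explore: random lever
    return max(range(len(value_estimates)), key=value_estimates.__getitem__)  # exploit

estimates = [0.2, 0.5, 0.1]        # current value estimate per lever
print(epsilon_greedy(estimates))   # usually lever 1, occasionally a random one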
Now let's talk about the Markov decision process. The mathematical approach for mapping a solution in reinforcement learning is called the Markov decision process, or MDP; in a way, the purpose of reinforcement learning is to solve a Markov decision process. Now the following parameters are used to attain a solution: a set of actions A, a set of states S, the reward R, the policy pi, the value V, and a transition function T, which is the probability that an action taken in one state leads to another state. To briefly sum it up, the agent must take actions to transition from the start state to the end state; while doing so, the agent receives a reward R for each action it takes. The series of actions taken by the agent defines the policy pi, and the rewards collected define the value V. The main goal here is to maximize the rewards by choosing the optimum policy. Now let's take an example of choosing the shortest path. Consider the given example here: given the above representation, our goal is to find the shortest path between A and D. Each edge has a number linked to it, and this denotes the cost to traverse that edge. The task at hand is to traverse from point A to D with the minimum possible cost. In this problem the set of states is denoted by the nodes A, B, C and D, the actions are to traverse from one node to another, for example A → B or C → D, the reward is the cost represented by each edge, and the policy is the path taken to reach the destination, for example A to C to D. So you start off at node A and take baby steps to your destination; initially only the next possible nodes are visible to you, so if you follow the greedy approach you take the most optimal step, that is, choosing A → C instead of A → B. Now you are at node C and want to traverse to node D; you must again choose the path wisely, choosing the path with the lowest cost. We can see that A-C-D has the lowest cost, and hence we take that path. To conclude, the policy is A to C to D and the value is the total reward collected along that path.
So guys, let's understand the Q-learning algorithm, which is one of the most used reinforcement learning algorithms, with the help of an example. We have five rooms in a building connected by doors, and each room is numbered from 0 through 4; the outside of the building can be thought of as one big room, which is room number 5. Now doors 1 and 4 lead into the building from room 5 outside. Let's represent the rooms on a graph, with each room as a node and each door as a link. As you can see here, we have represented it as a graph, and our goal is to reach node 5, which is the outside space. The next step is to associate a reward value with each door: the doors that lead directly to the goal have a reward of 100, whereas the doors that do not directly connect to the target room have a reward of 0. Because the doors are two-way, two arrows are assigned between each pair of connected rooms, and each arrow carries an instant reward value. After that, the terminology in Q-learning includes the terms states and actions: each room, including room 5, represents a state, and the agent's movement from one room to another represents an action. In this figure a state is depicted as a node, while an action is represented by the arrows. So for example, let's say an agent wants to traverse from room 2 to room 5. The initial state is going to be state 2, then the next step is from state 2 to state 3, next it moves from state 3 to either state 1 or state 4, and if it goes to 4 it reaches state 5. That's how you represent the whole traversal of any particular agent through all of these rooms as states and actions. So we can put this state diagram and the instant reward values into a reward table, which is the matrix R. As you can see, the minus 1 entries in the table represent the null values, because for example you cannot go from room 1 to room 1, and since there is no door from room 1 to room 0, that entry is also minus 1. So minus 1 represents the null values, 0 represents a zero reward, and 100 represents the reward for going to room 5. One more important thing to note here is that from room 5 you can stay in room 5, and that also has a reward of 100. Now what we need to do is add another matrix Q, representing the memory of what the agent has learned through experience. The rows of matrix Q represent the current state of the agent, whereas the columns represent the possible actions leading to the next state. The formula to calculate the Q matrix is Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)], where gamma is the discount parameter which we discussed earlier and which ranges from 0 to 1.
So let's understand this with an example, but first here are the steps which the Q-learning algorithm follows. First of all, set the gamma parameter and the environment rewards in the matrix R. Then initialize the matrix Q to 0, select a random initial state, and set the initial state as the current state. Select one among all the possible actions for the current state, and using this action, go to the next state. For this next state, get the maximum Q value over all possible actions, compute the Q value using the formula, and repeat the above steps until the current state equals the goal state. So the first step is to set the values of the learning parameters: gamma, which is 0.8, and the initial state as room number 1. Next, initialize the Q matrix as a zero matrix; on the left-hand side, as you can see here, we have the Q matrix which has all the values as 0. Now from room 1 you can either go to room 3 or room 5, so let's select room 5 because that's our end goal. For room 5, calculate the maximum Q value for this next state based on all possible actions, so Q(1, 5) = R(1, 5), which is 100, plus 0.8, which is your gamma, into the maximum of Q(5, 1), Q(5, 4) and Q(5, 5). Since the Q values are initialized to zero, that maximum is zero for now, so it does not matter yet, and the final Q value for Q(1, 5) is 100. That's how we're going to update our Q matrix: the position (1, 5) in the second row gets updated to 100, and the first episode is done. Now for the next episode we start with a randomly chosen initial state, so let's assume that the state is 3. From room number 3 you can either go to room number 1, 2 or 4, so let's select the option of room number 1, because from our previous experience we've seen that room 1 is directly connected to room 5. For room 1, calculate the maximum Q value for this next state based on all possible actions, so if we take Q(3, 1) we get R(3, 1) plus 0.8 into the maximum of Q(1, 3) and Q(1, 5), which gives us the value 80, so the matrix Q gets updated. Now for the next episode, the next state, 1, becomes the current state, and we repeat the inner loop of the Q-learning algorithm because state 1 is not the goal state. From 1 you can either go to 3 or 5, so let's select 5, as that's our goal. For room 5 we again look at all of the possible actions, but the Q matrix remains the same, since Q(1, 5) has already been learned by the agent. And that is how you select random starting points and fill up the Q matrix, to see which path will lead us there with the maximum reward points.
Now what we're going to do is do the same thing in code, using Python on the machine. So what we're going to do is import numpy as np, and we're going to take the R matrix as we defined it earlier, so that the minus 1 entries are the null values, the zeros are the moves which provide a 0 reward, and 100 is the reward for reaching the goal. Then we initialize the Q matrix to 0, we set gamma as 0.8 and set the initial state as 1. Here we have a function which returns all the available actions for the state given as an argument, so if we call it with the given state we get the available actions in the current state. Then we have another function, sample_next_action, and what this function does is choose at random which action is to be performed, within the range of all the available actions, and finally we have the action, which is the sampled next action from the available actions. Again we have another function, update, and what it does is update the Q matrix according to the path selected and the Q-learning formula. So initially our Q matrix is all 0, and what we're going to do is train it over 10,000 iterations and see what the output of the Q values is. As the agent learns through the iterations, it will finally reach converged values in the Q matrix. The Q matrix can then be normalized, that is, converted to a percentage, by dividing all the non-zero entries by the highest number, which is 500 in this case, so we compute Q divided by np.max(Q) times 100 to get it normalized. Now once the Q matrix gets close enough to the state of convergence, the agent has learned the most optimal paths to the goal state. The optimal path given by the Q-learning algorithm is: if it starts from 2 it will go to 3, then go to 1 and then go to 5; if it starts at 2 it can also go to 3, then 4, then 5, which gives us the same total reward. So as you can see here, the output given by the Q-learning algorithm is that the selected path is 2, 3, 1 and 5, starting from the initial state 2.
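Here is a condensed, runnable sketch of that room example, following the R matrix, gamma of 0.8 and 10,000 training iterations described above; the helper logic is folded into the training loop, so the function names from the walkthrough do not appear.

# Q-learning sketch for the five-rooms example.
import numpy as np

# Reward matrix: -1 = no door, 0 = door with no immediate reward, 100 = reaches room 5.
R = np.array([
    [-1, -1, -1, -1,  0, -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1, -1],
    [-1,  0,  0, -1,  0, -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
Q = np.zeros_like(R, dtype=float)
gamma = 0.8
rng = np.random.default_rng(0)

for _ in range(10_000):
    state = rng.integers(0, 6)                    # random starting room
    actions = np.where(R[state] >= 0)[0]          # available doors from this room
    action = rng.choice(actions)
    Q[state, action] = R[state, action] + gamma * Q[action].max()

print(np.round(Q / Q.max() * 100))                # normalized Q matrix

# Greedy walk from room 2 using the learned Q values.
state, path = 2, [2]
while state != 5:
    state = int(Q[state].argmax())
    path.append(state)
print("Selected path:", path)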
So this is how exactly a reinforcement learning algorithm works: it finds the optimal solution using the paths, the given actions and rewards, and the various other definitions, or the various challenges I would say. Actually, the main goal is to get the maximum reward and the maximum value from the environment, and that's how an agent learns through its own path, going through millions and millions of iterations and learning how much reward each path will give us. So that's how the Q-learning algorithm works, and that's how it works in Python as well, as I showed you. So guys, this is all about reinforcement learning, and I hope you got to know about the Markov decision process and how Q-learning works. And with this, guys, we come to the end of this machine learning session. I hope you guys got to know about machine learning, how it is used in the industries, the various applications of machine learning, the various types of machine learning, which are supervised, unsupervised and reinforcement, and the various algorithms used in machine learning; there are any number of algorithms and these are just a few of them. So guys, Edureka, as we know, provides a Machine Learning in Python course, and they also provide a Machine Learning Engineer course, which is the master's program and which is more aligned and structured. These courses will not only provide you with the skills, they will also help you to learn those skills in the correct order. So guys, if you want to know more about these, check out the courses, the link to which is given in the description box below. And if you guys have any doubts or any queries regarding this whole video, the demo parts, or anything else, please feel free to mention them in the comments section below and we'll revert to them as soon as possible. So thank you for sticking around this long for this machine learning full course video, thank you and happy learning. I hope you have enjoyed listening to this video, please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more. Happy learning.
