Linear Regression
Welcome to the journey of machine learning. In this post you are going to learn what linear regression is, the equation behind it, where it is used, and how to perform linear regression on real data, so let's begin. As I mentioned in my previous blog, Machine learning is Fun, mathematics is important and useful in machine learning. Before we move further you should know about dependent and independent variables, and also about supervised and unsupervised learning.
What is Linear regression?
Linear regression is one of the most widely used predictive algorithms. It models the relationship between a dependent variable and one or more independent variables, where the dependent variable varies with the independent variables. Linear regression is a supervised learning algorithm: you feed past data to the model and train it, so the model can predict the output for new data. The main purpose of linear regression is to find the best fit line through the data by minimizing the error, which increases the accuracy of your model. Don't worry if this sounds confusing; let's try to understand it with an example.
As you can see in the image above, the line is the best fit line and the points are the data. Now let's dive into the math behind linear regression. The equation of linear regression is Y = ax + b. Here Y is the dependent variable, which depends on the rest of the equation; b is the intercept of the model; a is the slope; and x is the independent variable, which can take any value.
The equation Y = ax + b is for a single feature, which means the model contains only one independent variable. Let's take an example where I have a single feature (independent variable) that affects the label (dependent variable).
E.g.
Study hrs = X | Result = Y
6 hr | 70%
5 hr | 65%
8 hr | 75%
9 hr | 80%
Study hrs = number of hours the student studies
Result = the result gained by the student
Our equation will be Result = a * Study hrs + b, where a and b are the slope and intercept respectively. For instance, if a = 5 and b = 40, a student who studies for 6 hours would be predicted to score 5 * 6 + 40 = 70%.
But if we talk about the real world, this equation will not work when there are multiple features (independent variables), so for that we have to use the multiple linear regression equation. For example, let's take the same example as above: what if we have 2 features, Study hrs and Rest hrs? If we compare them with the output, we can see how each feature relates to the result.
E.g.
Rest hrs = X1 | Study hrs = X2 | Result = Y
8 hr | 6 hr | 70%
8 hr | 5 hr | 65%
6 hr | 8 hr | 75%
5 hr | 9 hr | 80%
The equation will be Y = a1 * X1 + a2 * X2 + b, where X1 = Rest hrs, X2 = Study hrs, and a1 and a2 are the slopes (coefficients) for each feature.
Source: SuperDataScience
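To make the two equations above concrete, here is a minimal sketch in Python; the coefficient values (a, b, a1, a2) are made-up numbers chosen only to show the arithmetic, not values fitted from the tables.

```python
# Minimal sketch of evaluating the two equations above.
# The coefficients below are made-up for illustration, not fitted values.

def predict_single(study_hrs, a=5, b=40):
    # Y = a * x + b (single feature)
    return a * study_hrs + b

def predict_multi(rest_hrs, study_hrs, a1=1, a2=4, b=30):
    # Y = a1 * X1 + a2 * X2 + b (two features)
    return a1 * rest_hrs + a2 * study_hrs + b

print(predict_single(6))    # 5 * 6 + 40 = 70
print(predict_multi(8, 6))  # 1 * 8 + 4 * 6 + 30 = 62
```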
So far we have discussed both simple and multiple linear regression. Now let's discuss how it works. As I told you before, the algorithm will find the best fit line for your model, but the question is how to get that best fit line. For that we use gradient descent, which keeps adjusting the slope and intercept until the error is minimized. We will discuss it in another post, but the main thing is that we don't have to worry about those equations, because we have tools with which we can directly train the model by feeding it data. These are scikit-learn, NumPy, pandas and Matplotlib, which help us to get the data, filter the data and visualize the data.
How to Minimize the Error in Linear Regression?
Now, before moving further, let's understand the error and how to minimize it. The equation to find the error is:
Error = Σ (actual_output - predicted_output)^2
The least squares method is a statistical approach to find the best fit regression line by minimizing this error. Here actual_output is nothing but the real output that we have given to the model, and predicted_output is the output produced by the model. Our goal is to minimize the error to get the best fit line. There are several approaches we can apply for that, for example gradient descent, along with metrics such as R-squared to evaluate how well the line fits; we will discuss those in another post.
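As a quick illustration of the error formula above, here is a minimal sketch that computes the sum of squared errors for a few made-up actual and predicted values:

```python
# Sum of squared errors between actual and predicted outputs.
# The numbers are made-up values used only to illustrate the formula.
actual_output = [70, 65, 75, 80]
predicted_output = [70, 66, 73, 81]

error = sum((a - p) ** 2 for a, p in zip(actual_output, predicted_output))
print(error)  # 0 + 1 + 4 + 1 = 6
```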
Linear regression using Python
Step 1:
Import all the required libraries: pandas, Matplotlib and scikit-learn. We use pandas to import the .csv file and convert it into a pandas DataFrame, and we will also import LinearRegression from sklearn.linear_model.
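A minimal sketch of the imports used in this walkthrough (train_test_split is included here because we will need it for the splitting step later):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
```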
Step 2:
Load the .csv file into a pandas DataFrame so we can work with the data.
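A minimal sketch of loading the data; the file name student_scores.csv is a hypothetical placeholder, so use the name of your own .csv file:

```python
# Read the .csv file into a pandas DataFrame (hypothetical file name).
df = pd.read_csv("student_scores.csv")
print(df.head())  # quick look at the first few rows
```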
Step 3:
Look for missing data in your dataset: does it contain any null values? If it does, we have to handle them; we will see how in another post.
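A common way to check for null values with pandas, as a sketch:

```python
# Count the null values in each column of the DataFrame.
print(df.isnull().sum())
```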
As you can see, we don't have any missing values.
Step 4:
One of the most important steps: split the data and decide the dependent and independent variables. You can choose any ratio for splitting the training and testing data, but I recommend 75:25 or 70:30. Keep in mind that the ratio can change as the data size changes.
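A sketch of this step using train_test_split; the column names Study hrs and Result come from the example above, so replace them with the columns of your own dataset:

```python
# Independent variable (feature) and dependent variable (label).
X = df[["Study hrs"]]  # double brackets keep X two-dimensional
y = df["Result"]

# 75:25 split between training and testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```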
Step 5:
Now it's time to fit the data to the model. We create an instance of the LinearRegression() class and then fit the training data to the model with the .fit() method.
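A sketch of creating the model and fitting it on the training data:

```python
# Create the model and fit it on the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# The learned slope (a) and intercept (b) from Y = ax + b.
print(model.coef_, model.intercept_)
```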
Step 6:
Predict the outcome using the test data.
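A sketch of predicting on the held-out test data:

```python
# Predict the results for the unseen test data.
y_pred = model.predict(X_test)
print(y_pred)
```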
Step 7:
Now the interesting part: visualize your model and have a look at what it looks like. We use Matplotlib for plotting.
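A sketch of the scatter plot of the actual test data; the predicted line is added on top of this plot in the next step:

```python
# Scatter plot of the actual test data points.
plt.scatter(X_test, y_test, color="blue", label="Actual data")
plt.xlabel("Study hrs")
plt.ylabel("Result")
```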
Step 8:
Now plot the predicted values on the current plot using matplotlib.pyplot.plot().
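A sketch of drawing the predicted values on the same plot and displaying it:

```python
# Draw the predicted values as the best fit line on the current plot.
plt.plot(X_test, y_pred, color="red", label="Predicted (best fit line)")
plt.legend()
plt.show()
```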