Linear Regression

ECE_B_113_Debashis Saha
8 min read · Mar 6, 2021

Machine learning problems always start with the dataset itself. Real-world data usually needs some feature engineering, since it may contain null values and categorical variables; these can arise from human error or from the sensors that recorded the data. We can also obtain ready-made datasets from Kaggle, such as housing price prediction or sales prediction. As a brief intro, ready-made datasets are those that have no null values and need no feature selection techniques. The behaviour of each point in a dataset can be visualized with matplotlib's pyplot functions, and it is important to visualize a dataset before approaching any machine learning task.

Based on that visualization, there are several algorithms to consider for a regression model, depending on which one fits best. The most common regression algorithms are Linear Regression, Lasso Regression, and Ridge Regression; they are applied depending on the behaviour and nature of the dataset. Suppose someone wants to predict the price of a house from the attributes that house possesses. It is quite natural to fit a regression model in such a scenario.

There are several hyperparameters that often affect the performance of a model on a dataset and its accuracy. Selecting the hyperparameters that fit a model best is often tedious work. To avoid this kind of complexity, utilities like GridSearchCV and RandomizedSearchCV have been introduced. Both work well in spite of having a few drawbacks.
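As a rough illustration, here is a minimal GridSearchCV sketch, assuming a Ridge regressor (covered later in this post) and an arbitrary grid of alpha values, both chosen only for demonstration:

```python
# Minimal hyperparameter search sketch: try a small grid of alpha values
# with 5-fold cross validation and keep the best one.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}     # candidate values to try
search = GridSearchCV(Ridge(), param_grid, cv=5)   # exhaustive grid search
search.fit(X, y)

print(search.best_params_)   # the alpha that scored best
print(search.best_score_)    # its mean cross-validated R^2
```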

As per the introduction, let's focus first on Linear Regression, which is one of the most common approaches for continuous targets such as housing price prediction. In Linear Regression the fitted curve is essentially a straight line, which is the best option for linearly behaved datasets. In practice, though, the relationship between the data points can be more complex, so in almost all real-life situations the best fit has some curvature (i.e. a curved line), and suitable hyperparameters are needed to fit it well.

Focusing on the case where the model performs well on linearly behaved data, the fitted line follows the straight-line equation (y = mx + c), where m is the slope of the line and c is the intercept. A dataset comprises input features and target variables. The more data samples you collect, the better the model's performance and accuracy will be. A model trained on many features assigns each feature an importance according to its weight.

ŷ = h0 + h1·x1 + h2·x2 + … + hn·xn
Predicted value for the test dataset

where h0, h1, h2, … are the weights given to the features when the model makes predictions. The typical Linear Regression plot shows a linear correlation between the features and the target variable.

The best-fit line is chosen so that the mean squared error between the actual and predicted values is as small as possible. The fitted line does not necessarily have to pass through the origin. The sum of the squared differences between the actual output and the predicted output is known as the least squares error, and the Mean Squared Error is its average over all points in the dataset.

MSE = (1/n) · Σ (yi − ŷi)²
Mean Squared Error
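As a quick sketch with made-up actual and predicted values, the same quantity can be computed with scikit-learn:

```python
# Mean Squared Error on a few toy actual/predicted values.
from sklearn.metrics import mean_squared_error

y_actual = [3.0, 5.0, 7.5, 10.0]
y_predicted = [2.8, 5.3, 7.0, 9.5]

print(mean_squared_error(y_actual, y_predicted))  # average squared difference
```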

Note: before fitting a model to a particular dataset, we must perform a few necessary steps in order to compute its accuracy score, such as splitting the dataset and labelling the feature and target variables; these are discussed in more detail as we proceed.
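Here is a minimal sketch of those preparation steps, assuming a small hypothetical housing DataFrame with a "price" column as the target (the column names and values are made up):

```python
# Separate the features from the target and split into train/test sets.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "area":     [1200, 1500, 800, 2000, 1700, 950],
    "bedrooms": [2, 3, 1, 4, 3, 2],
    "price":    [150, 200, 90, 280, 230, 110],
})

X = df.drop("price", axis=1)   # feature columns
y = df["price"]                # target variable

# Hold out part of the data so the model can be scored on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```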

Implementation :

The scikit-learn package allows you to implement a Linear Regression model by just calling the LinearRegression() class. Although scikit-learn makes such algorithms easy to implement, it is organised into a few modules such as linear_model, preprocessing, decomposition, and feature_selection.

LinearRegression lives in scikit-learn's linear_model module, and finally we have to fit the dataset to the regression model. After training has completed successfully, the model is ready to predict on unseen data.

There are a few hyperparameters that can be used to further improve the model's performance and speed up training; a short sketch using some of them follows the list below.

fit_intercept : [boolean] Whether to calculate intercept for the model.
normalize : [boolean] Normalisation before regression.
copy_X : [boolean] If True, X is copied; otherwise it may be overwritten.
n_jobs : [int] If -1, all CPUs are used. This speeds up processing for large datasets.
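A minimal sketch of fitting LinearRegression with a couple of these parameters, reusing the train/test split from the earlier snippet:

```python
# Fit a linear model on the training data and predict on the test data.
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=-1)
model.fit(X_train, y_train)      # learn the weights from the training rows

y_pred = model.predict(X_test)   # predictions for the unseen test rows
```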

To compute accuracy, use the r2_score function from sklearn.metrics. R² is basically "(total variance explained by the model) / total variance", so if it is 100%, the model explains the target perfectly, i.e. there is no unexplained variance at all. Alternatively, the regressor itself provides a score( ) method. There are also attributes for inspecting the weights and the c-intercept of the best-fit line, such as coef_ and intercept_.
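Continuing the sketch above, the score and the fitted parameters can be inspected like this:

```python
# Evaluate the model and inspect the fitted line.
from sklearn.metrics import r2_score

print(r2_score(y_test, y_pred))      # R^2 computed from sklearn.metrics
print(model.score(X_test, y_test))   # the regressor's own score() method
print(model.coef_)                   # learned weight for each feature
print(model.intercept_)              # the c-intercept of the best-fit line
```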

This is just the basics of regression models. There is much more complexity in data that cannot be captured by a linear regressor. For better accuracy and performance, there are more regression algorithms to apply, each with hyperparameters to tune. As said earlier, real-world datasets are almost always messy, so some feature selection techniques must be applied so that the model can learn from the dataset more effectively.

Ridge Regression (L2 Regularization)

Focusing on another type of regression model, Ridge Regression often solves the problem of overfitting. A regularization strength α is introduced that determines how strongly the model is penalized and hence how well it predicts on new datasets. The alpha value can be tuned to produce better predictions on new data. Ridge still minimizes a loss function to choose the coefficients for each feature variable, but a larger value of alpha shrinks the coefficients more and can even lead to underfitting.

Compared with plain Linear Regression, Ridge Regression is more stable when features are noisy or correlated, and is still cheap to fit. In Ridge Regression, the OLS loss function is augmented in such a way that we not only minimize the sum of squared residuals but also penalize the size of the parameter estimates, in order to shrink them towards zero:

Loss = Σ (yi − ŷi)² + α · Σ wj²

Choosing an alpha value for a particular model can also be done with cross-validation, using scikit-learn's cross_val_score function. This technique can be used to see for which values of alpha the smallest least-squares residual (and the best cross-validated score) is obtained, so that the model fits well and learns to predict on unseen data.
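A minimal sketch of that idea, assuming an arbitrary grid of alpha values and the diabetes dataset purely for illustration:

```python
# Score a few candidate alphas with 5-fold cross validation and keep the best.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

alphas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
mean_scores = [cross_val_score(Ridge(alpha=a), X, y, cv=5).mean() for a in alphas]

best_alpha = alphas[int(np.argmax(mean_scores))]
print(best_alpha, max(mean_scores))   # alpha with the best mean CV score
```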

Advantage of Ridge Regression over Linear Regression

Overfitting happens when a model is trained with too much complexity, for example high-degree polynomial features: it fits the training set very closely but fails to predict well on unseen data. Underfitting is the opposite: the model is too simple to capture the pattern, so it fails on both the test set and the training set. The regularization strength alpha should be kept at an optimal value so that the model neither overfits nor underfits. Very small values of alpha provide little regularization and, with iterative solvers, can also take more time to reach the minimum of the loss.

Implementation :

The Ridge( ) regressor is present in scikit-learn's linear_model module, and an alpha value needs to be chosen with the bias-variance trade-off in mind. Here we take alpha to be 0.1. The fit( ) method fits the model to the training split of the dataset, and the score can then be computed on the test split.

There are a few hyperparameters that can be used to further improve the model's performance and speed up training; a short Ridge sketch follows the list below:

fit_intercept : [boolean] Whether to calculate intercept for the model.
normalize : [boolean] Normalisation before regression.
copy_X : [boolean] If True, X is copied; otherwise it may be overwritten.
alpha : {float, array-like} shape (n_targets). Regularization strength.
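A minimal Ridge sketch with alpha = 0.1, reusing the train/test variables from the earlier housing snippet:

```python
# Fit Ridge on the training split and score it on the held-out test split.
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)
print(ridge.score(X_test, y_test))   # R^2 on unseen data
```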

Lasso Regression (L1 Regularization)

Least Absolute Shrinkage and Selection Operator, or Lasso, is another type of regression technique that is used for high-dimensional and highly correlated datasets. It penalizes large coefficients, and unlike Ridge Regression, which penalizes the sum of squared coefficients, Lasso penalizes the sum of their absolute values (L1 regularization). As a result, for higher values of alpha, some coefficients may shrink to exactly zero or almost zero.
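For reference, the usual Lasso objective (stated here in its standard textbook form, mirroring the Ridge loss above) is the sum of squared residuals plus an L1 penalty on the coefficients:

Loss = Σ (yi − ŷi)² + α · Σ |wj|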

To avoid overfitting, i.e. when features are highly correlated and the model is complex, some coefficients become exactly zero so that the model does not fail on new datasets; Lasso Regression is quite effective in such cases. Before applying any regression model, it is better to visualize the data points, which makes it easier to decide which model fits the dataset best.

Implementation of algorithms :

The algorithm used to fit the model is coordinate descent. To avoid unnecessary memory duplication, the X argument of the fit method should be passed directly as a Fortran-contiguous numpy array.

There are a few hyperparameters that can be used to further improve the model's performance and speed up training; a short Lasso sketch follows the list below:

fit_intercept : [boolean] Whether to calculate intercept for the model.
normalize : [boolean] Normalisation before regression.
copy_X : [boolean] If True, X is copied; otherwise it may be overwritten.
alpha : {float, array-like} shape (n_targets). Regularization strength.
max_iter : [int] Maximum number of iterations for the coordinate descent solver.
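A minimal Lasso sketch, with an arbitrary alpha and max_iter, and with the X argument passed as a Fortran-contiguous array as recommended above (train/test variables come from the earlier housing snippet):

```python
# Fit Lasso with coordinate descent; some coefficients may shrink to zero.
import numpy as np
from sklearn.linear_model import Lasso

X_train_f = np.asfortranarray(X_train, dtype=float)   # Fortran-contiguous copy

lasso = Lasso(alpha=0.1, max_iter=10000)
lasso.fit(X_train_f, y_train)

print(lasso.coef_)   # coefficients; some may be exactly zero
print(lasso.score(np.asfortranarray(X_test, dtype=float), y_test))
```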
