Logistic Regression

Rohit Kumar
4 min read · Dec 5, 2020

Before we discuss logistic regression, it helps to know what linear regression is, so here is a short summary.

What is Linear Regression?

Linear regression is a supervised machine learning algorithm used to predict a target value from one or more independent variables. Supervised means that it needs a labeled dataset to learn from. The target value predicted by linear regression is continuous. The model works by finding the line that best fits the data, so that it can predict the continuous dependent variable as accurately as possible. See the images below to get a clear understanding.

Here we want to predict the size of an object using one independent variable, its weight. The points in the graph are the known data points (the dataset). In linear regression we try to fit a straight line through the data points such that it predicts the output as accurately as possible. See the image for clarity.

Based on this line, we predict the target value for any new value of the independent variable.
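To make the line fitting concrete, here is a minimal sketch using ordinary least squares in plain Python. The weight and size numbers are made up for illustration, and the helper names `fit_line` and `predict_size` are my own, not part of any library.

```python
# Hypothetical toy data: weight is the independent variable, size is the target.
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
sizes = [1.2, 1.9, 3.2, 3.8, 5.1]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_size(weight, slope, intercept):
    """Predict the continuous target for a new weight using the fitted line."""
    return slope * weight + intercept


slope, intercept = fit_line(weights, sizes)
print("Predicted size for weight 6:", predict_size(6.0, slope, intercept))
```

The prediction is just a point on the fitted line, which is why the output can be any continuous value.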

Logistic Regression

Logistic regression is also a supervised machine learning algorithm and is similar to linear regression. The difference is that instead of predicting continuous values, it is mainly used for classification problems. Even though it is used for classification, remember that it is still a regression model. Instead of fitting a straight line to the data like linear regression does, logistic regression fits an 'S'-shaped logistic function to the data.

In the given image, the function y = f(x) is a sigmoid function, usually represented as

f(x) = 1 / (1 + e^(-x))

If you look at the function closely, you'll realize that its value always lies between 0 and 1, which is the key point here: the output can be read as a probability. So, logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest ML algorithms and can be used for various classification problems such as spam detection, diabetes prediction, cancer detection, etc. Let us see an example for better understanding.
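A quick way to see this behaviour is to evaluate the standard logistic function 1 / (1 + e^(-x)) at a few inputs. This is a small sketch in plain Python; the two-branch form is just one way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe here because x is negative
    return z / (1.0 + z)


# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5.
for x in (-10, -2, 0, 2, 10):
    print(x, sigmoid(x))
```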

In the above graph we have data for obese and non-obese mice based on their weight. Now the logistic regression model fits a sigmoidal function on the data.

This means that the curve now tells us the probability that a mouse is obese, given its weight.
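Here is a rough sketch of what "fitting a sigmoid" can look like in code. It uses plain gradient descent on the log loss, which is one common way to fit the curve but not necessarily what any particular library does internally. The mouse weights and labels are invented for illustration, and the learning rate and step count are assumptions that happen to suit this toy data.

```python
import math

# Hypothetical data: mouse weights in grams and labels (1 = obese, 0 = not obese).
weights = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0]
obese = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.05, steps=5000):
    """Fit P(y=1) = sigmoid(w * (x - mean_x) + b) by gradient descent on the log loss."""
    n = len(x)
    mean_x = sum(x) / n
    centered = [xi - mean_x for xi in x]  # centering keeps the updates well scaled
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for xi, yi in zip(centered, y):
            error = sigmoid(w * xi + b) - yi  # predicted probability minus label
            grad_w += error * xi
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b, mean_x


def predict_probability(weight, w, b, mean_x):
    """Probability that a mouse of the given weight is obese, under the fitted curve."""
    return sigmoid(w * (weight - mean_x) + b)


w, b, mean_x = fit_logistic(weights, obese)
for weight in (18.0, 27.0, 36.0):
    print(weight, predict_probability(weight, w, b, mean_x))
```

Heavier mice should come out with a higher probability of being obese, which is exactly the 'S' shape described above.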

By now you must have figured out that a logistic regression model becomes a classification model only after we apply a threshold to its output. For example, if the probability of a mouse being obese is more than 0.6, then we classify it as obese.
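The thresholding step is tiny in code. This sketch uses the 0.6 cut-off from the example above; the function name `classify` and the list of probabilities are made up for illustration.

```python
def classify(probability, threshold=0.6):
    """Return 1 (obese) if the probability is more than the threshold, else 0."""
    return 1 if probability > threshold else 0


# Hypothetical predicted probabilities for five mice.
probabilities = [0.15, 0.55, 0.6, 0.61, 0.92]
labels = [classify(p) for p in probabilities]
print(labels)  # [0, 0, 0, 1, 1]
```

Note that a probability of exactly 0.6 is not classified as obese here, because the rule says "more than 0.6". Changing the threshold changes the labels without retraining the model.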

Types of Logistic Regression

Generally, logistic regression means binary logistic regression, where the target variable has two categories, but it can also be used when the target variable has more than two categories. Based on the number of categories, logistic regression can be divided into the following types:

Binary or Binomial

In this kind of classification, the dependent variable has only two possible values, either 1 or 0. For example, these values may represent success or failure, yes or no, win or loss, etc.

Multinomial

In this kind of classification, the dependent variable can have 3 or more possible unordered types, that is, types with no quantitative significance. For example, these values may represent "Type A", "Type B", or "Type C".

Before diving into the implementation of logistic regression, we should be aware of the following assumptions:

· In the case of binary logistic regression, the target variable must always be binary, and the desired outcome is represented by the factor level 1.

· There should be no multicollinearity in the model, which means the independent variables should not be highly correlated with each other.

· We must include meaningful variables in our model.

· We should choose a large sample size for logistic regression.
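As a rough screen for the multicollinearity assumption, one option is to compute the pairwise correlation between independent variables and flag pairs that are strongly correlated. This is only a sketch: the feature names and values are hypothetical, the 0.9 cut-off is an arbitrary assumption, and pairwise correlation will not catch collinearity that involves three or more variables at once.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical independent variables measured on six mice.
features = {
    "weight_g": [20.0, 24.0, 27.0, 30.0, 33.0, 36.0],
    "length_cm": [7.1, 7.9, 8.6, 9.0, 9.8, 10.3],
    "age_weeks": [12.0, 8.0, 15.0, 9.0, 14.0, 10.0],
}

THRESHOLD = 0.9  # assumed cut-off for "highly correlated"

names = list(features)
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson_correlation(features[names[i]], features[names[j]])
        if abs(r) > THRESHOLD:
            flagged.append((names[i], names[j]))

print("Highly correlated pairs:", flagged)
```

If a pair is flagged, a simple response is to keep only one of the two variables in the model.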

Code Sample

Let's see how we can perform logistic regression using Python and which libraries we need to import.

Scikit-learn, or sklearn, is a free machine learning library for Python in which machine learning models are implemented as Python classes.

Training your model:

from sklearn.linear_model import LogisticRegression
# xtrain holds the training features and ytrain the matching labels
classifier = LogisticRegression(random_state=0)
classifier.fit(xtrain, ytrain)

Using the trained model to predict data:

y_pred = classifier.predict(xtest)

We can further test the performance of the trained model:

from sklearn.metrics import accuracy_score
# compare the predicted labels with the true test labels
print("Accuracy:", accuracy_score(ytest, y_pred))
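The snippets above assume that xtrain, ytrain, xtest and ytest already exist. Here is one way to put everything together into a script you can run end to end. The synthetic dataset from `make_classification` and the 75/25 split are my assumptions, standing in for whatever real data you have.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data used as a stand-in for a real dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Hold out 25% of the rows so the model is scored on data it has not seen.
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifier = LogisticRegression(random_state=0)
classifier.fit(xtrain, ytrain)

# predict() returns class labels; predict_proba() returns the probabilities.
y_pred = classifier.predict(xtest)
y_proba = classifier.predict_proba(xtest)[:, 1]  # column 1 is P(Y=1)

accuracy = accuracy_score(ytest, y_pred)
print("Accuracy:", accuracy)
print("First five probabilities of class 1:", y_proba[:5])
```

For a binary problem, `predict` picks the class with the higher probability, which amounts to a threshold of 0.5. If you want a different threshold, such as the 0.6 used in the mouse example, you can compare `y_proba` against it yourself.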

Hope you guys liked it. Thanks for reading.
