Logistic Regression in Machine Learning
- Logistic regression is one of the most popular Machine Learning algorithms and falls under the Supervised Learning technique. It is used to predict a categorical dependent variable from a given set of independent variables.
- Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, or True or False. However, instead of giving an exact value of 0 or 1, it gives a probability that lies between 0 and 1.
- Logistic regression is very similar to linear regression, except in how the two are used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
- In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose output is bounded by two limiting values, 0 and 1.
- The curve of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
- Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete input variables.
- Logistic regression can classify observations using different types of data, and its fitted coefficients help identify which variables most influence the classification. The image below shows the logistic function:
Note: Logistic regression uses the same predictive-modeling idea as regression (a linear combination of the input variables), which is why it is called logistic regression. However, it is used to classify samples, so it is a classification algorithm.
Logistic Function (Sigmoid Function):
- The sigmoid function is a mathematical function used to map the predicted values to probabilities.
- It maps any real value to a value between 0 and 1.
- The output of logistic regression must stay between 0 and 1 and cannot go beyond these limits, so it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.
- In logistic regression, we use a threshold value to decide between the two classes: predicted probabilities above the threshold are classified as 1, and those below it are classified as 0. A common choice of threshold is 0.5.
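The sigmoid and the threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a library API. The function names `sigmoid` and `classify` are hypothetical, and so is the default threshold of 0.5. A probability exactly equal to the threshold is assigned to class 1 here, which is an arbitrary choice.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 if the probability is at or above the threshold, else 0 (hypothetical helper)."""
    return 1 if probability >= threshold else 0

if __name__ == "__main__":
    for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
        p = sigmoid(z)
        print(f"z={z:+.1f}  p={p:.3f}  class={classify(p)}")
```

Raising the threshold makes the classifier more conservative about predicting class 1, and lowering it does the opposite.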
Assumptions for Logistic Regression:
- The dependent variable must be categorical in nature.
- The independent variables should not exhibit strong multicollinearity, i.e. they should not be highly correlated with one another.
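One common way to check the multicollinearity assumption is the variance inflation factor (VIF). The document itself does not prescribe this check, so treat it as an assumption. VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing variable j on all the other independent variables. The sketch below computes it with NumPy least squares. The function name is hypothetical. A widely used rule of thumb (a heuristic, not a strict cutoff) treats VIF values above about 5 or 10 as a warning sign.

```python
import numpy as np

def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (shape: n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 obtained by regressing
    column j on all the other columns plus an intercept.
    Assumes no column is constant (that would make ss_tot zero).
    """
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add a column of ones so the auxiliary regression has an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = residuals @ residuals
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + 0.01 * rng.normal(size=200)  # almost a copy of x1: strongly collinear
    x3 = rng.normal(size=200)              # independent of x1 and x2
    print(variance_inflation_factors(np.column_stack([x1, x2, x3])))
```

In this toy data, x1 and x2 get very large VIFs, while x3 stays close to 1.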
Logistic Regression Equation:
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps are given below:
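- We know the equation of a straight line (here with n independent variables) can be written as:
$$y = b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n$$
- In logistic regression, y is a probability and can only lie between 0 and 1, whereas the right-hand side can take any real value. So, instead of y, we use the odds y/(1-y), which is 0 when y = 0 and grows without bound as y approaches 1:
$$\frac{y}{1-y}, \qquad 0 \le \frac{y}{1-y} < \infty$$
- But we need a range from -∞ to +∞, so we take the (natural) logarithm of the odds and set it equal to the linear expression:
$$\log\left[\frac{y}{1-y}\right] = b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n$$
The above equation is the final equation for logistic regression. Its left-hand side is called the log-odds (or logit). Solving it for y gives back the sigmoid function:
$$y = \frac{1}{1 + e^{-(b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n)}}$$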
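The relationship between the log-odds and the probability can be checked numerically. Compute the linear expression, convert it to a probability, then take the log-odds of that probability to recover the linear expression. This is a minimal sketch. The coefficients `b` are hypothetical values chosen for illustration, not fitted to any data, and the function names are also hypothetical.

```python
import math

def log_odds(y: float) -> float:
    """Natural log of the odds y / (1 - y), defined for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    return math.log(y / (1.0 - y))

def probability_from_log_odds(z: float) -> float:
    """Solve log[y / (1 - y)] = z for y, i.e. the sigmoid 1 / (1 + e^(-z))."""
    # Numerically stable form: avoid exp of a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical coefficients for a model with two independent variables.
b = [-3.0, 0.8, 0.5]  # b0 (intercept), b1, b2

def predict_probability(x: list[float]) -> float:
    """Evaluate b0 + b1*x1 + ... + bn*xn (the log-odds), then convert it to a probability."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return probability_from_log_odds(z)

if __name__ == "__main__":
    p = predict_probability([2.0, 1.0])  # linear expression: -3.0 + 1.6 + 0.5 = -0.9
    print(p, log_odds(p))                # log_odds(p) recovers approximately -0.9
```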
Types of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified into three types:
- Binomial: In binomial logistic regression, the dependent variable can have only two possible types, such as 0 or 1, Pass or Fail, etc.
Q.1. Define the term regression and its types.
Answer:
- Regression analysis consists of a set of machine learning methods that allow us to predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x).
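- The ultimate goal of a regression algorithm is to fit a best-fit line or curve through the data points.
- Common metrics for evaluating a trained regression model include the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R²). Bias and variance describe the sources of a model's error rather than serving as evaluation metrics themselves.
- Regression models are used to predict a continuous value.
Goal: To build a mathematical equation that defines y as a function of the x variables.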
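A trained regression model is usually scored by comparing its predictions with the observed values. Below is a minimal NumPy sketch that computes four common error metrics: MAE, MSE, RMSE, and R². The function name `regression_metrics` is hypothetical. The sketch assumes the observed values are not all identical, because otherwise R² is undefined.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict[str, float]:
    """Return MAE, MSE, RMSE and R^2 for observed values y_true and predictions y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = float(np.mean(errors ** 2))
    ss_res = float(np.sum(errors ** 2))                  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total variation around the mean
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "R2": 1.0 - ss_res / ss_tot,
    }

if __name__ == "__main__":
    print(regression_metrics([3, 5, 7, 9], [2.5, 5, 7.5, 9]))
```

MAE and RMSE are in the same units as y, while R² is unitless: it is the fraction of the variation in y that the model explains.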
Q.2. Briefly describe linear regression.
Answer:
- Linear Regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope.
- The result is predicted from known input variables (features) that are correlated with the output.
- It models the target value as a function of the independent variables.
- It is used to predict values within a continuous range (e.g. sales, price) rather than to classify them into categories (e.g. cat, dog).
Types of Linear Regression:
- Simple Linear Regression: Simple linear regression is a regression technique in which a single independent variable has a linear relationship with the dependent variable.
$$y = a + bx + \varepsilon$$
- Multiple Linear Regression: The target variable (y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it extends simple linear regression, its equation has the same form, with one slope coefficient per predictor:
$$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_nx_n + \varepsilon$$
where
a = the intercept of the regression line (the value of y when every x is 0)
b (or b1, b2, ..., bn) = the slope coefficient(s); the sign of each slope tells whether y increases or decreases as that variable increases
ε = the error term, i.e. the part of y that the line does not explain (for a good model it is small)
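The intercept a and the slopes b1, ..., bn are usually estimated by ordinary least squares. The text above does not name a fitting method, so treat this as an assumption. Least squares chooses the coefficients that minimise the sum of squared errors. The sketch below handles both simple and multiple linear regression with `numpy.linalg.lstsq`. The helper names `fit_linear_regression` and `predict` are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares estimates of the intercept a and slopes b1..bn.

    X: shape (n_samples,) for simple regression or (n_samples, n_features) for multiple.
    Returns (a, b) where b is an array with one slope per predictor.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:  # simple linear regression: a single predictor
        X = X.reshape(-1, 1)
    # Prepend a column of ones so the first coefficient is the intercept a.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict(a, b, X):
    """Evaluate a + b1*x1 + ... + bn*xn for each row of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return a + X @ b

if __name__ == "__main__":
    # Simple regression on points that lie exactly on y = 2 + 3x.
    a, b = fit_linear_regression([0, 1, 2, 3, 4], [2, 5, 8, 11, 14])
    print(a, b)  # approximately 2.0 and [3.0]
```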
Q.3. Explain logistic regression.
Answer:
- Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable.
- It is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
- It is used for predicting the categorical dependent variable using a given set of independent variables.
- Its outcome can be Yes or No, 0 or 1, True or False, etc., but instead of giving an exact value of 0 or 1, it gives a probability that lies between 0 and 1.
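To make "predict the probability of a target variable" concrete, the sketch below fits a binary logistic regression by batch gradient descent on the average log-loss (cross-entropy). Gradient descent is only one way to fit the model. Libraries typically use other optimisers and often add regularisation, so treat this as an illustration under that assumption, not as how any particular library works. The data (hours studied versus Pass = 1 / Fail = 0) are hypothetical. So are the learning rate, the iteration count, and the function names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, learning_rate=0.1, n_iters=5000):
    """Fit an intercept b0 and weights w by gradient descent on the mean log-loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    w = np.zeros(k)
    b0 = 0.0
    for _ in range(n_iters):
        p = sigmoid(b0 + X @ w)  # current predicted probabilities
        error = p - y            # derivative of the log-loss w.r.t. the linear output
        w -= learning_rate * (X.T @ error) / n
        b0 -= learning_rate * error.mean()
    return b0, w

def predict_proba(b0, w, X):
    """Probability of class 1 for each row of X."""
    return sigmoid(b0 + np.asarray(X, dtype=float) @ w)

if __name__ == "__main__":
    hours = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
    passed = [0, 0, 0, 1, 0, 1, 1, 1]
    b0, w = fit_logistic_regression(hours, passed)
    print(b0, w, predict_proba(b0, w, [[0.5], [4.0]]))
```

The output is a probability for each input, not a hard 0/1 label. A threshold is still needed to turn it into a class.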
Q.4. What are the types of logistic regression?
Answer:
Logistic regression is a machine learning algorithm used for classification problems; it is a predictive analysis algorithm based on the concept of probability.
- Binary or Binomial: "Bi" means two, so the dependent variable has only two possible types (1 or 0).
- Example:
Consider a situation where you are interested in classifying an individual as diabetic or non-diabetic based on features like glucose concentration, blood pressure, age etc.
- Multinomial: "Multi" means many; here the dependent variable can have 3 or more possible unordered types, i.e. types with no quantitative significance.
- Example:
Classifying an animal as a cat, a dog, or a sheep. Multinomial logistic regression is also commonly used as an alternative to naive Bayes classifiers because, unlike naive Bayes, it does not assume that the input variables are statistically independent given the class.
- Ordinal: In this type of regression, the dependent variable consists of 3 or more possible ordered types, i.e. types with a quantitative significance.
- Example:
It can be used when one needs to predict a job satisfaction level that falls into one of these ordered categories:
Dissatisfied,
Satisfied,
Highly Satisfied
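For the multinomial case, the usual formulation computes one linear score per class and turns the scores into probabilities with the softmax function, so the probabilities sum to 1. The sketch below uses a hypothetical three-class problem (cat, dog, sheep) with two made-up input features. The weights and intercepts are illustrative values, not fitted ones.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of real-valued class scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical parameters: one row of weights and one intercept per class.
classes = ["cat", "dog", "sheep"]
W = np.array([[1.5, -0.5],
              [0.2, 0.8],
              [-1.0, 1.2]])
intercepts = np.array([0.1, 0.0, -0.3])

def predict_class_probabilities(x):
    """One linear score per class, converted to class probabilities."""
    scores = W @ np.asarray(x, dtype=float) + intercepts
    return softmax(scores)

if __name__ == "__main__":
    probs = predict_class_probabilities([1.0, 0.5])
    print(dict(zip(classes, probs.round(3))), classes[int(np.argmax(probs))])
```

With only two classes, this reduces to the binomial case. Ordinal logistic regression instead uses the order of the categories, typically through cumulative probabilities rather than independent class scores.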
Q.5. Differentiate between linear regression and logistic regression.
Answer:
Linear Regression | Logistic Regression |
Predicted values are the mean of the target variable for the given values of the input variables. | Predicted values are the probability of a particular level (class) of the target variable for the given values of the input variables. |
The data is modelled using a straight line. | The log-odds of the event is modelled as a linear function of the predictor variables. |
A linear relationship between the dependent and independent variables is required. | A linear relationship between the dependent and independent variables is not required; linearity is assumed between the independent variables and the log-odds instead. |
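A practical consequence of the first row of the table is that a straight line fitted to 0/1 labels can produce "probabilities" below 0 or above 1. The logistic model maps every input into (0, 1). The sketch below shows this on hypothetical toy data. The linear model is fitted with `numpy.polyfit`. The logistic coefficients (b0 = -7, b1 = 2) are hypothetical values chosen to place the decision boundary at x = 3.5; they are not fitted.

```python
import numpy as np

# Hypothetical toy data: one input, binary (0/1) target.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Linear regression: least-squares straight line fitted directly to the 0/1 labels.
slope, intercept = np.polyfit(x, y, deg=1)  # polyfit returns the highest power first

x_new = np.array([0.0, 10.0])
linear_pred = intercept + slope * x_new  # can fall outside [0, 1]

# Logistic regression with hypothetical coefficients: the linear part is read as
# log-odds and passed through the sigmoid, so outputs always stay inside (0, 1).
b0, b1 = -7.0, 2.0
logistic_pred = 1.0 / (1.0 + np.exp(-(b0 + b1 * x_new)))

print("linear:", linear_pred)
print("logistic:", logistic_pred)
```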
- Multinomial: In multinomial logistic regression, the dependent variable can have 3 or more possible unordered types, such as "cat", "dog", or "sheep".
- Ordinal: In ordinal logistic regression, the dependent variable can have 3 or more possible ordered types, such as "low", "medium", or "high".
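One common form of ordinal logistic regression is the cumulative-logit (proportional odds) model. It is named here as an assumption, since the text does not specify a formulation. The model uses increasing thresholds θ_1 < θ_2 < ... and a single weight vector w, with P(y ≤ k | x) = sigmoid(θ_k - w·x). Each category's probability is the difference of consecutive cumulative probabilities. Sign conventions vary between sources. The sketch below uses "low", "medium", "high" with hypothetical thresholds and weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probabilities(x, weights, thresholds):
    """Category probabilities under a cumulative-logit (proportional odds) model.

    thresholds must be strictly increasing; there are len(thresholds) + 1 categories.
    P(y <= k | x) = sigmoid(thresholds[k] - weights . x)
    """
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(thresholds) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    score = float(np.dot(weights, x))
    cumulative = sigmoid(thresholds - score)
    # Pad with P(y <= "below lowest") = 0 and P(y <= highest) = 1, then take differences.
    cumulative = np.concatenate([[0.0], cumulative, [1.0]])
    return np.diff(cumulative)

levels = ["low", "medium", "high"]

if __name__ == "__main__":
    # Hypothetical parameters: one input feature, two thresholds for three levels.
    for x in ([-2.0], [2.0]):
        probs = ordinal_probabilities(x, weights=[0.8], thresholds=[-1.0, 1.5])
        print(x, probs.round(3), levels[int(np.argmax(probs))])
```

A larger score w·x shifts probability toward the higher categories while the ordering of the categories is preserved.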