### Machine Learning Algorithms Part 10: Logistic Regression Example In Python

Logistic Regression is a supervised machine learning algorithm used to classify data. For example, suppose that, given a customer's income, we want to predict whether they will buy a product. In other words, we want to classify customers into two categories: those we think will purchase the product and those we think will not.

By fitting a regression function to our data, we can express the likelihood of a customer making a purchase as a probability. Classifying a customer as a buyer whenever that probability reaches a threshold of 0.5 (or 50%) often gives reasonably accurate results.

The probability of an event occurring can never be below 0 or exceed 1 (or 100%). We therefore pass our linear function through a **sigmoid function**, which creates asymptotes at 0 and 1.
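As a rough illustration of the transformation described above, the sketch below applies a sigmoid to a handful of linear scores and then thresholds the result at 0.5. The `sigmoid` helper and the `scores` values are made up for this example and are not part of the article's dataset.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical linear scores (w * x + b) for five customers
scores = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
probabilities = sigmoid(scores)

# Classify as "will purchase" when the probability is at least 0.5
predictions = (probabilities >= 0.5).astype(int)

print(probabilities)
print(predictions)
```

Large negative scores land near 0, large positive scores land near 1, and a score of exactly 0 maps to 0.5.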

### Some pros and cons of Logistic Regression

#### Pros:

- Simple and efficient
- Low variance
- Models can be updated easily with new data

#### Cons:

- Doesn’t handle a large number of categorical variables well
- Requires transformation of non-linear features

### Code

Let’s take a look at how we could go about classifying data using Logistic Regression in Python.

```
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
```

Most machine learning models, including scikit-learn's `LogisticRegression`, expect numeric input, so categorical data must be encoded as numbers (e.g. male=0, female=1; the exact mapping depends on the encoder).
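To make the encoding step concrete, here is a small sketch using `LabelEncoder` on a made-up list of labels (the `genders` values are an assumption, not taken from `data.csv`). Note that `LabelEncoder` assigns integer codes according to the sorted order of the labels, so with these particular strings `Female` becomes 0 and `Male` becomes 1.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values for a categorical column
genders = ["Male", "Female", "Female", "Male"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(genders)

# classes_ lists the labels in sorted order; each label's index is its code
print(encoder.classes_)
print(encoded)
```

Checking `encoder.classes_` after fitting is a simple way to confirm which label received which number.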

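We can help gradient descent converge by ensuring the mean of each of our features is close to 0 and that the features are on a comparable scale. This can be achieved by standardizing each feature: subtract the feature's mean from every sample and divide the result by the feature's standard deviation.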
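The following sketch shows what standardization does on a tiny, made-up feature matrix (the ages and salaries are illustrative assumptions). It computes the result by hand and compares it with `StandardScaler`, which uses the same mean and population standard deviation per column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: columns are age and salary
X = np.array([[25.0, 20000.0],
              [35.0, 50000.0],
              [45.0, 80000.0]])

# Manual standardization: subtract the column mean, divide by the column std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation via scikit-learn
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0))  # each column mean is approximately 0
print(X_scaled.std(axis=0))   # each column std is approximately 1
```

After scaling, age and salary contribute on the same scale even though their raw units differ by several orders of magnitude.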

```
dataset = pd.read_csv('./data.csv')
```

```
encoder = LabelEncoder()
dataset['Gender'] = encoder.fit_transform(dataset['Gender'])
```

```
X = dataset[['Gender', 'Age', 'EstimatedSalary']]
# Select the target as a 1-D Series to avoid scikit-learn's column-vector warning
y = dataset['Purchased']
```

```
scaler = StandardScaler()
X = scaler.fit_transform(X)
```

```
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)
```

By using the `LogisticRegression` class from the `sklearn` module, we can train our model and have it classify the customers in the test set.

```
classifier = LogisticRegression(random_state=0)
classifier.fit(train_X, train_y)
pred_y = classifier.predict(test_X)
```
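To connect `predict` back to the 0.5 threshold mentioned earlier, the sketch below trains a classifier on synthetic stand-in data generated with `make_classification` (an assumption for illustration, not the article's `data.csv`) and thresholds the output of `predict_proba` by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the article's dataset)
X, y = make_classification(n_samples=200, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

classifier = LogisticRegression(random_state=0)
classifier.fit(X, y)

# Column 1 holds the estimated probability of class 1
proba = classifier.predict_proba(X)[:, 1]

# Applying the 0.5 threshold manually
manual_pred = (proba > 0.5).astype(int)

# Compare the manual labels with the labels returned by predict
print(np.array_equal(manual_pred, classifier.predict(X)))
```

Working with `predict_proba` directly is useful if a threshold other than 0.5 suits the problem better, for example when false positives and false negatives have different costs.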

We can compare the predictions made by our model to the actual values with the use of a confusion matrix. The numbers on the diagonal correspond to correct predictions.

`confusion_matrix(test_y, pred_y)`

The accuracy of our model is 63/80 = 0.7875 (or 78.75%).
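The accuracy figure comes from dividing the sum of the diagonal of the confusion matrix by the total number of predictions. The sketch below shows that calculation on a small set of made-up labels (`true_y` and `pred_y` here are hypothetical and are not the article's results), and cross-checks it against `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (not the article's results)
true_y = np.array([0, 0, 0, 1, 1, 1, 1, 0])
pred_y = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(true_y, pred_y)

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(accuracy)
print(accuracy_score(true_y, pred_y))
```

In this toy case, 6 of the 8 predictions fall on the diagonal, so both calculations give 0.75.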

**Cory Maklin**
