# Logistic Regression in Python (Part 2)

Hello friends, in the previous post we saw the first part of Logistic Regression in Python.
In this post we cover the second part, which includes the following points:
1. Data cleaning
2. Converting categorical features
3. Training and predicting

## Data Cleaning

We want to fill in the missing age data instead of just dropping the rows with missing ages. One way to do this is to fill in the mean age of all the passengers. However, we can be smarter about this and check the average age by passenger class.
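For example, a box plot of `Age` for each `Pclass`:
```python
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(12, 8))
sns.boxplot(x='Pclass', y='Age', data=train, palette='winter')
```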

We can see that the wealthier passengers in the higher classes tend to be older, which makes sense. We will use these average age values to impute `Age` based on `Pclass`.

Fig. 01: Box plot of Age by Pclass (Logistic Regression in Python, plot and explain tutorial).
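The class averages used below (37, 29 and 24) are read off the box plot. As a sketch, pandas can also compute per-class values directly with `groupby`, which makes the difference between "overall mean" and "mean by class" imputation easy to see. The small DataFrame here is hypothetical and only stands in for `train`. Because a box plot draws medians, `.median()` can be swapped in for `.mean()` if you prefer.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic 'train' DataFrame
df = pd.DataFrame({
    'Pclass': [1, 1, 2, 2, 3, 3, 3],
    'Age':    [40.0, None, 30.0, 28.0, 20.0, 26.0, None],
})

# Option 1: fill every missing age with the overall mean age (28.8 here)
overall_fill = df['Age'].fillna(df['Age'].mean())

# Option 2: fill each missing age with the mean age of that passenger's class
class_means = df.groupby('Pclass')['Age'].mean()
class_fill = df['Age'].fillna(df.groupby('Pclass')['Age'].transform('mean'))

print(class_means)
print(pd.DataFrame({'overall': overall_fill, 'by_class': class_fill}))
```

The first-class passenger with a missing age gets 40.0 (the first-class mean) instead of the overall 28.8.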

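```python
import pandas as pd

def impute_age(cols):
    # cols is one row holding the 'Age' and 'Pclass' values, in that order
    Age = cols.iloc[0]
    Pclass = cols.iloc[1]

    if pd.isnull(Age):
        # Missing age: use the average age for the passenger's class
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age
```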
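As an alternative to calling `impute_age` row by row with `apply`, the same lookup can be done in a vectorized way: map each `Pclass` to its average age and use that only where `Age` is missing. This is a minimal sketch; `CLASS_AGE` and `impute_age_vectorized` are hypothetical names, and the toy rows are made up. One difference: `impute_age` returns 24 for any class other than 1 or 2, while `map` returns NaN for a class missing from the dict, which does not matter for the Titanic data since it only has classes 1 to 3.

```python
import numpy as np
import pandas as pd

# Class averages taken from the impute_age function above
CLASS_AGE = {1: 37, 2: 29, 3: 24}

def impute_age_vectorized(df):
    """Return the Age column with NaNs filled from the passenger's class."""
    # map() looks up the class average for every row;
    # fillna() uses it only where Age is missing
    return df['Age'].fillna(df['Pclass'].map(CLASS_AGE))

# Hypothetical toy rows
df = pd.DataFrame({'Pclass': [1, 2, 3, 3], 'Age': [np.nan, np.nan, np.nan, 50.0]})
df['Age'] = impute_age_vectorized(df)
print(df['Age'].tolist())  # [37.0, 29.0, 24.0, 50.0]
```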

Apply this function to fill in the missing ages:

`train['Age'] = train[['Age', 'Pclass']].apply(impute_age, axis=1)`

Now check the heat map of missing values again:

`sns.heatmap(train.isnull(), yticklabels=False, cbar=False, cmap='viridis')`

Great! Let's go ahead and drop the `Cabin` column and the rows where `Embarked` is NaN, and look at the result.

`train.drop('Cabin', axis=1, inplace=True)`
`train.head()`
```
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   S
```

`train.dropna(inplace=True)`
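Besides the heat map, a quick way to confirm that no missing values remain is to count them per column with `isnull().sum()`. The sketch below repeats the two cleaning steps on a small hypothetical frame so it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same kinds of gaps as the Titanic data
df = pd.DataFrame({
    'Age':      [22.0, 38.0, 26.0],
    'Cabin':    [np.nan, 'C85', np.nan],
    'Embarked': ['S', np.nan, 'S'],
})

df = df.drop('Cabin', axis=1)  # drop the whole Cabin column
df = df.dropna()               # drop the remaining rows with NaN (here, missing Embarked)

# Count remaining missing values per column; every count should be zero
missing = df.isnull().sum()
print(missing)
print(len(df))  # 2
```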

## Converting Categorical Features

We will need to convert categorical features to dummy variables using pandas. Otherwise, our machine learning algorithm won't be able to directly take in those features as input. First, let's check the column types:

`train.info()`
```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 889 entries, 0 to 890
Data columns (total 11 columns):
PassengerId    889 non-null int64
Survived       889 non-null int64
Pclass         889 non-null int64
Name           889 non-null object
Sex            889 non-null object
Age            889 non-null float64
SibSp          889 non-null int64
Parch          889 non-null int64
Ticket         889 non-null object
Fare           889 non-null float64
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(4)
memory usage: 83.3+ KB
```
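Create dummy variables for `Sex` and `Embarked`, drop the original text columns (including `Name` and `Ticket`, which we don't use as features here), and join the dummies back on:
```python
sex = pd.get_dummies(train['Sex'], drop_first=True)
embark = pd.get_dummies(train['Embarked'], drop_first=True)
```
`train.drop(['Sex', 'Embarked', 'Name', 'Ticket'], axis=1, inplace=True)`
`train = pd.concat([train, sex, embark], axis=1)`
`train.head()`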
```
   PassengerId  Survived  Pclass  Age   SibSp  Parch  Fare     male  Q    S
0  1            0         3       22.0  1      0      7.2500   1.0   0.0  1.0
1  2            1         1       38.0  1      0      71.2833  0.0   0.0  0.0
2  3            1         3       26.0  0      0      7.9250   0.0   0.0  1.0
3  4            1         1       35.0  1      0      53.1000  0.0   0.0  1.0
4  5            0         3       35.0  0      0      8.0500   1.0   0.0  1.0
```
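To see what `drop_first=True` does, here is a small sketch on hypothetical values. pandas sorts the categories and drops the first one, so `female` and `C` become the implicit baseline encoded as all zeros; that is why the table above has only `male`, `Q` and `S` columns. The `dtype=int` argument is an addition here so the dummies print as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical values for the two categorical columns
sex = pd.Series(['male', 'female', 'female', 'male'], name='Sex')
embarked = pd.Series(['S', 'C', 'Q', 'S'], name='Embarked')

# drop_first=True removes the first (alphabetically sorted) category of each column
sex_dummies = pd.get_dummies(sex, drop_first=True, dtype=int)
embark_dummies = pd.get_dummies(embarked, drop_first=True, dtype=int)

print(sex_dummies.columns.tolist())     # ['male']
print(embark_dummies.columns.tolist())  # ['Q', 'S']
# A passenger who embarked at C is the row of all zeros
print(embark_dummies.values.tolist())   # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

Dropping one level avoids a redundant column: if `male` is 0, the passenger must be female, so a separate `female` column adds no information.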

Great! Our data is ready for our model!

# Building a Logistic Regression model

Let's start by splitting our data into a training set and test set (there is another test.csv file that you can play around with in case you want to use all this data for training).

## Train Test Split

`from sklearn.model_selection import train_test_split`
```python
X_train, X_test, y_train, y_test = train_test_split(train.drop('Survived', axis=1),
                                                    train['Survived'], test_size=0.30,
                                                    random_state=101)
```
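With `test_size=0.30`, scikit-learn rounds the test-set size up, so the 889 cleaned rows split into 267 test rows and 622 training rows. The sketch below checks this on a hypothetical stand-in array with the same number of rows; `random_state=101` only fixes the shuffle so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same number of rows as the cleaned data (889)
X = np.arange(889 * 3).reshape(889, 3)
y = np.arange(889) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101)

# A float test_size is rounded up: ceil(0.30 * 889) = 267 test rows
print(len(X_train), len(X_test))  # 622 267
```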

## Training and Predicting

`from sklearn.linear_model import LogisticRegression`
```python
logmodels = LogisticRegression()
logmodels.fit(X_train, y_train)
```

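```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
```

This output comes from an older scikit-learn release. Recent versions print only the parameters you changed (here just `LogisticRegression()`) and use `lbfgs` as the default solver.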
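Under the hood, the fitted model scores each row with a linear function of the features and passes that score through the sigmoid, 1 / (1 + e^(-z)), to get the probability of `Survived = 1`. The sketch below checks this against `predict_proba` on hypothetical synthetic data from `make_classification`, not on the Titanic features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data in place of the Titanic features
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression()
model.fit(X, y)

# Linear score z = X . w + b for every row
z = X @ model.coef_.ravel() + model.intercept_[0]
# Sigmoid turns the score into P(y = 1)
manual_proba = 1.0 / (1.0 + np.exp(-z))

sk_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sk_proba))  # True

# predict() labels a row 1 when that probability is above 0.5 (i.e. z > 0)
manual_pred = (manual_proba > 0.5).astype(int)
print(np.array_equal(manual_pred, model.predict(X)))
```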

`predictions = logmodels.predict(X_test)`
Let's move on to evaluating our model.
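A common first step in evaluation is to compare `predictions` with `y_test` using scikit-learn's metrics. This is a minimal sketch on hypothetical labels so it runs on its own; with the real data you would pass `y_test` and `predictions` instead.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels and predictions; in the post these would be y_test and predictions
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows = actual class, columns = predicted class: [[3 1] [1 3]]
print(accuracy_score(y_true, y_pred))  # 0.75
print(classification_report(y_true, y_pred))
```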

Good work! You might want to explore other feature engineering and the other `titanics_text.csv` file.
If you liked this post, please share it with your friends. Best of luck!
