# Linear Regression in Machine Learning: Part 2

*WELCOME TO THE SECOND PART OF THE LINEAR REGRESSION POST. IN THE FIRST POST WE LOOKED AT OPERATIONS ON THE DATASET. IN THIS POST WE TRAIN AND EVALUATE A LINEAR REGRESSION MODEL. LET'S START:*

## Training a Linear Regression Model

Let's now begin to train our regression model. We first need to split our data into an X array that contains the features to train on, and a y array with the target variable, in this case the Price column. We will toss out the Address column because it only has text info that the linear regression model cannot use.

### X and y arrays
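A minimal sketch of this step, assuming the housing data from part 1 sits in a pandas DataFrame. The name `housing` and the four rows of values are hypothetical stand-ins so the example runs on its own; the column names are the ones that appear in the coefficient interpretation later in this post.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded in part 1.
# The numbers and addresses are invented so the example is self-contained.
housing = pd.DataFrame({
    'Avg. Area Income': [79545.0, 79248.0, 61287.0, 63345.0],
    'Avg. Area House Age': [5.7, 6.0, 5.9, 7.2],
    'Avg. Area Number of Rooms': [7.0, 6.7, 8.5, 5.6],
    'Avg. Area Number of Bedrooms': [4.1, 3.1, 5.1, 3.3],
    'Area Population': [23086.0, 40173.0, 36882.0, 34310.0],
    'Price': [1059033.0, 1505890.0, 1058987.0, 1260616.0],
    'Address': ['addr 1', 'addr 2', 'addr 3', 'addr 4'],
})

# Numeric features the model trains on; Address is left out because it is text.
feature_columns = [
    'Avg. Area Income',
    'Avg. Area House Age',
    'Avg. Area Number of Rooms',
    'Avg. Area Number of Bedrooms',
    'Area Population',
]

X = housing[feature_columns]  # feature matrix
y = housing['Price']          # target variable

print(X.shape, y.shape)
```

With the real data, only the `housing` DataFrame would change; the column selection stays the same.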

## Train Test Split

Now let's split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
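Because `train_test_split` shuffles the rows at random, the split (and every metric computed from it) changes from run to run unless a seed is fixed. The sketch below shows this on a small synthetic array; the `random_state=101` value and the `X_demo`/`y_demo` names are arbitrary choices made for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten synthetic samples with two features each (illustrative data only).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# test_size=0.4 holds out 40% of the rows; random_state makes the shuffle repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)

# Repeating the call with the same seed yields the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)

print(len(X_tr), len(X_te))
print(np.array_equal(y_te, y_te2))
```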

### Creating and Training the Model

from sklearn.linear_model import LinearRegression

lm = LinearRegression()

lm.fit(X_train,y_train)

## Model Evaluation

Let's evaluate the model by checking out its coefficients and how we can interpret them.

Print the intercept:

print(lm.intercept_)

-2640159.79685

coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coefficient'])

coeff_df

Interpreting the coefficients:

- Holding all other features fixed, a 1 unit increase in **Avg. Area Income** is associated with an **increase of \$21.52**.
- Holding all other features fixed, a 1 unit increase in **Avg. Area House Age** is associated with an **increase of \$164883.28**.
- Holding all other features fixed, a 1 unit increase in **Avg. Area Number of Rooms** is associated with an **increase of \$122368.67**.
- Holding all other features fixed, a 1 unit increase in **Avg. Area Number of Bedrooms** is associated with an **increase of \$2233.80**.
- Holding all other features fixed, a 1 unit increase in **Area Population** is associated with an **increase of \$15.15**.

Does this make sense? Probably not, because I made up this data. If you want real data to repeat this sort of analysis, check out the [boston dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html). Note that `load_boston` was removed in scikit-learn 1.2, so the snippet below only runs on older versions:

`from sklearn.datasets import load_boston`

`boston = load_boston()`

`print(boston.DESCR)`

`boston_df = boston.data`
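To make the "holding all other features fixed" reading of a coefficient concrete, here is a minimal sketch on synthetic data (not the housing data). The names `X_demo`, `y_demo` and `demo_model`, and the true weights 3 and 5, are assumptions chosen for illustration. It raises one feature by 1 unit, leaves the other unchanged, and compares the change in the prediction with the fitted coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: y = 10 + 3*x0 + 5*x1 (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 10.0 + 3.0 * X_demo[:, 0] + 5.0 * X_demo[:, 1]

demo_model = LinearRegression().fit(X_demo, y_demo)

# Increase feature 0 by one unit while holding feature 1 fixed.
base = np.array([[4.0, 6.0]])
bumped = base + np.array([[1.0, 0.0]])
change = demo_model.predict(bumped)[0] - demo_model.predict(base)[0]

print('coefficient of feature 0:', demo_model.coef_[0])
print('change in prediction:    ', change)
```

The two printed numbers agree up to floating-point rounding, which is exactly what each bullet above states for the housing features.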

predictions = lm.predict(X_test)

plt.scatter(y_test,predictions)


### Residual Histogram

sns.histplot(y_test - predictions, kde=True);

## Regression Evaluation Metrics

Here are three common evaluation metrics for regression problems:

**Mean Absolute Error** (MAE) is the mean of the absolute value of the errors:

$$\frac{1}{n}\sum_{i=1}^{n}|y_i-\hat{y}_i|$$

**Mean Squared Error** (MSE) is the mean of the squared errors:

$$\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

**Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors:

$$\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$

Comparing these metrics:

- **MAE** is the easiest to understand, because it is the average error.
- **MSE** is more popular than MAE, because MSE "punishes" larger errors, which tends to be useful in the real world.
- **RMSE** is more popular than MSE, because RMSE is interpretable in the "y" units.

All of these are **loss functions**, because we want to minimize them.
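The three formulas can be checked by hand on a tiny example. The sketch below uses four made-up target values and predictions (the `y_true` and `y_pred` names and numbers are assumptions for illustration), computes each metric directly from its definition with NumPy, and compares the result with the corresponding scikit-learn function.

```python
import numpy as np
from sklearn import metrics

# Made-up values for illustration; the errors are 1, 0, -2, -1.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # (1 + 0 + 2 + 1) / 4 = 1.0
mse = np.mean(errors ** 2)      # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)             # sqrt(1.5), back in the units of y

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)

# The library functions give the same numbers.
print(np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred)))
print(np.isclose(mse, metrics.mean_squared_error(y_true, y_pred)))
```

Note how the single error of size 2 contributes 4 of the 6 units in the MSE sum but only 2 of the 4 units in the MAE sum, which is the "punishing" of larger errors described above.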

from sklearn import metrics

print('MAE:', metrics.mean_absolute_error(y_test, predictions))

print('MSE:', metrics.mean_squared_error(y_test, predictions))

print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

MAE: 82288.2225191

MSE: 10460958907.2

RMSE: 102278.829223

This was your Machine Learning Project!

**Tags: Linear Regression in Machine Learning-plot-algorithms-explain**

FOR THE FIRST PART OF PROJECT *CLICK HERE.*

*BEST OF LUCK!!!*