DS #12 Medical Insurance Cost Prediction

Column descriptions:

  • Age: Age of the primary beneficiary
  • Sex: Primary beneficiary’s gender
  • BMI: Body mass index, an indicator of whether body weight is relatively high or low for a given height
  • Children: Number of children covered by health insurance (number of dependents)
  • Smoker: Whether the beneficiary smokes (yes, no)
  • Region: Beneficiary’s residential area in the US (northeast, southeast, southwest, northwest)
  • Charges: Individual medical costs billed by health insurance
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import io
from google.colab import files
uploaded = files.upload()
insurance_dataset = pd.read_csv(io.BytesIO(uploaded['insurance.csv']))
insurance_dataset.head()
insurance_dataset.shape
insurance_dataset.info()
insurance_dataset.isnull().sum()

Exploratory Data Analysis

insurance_dataset.describe()
sns.set()
plt.figure(figsize=(6,6))
sns.histplot(insurance_dataset['age'], kde=True)
plt.title('Age Distribution')
plt.show()
plt.figure(figsize=(6,6))
sns.countplot(x='sex', data=insurance_dataset)
plt.title('Sex Distribution')
plt.show()
plt.figure(figsize=(6,6))
sns.histplot(insurance_dataset['bmi'], kde=True)
plt.title('BMI Distribution')
plt.show()
plt.figure(figsize=(6,6))
sns.countplot(x='children', data=insurance_dataset)
plt.title('Children Distribution')
plt.show()
plt.figure(figsize=(6,6))
sns.countplot(x='smoker', data=insurance_dataset)
plt.title('Smoker Distribution')
plt.show()
plt.figure(figsize=(6,6))
sns.countplot(x='region', data=insurance_dataset)
plt.title('Region Distribution')
plt.show()
plt.figure(figsize=(6,6))
sns.histplot(insurance_dataset['charges'], kde=True)
plt.title('Charges Distribution')
plt.show()

Data Pre-Processing

# encoding 'sex' column
insurance_dataset.replace({'sex':{'male':0,'female':1}}, inplace=True)
# encoding 'smoker' column
insurance_dataset.replace({'smoker':{'yes':0,'no':1}}, inplace=True)
# encoding 'region' column
insurance_dataset.replace({'region':{'southeast':0,'southwest':1,'northeast':2,'northwest':3}}, inplace=True)
X = insurance_dataset.drop(columns='charges')
Y = insurance_dataset['charges']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
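
The integer codes used for 'region' in this section imply an ordering (southeast < southwest < northeast < northwest) that the regions do not really have. One common alternative, shown here only as a sketch and not as the method this post uses, is one-hot encoding with `pd.get_dummies`. The `sample` frame below is hypothetical data made up for illustration; it is not taken from insurance.csv.

```python
import pandas as pd

# Hypothetical sample rows, made up for illustration (not from insurance.csv).
sample = pd.DataFrame({
    "age": [19, 45, 33],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 30.1, 22.7],
    "children": [0, 2, 1],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northeast", "northwest"],
})

# Binary columns stay as single 0/1 columns, using the same codes as the post.
encoded = sample.assign(
    sex=sample["sex"].map({"male": 0, "female": 1}),
    smoker=sample["smoker"].map({"yes": 0, "no": 1}),
)

# One-hot encode 'region' so that no ordering between regions is implied.
encoded = pd.get_dummies(encoded, columns=["region"], dtype=int)

print(encoded.columns.tolist())
```

With a linear model that fits an intercept, passing `drop_first=True` to `pd.get_dummies` is a common way to avoid perfectly collinear dummy columns.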

Model Training

regressor = LinearRegression()
regressor.fit(X_train, Y_train)
training_data_prediction = regressor.predict(X_train)
r2_train = metrics.r2_score(Y_train, training_data_prediction)
print('R squared value (training data): ', r2_train)
test_data_prediction = regressor.predict(X_test)
r2_test = metrics.r2_score(Y_test, test_data_prediction)
print('R squared value (test data): ', r2_test)
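
To make the R squared score less of a black box, here is a minimal sketch of how it can be computed by hand, together with the root mean squared error, which is expressed in the same units as the charges. The helper names `r_squared` and `rmse` and the toy values are assumptions made for this illustration; they are not part of the notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values chosen so the arithmetic is easy to follow by hand.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

print('R squared:', r_squared(y_true, y_pred))
print('RMSE:', rmse(y_true, y_pred))
```

In the notebook, the same helpers could be called with `Y_test` and `test_data_prediction` to cross-check the value reported by `metrics.r2_score`.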

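# feature order: age, sex, bmi, children, smoker, region
# here: 31 years old, female (1), BMI 25.74, no children, non-smoker (1), southeast (0)
input_data = (31,1,25.74,0,1,0)
# changing input_data to a numpy array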
input_data_as_numpy_array = np.asarray(input_data)
# reshape the array
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
prediction = regressor.predict(input_data_reshaped)
print(prediction)
print('The insurance cost is USD ', prediction[0])
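
Typing the integer codes by hand is easy to get wrong, especially since 'yes' maps to 0 for smokers. As a sketch, a small helper can build the feature row from raw values. The name `encode_input` is hypothetical, and it assumes the training columns are in the order given in the column descriptions (age, sex, bmi, children, smoker, region) with the same codes as the pre-processing step.

```python
# Same integer codes as the pre-processing step in this post.
SEX_CODES = {"male": 0, "female": 1}
SMOKER_CODES = {"yes": 0, "no": 1}
REGION_CODES = {"southeast": 0, "southwest": 1, "northeast": 2, "northwest": 3}

def encode_input(age, sex, bmi, children, smoker, region):
    """Return one feature row in the assumed training column order."""
    return [
        age,
        SEX_CODES[sex],        # raises KeyError on an unknown label
        bmi,
        children,
        SMOKER_CODES[smoker],
        REGION_CODES[region],
    ]

row = encode_input(31, "female", 25.74, 0, "no", "southeast")
print(row)
```

The resulting `row` can then be converted with `np.asarray(row).reshape(1, -1)` and passed to `regressor.predict` exactly as in the cell above.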
