README

In this project, a neural network is used to predict used-car prices from three inputs: the distance driven, the model year, and whether the vehicle has been in an accident. On the held-out test set the mean absolute percentage error (MAPE) is about 51%, which suggests that these three features alone carry too little information to predict the price accurately. The model could be improved by adding more input features.

Source: https://www.kaggle.com/datasets/taeefnajib/used-car-price-prediction-dataset/data

Exploratory Data Analysis

Libraries

In [89]:
import pandas as pd
import matplotlib.pyplot as plt
import torch
from torch import nn
import seaborn as sns
from sklearn.model_selection import train_test_split

Load data

In [90]:
data = pd.read_csv("used_cars.csv")
data.head()
Out[90]:
brand model model_year milage fuel_type engine transmission ext_col int_col accident clean_title price
0 Ford Utility Police Interceptor Base 2013 51,000 mi. E85 Flex Fuel 300.0HP 3.7L V6 Cylinder Engine Flex Fuel Capa... 6-Speed A/T Black Black At least 1 accident or damage reported Yes $10,300
1 Hyundai Palisade SEL 2021 34,742 mi. Gasoline 3.8L V6 24V GDI DOHC 8-Speed Automatic Moonlight Cloud Gray At least 1 accident or damage reported Yes $38,005
2 Lexus RX 350 RX 350 2022 22,372 mi. Gasoline 3.5 Liter DOHC Automatic Blue Black None reported NaN $54,598
3 INFINITI Q50 Hybrid Sport 2015 88,900 mi. Hybrid 354.0HP 3.5L V6 Cylinder Engine Gas/Electric H... 7-Speed A/T Black Black None reported Yes $15,500
4 Audi Q3 45 S line Premium Plus 2021 9,835 mi. Gasoline 2.0L I4 16V GDI DOHC Turbo 8-Speed Automatic Glacier White Metallic Black None reported NaN $34,999

Preprocessing data

Model year (converted to vehicle age in years, relative to the newest model year in the data)

In [91]:
model_year = data["model_year"].max()-data["model_year"]
model_year = model_year.astype(float)
model_year = pd.DataFrame(model_year)

Mileage

In [92]:
milage = data["milage"]
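# regex=False: treat "mi." as a literal string, not as a regular expression
milage = milage.str.replace("mi.","", regex=False)
milage = milage.str.replace(",","", regex=False)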
milage = milage.astype(float)
milage = pd.DataFrame(milage)

Accident free

In [93]:
# 1 = "None reported", 0 = anything else (an accident was reported or the value is missing)
accident_free = data["accident"] == "None reported"
accident_free = accident_free.astype(int)

Price

In [94]:
price = data["price"]
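# regex=False: "$" is removed literally instead of being read as a regex anchor
price = price.str.replace("$","", regex=False)
price = price.str.replace(",","", regex=False)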
price = price.astype(float)
price = pd.DataFrame(price)
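The mileage and price columns are cleaned with the same pattern: remove the unit or currency symbol, remove the thousands separators, then convert to float. A minimal plain-Python sketch of that pattern is shown here; the helper name `parse_number` and its `strip_tokens` parameter are hypothetical and not part of the notebook.

```python
def parse_number(text, strip_tokens=("$", "mi.", ",")):
    """Turn strings such as '51,000 mi.' or '$10,300' into a float.

    Hypothetical helper: the tokens are removed as literal substrings,
    so '.' and '$' are never interpreted as regex metacharacters.
    """
    cleaned = text
    for token in strip_tokens:
        cleaned = cleaned.replace(token, "")
    return float(cleaned.strip())


print(parse_number("51,000 mi."))  # 51000.0
print(parse_number("$10,300"))     # 10300.0
```

With pandas, such a helper could be applied column-wise, for example `data["milage"].map(parse_number)`, assuming the column contains no missing values.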

New dataframe

In [95]:
df = pd.concat([model_year,milage,accident_free,price], axis=1)
df.head()
Out[95]:
model_year milage accident price
0 11.0 51000.0 0 10300.0
1 3.0 34742.0 0 38005.0
2 2.0 22372.0 1 54598.0
3 9.0 88900.0 1 15500.0
4 3.0 9835.0 1 34999.0

Correlation

In [96]:
print(df.corr())
            model_year    milage  accident     price
model_year    1.000000  0.617720 -0.188222 -0.199496
milage        0.617720  1.000000 -0.272352 -0.305528
accident     -0.188222 -0.272352  1.000000  0.105135
price        -0.199496 -0.305528  0.105135  1.000000
In [97]:
fig = plt.figure(figsize=(12,8))
ax = plt.axes(projection='3d')

z = df["price"]
x = df["model_year"]
y = df["milage"]

ax.scatter(x, y, z,c=z, cmap="viridis", s=50)

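# The model_year column holds the vehicle age (newest model year minus model year)
ax.set_xlabel("Vehicle age (years)")
ax.set_ylabel("Mileage (mi.)")
ax.set_zlabel("Price ($)")
ax.set_title("Price vs. vehicle age and mileage")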
plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)
plt.tight_layout()
plt.show()
[Figure: 3D scatter plot of price against model year and mileage, colored by price]

Training Neural Network

Preprocessing

Copy dataframe

In [98]:
df_model = df.copy()

print(df_model.shape)
(4009, 4)
In [99]:
X = df_model[["model_year", "milage", "accident"]]
y = df_model["price"]

print(X.shape)
print(y.shape)
(4009, 3)
(4009,)

Split training and test data

In [100]:
X_train, X_test, y_train, y_test = train_test_split(
    X,y,
    test_size=0.05,
    random_state=42
)

print("Input training data:", X_train.shape, "Input test data:", X_test.shape)
print(y_train.shape, y_test.shape)
Input training data: (3808, 3) Input test data: (201, 3)
(3808,) (201,)
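The shapes printed above follow from simple split arithmetic. The sketch below reproduces the counts under the assumption that scikit-learn rounds a fractional `test_size` up to the next whole sample and assigns the remaining rows to training; the variable names are illustrative.

```python
import math

n_samples = 4009      # rows in df_model
test_size = 0.05      # fraction passed to train_test_split

# Assumption: the test count is rounded up, the rest goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 3808 201
```

A test set of 201 cars is small, so the reported error can shift noticeably with a different `random_state`.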

Neural network model

Training data to tensor

In [101]:
X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)  # .values returns a NumPy array, which torch.tensor converts
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).reshape(-1,1)

print(X_train_tensor.shape)
print(y_train_tensor.shape)
torch.Size([3808, 3])
torch.Size([3808, 1])

Normalize data

In [102]:
X_mean = X_train_tensor.mean(axis=0)
X_std = X_train_tensor.std(axis=0)
X_train_tensor = (X_train_tensor - X_mean) / X_std
In [103]:
y_mean = y_train_tensor.mean()
y_std = y_train_tensor.std()
y_train_tensor = (y_train_tensor - y_mean) / y_std
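The two cells above standardize inputs and target with statistics taken from the training data only, so the same mean and standard deviation can later be reused for the test inputs and to map predictions back to dollars. The following is a minimal plain-Python sketch of that round trip, not the notebook's tensor code; the function names are hypothetical. `statistics.stdev` uses the n - 1 denominator, which is also the default of `torch.std`.

```python
import statistics


def fit_standardizer(values):
    """Return the mean and sample standard deviation (n - 1 denominator)."""
    return statistics.fmean(values), statistics.stdev(values)


def standardize(values, mean, std):
    """Shift to zero mean and scale to unit standard deviation."""
    return [(v - mean) / std for v in values]


def unstandardize(values, mean, std):
    """Invert standardize(): map scaled values back to original units."""
    return [v * std + mean for v in values]


# The five prices shown in the data preview, used here as a toy training set
train_prices = [10300.0, 38005.0, 54598.0, 15500.0, 34999.0]
mean, std = fit_standardizer(train_prices)

scaled = standardize(train_prices, mean, std)
restored = unstandardize(scaled, mean, std)

print([round(s, 3) for s in scaled])
print([round(r, 2) for r in restored])
```

Keeping `mean` and `std` around after fitting is the important part: test data must be scaled with the training statistics, not with its own.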

Model

In [104]:
model = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)
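To get a feeling for the size of this network, the number of trainable parameters can be counted by hand: each `nn.Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. The sketch below does this arithmetic in plain Python; `linear_params` is a hypothetical helper name.

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


layer_sizes = [(3, 16), (16, 8), (8, 1)]  # the three nn.Linear layers
per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]
total = sum(per_layer)

print(per_layer, total)  # [64, 136, 9] 209
```

The same total should be returned by `sum(p.numel() for p in model.parameters())`. With 3808 training rows and about 200 parameters, the network is small relative to the data.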

Loss function and optimizer

In [105]:
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

Training loop

In [106]:
# Training loop
losses_list = []
for i in range(0,3000):  
    optimizer.zero_grad() 
    # model
    outputs = model(X_train_tensor)
    # loss
    loss = loss_fn(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()

    # record the loss value for plotting
    losses_list.append(loss.item())

    if i % 500 == 0:
        print(loss)
tensor(1.0066, grad_fn=<MseLossBackward0>)
tensor(0.5707, grad_fn=<MseLossBackward0>)
tensor(0.5497, grad_fn=<MseLossBackward0>)
tensor(0.5315, grad_fn=<MseLossBackward0>)
tensor(0.4996, grad_fn=<MseLossBackward0>)
tensor(0.4637, grad_fn=<MseLossBackward0>)
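Because the target was standardized, the MSE values above are in units of the target's variance: a model that always predicts the training mean would score close to 1.0, which is near the first printed loss. As a rough reading aid, the sketch below converts the last printed loss into an approximate fraction of variance explained on the training data. It assumes the value 0.4637 (logged at iteration 2500, not the final iteration) and the n - 1 standard deviation used for scaling.

```python
n_train = 3808            # training rows
last_logged_mse = 0.4637  # loss printed at iteration 2500

# With an (n - 1) standard deviation, the standardized target has a
# mean squared deviation of (n - 1) / n, which is the loss of a
# constant prediction at the training mean.
baseline_mse = (n_train - 1) / n_train

r_squared = 1 - last_logged_mse / baseline_mse
print(f"{r_squared:.3f}")  # 0.536
```

This is a training-set figure only; it says nothing about how well the model generalizes to the test cars.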

Plot loss function

In [107]:
plt.figure(figsize=(8,5))
plt.plot(losses_list)
plt.xlabel("Iteration")
plt.ylabel("Loss")
plt.title("Loss vs Iterations")
plt.grid(True)
plt.show()
[Figure: line plot of training loss against iteration]

Validation

Test data to tensor

In [108]:
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)  # .values returns a NumPy array, which torch.tensor converts
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).reshape(-1,1)

print(X_test_tensor.shape)
print(y_test_tensor.shape)
torch.Size([201, 3])
torch.Size([201, 1])

Make prediction

In [109]:
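# Inference only: disable gradient tracking and scale with the training statistics
with torch.no_grad():
    prediction = model((X_test_tensor - X_mean) / X_std)
prediction_orig = prediction * y_std + y_mean  # back to the original price scale
percent_error = torch.abs((y_test_tensor - prediction_orig)/y_test_tensor)*100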

Compute average error

In [110]:
# Mean Absolute Percentage Error (MAPE)
mean_error = torch.mean(percent_error)

# print("Percentage error per sample:", percent_error)
print(f"Mean Absolute Percentage Error: {mean_error:.2f}%")
Mean Absolute Percentage Error: 51.00%
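For reference, the metric computed above can be written out in a few lines of plain Python. This is a sketch on hypothetical prices that are not taken from the dataset; `mape` is an illustrative function name.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical prices, not taken from the dataset
actual = [10000.0, 20000.0, 40000.0]
predicted = [15000.0, 18000.0, 40000.0]

# Per-car errors are 50%, 10% and 0%
print(round(mape(actual, predicted), 2))  # 20.0
```

Since each error is divided by the true price, the same dollar error weighs more on a cheap car than on an expensive one: being off by $5,000 is a 50% error on a $10,000 car but only 12.5% on a $40,000 car.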