Take care of your missing data in the data preprocessing

作者: shaneZhang 分类: 人工智能相关,机器学习基础知识 发布时间: 2020-10-13 10:46
# import the stand library
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
print(X)
y = dataset.iloc[:, 3].values
print(y)

# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
print(X)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

print(X_train)
print(X_test)
print(y_train)
print(y_test)

如果觉得我的文章对您有用,请随意打赏。如果有其他问题请联系博主QQ(909491009)或者下方留言!

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注