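Pattern Recognition | Dataset (Iris)
Contents
- Dataset: Iris
- Dataset Introduction
- Dataset Metadata
- Terminology
- Preprocessing
- Data Preview
- Grouped Statistics
- Statistical Operations
- Plotting
Dataset: Iris. Further analysis: [k-nearest neighbors]
Dataset Introduction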
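Data source | UCI database, Iris |
---|---|
Overview | The original UCI Iris data; iris flowers are classified by species from four measurements of their sepals and petals. |
Description | UCI Iris is a widely used classification dataset. The four attributes sepal length, sepal width, petal length, and petal width are used to predict which of three species (Setosa, Versicolour, Virginica) an iris flower belongs to. |
Number of attributes | 5 (4 features plus the class label) |
Number of records | 150 |
Records without missing values | 150 |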
Dataset Metadata
Name | Data type |
---|---|
Sepal length | float |
Sepal width | float |
Petal length | float |
Petal width | float |
Iris species | string |
Terminology
Chinese | English |
---|---|
萼片 | Calyx (sepal) |
花瓣 | Petal |
长度 | Length |
宽度 | Width |
鸢花 | Iris |
Preprocessing
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the file: one comma-separated record per line
fname = 'iris.data'
with open(fname, 'r', encoding='utf-8') as f:
    s = [i[:-1].split(',') for i in f.readlines()]

# Build a pandas DataFrame; each species has 50 samples.
# A trailing blank line in iris.data ends up as an extra, empty row (index 150).
names = ['slength', 'swidth', 'plength', 'pwidth', 'name']
iris = pd.DataFrame(data=s, columns=names)
iris
Data Preview
Index | slength | swidth | plength | pwidth | name |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
… | … | … | … | … | … |
146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
150 | None | None | None | None | None |
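The empty row at index 150 is a side effect of splitting every line of the file by hand. One possible alternative, sketched here under the assumption that the file is plain comma-separated text without a header, is `pandas.read_csv`, which skips blank lines by default and parses the numeric columns as floats. The snippet reads a few in-memory rows copied from the preview instead of the real file; `SAMPLE` and `load_iris_csv` are illustrative names, not part of the original code.

```python
import io
import pandas as pd

NAMES = ['slength', 'swidth', 'plength', 'pwidth', 'name']

# A few rows from the preview, plus a trailing blank line like the one in iris.data.
SAMPLE = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "5.9,3.0,5.1,1.8,Iris-virginica\n"
    "\n"
)

def load_iris_csv(source):
    """Read comma-separated iris records from a path or a file-like object."""
    # header=None: the data has no header row; blank lines are skipped by default.
    return pd.read_csv(source, header=None, names=NAMES)

iris_sample = load_iris_csv(io.StringIO(SAMPLE))
print(iris_sample.shape)
print(iris_sample.dtypes)
```

With the real file, `load_iris_csv('iris.data')` would be the corresponding call, and the later `astype('float')` conversions would no longer be needed.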
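Grouped Statistics
# The three classes (the file is ordered by species, 50 rows each)
Setosa = iris.iloc[0:50, 0:4].astype('float')
Versicolour = iris.iloc[50:100, 0:4].astype('float')
Virginica = iris.iloc[100:150, 0:4].astype('float')

# Count how many samples each species has
iris['name'].value_counts()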
Overall sample counts:
Iris-versicolor    50
Iris-virginica     50
Iris-setosa        50
Name: name, dtype: int64
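The positional slices `0:50`, `50:100`, and `100:150` only work because the file is sorted by species. A label-based split avoids that assumption. The following is a minimal sketch on four rows taken from the preview table; `split_by_species` is a hypothetical helper, not part of the original code.

```python
import pandas as pd

FEATURES = ['slength', 'swidth', 'plength', 'pwidth']

# Four rows from the preview table; values are strings, as in the article's DataFrame.
iris_sample = pd.DataFrame(
    [['5.1', '3.5', '1.4', '0.2', 'Iris-setosa'],
     ['4.9', '3.0', '1.4', '0.2', 'Iris-setosa'],
     ['6.3', '2.5', '5.0', '1.9', 'Iris-virginica'],
     ['6.5', '3.0', '5.2', '2.0', 'Iris-virginica']],
    columns=FEATURES + ['name'],
)

def split_by_species(df):
    """Return {species: float DataFrame of the four features}, selected by label."""
    return {
        species: group[FEATURES].astype('float')
        for species, group in df.groupby('name')
    }

groups = split_by_species(iris_sample)
print(sorted(groups))
print(groups['Iris-setosa'])
```

Because `groupby` drops rows whose key is missing, an empty trailing row would not end up in any group.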
Count how often each value occurs within each class:
# setosa
Setosa.loc[:,'slength'].value_counts(),Setosa.loc[:,'swidth'].value_counts(),Setosa.loc[:,'plength'].value_counts(),Setosa.loc[:,'pwidth'].value_counts()
# versicolour
Versicolour.loc[:,'slength'].value_counts(),Versicolour.loc[:,'swidth'].value_counts(),Versicolour.loc[:,'plength'].value_counts(),Versicolour.loc[:,'pwidth'].value_counts()
# virginica
Virginica.loc[:,'slength'].value_counts(),Virginica.loc[:,'swidth'].value_counts(),Virginica.loc[:,'plength'].value_counts(),Virginica.loc[:,'pwidth'].value_counts()
Per-class value counts:
# Setosa:
(5.0    8
5.1    8
4.8    5
5.4    5
4.9    4
4.6    4
5.2    3
4.4    3
4.7    2
5.7    2
5.5    2
4.5    1
5.8    1
4.3    1
5.3    1
Name: slength, dtype: int64,
3.4    9
3.5    6
3.0    6
3.1    5
3.2    5
3.8    4
3.7    3
3.9    2
3.3    2
3.6    2
2.9    1
4.1    1
4.4    1
4.2    1
2.3    1
4.0    1
Name: swidth, dtype: int64,
1.5    14
1.4    12
1.6     7
1.3     7
1.7     4
1.9     2
1.2     2
1.1     1
1.0     1
Name: plength, dtype: int64,
0.2    28
0.4     7
0.3     7
0.1     6
0.5     1
0.6     1
Name: pwidth, dtype: int64)
# Versicolour:
(5.6    5
5.7    5
5.5    5
6.0    4
6.1    4
6.3    3
6.7    3
5.8    3
6.2    2
5.0    2
6.4    2
6.6    2
5.9    2
5.2    1
5.4    1
5.1    1
7.0    1
6.8    1
4.9    1
6.5    1
6.9    1
Name: slength, dtype: int64,
3.0    8
2.9    7
2.8    6
2.7    5
2.5    4
2.6    3
2.4    3
3.1    3
2.3    3
3.2    3
2.2    2
2.0    1
3.3    1
3.4    1
Name: swidth, dtype: int64,
4.5    7
4.7    5
4.0    5
4.4    4
4.2    4
3.9    3
4.1    3
4.6    3
3.5    2
4.9    2
4.8    2
4.3    2
3.3    2
3.8    1
3.7    1
5.1    1
3.6    1
5.0    1
3.0    1
Name: plength, dtype: int64,
1.3    13
1.5    10
1.0     7
1.4     7
1.2     5
1.6     3
1.1     3
1.8     1
1.7     1
Name: pwidth, dtype: int64)
# Virginica:
(6.3    6
6.7    5
6.4    5
7.7    4
6.5    4
6.9    3
7.2    3
5.8    3
6.1    2
6.8    2
6.2    2
6.0    2
5.6    1
7.3    1
7.6    1
7.9    1
7.1    1
5.7    1
4.9    1
5.9    1
7.4    1
Name: slength, dtype: int64,
3.0    12
2.8     8
3.2     5
2.5     4
3.1     4
2.7     4
3.3     3
2.6     2
3.8     2
2.9     2
3.4     2
2.2     1
3.6     1
Name: swidth, dtype: int64,
5.1    7
5.6    6
5.8    3
4.9    3
6.1    3
5.7    3
5.0    3
5.5    3
4.8    2
5.4    2
5.3    2
6.0    2
5.9    2
5.2    2
6.7    2
6.6    1
6.4    1
4.5    1
6.3    1
6.9    1
Name: plength, dtype: int64,
1.8    11
2.3     8
2.0     6
2.1     6
1.9     5
2.4     3
2.5     3
2.2     3
1.5     2
1.6     1
1.7     1
1.4     1
Name: pwidth, dtype: int64)
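The three long tuple expressions repeat the same `value_counts()` call for every column. As a sketch of a more compact form, the counts can be collected in a dictionary keyed by column name. The data below is only the first five Setosa rows from the preview table, and `column_value_counts` is a hypothetical helper name.

```python
import pandas as pd

# First five Setosa rows from the preview table.
setosa_sample = pd.DataFrame(
    {'slength': [5.1, 4.9, 4.7, 4.6, 5.0],
     'swidth': [3.5, 3.0, 3.2, 3.1, 3.6],
     'plength': [1.4, 1.4, 1.3, 1.5, 1.4],
     'pwidth': [0.2, 0.2, 0.2, 0.2, 0.2]}
)

def column_value_counts(df):
    """Map each column name to its value_counts() Series."""
    return {col: df[col].value_counts() for col in df.columns}

counts = column_value_counts(setosa_sample)
for col, series in counts.items():
    print(col, series.to_dict())
```

The same helper could be applied to `Setosa`, `Versicolour`, and `Virginica` in turn.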
Statistical Operations
Sum:
# Sum per group
print('Setosa: ',np.sum(np.array(Setosa), axis=0))
print('Versicolour: ',np.sum(np.array(Versicolour), axis=0))
print('Virginica: ',np.sum(np.array(Virginica), axis=0))
Output:
Setosa:  [250.3 170.9  73.2  12.2]
Versicolour:  [296.8 138.5 213.   66.3]
Virginica:  [329.4 148.7 277.6 101.3]
Mean:
# Mean per group
print('Setosa: ',np.mean(np.array(Setosa), axis=0))
print('Versicolour: ',np.mean(np.array(Versicolour), axis=0))
print('Virginica: ',np.mean(np.array(Virginica), axis=0))
Output:
Setosa:  [5.006 3.418 1.464 0.244]
Versicolour:  [5.936 2.77  4.26  1.326]
Virginica:  [6.588 2.974 5.552 2.026]
Maximum:
# Maximum per group
print('Setosa: ',np.amax(np.array(Setosa), axis=0))
print('Versicolour: ',np.amax(np.array(Versicolour), axis=0))
print('Virginica: ',np.amax(np.array(Virginica), axis=0))
Output:
Setosa:  [5.8 4.4 1.9 0.6]
Versicolour:  [7.  3.4 5.1 1.8]
Virginica:  [7.9 3.8 6.9 2.5]
Minimum:
# Minimum per group
print('Setosa: ',np.amin(np.array(Setosa), axis=0))
print('Versicolour: ',np.amin(np.array(Versicolour), axis=0))
print('Virginica: ',np.amin(np.array(Virginica), axis=0))
Output:
Setosa:  [4.3 2.3 1.  0.1]
Versicolour:  [4.9 2.  3.  1. ]
Virginica:  [4.9 2.2 4.5 1.4]
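Sum, mean, maximum, and minimum are all computed with the same pattern, once per group. As a sketch of an alternative, pandas can produce them in a single table with `groupby(...).agg(...)`. The four rows below come from the preview table and stand in for the full dataset.

```python
import pandas as pd

# Four rows from the preview table, already converted to floats.
iris_sample = pd.DataFrame(
    {'slength': [5.1, 4.9, 6.3, 6.5],
     'swidth': [3.5, 3.0, 2.5, 3.0],
     'plength': [1.4, 1.4, 5.0, 5.2],
     'pwidth': [0.2, 0.2, 1.9, 2.0],
     'name': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-virginica']}
)

# One row per species, one column per (feature, statistic) pair.
summary = iris_sample.groupby('name').agg(['sum', 'mean', 'max', 'min'])
print(summary['slength'])
```

Selecting `summary['slength']` shows the four statistics of one feature side by side for every species.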
Square root:
# Element-wise square root of every measurement in each group
print('Setosa: ')
print(np.sqrt(np.array(Setosa)))
print('Versicolour: ')
print(np.sqrt(np.array(Versicolour)))
print('Virginica: ')
print(np.sqrt(np.array(Virginica)))
Output:
Setosa:
[[2.25831796 1.87082869 1.18321596 0.4472136 ]
[2.21359436 1.73205081 1.18321596 0.4472136 ]
[2.16794834 1.78885438 1.14017543 0.4472136 ]
[2.14476106 1.76068169 1.22474487 0.4472136 ]
[2.23606798 1.8973666 1.18321596 0.4472136 ]
[2.32379001 1.97484177 1.30384048 0.63245553]
[2.14476106 1.84390889 1.18321596 0.54772256]
[2.23606798 1.84390889 1.22474487 0.4472136 ]
[2.0976177 1.70293864 1.18321596 0.4472136 ]
[2.21359436 1.76068169 1.22474487 0.31622777]
[2.32379001 1.92353841 1.22474487 0.4472136 ]
[2.19089023 1.84390889 1.26491106 0.4472136 ]
[2.19089023 1.73205081 1.18321596 0.31622777]
[2.07364414 1.73205081 1.04880885 0.31622777]
[2.40831892 2. 1.09544512 0.4472136 ]
[2.38746728 2.0976177 1.22474487 0.63245553]
[2.32379001 1.97484177 1.14017543 0.63245553]
[2.25831796 1.87082869 1.18321596 0.54772256]
[2.38746728 1.94935887 1.30384048 0.54772256]
[2.25831796 1.94935887 1.22474487 0.54772256]
[2.32379001 1.84390889 1.30384048 0.4472136 ]
[2.25831796 1.92353841 1.22474487 0.63245553]
[2.14476106 1.8973666 1. 0.4472136 ]
[2.25831796 1.81659021 1.30384048 0.70710678]
[2.19089023 1.84390889 1.37840488 0.4472136 ]
[2.23606798 1.73205081 1.26491106 0.4472136 ]
[2.23606798 1.84390889 1.26491106 0.63245553]
[2.28035085 1.87082869 1.22474487 0.4472136 ]
[2.28035085 1.84390889 1.18321596 0.4472136 ]
[2.16794834 1.78885438 1.26491106 0.4472136 ]
[2.19089023 1.76068169 1.26491106 0.4472136 ]
[2.32379001 1.84390889 1.22474487 0.63245553]
[2.28035085 2.02484567 1.22474487 0.31622777]
[2.34520788 2.04939015 1.18321596 0.4472136 ]
[2.21359436 1.76068169 1.22474487 0.31622777]
[2.23606798 1.78885438 1.09544512 0.4472136 ]
[2.34520788 1.87082869 1.14017543 0.4472136 ]
[2.21359436 1.76068169 1.22474487 0.31622777]
[2.0976177 1.73205081 1.14017543 0.4472136 ]
[2.25831796 1.84390889 1.22474487 0.4472136 ]
[2.23606798 1.87082869 1.14017543 0.54772256]
[2.12132034 1.51657509 1.14017543 0.54772256]
[2.0976177 1.78885438 1.14017543 0.4472136 ]
[2.23606798 1.87082869 1.26491106 0.77459667]
[2.25831796 1.94935887 1.37840488 0.63245553]
[2.19089023 1.73205081 1.18321596 0.54772256]
[2.25831796 1.94935887 1.26491106 0.4472136 ]
[2.14476106 1.78885438 1.18321596 0.4472136 ]
[2.30217289 1.92353841 1.22474487 0.4472136 ]
[2.23606798 1.81659021 1.18321596 0.4472136 ]]
Versicolour:
[[2.64575131 1.78885438 2.16794834 1.18321596]
[2.52982213 1.78885438 2.12132034 1.22474487]
[2.62678511 1.76068169 2.21359436 1.22474487]
[2.34520788 1.51657509 2. 1.14017543]
[2.54950976 1.67332005 2.14476106 1.22474487]
[2.38746728 1.67332005 2.12132034 1.14017543]
[2.50998008 1.81659021 2.16794834 1.26491106]
[2.21359436 1.54919334 1.81659021 1.]
[2.56904652 1.70293864 2.14476106 1.14017543]
[2.28035085 1.64316767 1.97484177 1.18321596]
[2.23606798 1.41421356 1.87082869 1.]
[2.42899156 1.73205081 2.04939015 1.22474487]
[2.44948974 1.4832397 2. 1.]
[2.46981781 1.70293864 2.16794834 1.18321596]
[2.36643191 1.70293864 1.8973666 1.14017543]
[2.58843582 1.76068169 2.0976177 1.18321596]
[2.36643191 1.73205081 2.12132034 1.22474487]
[2.40831892 1.64316767 2.02484567 1.]
[2.48997992 1.4832397 2.12132034 1.22474487]
[2.36643191 1.58113883 1.97484177 1.04880885]
[2.42899156 1.78885438 2.19089023 1.34164079]
[2.46981781 1.67332005 2. 1.14017543]
[2.50998008 1.58113883 2.21359436 1.22474487]
[2.46981781 1.67332005 2.16794834 1.09544512]
[2.52982213 1.70293864 2.07364414 1.14017543]
[2.56904652 1.73205081 2.0976177 1.18321596]
[2.60768096 1.67332005 2.19089023 1.18321596]
[2.58843582 1.73205081 2.23606798 1.30384048]
[2.44948974 1.70293864 2.12132034 1.22474487]
[2.38746728 1.61245155 1.87082869 1.]
[2.34520788 1.54919334 1.94935887 1.04880885]
[2.34520788 1.54919334 1.92353841 1.]
[2.40831892 1.64316767 1.97484177 1.09544512]
[2.44948974 1.64316767 2.25831796 1.26491106]
[2.32379001 1.73205081 2.12132034 1.22474487]
[2.44948974 1.84390889 2.12132034 1.26491106]
[2.58843582 1.76068169 2.16794834 1.22474487]
[2.50998008 1.51657509 2.0976177 1.14017543]
[2.36643191 1.73205081 2.02484567 1.14017543]
[2.34520788 1.58113883 2. 1.14017543]
[2.34520788 1.61245155 2.0976177 1.09544512]
[2.46981781 1.73205081 2.14476106 1.18321596]
[2.40831892 1.61245155 2. 1.09544512]
[2.23606798 1.51657509 1.81659021 1.]
[2.36643191 1.64316767 2.04939015 1.14017543]
[2.38746728 1.73205081 2.04939015 1.09544512]
[2.38746728 1.70293864 2.04939015 1.14017543]
[2.48997992 1.70293864 2.07364414 1.14017543]
[2.25831796 1.58113883 1.73205081 1.04880885]
[2.38746728 1.67332005 2.02484567 1.14017543]]
Virginica:
[[2.50998008 1.81659021 2.44948974 1.58113883]
[2.40831892 1.64316767 2.25831796 1.37840488]
[2.66458252 1.73205081 2.42899156 1.44913767]
[2.50998008 1.70293864 2.36643191 1.34164079]
[2.54950976 1.73205081 2.40831892 1.4832397 ]
[2.75680975 1.73205081 2.56904652 1.44913767]
[2.21359436 1.58113883 2.12132034 1.30384048]
[2.70185122 1.70293864 2.50998008 1.34164079]
[2.58843582 1.58113883 2.40831892 1.34164079]
[2.68328157 1.8973666 2.46981781 1.58113883]
[2.54950976 1.78885438 2.25831796 1.41421356]
[2.52982213 1.64316767 2.30217289 1.37840488]
[2.60768096 1.73205081 2.34520788 1.44913767]
[2.38746728 1.58113883 2.23606798 1.41421356]
[2.40831892 1.67332005 2.25831796 1.54919334]
[2.52982213 1.78885438 2.30217289 1.51657509]
[2.54950976 1.73205081 2.34520788 1.34164079]
[2.77488739 1.94935887 2.58843582 1.4832397 ]
[2.77488739 1.61245155 2.62678511 1.51657509]
[2.44948974 1.4832397 2.23606798 1.22474487]
[2.62678511 1.78885438 2.38746728 1.51657509]
[2.36643191 1.67332005 2.21359436 1.41421356]
[2.77488739 1.67332005 2.58843582 1.41421356]
[2.50998008 1.64316767 2.21359436 1.34164079]
[2.58843582 1.81659021 2.38746728 1.44913767]
[2.68328157 1.78885438 2.44948974 1.34164079]
[2.48997992 1.67332005 2.19089023 1.34164079]
[2.46981781 1.73205081 2.21359436 1.34164079]
[2.52982213 1.67332005 2.36643191 1.44913767]
[2.68328157 1.73205081 2.40831892 1.26491106]
[2.7202941 1.67332005 2.46981781 1.37840488]
[2.81069386 1.94935887 2.52982213 1.41421356]
[2.52982213 1.67332005 2.36643191 1.4832397 ]
[2.50998008 1.67332005 2.25831796 1.22474487]
[2.46981781 1.61245155 2.36643191 1.18321596]
[2.77488739 1.73205081 2.46981781 1.51657509]
[2.50998008 1.84390889 2.36643191 1.54919334]
[2.52982213 1.76068169 2.34520788 1.34164079]
[2.44948974 1.73205081 2.19089023 1.34164079]
[2.62678511 1.76068169 2.32379001 1.44913767]
[2.58843582 1.76068169 2.36643191 1.54919334]
[2.62678511 1.76068169 2.25831796 1.51657509]
[2.40831892 1.64316767 2.25831796 1.37840488]
[2.60768096 1.78885438 2.42899156 1.51657509]
[2.58843582 1.81659021 2.38746728 1.58113883]
[2.58843582 1.73205081 2.28035085 1.51657509]
[2.50998008 1.58113883 2.23606798 1.37840488]
[2.54950976 1.73205081 2.28035085 1.41421356]
[2.48997992 1.84390889 2.32379001 1.51657509]
[2.42899156 1.73205081 2.25831796 1.34164079]]
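Unlike the other operations in this section, `np.sqrt` does not reduce a group to one value per feature; it acts on every element and returns an array of the same shape. A small check on the first two Setosa rows from the preview table illustrates this:

```python
import numpy as np

# First two Setosa rows from the preview table.
x = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2]])

roots = np.sqrt(x)          # element-wise, same shape as x
print(roots.shape)
print(roots)

# Squaring the result recovers the original measurements.
print(np.allclose(roots ** 2, x))
```

The first row of `roots` corresponds to the first row of the Setosa output above.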
Standard deviation:
# Standard deviation per group (NumPy default: population form, ddof=0)
print('Setosa: ',np.std(np.array(Setosa), axis=0))
print('Versicolour: ',np.std(np.array(Versicolour), axis=0))
print('Virginica: ',np.std(np.array(Virginica), axis=0))
Output:
Setosa:  [0.34894699 0.37719491 0.17176728 0.10613199]
Versicolour:  [0.51098337 0.31064449 0.46518813 0.19576517]
Virginica:  [0.62948868 0.31925538 0.54634787 0.27188968]
Variance:
# Variance per group (NumPy default: population form, ddof=0)
print('Setosa: ',np.var(np.array(Setosa), axis=0))
print('Versicolour: ',np.var(np.array(Versicolour), axis=0))
print('Virginica: ',np.var(np.array(Virginica), axis=0))
Output:
Setosa:  [0.121764 0.142276 0.029504 0.011264]
Versicolour:  [0.261104 0.0965   0.2164   0.038324]
Virginica:  [0.396256 0.101924 0.298496 0.073924]
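The variance values are the squares of the standard deviations listed earlier (for example, 0.34894699² ≈ 0.121764 for the Setosa sepal length). Both use NumPy's default `ddof=0`, which divides by n, whereas pandas defaults to `ddof=1`, which divides by n - 1. The sketch below shows the difference on the sepal lengths of the first five Setosa rows from the preview table; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sepal lengths of the first five Setosa rows in the preview table.
x = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

pop_std = np.std(x)             # ddof=0: divide by n
pop_var = np.var(x)             # equals pop_std ** 2
sample_var = np.var(x, ddof=1)  # divide by n - 1

print(pop_var, pop_std ** 2)
print(sample_var, pd.Series(x).var())   # pandas defaults to ddof=1
```

For 50 samples per class the two conventions differ by a factor of 50/49, so the choice matters little here, but it explains small mismatches between NumPy and pandas results.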
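Covariance:
# Note: np.cov receives the vector of the four feature means here, so each result
# is the sample variance of those four means (a scalar), not a 4x4 covariance
# matrix of the features.
print('Setosa: ',np.cov(np.mean(np.array(Setosa), axis=0)))
print('Versicolour: ',np.cov(np.mean(np.array(Versicolour), axis=0)))
print('Virginica: ',np.cov(np.mean(np.array(Virginica), axis=0)))
Output:
Setosa:  4.427078666666667
Versicolour:  3.916518666666666
Virginica:  4.576966666666664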
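If the goal is the covariance between the four features within a group, the samples themselves have to be passed to `np.cov`, with `rowvar=False` so that columns are treated as variables. This is a sketch of that alternative, not the computation used above, and it runs on only the first five Setosa rows from the preview table.

```python
import numpy as np

# First five Setosa rows from the preview table (rows = samples, columns = features).
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2],
              [4.6, 3.1, 1.5, 0.2],
              [5.0, 3.6, 1.4, 0.2]])

# rowvar=False tells np.cov that each column is a variable.
cov = np.cov(X, rowvar=False)
print(cov.shape)

# The diagonal holds the sample variances (ddof=1) of the four features.
print(np.allclose(np.diag(cov), np.var(X, axis=0, ddof=1)))
```

With the full data, `np.cov(np.array(Setosa), rowvar=False)` would give the corresponding 4x4 matrix for a whole group.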
Plotting
# Scatter plots: sepal (Calyx) and petal length vs. width, one figure per species
ax = Setosa.plot.scatter(x='slength', y='swidth', color='tab:blue', label='Calyx');
Setosa.plot.scatter(x='plength', y='pwidth', color='tab:orange', label='Petal', ax=ax);
plt.xlabel('$Length$');
plt.ylabel('$Width$');
plt.title('$Setosa$');
ax = Versicolour.plot.scatter(x='slength', y='swidth', color='tab:blue', label='Calyx');
Versicolour.plot.scatter(x='plength', y='pwidth', color='tab:orange', label='Petal', ax=ax);
plt.xlabel('$Length$');
plt.ylabel('$Width$');
plt.title('$Versicolour$');
ax = Virginica.plot.scatter(x='slength', y='swidth', color='tab:blue', label='Calyx');
Virginica.plot.scatter(x='plength', y='pwidth', color='tab:orange', label='Petal', ax=ax);
plt.xlabel('$Length$');
plt.ylabel('$Width$');
plt.title('$Virginica$');
Figure: scatter plots of length vs. width for Setosa, Versicolour, and Virginica (image not included)
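The code above draws one figure per species. To compare the species directly, a possible variation is to put all of them on one Axes and color the points by species. The following is a minimal sketch on four rows from the preview table; `scatter_by_species` is a hypothetical helper, not part of the original code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four rows from the preview table.
iris_sample = pd.DataFrame(
    {'plength': [1.4, 1.4, 5.0, 5.2],
     'pwidth': [0.2, 0.2, 1.9, 2.0],
     'name': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-virginica']}
)

def scatter_by_species(df, x, y):
    """Draw one scatter series per species on a shared Axes and return it."""
    fig, ax = plt.subplots()
    for species, group in df.groupby('name'):
        ax.scatter(group[x], group[y], label=species)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return ax

ax = scatter_by_species(iris_sample, 'plength', 'pwidth')
```

Calling `plt.show()` afterwards displays the figure; matplotlib assigns a different default color to each `scatter` call.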
# Bar chart of the mean of each feature, grouped by species
plt.bar([1,2,3,4],np.mean(np.array(Setosa), axis=0),label='Setosa');
plt.bar([8,9,10,11],np.mean(np.array(Versicolour), axis=0),label='Versicolour');
plt.bar([15,16,17,18],np.mean(np.array(Virginica), axis=0),label='Virginica');
plt.legend();
plt.xticks((1,2,3,4,8,9,10,11,15,16,17,18),('sl','sw','pl','pw','sl','sw','pl','pw','sl','sw','pl','pw'));
plt.title('The different kinds of mean in three kinds of flowers');
Figure: bar chart of the per-species feature means (image not included)
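The bar positions in the code above are hard-coded (`[1,2,3,4]`, `[8,9,10,11]`, `[15,16,17,18]`). As a sketch of an alternative layout, the positions can be computed with `np.arange` so that the three species sit side by side for each feature. The means are the values printed in the Mean section; the bar width of 0.25 is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

features = ['sl', 'sw', 'pl', 'pw']
# Group means as printed in the Mean section of this article.
means = {
    'Setosa': [5.006, 3.418, 1.464, 0.244],
    'Versicolour': [5.936, 2.77, 4.26, 1.326],
    'Virginica': [6.588, 2.974, 5.552, 2.026],
}

fig, ax = plt.subplots()
width = 0.25                         # bar width (an arbitrary choice)
base = np.arange(len(features))      # one slot per feature
for i, (species, values) in enumerate(means.items()):
    ax.bar(base + i * width, values, width=width, label=species)

ax.set_xticks(base + width)          # centre each tick under its three bars
ax.set_xticklabels(features)
ax.legend()
```

Grouping by feature rather than by species makes it easier to see, for instance, that petal length separates the three species more clearly than sepal width.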
# Box plots: one figure per species, one box per feature
groups = {'Setosa': Setosa, 'Versicolour': Versicolour, 'Virginica': Virginica}
# One fill color per feature; the original list had three colors, a fourth
# ('lightyellow') is added here so that every box is filled.
colors = ['pink', 'lightblue', 'lightgreen', 'lightyellow']
for title, group in groups.items():
    bplt = plt.boxplot(np.array(group), notch=False, sym='o', vert=True, patch_artist=True)
    for patch, color in zip(bplt['boxes'], colors):
        patch.set_facecolor(color)
    plt.xticks((1, 2, 3, 4), ('slength', 'swidth', 'plength', 'pwidth'))
    plt.title(title)
    plt.show()
Figure: box plots of the four features for each species (image not included)