Pattern Recognition | Dataset (Iris Flowers)


Contents

  • Dataset: Iris flowers
    • Dataset introduction
    • Dataset metadata
    • Terminology
    • Preprocessing
    • Data preview
    • Grouped statistics
    • Summary statistics
    • Plotting

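Dataset: Iris flowers
Follow-up analysis: [k-nearest neighbors method]
Dataset introduction
Source: UCI database, Iris
Overview: The original UCI Iris data. Iris flowers are classified by species using four measurements of their sepals and petals.
Description: UCI Iris is a widely used dataset for classification modeling. Four attributes (sepal length, sepal width, petal length, petal width) are used to predict which of three species (Setosa, Versicolour, Virginica) an iris flower belongs to.
Number of attributes: 5 (four measurements plus the class label)
Number of records: 150
Records without missing values: 150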
Dataset metadata
Name          Data type
Sepal length  float
Sepal width   float
Petal length  float
Petal width   float
Iris species  string
Terminology
Chinese  English
萼片     Calyx (sepal)
花瓣     Petal
长度     Length
宽度     Width
鸢花     Iris
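Preprocessing
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the file: one comma-separated record per line.
# All fields are still strings at this point, and a blank line
# at the end of the file becomes an empty record.
fname = 'iris.data'
with open(fname, 'r', encoding='utf-8') as f:
    s = [line.rstrip('\n').split(',') for line in f]

# Build a pandas DataFrame; each of the three classes has 50 samples
names = ['slength', 'swidth', 'plength', 'pwidth', 'name']
iris = pd.DataFrame(data=s, columns=names)
iris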
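As an alternative sketch, `pandas.read_csv` can parse the same comma-separated layout directly, which yields float columns and skips blank lines by default. The snippet below uses a small in-memory excerpt instead of `iris.data` so that it runs on its own; `SAMPLE`, `NAMES`, and `load_iris_csv` are illustrative names that do not appear in the original article.

```python
import io
import pandas as pd

NAMES = ['slength', 'swidth', 'plength', 'pwidth', 'name']

# A few records in the iris.data layout, followed by a blank line
# (illustrative excerpt only, not the full file).
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica

"""

def load_iris_csv(source):
    """Read an iris.data-style file (path or file-like object)."""
    # header=None: the file has no header row.
    # Blank lines are skipped because skip_blank_lines defaults to True.
    return pd.read_csv(source, header=None, names=NAMES)

iris_sample = load_iris_csv(io.StringIO(SAMPLE))
print(iris_sample.shape)
print(iris_sample.dtypes)
```

With the real file, `load_iris_csv('iris.data')` would be the equivalent call, assuming the file sits in the working directory.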

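Data preview
index  slength  swidth  plength  pwidth  name
0      5.1      3.5     1.4      0.2     Iris-setosa
1      4.9      3.0     1.4      0.2     Iris-setosa
2      4.7      3.2     1.3      0.2     Iris-setosa
3      4.6      3.1     1.5      0.2     Iris-setosa
4      5.0      3.6     1.4      0.2     Iris-setosa
...    ...      ...     ...      ...     ...
146    6.3      2.5     5.0      1.9     Iris-virginica
147    6.5      3.0     5.2      2.0     Iris-virginica
148    6.2      3.4     5.4      2.3     Iris-virginica
149    5.9      3.0     5.1      1.8     Iris-virginica
150    None     None    None     None    None
151 rows × 5 columns
Row 150 holds no measurements (it appears to come from a blank line at the end of the file), so the frame reports 151 rows although the dataset has 150 records.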
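One way to remove the empty last row and obtain numeric columns is sketched below. It rebuilds a three-row stand-in for the frame (two records from the preview plus an all-`None` row), so `rows`, `raw`, and `clean` are illustrative names rather than variables from the article.

```python
import pandas as pd

names = ['slength', 'swidth', 'plength', 'pwidth', 'name']
rows = [
    ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa'],
    ['5.9', '3.0', '5.1', '1.8', 'Iris-virginica'],
    [None, None, None, None, None],  # stand-in for the empty last record
]
raw = pd.DataFrame(data=rows, columns=names)

# Drop rows in which every column is missing, then convert the four
# measurement columns from strings to floats.
features = names[:4]
clean = raw.dropna(how='all').astype({col: float for col in features})

print(clean.shape)
print(clean.dtypes)
```

After this step, label-based operations such as `groupby('name')` work on the whole frame without positional slicing.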
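Grouped statistics
# The three classes, selected by row position (the file is ordered by class)
Setosa = iris.iloc[0:50, 0:4].astype('float')
Versicolour = iris.iloc[50:100, 0:4].astype('float')
Virginica = iris.iloc[100:150, 0:4].astype('float')

# Count how many samples each species has
iris['name'].value_counts()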

Overall sample counts
Iris-versicolor    50
Iris-virginica     50
Iris-setosa        50
Name: name, dtype: int64
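The positional slices above rely on the file being sorted by class. A minimal sketch of selecting each class by its label instead is shown here on the nine records from the data preview; `preview` and `groups` are illustrative names, and only two of the three species occur in that excerpt.

```python
import pandas as pd

# Nine records copied from the data preview (rows 0-4 and 146-149).
preview = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
        [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
        [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
        [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
        [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
        [6.3, 2.5, 5.0, 1.9, 'Iris-virginica'],
        [6.5, 3.0, 5.2, 2.0, 'Iris-virginica'],
        [6.2, 3.4, 5.4, 2.3, 'Iris-virginica'],
        [5.9, 3.0, 5.1, 1.8, 'Iris-virginica'],
    ],
    columns=['slength', 'swidth', 'plength', 'pwidth', 'name'],
)

# Split by label rather than by row position; keep only the measurements.
groups = {label: frame.drop(columns='name')
          for label, frame in preview.groupby('name')}

for label, frame in groups.items():
    print(label, frame.shape)
```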

Count how often each measured value occurs within each class
# In a notebook, evaluate each tuple in its own cell to display it.
# setosa
(Setosa.loc[:, 'slength'].value_counts(),
 Setosa.loc[:, 'swidth'].value_counts(),
 Setosa.loc[:, 'plength'].value_counts(),
 Setosa.loc[:, 'pwidth'].value_counts())

# versicolour
(Versicolour.loc[:, 'slength'].value_counts(),
 Versicolour.loc[:, 'swidth'].value_counts(),
 Versicolour.loc[:, 'plength'].value_counts(),
 Versicolour.loc[:, 'pwidth'].value_counts())

# virginica
(Virginica.loc[:, 'slength'].value_counts(),
 Virginica.loc[:, 'swidth'].value_counts(),
 Virginica.loc[:, 'plength'].value_counts(),
 Virginica.loc[:, 'pwidth'].value_counts())

Per-class value counts
Each entry reads value (count), in descending order of count; every column is an int64 count Series of 50 samples.

# Setosa:
slength: 5.0 (8), 5.1 (8), 4.8 (5), 5.4 (5), 4.9 (4), 4.6 (4), 5.2 (3), 4.4 (3), 4.7 (2), 5.7 (2), 5.5 (2), 4.5 (1), 5.8 (1), 4.3 (1), 5.3 (1)
swidth: 3.4 (9), 3.5 (6), 3.0 (6), 3.1 (5), 3.2 (5), 3.8 (4), 3.7 (3), 3.9 (2), 3.3 (2), 3.6 (2), 2.9 (1), 4.1 (1), 4.4 (1), 4.2 (1), 2.3 (1), 4.0 (1)
plength: 1.5 (14), 1.4 (12), 1.6 (7), 1.3 (7), 1.7 (4), 1.9 (2), 1.2 (2), 1.1 (1), 1.0 (1)
pwidth: 0.2 (28), 0.4 (7), 0.3 (7), 0.1 (6), 0.5 (1), 0.6 (1)

# Versicolour:
slength: 5.6 (5), 5.7 (5), 5.5 (5), 6.0 (4), 6.1 (4), 6.3 (3), 6.7 (3), 5.8 (3), 6.2 (2), 5.0 (2), 6.4 (2), 6.6 (2), 5.9 (2), 5.2 (1), 5.4 (1), 5.1 (1), 7.0 (1), 6.8 (1), 4.9 (1), 6.5 (1), 6.9 (1)
swidth: 3.0 (8), 2.9 (7), 2.8 (6), 2.7 (5), 2.5 (4), 2.6 (3), 2.4 (3), 3.1 (3), 2.3 (3), 3.2 (3), 2.2 (2), 2.0 (1), 3.3 (1), 3.4 (1)
plength: 4.5 (7), 4.7 (5), 4.0 (5), 4.4 (4), 4.2 (4), 3.9 (3), 4.1 (3), 4.6 (3), 3.5 (2), 4.9 (2), 4.8 (2), 4.3 (2), 3.3 (2), 3.8 (1), 3.7 (1), 5.1 (1), 3.6 (1), 5.0 (1), 3.0 (1)
pwidth: 1.3 (13), 1.5 (10), 1.0 (7), 1.4 (7), 1.2 (5), 1.6 (3), 1.1 (3), 1.8 (1), 1.7 (1)

# Virginica:
slength: 6.3 (6), 6.7 (5), 6.4 (5), 7.7 (4), 6.5 (4), 6.9 (3), 7.2 (3), 5.8 (3), 6.1 (2), 6.8 (2), 6.2 (2), 6.0 (2), 5.6 (1), 7.3 (1), 7.6 (1), 7.9 (1), 7.1 (1), 5.7 (1), 4.9 (1), 5.9 (1), 7.4 (1)
swidth: 3.0 (12), 2.8 (8), 3.2 (5), 2.5 (4), 3.1 (4), 2.7 (4), 3.3 (3), 2.6 (2), 3.8 (2), 2.9 (2), 3.4 (2), 2.2 (1), 3.6 (1)
plength: 5.1 (7), 5.6 (6), 5.8 (3), 4.9 (3), 6.1 (3), 5.7 (3), 5.0 (3), 5.5 (3), 4.8 (2), 5.4 (2), 5.3 (2), 6.0 (2), 5.9 (2), 5.2 (2), 6.7 (2), 6.6 (1), 6.4 (1), 4.5 (1), 6.3 (1), 6.9 (1)
pwidth: 1.8 (11), 2.3 (8), 2.0 (6), 2.1 (6), 1.9 (5), 2.4 (3), 2.5 (3), 2.2 (3), 1.5 (2), 1.6 (1), 1.7 (1), 1.4 (1)

Summary statistics
Sum
# Per-class column sums
print('Setosa: ', np.sum(np.array(Setosa), axis=0))
print('Versicolour: ', np.sum(np.array(Versicolour), axis=0))
print('Virginica: ', np.sum(np.array(Virginica), axis=0))

Setosa:  [250.3 170.9  73.2  12.2]
Versicolour:  [296.8 138.5 213.   66.3]
Virginica:  [329.4 148.7 277.6 101.3]

Mean
# Per-class column means
print('Setosa: ', np.mean(np.array(Setosa), axis=0))
print('Versicolour: ', np.mean(np.array(Versicolour), axis=0))
print('Virginica: ', np.mean(np.array(Virginica), axis=0))

Setosa:  [5.006 3.418 1.464 0.244]
Versicolour:  [5.936 2.77  4.26  1.326]
Virginica:  [6.588 2.974 5.552 2.026]

Maximum
# Per-class column maxima
print('Setosa: ', np.amax(np.array(Setosa), axis=0))
print('Versicolour: ', np.amax(np.array(Versicolour), axis=0))
print('Virginica: ', np.amax(np.array(Virginica), axis=0))

Setosa:  [5.8 4.4 1.9 0.6]
Versicolour:  [7.  3.4 5.1 1.8]
Virginica:  [7.9 3.8 6.9 2.5]

Minimum
# Per-class column minima
print('Setosa: ', np.amin(np.array(Setosa), axis=0))
print('Versicolour: ', np.amin(np.array(Versicolour), axis=0))
print('Virginica: ', np.amin(np.array(Virginica), axis=0))

Setosa:  [4.3 2.3 1.  0.1]
Versicolour:  [4.9 2.  3.  1. ]
Virginica:  [4.9 2.2 4.5 1.4]
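The four statistics above (sum, mean, maximum, minimum) can also be obtained in a single call with a pandas `groupby`. The sketch below runs on the nine records from the data preview rather than the full file, so its numbers differ from the 50-sample results shown above; `preview` and `summary` are illustrative names.

```python
import pandas as pd

# Nine records copied from the data preview (rows 0-4 and 146-149).
preview = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
        [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
        [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
        [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
        [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
        [6.3, 2.5, 5.0, 1.9, 'Iris-virginica'],
        [6.5, 3.0, 5.2, 2.0, 'Iris-virginica'],
        [6.2, 3.4, 5.4, 2.3, 'Iris-virginica'],
        [5.9, 3.0, 5.1, 1.8, 'Iris-virginica'],
    ],
    columns=['slength', 'swidth', 'plength', 'pwidth', 'name'],
)

# One row per class; columns form a (feature, statistic) MultiIndex.
summary = preview.groupby('name').agg(['sum', 'mean', 'max', 'min'])
print(summary)

# Look up a single value, e.g. the mean sepal length of the setosa rows.
print(summary.loc['Iris-setosa', ('slength', 'mean')])
```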

Square root
# Element-wise square root of every measurement in each class
print('Setosa: ')
print(np.sqrt(np.array(Setosa)))
print('Versicolour: ')
print(np.sqrt(np.array(Versicolour)))
print('Virginica: ')
print(np.sqrt(np.array(Virginica)))

Setosa: 
[[2.25831796 1.87082869 1.18321596 0.4472136 ]
 [2.21359436 1.73205081 1.18321596 0.4472136 ]
 [2.16794834 1.78885438 1.14017543 0.4472136 ]
 [2.14476106 1.76068169 1.22474487 0.4472136 ]
 [2.23606798 1.8973666  1.18321596 0.4472136 ]
 [2.32379001 1.97484177 1.30384048 0.63245553]
 [2.14476106 1.84390889 1.18321596 0.54772256]
 [2.23606798 1.84390889 1.22474487 0.4472136 ]
 [2.0976177  1.70293864 1.18321596 0.4472136 ]
 [2.21359436 1.76068169 1.22474487 0.31622777]
 [2.32379001 1.92353841 1.22474487 0.4472136 ]
 [2.19089023 1.84390889 1.26491106 0.4472136 ]
 [2.19089023 1.73205081 1.18321596 0.31622777]
 [2.07364414 1.73205081 1.04880885 0.31622777]
 [2.40831892 2.         1.09544512 0.4472136 ]
 [2.38746728 2.0976177  1.22474487 0.63245553]
 [2.32379001 1.97484177 1.14017543 0.63245553]
 [2.25831796 1.87082869 1.18321596 0.54772256]
 [2.38746728 1.94935887 1.30384048 0.54772256]
 [2.25831796 1.94935887 1.22474487 0.54772256]
 [2.32379001 1.84390889 1.30384048 0.4472136 ]
 [2.25831796 1.92353841 1.22474487 0.63245553]
 [2.14476106 1.8973666  1.         0.4472136 ]
 [2.25831796 1.81659021 1.30384048 0.70710678]
 [2.19089023 1.84390889 1.37840488 0.4472136 ]
 [2.23606798 1.73205081 1.26491106 0.4472136 ]
 [2.23606798 1.84390889 1.26491106 0.63245553]
 [2.28035085 1.87082869 1.22474487 0.4472136 ]
 [2.28035085 1.84390889 1.18321596 0.4472136 ]
 [2.16794834 1.78885438 1.26491106 0.4472136 ]
 [2.19089023 1.76068169 1.26491106 0.4472136 ]
 [2.32379001 1.84390889 1.22474487 0.63245553]
 [2.28035085 2.02484567 1.22474487 0.31622777]
 [2.34520788 2.04939015 1.18321596 0.4472136 ]
 [2.21359436 1.76068169 1.22474487 0.31622777]
 [2.23606798 1.78885438 1.09544512 0.4472136 ]
 [2.34520788 1.87082869 1.14017543 0.4472136 ]
 [2.21359436 1.76068169 1.22474487 0.31622777]
 [2.0976177  1.73205081 1.14017543 0.4472136 ]
 [2.25831796 1.84390889 1.22474487 0.4472136 ]
 [2.23606798 1.87082869 1.14017543 0.54772256]
 [2.12132034 1.51657509 1.14017543 0.54772256]
 [2.0976177  1.78885438 1.14017543 0.4472136 ]
 [2.23606798 1.87082869 1.26491106 0.77459667]
 [2.25831796 1.94935887 1.37840488 0.63245553]
 [2.19089023 1.73205081 1.18321596 0.54772256]
 [2.25831796 1.94935887 1.26491106 0.4472136 ]
 [2.14476106 1.78885438 1.18321596 0.4472136 ]
 [2.30217289 1.92353841 1.22474487 0.4472136 ]
 [2.23606798 1.81659021 1.18321596 0.4472136 ]]
Versicolour: 
[[2.64575131 1.78885438 2.16794834 1.18321596]
 [2.52982213 1.78885438 2.12132034 1.22474487]
 [2.62678511 1.76068169 2.21359436 1.22474487]
 [2.34520788 1.51657509 2.         1.14017543]
 [2.54950976 1.67332005 2.14476106 1.22474487]
 [2.38746728 1.67332005 2.12132034 1.14017543]
 [2.50998008 1.81659021 2.16794834 1.26491106]
 [2.21359436 1.54919334 1.81659021 1.        ]
 [2.56904652 1.70293864 2.14476106 1.14017543]
 [2.28035085 1.64316767 1.97484177 1.18321596]
 [2.23606798 1.41421356 1.87082869 1.        ]
 [2.42899156 1.73205081 2.04939015 1.22474487]
 [2.44948974 1.4832397  2.         1.        ]
 [2.46981781 1.70293864 2.16794834 1.18321596]
 [2.36643191 1.70293864 1.8973666  1.14017543]
 [2.58843582 1.76068169 2.0976177  1.18321596]
 [2.36643191 1.73205081 2.12132034 1.22474487]
 [2.40831892 1.64316767 2.02484567 1.        ]
 [2.48997992 1.4832397  2.12132034 1.22474487]
 [2.36643191 1.58113883 1.97484177 1.04880885]
 [2.42899156 1.78885438 2.19089023 1.34164079]
 [2.46981781 1.67332005 2.         1.14017543]
 [2.50998008 1.58113883 2.21359436 1.22474487]
 [2.46981781 1.67332005 2.16794834 1.09544512]
 [2.52982213 1.70293864 2.07364414 1.14017543]
 [2.56904652 1.73205081 2.0976177  1.18321596]
 [2.60768096 1.67332005 2.19089023 1.18321596]
 [2.58843582 1.73205081 2.23606798 1.30384048]
 [2.44948974 1.70293864 2.12132034 1.22474487]
 [2.38746728 1.61245155 1.87082869 1.        ]
 [2.34520788 1.54919334 1.94935887 1.04880885]
 [2.34520788 1.54919334 1.92353841 1.        ]
 [2.40831892 1.64316767 1.97484177 1.09544512]
 [2.44948974 1.64316767 2.25831796 1.26491106]
 [2.32379001 1.73205081 2.12132034 1.22474487]
 [2.44948974 1.84390889 2.12132034 1.26491106]
 [2.58843582 1.76068169 2.16794834 1.22474487]
 [2.50998008 1.51657509 2.0976177  1.14017543]
 [2.36643191 1.73205081 2.02484567 1.14017543]
 [2.34520788 1.58113883 2.         1.14017543]
 [2.34520788 1.61245155 2.0976177  1.09544512]
 [2.46981781 1.73205081 2.14476106 1.18321596]
 [2.40831892 1.61245155 2.         1.09544512]
 [2.23606798 1.51657509 1.81659021 1.        ]
 [2.36643191 1.64316767 2.04939015 1.14017543]
 [2.38746728 1.73205081 2.04939015 1.09544512]
 [2.38746728 1.70293864 2.04939015 1.14017543]
 [2.48997992 1.70293864 2.07364414 1.14017543]
 [2.25831796 1.58113883 1.73205081 1.04880885]
 [2.38746728 1.67332005 2.02484567 1.14017543]]
Virginica: 
[[2.50998008 1.81659021 2.44948974 1.58113883]
 [2.40831892 1.64316767 2.25831796 1.37840488]
 [2.66458252 1.73205081 2.42899156 1.44913767]
 [2.50998008 1.70293864 2.36643191 1.34164079]
 [2.54950976 1.73205081 2.40831892 1.4832397 ]
 [2.75680975 1.73205081 2.56904652 1.44913767]
 [2.21359436 1.58113883 2.12132034 1.30384048]
 [2.70185122 1.70293864 2.50998008 1.34164079]
 [2.58843582 1.58113883 2.40831892 1.34164079]
 [2.68328157 1.8973666  2.46981781 1.58113883]
 [2.54950976 1.78885438 2.25831796 1.41421356]
 [2.52982213 1.64316767 2.30217289 1.37840488]
 [2.60768096 1.73205081 2.34520788 1.44913767]
 [2.38746728 1.58113883 2.23606798 1.41421356]
 [2.40831892 1.67332005 2.25831796 1.54919334]
 [2.52982213 1.78885438 2.30217289 1.51657509]
 [2.54950976 1.73205081 2.34520788 1.34164079]
 [2.77488739 1.94935887 2.58843582 1.4832397 ]
 [2.77488739 1.61245155 2.62678511 1.51657509]
 [2.44948974 1.4832397  2.23606798 1.22474487]
 [2.62678511 1.78885438 2.38746728 1.51657509]
 [2.36643191 1.67332005 2.21359436 1.41421356]
 [2.77488739 1.67332005 2.58843582 1.41421356]
 [2.50998008 1.64316767 2.21359436 1.34164079]
 [2.58843582 1.81659021 2.38746728 1.44913767]
 [2.68328157 1.78885438 2.44948974 1.34164079]
 [2.48997992 1.67332005 2.19089023 1.34164079]
 [2.46981781 1.73205081 2.21359436 1.34164079]
 [2.52982213 1.67332005 2.36643191 1.44913767]
 [2.68328157 1.73205081 2.40831892 1.26491106]
 [2.7202941  1.67332005 2.46981781 1.37840488]
 [2.81069386 1.94935887 2.52982213 1.41421356]
 [2.52982213 1.67332005 2.36643191 1.4832397 ]
 [2.50998008 1.67332005 2.25831796 1.22474487]
 [2.46981781 1.61245155 2.36643191 1.18321596]
 [2.77488739 1.73205081 2.46981781 1.51657509]
 [2.50998008 1.84390889 2.36643191 1.54919334]
 [2.52982213 1.76068169 2.34520788 1.34164079]
 [2.44948974 1.73205081 2.19089023 1.34164079]
 [2.62678511 1.76068169 2.32379001 1.44913767]
 [2.58843582 1.76068169 2.36643191 1.54919334]
 [2.62678511 1.76068169 2.25831796 1.51657509]
 [2.40831892 1.64316767 2.25831796 1.37840488]
 [2.60768096 1.78885438 2.42899156 1.51657509]
 [2.58843582 1.81659021 2.38746728 1.58113883]
 [2.58843582 1.73205081 2.28035085 1.51657509]
 [2.50998008 1.58113883 2.23606798 1.37840488]
 [2.54950976 1.73205081 2.28035085 1.41421356]
 [2.48997992 1.84390889 2.32379001 1.51657509]
 [2.42899156 1.73205081 2.25831796 1.34164079]]

Standard deviation
# Per-class column standard deviations (np.std defaults to ddof=0, the population form)
print('Setosa: ', np.std(np.array(Setosa), axis=0))
print('Versicolour: ', np.std(np.array(Versicolour), axis=0))
print('Virginica: ', np.std(np.array(Virginica), axis=0))

Setosa:  [0.34894699 0.37719491 0.17176728 0.10613199]
Versicolour:  [0.51098337 0.31064449 0.46518813 0.19576517]
Virginica:  [0.62948868 0.31925538 0.54634787 0.27188968]

Variance
# Per-class column variances (np.var defaults to ddof=0, the population form)
print('Setosa: ', np.var(np.array(Setosa), axis=0))
print('Versicolour: ', np.var(np.array(Versicolour), axis=0))
print('Virginica: ', np.var(np.array(Virginica), axis=0))

Setosa:  [0.121764 0.142276 0.029504 0.011264]
Versicolour:  [0.261104 0.0965   0.2164   0.038324]
Virginica:  [0.396256 0.101924 0.298496 0.073924]
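The standard deviations and variances above use NumPy's default `ddof=0`, which divides by n (the population form). pandas defaults to `ddof=1`, which divides by n - 1 (the sample form), so `Setosa.var()` would return values larger by a factor of 50/49. The sketch below illustrates the difference on the five setosa sepal lengths from the data preview; `x`, `pop_var`, `sample_var`, and `pandas_var` are illustrative names.

```python
import numpy as np
import pandas as pd

# Setosa sepal lengths from rows 0-4 of the data preview.
x = np.array([5.1, 4.9, 4.7, 4.6, 5.0])
n = len(x)

pop_var = np.var(x)                # ddof=0: divide by n (NumPy default)
sample_var = np.var(x, ddof=1)     # ddof=1: divide by n - 1
pandas_var = pd.Series(x).var()    # pandas defaults to ddof=1

print(pop_var, sample_var, pandas_var)

# The two forms differ only by the factor n / (n - 1).
print(np.isclose(sample_var, pop_var * n / (n - 1)))
```

Passing `ddof=1` to `np.std` and `np.var` makes the NumPy results match the pandas defaults.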

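Covariance
# Note: np.cov receives the 1-D vector of the four feature means here,
# so each call returns a single number (the sample variance of those
# four means), not a 4x4 covariance matrix between the features.
print('Setosa: ', np.cov(np.mean(np.array(Setosa), axis=0)))
print('Versicolour: ', np.cov(np.mean(np.array(Versicolour), axis=0)))
print('Virginica: ', np.cov(np.mean(np.array(Virginica), axis=0)))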

Setosa:  4.427078666666667
Versicolour:  3.916518666666666
Virginica:  4.576966666666664
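Each value above is a single number per class because `np.cov` is given a 1-D vector of feature means. If the covariance between the four features is wanted instead, the observations themselves can be passed with `rowvar=False`. The following is a minimal sketch on the five setosa rows from the data preview, not the article's 50-sample computation; `setosa_rows` and `cov` are illustrative names.

```python
import numpy as np

# Setosa measurements from rows 0-4 of the data preview
# (columns: slength, swidth, plength, pwidth).
setosa_rows = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

# rowvar=False: each column is a variable, each row an observation.
# np.cov normalizes by n - 1 by default.
cov = np.cov(setosa_rows, rowvar=False)

print(cov.shape)         # 4 features give a 4x4 matrix
print(np.round(cov, 4))
```

The diagonal of `cov` holds the sample variance of each feature, and the off-diagonal entries hold the pairwise covariances.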

Plotting
# Scatter plots: calyx (sepal) and petal length against width, one figure per species
ax = Setosa.plot.scatter(x='slength', y='swidth', color='tab:blue', label='Calyx')
Setosa.plot.scatter(x='plength', y='pwidth', color='tab:orange', label='Petal', ax=ax)
plt.xlabel('$Length$')
plt.ylabel('$Width$')
plt.title('$Setosa$')

ax = Versicolour.plot.scatter(x='slength', y='swidth', color='tab:blue', label='Calyx')
Versicolour.plot.scatter(x='plength', y='pwidth', color='tab:orange', label='Petal', ax=ax)
plt.xlabel('$Length$')
plt.ylabel('$Width$')
plt.title('$Versicolour$')

ax = Virginica.plot.scatter(x='slength', y='swidth', color='tab:blue', label='Calyx')
Virginica.plot.scatter(x='plength', y='pwidth', color='tab:orange', label='Petal', ax=ax)
plt.xlabel('$Length$')
plt.ylabel('$Width$')
plt.title('$Virginica$')

Figure: scatter plots of calyx and petal length against width for each species.
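The same scatter plots can be placed side by side on shared axes, which makes the species easier to compare. This is a sketch under a few assumptions: it uses the nine records from the data preview (so only two species appear), it selects the non-interactive `Agg` backend so that it runs without a display, and `preview`, `labels`, `fig`, and `axes` are illustrative names.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Nine records copied from the data preview (rows 0-4 and 146-149).
preview = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
        [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
        [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
        [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
        [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
        [6.3, 2.5, 5.0, 1.9, 'Iris-virginica'],
        [6.5, 3.0, 5.2, 2.0, 'Iris-virginica'],
        [6.2, 3.4, 5.4, 2.3, 'Iris-virginica'],
        [5.9, 3.0, 5.1, 1.8, 'Iris-virginica'],
    ],
    columns=['slength', 'swidth', 'plength', 'pwidth', 'name'],
)

labels = sorted(preview['name'].unique())

# One panel per species; shared axes keep the scales comparable.
fig, axes = plt.subplots(1, len(labels), figsize=(5 * len(labels), 4),
                         sharex=True, sharey=True, squeeze=False)
for ax, label in zip(axes[0], labels):
    group = preview[preview['name'] == label]
    ax.scatter(group['slength'], group['swidth'], color='tab:blue', label='Calyx')
    ax.scatter(group['plength'], group['pwidth'], color='tab:orange', label='Petal')
    ax.set_xlabel('Length')
    ax.set_ylabel('Width')
    ax.set_title(label)
    ax.legend()
fig.tight_layout()
```

Calling `plt.show()` in a notebook, or `fig.savefig(...)` in a script, displays or stores the result.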

# Bar chart of the per-class feature means
plt.bar([1, 2, 3, 4], np.mean(np.array(Setosa), axis=0), label='Setosa')
plt.bar([8, 9, 10, 11], np.mean(np.array(Versicolour), axis=0), label='Versicolour')
plt.bar([15, 16, 17, 18], np.mean(np.array(Virginica), axis=0), label='Virginica')
plt.legend()
plt.xticks((1, 2, 3, 4, 8, 9, 10, 11, 15, 16, 17, 18),
           ('sl', 'sw', 'pl', 'pw', 'sl', 'sw', 'pl', 'pw', 'sl', 'sw', 'pl', 'pw'))
plt.title('Feature means for the three species')

Figure: bar chart of the per-class feature means.

# Box plots: one figure per species, one box per feature
colors = ['pink', 'lightblue', 'lightgreen', 'lightyellow']  # one color per box
for title, group in [('Setosa', Setosa),
                     ('Versicolour', Versicolour),
                     ('Virginica', Virginica)]:
    bplt = plt.boxplot(np.array(group), sym='o', patch_artist=True)
    for patch, color in zip(bplt['boxes'], colors):
        patch.set_facecolor(color)
    plt.xticks((1, 2, 3, 4), ('slength', 'swidth', 'plength', 'pwidth'))
    plt.title(title)
    plt.show()

Figure: box plots of the four features for each species.
