python | Machine Learning: KNN (k-nearest neighbors, NumPy version), with a handwritten digit (0-9) dataset and test

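KNN works in a simple way. It has no real training process: in the training phase it only stores the samples. Given a test input, it computes the distance to every stored sample using the chosen distance metric, takes the k samples with the smallest distances, and predicts the class that occurs most often among them.
The prediction depends on both the distance metric and the choice of k.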
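To make the dependence on k and on the distance metric concrete, here is a minimal sketch. The function `knn_predict`, the `metric` parameter, and both toy datasets are made up for this illustration and are not part of the original code.

```python
import numpy as np
from collections import Counter


def knn_predict(x, samples, labels, k, metric="euclidean"):
    """Predict the label of x by majority vote among its k nearest samples."""
    diff = np.asarray(samples, dtype=float) - np.asarray(x, dtype=float)
    if metric == "euclidean":
        distances = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "manhattan":
        distances = np.abs(diff).sum(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one 'A' point close to the query,
# three 'B' points a little farther away.
samples = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.2], [1.2, 1.0]])
labels = ["A", "B", "B", "B"]
query = [0.4, 0.4]

pred_k1 = knn_predict(query, samples, labels, k=1)  # only the closest point votes
pred_k3 = knn_predict(query, samples, labels, k=3)  # the two 'B' points outvote 'A'

# Hypothetical data where the two metrics disagree about the nearest point:
# (3, 3) is closer in Euclidean distance, (0, 5) in Manhattan distance.
samples2 = np.array([[3.0, 3.0], [0.0, 5.0]])
labels2 = ["A", "B"]
pred_euclid = knn_predict([0.0, 0.0], samples2, labels2, k=1, metric="euclidean")
pred_manhattan = knn_predict([0.0, 0.0], samples2, labels2, k=1, metric="manhattan")

print(pred_k1, pred_k3, pred_euclid, pred_manhattan)
```

In this sketch the same query is labeled 'A' with k=1 and 'B' with k=3, and switching from Euclidean to Manhattan distance flips the second prediction, so both choices are worth checking against held-out data.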

"""KNN from 机器学习实战 19 页""" import numpy as npdef create_dataset(): group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]) labels = ['A', 'A', 'B', 'B'] return group, labelsdef classify(inx, data_set, labels, k): data_set_size = data_set.shape[0] diff_mat = np.tile(inx, (data_set_size, 1)) - data_set # np.tile(a,(2,1))第一个参数为Y轴扩大倍数,第二个为X轴扩大倍数.变为与数据集维度相同 sqdiff_mat = diff_mat ** 2 sqdistances = sqdiff_mat.sum(axis=1) distances = sqdistances ** 0.5 # 以上求出了目标值,与样本各点的欧氏距离 sorted_dis = distances.argsort() class_count = {} for i in range(k): temp_label = labels[sorted_dis[i]] class_count[temp_label] = class_count.get(temp_label, 0) + 1out = max(zip(class_count.keys(), class_count.values()))return out[0]data, label = create_dataset() result = classify([0.5, 0.5], data, label, 3) print("分类结果为:", result)

Handwritten digit recognition test:
Dataset: https://download.csdn.net/download/TYtangyan/12680063
import numpy as np
import os
from KNN import classify


def img2vector(file_name):
    # Read a 32x32 text image and flatten it into a list of 1024 characters.
    data = []
    with open(file_name) as fr:
        line_data = fr.readlines()
    for i in range(32):
        for j in range(32):
            data.append(line_data[i][j:j + 1])
    return data


def get_data_all(data_path):
    filename = os.listdir(data_path)
    data_all = []
    label_all = []
    for name in filename:
        data = img2vector(os.path.join(data_path, name))
        data_all.append(data)
        # The label is the part of the file name before the first underscore.
        label = name.split("_")[0]
        label_all.append(label)
    return np.array(data_all).astype(float), np.array(label_all)


data_path = r'E:\CODE\MLinaction\knn\trainingDigits'
train_data, train_label = get_data_all(data_path)
test_path = r'E:\CODE\MLinaction\knn\testDigits'
test_data, test_label = get_data_all(test_path)
right_um = 0
for i in range(len(test_data)):
    pre = classify(test_data[i], train_data, train_label, 3)
    print('Label:', test_label[i], "Prediction:", pre)
    if int(test_label[i]) == int(pre):
        right_um += 1
print("Accuracy:", right_um/len(test_data))
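The loading step can be tried without downloading the dataset. The sketch below assumes, based on how img2vector and get_data_all read the data, that each file is a 32x32 grid of '0'/'1' characters and that the label is the part of the file name before the first underscore. The helpers `load_digit` and `label_from_name`, the file name `3_0.txt`, and the synthetic image are hypothetical and used only for illustration.

```python
import os
import tempfile

import numpy as np


def load_digit(path):
    """Read a 32x32 text image of '0'/'1' characters into a 1024-vector."""
    with open(path) as fr:
        rows = [line.rstrip("\n")[:32] for line in fr][:32]
    return np.array([int(ch) for row in rows for ch in row], dtype=float)


def label_from_name(file_name):
    """Return the text before the first underscore, e.g. '3_0.txt' -> '3'."""
    return file_name.split("_")[0]


# Synthetic stand-in for one dataset file: a 32x32 grid with a vertical bar
# two pixels wide (not real data).
with tempfile.TemporaryDirectory() as tmp_dir:
    name = "3_0.txt"  # hypothetical file name
    rows = ["0" * 15 + "11" + "0" * 15 for _ in range(32)]
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write("\n".join(rows) + "\n")
    vector = load_digit(os.path.join(tmp_dir, name))
    label = label_from_name(name)

print(vector.shape, int(vector.sum()), label)
```

Each image becomes one row of 1024 values, so the training set is a matrix with one row per file, which is the shape classify expects for data_set.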

Prediction results:
[Article image]
