[AI + Big Data | Feature Engineering: Feature Preprocessing (Nondimensionalization)]
Contents
- 1. A Rough Explanation
- 2. Normalization
- 3. Standardization ★
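1. A Rough Explanation
Feature preprocessing API:
sklearn.preprocessing
Why normalize or standardize?
Nondimensionalization (removing the effect of units and scale)
When features differ greatly in unit or magnitude, a feature with large values can dominate the final result, so that some algorithms fail to learn from the other features.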
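To make the dominance effect concrete, here is a minimal sketch using only the standard library. The three samples, the assumed feature ranges, and the helper names `min_max` and `rescale` are illustrative assumptions, not part of the original article. It compares Euclidean distances on raw values with distances after each feature is mapped to [0, 1].

```python
import math

# Hypothetical samples: (height in cm, chest measurement on a 0-1 scale).
a = (180.0, 0.90)
b = (181.0, 0.50)  # nearly the same height as a, very different chest value
c = (186.0, 0.90)  # same chest value as a, a bit taller

# On raw values the height column decides almost everything.
raw_ab = math.dist(a, b)
raw_ac = math.dist(a, c)

# Assumed feature ranges, used only for this illustration.
HEIGHT_RANGE = (156.0, 196.0)
CHEST_RANGE = (0.46, 1.00)


def min_max(value, lo, hi):
    """Map value from [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


def rescale(sample):
    """Rescale both features of one sample to [0, 1]."""
    height, chest = sample
    return (min_max(height, *HEIGHT_RANGE), min_max(chest, *CHEST_RANGE))


scaled_ab = math.dist(rescale(a), rescale(b))
scaled_ac = math.dist(rescale(a), rescale(c))

print("raw:    d(a,b) =", raw_ab, " d(a,c) =", raw_ac)
print("scaled: d(a,b) =", scaled_ab, " d(a,c) =", scaled_ac)
```

On the raw values b looks closer to a than c does, because the large chest difference is numerically tiny next to a few centimetres of height. After rescaling, the ordering flips and the chest difference becomes visible.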
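2. Normalization
Normalization transforms the original data so that each feature is mapped into the interval [0, 1] (the default range).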
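Formula (min-max scaling, as implemented by MinMaxScaler):
$$X' = \frac{x - \min}{\max - \min}, \qquad X'' = X' \cdot (mx - mi) + mi$$
Here min and max are the minimum and maximum of the column, and mx and mi are the upper and lower bounds of the target interval (1 and 0 by default).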
We can use MinMaxScaler(feature_range=(0, 1)) from sklearn to apply this transformation.
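As a rough check of what MinMaxScaler computes, the sketch below writes the min-max mapping out by hand in NumPy and compares it with the scaler, once for the default range and once for a custom `feature_range`. The helper `min_max_manual` is a hypothetical name introduced for illustration; the three rows are the first rows of the sample data used in this article.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First three rows of the sample data: height, weight, chest measurement.
X = np.array([
    [180.0, 70.0, 0.88877],
    [190.0, 80.0, 0.99665],
    [168.0, 60.0, 0.65878],
])


def min_max_manual(X, feature_range=(0.0, 1.0)):
    """Column-wise min-max scaling (illustrative helper).

    Assumes no column is constant; a constant column would divide by zero here.
    """
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_unit = (X - col_min) / (col_max - col_min)  # each column now in [0, 1]
    return X_unit * (hi - lo) + lo                # stretch/shift to [lo, hi]


manual_01 = min_max_manual(X)
sklearn_01 = MinMaxScaler().fit_transform(X)

manual_sym = min_max_manual(X, feature_range=(-1.0, 1.0))
sklearn_sym = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print(np.allclose(manual_01, sklearn_01))
print(np.allclose(manual_sym, sklearn_sym))
```

Both comparisons should agree, which suggests the scaler is doing nothing more than this per-column arithmetic.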
Example:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def minmax_demo():
    """
    Normalization
    :return: None
    """
    # 1. Load the data
    data = pd.read_csv('test00.csv')
    # Keep only the first three columns
    data = data.iloc[:, :3]
    print("data:\n", data)
    # 2. Instantiate a transformer
    transfer = MinMaxScaler()
    # 3. Call fit_transform()
    data_new = transfer.fit_transform(data)
    print("data_new:\n", data_new)
    return None


if __name__ == '__main__':
    minmax_demo()
All transformed values fall within the interval [0, 1]:
data:
    height  weight  chest measurement
0      180      70            0.88877
1      190      80            0.99665
2      168      60            0.65878
3      159      65            0.65598
4      169      56            0.55658
5      173      60            0.46058
6      186      76            0.69978
7      178      60            0.64979
8      175      75            0.89895
9      176      60            0.88488
10     177      90            0.79595
11     168     100            0.48789
12     158     102            0.55646
13     168      60            0.69585
14     179      80            0.65785
15     183      70            0.69578
16     190      66            0.89586
17     196      88            0.96527
18     187      91            0.62488
19     182      90            0.58484
20     158      70            0.58947
21     159      55            0.58484
22     166      55            0.59896
23     178      54            0.48487
24     163      69            0.68745
25     156      55            0.52621
26     189      89            0.66959
27     156      56            0.59595
28     189      98            0.59716
29     169      66            0.65479
30     179      55            0.99598
31     177      68            0.55257
32     166      76            0.69784
33     169      86            0.68745
34     189      89            0.69988
35     188      68            0.78955
36     176      59            0.55999
37     177      60            0.68747
38     196      80            0.64888
data_new:
[[0.6        0.33333333 0.79875762]
 [0.85       0.54166667 1.        ]
 [0.3        0.125      0.36972783]
 [0.075      0.22916667 0.36450464]
 [0.325      0.04166667 0.17908109]
 [0.425      0.125      0.        ]
 [0.75       0.45833333 0.44621038]
 [0.55       0.125      0.35295764]
 [0.475      0.4375     0.81774768]
 [0.5        0.125      0.79150111]
 [0.525      0.75       0.6256086 ]
 [0.3        0.95833333 0.05094484]
 [0.05       1.         0.17885724]
 [0.3        0.125      0.43887925]
 [0.575      0.54166667 0.36799299]
 [0.675      0.33333333 0.43874867]
 [0.85       0.25       0.81198351]
 [1.         0.70833333 0.94146287]
 [0.775      0.77083333 0.30648982]
 [0.65       0.75       0.23179809]
 [0.05       0.33333333 0.24043502]
 [0.075      0.02083333 0.23179809]
 [0.25       0.02083333 0.25813793]
 [0.55       0.         0.04531125]
 [0.175      0.3125     0.42320966]
 [0.         0.02083333 0.12242804]
 [0.825      0.72916667 0.38989311]
 [0.         0.04166667 0.25252299]
 [0.825      0.91666667 0.25478016]
 [0.325      0.25       0.36228478]
 [0.575      0.02083333 0.99875016]
 [0.525      0.29166667 0.17160072]
 [0.25       0.45833333 0.44259145]
 [0.325      0.66666667 0.42320966]
 [0.825      0.72916667 0.44639693]
 [0.8        0.29166667 0.61366986]
 [0.5        0.10416667 0.1854422 ]
 [0.525      0.125      0.42324696]
 [1.         0.54166667 0.3512601 ]]
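Drawback of normalization: the result depends entirely on the maximum and minimum, so if either of them is an outlier the scaled values are strongly distorted.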
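A small sketch of this sensitivity, using only plain Python. The five heights come from the first rows of the sample data; the extra value 1800 is a hypothetical bad record (for instance a typo for 180) added purely for illustration, and `min_max` is an illustrative helper.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1] with the min-max formula."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Heights from the first five rows of the sample data.
heights = [180, 190, 168, 159, 169]

# The same heights plus one hypothetical bad record.
heights_with_outlier = heights + [1800]

clean = min_max(heights)
# Keep only the scaled values of the five genuine records.
dirty = min_max(heights_with_outlier)[:len(heights)]

print([round(v, 3) for v in clean])
print([round(v, 3) for v in dirty])
```

Without the bad record the five heights spread over the whole [0, 1] interval. With it, the same five heights are squeezed into a band below 0.02, so the differences between them almost disappear.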
3. Standardization ★
Standardization transforms the original data so that each feature has a mean of 0 and a standard deviation of 1.
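Formula:
$$X' = \frac{x - \text{mean}}{\sigma}$$
Here mean is the mean of the column and σ is its standard deviation.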
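The sketch below standardizes a few rows by hand and compares the result with StandardScaler. It assumes the population standard deviation (`ddof=0`, NumPy's default), which is the estimator StandardScaler uses; the variable names are illustrative, and the four rows are the first rows of the sample data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First four rows of the sample data: height, weight, chest measurement.
X = np.array([
    [180.0, 70.0, 0.88877],
    [190.0, 80.0, 0.99665],
    [168.0, 60.0, 0.65878],
    [159.0, 65.0, 0.65598],
])

col_mean = X.mean(axis=0)
col_std = X.std(axis=0)  # ddof=0: population standard deviation

# Subtract the column mean, then divide by the column standard deviation.
manual = (X - col_mean) / col_std
from_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(manual, from_sklearn))
print(manual.mean(axis=0))  # each column mean is numerically close to 0
print(manual.std(axis=0))   # each column std is numerically close to 1
```

Using the sample standard deviation (`ddof=1`) instead would give slightly different numbers on small data sets, so it is worth keeping the choice explicit when comparing results.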
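With standardization, a small number of outliers has only a minor effect on the result when there are enough samples, because they shift the mean and the standard deviation only slightly.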
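To see roughly how much a single outlier matters, the sketch below standardizes one probe value with and without an extreme record in the data, and does the same with min-max scaling for contrast. The data (1001 evenly spaced values plus one outlier at 3000) and the helper names are hypothetical and chosen only for illustration.

```python
import statistics


def z_score(value, values):
    """Standardize one value using the mean and population std of values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return (value - mean) / std


def min_max(value, values):
    """Min-max scale one value using the min and max of values."""
    lo, hi = min(values), max(values)
    return (value - lo) / (hi - lo)


# Hypothetical data: 0, 1, ..., 1000, then the same data with one outlier.
inliers = [float(v) for v in range(1001)]
with_outlier = inliers + [3000.0]

probe = 1000.0  # the largest genuine value

z_clean = z_score(probe, inliers)
z_dirty = z_score(probe, with_outlier)
mm_clean = min_max(probe, inliers)
mm_dirty = min_max(probe, with_outlier)

print("z-score: ", z_clean, "->", z_dirty)
print("min-max: ", mm_clean, "->", mm_dirty)
```

In this setup the z-score of the probe moves by less than 0.1, while its min-max value drops from 1.0 to about one third. With far fewer samples or a far more extreme outlier, the mean and standard deviation would shift more, so the robustness is relative rather than absolute.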
Use the StandardScaler() API from sklearn.
Example:
import pandas as pd
from sklearn.preprocessing import StandardScaler


def stand_demo():
    """
    Standardization
    :return: None
    """
    # 1. Load the data
    data = pd.read_csv('test00.csv')
    # Keep only the first three columns
    data = data.iloc[:, :3]
    print("data:\n", data)
    # 2. Instantiate a transformer
    transfer = StandardScaler()
    # 3. Call fit_transform()
    data_new = transfer.fit_transform(data)
    print("data_new:\n", data_new)
    return None


if __name__ == '__main__':
    stand_demo()
data:
(the same 39 rows as printed in the normalization example above)
data_new:
[[ 0.40612393 -0.13933189  1.4864856 ]
 [ 1.29594603  0.56637508  2.26419106]
 [-0.66166258 -0.84503885 -0.17150918]
 [-1.46250247 -0.49218537 -0.19169434]
 [-0.57268037 -1.12732164 -0.90826759]
 [-0.21675154 -0.84503885 -1.60033029]
 [ 0.94001719  0.28409229  0.12405926]
 [ 0.22815951 -0.84503885 -0.23631797]
 [-0.03878712  0.21352159  1.55987308]
 [ 0.05019509 -0.84503885  1.45844265]
 [ 0.1391773   1.27208204  0.81734748]
 [-0.66166258  1.977789   -1.40345287]
 [-1.55148468  2.11893039 -0.90913267]
 [-0.66166258 -0.84503885  0.09572795]
 [ 0.31714172  0.56637508 -0.17821354]
 [ 0.67307056 -0.13933189  0.09522332]
 [ 1.29594603 -0.42161467  1.53759732]
 [ 1.82983928  1.13094065  2.03797306]
 [ 1.0289994   1.34265273 -0.41589382]
 [ 0.58408835  1.27208204 -0.70454163]
 [-1.55148468 -0.13933189 -0.67116403]
 [-1.46250247 -1.19789233 -0.70454163]
 [-0.839627   -1.19789233 -0.60275075]
 [ 0.22815951 -1.26846303 -1.42522401]
 [-1.10657363 -0.20990258  0.03517246]
 [-1.7294491  -1.19789233 -1.12720451]
 [ 1.20696382  1.20151134 -0.09358004]
 [-1.7294491  -1.12732164 -0.6244498 ]
 [ 1.20696382  1.83664761 -0.61572692]
 [-0.57268037 -0.42161467 -0.20027304]
 [ 0.31714172 -1.19789233  2.25936103]
 [ 0.1391773  -0.28047328 -0.93717563]
 [-0.839627    0.28409229  0.11007383]
 [-0.57268037  0.98979925  0.03517246]
 [ 1.20696382  1.20151134  0.12478016]
 [ 1.11798161 -0.28047328  0.77120997]
 [ 0.05019509 -0.91560955 -0.88368495]
 [ 0.1391773  -0.84503885  0.03531664]
 [ 1.82983928  0.56637508 -0.24287815]]