A brief introduction to the ix function in Python

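Basic Python operations: selecting data with loc, iloc and ix. loc works on labels (row labels and column names, often strings), so a slice includes both endpoints. iloc works on integer positions, so a slice follows Python's default half-open convention: the start is included and the end is excluded.
Build a data set df (here with row labels 0, 1, 2, 3).
loc selects rows mainly by row label. The key word is label, label, label!
loc[1] selects the row whose label is 1 (out of the row labels 0, 1, 2, 3).
Note the difference between loc[0:1] and loc[0,1], and above all the difference between loc[0:1] and iloc[0:1]. loc[0:1] is a label slice that returns the rows labelled 0 and 1, whereas loc[0,1] addresses a single cell: row label 0, column label 1.
To select columns, use loc[:,0:1]; this is still label-based. Note that if the column label is a string such as 'a', loc['a'] does not work; you must write loc[:, 'a'].
If 'a' is a row label, however, loc['a'] does select that row.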
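The loc rules above can be tried on a small frame. This is only a sketch: the frame, its values and the names df and df_str are made up for illustration and are not the article's data set.

```python
import pandas as pd

# Hypothetical frame: integer row labels 0..3, string column labels.
df = pd.DataFrame(
    {"a": [10, 11, 12, 13], "b": [20, 21, 22, 23], "c": [30, 31, 32, 33]}
)

row_1 = df.loc[1]              # Series: the row whose label is 1
rows_0_to_1 = df.loc[0:1]      # label slice, both ends included: 2 rows
cell = df.loc[0, "a"]          # row label 0, column label "a": one value
col_a = df.loc[:, "a"]         # column "a"; df.loc["a"] would raise KeyError

# With string row labels, a bare label selects a row.
df_str = df.copy()
df_str.index = ["w", "x", "y", "z"]
row_x = df_str.loc["x"]
```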
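iloc selects rows mainly by row number. The key word is position, position, position!
iloc[0:1]: because Python slices are half-open by default, this selects only the first row!
Trying to index by label, as in iloc['a'], raises an error; iloc accepts integer positions, not labels.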
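A short sketch of the contrast between position slices and label slices, again on a made-up frame (the values and the row labels "x", "y", "z" are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 11, 12], "b": [20, 21, 22]}, index=["x", "y", "z"])

first_row = df.iloc[0:1]      # half-open position slice: only position 0
by_label = df.loc["x":"y"]    # label slice: both "x" and "y" are included

# iloc rejects labels.
raised = False
try:
    df.iloc["x"]
except TypeError:
    raised = True
```

The same slice bounds therefore return one row with iloc and two rows with loc.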
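ix is a hybrid of the two: it accepts either row positions or row labels. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so current code has to use loc or iloc instead.
Rows can also be selected by a condition. For example, to select the rows where prize is greater than 10 (prize being a column label), write df.loc[df.prize > 10].
Conditions can also be combined with and (&) and or (|).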
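A minimal sketch of conditional selection and of combining conditions. The prize column comes from the text; the kind column and all values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"prize": [5, 12, 30, 8], "kind": ["a", "b", "a", "b"]})

# Single condition: a boolean Series used as a row mask.
high = df.loc[df.prize > 10]

# Combined conditions: each comparison needs its own parentheses.
high_and_a = df.loc[(df.prize > 10) & (df.kind == "a")]
low_or_a = df.loc[(df.prize < 10) | (df.kind == "a")]
```

The parentheses matter because & and | bind more tightly than the comparison operators.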
Plotting a function graph in Python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

def func(x):
    # Downward parabola shifted up by 40
    return -(x - 2) * (x - 8) + 40

x = np.linspace(0, 10)
y = func(x)
fig, ax = plt.subplots()
plt.plot(x, y, 'r', linewidth=2)
plt.ylim(bottom=20)  # lower y limit (older matplotlib spelled this ymin=20)
a = 2
b = 9
ax.set_xticks([a, b])
ax.set_xticklabels(['$a$', '$b$'])
ax.set_yticks([])
plt.figtext(0.9, 0.05, '$x$')
plt.figtext(0.1, 0.9, '$y$')
# Shade the area under the curve between a and b
ix = np.linspace(a, b)
iy = func(ix)
ixy = zip(ix, iy)
verts = [(a, 0)] + list(ixy) + [(b, 0)]
poly = Polygon(verts, facecolor='0.9', edgecolor='0.5')
ax.add_patch(poly)
x_math = (a + b) * 0.5
y_math = 35
plt.text(x_math, y_math, r"$\int_a^b(-(x-2)*(x-8)+40)dx$", horizontalalignment='center', size=12)
plt.show()
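What does doing data analysis in Python look like for a data analyst, which parts of Python does it need, and how is it done in practice? Analysis with Programming recently joined Planet Python. Here I share how to get started with data analysis in Python. The contents are:
Data import
Importing a local or web-hosted CSV file;
Data transformation;
Descriptive statistics;
Hypothesis testing
One-sample t-test;
Visualization;
Creating custom functions.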
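Data import
This is a crucial step: before any analysis we need to import the data. Data usually comes as CSV, and when it does not, it can at least be converted to CSV. In Python we do it as follows: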
import pandas as pd
# Reading data locally
df = pd.read_csv('/Users/al-ahmadgaidasaad/Documents/d.csv')
# Reading data from web
data_url = ""
df = pd.read_csv(data_url)
To read a local CSV file we need the pandas data analysis library. Its read_csv function can read both local and web data.
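To try read_csv without a file on disk or a URL, one option is to feed it an in-memory text buffer, since it also accepts file-like objects. The sketch below assumes made-up CSV content; the three column names are borrowed from the article's data set, the numbers are not.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a local file or a web resource.
csv_text = "Abra,Apayao,Benguet\n1,2,3\n4,5,6\n"

# read_csv takes a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```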
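Data transformation
Now that the data is in the workspace, the next step is data transformation. Statisticians and scientists usually remove data that is not needed for the analysis at this stage. Let us first look at the data.
Looking at the first and last rows with df.head() and df.tail() is equivalent, for R programmers, to printing the first 6 rows with print(head(df)) and the last 6 rows with print(tail(df)). Python prints 5 rows by default where R prints 6, so R's head(df, n = 10) becomes df.head(n = 10) in Python, and the same holds for the tail of the data.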
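A quick sketch of head and tail on a made-up single-column frame (the column name x and its values are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"x": range(20)})

first_default = df.head()      # 5 rows by default
first_ten = df.head(n=10)      # like head(df, n = 10) in R
last_three = df.tail(3)        # rows from the end
```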
In R, column and row names are extracted with colnames and rownames respectively. In Python we use the columns and index attributes instead:
# Extracting column names
print(df.columns)
# OUTPUT
Index([u'Abra', u'Apayao', u'Benguet', u'Ifugao', u'Kalinga'], dtype='object')
# Extracting row names or the index
print(df.index)
# OUTPUT
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78], dtype='int64')
The data is transposed with the T attribute:
# Transpose data
print(df.T)
# OUTPUT
             0      1     2      3     4      5     6      7     8      9
Abra      1243   4158  1787  17152  1266   5576   927  21540  1039   5424
Apayao    2934   9235  1922  14501  2385   7452  1099  17038  1382  10588
Benguet    148   4287  1955   3536  2530    771  2796   2463  2592   1064
Ifugao    3300   8063  1074  19607  3315  13134  5134  14226  6842  13828
Kalinga  10553  35257  4544  31687  8520  28252  3106  36238  4973  40140

         ...     69     70     71     72     73     74     75     76     77
Abra     ...  12763   2470  59094   6209  13316   2505  60303   6311  13345
Apayao   ...  37625  19532  35126   6335  38613  20878  40065   6756  38902
Benguet  ...   2354   4045   5987   3530   2585   3519   7062   3561   2583
Ifugao   ...   9838  17125  18940  15560   7746  19737  19422  15910  11096
Kalinga  ...  65782  15279  52437  24385  66148  16513  61808  23349  68663

            78
Abra      2623
Apayao   18264
Benguet   3745
Ifugao   16787
Kalinga  16900
Other transformations such as sorting can be done with the sort method (sort_values in current pandas). Now let's extract a specific column. In Python, we do it using either the iloc or the ix attribute; I find ix more robust and thus prefer it (note that ix was later deprecated and then removed in pandas 1.0, where loc and iloc take its place). Assuming we want the head of the first column of the data, we have
print(df.ix[:, 0].head())
# OUTPUT
0     1243
1     4158
2     1787
3    17152
4     1266
Name: Abra, dtype: int64
By the way, Python indexing starts at 0, not 1. To take the first 3 columns of the 11th through 21st rows (index labels 10 to 20), we have
print(df.ix[10:20, 0:3])
# OUTPUT
     Abra  Apayao  Benguet
10    981    1311     2560
11  27366   15093     3039
12   1100    1701     2382
13   7212   11001     1088
14   1048    1427     2847
15  25679   15661     2942
16   1055    2191     2119
17   5437    6461      734
18   1029    1183     2302
19  23710   12222     2598
20   1091    2343     2654
The command above is equivalent to df.ix[10:20, ['Abra', 'Apayao', 'Benguet']].
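Because ix is not available in pandas 1.0 and later, the same selections can be written with loc (labels) or iloc (positions). The sketch below uses a placeholder frame with the article's five column names and 79 rows; the values are generated and are not the article's data.

```python
import numpy as np
import pandas as pd

cols = ["Abra", "Apayao", "Benguet", "Ifugao", "Kalinga"]
# Placeholder values; only the shape and labels mirror the article's data.
df = pd.DataFrame(np.arange(79 * 5).reshape(79, 5), columns=cols)

# Label-based: the slice 10:20 includes label 20, so 11 rows come back.
by_label = df.loc[10:20, ["Abra", "Apayao", "Benguet"]]

# Position-based: half-open, so the stop has to be 21 to include row 20.
by_position = df.iloc[10:21, 0:3]

# Counterpart of df.ix[:, 0].head()
first_col_head = df.iloc[:, 0].head()
```

With a default integer index the row labels and row positions coincide, which is why the two forms return the same rows here; with any other index they would differ.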
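To drop columns from the data, here column 1 (Apayao) and column 2 (Benguet), we use the drop method:
print(df.drop(df.columns[[1, 2]], axis = 1).head())
# OUTPUT
    Abra  Ifugao  Kalinga
0   1243    3300    10553
1   4158    8063    35257
2   1787    1074     4544
3  17152   19607    31687
4   1266    3315     8520
The axis argument tells the function whether to drop columns or rows: axis = 1 drops columns, axis = 0 drops rows.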
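A small sketch of both axis values on a made-up three-column frame (values are placeholders). drop returns a new frame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"Abra": [1, 2, 3], "Apayao": [4, 5, 6], "Benguet": [7, 8, 9]})

no_cols = df.drop(df.columns[[1, 2]], axis=1)          # drop columns by label
no_rows = df.drop([0, 1], axis=0)                      # drop rows by label
same_cols = df.drop(columns=["Apayao", "Benguet"])     # keyword form of axis=1
```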
Descriptive statistics
The next step is to describe the statistical properties of the data with the describe method:
print(df.describe())
# OUTPUT
               Abra        Apayao      Benguet        Ifugao       Kalinga
count     79.000000     79.000000    79.000000     79.000000     79.000000
mean   12874.379747  16860.645570  3237.392405  12414.620253  30446.417722
std    16746.466945  15448.153794  1588.536429   5034.282019  22245.707692
min      927.000000    401.000000   148.000000   1074.000000   2346.000000
25%     1524.000000   3435.500000  2328.000000   8205.000000   8601.500000
50%     5790.000000  10588.000000  3202.000000  13044.000000  24494.000000
75%    13330.500000  33289.000000  3918.500000  16099.500000  52510.500000
max    60303.000000  54625.000000  8813.000000  21031.000000  68663.000000
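Hypothesis testing
Python has a very good package for statistical inference: stats in scipy. Its ttest_1samp function implements the one-sample t-test. So, to test the mean rice yield in the Abra column under the null hypothesis that the population mean rice yield is 15000, we have:
from scipy import stats as ss
# Perform one sample t-test using 15000 as the true mean
print(ss.ttest_1samp(a = df.ix[:, 'Abra'], popmean = 15000))
# OUTPUT
(-1.1281738488299586, 0.26270472069109496)
The function returns a tuple of the following values:
t : float or array, the t statistic
prob : float or array, the two-tailed p-value
From the output above, the p-value of 0.2627 is much larger than α = 0.05, so there is not sufficient evidence to conclude that the mean rice yield differs from 15000. Applying the test to all variables, again assuming a mean of 15000, we have:
print(ss.ttest_1samp(a = df, popmean = 15000))
# OUTPUT
(array([ -1.12817385, 1.07053437, -65.81425599, -4.564575, 6.17156198]),
array([2.62704721e-01, 2.87680340e-01, 4.15643528e-70,
1.83764399e-05, 2.82461897e-08]))
The first array holds the t statistics and the second the corresponding p-values.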
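To see what ttest_1samp computes, the statistic and the two-tailed p-value can be reproduced by hand. This is a sketch on synthetic data: the sample, its parameters and the seed are assumptions chosen for illustration, not the article's Abra column.

```python
import numpy as np
from scipy import stats

# Synthetic sample (hypothetical values), same size as the article's 79 rows.
rng = np.random.default_rng(0)
sample = rng.normal(loc=14000, scale=5000, size=79)

# Library result: unpacks into the t statistic and the two-tailed p-value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=15000)

# The same quantities from the definition:
# t = (sample mean - hypothesised mean) / (sample std / sqrt(n))
n = sample.size
t_manual = (sample.mean() - 15000) / (sample.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)
```

The sample standard deviation uses ddof=1, and the p-value doubles the upper tail of a t distribution with n - 1 degrees of freedom.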
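Visualization
Python has many visualization modules, the most popular being the matplotlib library. The bokeh and seaborn modules are worth a mention as well. In an earlier post I already covered the box-and-whisker plot functionality of matplotlib.
# Import the module for plotting
import matplotlib.pyplot as plt
df.plot(kind = 'box')
plt.show()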
Now we can beautify the plot with the ggplot theme from R. To use it, we only need to add one line to the code above:
import matplotlib.pyplot as plt
plt.style.use('ggplot') # ggplot-like theme; older pandas set it with pd.options.display.mpl_style = 'default'
df.plot(kind = 'box')
This gives the restyled box plot.
Much cleaner than the default matplotlib.pyplot theme. In this post, though, I would rather introduce the seaborn module, a statistical data visualization library. So we have:
# Import the seaborn library
import seaborn as sns
# Do the boxplot
sns.boxplot(data = df, width = 0.5, palette = "pastel")
plt.show()
What a sexy box plot; keep reading.
sns.violinplot(data = df, width = 0.5, palette = "pastel")
plt.show()
# distplot is deprecated in newer seaborn releases in favour of histplot/displot
sns.distplot(df.ix[:,2], rug = True, bins = 15)
plt.show()
with sns.axes_style("white"):
    sns.jointplot(x = df.ix[:,1], y = df.ix[:,2], kind = "kde")
    plt.show()
sns.lmplot(x = "Benguet", y = "Ifugao", data = df)
plt.show()
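Creating custom functions
In Python, we define a custom function with the def statement. For example, a function that adds two numbers can be defined as follows:
def add_2int(x, y):
    return x + y
print(add_2int(2, 2))
# OUTPUT
4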
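By the way, indentation matters in Python. Indentation delimits a function's scope, much as braces {…} do in R. Here is an example from one of our earlier posts:
Generate 10 samples from a normal distribution with mean mu = 3 and variance sigma^2 = 5;
At the 95% confidence level, compute the sample mean xbar and the margin z * sigma / sqrt(n);
Repeat 100 times; then
Compute the percentage of confidence intervals that contain the true mean.
In Python, the program looks like this: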
import numpy as np
import scipy.stats as ss
def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    m = np.zeros((rep, 4))
    for i in range(rep):
        norm = np.random.normal(loc = mu, scale = sigma, size = n)
        xbar = np.mean(norm)
        low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        # Flag whether the interval contains the true mean
        if (mu > low) & (mu < up):
            rem = 1
        else:
            rem = 0
        m[i, :] = [xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = inside / rep
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            + "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}
The code above is easy to read, but the loop makes it slow. Below is an improved version of it, thanks to the Python experts:
import numpy as np
import scipy.stats as ss
def case2(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    scaled_crit = ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
    # Draw all rep samples at once: one row per repetition
    norm = np.random.normal(loc = mu, scale = sigma, size = (rep, n))
    xbar = norm.mean(1)
    low = xbar - scaled_crit
    up = xbar + scaled_crit
    rem = (mu > low) & (mu < up)
    m = np.c_[xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = inside / rep
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            + "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}
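As a sanity check on the simulation idea, the share of intervals that contain the true mean should approach 95% when the number of repetitions grows. The sketch below is a compact, seeded variant written for this check; the function name coverage, the seed argument and the use of numpy's Generator API are assumptions for illustration, not part of the original program.

```python
import numpy as np
import scipy.stats as ss

def coverage(n=10, mu=3, sigma=np.sqrt(5), p=0.025, rep=100, seed=None):
    """Fraction of known-sigma z-intervals that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    half_width = ss.norm.ppf(1 - p) * sigma / np.sqrt(n)
    # One row per repetition, one sample mean per row.
    xbar = rng.normal(loc=mu, scale=sigma, size=(rep, n)).mean(axis=1)
    hit = (mu > xbar - half_width) & (mu < xbar + half_width)
    return hit.mean()

frac = coverage(rep=100_000, seed=42)
print(frac)
```

With only 100 repetitions, as in the text, the observed share fluctuates by a few percentage points from run to run.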
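What does data analysis in Python mainly involve?
First, inspecting the data table
In pandas, the shape attribute gives the dimensions of a table, that is, the number of rows and columns. The info method shows overall information about the table, and the dtypes attribute (dtype for a single column) returns the data types. isnull checks for missing values; it can be applied to the whole table or to a single column, and it returns booleans: True where a value is missing, False otherwise.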
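The inspection calls can be sketched on a tiny made-up table (the column names city and price and all values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", None], "price": [1.5, np.nan, 3.0]})

shape = df.shape                       # (rows, columns)
dtypes = df.dtypes                     # one dtype per column
df.info()                              # prints a summary of the table
null_mask = df.isnull()                # boolean frame, True where missing
price_null = df["price"].isnull()      # the same check for one column
null_counts = df.isnull().sum()        # missing values per column
```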
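Second, cleaning the data
pandas handles missing values flexibly: dropna removes the rows that contain missing values, and fillna fills them in. dtype shows a column's data type and its counterpart astype changes it; rename changes column names, drop_duplicates removes duplicate rows, and replace substitutes values.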
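A minimal sketch of these cleaning calls on a made-up table; the column names, the fill values and the replacement are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "B", None],
    "price": [1.0, 2.0, 2.0, np.nan],
})

dropped = df.dropna()                                   # rows without missing values
filled = df.fillna({"city": "unknown", "price": 0.0})   # per-column fill values
as_int = filled["price"].astype(int)                    # change the data type
renamed = df.rename(columns={"city": "town"})           # change a column name
deduped = df.drop_duplicates()                          # remove repeated rows
replaced = df["city"].replace("A", "Z")                 # substitute a value
```

Each call returns a new object; df itself is not modified.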
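Third, extracting data
Data extraction relies mainly on three indexers: loc, iloc and ix. loc extracts by label, iloc extracts by position, and ix could extract by both label and position (ix has been removed since pandas 1.0). Besides labels and positions, data can also be extracted by condition, for example by combining loc with isin.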
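The three ways of extracting can be sketched side by side; the frame, its row labels r1 to r4 and its values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["A", "B", "C", "A"], "price": [1, 2, 3, 4]},
    index=["r1", "r2", "r3", "r4"],
)

by_label = df.loc["r2", "price"]    # by row and column label
by_position = df.iloc[1, 1]         # by row and column position
# By condition: keep the rows whose city is in the given list.
subset = df.loc[df["city"].isin(["A", "C"])]
```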
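Fourth, filtering data
Filtering is done with loc plus a filter condition; adding sum and count reproduces Excel's sumif and countif. For summarizing, the main functions are groupby and pivot_table; groupby performs grouped aggregation and is simple to use, grouping by the column names in the order they are given.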
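A sketch of the sumif and countif idea and of the two summarizing functions, on a made-up table (column names and values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "B", "A"],
    "year": [2020, 2020, 2021, 2021, 2021],
    "price": [10, 20, 30, 40, 50],
})

# sumif / countif: filter with loc, then aggregate the selected column.
sumif = df.loc[df["city"] == "A", "price"].sum()
countif = df.loc[df["price"] > 25, "price"].count()

# Grouped totals, and the same data as a city-by-year table.
totals = df.groupby("city")["price"].sum()
table = df.pivot_table(index="city", columns="year", values="price", aggfunc="sum")
```

When groupby is given a list of columns, the grouping levels follow the order of that list.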