python | python pandas aggregation and grouping functions

1 Main contents

  1. DataFrame.groupby().sum()
  2. DataFrame.groupby().agg()
  3. pandas.concat([DataFrame1,DataFrame2])
  4. pandas.merge(DataFrame1,DataFrame2,parameters….)
  5. DataFrame1.join(DataFrame2, lsuffix='suffix for DataFrame1 columns', rsuffix='suffix for DataFrame2 columns')
  6. Getting help documentation

2 Example
  1. Construct the DataFrame shown below:

            food  food_id  number     price  user_id weather
    0       soup        4       6  1.818250        3    cold
    1       soup        8       6  1.834045        4     hot
    2    iceream        8       7  3.042422        2    cold
    3  chocolate        3       6  5.247564        4     hot
    4    iceream        6       3  4.319450        4    cold
    5    iceream        5       4  2.912291        1    cold
    6    iceream        2       7  6.118529        2    cold
    7       soup        8       4  1.394939        2     hot
    8       soup        6       8  2.921446        2     hot
    9  chocolate        2       1  3.663618        4     hot

The code that builds it:

    import numpy as np
    import pandas as pd

    np.random.seed(42)  # fixed seed so the random columns are reproducible
    df = pd.DataFrame({
        'user_id': np.random.randint(0, 6, 10),
        'food_id': np.random.randint(1, 10, 10),
        'weather': ['cold', 'hot', 'cold', 'hot', 'cold', 'cold', 'cold', 'hot', 'hot', 'hot'],
        'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream', 'iceream', 'iceream', 'soup', 'soup', 'chocolate'],
        'price': 10 * np.random.rand(10),
        'number': np.random.randint(1, 9, 10),
    }, columns=['food', 'food_id', 'number', 'price', 'user_id', 'weather'])  # fix the column order used in the table
    print(df)

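2 Using groupby
Code

    # Group by user_id and print every row of each group.
    # A string key (rather than a one-element list) yields plain scalar group labels.
    groupby1 = df.groupby('user_id')
    for i, (user_id, group) in enumerate(groupby1, start=1):
        print("group", i, user_id)
        print(group)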

Result

    group 1 1
          food  food_id  number     price  user_id weather
    5  iceream        5       4  2.912291        1    cold
    group 2 2
          food  food_id  number     price  user_id weather
    2  iceream        8       7  3.042422        2    cold
    6  iceream        2       7  6.118529        2    cold
    7     soup        8       4  1.394939        2     hot
    8     soup        6       8  2.921446        2     hot
    group 3 3
       food  food_id  number    price  user_id weather
    0  soup        4       6  1.81825        3    cold
    group 4 4
            food  food_id  number     price  user_id weather
    1       soup        8       6  1.834045        4     hot
    3  chocolate        3       6  5.247564        4     hot
    4    iceream        6       3  4.319450        4    cold
    9  chocolate        2       1  3.663618        4     hot
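When only one group is needed, looping over every group is unnecessary. The following is a minimal sketch, assuming the same ten rows as the example table typed in as literals (so it does not depend on the random seed). The use of `get_group` and the name `group_sizes` are illustrative choices, not part of the original post.

```python
import pandas as pd

# Literal copy of three columns from the example table (assumed values).
df = pd.DataFrame({
    'user_id': [3, 4, 2, 4, 4, 1, 2, 2, 2, 4],
    'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream',
             'iceream', 'iceream', 'soup', 'soup', 'chocolate'],
    'number': [6, 6, 7, 6, 3, 4, 7, 4, 8, 1],
})

grouped = df.groupby('user_id')

# Number of rows in each group, keyed by user_id.
group_sizes = {user_id: len(group) for user_id, group in grouped}
print(group_sizes)

# Fetch a single group directly instead of looping over all of them.
user2 = grouped.get_group(2)
print(user2)
```

`get_group` returns an ordinary DataFrame that keeps the original row labels, so it can be filtered or aggregated like any other frame.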

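3 Combining groupby with sum and similar functions

Code

    # Sum every numeric column except the grouping key.
    # numeric_only=True skips the text columns (food, weather).
    print(groupby1.sum(numeric_only=True))
    # Sum only selected numeric columns (note the double brackets).
    print(groupby1[['food_id', 'number']].sum())
    # as_index (default True) controls whether user_id becomes the index.
    print(df.groupby(['user_id'], as_index=False).sum(numeric_only=True))
    # Besides sum there are mean, min, max, median, std and so on; they are used the same way.
    # Run help(df.groupby) to see the parameters of groupby().
    # A commonly used parameter is axis: axis=0 (the default) works on rows, grouping them by
    # the distinct values in the given column(s); axis=1 groups columns and is deprecated
    # in recent pandas releases.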


    output[1]:
             food_id  number      price
    user_id
    1              5       4   2.912291
    2             24      26  13.477336
    3              4       6   1.818250
    4             19      16  15.064678
    output[2]:
             food_id  number
    user_id
    1              5       4
    2             24      26
    3              4       6
    4             19      16
    output[3]:
       user_id  food_id  number      price
    0        1        5       4   2.912291
    1        2       24      26  13.477336
    2        3        4       6   1.818250
    3        4       19      16  15.064678
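The three variants above can be reproduced on literal data. This is a minimal sketch, assuming the numeric columns of the example table typed in by hand; selecting the columns explicitly avoids any dependence on how a given pandas version treats text columns in `sum()`. The variable names are illustrative.

```python
import pandas as pd

# Numeric columns of the example table, typed in as literals (assumed values).
df = pd.DataFrame({
    'user_id': [3, 4, 2, 4, 4, 1, 2, 2, 2, 4],
    'food_id': [4, 8, 8, 3, 6, 5, 2, 8, 6, 2],
    'number': [6, 6, 7, 6, 3, 4, 7, 4, 8, 1],
    'price': [1.818250, 1.834045, 3.042422, 5.247564, 4.319450,
              2.912291, 6.118529, 1.394939, 2.921446, 3.663618],
})

# Sum two selected columns; user_id becomes the index.
totals = df.groupby('user_id')[['food_id', 'number']].sum()
print(totals)

# Same idea with as_index=False: user_id stays an ordinary column.
flat = df.groupby('user_id', as_index=False)[['food_id', 'number', 'price']].sum()
print(flat)

# Other reductions follow the same pattern, for example the mean price per user.
means = df.groupby('user_id')['price'].mean()
print(means)
```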


    df.groupby(['food', 'weather']).size()

    food       weather
    chocolate  hot        2
    iceream    cold       4
    soup       cold       1
               hot        3
    dtype: int64
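The counts returned by `size()` form a Series with a two-level index, which is often easier to read as a table. A minimal sketch, assuming the `food` and `weather` columns of the example typed in as literals; `unstack` and `reset_index` are one possible way to reshape the result, not something the original post uses.

```python
import pandas as pd

# food and weather columns of the example table (assumed values).
df = pd.DataFrame({
    'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream',
             'iceream', 'iceream', 'soup', 'soup', 'chocolate'],
    'weather': ['cold', 'hot', 'cold', 'hot', 'cold',
                'cold', 'cold', 'hot', 'hot', 'hot'],
})

# Row count per (food, weather) combination.
counts = df.groupby(['food', 'weather']).size()

# Pivot the weather level into columns; missing combinations become 0.
table = counts.unstack(fill_value=0)
print(table)

# Or flatten into a regular DataFrame with a named count column.
as_frame = counts.reset_index(name='count')
print(as_frame)
```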

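4 The agg function
Code

    # Mean and median of every numeric column for each (weather, food) group.
    # The string names 'mean' and 'median' select pandas' built-in aggregations.
    print(df.groupby(['weather', 'food']).agg(['mean', 'median']))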

Result

    output[4]:
                        food_id          number           price              user_id
                           mean median   mean median      mean    median      mean median
    weather food
    cold    iceream    5.250000    5.5   5.25    5.5  4.098173  3.680936  2.250000      2
            soup       4.000000    4.0   6.00    6.0  1.818250  1.818250  3.000000      3
    hot     chocolate  2.500000    2.5   3.50    3.5  4.455591  4.455591  4.000000      4
            soup       7.333333    8.0   6.00    6.0  2.050143  1.834045  2.666667      2

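Applying the same list of functions to every column produces a wide table. When different columns need different statistics, named aggregation is one alternative. The sketch below assumes the relevant columns of the example typed in as literals and a pandas version that supports the `name=(column, function)` form; the output column names are illustrative.

```python
import pandas as pd

# Columns of the example table needed here (assumed values).
df = pd.DataFrame({
    'weather': ['cold', 'hot', 'cold', 'hot', 'cold',
                'cold', 'cold', 'hot', 'hot', 'hot'],
    'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream',
             'iceream', 'iceream', 'soup', 'soup', 'chocolate'],
    'number': [6, 6, 7, 6, 3, 4, 7, 4, 8, 1],
    'price': [1.818250, 1.834045, 3.042422, 5.247564, 4.319450,
              2.912291, 6.118529, 1.394939, 2.921446, 3.663618],
})

# One output column per (source column, function) pair.
summary = df.groupby(['weather', 'food']).agg(
    mean_price=('price', 'mean'),
    median_price=('price', 'median'),
    total_number=('number', 'sum'),
)
print(summary)
```

The result has flat column names instead of a two-level column header, which tends to be easier to work with afterwards.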
5 concat()

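Code

    print("df :3")
    print(df[:3])   # first three rows
    print("df :4")
    print(df[6:])   # last four rows
    print("df.concat")
    # Stack the two slices row-wise; the original index labels are kept.
    print(pd.concat([df[:3], df[6:]], axis=0))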

Result

    df :3
          food  food_id  number     price  user_id weather
    0     soup        4       6  1.818250        3    cold
    1     soup        8       6  1.834045        4     hot
    2  iceream        8       7  3.042422        2    cold
    df :4
            food  food_id  number     price  user_id weather
    6    iceream        2       7  6.118529        2    cold
    7       soup        8       4  1.394939        2     hot
    8       soup        6       8  2.921446        2     hot
    9  chocolate        2       1  3.663618        4     hot
    df.concat
            food  food_id  number     price  user_id weather
    0       soup        4       6  1.818250        3    cold
    1       soup        8       6  1.834045        4     hot
    2    iceream        8       7  3.042422        2    cold
    6    iceream        2       7  6.118529        2    cold
    7       soup        8       4  1.394939        2     hot
    8       soup        6       8  2.921446        2     hot
    9  chocolate        2       1  3.663618        4     hot
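Note that the concatenated result keeps the original row labels (0, 1, 2, 6, ...). A minimal sketch of the two most common variations, assuming a small stand-in frame rather than the full example: `ignore_index=True` renumbers the rows, and `axis=1` places the pieces side by side, aligned on their index labels.

```python
import pandas as pd

# Small stand-in frame (assumed values, shortened from the example).
df = pd.DataFrame({
    'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream'],
    'number': [6, 6, 7, 6, 3],
})
top, bottom = df[:2], df[3:]

# Default: rows are stacked and keep their original labels.
stacked = pd.concat([top, bottom])
print(stacked)

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered)

# axis=1 aligns on the index; labels missing from one piece give NaN.
side_by_side = pd.concat([top, bottom], axis=1)
print(side_by_side)
```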

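6 merge() and join()
Code

    df1 = pd.DataFrame({'EmpNr': [5, 3, 9], 'Dest': ['The Hague', 'Amsterdam', 'Rotterdam']},
                       columns=['Dest', 'EmpNr'])
    df2 = pd.DataFrame({'EmpNr': [5, 9, 7], 'Amount': [10, 5, 2.5]},
                       columns=['Amount', 'EmpNr'])
    print("df1")
    print(df1)
    print("df2")
    print(df2)
    print("Merge() on Key")
    print(pd.merge(df1, df2, on='EmpNr'))      # match rows on the shared EmpNr column
    print("inner join with Merge()")
    print(pd.merge(df1, df2, how='inner'))     # inner join on the common column(s)
    print("Dests join tips")
    # join() aligns on the row index; the suffixes rename the overlapping EmpNr columns.
    print(df1.join(df2, lsuffix='Dest', rsuffix='Tips'))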

Result

    df1
            Dest  EmpNr
    0  The Hague      5
    1  Amsterdam      3
    2  Rotterdam      9
    df2
       Amount  EmpNr
    0    10.0      5
    1     5.0      9
    2     2.5      7
    Merge() on Key
            Dest  EmpNr  Amount
    0  The Hague      5    10.0
    1  Rotterdam      9     5.0
    inner join with Merge()
            Dest  EmpNr  Amount
    0  The Hague      5    10.0
    1  Rotterdam      9     5.0
    Dests join tips
            Dest  EmpNrDest  Amount  EmpNrTips
    0  The Hague          5    10.0          5
    1  Amsterdam          3     5.0          9
    2  Rotterdam          9     2.5          7
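In the last result, join() paired rows by position in the index rather than by EmpNr, which is why Amsterdam (EmpNr 3) sits next to the amount for EmpNr 9. The sketch below uses the same two frames to compare the `how` options of merge() and shows one way to make join() match on the key by moving EmpNr into the index first. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'EmpNr': [5, 3, 9],
                    'Dest': ['The Hague', 'Amsterdam', 'Rotterdam']})
df2 = pd.DataFrame({'EmpNr': [5, 9, 7], 'Amount': [10, 5, 2.5]})

# inner: only keys present in both frames.
inner = pd.merge(df1, df2, on='EmpNr', how='inner')
# left: every row of df1; Amount is NaN where df2 has no match.
left = pd.merge(df1, df2, on='EmpNr', how='left')
# outer: union of the keys from both frames.
outer = pd.merge(df1, df2, on='EmpNr', how='outer')
print(inner, left, outer, sep='\n\n')

# join() matches on the index, so use EmpNr as the index on both sides.
joined = df1.set_index('EmpNr').join(df2.set_index('EmpNr'), how='inner')
print(joined)
```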

6 Ways to get help documentation
  1. help(pd.concat)
  2. dir(pd.concat)
  3. pd.concat? (IPython/Jupyter only)
  ...
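Outside an interactive session, the same information can be read programmatically. A minimal sketch, assuming only the standard-library `inspect` module; it lists the parameter names of `pd.concat` and prints the first line of the docstring that `help(pd.concat)` would display.

```python
import inspect

import pandas as pd

# Parameter names of pd.concat, read from its signature.
param_names = list(inspect.signature(pd.concat).parameters)
print(param_names)

# First line of the docstring shown by help(pd.concat).
first_doc_line = inspect.getdoc(pd.concat).splitlines()[0]
print(first_doc_line)
```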

This article is a repost; original source: https://blog.csdn.net/ly_ysys629/article/details/72553273