1 Main contents
- DataFrame.groupby().sum()
- DataFrame.groupby().agg()
- pandas.concat([DataFrame1,DataFrame2])
- pandas.merge(DataFrame1, DataFrame2, parameters…)
- DataFrame1.join(DataFrame2, lsuffix='suffix for DataFrame1 columns', rsuffix='suffix for DataFrame2 columns')
- Getting help documentation
The examples below use the following DataFrame:
        food  food_id  number     price  user_id  weather
0       soup        4       6  1.818250        3     cold
1       soup        8       6  1.834045        4      hot
2    iceream        8       7  3.042422        2     cold
3  chocolate        3       6  5.247564        4      hot
4    iceream        6       3  4.319450        4     cold
5    iceream        5       4  2.912291        1     cold
6    iceream        2       7  6.118529        2     cold
7       soup        8       4  1.394939        2      hot
8       soup        6       8  2.921446        2      hot
9  chocolate        2       1  3.663618        4      hot
The program that builds it:
import numpy as np
import pandas as pd
from numpy import random
from numpy.random import rand

random.seed(42)  # fixed seed so the random columns are reproducible
df = pd.DataFrame({'user_id': random.randint(0, 6, 10),
                   'food_id': random.randint(1, 10, 10),
                   'weather': ['cold', 'hot', 'cold', 'hot', 'cold', 'cold', 'cold', 'hot', 'hot', 'hot'],
                   'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream', 'iceream', 'iceream', 'soup', 'soup', 'chocolate'],
                   'price': 10 * rand(10),
                   'number': random.randint(1, 9, 10)},
                  columns=['food', 'food_id', 'number', 'price', 'user_id', 'weather'])  # alphabetical column order
print(df)
2 Using groupby
Code
groupby1 = df.groupby('user_id')  # group the rows by user_id
# print every group together with its key
for i, (user_id, group) in enumerate(groupby1, start=1):
    print("group", i, user_id)
    print(group)
Result
group 1 1
        food  food_id  number     price  user_id  weather
5    iceream        5       4  2.912291        1     cold
group 2 2
        food  food_id  number     price  user_id  weather
2    iceream        8       7  3.042422        2     cold
6    iceream        2       7  6.118529        2     cold
7       soup        8       4  1.394939        2      hot
8       soup        6       8  2.921446        2      hot
group 3 3
        food  food_id  number     price  user_id  weather
0       soup        4       6   1.81825        3     cold
group 4 4
        food  food_id  number     price  user_id  weather
1       soup        8       6  1.834045        4      hot
3  chocolate        3       6  5.247564        4      hot
4    iceream        6       3  4.319450        4     cold
9  chocolate        2       1  3.663618        4      hot
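Looping over the groups is not the only way to inspect them. The sketch below, which uses a small hand-typed frame named `orders` (an assumption made for illustration, not the article's `df`), shows how a single group can be pulled out by key with `get_group`, and how `ngroups` and `groups` describe the grouping.

```python
import pandas as pd

# Hypothetical miniature table, typed by hand for this example only.
orders = pd.DataFrame({
    'user_id': [3, 4, 2, 4, 1, 2],
    'food': ['soup', 'soup', 'iceream', 'chocolate', 'iceream', 'soup'],
    'number': [6, 6, 7, 6, 4, 4],
})

grouped = orders.groupby('user_id')

# ngroups is the number of distinct keys.
print(grouped.ngroups)

# groups maps each key to the row labels that belong to it.
print({key: list(labels) for key, labels in grouped.groups.items()})

# get_group returns the sub-frame for one key without a loop.
user2 = grouped.get_group(2)
print(user2)
```

This is handy when only one or two groups are of interest and iterating over all of them would be wasteful.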
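3 Combining groupby with sum and similar functions
Code
print(groupby1.sum(numeric_only=True))  # sum every numeric column other than the groupby key
print(groupby1[['food_id', 'number']].sum())  # sum only the selected numeric columns
print(df.groupby(['user_id'], as_index=False).sum(numeric_only=True))  # as_index decides whether user_id becomes the index; the default is True
# Besides sum there are mean, min, max, median, std and so on, all used the same way
# (mode and mad are not available as groupby methods in recent pandas versions)
# help(df.groupby) lists the parameters of groupby()
# A commonly used parameter is axis: axis=0 groups the rows by the distinct values of the given column(s); axis=1 groups the columns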
output[1]:
         food_id  number      price
user_id
1              5       4   2.912291
2             24      26  13.477336
3              4       6   1.818250
4             19      16  15.064678
output[2]:
         food_id  number
user_id
1              5       4
2             24      26
3              4       6
4             19      16
output[3]:
   user_id  food_id  number      price
0        1        5       4   2.912291
1        2       24      26  13.477336
2        3        4       6   1.818250
3        4       19      16  15.064678
print(df.groupby(['food', 'weather']).size())  # number of rows in each (food, weather) group
food       weather
chocolate  hot        2
iceream    cold       4
soup       cold       1
           hot        3
dtype: int64
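The difference between restricting the aggregation to numeric columns, selecting columns explicitly, and using `as_index=False` can be hard to see from printed output alone. The following is a minimal sketch on a hand-made frame (`orders` and its values are assumptions made for illustration) that puts the three variants side by side.

```python
import pandas as pd

# Hypothetical data, typed by hand for this example only.
orders = pd.DataFrame({
    'user_id': [1, 2, 2, 3],
    'food': ['iceream', 'soup', 'iceream', 'soup'],
    'number': [4, 7, 4, 6],
    'price': [2.5, 3.0, 1.5, 2.0],
})

# numeric_only=True leaves the string column 'food' out of the sum.
totals = orders.groupby('user_id').sum(numeric_only=True)

# Selecting the columns first is another way to restrict the aggregation.
selected = orders.groupby('user_id')[['number', 'price']].sum()

# as_index=False keeps user_id as an ordinary column ...
flat = orders.groupby('user_id', as_index=False)[['number', 'price']].sum()

# ... which should give the same frame as reset_index() on the indexed result.
print(flat.equals(selected.reset_index()))
print(flat)
```

Keeping the key as a column tends to be more convenient when the result is merged with other tables afterwards.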
4 The agg function
Code
print(df.groupby(['weather', 'food']).agg(['mean', 'median']))  # mean and median of every remaining numeric column, per group
Result
output[4]:
                            user_id           food_id          number               price
                       mean  median      mean  median    mean  median      mean    median
weather food
cold    iceream    2.250000       2  5.250000     5.5    5.25     5.5  4.098173  3.680936
        soup       3.000000       3  4.000000     4.0    6.00     6.0  1.818250  1.818250
hot     chocolate  4.000000       4  2.500000     2.5    3.50     3.5  4.455591  4.455591
        soup       2.666667       2  7.333333     8.0    6.00     6.0  2.050143  1.834045
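Applying the same list of functions to every column yields a wide table with two header levels. When only a few statistics are needed, pandas' named aggregation (available since pandas 0.25) may be easier to read. The sketch below uses a hand-made frame, and the output column names `total_number`, `mean_price` and `max_price` are chosen here purely for illustration.

```python
import pandas as pd

# Hypothetical data, typed by hand for this example only.
orders = pd.DataFrame({
    'weather': ['cold', 'hot', 'cold', 'hot'],
    'food': ['soup', 'soup', 'iceream', 'soup'],
    'number': [6, 6, 7, 4],
    'price': [1.5, 2.0, 3.0, 1.0],
})

# Each keyword is an output column: (input column, aggregation function).
summary = orders.groupby(['weather', 'food']).agg(
    total_number=('number', 'sum'),
    mean_price=('price', 'mean'),
    max_price=('price', 'max'),
)
print(summary)
```

The result has flat, self-describing column names instead of a (column, function) MultiIndex.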
5 concat()
Code
print("df[:3]", df[:3], sep="\n")
print("df[6:]", df[6:], sep="\n")
print("pd.concat", pd.concat([df[:3], df[6:]], axis=0), sep="\n")  # axis=0 stacks the two slices row-wise
Result
df[:3]
        food  food_id  number     price  user_id  weather
0       soup        4       6  1.818250        3     cold
1       soup        8       6  1.834045        4      hot
2    iceream        8       7  3.042422        2     cold
df[6:]
        food  food_id  number     price  user_id  weather
6    iceream        2       7  6.118529        2     cold
7       soup        8       4  1.394939        2      hot
8       soup        6       8  2.921446        2      hot
9  chocolate        2       1  3.663618        4      hot
pd.concat
        food  food_id  number     price  user_id  weather
0       soup        4       6  1.818250        3     cold
1       soup        8       6  1.834045        4      hot
2    iceream        8       7  3.042422        2     cold
6    iceream        2       7  6.118529        2     cold
7       soup        8       4  1.394939        2      hot
8       soup        6       8  2.921446        2      hot
9  chocolate        2       1  3.663618        4      hot
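Note that the concatenated result keeps the original row labels (0, 1, 2, 6, ...). As a small sketch on two hand-made frames (`head` and `tail` are names assumed here for illustration), `ignore_index=True` renumbers the rows, while `keys` records which input each row came from.

```python
import pandas as pd

# Hypothetical pieces, typed by hand for this example only.
head = pd.DataFrame({'food': ['soup', 'soup'], 'number': [6, 6]})
tail = pd.DataFrame({'food': ['iceream', 'chocolate'], 'number': [7, 1]},
                    index=[6, 9])

# Default behaviour: the original row labels are kept.
kept = pd.concat([head, tail])

# ignore_index=True discards them and numbers the rows from 0.
renumbered = pd.concat([head, tail], ignore_index=True)

# keys adds an outer index level naming the source of each row.
labelled = pd.concat([head, tail], keys=['head', 'tail'])

print(kept.index.tolist())
print(renumbered.index.tolist())
print(labelled.loc['tail'])
```

Which variant fits depends on whether the old labels still carry meaning after the pieces are combined.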
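6 merge() and join()
Code
df1 = pd.DataFrame({'Dest': ['The Hague', 'Amsterdam', 'Rotterdam'], 'EmpNr': [5, 3, 9]})
df2 = pd.DataFrame({'Amount': [10, 5, 2.5], 'EmpNr': [5, 9, 7]})
print("df1", df1, sep="\n")
print("df2", df2, sep="\n")
print("Merge() on Key", pd.merge(df1, df2, on='EmpNr'), sep="\n")  # keep the rows whose EmpNr appears in both frames
print("inner join with Merge()", pd.merge(df1, df2, how='inner'), sep="\n")  # without `on`, merge uses the shared column EmpNr
print("Dests join tips", df1.join(df2, lsuffix='Dest', rsuffix='Tips'), sep="\n")  # join aligns on the row index; the suffixes rename the clashing EmpNr columns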
Result
df1
        Dest  EmpNr
0  The Hague      5
1  Amsterdam      3
2  Rotterdam      9
df2
   Amount  EmpNr
0    10.0      5
1     5.0      9
2     2.5      7
Merge() on Key
        Dest  EmpNr  Amount
0  The Hague      5    10.0
1  Rotterdam      9     5.0
inner join with Merge()
        Dest  EmpNr  Amount
0  The Hague      5    10.0
1  Rotterdam      9     5.0
Dests join tips
        Dest  EmpNrDest  Amount  EmpNrTips
0  The Hague          5    10.0          5
1  Amsterdam          3     5.0          9
2  Rotterdam          9     2.5          7
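The join result above pairs rows by position in the index, not by EmpNr, which is why Amsterdam (EmpNr 3) ends up next to an Amount that belongs to EmpNr 9. The sketch below reuses the same two frames and shows, as one possible approach, how an outer or left merge keeps unmatched rows and how `join` can be made to match on EmpNr by moving that column into the index first.

```python
import pandas as pd

df1 = pd.DataFrame({'Dest': ['The Hague', 'Amsterdam', 'Rotterdam'],
                    'EmpNr': [5, 3, 9]})
df2 = pd.DataFrame({'Amount': [10, 5, 2.5], 'EmpNr': [5, 9, 7]})

# how='outer' keeps keys from both sides; indicator=True adds a _merge
# column saying where each row came from.
outer = pd.merge(df1, df2, on='EmpNr', how='outer', indicator=True)

# how='left' keeps every row of df1 and fills missing amounts with NaN.
left = pd.merge(df1, df2, on='EmpNr', how='left')

# join matches on index labels, so set EmpNr as the index on both sides.
by_key = df1.set_index('EmpNr').join(df2.set_index('EmpNr'), how='inner')

print(outer)
print(left)
print(by_key)
```

With the key in the index, `join` and `merge(on='EmpNr')` select the same employees (5 and 9).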
7 Getting help documentation
1. help(pd.concat)
2. dir(pd.concat)
3. pd.concat? (in IPython or Jupyter)
...
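The same information can also be read from a script using the standard library's `inspect` module. This is only a small sketch of one way to do it; the variable names are chosen here for illustration.

```python
import inspect

import pandas as pd

# The call signature shows the parameter names and their defaults.
signature = inspect.signature(pd.concat)
print(signature)

# getdoc returns the cleaned-up docstring that help() would display.
first_line = inspect.getdoc(pd.concat).splitlines()[0]
print(first_line)

# dir lists attributes; filtering out underscore names leaves the public API.
public = [name for name in dir(pd.DataFrame) if not name.startswith('_')]
print(public[:10])
```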
This article is a repost. Original source: https://blog.csdn.net/ly_ysys629/article/details/72553273