Learning pandas (3): grouping

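groupby(): splits the rows of a DataFrame into groups that share the same key value(s), working on rows (the index).
.agg(): applies one or more aggregation functions (for example 'mean', 'min', 'max') to the columns of a Series, a DataFrame, or each group produced by groupby().
.apply(): calls a function on the data; the argument is usually a lambda expression or a named function.
.div(): divides every value in a DataFrame by the given value. The divisor can also be another Series or DataFrame, in which case the two are aligned on their index first.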
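The four methods above can be tried together on a tiny table. This is only a sketch: the DataFrame `df` and its columns `team` and `score` are invented for illustration and are not one of the datasets used in this post.

```python
import pandas as pd

# Hypothetical toy data, made up for this example
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1, 3, 10, 30],
})

# groupby(): split the rows by team, then select the score column
grouped = df.groupby("team")["score"]

# agg(): several summary statistics per group at once
summary = grouped.agg(["min", "max", "mean"])

# apply(): run an arbitrary function on each group (here: max minus min)
spread = grouped.apply(lambda s: s.max() - s.min())

# div(): divide each group's total by the grand total
group_sums = grouped.sum()
share = group_sums.div(group_sums.sum())

print(summary)
print(spread)
print(share)
```

`summary` has one row per team and one column per statistic, while `spread` and `share` are Series indexed by team.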

1. Dataset 1 (the drinks DataFrame)

1.1 Which continent drinks more beer on average?

drinks.groupby('continent').beer_servings.mean()

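1.2 Print the mean alcohol consumption per continent for every column

# numeric_only=True skips non-numeric columns; pandas 2.x raises a TypeError without it
drinks.groupby('continent').mean(numeric_only=True)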

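1.3 Print the median alcohol consumption per continent for every column

# numeric_only=True skips non-numeric columns; pandas 2.x raises a TypeError without it
drinks.groupby('continent').median(numeric_only=True)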

1.4 Print the mean, minimum and maximum spirit consumption per continent

drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])
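To see what these calls return without the real dataset, here is a minimal sketch on a made-up table. The column names `continent`, `beer_servings` and `spirit_servings` come from the code above; the `country` column and all the values are assumptions invented for this example.

```python
import pandas as pd

# Hypothetical stand-in for the drinks data (values are invented)
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["EU", "EU", "AS", "AS"],
    "beer_servings": [200, 100, 50, 30],
    "spirit_servings": [100, 60, 10, 30],
})

# Mean of a single column per continent
beer_mean = drinks.groupby("continent").beer_servings.mean()

# Mean of every numeric column per continent; the text column is skipped
all_means = drinks.groupby("continent").mean(numeric_only=True)

# Several statistics for one column
spirit_stats = drinks.groupby("continent").spirit_servings.agg(["mean", "min", "max"])

print(beer_mean)
print(all_means)
print(spirit_stats)
```

Passing a list to agg() gives one output column per function, so `spirit_stats` has the columns mean, min and max.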

2. Dataset 2 (the users DataFrame)

2.1 Compute the mean age per occupation

users.groupby('occupation').age.mean()

2.2 Find the percentage of men in each occupation and sort it from highest to lowest

# Map gender to a number: 1 for male, 0 for female
def gender_to_numeric(x):
    if x == 'M':
        return 1
    if x == 'F':
        return 0

# apply() takes a function (or a lambda) and calls it on every element;
# the result is stored in a new column
users['gender_n'] = users['gender'].apply(gender_to_numeric)

# Number of men per occupation divided by the number of users per occupation, as a percentage
a = users.groupby('occupation').gender_n.sum() / users.occupation.value_counts() * 100

# Sort from the most male occupation to the least
a.sort_values(ascending=False)
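The same percentage can also be obtained without a helper column, because the mean of a boolean Series is the fraction of True values. The sketch below is an alternative to the approach above, not a replacement for it, and the small `users` table is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the users data (values are invented)
users = pd.DataFrame({
    "occupation": ["doctor", "doctor", "artist", "artist", "artist", "artist"],
    "gender": ["M", "M", "M", "F", "F", "F"],
})

# True where the user is male
is_male = users["gender"].eq("M")

# Fraction of True values per occupation, scaled to a percentage and sorted
male_pct = (
    is_male.groupby(users["occupation"])
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)

print(male_pct)
```

Grouping one Series by another works because both share the same row index.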

2.3 Compute the mean age for every combination of occupation and gender

users.groupby(['occupation', 'gender']).age.mean()

2.4 Compute the minimum and maximum age for each occupation

# agg() can aggregate a Series, a DataFrame, or the result of groupby().
# Here groupby() splits the rows by occupation and agg() computes min and max of the age column.
users.groupby('occupation').age.agg(['min', 'max'])

2.5 Percentage of men and women in each occupation

# Group by 'occupation' and 'gender', then count the rows in each group
gender_ocup = users.groupby(['occupation', 'gender']).agg({'gender': 'count'})
gender_ocup.head()

# Count the rows of every column for each occupation
occup_count = users.groupby(['occupation']).agg('count')
occup_count.head()

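# Divide gender_ocup by occup_count and multiply by 100.
# div() divides element by element; level="occupation" matches each (occupation, gender) row
# with the total of its occupation. Only the 'gender' column exists in both frames,
# so that is the column holding the percentages.
occup_gender = gender_ocup.div(occup_count, level="occupation") * 100
occup_gender.head()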

# Show every row of the 'gender' column
occup_gender.loc[:, 'gender']
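A shorter route to the same kind of table is value_counts(normalize=True) on the grouped column, which returns the share of each gender within each occupation directly. This is a sketch of an alternative, assuming a small invented `users` table rather than the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the users data (values are invented)
users = pd.DataFrame({
    "occupation": ["doctor", "doctor", "artist", "artist", "artist", "artist"],
    "gender": ["M", "M", "M", "F", "F", "F"],
})

# Share of each gender within each occupation, as a percentage.
# The result is a Series indexed by (occupation, gender).
ratio = users.groupby("occupation")["gender"].value_counts(normalize=True).mul(100)

# One row per occupation, one column per gender; missing combinations become 0
table = ratio.unstack(fill_value=0)

print(ratio)
print(table)
```

Combinations that never occur (here, female doctors) are absent from `ratio` and only appear as 0 after unstack(fill_value=0).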

3. Dataset 3 (the regiment DataFrame)

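3.1 Mean values for the Nighthawks regiment

# Keep only the Nighthawks rows, then average the numeric columns
# (numeric_only=True is needed in pandas 2.x when text columns are present)
regiment[regiment['regiment'] == 'Nighthawks'].groupby('regiment').mean(numeric_only=True)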

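3.2 Show the mean preTestScore grouped by regiment and company, without a hierarchical index

# Grouping by two keys gives a two-level index; unstack() moves the inner level (company) into the columns
regiment.groupby(['regiment', 'company']).preTestScore.mean().unstack()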
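The effect of unstack() is easiest to see by comparing the result before and after. The sketch below uses a small invented `regiment` table; the column names match the code above, and the values are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for the regiment data (values are invented)
regiment = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
    "company": ["1st", "1st", "2nd", "1st", "2nd"],
    "preTestScore": [4, 24, 31, 3, 2],
})

# Series with a two-level index: (regiment, company)
stacked = regiment.groupby(["regiment", "company"]).preTestScore.mean()

# unstack(): regiments stay as rows, companies become columns
wide = stacked.unstack()

# reset_index(): another way to drop the hierarchy, keeping a long table
flat = stacked.reset_index()

print(stacked)
print(wide)
print(flat)
```

unstack() gives a wide table that is convenient for reading, while reset_index() keeps one row per (regiment, company) pair with ordinary columns.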

3.3 Iterate over the groups and print each regiment's name and all of its data

# Group the DataFrame by regiment; each iteration yields the group key and the rows of that group
for name, group in regiment.groupby('regiment'):
    # print the name of the regiment
    print(name)
    # print the data of that regiment
    print(group)
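As a runnable illustration of the loop above, the sketch below iterates over an invented `regiment` table and records the size of each group. It also shows get_group(), which fetches a single group by key without looping. The data values are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for the regiment data (values are invented)
regiment = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
    "company": ["1st", "1st", "2nd", "1st", "2nd"],
    "preTestScore": [4, 24, 31, 3, 2],
})

grouped = regiment.groupby("regiment")

# Each iteration yields (group key, DataFrame with the rows of that group)
sizes = {}
for name, group in grouped:
    sizes[name] = len(group)
    print(name)
    print(group)

# Fetch one group directly by its key
nighthawks = grouped.get_group("Nighthawks")
```

Groups are visited in sorted key order by default, so Dragoons is printed before Nighthawks.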
