100 pandas examples, well worth bookmarking!

This article is long. If you cannot get through it in one sitting, bookmark it; you will need it sooner or later.

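  • How to create a Series from lists and dictionaries
  • Creating a Series from a list
  • Creating a Series with the name parameter
  • Creating a Series from a shorthand list
  • Creating a Series from a dictionary
  • How to create a Series with NumPy functions
  • How to get the index and values of a Series
  • How to specify the index when creating a Series
  • How to get the size and shape of a Series
  • How to get the first or last few rows of a Series
  • Head()
  • Tail()
  • Take()
  • Getting a subset of a Series by slicing
  • How to create a DataFrame
  • How to set the index and column information of a DataFrame
  • How to rename the columns of a DataFrame
  • How to select or filter rows from a DataFrame based on column values
  • Filtering multiple rows in a DataFrame with "isin"
  • Iterating over the rows and columns of a DataFrame
  • How to drop DataFrame columns by name or index
  • Adding a new column to a DataFrame
  • How to get the list of column headers from a DataFrame
  • How to generate a random DataFrame
  • How to select multiple columns of a DataFrame
  • How to convert a dictionary to a DataFrame
  • Slicing with loc
  • Checking whether a DataFrame is empty
  • Specifying the index and column names when creating a DataFrame
  • Slicing with iloc
  • The difference between iloc and loc
  • Creating an empty DataFrame with a time index
  • How to change the column order of a DataFrame
  • Checking the data types of DataFrame columns
  • Changing the data type of a specific DataFrame column
  • How to convert a column's data type to DateTime
  • Converting a DataFrame column from floats to ints
  • How to convert a dates column to DateTime
  • Combining two DataFrames
  • Appending an extra row at the end of a DataFrame
  • Adding a new row at a specified index
  • How to add rows with a for loop
  • Adding a row at the top of a DataFrame
  • How to add rows to a DataFrame dynamically
  • Inserting a row at an arbitrary position
  • Adding rows to a DataFrame with a timestamp index
  • Filling missing values for different rows
  • append, concat and combine_first examples
  • Getting the mean of rows and columns
  • Calculating the sum of rows and columns
  • Concatenating two columns
  • Filtering rows that contain a given string
  • Filtering rows whose index contains a given string
  • Filtering rows containing specific string values with the AND operator
  • Finding all rows that contain a given string
  • Creating another column equal to the string if a row value contains that string
  • Counting the rows in each pandas group
  • Checking whether a string is in a DataFrame
  • Getting unique row values from a DataFrame column
  • Counting the distinct values of a DataFrame column
  • Dropping rows with duplicate indices
  • Dropping rows with duplicate values in certain columns
  • Getting a value from a DataFrame cell
  • Getting a scalar value from a cell using conditional indexing on a DataFrame
  • Setting the value of a specific DataFrame cell
  • Getting a cell value from a DataFrame row
  • Replacing values in a DataFrame column with a dictionary
  • Counting the values of one column based on another column
  • Handling missing values in a DataFrame
  • Dropping rows that contain any missing data
  • Dropping DataFrame columns with missing data
  • Sorting index values in descending order
  • Sorting columns in descending order
  • Finding the rank of elements in a DataFrame with the rank method
  • Setting an index on multiple columns
  • Determining the period index and columns of a DataFrame
  • Importing a CSV with a specific index
  • Writing a DataFrame to CSV
  • Reading specific columns of a CSV file with pandas
  • Getting the list of CSV columns with pandas
  • Finding the row with the maximum column value
  • Complex conditional selection with the query method
  • Checking whether a column exists in pandas
  • Finding the n-smallest and n-largest values of a specific column in a DataFrame
  • Finding the minimum and maximum of all columns in a DataFrame
  • Finding the index positions of the minimum and maximum values in a DataFrame
  • Calculating the cumulative product and cumulative sum of DataFrame columns
  • Summary statistics
  • Finding the mean, median and mode of a DataFrame
  • Measuring the variance and standard deviation of DataFrame columns
  • Calculating the covariance between DataFrame columns
  • Calculating the correlation between two DataFrame objects in pandas
  • Calculating the percentage change of each cell of a DataFrame column
  • Forward and backward filling missing values of a DataFrame column in pandas
  • Stacking with a non-hierarchical index in pandas
  • Unstacking with a hierarchical index in pandas
  • Getting table data from an HTML page with pandas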
1. How to create a Series from lists and dictionaries
Creating a Series from a list
import pandas as pd

ser1 = pd.Series([1.5, 2.5, 3, 4.5, 5.0, 6])
print(ser1)

Output:
0    1.5
1    2.5
2    3.0
3    4.5
4    5.0
5    6.0
dtype: float64

Creating a Series with the name parameter
import pandas as pd

ser2 = pd.Series(["India", "Canada", "Germany"], name="Countries")
print(ser2)

Output:
0      India
1     Canada
2    Germany
Name: Countries, dtype: object

Creating a Series from a shorthand list
import pandas as pd

# ["A"] * 4 is shorthand for ["A", "A", "A", "A"]
ser3 = pd.Series(["A"] * 4)
print(ser3)

Output:
0    A
1    A
2    A
3    A
dtype: object

Creating a Series from a dictionary
import pandas as pd

# The dictionary keys become the index, the values become the data
ser4 = pd.Series({"India": "New Delhi",
                  "Japan": "Tokyo",
                  "UK": "London"})
print(ser4)

Output:
India    New Delhi
Japan        Tokyo
UK          London
dtype: object

2. How to create a Series with NumPy functions
import pandas as pd
import numpy as np

ser1 = pd.Series(np.linspace(1, 10, 5))
print(ser1)

ser2 = pd.Series(np.random.normal(size=5))
print(ser2)

Output:
0     1.00
1     3.25
2     5.50
3     7.75
4    10.00
dtype: float64
0   -1.694452
1   -1.570006
2    1.713794
3    0.338292
4    0.803511
dtype: float64

3. How to get the index and values of a Series
import pandas as pd
import numpy as np

ser1 = pd.Series({"India": "New Delhi",
                  "Japan": "Tokyo",
                  "UK": "London"})

print(ser1.values)
print(ser1.index)

print("\n")

# Without an explicit index, pandas assigns a RangeIndex
ser2 = pd.Series(np.random.normal(size=5))
print(ser2.index)
print(ser2.values)

Output:
['New Delhi' 'Tokyo' 'London']
Index(['India', 'Japan', 'UK'], dtype='object')


RangeIndex(start=0, stop=5, step=1)
[ 0.66265478 -0.72222211  0.3608642   1.40955436  1.3096732 ]

4. How to specify the index when creating a Series
import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print(ser1)

Output:
IND        India
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
FRA       France
dtype: object

5. How to get the size and shape of a Series
import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print(len(ser1))

print(ser1.shape)

print(ser1.size)

Output:
6
(6,)
6

6. How to get the first or last few rows of a Series
Head()
import pandas as pd

values = ["India", "Canada", "Australia",
          "Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

# head() returns the first 5 rows by default
print("-----Head()-----")
print(ser1.head())

print("\n\n-----Head(2)-----")
print(ser1.head(2))

Output:
-----Head()-----
IND        India
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
dtype: object


-----Head(2)-----
IND     India
CAN    Canada
dtype: object

Tail()
import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

# tail() returns the last 5 rows by default
print("-----Tail()-----")
print(ser1.tail())

print("\n\n-----Tail(2)-----")
print(ser1.tail(2))

Output:
-----Tail()-----
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
FRA       France
dtype: object


-----Tail(2)-----
GER    Germany
FRA     France
dtype: object

Take()
import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print("-----Take()-----")
print(ser1.take([2, 4, 5]))

Output:
-----Take()-----
AUS    Australia
GER      Germany
FRA       France
dtype: object

7. Getting a subset of a Series by slicing
import pandas as pd

num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

idx = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

series = pd.Series(num, index=idx)

# Integer slices on a non-integer index are positional: [start:stop:step]
print("\n [2:4] \n")
print(series[2:4])

print("\n [1:6:2] \n")
print(series[1:6:2])

print("\n [:6] \n")
print(series[:6])

print("\n [4:] \n")
print(series[4:])

print("\n [:4:2] \n")
print(series[:4:2])

print("\n [4::2] \n")
print(series[4::2])

print("\n [::-1] \n")
print(series[::-1])

Output:
 [2:4]

C    200
D    300
dtype: int64

 [1:6:2]

B    100
D    300
F    500
dtype: int64

 [:6]

A      0
B    100
C    200
D    300
E    400
F    500
dtype: int64

 [4:]

E    400
F    500
G    600
H    700
I    800
J    900
dtype: int64

 [:4:2]

A      0
C    200
dtype: int64

 [4::2]

E    400
G    600
I    800
dtype: int64

 [::-1]

J    900
I    800
H    700
G    600
F    500
E    400
D    300
C    200
B    100
A      0
dtype: int64
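
The slices above are positional, so the stop position is excluded. Label-based slicing behaves differently, which the following minimal sketch tries to show. The shortened `series` here is an illustrative assumption, not the data from the example above; `iloc` and `loc` make the positional or label intent explicit.

```python
import pandas as pd

# Illustrative data (assumption): a shorter version of the Series above
series = pd.Series([0, 100, 200, 300, 400], index=list("ABCDE"))

# Positional slice: the stop position (3) is excluded
by_position = series.iloc[1:3]

# Label slice: both endpoint labels are included
by_label = series.loc["B":"D"]

print(by_position.index.tolist())
print(by_label.index.tolist())
```

Spelling out `iloc` or `loc` avoids any ambiguity about whether a slice refers to positions or labels.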

8. How to create a DataFrame
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002"],
    "Name": ["John Doe", "William Spark"],
    "Occupation": ["Chemist", "Statistician"],
    "Date Of Join": ["2018-01-25", "2018-01-26"],
    "Age": [23, 24]})

print(employees)

Output:
  EmpCode           Name    Occupation Date Of Join  Age
0  Emp001       John Doe       Chemist   2018-01-25   23
1  Emp002  William Spark  Statistician   2018-01-26   24

9. How to set the index and column information of a DataFrame
import pandas as pd

employees = pd.DataFrame(
    data={"Name": ["John Doe", "William Spark"],
          "Occupation": ["Chemist", "Statistician"],
          "Date Of Join": ["2018-01-25", "2018-01-26"],
          "Age": [23, 24]},
    index=["Emp001", "Emp002"],
    columns=["Name", "Occupation", "Date Of Join", "Age"])

print(employees)

Output:
                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24

10. How to rename the columns of a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002"],
    "Name": ["John Doe", "William Spark"],
    "Occupation": ["Chemist", "Statistician"],
    "Date Of Join": ["2018-01-25", "2018-01-26"],
    "Age": [23, 24]})

# The new names are assigned by position, so they must follow the existing column order
employees.columns = ["EmpCode", "EmpName", "EmpOccupation", "EmpDOJ", "EmpAge"]

print(employees)

Output:
  EmpCode        EmpName EmpOccupation      EmpDOJ  EmpAge
0  Emp001       John Doe       Chemist  2018-01-25      23
1  Emp002  William Spark  Statistician  2018-01-26      24

11. How to select or filter rows from a DataFrame based on column values
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print("\nUse == operator\n")
print(employees.loc[employees["Age"] == 23])

print("\nUse < operator\n")
print(employees.loc[employees["Age"] < 30])

print("\nUse != operator\n")
print(employees.loc[employees["Occupation"] != "Statistician"])

# Each condition needs its own parentheses when combined with &
print("\nMultiple Conditions\n")
print(employees.loc[(employees["Occupation"] != "Statistician") &
                    (employees["Name"] == "John")])

Output:
Use == operator

  EmpCode  Name Occupation Date Of Join  Age
0  Emp001  John    Chemist   2018-01-25   23

Use < operator

  EmpCode   Name    Occupation Date Of Join  Age
0  Emp001   John       Chemist   2018-01-25   23
1  Emp002    Doe  Statistician   2018-01-26   24
3  Emp004  Spark  Statistician   2018-02-26   29

Use != operator

  EmpCode  Name  Occupation Date Of Join  Age
0  Emp001  John     Chemist   2018-01-25   23
4  Emp005  Mark  Programmer   2018-03-16   40

Multiple Conditions

  EmpCode  Name Occupation Date Of Join  Age
0  Emp001  John    Chemist   2018-01-25   23

12. Filtering multiple rows in a DataFrame with "isin"
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print("\nUse isin operator\n")
print(employees.loc[employees["Occupation"].isin(["Chemist", "Programmer"])])

print("\nMultiple Conditions\n")
print(employees.loc[(employees["Occupation"] == "Chemist") |
                    (employees["Name"] == "John") &
                    (employees["Age"] < 30)])

Output:
Use isin operator

  EmpCode  Name  Occupation Date Of Join  Age
0  Emp001  John     Chemist   2018-01-25   23
4  Emp005  Mark  Programmer   2018-03-16   40

Multiple Conditions

  EmpCode  Name Occupation Date Of Join  Age
0  Emp001  John    Chemist   2018-01-25   23
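
The "Multiple Conditions" filter above mixes `|` and `&` without grouping parentheses. In Python `&` binds more tightly than `|`, so the grouping can matter. The sketch below uses a small made-up table (`people` and its values are assumptions for illustration only) in which the two groupings give different results.

```python
import pandas as pd

# Hypothetical data chosen so that the two groupings differ
people = pd.DataFrame({
    "Name": ["John", "Doe", "John"],
    "Occupation": ["Chemist", "Chemist", "Programmer"],
    "Age": [23, 45, 52],
})

is_chemist = people["Occupation"] == "Chemist"
is_john = people["Name"] == "John"
is_young = people["Age"] < 30

# & is evaluated before |, so this means is_chemist | (is_john & is_young)
implicit = people.loc[is_chemist | is_john & is_young]

# Explicit parentheses force the other grouping
explicit = people.loc[(is_chemist | is_john) & is_young]

print(implicit.index.tolist())
print(explicit.index.tolist())
```

Naming each condition and adding explicit parentheses keeps the intended logic visible.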

13. Iterating over the rows and columns of a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# iterrows() yields (index, row) pairs, where each row is a Series
print("\n Example iterrows \n")
for index, row in employees.iterrows():
    print(row["Name"], "--", row["Age"])


# itertuples() yields one namedtuple per row
print("\n Example itertuples \n")
for row in employees.itertuples(index=True, name="Pandas"):
    print(getattr(row, "Name"), "--", getattr(row, "Age"))

Output:
Example iterrows

John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40

Example itertuples

John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40

14. How to drop DataFrame columns by name or index
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print(employees)

print("\n Drop Column by Name \n")
employees.drop("Age", axis=1, inplace=True)
print(employees)

# columns[[0, 1]] selects the first two remaining columns by position
print("\n Drop Column by Index \n")
employees.drop(employees.columns[[0, 1]], axis=1, inplace=True)
print(employees)

Output:
  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     John       Chemist   2018-01-25   23
1  Emp002      Doe  Statistician   2018-01-26   24
2  Emp003  William  Statistician   2018-01-26   34
3  Emp004    Spark  Statistician   2018-02-26   29
4  Emp005     Mark    Programmer   2018-03-16   40

 Drop Column by Name

  EmpCode     Name    Occupation Date Of Join
0  Emp001     John       Chemist   2018-01-25
1  Emp002      Doe  Statistician   2018-01-26
2  Emp003  William  Statistician   2018-01-26
3  Emp004    Spark  Statistician   2018-02-26
4  Emp005     Mark    Programmer   2018-03-16

 Drop Column by Index

     Occupation Date Of Join
0       Chemist   2018-01-25
1  Statistician   2018-01-26
2  Statistician   2018-01-26
3  Statistician   2018-02-26
4    Programmer   2018-03-16

15. Adding a new column to a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# The list must have one value per existing row
employees["City"] = ["London", "Tokyo", "Sydney", "London", "Toronto"]

print(employees)

Output:
  EmpCode     Name    Occupation Date Of Join  Age     City
0  Emp001     John       Chemist   2018-01-25   23   London
1  Emp002      Doe  Statistician   2018-01-26   24    Tokyo
2  Emp003  William  Statistician   2018-01-26   34   Sydney
3  Emp004    Spark  Statistician   2018-02-26   29   London
4  Emp005     Mark    Programmer   2018-03-16   40  Toronto

16. How to get the list of column headers from a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# Three equivalent ways to get the column names as a list
print(list(employees))

print(list(employees.columns.values))

print(employees.columns.tolist())

Output:
['EmpCode', 'Name', 'Occupation', 'Date Of Join', 'Age']
['EmpCode', 'Name', 'Occupation', 'Date Of Join', 'Age']
['EmpCode', 'Name', 'Occupation', 'Date Of Join', 'Age']

17. How to generate a random DataFrame
import pandas as pd
import numpy as np

# Fix the seed so the random values are reproducible
np.random.seed(5)

df_random = pd.DataFrame(np.random.randint(100, size=(10, 6)),
                         columns=list("ABCDEF"),
                         index=["Row-{}".format(i) for i in range(10)])

print(df_random)

Output:
        A   B   C   D   E   F
Row-0  99  78  61  16  73   8
Row-1  62  27  30  80   7  76
Row-2  15  53  80  27  44  77
Row-3  75  65  47  30  84  86
Row-4  18   9  41  62   1  82
Row-5  16  78   5  58   0  80
Row-6   4  36  51  27  31   2
Row-7  68  38  83  19  18   7
Row-8  30  62  11  67  65  55
Row-9   3  91  78  27  29  33

18. How to select multiple columns of a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# Pass a list of column names; the result keeps the order of that list
df = employees[["EmpCode", "Age", "Name"]]
print(df)

Output:
  EmpCode  Age     Name
0  Emp001   23     John
1  Emp002   24      Doe
2  Emp003   34  William
3  Emp004   29    Spark
4  Emp005   40     Mark

19. How to convert a dictionary to a DataFrame
import pandas as pd

data = {"Age": [30, 20, 22, 40, 32, 28, 39],
        "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                  "Red"],
        "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                 "Melon", "Beans"],
        "Height": [165, 70, 120, 80, 180, 172, 150],
        "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
        "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
        }
print(data)

df = pd.DataFrame(data)

print(df)

Output:
{'Age': [30, 20, 22, 40, 32, 28, 39], 'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'], 'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'], 'Height': [165, 70, 120, 80, 180, 172, 150], 'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2], 'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']}
   Age  Color    Food  Height  Score State
0   30   Blue   Steak     165    4.6    NY
1   20  Green    Lamb      70    8.3    TX
2   22    Red   Mango     120    9.0    FL
3   40  White   Apple      80    3.3    AL
4   32   Gray  Cheese     180    1.8    AK
5   28  Black   Melon     172    9.5    TX
6   39    Red   Beans     150    2.2    TX

20. Slicing with loc
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n -- Selecting a single row with .loc with a string -- \n")
print(df.loc["Penelope"])

print("\n -- Selecting multiple rows with .loc with a list of strings -- \n")
print(df.loc[["Cornelia", "Jane", "Dean"]])

# A label slice includes both endpoints
print("\n -- Selecting multiple rows with .loc with slice notation -- \n")
print(df.loc["Aaron":"Dean"])

Output:
 -- Selecting a single row with .loc with a string --

Age          40
Color     White
Food      Apple
Height       80
Score       3.3
State        AL
Name: Penelope, dtype: object

 -- Selecting multiple rows with .loc with a list of strings --

          Age Color    Food  Height  Score State
Cornelia   39   Red   Beans     150    2.2    TX
Jane       30  Blue   Steak     165    4.6    NY
Dean       32  Gray  Cheese     180    1.8    AK

 -- Selecting multiple rows with .loc with slice notation --

          Age  Color    Food  Height  Score State
Aaron      22    Red   Mango     120    9.0    FL
Penelope   40  White   Apple      80    3.3    AL
Dean       32   Gray  Cheese     180    1.8    AK

21. Checking whether a DataFrame is empty
import pandas as pd

df = pd.DataFrame()

# .empty is True when the DataFrame has no rows or no columns
if df.empty:
    print("DataFrame is empty!")

Output:
DataFrame is empty!

22. Specifying the index and column names when creating a DataFrame
import pandas as pd

values = ["India", "Canada", "Australia",
          "Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

df = pd.DataFrame(values, index=code, columns=["Country"])

print(df)

Output:
       Country
IND      India
CAN     Canada
AUS  Australia
JAP      Japan
GER    Germany
FRA     France

23. Slicing with iloc
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n -- Selecting a single row with .iloc with an integer -- \n")
print(df.iloc[4])

# Negative positions count from the end
print("\n -- Selecting multiple rows with .iloc with a list of integers -- \n")
print(df.iloc[[2, -2]])

print("\n -- Selecting multiple rows with .iloc with slice notation -- \n")
print(df.iloc[:5:3])

Output:
 -- Selecting a single row with .iloc with an integer --

Age           32
Color       Gray
Food      Cheese
Height       180
Score        1.8
State         AK
Name: Dean, dtype: object

 -- Selecting multiple rows with .iloc with a list of integers --

           Age  Color   Food  Height  Score State
Aaron       22    Red  Mango     120    9.0    FL
Christina   28  Black  Melon     172    9.5    TX

 -- Selecting multiple rows with .iloc with slice notation --

          Age  Color   Food  Height  Score State
Jane       30   Blue  Steak     165    4.6    NY
Penelope   40  White  Apple      80    3.3    AL

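24. The difference between iloc and loc
  • loc gets rows (or columns) with particular labels from the index
  • iloc gets rows (or columns) at particular positions in the index (so it only accepts integers)
  • The loc indexer can also do boolean selection. For example, to find all rows where Age is less than 30 and return only the Color and Height columns, we can do the following. The same can be done with iloc, but iloc does not accept a boolean Series; the boolean Series must first be converted to a NumPy array
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

# loc: boolean Series for the rows, column labels for the columns
print("\n -- loc -- \n")
print(df.loc[df["Age"] < 30, ["Color", "Height"]])

# iloc: boolean NumPy array for the rows, column positions for the columns
print("\n -- iloc -- \n")
print(df.iloc[(df["Age"] < 30).values, [1, 3]])

Output:
 -- loc --

           Color  Height
Nick       Green      70
Aaron        Red     120
Christina  Black     172

 -- iloc --

           Color  Height
Nick       Green      70
Aaron        Red     120
Christina  Black     172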
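
The label-versus-position distinction is easiest to see when the index labels are themselves integers that do not start at 0. The following is a minimal sketch; the Series `s` and its labels are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

by_label = s.loc[10]      # looks up the label 10
by_position = s.iloc[0]   # looks up the first position
last = s.iloc[-1]         # positions can be negative

# There is no label 0, so a label lookup of 0 raises KeyError
try:
    s.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label, by_position, last, label_zero_found)
```

Both `s.loc[10]` and `s.iloc[0]` reach the same element here, but they get there by different rules.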

25. Creating an empty DataFrame with a time index
import datetime
import pandas as pd

# The index starts at today's date, so the dates in the output depend on when this runs
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date, periods=10, freq="D")

columns = ["A", "B", "C"]

df = pd.DataFrame(index=index, columns=columns)
df = df.fillna(0)

print(df)

Output:
            A  B  C
2018-09-30  0  0  0
2018-10-01  0  0  0
2018-10-02  0  0  0
2018-10-03  0  0  0
2018-10-04  0  0  0
2018-10-05  0  0  0
2018-10-06  0  0  0
2018-10-07  0  0  0
2018-10-08  0  0  0
2018-10-09  0  0  0

26. How to change the column order of a DataFrame
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

# Reorder by column position
print("\n -- Change order using columns -- \n")
new_order = [3, 2, 1, 4, 5, 0]
df = df[df.columns[new_order]]
print(df)

# Reorder by column name
print("\n -- Change order using reindex -- \n")
df = df.reindex(["State", "Color", "Age", "Food", "Score", "Height"], axis=1)
print(df)

Output:
 -- Change order using columns --

           Height    Food  Color  Score State  Age
Jane          165   Steak   Blue    4.6    NY   30
Nick           70    Lamb  Green    8.3    TX   20
Aaron         120   Mango    Red    9.0    FL   22
Penelope       80   Apple  White    3.3    AL   40
Dean          180  Cheese   Gray    1.8    AK   32
Christina     172   Melon  Black    9.5    TX   28
Cornelia      150   Beans    Red    2.2    TX   39

 -- Change order using reindex --

          State  Color  Age    Food  Score  Height
Jane         NY   Blue   30   Steak    4.6     165
Nick         TX  Green   20    Lamb    8.3      70
Aaron        FL    Red   22   Mango    9.0     120
Penelope     AL  White   40   Apple    3.3      80
Dean         AK   Gray   32  Cheese    1.8     180
Christina    TX  Black   28   Melon    9.5     172
Cornelia     TX    Red   39   Beans    2.2     150

27. Checking the data types of DataFrame columns
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print(df.dtypes)

Output:
Age         int64
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object

28. Changing the data type of a specific DataFrame column
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print(df.dtypes)

# Convert the integer Age column to strings
df["Age"] = df["Age"].astype(str)

print(df.dtypes)

Output:
Age         int64
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object
Age        object
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object

29. How to convert a column's data type to DateTime
import pandas as pd

df = pd.DataFrame({"DateOFBirth": [1349720105, 1349806505, 1349892905,
                                   1349979305, 1350065705, 1349792905,
                                   1349730105],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)

# unit="s" treats the integers as seconds since the Unix epoch
df["DateOFBirth"] = pd.to_datetime(df["DateOFBirth"], unit="s")

print("\n----------------After----------------\n")
print(df.dtypes)
print(df)

Output:
----------------Before---------------

DateOFBirth     int64
State          object
dtype: object
           DateOFBirth State
Jane        1349720105    NY
Nick        1349806505    TX
Aaron       1349892905    FL
Penelope    1349979305    AL
Dean        1350065705    AK
Christina   1349792905    TX
Cornelia    1349730105    TX

----------------After----------------

DateOFBirth    datetime64[ns]
State                  object
dtype: object
                  DateOFBirth State
Jane      2012-10-08 18:15:05    NY
Nick      2012-10-09 18:15:05    TX
Aaron     2012-10-10 18:15:05    FL
Penelope  2012-10-11 18:15:05    AL
Dean      2012-10-12 18:15:05    AK
Christina 2012-10-09 14:28:25    TX
Cornelia  2012-10-08 21:01:45    TX

30. Converting a DataFrame column from floats to ints
import pandas as pd

df = pd.DataFrame({"DailyExp": [75.7, 56.69, 55.69, 96.5, 84.9, 110.5,
                                58.9],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)

# astype(int) drops the fractional part; the integer width (int32 or int64) depends on the platform
df["DailyExp"] = df["DailyExp"].astype(int)

print("\n----------------After----------------\n")
print(df.dtypes)
print(df)

Output:
----------------Before---------------

DailyExp    float64
State        object
dtype: object
           DailyExp State
Jane          75.70    NY
Nick          56.69    TX
Aaron         55.69    FL
Penelope      96.50    AL
Dean          84.90    AK
Christina    110.50    TX
Cornelia      58.90    TX

----------------After----------------

DailyExp     int32
State       object
dtype: object
           DailyExp State
Jane             75    NY
Nick             56    TX
Aaron            55    FL
Penelope         96    AL
Dean             84    AK
Christina       110    TX
Cornelia         58    TX
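
As the output above shows, `astype(int)` truncates rather than rounds (75.7 becomes 75). If rounding to the nearest integer is what is wanted, one option is to call `round()` first. The sketch below uses a small made-up Series (`exp` is an assumption, not the data above) to contrast the two.

```python
import pandas as pd

# Hypothetical values, including a negative one to show truncation toward zero
exp = pd.Series([75.7, 56.69, 55.69, -1.7])

truncated = exp.astype(int)        # drops the fractional part
rounded = exp.round().astype(int)  # rounds to the nearest integer first

print(truncated.tolist())
print(rounded.tolist())
```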

31. How to convert a dates column to DateTime
import pandas as pd

df = pd.DataFrame({"DateOfBirth": ["1986-11-11", "1999-05-12", "1976-01-01",
                                   "1986-06-01", "1983-06-04", "1990-03-07",
                                   "1999-07-09"],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n----------------Before---------------\n")
print(df.dtypes)

# Recent pandas versions require an explicit unit such as [ns] in the dtype string
df["DateOfBirth"] = df["DateOfBirth"].astype("datetime64[ns]")

print("\n----------------After----------------\n")
print(df.dtypes)

Output:
----------------Before---------------

DateOfBirth    object
State          object
dtype: object

----------------After----------------

DateOfBirth    datetime64[ns]
State                  object
dtype: object
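
An alternative to `astype` for date strings is `pd.to_datetime`, which can also tolerate malformed entries. This is a minimal sketch under assumed data: the `raw` Series and its deliberately invalid last value are made up for illustration.

```python
import pandas as pd

# Hypothetical input with one value that is not a valid date
raw = pd.Series(["1986-11-11", "1999-05-12", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().tolist())
```

Checking `parsed.isna()` afterwards shows which rows failed to parse.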

32. Combining two DataFrames
import pandas as pd

df1 = pd.DataFrame({"Age": [30, 20, 22, 40], "Height": [165, 70, 120, 80],
                    "Score": [4.6, 8.3, 9.0, 3.3], "State": ["NY", "TX",
                                                             "FL", "AL"]},
                   index=["Jane", "Nick", "Aaron", "Penelope"])

df2 = pd.DataFrame({"Age": [32, 28, 39], "Color": ["Gray", "Black", "Red"],
                    "Food": ["Cheese", "Melon", "Beans"],
                    "Score": [1.8, 9.5, 2.2], "State": ["AK", "TX", "TX"]},
                   index=["Dean", "Christina", "Cornelia"])

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead.
# Columns missing from one frame are filled with NaN.
df3 = pd.concat([df1, df2], sort=True)

print(df3)

Output:
           Age  Color    Food  Height  Score State
Jane        30    NaN     NaN   165.0    4.6    NY
Nick        20    NaN     NaN    70.0    8.3    TX
Aaron       22    NaN     NaN   120.0    9.0    FL
Penelope    40    NaN     NaN    80.0    3.3    AL
Dean        32   Gray  Cheese     NaN    1.8    AK
Christina   28  Black   Melon     NaN    9.5    TX
Cornelia    39    Red   Beans     NaN    2.2    TX

33. Appending an extra row at the end of a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print("\n------------ BEFORE ----------------\n")
print(employees)

# The values must be listed in the same order as the columns
employees.loc[len(employees)] = ["Emp006", "Sunny", "Programmer",
                                 "2018-01-25", 45]

print("\n------------ AFTER ----------------\n")
print(employees)

Output:
------------ BEFORE ----------------

  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     John       Chemist   2018-01-25   23
1  Emp002      Doe  Statistician   2018-01-26   24
2  Emp003  William  Statistician   2018-01-26   34
3  Emp004    Spark  Statistician   2018-02-26   29
4  Emp005     Mark    Programmer   2018-03-16   40

------------ AFTER ----------------

  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     John       Chemist   2018-01-25   23
1  Emp002      Doe  Statistician   2018-01-26   24
2  Emp003  William  Statistician   2018-01-26   34
3  Emp004    Spark  Statistician   2018-02-26   29
4  Emp005     Mark    Programmer   2018-03-16   40
5  Emp006    Sunny    Programmer   2018-01-25   45

34. Adding a new row at a specified index
import pandas as pd

employees = pd.DataFrame(
    data={"Name": ["John Doe", "William Spark"],
          "Occupation": ["Chemist", "Statistician"],
          "Date Of Join": ["2018-01-25", "2018-01-26"],
          "Age": [23, 24]},
    index=["Emp001", "Emp002"],
    columns=["Name", "Occupation", "Date Of Join", "Age"])

print("\n------------ BEFORE ----------------\n")
print(employees)

# Assigning to a label that does not exist yet adds a new row
employees.loc["Emp003"] = ["Sunny", "Programmer", "2018-01-25", 45]

print("\n------------ AFTER ----------------\n")
print(employees)

Output:
------------ BEFORE ----------------

                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24

------------ AFTER ----------------

                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24
Emp003          Sunny    Programmer   2018-01-25   45

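35. How to add rows with a for loop
import pandas as pd

cols = ["Zip"]
lst = []
zip_code = 32100  # renamed from zip to avoid shadowing the built-in

# Collect the rows in a list first, then build the DataFrame once
for a in range(10):
    lst.append([zip_code])
    zip_code = zip_code + 1

df = pd.DataFrame(lst, columns=cols)

print(df)

Output:
     Zip
0  32100
1  32101
2  32102
3  32103
4  32104
5  32105
6  32106
7  32107
8  32108
9  32109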

36. Adding a row at the top of a DataFrame
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp002", "Emp003", "Emp004"],
    "Name": ["John", "Doe", "William"],
    "Occupation": ["Chemist", "Statistician", "Statistician"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26"],
    "Age": [23, 24, 34]})

print("\n------------ BEFORE ----------------\n")
print(employees)

# New line
line = pd.DataFrame({"EmpCode": "Emp001", "Name": "Dean",
                     "Occupation": "Chemist", "Date Of Join": "2018-02-26",
                     "Age": 45}, index=[0])

# Concatenate the two DataFrames (.ix was removed in pandas 1.0,
# so the frame is passed to concat directly)
employees = pd.concat([line, employees]).reset_index(drop=True)

print("\n------------ AFTER ----------------\n")
print(employees)

Output:
------------ BEFORE ----------------

  EmpCode     Name    Occupation Date Of Join  Age
0  Emp002     John       Chemist   2018-01-25   23
1  Emp003      Doe  Statistician   2018-01-26   24
2  Emp004  William  Statistician   2018-01-26   34

------------ AFTER ----------------

  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     Dean       Chemist   2018-02-26   45
1  Emp002     John       Chemist   2018-01-25   23
2  Emp003      Doe  Statistician   2018-01-26   24
3  Emp004  William  Statistician   2018-01-26   34

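37. How to add rows to a DataFrame dynamically
import pandas as pd

df = pd.DataFrame(columns=["Name", "Age"])

# Assigning to a new row label creates the row; unset cells stay NaN
df.loc[1, "Name"] = "Rocky"
df.loc[1, "Age"] = 23

df.loc[2, "Name"] = "Sunny"

print(df)

Output:
    Name  Age
1  Rocky   23
2  Sunny  NaN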

38. Inserting a row at an arbitrary position
import pandas as pd

df = pd.DataFrame(columns=["Name", "Age"])

df.loc[1, "Name"] = "Rocky"
df.loc[1, "Age"] = 21

df.loc[2, "Name"] = "Sunny"
df.loc[2, "Age"] = 22

df.loc[3, "Name"] = "Mark"
df.loc[3, "Age"] = 25

df.loc[4, "Name"] = "Taylor"
df.loc[4, "Age"] = 28

print("\n------------ BEFORE ----------------\n")
print(df)

# Give the new row a fractional index (2.5) so that sorting places it between 2 and 3.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here.
line = pd.DataFrame({"Name": "Jack", "Age": 24}, index=[2.5])
df = pd.concat([df, line], ignore_index=False)
df = df.sort_index().reset_index(drop=True)

df = df.reindex(["Name", "Age"], axis=1)
print("\n------------ AFTER ----------------\n")
print(df)

Output:
------------ BEFORE ----------------

     Name Age
1   Rocky  21
2   Sunny  22
3    Mark  25
4  Taylor  28

------------ AFTER ----------------

     Name Age
0   Rocky  21
1   Sunny  22
2    Jack  24
3    Mark  25
4  Taylor  28
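
The fractional-index trick above relies on sorting. Another way to insert at a given position, sketched here as one possible approach rather than the article's method, is to split the frame with `iloc` and join the pieces with `pd.concat`. The helper name `insert_row` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rocky", "Sunny", "Mark", "Taylor"],
                   "Age": [21, 22, 25, 28]})


def insert_row(frame, position, row):
    """Return a new DataFrame with `row` (a dict) inserted at `position`."""
    new_row = pd.DataFrame([row], columns=frame.columns)
    # Rows before the position, the new row, then the remaining rows
    return pd.concat([frame.iloc[:position], new_row, frame.iloc[position:]],
                     ignore_index=True)


result = insert_row(df, 2, {"Name": "Jack", "Age": 24})
print(result)
```

Because the pieces are joined in order with `ignore_index=True`, no sorting or index reset is needed afterwards.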

39. Adding rows to a DataFrame with a timestamp index
import pandas as pd

df =
