100 pandas Examples: Highly Recommended to Bookmark!

This article is long. If you cannot finish it in one sitting, bookmark it; you will need it sooner or later.

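How to create a Series from lists and dictionaries
Creating a Series from a list
Creating a Series with the name parameter
Creating a Series from a shorthand list
Creating a Series from a dictionary
How to create a Series with NumPy functions
How to get the index and values of a Series
How to specify the index when creating a Series
How to get the size and shape of a Series
How to get the first or last rows of a Series
Head()
Tail()
Take()
Getting a subset of a Series with slicing
How to create a DataFrame
How to set the index and column information of a DataFrame
How to rename the columns of a DataFrame
How to select or filter rows from a DataFrame based on column values
Filtering multiple rows in a DataFrame with "isin"
Iterating over the rows and columns of a DataFrame
How to drop DataFrame columns by name or index
Adding a new column to a DataFrame
How to get the list of column headers from a DataFrame
How to generate a random DataFrame
How to select multiple columns of a DataFrame
How to convert a dictionary to a DataFrame
Slicing with loc
Checking whether a DataFrame is empty
Specifying the index and column names when creating a DataFrame
Slicing with iloc
The difference between iloc and loc
Creating an empty DataFrame with a time index
How to change the order of DataFrame columns
Checking the data types of DataFrame columns
Changing the data type of a specific DataFrame column
How to convert a column's data type to DateTime
Converting a DataFrame column from floats to ints
How to convert a dates column to DateTime
Adding two DataFrames together
Adding extra rows at the end of a DataFrame
Adding a new row at a specified index
How to add rows with a for loop
Adding a row at the top of a DataFrame
How to add rows to a DataFrame dynamically
Inserting a row at an arbitrary position
Adding rows to a DataFrame with a timestamp index
Filling missing values differently for different rows
append, concat, and combine_first examples
Getting the mean of rows and columns
Calculating the sum of rows and columns
Concatenating two columns
Filtering rows that contain a string
Filtering rows whose index contains a string
Filtering rows that contain specific string values with the AND operator
Finding all rows that contain a string
Creating another column equal to the string if a row value contains it
Counting the rows in each pandas group
Checking whether a string is in a DataFrame
Getting unique row values from a DataFrame column
Counting the distinct values of a DataFrame column
Dropping rows with duplicate index values
Dropping rows with duplicate values in certain columns
Getting a value from a DataFrame cell
Getting a scalar cell value with conditional indexing in a DataFrame
Setting a specific cell value of a DataFrame
Getting a cell value from a DataFrame row
Replacing values in a DataFrame column with a dictionary
Counting the values of one column grouped by another column
Handling missing values in a DataFrame
Dropping rows that contain any missing data
Dropping DataFrame columns with missing data
Sorting index values in descending order
Sorting columns in descending order
Finding the rank of elements in a DataFrame with the rank method
Setting an index on multiple columns
Determining the period index and columns of a DataFrame
Importing a CSV with a specific index
Writing a DataFrame to CSV
Reading specific columns of a CSV file with pandas
Getting the list of CSV columns with pandas
Finding the row with the maximum column value
Selecting with complex conditions using the query method
Checking whether a column exists in pandas
Finding the n-smallest and n-largest values of a specific column in a DataFrame
Finding the minimum and maximum of all columns in a DataFrame
Finding the index positions of the minimum and maximum values in a DataFrame
Calculating the cumulative product and cumulative sum of DataFrame columns
Summary statistics
Finding the mean, median, and mode of a DataFrame
Measuring the variance and standard deviation of DataFrame columns
Calculating the covariance between DataFrame columns
Calculating the correlation between two DataFrame objects in pandas
Calculating the percentage change of each cell of a DataFrame column
Forward and backward filling missing values of DataFrame columns in pandas
Stacking with a non-hierarchical index in pandas
Unstacking pandas data with a hierarchical index
Getting table data from an HTML page with pandas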

1. How to create a Series from lists and dictionaries

Creating a Series from a list

import pandas as pd

ser1 = pd.Series([1.5, 2.5, 3, 4.5, 5.0, 6])
print(ser1)

Output:

0    1.5
1    2.5
2    3.0
3    4.5
4    5.0
5    6.0
dtype: float64

Creating a Series with the name parameter

import pandas as pd

ser2 = pd.Series(["India", "Canada", "Germany"], name="Countries")
print(ser2)

Output:

0      India
1     Canada
2    Germany
Name: Countries, dtype: object

Creating a Series from a shorthand list

import pandas as pd

ser3 = pd.Series(["A"]*4)
print(ser3)

Output:

0    A
1    A
2    A
3    A
dtype: object

Creating a Series from a dictionary

import pandas as pd

ser4 = pd.Series({"India": "New Delhi",
                  "Japan": "Tokyo",
                  "UK": "London"})
print(ser4)

Output:

India    New Delhi
Japan        Tokyo
UK          London
dtype: object

2. How to create a Series with NumPy functions

import pandas as pd
import numpy as np

ser1 = pd.Series(np.linspace(1, 10, 5))
print(ser1)

ser2 = pd.Series(np.random.normal(size=5))
print(ser2)

Output:

0     1.00
1     3.25
2     5.50
3     7.75
4    10.00
dtype: float64
0   -1.694452
1   -1.570006
2    1.713794
3    0.338292
4    0.803511
dtype: float64

3. How to get the index and values of a Series

import pandas as pd
import numpy as np

ser1 = pd.Series({"India": "New Delhi",
                  "Japan": "Tokyo",
                  "UK": "London"})

print(ser1.values)
print(ser1.index)

print("\n")

ser2 = pd.Series(np.random.normal(size=5))
print(ser2.index)
print(ser2.values)

Output:

['New Delhi' 'Tokyo' 'London']
Index(['India', 'Japan', 'UK'], dtype='object')


RangeIndex(start=0, stop=5, step=1)
[ 0.66265478 -0.72222211  0.3608642   1.40955436  1.3096732 ]

4. How to specify the index when creating a Series

import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print(ser1)

Output:

IND        India
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
FRA       France
dtype: object

5. How to get the size and shape of a Series

import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print(len(ser1))

print(ser1.shape)

print(ser1.size)

Output:

6
(6,)
6

6. How to get the first or last rows of a Series

Head()

import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print("-----Head()-----")
print(ser1.head())

print("\n\n-----Head(2)-----")
print(ser1.head(2))

Output:

-----Head()-----
IND        India
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
dtype: object


-----Head(2)-----
IND     India
CAN    Canada
dtype: object

Tail()

import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print("-----Tail()-----")
print(ser1.tail())

print("\n\n-----Tail(2)-----")
print(ser1.tail(2))

Output:

-----Tail()-----
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
FRA       France
dtype: object


-----Tail(2)-----
GER    Germany
FRA     France
dtype: object

Take()

import pandas as pd

values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

ser1 = pd.Series(values, index=code)

print("-----Take()-----")
print(ser1.take([2, 4, 5]))

Output:

-----Take()-----
AUS    Australia
GER      Germany
FRA       France
dtype: object

7. Getting a subset of a Series with slicing

import pandas as pd

num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

idx = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

series = pd.Series(num, index=idx)

# Integer slices select by position: start:stop:step
print("\n [2:4] \n")
print(series[2:4])

print("\n [1:6:2] \n")
print(series[1:6:2])

print("\n [:6] \n")
print(series[:6])

print("\n [4:] \n")
print(series[4:])

print("\n [:4:2] \n")
print(series[:4:2])

print("\n [4::2] \n")
print(series[4::2])

# A negative step reverses the Series
print("\n [::-1] \n")
print(series[::-1])

Output:

[2:4]

C    200
D    300
dtype: int64

[1:6:2]

B    100
D    300
F    500
dtype: int64

[:6]

A      0
B    100
C    200
D    300
E    400
F    500
dtype: int64

[4:]

E    400
F    500
G    600
H    700
I    800
J    900
dtype: int64

[:4:2]

A      0
C    200
dtype: int64

[4::2]

E    400
G    600
I    800
dtype: int64

[::-1]

J    900
I    800
H    700
G    600
F    500
E    400
D    300
C    200
B    100
A      0
dtype: int64

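8. How to create a DataFrame

import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp00"],
    "Name": ["John Doe", "William Spark"],
    "Occupation": ["Chemist", "Statistician"],
    "Date Of Join": ["2018-01-25", "2018-01-26"],
    "Age": [23, 24]})

print(employees)

Output:

   Age Date Of Join EmpCode           Name    Occupation
0   23   2018-01-25  Emp001       John Doe       Chemist
1   24   2018-01-26   Emp00  William Spark  Statistician

The column order above comes from an older pandas release that sorted dictionary keys alphabetically; recent releases keep the order in which the keys were written.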

9. How to set the index and column information of a DataFrame

import pandas as pd

employees = pd.DataFrame(
    data={"Name": ["John Doe", "William Spark"],
          "Occupation": ["Chemist", "Statistician"],
          "Date Of Join": ["2018-01-25", "2018-01-26"],
          "Age": [23, 24]},
    index=["Emp001", "Emp002"],
    columns=["Name", "Occupation", "Date Of Join", "Age"])

print(employees)

Output:

                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24

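10. How to rename the columns of a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp00"],
    "Name": ["John Doe", "William Spark"],
    "Occupation": ["Chemist", "Statistician"],
    "Date Of Join": ["2018-01-25", "2018-01-26"],
    "Age": [23, 24]})

# Assigning to .columns renames by position, so the new names must be
# listed in the DataFrame's current column order
employees.columns = ["EmpCode", "EmpName", "EmpOccupation", "EmpDOJ", "EmpAge"]

print(employees)

Output:

   EmpCode     EmpName EmpOccupation         EmpDOJ        EmpAge
0       23  2018-01-25        Emp001       John Doe       Chemist
1       24  2018-01-26         Emp00  William Spark  Statistician

In this recorded run the columns were stored alphabetically (Age, Date Of Join, EmpCode, Name, Occupation), so the positional rename attached the new names to the wrong columns. When the columns keep the order in which they were written, the new names line up as intended.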
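
Because assigning to .columns depends on column position, renaming through an explicit old-name-to-new-name mapping may be safer. The following is a minimal sketch using DataFrame.rename; the reduced three-column frame is only an illustration, not the article's data set.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002"],
    "Name": ["John Doe", "William Spark"],
    "Age": [23, 24],
})

# Map old names to new names explicitly; columns not listed keep their name,
# and the result does not depend on the order of the columns.
renamed = employees.rename(columns={"Name": "EmpName", "Age": "EmpAge"})

print(renamed.columns.tolist())
```

rename returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a variable.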

11. How to select or filter rows from a DataFrame based on column values

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print("\nUse == operator\n")
print(employees.loc[employees["Age"] == 23])

print("\nUse < operator\n")
print(employees.loc[employees["Age"] < 30])

print("\nUse != operator\n")
print(employees.loc[employees["Occupation"] != "Statistician"])

# Wrap each condition in parentheses before combining with &
print("\nMultiple Conditions\n")
print(employees.loc[(employees["Occupation"] != "Statistician") &
                    (employees["Name"] == "John")])

Output:

Use == operator

   Age Date Of Join EmpCode  Name Occupation
0   23   2018-01-25  Emp001  John    Chemist

Use < operator

   Age Date Of Join EmpCode   Name    Occupation
0   23   2018-01-25  Emp001   John       Chemist
1   24   2018-01-26  Emp002    Doe  Statistician
3   29   2018-02-26  Emp004  Spark  Statistician

Use != operator

   Age Date Of Join EmpCode  Name  Occupation
0   23   2018-01-25  Emp001  John     Chemist
4   40   2018-03-16  Emp005  Mark  Programmer

Multiple Conditions

   Age Date Of Join EmpCode  Name Occupation
0   23   2018-01-25  Emp001  John    Chemist

12. Filtering multiple rows in a DataFrame with "isin"

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print("\nUse isin operator\n")
print(employees.loc[employees["Occupation"].isin(["Chemist", "Programmer"])])

print("\nMultiple Conditions\n")
print(employees.loc[(employees["Occupation"] == "Chemist") |
                    (employees["Name"] == "John") &
                    (employees["Age"] < 30)])

Output:

Use isin operator

   Age Date Of Join EmpCode  Name  Occupation
0   23   2018-01-25  Emp001  John     Chemist
4   40   2018-03-16  Emp005  Mark  Programmer

Multiple Conditions

   Age Date Of Join EmpCode  Name Occupation
0   23   2018-01-25  Emp001  John    Chemist
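
The "Multiple Conditions" filter combines | and & without grouping them. In Python, & binds more tightly than |, so the expression reads as A | (B & C). The sketch below, on a small made-up frame (not the article's data), shows how the grouping changes the selected rows.

```python
import pandas as pd

# Hypothetical three-row frame chosen so that the two groupings differ.
df = pd.DataFrame({
    "Occupation": ["Chemist", "Statistician", "Programmer"],
    "Name": ["Mark", "John", "Mark"],
    "Age": [45, 23, 40],
})

is_chemist = df["Occupation"] == "Chemist"
is_john = df["Name"] == "John"
under_30 = df["Age"] < 30

# & is evaluated before |, so these two selections are identical.
implicit = df.loc[is_chemist | is_john & under_30]
explicit = df.loc[is_chemist | (is_john & under_30)]

# Grouping the | first gives a different result.
grouped_or = df.loc[(is_chemist | is_john) & under_30]

print(implicit.index.tolist())
print(grouped_or.index.tolist())
```

Writing the parentheses explicitly, as in the second form, makes the intent visible to the reader.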

13. Iterating over the rows and columns of a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# iterrows yields (index, row) pairs, where each row is a Series
print("\n Example iterrows \n")
for index, col in employees.iterrows():
    print(col["Name"], "--", col["Age"])

# itertuples yields one namedtuple per row
print("\n Example itertuples \n")
for row in employees.itertuples(index=True, name="Pandas"):
    print(getattr(row, "Name"), "--", getattr(row, "Age"))

Output:

Example iterrows

John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40

Example itertuples

John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40

14. How to drop DataFrame columns by name or index

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print(employees)

print("\n Drop Column by Name \n")
employees.drop("Age", axis=1, inplace=True)
print(employees)

# Drops the first two remaining columns by position; which columns these
# are depends on the DataFrame's column order
print("\n Drop Column by Index \n")
employees.drop(employees.columns[[0, 1]], axis=1, inplace=True)
print(employees)

Output:

   Age Date Of Join EmpCode     Name    Occupation
0   23   2018-01-25  Emp001     John       Chemist
1   24   2018-01-26  Emp002      Doe  Statistician
2   34   2018-01-26  Emp003  William  Statistician
3   29   2018-02-26  Emp004    Spark  Statistician
4   40   2018-03-16  Emp005     Mark    Programmer

Drop Column by Name

  Date Of Join EmpCode     Name    Occupation
0   2018-01-25  Emp001     John       Chemist
1   2018-01-26  Emp002      Doe  Statistician
2   2018-01-26  Emp003  William  Statistician
3   2018-02-26  Emp004    Spark  Statistician
4   2018-03-16  Emp005     Mark    Programmer

Drop Column by Index

      Name    Occupation
0     John       Chemist
1      Doe  Statistician
2  William  Statistician
3    Spark  Statistician
4     Mark    Programmer

15. Adding a new column to a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# The list must contain one value per existing row
employees["City"] = ["London", "Tokyo", "Sydney", "London", "Toronto"]

print(employees)

Output:

   Age Date Of Join EmpCode     Name    Occupation     City
0   23   2018-01-25  Emp001     John       Chemist   London
1   24   2018-01-26  Emp002      Doe  Statistician    Tokyo
2   34   2018-01-26  Emp003  William  Statistician   Sydney
3   29   2018-02-26  Emp004    Spark  Statistician   London
4   40   2018-03-16  Emp005     Mark    Programmer  Toronto

16. How to get the list of column headers from a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print(list(employees))

print(list(employees.columns.values))

print(employees.columns.tolist())

Output:

['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']
['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']
['Age', 'Date Of Join', 'EmpCode', 'Name', 'Occupation']

17. How to generate a random DataFrame

import pandas as pd
import numpy as np

# Fix the seed so the random numbers are reproducible
np.random.seed(5)

df_random = pd.DataFrame(np.random.randint(100, size=(10, 6)),
                         columns=list("ABCDEF"),
                         index=["Row-{}".format(i) for i in range(10)])

print(df_random)

Output:

        A   B   C   D   E   F
Row-0  99  78  61  16  73   8
Row-1  62  27  30  80   7  76
Row-2  15  53  80  27  44  77
Row-3  75  65  47  30  84  86
Row-4  18   9  41  62   1  82
Row-5  16  78   5  58   0  80
Row-6   4  36  51  27  31   2
Row-7  68  38  83  19  18   7
Row-8  30  62  11  67  65  55
Row-9   3  91  78  27  29  33

18. How to select multiple columns of a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

# Pass a list of column names inside the indexing brackets
df = employees[["EmpCode", "Age", "Name"]]
print(df)

Output:

  EmpCode  Age     Name
0  Emp001   23     John
1  Emp002   24      Doe
2  Emp003   34  William
3  Emp004   29    Spark
4  Emp005   40     Mark

19. How to convert a dictionary to a DataFrame

import pandas as pd

data = {"Age": [30, 20, 22, 40, 32, 28, 39],
        "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                  "Red"],
        "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                 "Melon", "Beans"],
        "Height": [165, 70, 120, 80, 180, 172, 150],
        "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
        "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
        }
print(data)

df = pd.DataFrame(data)

print(df)

Output:

{'Height': [165, 70, 120, 80, 180, 172, 150], 'Food': ['Steak', 'Lamb', 'Mango',
'Apple', 'Cheese', 'Melon', 'Beans'], 'Age': [30, 20, 22, 40, 32, 28, 39],
'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2], 'Color': ['Blue', 'Green', 'Red',
'White', 'Gray', 'Black', 'Red'], 'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX',
'TX']}
   Age  Color    Food  Height  Score State
0   30   Blue   Steak     165    4.6    NY
1   20  Green    Lamb      70    8.3    TX
2   22    Red   Mango     120    9.0    FL
3   40  White   Apple      80    3.3    AL
4   32   Gray  Cheese     180    1.8    AK
5   28  Black   Melon     172    9.5    TX
6   39    Red   Beans     150    2.2    TX

20. Slicing with loc

import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n -- Selecting a single row with .loc with a string -- \n")
print(df.loc["Penelope"])

print("\n -- Selecting multiple rows with .loc with a list of strings -- \n")
print(df.loc[["Cornelia", "Jane", "Dean"]])

# Label slices include both the start and the end label
print("\n -- Selecting multiple rows with .loc with slice notation -- \n")
print(df.loc["Aaron":"Dean"])

Output:

-- Selecting a single row with .loc with a string --

Age          40
Color     White
Food      Apple
Height       80
Score       3.3
State        AL
Name: Penelope, dtype: object

-- Selecting multiple rows with .loc with a list of strings --

          Age Color    Food  Height  Score State
Cornelia   39   Red   Beans     150    2.2    TX
Jane       30  Blue   Steak     165    4.6    NY
Dean       32  Gray  Cheese     180    1.8    AK

-- Selecting multiple rows with .loc with slice notation --

          Age  Color    Food  Height  Score State
Aaron      22    Red   Mango     120    9.0    FL
Penelope   40  White   Apple      80    3.3    AL
Dean       32   Gray  Cheese     180    1.8    AK

21. Checking whether a DataFrame is empty

import pandas as pd

df = pd.DataFrame()

if df.empty:
    print("DataFrame is empty!")

Output:

DataFrame is empty!

22. Specifying the index and column names when creating a DataFrame

import pandas as pd

values = ["India", "Canada", "Australia",
          "Japan", "Germany", "France"]

code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]

df = pd.DataFrame(values, index=code, columns=["Country"])

print(df)

Output:

       Country
IND      India
CAN     Canada
AUS  Australia
JAP      Japan
GER    Germany
FRA     France

23. Slicing with iloc

import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n -- Selecting a single row with .iloc with an integer -- \n")
print(df.iloc[4])

# Negative integers count from the end
print("\n -- Selecting multiple rows with .iloc with a list of integers -- \n")
print(df.iloc[[2, -2]])

print("\n -- Selecting multiple rows with .iloc with slice notation -- \n")
print(df.iloc[:5:3])

Output:

-- Selecting a single row with .iloc with an integer --

Age           32
Color       Gray
Food      Cheese
Height       180
Score        1.8
State         AK
Name: Dean, dtype: object

-- Selecting multiple rows with .iloc with a list of integers --

           Age  Color   Food  Height  Score State
Aaron       22    Red  Mango     120    9.0    FL
Christina   28  Black  Melon     172    9.5    TX

-- Selecting multiple rows with .iloc with slice notation --

          Age  Color   Food  Height  Score State
Jane       30   Blue  Steak     165    4.6    NY
Penelope   40  White  Apple      80    3.3    AL

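24. The difference between iloc and loc

loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index, so it only accepts integers.
The loc indexer can also do boolean selection. For example, to find all rows where Age is less than 30 and return only the Color and Height columns, we can do the following. The same result can be reproduced with iloc, but iloc cannot be passed a boolean Series; the boolean Series must first be converted to a NumPy array.

import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

# loc: boolean Series for the rows, column labels for the columns
print("\n -- loc -- \n")
print(df.loc[df["Age"] < 30, ["Color", "Height"]])

# iloc: boolean NumPy array for the rows, column positions for the columns
print("\n -- iloc -- \n")
print(df.iloc[(df["Age"] < 30).values, [1, 3]])

Output:

-- loc --

           Color  Height
Nick       Green      70
Aaron        Red     120
Christina  Black     172

-- iloc --

           Color  Height
Nick       Green      70
Aaron        Red     120
Christina  Black     172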
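
The label-versus-position distinction is easiest to see when the index holds integers that do not match the row positions. The following sketch uses a made-up Series (not from the article) to illustrate it.

```python
import pandas as pd

# Integer labels that deliberately differ from the row positions.
ser = pd.Series(["a", "b", "c", "d"], index=[10, 20, 30, 40])

by_label = ser.loc[20]       # the row whose label is 20
by_position = ser.iloc[2]    # the third row, whatever its label

# Label slices include the end label; position slices exclude the end position.
label_slice = ser.loc[10:30]     # labels 10, 20 and 30
position_slice = ser.iloc[0:2]   # positions 0 and 1 only

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
```

The inclusive end of a loc slice is the detail most likely to surprise anyone used to ordinary Python slicing.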

25. Creating an empty DataFrame with a time index

import datetime
import pandas as pd

todays_date = datetime.datetime.now().date()
# Ten consecutive days starting today
index = pd.date_range(todays_date, periods=10, freq="D")

columns = ["A", "B", "C"]

df = pd.DataFrame(index=index, columns=columns)
df = df.fillna(0)

print(df)

Output:

            A  B  C
2018-09-30  0  0  0
2018-10-01  0  0  0
2018-10-02  0  0  0
2018-10-03  0  0  0
2018-10-04  0  0  0
2018-10-05  0  0  0
2018-10-06  0  0  0
2018-10-07  0  0  0
2018-10-08  0  0  0
2018-10-09  0  0  0

26. How to change the order of DataFrame columns

import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

# Reorder by column position
print("\n -- Change order using columns -- \n")
new_order = [3, 2, 1, 4, 5, 0]
df = df[df.columns[new_order]]
print(df)

# Reorder by column name
print("\n -- Change order using reindex -- \n")
df = df.reindex(["State", "Color", "Age", "Food", "Score", "Height"], axis=1)
print(df)

Output:

-- Change order using columns --

           Height    Food  Color  Score State  Age
Jane          165   Steak   Blue    4.6    NY   30
Nick           70    Lamb  Green    8.3    TX   20
Aaron         120   Mango    Red    9.0    FL   22
Penelope       80   Apple  White    3.3    AL   40
Dean          180  Cheese   Gray    1.8    AK   32
Christina     172   Melon  Black    9.5    TX   28
Cornelia      150   Beans    Red    2.2    TX   39

-- Change order using reindex --

          State  Color  Age    Food  Score  Height
Jane         NY   Blue   30   Steak    4.6     165
Nick         TX  Green   20    Lamb    8.3      70
Aaron        FL    Red   22   Mango    9.0     120
Penelope     AL  White   40   Apple    3.3      80
Dean         AK   Gray   32  Cheese    1.8     180
Christina    TX  Black   28   Melon    9.5     172
Cornelia     TX    Red   39   Beans    2.2     150

27. Checking the data types of DataFrame columns

import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print(df.dtypes)

Output:

Age         int64
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object

28. Changing the data type of a specific DataFrame column

import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22, 40, 32, 28, 39],
                   "Color": ["Blue", "Green", "Red", "White", "Gray", "Black",
                             "Red"],
                   "Food": ["Steak", "Lamb", "Mango", "Apple", "Cheese",
                            "Melon", "Beans"],
                   "Height": [165, 70, 120, 80, 180, 172, 150],
                   "Score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print(df.dtypes)

# Convert the integer column to strings
df["Age"] = df["Age"].astype(str)

print(df.dtypes)

Output:

Age         int64
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object
Age        object
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object

29. How to convert a column's data type to DateTime

import pandas as pd

df = pd.DataFrame({"DateOFBirth": [1349720105, 1349806505, 1349892905,
                                   1349979305, 1350065705, 1349792905,
                                   1349730105],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)

# The integers are Unix timestamps in seconds
df["DateOFBirth"] = pd.to_datetime(df["DateOFBirth"], unit="s")

print("\n----------------After----------------\n")
print(df.dtypes)
print(df)

Output:

----------------Before---------------

DateOFBirth     int64
State          object
dtype: object
           DateOFBirth State
Jane        1349720105    NY
Nick        1349806505    TX
Aaron       1349892905    FL
Penelope    1349979305    AL
Dean        1350065705    AK
Christina   1349792905    TX
Cornelia    1349730105    TX

----------------After----------------

DateOFBirth    datetime64[ns]
State                  object
dtype: object
                  DateOFBirth State
Jane      2012-10-08 18:15:05    NY
Nick      2012-10-09 18:15:05    TX
Aaron     2012-10-10 18:15:05    FL
Penelope  2012-10-11 18:15:05    AL
Dean      2012-10-12 18:15:05    AK
Christina 2012-10-09 14:28:25    TX
Cornelia  2012-10-08 21:01:45    TX

30. Converting a DataFrame column from floats to ints

import pandas as pd

df = pd.DataFrame({"DailyExp": [75.7, 56.69, 55.69, 96.5, 84.9, 110.5,
                                58.9],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)

# astype(int) drops the fractional part
df["DailyExp"] = df["DailyExp"].astype(int)

print("\n----------------After----------------\n")
print(df.dtypes)
print(df)

Output:

----------------Before---------------

DailyExp    float64
State        object
dtype: object
           DailyExp State
Jane          75.70    NY
Nick          56.69    TX
Aaron         55.69    FL
Penelope      96.50    AL
Dean          84.90    AK
Christina    110.50    TX
Cornelia      58.90    TX

----------------After----------------

DailyExp     int32
State       object
dtype: object
           DailyExp State
Jane             75    NY
Nick             56    TX
Aaron            55    FL
Penelope         96    AL
Dean             84    AK
Christina       110    TX
Cornelia         58    TX
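
As the output shows, astype(int) simply drops the fractional part (75.7 becomes 75). If rounding to the nearest integer is wanted instead, one option is to call round() before the conversion. The sketch below uses a few of the values from the example; it illustrates the difference and is not part of the original recipe.

```python
import pandas as pd

exp = pd.Series([75.7, 56.69, 55.69, 84.9, 58.9])

# Conversion alone discards everything after the decimal point.
truncated = exp.astype(int)

# Rounding first moves each value to the nearest whole number.
rounded = exp.round().astype(int)

print(truncated.tolist())
print(rounded.tolist())
```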

31. How to convert a dates column to DateTime

import pandas as pd

df = pd.DataFrame({"DateOfBirth": ["1986-11-11", "1999-05-12", "1976-01-01",
                                   "1986-06-01", "1983-06-04", "1990-03-07",
                                   "1999-07-09"],
                   "State": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]
                   },
                  index=["Jane", "Nick", "Aaron", "Penelope", "Dean",
                         "Christina", "Cornelia"])

print("\n----------------Before---------------\n")
print(df.dtypes)

# Recent pandas versions require an explicit unit such as [ns]
df["DateOfBirth"] = df["DateOfBirth"].astype("datetime64[ns]")

print("\n----------------After----------------\n")
print(df.dtypes)

Output:

----------------Before---------------

DateOfBirth    object
State          object
dtype: object

----------------After----------------

DateOfBirth    datetime64[ns]
State                  object
dtype: object
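
When a date column may contain malformed entries, pd.to_datetime is a common alternative to astype, because it can turn bad values into NaT instead of raising an error. The following is a minimal sketch; the "not a date" entry and the explicit format string are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"DateOfBirth": ["1986-11-11", "1999-05-12", "not a date"]})

# errors="coerce" replaces unparseable entries with NaT (missing timestamp).
df["DateOfBirth"] = pd.to_datetime(df["DateOfBirth"],
                                   format="%Y-%m-%d",
                                   errors="coerce")

print(df.dtypes)
print(df)
```

The resulting NaT entries can then be located with isna() and handled like any other missing value.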

32. Adding two DataFrames together

import pandas as pd

df1 = pd.DataFrame({"Age": [30, 20, 22, 40], "Height": [165, 70, 120, 80],
                    "Score": [4.6, 8.3, 9.0, 3.3], "State": ["NY", "TX",
                                                             "FL", "AL"]},
                   index=["Jane", "Nick", "Aaron", "Penelope"])

df2 = pd.DataFrame({"Age": [32, 28, 39], "Color": ["Gray", "Black", "Red"],
                    "Food": ["Cheese", "Melon", "Beans"],
                    "Score": [1.8, 9.5, 2.2], "State": ["AK", "TX", "TX"]},
                   index=["Dean", "Christina", "Cornelia"])

# The original used df1.append(df2, sort=True); DataFrame.append was
# removed in pandas 2.0, and pd.concat does the same job
df3 = pd.concat([df1, df2], sort=True)

print(df3)

Output:

           Age  Color    Food  Height  Score State
Jane        30    NaN     NaN   165.0    4.6    NY
Nick        20    NaN     NaN    70.0    8.3    TX
Aaron       22    NaN     NaN   120.0    9.0    FL
Penelope    40    NaN     NaN    80.0    3.3    AL
Dean        32   Gray  Cheese     NaN    1.8    AK
Christina   28  Black   Melon     NaN    9.5    TX
Cornelia    39    Red   Beans     NaN    2.2    TX

33. Adding extra rows at the end of a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002", "Emp003", "Emp004", "Emp005"],
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26", "2018-02-26",
                     "2018-03-16"],
    "Age": [23, 24, 34, 29, 40]})

print("\n------------ BEFORE ----------------\n")
print(employees)

# The values must follow the DataFrame's column order:
# EmpCode, Name, Occupation, Date Of Join, Age
employees.loc[len(employees)] = ["Emp006", "Sunny", "Programmer",
                                 "2018-01-25", 45]

print("\n------------ AFTER ----------------\n")
print(employees)

Output:

------------ BEFORE ----------------

   Age Date Of Join EmpCode     Name    Occupation
0   23   2018-01-25  Emp001     John       Chemist
1   24   2018-01-26  Emp002      Doe  Statistician
2   34   2018-01-26  Emp003  William  Statistician
3   29   2018-02-26  Emp004    Spark  Statistician
4   40   2018-03-16  Emp005     Mark    Programmer

------------ AFTER ----------------

   Age Date Of Join EmpCode     Name    Occupation
0   23   2018-01-25  Emp001     John       Chemist
1   24   2018-01-26  Emp002      Doe  Statistician
2   34   2018-01-26  Emp003  William  Statistician
3   29   2018-02-26  Emp004    Spark  Statistician
4   40   2018-03-16  Emp005     Mark    Programmer
5   45   2018-01-25  Emp006    Sunny    Programmer
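
Assigning a plain list through loc depends on the values being in column order. A possible alternative that matches values to columns by name is to build a one-row DataFrame from a dictionary and concatenate it. The sketch below uses a reduced, hypothetical version of the employees frame.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002"],
    "Name": ["John", "Doe"],
    "Age": [23, 24],
})

# Keys are matched to column names, so their order does not matter.
new_row = pd.DataFrame([{"Age": 45, "Name": "Sunny", "EmpCode": "Emp006"}])

# ignore_index=True renumbers the rows 0, 1, 2, ...
employees = pd.concat([employees, new_row], ignore_index=True)

print(employees)
```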

34. Adding a new row at a specified index

import pandas as pd

employees = pd.DataFrame(
    data={"Name": ["John Doe", "William Spark"],
          "Occupation": ["Chemist", "Statistician"],
          "Date Of Join": ["2018-01-25", "2018-01-26"],
          "Age": [23, 24]},
    index=["Emp001", "Emp002"],
    columns=["Name", "Occupation", "Date Of Join", "Age"])

print("\n------------ BEFORE ----------------\n")
print(employees)

# Assigning to a label that does not exist yet adds a new row
employees.loc["Emp003"] = ["Sunny", "Programmer", "2018-01-25", 45]

print("\n------------ AFTER ----------------\n")
print(employees)

Output:

------------ BEFORE ----------------

                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24

------------ AFTER ----------------

                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24
Emp003          Sunny    Programmer   2018-01-25   45

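35. How to add rows with a for loop

import pandas as pd

cols = ["Zip"]
lst = []
zip_code = 32100  # renamed from zip to avoid shadowing the built-in

# Collect the rows in a list first, then build the DataFrame once
for a in range(10):
    lst.append([zip_code])
    zip_code = zip_code + 1

df = pd.DataFrame(lst, columns=cols)

print(df)

Output:

     Zip
0  32100
1  32101
2  32102
3  32103
4  32104
5  32105
6  32106
7  32107
8  32108
9  32109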

36. Adding a row at the top of a DataFrame

import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp002", "Emp003", "Emp004"],
    "Name": ["John", "Doe", "William"],
    "Occupation": ["Chemist", "Statistician", "Statistician"],
    "Date Of Join": ["2018-01-25", "2018-01-26", "2018-01-26"],
    "Age": [23, 24, 34]})

print("\n------------ BEFORE ----------------\n")
print(employees)

# New line
line = pd.DataFrame({"Name": "Dean", "Age": 45, "EmpCode": "Emp001",
                     "Date Of Join": "2018-02-26", "Occupation": "Chemist"
                     }, index=[0])

# Concatenate the two DataFrames (the original used employees.ix[:],
# but .ix was removed in pandas 1.0)
employees = pd.concat([line, employees]).reset_index(drop=True)

print("\n------------ AFTER ----------------\n")
print(employees)

Output:

------------ BEFORE ----------------

   Age Date Of Join EmpCode     Name    Occupation
0   23   2018-01-25  Emp002     John       Chemist
1   24   2018-01-26  Emp003      Doe  Statistician
2   34   2018-01-26  Emp004  William  Statistician

------------ AFTER ----------------

   Age Date Of Join EmpCode     Name    Occupation
0   45   2018-02-26  Emp001     Dean       Chemist
1   23   2018-01-25  Emp002     John       Chemist
2   24   2018-01-26  Emp003      Doe  Statistician
3   34   2018-01-26  Emp004  William  Statistician

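37. How to add rows to a DataFrame dynamically

import pandas as pd

df = pd.DataFrame(columns=["Name", "Age"])

# Assigning to a new row label creates the row; unset cells stay NaN
df.loc[1, "Name"] = "Rocky"
df.loc[1, "Age"] = 23

df.loc[2, "Name"] = "Sunny"

print(df)

Output:

    Name  Age
1  Rocky   23
2  Sunny  NaN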

38. Inserting a row at an arbitrary position

import pandas as pd

df = pd.DataFrame(columns=["Name", "Age"])

df.loc[1, "Name"] = "Rocky"
df.loc[1, "Age"] = 21

df.loc[2, "Name"] = "Sunny"
df.loc[2, "Age"] = 22

df.loc[3, "Name"] = "Mark"
df.loc[3, "Age"] = 25

df.loc[4, "Name"] = "Taylor"
df.loc[4, "Age"] = 28

print("\n------------ BEFORE ----------------\n")
print(df)

# Give the new row a fractional index so it sorts between rows 2 and 3
line = pd.DataFrame({"Name": "Jack", "Age": 24}, index=[2.5])
# The original used df.append(line, ignore_index=False); DataFrame.append
# was removed in pandas 2.0
df = pd.concat([df, line], ignore_index=False)
df = df.sort_index().reset_index(drop=True)

df = df.reindex(["Name", "Age"], axis=1)
print("\n------------ AFTER ----------------\n")
print(df)

Output:

------------ BEFORE ----------------

     Name Age
1   Rocky  21
2   Sunny  22
3    Mark  25
4  Taylor  28

------------ AFTER ----------------

     Name Age
0   Rocky  21
1   Sunny  22
2    Jack  24
3    Mark  25
4  Taylor  28
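
The fractional-index trick works, but the same insertion can also be written by slicing the frame at the target position and concatenating the pieces. The helper below, insert_row, is a hypothetical name introduced only for this sketch; it is not a pandas function.

```python
import pandas as pd


def insert_row(df, position, row):
    """Return a new DataFrame with `row` (a dict) inserted at `position`."""
    new_line = pd.DataFrame([row], columns=df.columns)
    # Rows before the position, the new row, then the remaining rows.
    parts = [df.iloc[:position], new_line, df.iloc[position:]]
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({"Name": ["Rocky", "Sunny", "Mark", "Taylor"],
                   "Age": [21, 22, 25, 28]})

df = insert_row(df, 2, {"Name": "Jack", "Age": 24})

print(df)
```

This version states the target position directly and does not rely on sorting the index afterwards.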

39. Adding rows to a DataFrame with a timestamp index

import pandas as pd

df =
