This article walks through 100 pandas examples and is well worth bookmarking. I hope you find it helpful.
1 How to create a Series from a list or a dictionary
Create a Series from a list
import pandas as pd
ser1 = pd.Series([1.5, 2.5, 3, 4.5, 5.0, 6])
print(ser1)
Output:
0    1.5
1    2.5
2    3.0
3    4.5
4    5.0
5    6.0
dtype: float64
Create a Series with the name parameter
import pandas as pd
ser2 = pd.Series(["India", "Canada", "Germany"], name="Countries")
print(ser2)
Output:
0      India
1     Canada
2    Germany
Name: Countries, dtype: object
Create a Series from a repeated list (shorthand)
import pandas as pd
ser3 = pd.Series(["A"]*4)
print(ser3)
Output:
0    A
1    A
2    A
3    A
dtype: object
Create a Series from a dictionary
import pandas as pd
ser4 = pd.Series({"India": "New Delhi",
                  "Japan": "Tokyo",
                  "UK": "London"})
print(ser4)
Output:
India    New Delhi
Japan        Tokyo
UK          London
dtype: object
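If you pass a dictionary together with an explicit index, pandas uses the index to look up the keys. The labels set the order, and any label with no matching key becomes NaN. Here is a minimal sketch; the label "France" is just an illustrative missing key.

```python
import pandas as pd

capitals = {"India": "New Delhi", "Japan": "Tokyo", "UK": "London"}

# The explicit index decides which keys are used and in what order;
# "France" has no entry in the dict, so its value is NaN and "Japan" is left out.
ser = pd.Series(capitals, index=["UK", "India", "France"])
print(ser)
```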
2 How to create a Series with NumPy functions
import pandas as pd
import numpy as np
ser1 = pd.Series(np.linspace(1, 10, 5))
print(ser1)
ser2 = pd.Series(np.random.normal(size=5))
print(ser2)
Output:
0     1.00
1     3.25
2     5.50
3     7.75
4    10.00
dtype: float64
0   -1.694452
1   -1.570006
2    1.713794
3    0.338292
4    0.803511
dtype: float64
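The `np.random.normal` call above is unseeded, so its numbers change on every run. A small sketch of one way to make the random Series reproducible is to use NumPy's seeded `Generator` API. The seed value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Two generators created with the same seed produce the same stream of numbers.
rng_a = np.random.default_rng(seed=0)
rng_b = np.random.default_rng(seed=0)

ser_a = pd.Series(rng_a.normal(size=5))
ser_b = pd.Series(rng_b.normal(size=5))

print(ser_a.equals(ser_b))  # True: same seed, same values
```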
3 How to get the index and values of a Series
import pandas as pd
import numpy as np
ser1 = pd.Series({"India": "New Delhi",
                  "Japan": "Tokyo",
                  "UK": "London"})
print(ser1.values)
print(ser1.index)
print("\n")
ser2 = pd.Series(np.random.normal(size=5))
print(ser2.index)
print(ser2.values)
Output:
['New Delhi' 'Tokyo' 'London']
Index(['India', 'Japan', 'UK'], dtype='object')
RangeIndex(start=0, stop=5, step=1)
[ 0.66265478 -0.72222211  0.3608642   1.40955436  1.3096732 ]
4 How to specify the index when creating a Series
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print(ser1)
Output:
IND        India
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
FRA       France
dtype: object
5 How to get the size and shape of a Series
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print(len(ser1))
print(ser1.shape)
print(ser1.size)
Output:
6
(6,)
6
6 How to get the first or last rows of a Series
Head()
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Head()-----")
print(ser1.head())
print("\n\n-----Head(2)-----")
print(ser1.head(2))
Output:
-----Head()-----
IND        India
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
dtype: object
-----Head(2)-----
IND     India
CAN    Canada
dtype: object
Tail()
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Tail()-----")
print(ser1.tail())
print("\n\n-----Tail(2)-----")
print(ser1.tail(2))
Output:
-----Tail()-----
CAN       Canada
AUS    Australia
JAP        Japan
GER      Germany
FRA       France
dtype: object
-----Tail(2)-----
GER    Germany
FRA     France
dtype: object
Take()
import pandas as pd
values = ["India", "Canada", "Australia",
"Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)
print("-----Take()-----")
print(ser1.take([2, 4, 5]))
Output:
-----Take()-----
AUS    Australia
GER      Germany
FRA       France
dtype: object
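`take()` always works with integer positions, even when the index holds string labels. That makes it equivalent to `iloc` with a list of positions. A short sketch on the same Series:

```python
import pandas as pd

values = ["India", "Canada", "Australia", "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
ser1 = pd.Series(values, index=code)

# Positions 2, 4 and 5 are selected regardless of the string labels.
picked = ser1.take([2, 4, 5])
print(picked.equals(ser1.iloc[[2, 4, 5]]))  # True

# Negative positions count from the end, as with Python lists.
print(ser1.take([-1, -2]))
```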
7 Use slicing to get a subset of a Series
import pandas as pd
num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
idx = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
series = pd.Series(num, index=idx)
print("\n [2:4] \n")
print(series[2:4])
print("\n [1:6:2] \n")
print(series[1:6:2])
print("\n [:6] \n")
print(series[:6])
print("\n [4:] \n")
print(series[4:])
print("\n [:4:2] \n")
print(series[:4:2])
print("\n [4::2] \n")
print(series[4::2])
print("\n [::-1] \n")
print(series[::-1])
Output:
[2:4]
C    200
D    300
dtype: int64
[1:6:2]
B    100
D    300
F    500
dtype: int64
[:6]
A      0
B    100
C    200
D    300
E    400
F    500
dtype: int64
[4:]
E    400
F    500
G    600
H    700
I    800
J    900
dtype: int64
[:4:2]
A      0
C    200
dtype: int64
[4::2]
E    400
G    600
I    800
dtype: int64
[::-1]
J    900
I    800
H    700
G    600
F    500
E    400
D    300
C    200
B    100
A      0
dtype: int64
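The slices above are positional, so the stop position is excluded, just as with Python lists. Slicing by label with `.loc` behaves differently: it includes the stop label. A minimal comparison on the same data:

```python
import pandas as pd

num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
idx = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
series = pd.Series(num, index=idx)

# Positional slice: stop position 4 (label "E") is excluded.
by_position = series.iloc[2:4]

# Label slice: the stop label "E" is included.
by_label = series.loc["C":"E"]

print(list(by_position.index))  # ['C', 'D']
print(list(by_label.index))     # ['C', 'D', 'E']
```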
8 How to create a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002'],
    'Name': ['John Doe', 'William Spark'],
    'Occupation': ['Chemist', 'Statistician'],
    'Date Of Join': ['2018-01-25', '2018-01-26'],
    'Age': [23, 24]})
print(employees)
Output:
  EmpCode           Name    Occupation Date Of Join  Age
0  Emp001       John Doe       Chemist   2018-01-25   23
1  Emp002  William Spark  Statistician   2018-01-26   24
9 How to set the index and columns of a DataFrame
import pandas as pd
employees = pd.DataFrame(
    data={'Name': ['John Doe', 'William Spark'],
          'Occupation': ['Chemist', 'Statistician'],
          'Date Of Join': ['2018-01-25', '2018-01-26'],
          'Age': [23, 24]},
    index=['Emp001', 'Emp002'],
    columns=['Name', 'Occupation', 'Date Of Join', 'Age'])
print(employees)
Output:
                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24
10 How to rename the columns of a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002'],
    'Name': ['John Doe', 'William Spark'],
    'Occupation': ['Chemist', 'Statistician'],
    'Date Of Join': ['2018-01-25', '2018-01-26'],
    'Age': [23, 24]})
# The new names are assigned by position, in the same order as the existing columns
employees.columns = ['EmpCode', 'EmpName', 'EmpOccupation', 'EmpDOJ', 'EmpAge']
print(employees)
Output:
  EmpCode        EmpName EmpOccupation      EmpDOJ  EmpAge
0  Emp001       John Doe       Chemist  2018-01-25      23
1  Emp002  William Spark  Statistician  2018-01-26      24
11 How to select or filter DataFrame rows based on the values in a column
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print("\nUse == operator\n")
print(employees.loc[employees['Age'] == 23])
print("\nUse < operator\n")
print(employees.loc[employees['Age'] < 30])
print("\nUse != operator\n")
print(employees.loc[employees['Occupation'] != 'Statistician'])
print("\nMultiple Conditions\n")
print(employees.loc[(employees['Occupation'] != 'Statistician') &
                    (employees['Name'] == 'John')])
Output:
Use == operator
  EmpCode  Name Occupation Date Of Join  Age
0  Emp001  John    Chemist   2018-01-25   23
Use < operator
  EmpCode   Name    Occupation Date Of Join  Age
0  Emp001   John       Chemist   2018-01-25   23
1  Emp002    Doe  Statistician   2018-01-26   24
3  Emp004  Spark  Statistician   2018-02-26   29
Use != operator
  EmpCode  Name  Occupation Date Of Join  Age
0  Emp001  John     Chemist   2018-01-25   23
4  Emp005  Mark  Programmer   2018-03-16   40
Multiple Conditions
  EmpCode  Name Occupation Date Of Join  Age
0  Emp001  John    Chemist   2018-01-25   23
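Conditions are combined with `&` and `|` rather than Python's `and`/`or`. The Python keywords ask for a single truth value of a whole Series, which is ambiguous, so pandas raises `ValueError`. A small sketch with a reduced version of the table; storing each condition in a variable also makes the filter easier to read:

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Age": [23, 24, 34, 29, 40],
})

# Each condition is a boolean Series aligned with the rows.
young = employees["Age"] < 30
not_stat = employees["Occupation"] != "Statistician"

# Element-wise AND of the two masks.
print(employees.loc[young & not_stat])

# `and` tries to convert a whole Series to a single bool, which pandas refuses.
try:
    employees.loc[young and not_stat]
    raised = False
except ValueError:
    raised = True
print("`and` raised ValueError:", raised)
```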
12 Filter multiple rows in a DataFrame with isin
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print("\nUse isin operator\n")
print(employees.loc[employees['Occupation'].isin(['Chemist', 'Programmer'])])
print("\nMultiple Conditions\n")
print(employees.loc[(employees['Occupation'] == 'Chemist') |
                    (employees['Name'] == 'John') &
                    (employees['Age'] < 30)])
Output:
Use isin operator
  EmpCode  Name  Occupation Date Of Join  Age
0  Emp001  John     Chemist   2018-01-25   23
4  Emp005  Mark  Programmer   2018-03-16   40
Multiple Conditions
  EmpCode  Name Occupation Date Of Join  Age
0  Emp001  John    Chemist   2018-01-25   23
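In the "Multiple Conditions" filter above, `&` binds more tightly than `|`, so `A | B & C` means `A | (B & C)`. The example happens to give the same rows either way. A sketch with different conditions (chosen here only for illustration) shows how the grouping changes the result:

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Occupation": ["Chemist", "Statistician", "Statistician",
                   "Statistician", "Programmer"],
    "Age": [23, 24, 34, 29, 40],
})

is_programmer = employees["Occupation"] == "Programmer"
is_john = employees["Name"] == "John"
is_young = employees["Age"] < 30

# & binds tighter than |, so this is is_programmer | (is_john & is_young).
implicit = employees.loc[is_programmer | is_john & is_young, "Name"]

# Explicit parentheses change the grouping and therefore the result.
regrouped = employees.loc[(is_programmer | is_john) & is_young, "Name"]

print(implicit.tolist())   # ['John', 'Mark']
print(regrouped.tolist())  # ['John']
```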
13 Iterate over the rows and columns of a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print("\n Example iterrows \n")
# iterrows() yields (index label, row as a Series) pairs
for index, row in employees.iterrows():
    print(row['Name'], "--", row['Age'])
print("\n Example itertuples \n")
# itertuples() yields one namedtuple per row
for row in employees.itertuples(index=True, name='Pandas'):
    print(getattr(row, "Name"), "--", getattr(row, "Age"))
Output:
Example iterrows
John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40
Example itertuples
John -- 23
Doe -- 24
William -- 34
Spark -- 29
Mark -- 40
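Because `itertuples()` returns namedtuples, the fields can also be read as plain attributes. For simple string building like this, a vectorized column expression gives the same result without a Python loop. A sketch on a reduced table:

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["John", "Doe", "William", "Spark", "Mark"],
    "Age": [23, 24, 34, 29, 40],
})

# Attribute access on the namedtuples; index=False leaves the index out of each tuple.
lines = [f"{row.Name} -- {row.Age}" for row in employees.itertuples(index=False)]

# The same strings built column-wise, with Age converted to text first.
vectorized = (employees["Name"] + " -- " + employees["Age"].astype(str)).tolist()

print(lines)
print(lines == vectorized)  # True
```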
14 How to drop DataFrame columns by name or by index
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print(employees)
print("\n Drop Column by Name \n")
employees.drop('Age', axis=1, inplace=True)
print(employees)
print("\n Drop Column by Index \n")
# Positions 0 and 1 are EmpCode and Name at this point
employees.drop(employees.columns[[0, 1]], axis=1, inplace=True)
print(employees)
Output:
  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     John       Chemist   2018-01-25   23
1  Emp002      Doe  Statistician   2018-01-26   24
2  Emp003  William  Statistician   2018-01-26   34
3  Emp004    Spark  Statistician   2018-02-26   29
4  Emp005     Mark    Programmer   2018-03-16   40
Drop Column by Name
  EmpCode     Name    Occupation Date Of Join
0  Emp001     John       Chemist   2018-01-25
1  Emp002      Doe  Statistician   2018-01-26
2  Emp003  William  Statistician   2018-01-26
3  Emp004    Spark  Statistician   2018-02-26
4  Emp005     Mark    Programmer   2018-03-16
Drop Column by Index
     Occupation Date Of Join
0       Chemist   2018-01-25
1  Statistician   2018-01-26
2  Statistician   2018-01-26
3  Statistician   2018-02-26
4    Programmer   2018-03-16
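The same drops can be written with the `columns=` keyword instead of `axis=1`. Without `inplace=True`, `drop` returns a new DataFrame and leaves the original unchanged. A short sketch on a reduced table:

```python
import pandas as pd

employees = pd.DataFrame({
    "EmpCode": ["Emp001", "Emp002"],
    "Name": ["John", "Doe"],
    "Age": [23, 24],
})

# drop(columns=...) is equivalent to drop(..., axis=1) and returns a copy.
without_age = employees.drop(columns=["Age"])
without_first_two = employees.drop(columns=employees.columns[[0, 1]])

print(without_age.columns.tolist())        # ['EmpCode', 'Name']
print(without_first_two.columns.tolist())  # ['Age']
print(employees.columns.tolist())          # ['EmpCode', 'Name', 'Age']
```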
15 Add a new column to a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
employees['City'] = ['London', 'Tokyo', 'Sydney', 'London', 'Toronto']
print(employees)
Output:
  EmpCode     Name    Occupation Date Of Join  Age     City
0  Emp001     John       Chemist   2018-01-25   23   London
1  Emp002      Doe  Statistician   2018-01-26   24    Tokyo
2  Emp003  William  Statistician   2018-01-26   34   Sydney
3  Emp004    Spark  Statistician   2018-02-26   29   London
4  Emp005     Mark    Programmer   2018-03-16   40  Toronto
16 How to get the list of column headers from a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print(list(employees))
print(list(employees.columns.values))
print(employees.columns.tolist())
Output:
['EmpCode', 'Name', 'Occupation', 'Date Of Join', 'Age']
['EmpCode', 'Name', 'Occupation', 'Date Of Join', 'Age']
['EmpCode', 'Name', 'Occupation', 'Date Of Join', 'Age']
17 How to generate a random DataFrame
import pandas as pd
import numpy as np
np.random.seed(5)
df_random = pd.DataFrame(np.random.randint(100, size=(10, 6)),
                         columns=list('ABCDEF'),
                         index=['Row-{}'.format(i) for i in range(10)])
print(df_random)
Output:
ABCDEF
Row-099786116738
Row-162273080776
Row-2155380274477
Row-3756547308486
Row-41894162182
Row-51678558080
Row-64365127312
Row-768388319187
Row-8306211676555
Row-939178272933
18 How to select multiple columns of a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
df = employees[['EmpCode', 'Age', 'Name']]
print(df)
Output:
  EmpCode  Age     Name
0  Emp001   23     John
1  Emp002   24      Doe
2  Emp003   34  William
3  Emp004   29    Spark
4  Emp005   40     Mark
19 How to convert a dictionary to a DataFrame
import pandas as pd
data = {'Age': [30, 20, 22, 40, 32, 28, 39],
        'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                  'Red'],
        'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                 'Melon', 'Beans'],
        'Height': [165, 70, 120, 80, 180, 172, 150],
        'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
        'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
        }
print(data)
df = pd.DataFrame(data)
print(df)
Output:
{'Age': [30, 20, 22, 40, 32, 28, 39], 'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black', 'Red'], 'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'], 'Height': [165, 70, 120, 80, 180, 172, 150], 'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2], 'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']}
   Age  Color    Food  Height  Score State
0   30   Blue   Steak     165    4.6    NY
1   20  Green    Lamb      70    8.3    TX
2   22    Red   Mango     120    9.0    FL
3   40  White   Apple      80    3.3    AL
4   32   Gray  Cheese     180    1.8    AK
5   28  Black   Melon     172    9.5    TX
6   39    Red   Beans     150    2.2    TX
20 Slicing with loc
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                             'Red'],
                   'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                            'Melon', 'Beans'],
                   'Height': [165, 70, 120, 80, 180, 172, 150],
                   'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n -- Selecting a single row with .loc with a string -- \n")
print(df.loc['Penelope'])
print("\n -- Selecting multiple rows with .loc with a list of strings -- \n")
print(df.loc[['Cornelia', 'Jane', 'Dean']])
print("\n -- Selecting multiple rows with .loc with slice notation -- \n")
print(df.loc['Aaron':'Dean'])
Output:
-- Selecting a single row with .loc with a string --
Age          40
Color     White
Food      Apple
Height       80
Score       3.3
State        AL
Name: Penelope, dtype: object
-- Selecting multiple rows with .loc with a list of strings --
          Age Color    Food  Height  Score State
Cornelia   39   Red   Beans     150    2.2    TX
Jane       30  Blue   Steak     165    4.6    NY
Dean       32  Gray  Cheese     180    1.8    AK
-- Selecting multiple rows with .loc with slice notation --
          Age  Color    Food  Height  Score State
Aaron      22    Red   Mango     120    9.0    FL
Penelope   40  White   Apple      80    3.3    AL
Dean       32   Gray  Cheese     180    1.8    AK
21 Check whether a DataFrame is empty
import pandas as pd
df = pd.DataFrame()
if df.empty:
    print('DataFrame is empty!')
Output:
DataFrame is empty!
22 Specify the index and column names when creating a DataFrame
import pandas as pd
values = ["India", "Canada", "Australia",
          "Japan", "Germany", "France"]
code = ["IND", "CAN", "AUS", "JAP", "GER", "FRA"]
df = pd.DataFrame(values, index=code, columns=['Country'])
print(df)
Output:
       Country
IND      India
CAN     Canada
AUS  Australia
JAP      Japan
GER    Germany
FRA     France
23 Slicing with iloc
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                             'Red'],
                   'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                            'Melon', 'Beans'],
                   'Height': [165, 70, 120, 80, 180, 172, 150],
                   'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n -- Selecting a single row with .iloc with an integer -- \n")
print(df.iloc[4])
print("\n -- Selecting multiple rows with .iloc with a list of integers -- \n")
print(df.iloc[[2, -2]])
print("\n -- Selecting multiple rows with .iloc with slice notation -- \n")
print(df.iloc[:5:3])
Output:
-- Selecting a single row with .iloc with an integer --
Age           32
Color       Gray
Food      Cheese
Height       180
Score        1.8
State         AK
Name: Dean, dtype: object
-- Selecting multiple rows with .iloc with a list of integers --
           Age  Color   Food  Height  Score State
Aaron       22    Red  Mango     120    9.0    FL
Christina   28  Black  Melon     172    9.5    TX
-- Selecting multiple rows with .iloc with slice notation --
          Age  Color   Food  Height  Score State
Jane       30   Blue  Steak     165    4.6    NY
Penelope   40  White  Apple      80    3.3    AL
24 The difference between iloc and loc
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                             'Red'],
                   'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                            'Melon', 'Beans'],
                   'Height': [165, 70, 120, 80, 180, 172, 150],
                   'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n -- loc -- \n")
# loc: rows selected by a boolean mask, columns by label
print(df.loc[df['Age'] < 30, ['Color', 'Height']])
print("\n -- iloc -- \n")
# iloc: rows selected by a plain boolean array, columns by position
print(df.iloc[(df['Age'] < 30).values, [1, 3]])
Output:
-- loc --
           Color  Height
Nick       Green      70
Aaron        Red     120
Christina  Black     172
-- iloc --
           Color  Height
Nick       Green      70
Aaron        Red     120
Christina  Black     172
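The positions `[1, 3]` in the iloc call correspond to the labels `Color` and `Height`. `Index.get_indexer` converts labels to positions, so the two selections can be checked against each other. Here is a sketch on a smaller table:

```python
import pandas as pd

df = pd.DataFrame({"Age": [30, 20, 22],
                   "Color": ["Blue", "Green", "Red"],
                   "Food": ["Steak", "Lamb", "Mango"],
                   "Height": [165, 70, 120]},
                  index=["Jane", "Nick", "Aaron"])

# Translate column labels into the integer positions that iloc expects.
positions = df.columns.get_indexer(["Color", "Height"])
print(positions.tolist())  # [1, 3]

mask = df["Age"] < 30
by_label = df.loc[mask, ["Color", "Height"]]
# iloc does not accept a label-aligned boolean Series, hence .values.
by_position = df.iloc[mask.values, positions]
print(by_label.equals(by_position))  # True
```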
25 Create an empty DataFrame with a time index
import datetime
import pandas as pd
todays_date = datetime.datetime.now().date()
# The index starts at today's date, so the dates in your output will differ
index = pd.date_range(todays_date, periods=10, freq='D')
columns = ['A', 'B', 'C']
df = pd.DataFrame(index=index, columns=columns)
df = df.fillna(0)
print(df)
Output:
            A  B  C
2018-09-30  0  0  0
2018-10-01  0  0  0
2018-10-02  0  0  0
2018-10-03  0  0  0
2018-10-04  0  0  0
2018-10-05  0  0  0
2018-10-06  0  0  0
2018-10-07  0  0  0
2018-10-08  0  0  0
2018-10-09  0  0  0
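Instead of creating NaN cells and calling `fillna(0)`, you can pass the scalar 0 as the data. pandas then broadcasts it to every cell, and the columns start out as integers. A sketch, assuming a fixed start date (2018-09-30, taken from the sample output) so the result is reproducible:

```python
import pandas as pd

index = pd.date_range("2018-09-30", periods=10, freq="D")
columns = ["A", "B", "C"]

# A scalar as data is repeated into every cell of the requested shape.
df = pd.DataFrame(0, index=index, columns=columns)
print(df.dtypes)
print(df.head(3))
```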
26 How to change the order of DataFrame columns
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                             'Red'],
                   'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                            'Melon', 'Beans'],
                   'Height': [165, 70, 120, 80, 180, 172, 150],
                   'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n -- Change order using columns -- \n")
new_order = [3, 2, 1, 4, 5, 0]
df = df[df.columns[new_order]]
print(df)
print("\n -- Change order using reindex -- \n")
df = df.reindex(['State', 'Color', 'Age', 'Food', 'Score', 'Height'], axis=1)
print(df)
Output:
-- Change order using columns --
           Height    Food  Color  Score State  Age
Jane          165   Steak   Blue    4.6    NY   30
Nick           70    Lamb  Green    8.3    TX   20
Aaron         120   Mango    Red    9.0    FL   22
Penelope       80   Apple  White    3.3    AL   40
Dean          180  Cheese   Gray    1.8    AK   32
Christina     172   Melon  Black    9.5    TX   28
Cornelia      150   Beans    Red    2.2    TX   39
-- Change order using reindex --
          State  Color  Age    Food  Score  Height
Jane         NY   Blue   30   Steak    4.6     165
Nick         TX  Green   20    Lamb    8.3      70
Aaron        FL    Red   22   Mango    9.0     120
Penelope     AL  White   40   Apple    3.3      80
Dean         AK   Gray   32  Cheese    1.8     180
Christina    TX  Black   28   Melon    9.5     172
Cornelia     TX    Red   39   Beans    2.2     150
27 Check the data types of DataFrame columns
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                             'Red'],
                   'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                            'Melon', 'Beans'],
                   'Height': [165, 70, 120, 80, 180, 172, 150],
                   'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print(df.dtypes)
Output:
Age         int64
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object
28 Change the data type of a specific DataFrame column
import pandas as pd
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'Color': ['Blue', 'Green', 'Red', 'White', 'Gray', 'Black',
                             'Red'],
                   'Food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese',
                            'Melon', 'Beans'],
                   'Height': [165, 70, 120, 80, 180, 172, 150],
                   'Score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print(df.dtypes)
df['Age'] = df['Age'].astype(str)
print(df.dtypes)
Output:
Age         int64
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object
Age        object
Color      object
Food       object
Height      int64
Score     float64
State      object
dtype: object
29 How to convert a column to the DateTime type
import pandas as pd
df = pd.DataFrame({'DateOFBirth': [1349720105, 1349806505, 1349892905,
                                   1349979305, 1350065705, 1349792905,
                                   1349730105],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)
# unit='s': the integers are interpreted as seconds since the Unix epoch
df['DateOFBirth'] = pd.to_datetime(df['DateOFBirth'], unit='s')
print("\n----------------After----------------\n")
print(df.dtypes)
print(df)
Output:
----------------Before---------------
DateOFBirth     int64
State          object
dtype: object
           DateOFBirth State
Jane        1349720105    NY
Nick        1349806505    TX
Aaron       1349892905    FL
Penelope    1349979305    AL
Dean        1350065705    AK
Christina   1349792905    TX
Cornelia    1349730105    TX
----------------After----------------
DateOFBirth    datetime64[ns]
State                  object
dtype: object
                  DateOFBirth State
Jane      2012-10-08 18:15:05    NY
Nick      2012-10-09 18:15:05    TX
Aaron     2012-10-10 18:15:05    FL
Penelope  2012-10-11 18:15:05    AL
Dean      2012-10-12 18:15:05    AK
Christina 2012-10-09 14:28:25    TX
Cornelia  2012-10-08 21:01:45    TX
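The `unit` argument tells `pd.to_datetime` how to read the numbers. With `unit="s"` they are seconds since 1970-01-01 00:00:00, and the result has no time zone. If the same instants were stored in milliseconds, `unit="ms"` would be needed instead. A quick check using two of the values above:

```python
import pandas as pd

seconds = pd.Series([1349720105, 1349792905])

# Seconds since the Unix epoch.
as_dt = pd.to_datetime(seconds, unit="s")

# The same instants expressed in milliseconds.
as_dt_ms = pd.to_datetime(seconds * 1000, unit="ms")

print(as_dt.tolist())
print((as_dt == as_dt_ms).all())  # True
```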
30 Convert a DataFrame column from floats to ints
import pandas as pd
df = pd.DataFrame({'DailyExp': [75.7, 56.69, 55.69, 96.5, 84.9, 110.5,
                                58.9],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n----------------Before---------------\n")
print(df.dtypes)
print(df)
# astype(int) gives int64 on most platforms (int32 on Windows with NumPy < 2.0)
df['DailyExp'] = df['DailyExp'].astype(int)
print("\n----------------After----------------\n")
print(df.dtypes)
print(df)
Output:
----------------Before---------------
DailyExp    float64
State        object
dtype: object
           DailyExp State
Jane          75.70    NY
Nick          56.69    TX
Aaron         55.69    FL
Penelope      96.50    AL
Dean          84.90    AK
Christina    110.50    TX
Cornelia      58.90    TX
----------------After----------------
DailyExp     int64
State       object
dtype: object
           DailyExp State
Jane             75    NY
Nick             56    TX
Aaron            55    FL
Penelope         96    AL
Dean             84    AK
Christina       110    TX
Cornelia         58    TX
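As the output shows, `astype(int)` simply drops the fractional part: 56.69 becomes 56, not 57. If you want the nearest integer, round first. NumPy rounds exact halves to the nearest even number, which affects 96.5 and 110.5 here. A sketch with the same values:

```python
import pandas as pd

daily = pd.Series([75.7, 56.69, 55.69, 96.5, 84.9, 110.5, 58.9])

# Truncation toward zero.
truncated = daily.astype(int)

# Round to the nearest integer first (halves go to the even neighbour).
rounded = daily.round().astype(int)

print(truncated.tolist())  # [75, 56, 55, 96, 84, 110, 58]
print(rounded.tolist())    # [76, 57, 56, 96, 85, 110, 59]
```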
31 How to convert a column of date strings to the DateTime type
import pandas as pd
df = pd.DataFrame({'DateOfBirth': ['1986-11-11', '1999-05-12', '1976-01-01',
                                   '1986-06-01', '1983-06-04', '1990-03-07',
                                   '1999-07-09'],
                   'State': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean',
                         'Christina', 'Cornelia'])
print("\n----------------Before---------------\n")
print(df.dtypes)
# Recent pandas versions require an explicit unit such as [ns] here
df['DateOfBirth'] = df['DateOfBirth'].astype('datetime64[ns]')
print("\n----------------After----------------\n")
print(df.dtypes)
Output:
----------------Before---------------
DateOfBirth    object
State          object
dtype: object
----------------After----------------
DateOfBirth    datetime64[ns]
State                  object
dtype: object
32 Concatenate two DataFrames
import pandas as pd
df1 = pd.DataFrame({'Age': [30, 20, 22, 40], 'Height': [165, 70, 120, 80],
                    'Score': [4.6, 8.3, 9.0, 3.3], 'State': ['NY', 'TX',
                                                             'FL', 'AL']},
                   index=['Jane', 'Nick', 'Aaron', 'Penelope'])
df2 = pd.DataFrame({'Age': [32, 28, 39], 'Color': ['Gray', 'Black', 'Red'],
                    'Food': ['Cheese', 'Melon', 'Beans'],
                    'Score': [1.8, 9.5, 2.2], 'State': ['AK', 'TX', 'TX']},
                   index=['Dean', 'Christina', 'Cornelia'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead.
# sort=True sorts the combined column labels; missing cells become NaN.
df3 = pd.concat([df1, df2], sort=True)
print(df3)
Output:
           Age  Color    Food  Height  Score State
Jane        30    NaN     NaN   165.0    4.6    NY
Nick        20    NaN     NaN    70.0    8.3    TX
Aaron       22    NaN     NaN   120.0    9.0    FL
Penelope    40    NaN     NaN    80.0    3.3    AL
Dean        32   Gray  Cheese     NaN    1.8    AK
Christina   28  Black   Melon     NaN    9.5    TX
Cornelia    39    Red   Beans     NaN    2.2    TX
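`pd.concat` takes a few options that change the shape of the result. With `join="inner"`, only the columns present in every input are kept, so no NaN padding appears. With `ignore_index=True`, the row labels are replaced by 0..n-1. A sketch with shortened versions of `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({"Age": [30, 20], "Height": [165, 70], "Score": [4.6, 8.3],
                    "State": ["NY", "TX"]}, index=["Jane", "Nick"])
df2 = pd.DataFrame({"Age": [32], "Color": ["Gray"], "Score": [1.8],
                    "State": ["AK"]}, index=["Dean"])

# Keep only the shared columns (Age, Score, State) and renumber the rows.
inner = pd.concat([df1, df2], join="inner", ignore_index=True)
print(inner)
```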
33 Append an extra row at the end of a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp001', 'Emp002', 'Emp003', 'Emp004', 'Emp005'],
    'Name': ['John', 'Doe', 'William', 'Spark', 'Mark'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician',
                   'Statistician', 'Programmer'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26', '2018-02-26',
                     '2018-03-16'],
    'Age': [23, 24, 34, 29, 40]})
print("\n------------ BEFORE ----------------\n")
print(employees)
# The values must follow the column order: EmpCode, Name, Occupation, Date Of Join, Age
employees.loc[len(employees)] = ['Emp006', 'Sunny', 'Programmer', '2018-01-25',
                                 45]
print("\n------------ AFTER ----------------\n")
print(employees)
Output:
------------ BEFORE ----------------
  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     John       Chemist   2018-01-25   23
1  Emp002      Doe  Statistician   2018-01-26   24
2  Emp003  William  Statistician   2018-01-26   34
3  Emp004    Spark  Statistician   2018-02-26   29
4  Emp005     Mark    Programmer   2018-03-16   40
------------ AFTER ----------------
  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     John       Chemist   2018-01-25   23
1  Emp002      Doe  Statistician   2018-01-26   24
2  Emp003  William  Statistician   2018-01-26   34
3  Emp004    Spark  Statistician   2018-02-26   29
4  Emp005     Mark    Programmer   2018-03-16   40
5  Emp006    Sunny    Programmer   2018-01-25   45
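`employees.loc[len(employees)] = ...` only appends when the index is the default 0..n-1 range. If rows have been removed, `len(df)` can equal a label that already exists, and the assignment silently overwrites that row. In the sketch below, the frame and the dropped row are hypothetical. It shows the pitfall and then appends a one-row DataFrame with `pd.concat`, which adds a row whatever the index looks like:

```python
import pandas as pd

# Hypothetical frame whose index is no longer 0..n-1 after a drop.
df = pd.DataFrame({"Name": ["John", "Doe", "Mark"], "Age": [23, 24, 40]})
df = df.drop(index=1)  # index is now [0, 2]

# len(df) == 2 and label 2 already exists, so Mark's row is overwritten.
df.loc[len(df)] = ["Sunny", 45]
print(df)

# Concatenating a one-row DataFrame always adds a row.
new_row = pd.DataFrame([{"Name": "Rocky", "Age": 21}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```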
34 Add a new row with a specified index label
import pandas as pd
employees = pd.DataFrame(
    data={'Name': ['John Doe', 'William Spark'],
          'Occupation': ['Chemist', 'Statistician'],
          'Date Of Join': ['2018-01-25', '2018-01-26'],
          'Age': [23, 24]},
    index=['Emp001', 'Emp002'],
    columns=['Name', 'Occupation', 'Date Of Join', 'Age'])
print("\n------------ BEFORE ----------------\n")
print(employees)
employees.loc['Emp003'] = ['Sunny', 'Programmer', '2018-01-25', 45]
print("\n------------ AFTER ----------------\n")
print(employees)
Output:
------------ BEFORE ----------------
                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24
------------ AFTER ----------------
                 Name    Occupation Date Of Join  Age
Emp001       John Doe       Chemist   2018-01-25   23
Emp002  William Spark  Statistician   2018-01-26   24
Emp003          Sunny    Programmer   2018-01-25   45
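35 How to add rows with a for loop
import pandas as pd
cols = ['Zip']
lst = []
zip_code = 32100  # renamed from `zip` so the built-in zip() is not shadowed
for a in range(10):
    lst.append([zip_code])
    zip_code = zip_code + 1
# Build the DataFrame once, after the loop, from the collected rows
df = pd.DataFrame(lst, columns=cols)
print(df)
Output:
     Zip
0  32100
1  32101
2  32102
3  32103
4  32104
5  32105
6  32106
7  32107
8  32108
9  32109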
36 Add a row at the top of a DataFrame
import pandas as pd
employees = pd.DataFrame({
    'EmpCode': ['Emp002', 'Emp003', 'Emp004'],
    'Name': ['John', 'Doe', 'William'],
    'Occupation': ['Chemist', 'Statistician', 'Statistician'],
    'Date Of Join': ['2018-01-25', '2018-01-26', '2018-01-26'],
    'Age': [23, 24, 34]})
print("\n------------ BEFORE ----------------\n")
print(employees)
# New line (keys in the same order as the columns of employees)
line = pd.DataFrame({'EmpCode': 'Emp001', 'Name': 'Dean', 'Occupation': 'Chemist',
                     'Date Of Join': '2018-02-26', 'Age': 45}, index=[0])
# Concatenate the two DataFrames (.ix was removed in pandas 1.0 and is not needed)
employees = pd.concat([line, employees]).reset_index(drop=True)
print("\n------------ AFTER ----------------\n")
print(employees)
Output:
------------ BEFORE ----------------
  EmpCode     Name    Occupation Date Of Join  Age
0  Emp002     John       Chemist   2018-01-25   23
1  Emp003      Doe  Statistician   2018-01-26   24
2  Emp004  William  Statistician   2018-01-26   34
------------ AFTER ----------------
  EmpCode     Name    Occupation Date Of Join  Age
0  Emp001     Dean       Chemist   2018-02-26   45
1  Emp002     John       Chemist   2018-01-25   23
2  Emp003      Doe  Statistician   2018-01-26   24
3  Emp004  William  Statistician   2018-01-26   34
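37 How to add rows to a DataFrame dynamically
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
# Assigning to a new row label with .loc enlarges the DataFrame
df.loc[1, 'Name'] = 'Rocky'
df.loc[1, 'Age'] = 23
df.loc[2, 'Name'] = 'Sunny'
print(df)
Output:
    Name  Age
1  Rocky   23
2  Sunny  NaN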
38 Insert a row at an arbitrary position
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Age'])
df.loc[1, 'Name'] = 'Rocky'
df.loc[1, 'Age'] = 21
df.loc[2, 'Name'] = 'Sunny'
df.loc[2, 'Age'] = 22
df.loc[3, 'Name'] = 'Mark'
df.loc[3, 'Age'] = 25
df.loc[4, 'Name'] = 'Taylor'
df.loc[4, 'Age'] = 28
print("\n------------ BEFORE ----------------\n")
print(df)
# The label 2.5 sorts between rows 2 and 3
line = pd.DataFrame({"Name": "Jack", "Age": 24}, index=[2.5])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, line])
df = df.sort_index().reset_index(drop=True)
df = df.reindex(['Name', 'Age'], axis=1)
print("\n------------ AFTER ----------------\n")
print(df)
Output:
------------ BEFORE ----------------
     Name Age
1   Rocky  21
2   Sunny  22
3    Mark  25
4  Taylor  28
------------ AFTER ----------------
     Name Age
0   Rocky  21
1   Sunny  22
2    Jack  24
3    Mark  25
4  Taylor  28
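The fractional-label trick depends on sorting the index. An alternative is to split the frame by position with `iloc` and concatenate the pieces with the new row in between. In the sketch below, `insert_row` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def insert_row(df, position, row):
    """Hypothetical helper: return a copy of df with `row` (a dict) at `position`."""
    new_row = pd.DataFrame([row], columns=df.columns)
    top, bottom = df.iloc[:position], df.iloc[position:]
    # ignore_index=True renumbers the rows 0..n-1 after the insert.
    return pd.concat([top, new_row, bottom], ignore_index=True)


df = pd.DataFrame({"Name": ["Rocky", "Sunny", "Mark", "Taylor"],
                   "Age": [21, 22, 25, 28]})
df = insert_row(df, 2, {"Name": "Jack", "Age": 24})
print(df)
```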
39 Add a row to a DataFrame with a timestamp index
import pandas as pd
df =