US Crime Rates Exercise: Time-Series Indexing (to_datetime(), .resample(), .set_index(), .idxmax()), pandas basics practice

Time-series topics

Convert the Year column to datetime64

crime.Year = pd.to_datetime(crime.Year, format='%Y')
crime.info()

output:

RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
Year 55 non-null datetime64[ns]
Population 55 non-null int64
Total 55 non-null int64
Violent 55 non-null int64
Property 55 non-null int64
Murder 55 non-null int64
Forcible_Rape 55 non-null int64
Robbery 55 non-null int64
Aggravated_assault 55 non-null int64
Burglary 55 non-null int64
Larceny_Theft 55 non-null int64
Vehicle_Theft 55 non-null int64
dtypes: datetime64[ns](1), int64(11)
memory usage: 5.2 KB
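As a minimal sketch of the conversion above, the snippet below uses a hypothetical three-row table (`toy`, with made-up values) rather than the real crime data. Casting the years to strings before parsing is a defensive choice made for this sketch, not something the original code does.

```python
import pandas as pd

# Hypothetical toy table standing in for the crime data (values are made up)
toy = pd.DataFrame({"Year": [1960, 1961, 1962], "Murder": [10, 20, 30]})

# Parse four-digit years; each one becomes January 1 of that year.
# astype(str) is a defensive step for this sketch only.
toy["Year"] = pd.to_datetime(toy["Year"].astype(str), format="%Y")

print(toy.dtypes)
print(toy["Year"].iloc[0])
```

Without `format='%Y'`, integer input would be read as an offset from the Unix epoch rather than as a calendar year, which is why the format string matters here.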

Set the Year column as the index

# Set the Year column as the index; drop=True removes the original Year column
crime = crime.set_index('Year', drop=True)
crime.head()
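To see what `set_index` does to the frame, here is a small sketch on a hypothetical table (`toy`, made-up values). It shows that the column moves into the index and that rows can then be looked up by date.

```python
import pandas as pd

# Hypothetical toy table with a datetime Year column (values are made up)
toy = pd.DataFrame({
    "Year": pd.to_datetime(["1960", "1961", "1962"], format="%Y"),
    "Murder": [10, 20, 30],
})

# Move Year into the index; drop=True (the default) removes it from the columns
indexed = toy.set_index("Year", drop=True)

print(indexed.index)
# Label-based lookup by timestamp now works
print(indexed.loc[pd.Timestamp("1961-01-01"), "Murder"])
```

A DatetimeIndex is what makes `.resample()` available in the next step.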

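Resample the data at ten-year intervals
Note: inspecting the data shows that Population is already a cumulative level (a headcount for each year), so summing it across years is meaningless; the other columns are yearly counts and can be summed.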

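# Approach: resample the whole DataFrame and sum, then handle the Population column separately.
# 10AS: ten-year bins anchored at the start of the year; sum each column within a bin
# (pandas 2.2 and later deprecate the 'AS' alias in favor of 'YS', i.e. '10YS')
crimes = crime.resample('10AS').sum()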

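# Resample the Population column of the original data the same way, but take the maximum within each bin
# (not rigorous: this is wrong if the population ever declines within a bin)
# Then replace the column in the new dataset
population = crime['Population'].resample('10AS').max()
crimes['Population'] = population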
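The two-step approach above (sum everything, then overwrite Population with the per-bin maximum) can also be written as one call with a per-column aggregation. The sketch below is an alternative form, not the method used in the original; it runs on a hypothetical 20-year table (`toy`, made-up values) and uses the `'10YS'` spelling of the frequency.

```python
import pandas as pd

# Hypothetical toy data: 20 yearly rows starting in 1960 (values are made up)
idx = pd.date_range("1960-01-01", periods=20, freq="YS")
toy = pd.DataFrame(
    {"Population": list(range(100, 120)), "Murder": [1] * 20},
    index=idx,
)

# Ten-year bins: Population takes the bin maximum, Murder is summed
decades = toy.resample("10YS").agg({"Population": "max", "Murder": "sum"})

print(decades)
```

Using the bin maximum for Population carries the same caveat as before: it only equals the end-of-decade figure when the population never declines within a bin. Taking `"last"` instead of `"max"` would be one way around that.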

Which period was the most dangerous time to live in US history?

crime.idxmax(0)

output:
Population 2014-01-01
Violent 1992-01-01
Property 1991-01-01
Murder 1991-01-01
Forcible_Rape 1992-01-01
Robbery 1991-01-01
Aggravated_assault 1993-01-01
Burglary 1980-01-01
Larceny_Theft 1991-01-01
Vehicle_Theft 1991-01-01
dtype: datetime64[ns]
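Apart from Population, each entry is the year in which the corresponding indicator peaked, i.e. the most dangerous year by that measure.
DataFrame.idxmax(axis=0, skipna=True): see the official pandas documentation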
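A short sketch of how `idxmax` reads on a date-indexed frame, using a hypothetical three-row table (`toy`, made-up values): for each column it returns the index label of the row holding that column's maximum.

```python
import pandas as pd

# Hypothetical toy data indexed by year (values are made up)
idx = pd.to_datetime(["1990", "1991", "1992"], format="%Y")
toy = pd.DataFrame({"Violent": [5, 7, 9], "Property": [8, 12, 6]}, index=idx)

# axis=0 scans down each column and returns the index label of its maximum
peaks = toy.idxmax(axis=0)

print(peaks)
```

Calling the same method on the decade-resampled frame would return, for each column, the start date of the ten-year bin with the highest total.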