DataFrame
A DataFrame represents a rectangular table of data and has both a row index and a column index.

Construction

One common way to build a DataFrame is from a dict of equal-length lists:

In [43]: data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
    ...:         'year': [2000, 2001, 2002, 2001, 2001, 2003],
    ...:         'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

In [44]: frame = pd.DataFrame(data)

In [45]: frame
Out[45]:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2001  2.9
5  Nevada  2003  3.2

For a large DataFrame, the head method selects only the first five rows:

In [46]: frame.head()
Out[46]:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2001  2.9

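
As a small extension of the head example, the sketch below passes an explicit row count to head and uses the companion tail method for the end of the table. The variable names first_three and last_two are only illustrative.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2001, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

# head(n) returns the first n rows; n defaults to 5 when omitted.
first_three = frame.head(3)

# tail(n) is the counterpart for the last n rows.
last_two = frame.tail(2)

print(first_three)
print(last_two)
```
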
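If you specify a sequence of columns, the DataFrame's columns are arranged in that order:

In [47]: pd.DataFrame(data, columns=['year', 'state', 'pop'])
Out[47]:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2001  Nevada  2.9
5  2003  Nevada  3.2
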
If you pass a column that is not contained in the dict, it appears in the result filled with missing values (NaN):

In [49]: frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
    ...:                       index=['one', 'two', 'three', 'four', 'five', 'six'])

In [50]: frame2
Out[50]:
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2001  Nevada  2.9  NaN
six    2003  Nevada  3.2  NaN

A column can be retrieved as a Series either with dict-like notation or by attribute:

In [51]: frame2['state']
Out[51]:
one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
six      Nevada
Name: state, dtype: object

In [52]: frame2.year
Out[52]:
one      2000
two      2001
three    2002
four     2001
five     2001
six      2003
Name: year, dtype: int64

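
Attribute access only works when the column name does not collide with an existing DataFrame attribute or method. The column pop used in this article is such a case, because DataFrame already defines a pop method. The sketch below uses a shortened two-row version of the data for illustration.

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001],
                       'state': ['Ohio', 'Ohio'],
                       'pop': [1.5, 1.7]})

# 'year' is not a DataFrame method, so attribute access returns the column.
year_attr = frame2.year

# 'pop' is also the name of the DataFrame.pop method, so attribute access
# returns the bound method rather than the column.
pop_attr = frame2.pop

# Bracket notation always returns the column.
pop_col = frame2['pop']

print(type(year_attr).__name__)
print(callable(pop_attr))
print(pop_col.tolist())
```

Bracket notation works for any column name, so it is the safer default in reusable code.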
Rows can also be selected by position or by label with the special loc attribute:

In [53]: frame2.loc['three']
Out[53]:
year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object

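
The session above shows only label-based selection with loc. For selection by integer position, pandas provides the iloc attribute. A minimal sketch, rebuilding frame2 without the debt column to keep it short:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2001, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

by_label = frame2.loc['three']    # select the row labelled 'three'
by_position = frame2.iloc[2]      # select the third row (positions start at 0)

print(by_label.equals(by_position))
```
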
Columns can be modified by assignment. For example, the empty debt column can be assigned a scalar value or an array of values:

In [54]: frame2['debt'] = 16.5

In [55]: frame2
Out[55]:
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2001  Nevada  2.9  16.5
six    2003  Nevada  3.2  16.5

In [56]: frame2['debt'] = np.arange(6.)

In [57]: frame2
Out[57]:
       year   state  pop  debt
one    2000    Ohio  1.5   0.0
two    2001    Ohio  1.7   1.0
three  2002    Ohio  3.6   2.0
four   2001  Nevada  2.4   3.0
five   2001  Nevada  2.9   4.0
six    2003  Nevada  3.2   5.0

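When a Series is assigned to a column, its labels are aligned to the DataFrame's index, and rows without a matching label are filled with NaN:

In [58]: val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

In [59]: frame2['debt'] = val

In [60]: frame2
Out[60]:
       year   state  pop  debt
one    2000    Ohio  1.5   NaN
two    2001    Ohio  1.7  -1.2
three  2002    Ohio  3.6   NaN
four   2001  Nevada  2.4  -1.5
five   2001  Nevada  2.9  -1.7
six    2003  Nevada  3.2   NaN
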
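
The difference between assigning a Series and assigning a plain list can be sketched as follows. A Series brings its own labels and is aligned; a list has no labels, so its length has to match the number of rows. The three-row frame and the name list_error are illustrative only.

```python
import pandas as pd

frame2 = pd.DataFrame({'pop': [1.5, 1.7, 3.6]}, index=['one', 'two', 'three'])

# A Series is aligned by label: 'one' and 'three' have no match and become NaN.
frame2['debt'] = pd.Series([-1.2], index=['two'])

# A list is assigned by position, so a length mismatch raises ValueError
# and leaves the existing column unchanged.
try:
    frame2['debt'] = [1.0, 2.0]
    list_error = None
except ValueError as exc:
    list_error = exc

print(frame2)
print(type(list_error).__name__)
```
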
The del keyword deletes a column. Here a new boolean column, eastern, is first created by assignment and then deleted:

In [61]: frame2['eastern'] = frame2.state == 'Ohio'

In [62]: frame2
Out[62]:
       year   state  pop  debt  eastern
one    2000    Ohio  1.5   NaN     True
two    2001    Ohio  1.7  -1.2     True
three  2002    Ohio  3.6   NaN     True
four   2001  Nevada  2.4  -1.5    False
five   2001  Nevada  2.9  -1.7    False
six    2003  Nevada  3.2   NaN    False

In [63]: del frame2['eastern']

In [64]: frame2.columns
Out[64]: Index(['year', 'state', 'pop', 'debt'], dtype='object')

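
Whether in-place changes to a selected column show up in the parent DataFrame depends on the pandas version and on whether copy-on-write is enabled; with copy-on-write they never do. Taking an explicit copy gives the same result in every case. A minimal sketch, with a small illustrative frame and the hypothetical name debt_copy:

```python
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002],
                       'debt': [0.0, 1.0, 2.0]},
                      index=['one', 'two', 'three'])

# copy() returns a Series with its own data, independent of frame2.
debt_copy = frame2['debt'].copy()

# Changing the copy leaves the DataFrame untouched.
debt_copy.loc['one'] = 99.0

print(frame2.loc['one', 'debt'])
print(debt_copy.loc['one'])
```
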
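In the pandas version used in this session, the column returned by indexing a DataFrame is a view on the underlying data, so in-place modifications to the Series are reflected in the DataFrame. To get an independent copy, call the Series's copy method explicitly.

Another common form of data is a nested dict of dicts. The outer keys become the columns and the inner keys become the row index:

In [65]: pop = {'Nevada': {2001: 2.4, 2002: 2.9},
    ...:        'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

In [66]: frame3 = pd.DataFrame(pop)

In [67]: frame3
Out[67]:
      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6
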
The rows and columns can be swapped (transposed):

In [68]: frame3.T
Out[68]:
        2000  2001  2002
Nevada   NaN   2.4   2.9
Ohio     1.5   1.7   3.6

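If an index is specified explicitly, the keys of the inner dicts are not sorted. The rows follow the given index instead, and labels with no data (here 2003) are filled with NaN:

In [69]: pd.DataFrame(pop, index=[2001, 2002, 2003])
Out[69]:
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2003     NaN   NaN
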
A dict of Series can also be used to construct a DataFrame:

In [70]: pdata = {'Ohio': frame3['Ohio'][:-1],
    ...:          'Nevada': frame3['Nevada'][:2]}

In [71]: pd.DataFrame(pdata)
Out[71]:
      Ohio  Nevada
2000   1.5     NaN
2001   1.7     2.4

The index and the columns each have a name attribute. If set, the names are displayed as well:

In [72]: frame3.index.name = 'year'

In [73]: frame3.columns.name = 'state'

In [74]: frame3
Out[74]:
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

The values attribute returns the data in the DataFrame as a two-dimensional ndarray:

In [75]: frame3.values
Out[75]:
array([[nan, 1.5],
       [2.4, 1.7],
       [2.9, 3.6]])

If the columns have different dtypes, the dtype of the values array is chosen automatically to accommodate all of the columns:

In [77]: frame2.values
Out[77]:
array([[2000, 'Ohio', 1.5, nan],
       [2001, 'Ohio', 1.7, -1.2],
       [2002, 'Ohio', 3.6, nan],
       [2001, 'Nevada', 2.4, -1.5],
       [2001, 'Nevada', 2.9, -1.7],
       [2003, 'Nevada', 3.2, nan]], dtype=object)

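
The dtype selection can be checked directly. The sketch below uses three small illustrative frames (floats, mixed_numeric, and mixed are hypothetical names) and the to_numpy method, which returns the same kind of array as the values attribute.

```python
import numpy as np
import pandas as pd

# All columns are floats: the array keeps float64.
floats = pd.DataFrame({'a': [1.5, 2.5], 'b': [3.0, 4.0]})

# Integers and floats: integers are upcast so that float64 fits both columns.
mixed_numeric = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

# Numbers and strings: only the generic object dtype can hold both.
mixed = pd.DataFrame({'year': [2000, 2001], 'state': ['Ohio', 'Nevada']})

for frame in (floats, mixed_numeric, mixed):
    print(frame.to_numpy().dtype)
```

An object array is far slower for numeric work than a float64 array, so it is often worth selecting only the numeric columns before converting.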
Index Objects

Any array or other sequence of labels used when constructing a Series or DataFrame is converted internally into an Index object:

In [78]: obj = pd.Series(range(3), index=['a', 'b', 'c'])

In [79]: index = obj.index

In [80]: index
Out[80]: Index(['a', 'b', 'c'], dtype='object')

In [81]: index[1:]
Out[81]: Index(['b', 'c'], dtype='object')

In [82]: index[1] = 'd'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 index[1] = 'd'

c:\users\a\appdata\local\programs\python\python36\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   3881
   3882     def __setitem__(self, key, value):
-> 3883         raise TypeError("Index does not support mutable operations")
   3884
   3885     def __getitem__(self, key):

TypeError: Index does not support mutable operations

In [83]: labels = pd.Index(np.arange(3))

In [84]: labels
Out[84]: Int64Index([0, 1, 2], dtype='int64')

In [85]: obj2 = pd.Series([1.5, -2.5, 0], index=labels)

In [86]: obj2
Out[86]:
0    1.5
1   -2.5
2    0.0
dtype: float64

In [87]: obj2.index is labels
Out[87]: True

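Index objects are immutable and cannot be modified by the user, which is why the assignment index[1] = 'd' above raises a TypeError.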
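
A minimal sketch of what immutability means in practice: item assignment fails, and a changed set of labels has to be built as a new Index. The names error and new_index are illustrative only.

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

# Item assignment is rejected with a TypeError.
try:
    index[1] = 'd'
    error = None
except TypeError as exc:
    error = exc

# To get different labels, build a new Index; the original stays as it was.
new_index = pd.Index(['d' if label == 'b' else label for label in index])

print(type(error).__name__)
print(list(index))
print(list(new_index))
```

Because an Index cannot change underneath its users, it is safe to share one Index object among several data structures.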
An Index also supports membership tests with the in operator:

In [88]: frame3
Out[88]:
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

In [89]: frame3.columns
Out[89]: Index(['Nevada', 'Ohio'], dtype='object', name='state')

In [90]: 'Ohio' in frame3.columns
Out[90]: True

In [91]: 2003 in frame3.columns
Out[91]: False

Unlike a Python set, an Index can contain duplicate labels:

In [92]: dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])

In [93]: dup_labels
Out[93]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
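
Selecting with a duplicated label returns every matching entry. The sketch below attaches dup_labels to a small Series (obj and foo_values are illustrative names) and checks uniqueness with the is_unique attribute.

```python
import pandas as pd

dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
obj = pd.Series([0, 1, 2, 3], index=dup_labels)

# is_unique reports whether every label occurs exactly once.
print(dup_labels.is_unique)

# Selecting a duplicated label returns a Series holding all of its entries,
# not a single scalar value.
foo_values = obj['foo']
print(foo_values)
```
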