Translated from: Combine two columns of text in dataframe in pandas/python
I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called period that turns Year = 2000 and quarter = q2 into 2000q2.
Can anyone help with that?
Reply #1
Reference: https://stackoom.com/question/1JJ5t/在pandas-python中的数据框中合并两列文本
Reply #2
dataframe["period"] = dataframe["Year"].map(str) + dataframe["quarter"]
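A minimal sketch of this approach on a small, made-up frame (the sample values are assumptions, not the asker's data). Year is stored as integers here, which is why it has to be converted to strings before concatenating:

```python
import pandas as pd

# Hypothetical sample data; Year is an integer column here.
dataframe = pd.DataFrame({"Year": [2000, 2001], "quarter": ["q2", "q3"]})

# Convert Year to strings element-wise, then concatenate with quarter.
dataframe["period"] = dataframe["Year"].map(str) + dataframe["quarter"]

print(dataframe["period"].tolist())
```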
Reply #3
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df['period'] = df[['Year', 'quarter']].apply(lambda x: ''.join(x), axis=1)
This yields the following dataframe:
   Year quarter  period
0  2014      q1  2014q1
1  2015      q2  2015q2
This method generalizes to an arbitrary number of string columns by replacing
df[['Year', 'quarter']]
with any column slice of your dataframe, e.g.
df.iloc[:, 0:2].apply(lambda x: ''.join(x), axis=1)
More information is available in the pandas documentation for the apply() method.
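As a rough illustration of that generalization, the sketch below joins three string columns. The third column, suffix, and its values are hypothetical and only there to show that the slice can be wider than two columns:

```python
import pandas as pd

# Hypothetical frame with three string columns.
df = pd.DataFrame({
    "Year": ["2014", "2015"],
    "quarter": ["q1", "q2"],
    "suffix": ["a", "b"],
})

# Join every column of the slice row by row.
# All values must already be strings for ''.join to work.
df["combined"] = df.iloc[:, 0:3].apply(lambda x: "".join(x), axis=1)

print(df["combined"].tolist())
```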
Reply #4
Although @silvado's answer is good, it will be faster if you change df.map(str) to df.astype(str):
import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
In [131]: %timeit df["Year"].map(str)
10000 loops, best of 3: 132 us per loop
In [132]: %timeit df["Year"].astype(str)
10000 loops, best of 3: 82.2 us per loop
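Outside IPython, a comparison along these lines can be sketched with the standard timeit module. The frame size and repeat count below are arbitrary assumptions, and the measured times will vary with the pandas version and the data, so treat the numbers as indicative only:

```python
import timeit

import pandas as pd

# Hypothetical larger integer column so the difference is measurable.
df = pd.DataFrame({"Year": [2014, 2015] * 5000})

# Both conversions should produce the same strings.
via_map = df["Year"].map(str)
via_astype = df["Year"].astype(str)
same = via_map.tolist() == via_astype.tolist()

# Time each conversion over a fixed number of runs.
t_map = timeit.timeit(lambda: df["Year"].map(str), number=20)
t_astype = timeit.timeit(lambda: df["Year"].astype(str), number=20)

print("identical results:", same)
print("map(str):    {:.4f} s".format(t_map))
print("astype(str): {:.4f} s".format(t_astype))
```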
Reply #5
The cat() method of the .str accessor works really well for this:
>>> import pandas as pd
>>> df = pd.DataFrame([["2014", "q1"],
...                    ["2015", "q3"]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014      q1
1  2015      q3
>>> df['Period'] = df.Year.str.cat(df.Quarter)
>>> print(df)
   Year Quarter  Period
0  2014      q1  2014q1
1  2015      q3  2015q3
cat() even allows you to add a separator. For example, if you only have integers for year and quarter, you can do this:
>>> import pandas as pd
>>> df = pd.DataFrame([[2014, 1],
...                    [2015, 3]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year  Quarter
0  2014        1
1  2015        3
>>> df['Period'] = df.Year.astype(str).str.cat(df.Quarter.astype(str), sep='q')
>>> print(df)
   Year  Quarter  Period
0  2014        1  2014q1
1  2015        3  2015q3
Joining multiple columns is just a matter of passing either a list of Series or a DataFrame containing all but the first column as a parameter to str.cat(), invoked on the first column (a Series):
>>> df = pd.DataFrame(
...     [['USA', 'Nevada', 'Las Vegas'],
...      ['Brazil', 'Pernambuco', 'Recife']],
...     columns=['Country', 'State', 'City'],
... )
>>> df['AllTogether'] = df['Country'].str.cat(df[['State', 'City']], sep=' - ')
>>> print(df)
  Country       State       City                   AllTogether
0     USA      Nevada  Las Vegas      USA - Nevada - Las Vegas
1  Brazil  Pernambuco     Recife  Brazil - Pernambuco - Recife
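The example above passes a DataFrame slice. The other option mentioned, a list of Series, might look like the following sketch, which reuses the same sample data and compares the two forms:

```python
import pandas as pd

df = pd.DataFrame(
    [["USA", "Nevada", "Las Vegas"],
     ["Brazil", "Pernambuco", "Recife"]],
    columns=["Country", "State", "City"],
)

# Pass the remaining columns as a list of Series rather than a DataFrame.
from_list = df["Country"].str.cat([df["State"], df["City"]], sep=" - ")

# Passing a DataFrame slice gives the same strings.
from_frame = df["Country"].str.cat(df[["State", "City"]], sep=" - ")

print(from_list.tolist())
print(from_list.tolist() == from_frame.tolist())
```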
Do note that if your pandas dataframe/series has null values, you need to include the parameter na_rep to replace the NaN values with a string; otherwise the combined value will be NaN for every row that contains a null.
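A small sketch of the na_rep behaviour, using a made-up frame with one missing quarter. The placeholder string "?" is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing Quarter in the second row.
df = pd.DataFrame({"Year": ["2014", "2015"], "Quarter": ["q1", np.nan]})

# Without na_rep, a row containing a missing value yields a missing result.
without_rep = df["Year"].str.cat(df["Quarter"])

# With na_rep, the missing value is replaced by the given string first.
with_rep = df["Year"].str.cat(df["Quarter"], na_rep="?")

print(without_rep.tolist())
print(with_rep.tolist())
```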
Reply #6
This time, using a lambda function with str.format().
import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': ['q1', 'q2']})
print(df)
# Format each row's Year and Quarter into a single string.
df['YearQuarter'] = df[['Year', 'Quarter']].apply(lambda x: '{}{}'.format(x['Year'], x['Quarter']), axis=1)
print(df)
Output:
   Year Quarter
0  2014      q1
1  2015      q2
   Year Quarter YearQuarter
0  2014      q1      2014q1
1  2015      q2      2015q2
This allows you to work with non-strings and reformat values as needed.
import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': [1, 2]})
print(df.dtypes)
print(df)
# The integer Quarter is formatted into the string without an explicit cast.
df['YearQuarter'] = df[['Year', 'Quarter']].apply(lambda x: '{}q{}'.format(x['Year'], x['Quarter']), axis=1)
print(df)
Output:
Year       object
Quarter     int64
dtype: object
   Year  Quarter
0  2014        1
1  2015        2
   Year  Quarter YearQuarter
0  2014        1      2014q1
1  2015        2      2015q2
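To illustrate the "reformat values as needed" part, here is a hedged sketch that uses a format specification to zero-pad the quarter. The "-Q" separator and the two-digit padding are arbitrary choices for the example, not something the question asks for:

```python
import pandas as pd

# Hypothetical frame where both columns are integers.
df = pd.DataFrame({"Year": [2014, 2015], "Quarter": [1, 2]})

# Build the label row by row; {:02d} zero-pads the quarter to two digits.
# int() converts the NumPy integer to a plain Python int before formatting.
df["YearQuarter"] = df.apply(
    lambda row: "{}-Q{:02d}".format(row["Year"], int(row["Quarter"])),
    axis=1,
)

print(df["YearQuarter"].tolist())
```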