A Detailed Guide to DataFrame Operations in R

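A data frame (DataFrame) is R's general-purpose data object for storing tabular data. Data frames are considered the most popular data object in R programming, because analyzing data in tabular form is convenient. A data frame can be thought of as a matrix in which each column may have a different data type. A DataFrame consists of three main components: the data, the rows, and the columns.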

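The operations that can be performed on a DataFrame are:
  • Creating a DataFrame
  • Accessing rows and columns
  • Selecting a subset of a data frame
  • Editing a data frame
  • Adding extra rows and columns to a data frame
  • Adding new variables to a data frame based on existing variables
  • Deleting rows and columns from a data frame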
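Creating a DataFrame
In the real world, a DataFrame is usually created by loading a dataset from existing storage, such as an SQL database, a CSV file, or an Excel file. A DataFrame can also be created from vectors in R. The methods that can be used to create a DataFrame are as follows:
Creating a data frame from vectors: to create a data frame, use the data.frame() function and pass each of the vectors you have created to it as an argument.
Example: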
# R program to illustrate data frames
# A character vector
Name = c("Amiya", "Raj", "Asish")
# A character vector
Language = c("R", "Python", "Java")
# A numeric vector
Age = c(22, 25, 45)
# To create a data frame, use the data.frame() function
# and pass each of the vectors created above as arguments
df = data.frame(Name, Language, Age)
print(df)

Output:
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
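To confirm that each column of the data frame holds its own data type, the base R functions str() and sapply() give a quick summary. This is a minimal sketch; it assumes R 4.0 or later, where data.frame() keeps character vectors as character by default (in earlier versions stringsAsFactors defaulted to TRUE, so Name and Language would become factors).

```r
# Rebuild the example data frame from the vectors above
Name = c("Amiya", "Raj", "Asish")
Language = c("R", "Python", "Java")
Age = c(22, 25, 45)
df = data.frame(Name, Language, Age)

# Compact summary: number of rows and columns plus each column's type
str(df)

# Class of every column as a named character vector
col_types = sapply(df, class)
print(col_types)

# Dimensions as c(rows, columns)
print(dim(df))
```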

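Creating a data frame from a file: a data frame can also be created by importing data from a file. To do this, use the read.table() function; if the first line of the file contains the column names, also pass header = TRUE.
Syntax: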
newDF = read.table(file = "Path of the file")
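To try read.table() without an existing file, one option is to write a small whitespace-separated file to a temporary location first. In this sketch, tempfile() and writeLines() only create example input, and the file contents are illustrative data taken from the running example.

```r
# Create a small whitespace-separated text file in a temporary location
path = tempfile(fileext = ".txt")
writeLines(c(
  "Name Language Age",
  "Amiya R 22",
  "Raj Python 25",
  "Asish Java 45"
), path)

# Read it back; header = TRUE uses the first line as column names
newDF = read.table(file = path, header = TRUE)
print(newDF)

# Clean up the temporary file
unlink(path)
```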

To create a data frame from a CSV file in R, use read.csv():
Syntax:
newDF = read.csv("FileName.csv")
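Similarly, read.csv() can be exercised on a temporary CSV file. This minimal sketch writes the example data frame with write.csv() (row.names = FALSE keeps R's row numbers out of the file) and reads it back; the temporary path is just for illustration.

```r
# Example data frame
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Write it to a temporary CSV file without the row-name column
csv_path = tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# read.csv() assumes a header line and comma separators by default
newDF = read.csv(csv_path)
print(newDF)

unlink(csv_path)
```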

Accessing rows and columns
The syntax for accessing rows and columns is:
df[val1, val2]
df   = data frame object
val1 = rows of the data frame
val2 = columns of the data frame

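Here val1 and val2 can be vectors of values, such as 1:2 or 2:3. Leaving one of them empty selects everything along that dimension, so df[1:2, ] selects rows 1 and 2 with all columns. If you specify only df[val2], it refers only to the set of columns you want to access from the data frame.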
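Besides numeric positions, rows and columns can also be selected by name. The sketch below contrasts a few common base R forms; note that selecting a single column with df[, "Age"] returns a plain vector, whereas df["Age"] keeps the result as a one-column data frame.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# A single cell: row 2, column "Age"
raj_age = df[2, "Age"]

# A whole column as a vector, three equivalent ways
ages1 = df$Age
ages2 = df[["Age"]]
ages3 = df[, "Age"]

# A one-column data frame (list-style indexing keeps the data frame)
age_df = df["Age"]

# Several columns by name, all rows
name_lang = df[, c("Name", "Language")]

print(raj_age)
print(class(ages3))
print(class(age_df))
```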
Example: row selection
# R program to illustrate operations
# on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
# Accessing the first and second rows
cat("Accessing first and second row\n")
print(df[1:2, ])

Output:
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
Accessing first and second row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25

Example: column selection
# R program to illustrate operations
# on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
# Accessing the first and second columns
cat("Accessing first and second column\n")
print(df[, 1:2])

Output:
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
Accessing first and second column
   Name Language
1 Amiya        R
2   Raj   Python
3 Asish     Java

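Selecting a subset of a data frame
A subset of a DataFrame can also be created based on a condition, using the following syntax:
newDF = subset(df, condition)
df        = original data frame
condition = a logical condition that the selected rows must satisfy
Example: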
# R program to illustrate operations
# on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
print(df)
# Selecting the subset of the data frame
# where Name is equal to "Amiya"
# OR Age is greater than 30
newDf = subset(df, Name == "Amiya" | Age > 30)
cat("After Selecting the subset of the data frame\n")
print(newDf)

Output:
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Selecting the subset of the data frame
   Name Language Age
1 Amiya        R  22
3 Asish     Java  45
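subset() is a convenience wrapper: the same rows can be selected with plain logical indexing, where the column names must be qualified with df$. subset() also accepts a select argument to keep only some columns. A short sketch of both:

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Same condition as the subset() example, written as logical indexing
rows = df$Name == "Amiya" | df$Age > 30
newDf1 = df[rows, ]

# subset() with select: matching rows, only the Name and Age columns
newDf2 = subset(df, Name == "Amiya" | Age > 30, select = c(Name, Age))

print(newDf1)
print(newDf2)
```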

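Editing a data frame
In R, a DataFrame can be edited in two ways:
Editing a data frame by direct assignment: much like a list in R, a data frame can be edited by direct assignment.
Example: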
# R program to illustrate operations on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before editing the data frame\n")
print(df)
# Editing the data frame by direct assignment
# [[3]] accesses a top-level component,
# here the Age column
# [[3]][3] accesses an inner-level element,
# here the Age of Asish
df[[3]][3] = 30
cat("After editing the data frame\n")
print(df)

Output:
Before editing the data frame
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After editing the data frame
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  30
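Positional access such as df[[3]][3] depends on the order of columns and rows. Here is a sketch of the same edit written by name, which keeps working if the columns are rearranged; locating the row with df$Name == "Asish" is just this example's choice.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Row 3, column "Age", addressed by column name
df[3, "Age"] = 30

# Or locate the row by a condition instead of by position
df$Age[df$Name == "Asish"] = 30

print(df)
```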

Editing a data frame with the edit() command:
Follow the steps below to edit a DataFrame:
Step 1: Create a data frame instance. Here, the data.frame() command creates an empty data frame named "myTable":
myTable = data.frame()
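Step 2: Next, launch the viewer with the edit() function. Note that the result is assigned back to the "myTable" object, so the changes made in the editor are saved to the original object.
myTable = edit(myTable)
When the command above is executed, a window like this pops up: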
[Figure: the data editor window opened by edit()]
Step 3: The table now contains this small table:
[Figure: the data editor containing the Name, Language, and Age data]
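Note that a variable can be renamed by clicking its name and typing the change. A variable can also be set as numeric or character. Once the data in the DataFrame looks as shown above, close the table; the changes are saved automatically.
Step 4: Check the resulting data frame by printing it.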
> myTable
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

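Adding rows and columns to a data frame
Adding extra rows: extra rows can be added with the rbind() command. The syntax is:
newDF = rbind(df, entries for the new row(s) you want to add)
df = original data frame
Be careful with the new row entries when using rbind(): the data type of each column entry should match the data type of the existing rows, and when the new row is supplied as a data frame, its column names must match those of df.
Example: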
# R program to illustrate operations on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep", Language = "C", Age = 23))
cat("After Added a row\n")
print(newDf)

Output:
Before adding row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Added a row
     Name Language Age
1   Amiya        R  22
2     Raj   Python  25
3   Asish     Java  45
4 Sandeep        C  23
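Another way to append a single row, sketched here, is to assign a list to the row index just past the end. This assumes R 4.0 or later, where the character columns are not factors; with factor columns, a value that is not an existing level would become NA with a warning.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Assigning to row nrow(df) + 1 extends the data frame by one row;
# the list elements are matched to the columns in order
df[nrow(df) + 1, ] = list("Sandeep", "C", 23)

print(df)
```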

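Adding extra columns: extra columns can be added with the cbind() command. The syntax is:
newDF = cbind(df, entries for the new column(s) you want to add)
df = original data frame
Example: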
# R program to illustrate operations on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before adding column\n")
print(df)
# Add a new column using cbind()
newDf = cbind(df, Rank = c(3, 5, 1))
cat("After Added a column\n")
print(newDf)

Output:
Before adding column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Added a column
   Name Language Age Rank
1 Amiya        R  22    3
2   Raj   Python  25    5
3 Asish     Java  45    1
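A column can also be added by direct assignment, without cbind(). The sketch below uses $ and [[ ]]; the right-hand side should have one value per row, and a single value is recycled to every row. The Active column is a hypothetical example, not part of the original data.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Add a column with one value per row
df$Rank = c(3, 5, 1)

# Add a hypothetical column from a single value, recycled to all rows
df[["Active"]] = TRUE

print(df)
```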

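Adding new variables to a DataFrame
In R, new variables can be added to a data frame based on existing variables. To do this, first load the dplyr library with the library() command (the package must be installed, for example with install.packages("dplyr")). Then call the mutate() function, which adds extra variable columns based on existing columns.
Syntax:
library(dplyr)
newDF = mutate(df, new_var = [existing_var])
df           = original data frame
new_var      = name of the new variable
existing_var = the operation applied to an existing variable (for example, multiplying its values by 10)
Example: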
# R program to illustrate operations on a data frame
# Importing the dplyr library
library(dplyr)
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
# Creating an extra variable column "log_Age",
# which is the log of the variable column "Age",
# using the mutate() command
newDf = mutate(df, log_Age = log(Age))
cat("After creating extra variable column\n")
print(newDf)

Output:
Original Dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After creating extra variable column
   Name Language Age  log_Age
1 Amiya        R  22 3.091042
2   Raj   Python  25 3.218876
3 Asish     Java  45 3.806662
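If dplyr is not available, base R can produce the same column. This sketch swaps in base R's transform() and plain column assignment in place of mutate(); both compute log_Age from Age.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# transform() returns a modified copy; df itself is unchanged here
newDf = transform(df, log_Age = log(Age))

# Plain assignment adds the column to df directly
df$log_Age = log(df$Age)

print(newDf)
```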

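Deleting rows and columns from a data frame
To delete a row or column, access it by its index and put a minus sign in front of the index. The minus sign indicates that the row or column should be removed.
Syntax:
newDF = df[-rowNo, -colNo]
df = original data frame
Example: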
# R program to illustrate operations on a data frame
# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
# Delete the third row and the second column
newDF = df[-3, -2]
cat("After Deleted the 3rd row and 2nd column\n")
print(newDF)

Output:
Before deleting the 3rd row and 2nd column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Deleted the 3rd row and 2nd column
   Name Age
1 Amiya  22
2   Raj  25
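Negative indices require knowing the position of the row or column. Columns can also be dropped by name and rows by a condition; the sketch below shows three base R ways to remove the Language column, plus a row filter on Age (the threshold 40 is only an illustrative choice).

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# 1. Keep every column whose name is not "Language"
drop1 = df[, names(df) != "Language"]

# 2. subset() with a negated column name in select
drop2 = subset(df, select = -Language)

# 3. Assign NULL to the column on a copy of df
drop3 = df
drop3$Language = NULL

# Drop rows by a condition instead of by position
young = df[df$Age < 40, ]
print(young)
```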
