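Like awk, covered in the previous article, sed is a Unix text-processing tool. The name is short for Stream Editor. It filters text by pattern matching (that is, it finds the lines in a file that meet some condition) and modifies it (it applies edits to the content it finds).
1. sed command format
1.1 Basic format of the sed command
sed is mainly used in three forms:
- sed 'edit command' file1 file2 ...: prints the processed result to standard output
- sed -n 'edit command' file1 file2 ...: suppresses the default output, so only lines that the edit command explicitly prints (for example with p) appear
- sed -i 'edit command' file1 file2 ...: modifies the text files in place (the files on disk are changed)
An edit command has two parts. First come either two addresses separated by a comma or a single address, giving the start and end of the text to process. Then comes the operation to perform. The format is:
[start address[,end address]]operation
To use several edit commands in one sed invocation, separate them with ;. Alternatively, put each edit command in its own pair of single quotes, in which case each quoted command must be preceded by -e. Here is a simple example: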
$ cat sed_test.txt
1 apple a,b,d,f
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
$ sed -n '2,5p' sed_test.txt
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
$ sed -n '5p' sed_test.txt
5 eat http://blog.csdn.net/xia7139
$ sed -n -e '2p' -e '5p' sed_test.txt
2 boy alsdjf,appleapple,kdjf
5 eat http://blog.csdn.net/xia7139
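The two ways of combining edit commands, and the effect of -n, can be checked side by side. The following is a minimal sketch that recreates the sample file in a scratch directory; the variable names are only illustrative.

```shell
# Recreate the article's sample file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > sed_test.txt <<'EOF'
1 apple a,b,d,f
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
EOF

# Two edit commands in one script, separated by ';'.
with_semicolon=$(sed -n '2p;5p' sed_test.txt)

# The same two edit commands, each introduced by its own -e.
with_e=$(sed -n -e '2p' -e '5p' sed_test.txt)

# Without -n every line is printed once automatically, and p prints
# line 2 a second time, so the line count grows from 5 to 6.
without_n=$(sed '2p' sed_test.txt | wc -l | tr -d ' ')

printf '%s\n' "$with_semicolon"
echo "lines without -n: $without_n"
```

Both forms run the commands in the order given, so choosing between ; and -e is mostly a matter of readability.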
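1.3 Operation types
The operations most commonly used with sed are:
Operation | Effect |
p | print the line (print) |
n | read the next line (next) |
d | delete the line (delete) |
s | substitute a string (substitute) |
a | append new text (append) |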
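The examples in this article exercise p, d and s, so here is a small sketch of the remaining two operations, n and a. It assumes GNU sed (the one-line form of a is a GNU extension), and lines.txt is a hypothetical file created only for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' one two three four > lines.txt

# n replaces the current line with the next input line; the following p
# then prints it. Combined with -n, only lines 2 and 4 are printed.
even_lines=$(sed -n 'n;p' lines.txt)

# a appends a new line of text after the addressed line (here line 2).
appended=$(sed '2a inserted' lines.txt)

printf '%s\n' "$even_lines"
printf '%s\n' "$appended"
```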
Using regular expressions:
(1) Print from the first line containing kdjf through the last line ($ denotes the last line)
$ sed -n '/kdjf/,$p' sed_test.txt
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
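(2) Print the lines containing the word apple
(Here a word is a string with spaces or punctuation on either side. In regular expressions, < and > mark the start and end of a word; in sed they must be escaped as \< and \>.)
$ sed -n '/\<apple\>/p' sed_test.txt
1 apple a,b,d,f
Deleting specified lines (there is no -i here, so the original file is untouched and the processed result is only printed):
(1) Delete lines 2 through 4
$ sed '2,4d' sed_test.txt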
1 apple a,b,d,f
5 eat http://blog.csdn.net/xia7139
(2) Delete the lines containing appleapple, and also the last line ($)
$ sed '/appleapple/d;$d' sed_test.txt
1 apple a,b,d,f
3 cat 163.2.201.1
4 dog www.google.com
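(3) Delete the lines that do not contain apple (! inverts the selection, choosing the lines that do not match), so that only the lines containing apple remain
$ sed '/apple/!d' sed_test.txt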
1 apple a,b,d,f
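2 boy alsdjf,appleapple,kdjf
Replacing specified text:
(1) Replace apple with AMAZON on lines 1 through 4. s means substitute; g means replace every occurrence on a line rather than only the first.
$ sed '1,4s/apple/AMAZON/g' sed_test.txt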
1 AMAZON a,b,d,f
2 boy alsdjf,AMAZONAMAZON,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
(2) Comment out lines of a shell script (insert # at the start of the line)
$ sed '1,3s/^/#/g' sed_test.txt
#1 apple a,b,d,f
#2 boy alsdjf,appleapple,kdjf
#3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
(3) Delete the string apple (if no start or end address is given, the command applies to all lines)
$ sed 's/apple//g' sed_test.txt
1  a,b,d,f
2 boy alsdjf,,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.csdn.net/xia7139
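Two details of the s operation are easy to verify on single lines taken from the sample file: what the g flag changes, and that a delimiter other than / can be used when the text itself contains slashes. This is a small sketch; https://example.org is a placeholder chosen for illustration, not something from the original data.

```shell
line='2 boy alsdjf,appleapple,kdjf'

# Without g only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/apple/AMAZON/')

# With g every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/apple/AMAZON/g')

url_line='5 eat http://blog.csdn.net/xia7139'

# Using | as the delimiter avoids escaping each / in the URL.
new_url=$(printf '%s\n' "$url_line" | sed 's|http://blog.csdn.net|https://example.org|')

printf '%s\n' "$first_only" "$all_matches" "$new_url"
```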
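The two articles so far have introduced two Unix text-processing tools, awk and sed; I hope they are useful to you.
3. sed and regular expressions
Combining regular expressions with sed is a great help when processing text. Consider the following examples:
Example 1: first steps with regular expressions.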
$ cat poem.txt
The choice
By William Butler Yeats
The intellect of man is forced to choose
Perfection of life ,or of the work,
And if take the second must refuse
A heavenly mansion ,raging in the dark.
When all that story 's finished ,what's the news?
In luck or out the toil has left its mark:
That old perplexity an empty purse,
Or the day's vanity ,the night's remorse.
(1) Delete the whitespace at the start of each line.
$ sed 's/^\s*//g' poem.txt
The choice
By William Butler Yeats
The intellect of man is forced to choose
Perfection of life ,or of the work,
And if take the second must refuse
A heavenly mansion ,raging in the dark.
When all that story 's finished ,what's the news?
In luck or out the toil has left its mark:
That old perplexity an empty purse,
Or the day's vanity ,the night's remorse.
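This also works (note that + must be escaped here, whereas the * above need not be, because sed uses basic regular expressions by default):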
$ sed 's/^\s\+//g' poem.txt
The choice
By William Butler Yeats
The intellect of man is forced to choose
Perfection of life ,or of the work,
And if take the second must refuse
A heavenly mansion ,raging in the dark.
When all that story 's finished ,what's the news?
In luck or out the toil has left its mark:
That old perplexity an empty purse,
Or the day's vanity ,the night's remorse.
(2) Delete all whitespace in the text
$ sed 's/\s*//g' poem.txt
Thechoice
ByWilliamButlerYeats
Theintellectofmanisforcedtochoose
Perfectionoflife,orofthework,
Andiftakethesecondmustrefuse
Aheavenlymansion,raginginthedark.
Whenallthatstory'sfinished,what'sthenews?
Inluckoroutthetoilhasleftitsmark:
Thatoldperplexityanemptypurse,
Ortheday'svanity,thenight'sremorse.
The following achieves the same result:
$ sed 's/\s\+//g' poem.txt
Thechoice
ByWilliamButlerYeats
Theintellectofmanisforcedtochoose
Perfectionoflife,orofthework,
Andiftakethesecondmustrefuse
Aheavenlymansion,raginginthedark.
Whenallthatstory'sfinished,what'sthenews?
Inluckoroutthetoilhasleftitsmark:
Thatoldperplexityanemptypurse,
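Ortheday'svanity,thenight'sremorse.
The following commands can also be used for related tasks:
$ sed 's/^[[:space:]]*//g' poem.txt   # delete whitespace at the start of each line
$ sed 's/^[ ]*//g' poem.txt           # delete spaces at the start of each line
$ sed 's/^ *//g' poem.txt             # delete spaces at the start of each line
$ sed '/^$/d' poem.txt                # delete empty lines
$ sed '/^[ ]*$/d' poem.txt            # delete empty lines and lines containing only spaces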
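The variants above differ mainly in regular-expression dialect. The sketch below compares three spellings on a hypothetical file, indented.txt, which has one line indented with spaces and one with a tab. It assumes GNU sed, since \s and \+ are GNU extensions to basic regular expressions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '   The choice\n\tBy William Butler Yeats\nThe intellect\n' > indented.txt

# POSIX character class with *: plain basic regular expression syntax.
posix_form=$(sed 's/^[[:space:]]*//' indented.txt)

# GNU basic-regex spelling used in the article: \s with an escaped \+.
gnu_form=$(sed 's/^\s\+//' indented.txt)

# Extended regular expressions (-E): + needs no backslash.
ere_form=$(sed -E 's/^[[:space:]]+//' indented.txt)

printf '%s\n' "$posix_form"
```

Because the pattern is anchored with ^, it can match at most once per line, so the g flag makes no difference in these commands.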
4. A first taste of sed's power
4.1 Removing unwanted tags
Suppose you have a file with the following content:
test.txt:
{'books/daglib/0015113': '<title>Scale-isometric polytopal graphs in hypercubes and Zn.</title>\n',
'books/daglib/0097705': '<title>Discrete total lp-norm approximation problem for the function.</title>\n',
'books/daglib/p/AveneauCFM11': '<title>A Framework for n-Dimensional Visibility Computations.</title>\n',
'books/daglib/p/Carter11': '<title>Using Dungeons and Dragons to Integrate Curricula in Classroom.</title>\n',
'books/daglib/p/CasolaLRV11': '<title>Access Control in Cloud-on-Grid Systems: The PerfCloud Case Study.</title>\n',
'books/daglib/p/ChunKZDMZ11': '<title>Reverse Engineer of Gene Networks with Application in silico Network.</title>\n',
'books/daglib/p/ChungK11': '<title>eQTL Mapping for Functional Classes of Saccharomyces cerevisiae Genes wssion.</title>\n',
'books/daglib/p/Goldman11': '<title>A Model for Computer Graphics Based on Algebra for \xe2\x84\x9d3.</title>\n',
'books/daglib/p/LiZ11': '<title>Line Geometry over \xe2\x84\x9d3, 3, and Stewart Platforms.</title>\n',
'books/daglib/p/Liestol11': '<title>Situated Simulations Between Reality and Designing a Narrative Space.</title>\n'}
Now, to strip the tag-like markup from every line, a single sed command is enough:
$ sed -e 's/<title>//g;s/<\/title>//g' -e 's/<i>//g;s/<\/i>//g' -e 's/<sub>//g;s/<\/sub>//g' -e 's/<sup>//g;s/<\/sup>//g' test.txt
{'books/daglib/0015113': 'Scale-isometric polytopal graphs in hypercubes and Zn.\n',
'books/daglib/0097705': 'Discrete total lp-norm approximation problem for the function.\n',
'books/daglib/p/AveneauCFM11': 'A Framework for n-Dimensional Visibility Computations.\n',
'books/daglib/p/Carter11': 'Using Dungeons and Dragons to Integrate Curricula in Classroom.\n',
'books/daglib/p/CasolaLRV11': 'Access Control in Cloud-on-Grid Systems: The PerfCloud Case Study.\n',
'books/daglib/p/ChunKZDMZ11': 'Reverse Engineer of Gene Networks with Application in silico Network.\n',
'books/daglib/p/ChungK11': 'eQTL Mapping for Functional Classes of Saccharomyces cerevisiae Genes wssion.\n',
'books/daglib/p/Goldman11': 'A Model for Computer Graphics Based on Algebra for \xe2\x84\x9d3.\n',
'books/daglib/p/LiZ11': 'Line Geometry over \xe2\x84\x9d3, 3, and Stewart Platforms.\n',
'books/daglib/p/Liestol11': 'Situated Simulations Between Reality and Designing a Narrative Space.\n'}
To modify the original file itself, just add the -i option.
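Since -i overwrites the file, it can be worth keeping a backup. The sketch below shows one way to do that with the -i.bak form, and folds the tag removal into a single extended regular expression. The file tags.txt and its one-line content are hypothetical stand-ins for test.txt, and the combined pattern is an alternative spelling, not the command used above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '<title>Graphs in Z<sub>n</sub> and <i>in silico</i> networks.</title>' > tags.txt

# -E enables extended regular expressions, so ? and (a|b) need no backslashes.
# '#' is used as the delimiter so that '/' in the closing tags needs no escaping.
# -i.bak edits tags.txt in place and keeps the original as tags.txt.bak.
sed -E -i.bak 's#</?(title|i|sub|sup)>##g' tags.txt

cat tags.txt
```

If the result is not what was intended, the untouched original is still available in tags.txt.bak.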
4.2 Replacing text across multiple files
The temp directory contains three files, t1.txt, t2.txt and t3.txt, with the following contents:
$ cat t1.txt
StevenJobs mac
i like
mill fuck
$ cat t2.txt
youi
life
love
lol
StevenJobs
sex
friend
one night stand
for one night
$ cat t3.txt
good night
nigtmare
fuck
StevenJobs StevenJobs
StevenJobsStevenJobs
mac
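Suppose we want to replace "StevenJobs" with "apple" in all three files. Replacing them one file at a time is tedious, so here is a slightly simpler way that handles every file with one command.
Prerequisites
(1) Backquotes
A string enclosed in backquotes is interpreted by the shell as a command line. The shell runs that command first and substitutes its standard output for the whole backquoted part (including the two backquotes). This lets the output of one command serve as the arguments of another. In bash, $() has the same effect. For example: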
$ echo `ls`
t1.txt t2.txt t3.txt
$ echo $(ls)
t1.txt t2.txt t3.txt
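One practical difference between the two forms is nesting: $() can contain another $() without extra escaping, while nested backquotes have to be escaped. A minimal sketch, run in a scratch directory with three empty files named like the ones above:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch t1.txt t2.txt t3.txt

# Both forms capture the standard output of ls.
with_backquotes=`ls`
with_dollar=$(ls)

# $() nests directly: the inner command supplies the directory to list.
file_count=$(ls "$(pwd)" | wc -l | tr -d ' ')

echo $with_dollar
echo "files: $file_count"
```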
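(2) The -r and -l options of grep
grep -r: Read all files under each directory, recursively.
In other words, grep descends level by level through a directory and its subdirectories and searches the files in them. Without -r, grep searches only the files named on the command line and does not descend into directories.
grep -l: Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
The man page wording is a little dense. Put simply, -l prints only the names of the files that contain what we are searching for. Some grep examples follow.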
$ grep -r "Steven" .
./t1.txt:StevenJobs mac
./t2.txt: StevenJobs
./t3.txt:StevenJobs StevenJobs
./t3.txt:StevenJobsStevenJobs
$ grep "Steven" .
$ grep -r "Steven" .
./t1.txt:StevenJobs mac
./t2.txt: StevenJobs
./t3.txt:StevenJobs StevenJobs
./t3.txt:StevenJobsStevenJobs
// The command below finds the same matches in this directory as the one above
$ grep "Steven" *
t1.txt:StevenJobs mac
t2.txt: StevenJobs
t3.txt:StevenJobs StevenJobs
t3.txt:StevenJobsStevenJobs
$ grep -rl "Steven" .
./t1.txt
./t2.txt
./t3.txt
Putting the commands above together, the solution below follows easily. Here it is in action:
$ sed -i 's/StevenJobs/apple/g' `grep -rl StevenJobs .`
$ cat t*
apple mac
i like
mill fuck
youi
life
love
lol
apple
sex
friend
one night stand
for one night
good night
nigtmare
fuck
apple apple
appleapple
mac
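The backquote form works for the file names used here, but it splits the list on whitespace and runs sed with no file arguments when nothing matches. The following sketch shows one possible variant that avoids both problems. It assumes GNU grep, xargs and sed; the directory layout and the file name containing a space are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
printf 'StevenJobs mac\n' > t1.txt
printf 'no match here\n' > t2.txt
printf 'StevenJobs StevenJobs\n' > 'sub/my notes.txt'   # name contains a space

# -Z makes grep end each file name with a NUL byte and xargs -0 splits on NUL,
# so names with spaces stay intact. -r (GNU xargs) skips sed when grep finds nothing.
grep -rlZ 'StevenJobs' . | xargs -0 -r sed -i 's/StevenJobs/apple/g'

cat t1.txt 'sub/my notes.txt'
```

Files without a match, such as t2.txt here, are never passed to sed and are left untouched.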
>.< Over!