Linux Notes | Counting Rows by Column Value and Removing Duplicates in the Shell

The test file is test.log in the /shell directory (/shell/test.log). Its contents are:

abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
jfaldjf sdfasfs 63542 sdfad
abcddfg higdfmn 12345 fuck!
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 67890 asdga
jfaldjf sdfasfs 67890 sdfad
abcddfg higdfmn 63542 fuck!
afdscff adfgada 67890 fdasg
sdfagfd sdavadf 67890 asdga
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
afdscff adfgada 12345 fdasg
sdfagfd sdavadf 12345 asdga


1: First, view the log file (sorted, so that identical lines sit together):

[root@master ~]# cat /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga


2: Count how many rows contain each distinct value in the third column. The result is shown below.
Use awk: awk '{a[$3]++}END{for(i in a)print i,a[i]}' /shell/test.log

[root@master ~]# awk '{a[$3]++}END{for(i in a)print i,a[i]}' /shell/test.log
63542 4
67890 9
12345 11
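The order in which awk's for (i in a) loop visits keys is unspecified, which is why the counts above come out in no particular order. The following is a minimal sketch of the same count with a reproducible ordering. It assumes a small made-up sample file (created with mktemp, not the test.log above), and the names sample, count and counts are introduced only for illustration.

```shell
# Hypothetical sample data; the third column is the value being counted.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aaa bbb 12345 x
ccc ddd 63542 y
aaa bbb 12345 x
eee fff 67890 z
ggg hhh 12345 w
EOF

# Count rows per third-column value, then sort by count (descending),
# breaking ties by the value itself so the order is reproducible.
counts=$(awk '{count[$3]++} END {for (v in count) print v, count[v]}' "$sample" \
  | sort -k2,2nr -k1,1)
echo "$counts"

rm -f "$sample"
```

Piping through sort keeps the awk part unchanged and leaves the ordering decision to a separate, easily swapped step.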

3: List the distinct values that appear in a column:
awk '{if(!a[$3]++) print $3}' /shell/test.log

[root@master ~]# awk '{if(!a[$3]++) print $3}' /shell/test.log
12345
63542
67890
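If only the number of distinct values is needed, rather than the values themselves, the same "first time seen" test can drive a counter. This is a sketch on a made-up sample file; the names sample, seen, n and distinct are assumptions for illustration.

```shell
# Hypothetical sample data with three distinct values in column 3.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aaa bbb 12345 x
ccc ddd 63542 y
aaa bbb 12345 x
eee fff 67890 z
ggg hhh 12345 w
EOF

# !seen[$3]++ is true only the first time a value appears, so n counts
# distinct values. n+0 prints 0 instead of an empty string for empty input.
distinct=$(awk '!seen[$3]++ {n++} END {print n+0}' "$sample")
echo "$distinct"

rm -f "$sample"
```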


4: Find the distinct values in a column and print the whole row in which each value first appears:
awk '{if(!($3 in a)){a[$3]; print}}' /shell/test.log

[root@master ~]# awk '{if(!($3 in a)){a[$3]; print}}' /shell/test.log
abcdefg higklmn 12345 fuck!
afdsaff adfgaga 63542 fdasg
abcdefg higklmn 67890 fuck!
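The column number does not have to be hard-coded. As a sketch, the same "first row per value" filter can take the column from an awk variable passed with -v. The variable name col, the array name seen and the sample file are assumptions introduced here.

```shell
# Hypothetical sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aaa bbb 12345 x
ccc ddd 63542 y
aaa bbb 12345 x
eee fff 67890 z
ggg hhh 12345 w
EOF

# $col is the field whose number is stored in col. A row is printed only
# if its value in that field has not been recorded in seen yet.
first_rows=$(awk -v col=3 '!($col in seen) {seen[$col]; print}' "$sample")
echo "$first_rows"

rm -f "$sample"
```

Changing col=3 to another field number applies the same filter to that column without touching the awk program.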


5: The uniq command removes duplicate lines, but only duplicates that are adjacent to each other.
[root@master ~]# uniq /shell/test.log | wc -l
16



[root@master ~]# uniq /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga


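The duplicates have not been completely removed: the line "afdsaff adfgaga 63542 fdasg" still appears twice, because its two copies are not adjacent in the original file.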
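Because uniq compares each line only with the line directly before it, sorting the input first makes identical lines adjacent and lets uniq remove all of them. The sketch below shows the difference on a made-up sample file; all variable names are assumptions for illustration.

```shell
# Hypothetical sample data: "a 1" and "b 2" each occur more than once,
# but not always on adjacent lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a 1
b 2
a 1
a 1
b 2
EOF

# uniq alone collapses only the adjacent pair of "a 1" lines.
adjacent_only=$(uniq "$sample" | awk 'END {print NR}')

# Sorting first groups identical lines, so uniq removes every duplicate.
fully_unique=$(sort "$sample" | uniq | awk 'END {print NR}')

# uniq -c prefixes each line with its count; awk strips the padding.
per_line=$(sort "$sample" | uniq -c | awk '{print $1, $2, $3}')

echo "$adjacent_only $fully_unique"
echo "$per_line"

rm -f "$sample"
```

The trade-off is that sorting discards the original line order.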

6: An awk script can remove duplicate lines completely:

[root@master ~]# awk '{if(!(a[$0]++)){a[$0]; print}}' /shell/test.log | wc -l
15

[root@master ~]# awk '{if(!(a[$0]++)){a[$0]; print}}' /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga

All duplicates are removed.
As the results show, uniq leaves 16 lines while awk leaves 15: the pair of identical lines that survived uniq above has been reduced to a single line here.
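A commonly used shorthand for the awk deduplication is the bare pattern !seen[$0]++, which prints a line only the first time it is seen. Unlike sort -u, it keeps lines in their original order. The sketch below contrasts the two on a made-up sample file; the names sample, seen, in_order and sorted_unique are assumptions for illustration.

```shell
# Hypothetical sample data with non-adjacent duplicates.
sample=$(mktemp)
cat > "$sample" <<'EOF'
b 2
a 1
b 2
a 1
EOF

# awk: keeps the first occurrence of each line, in input order.
in_order=$(awk '!seen[$0]++' "$sample")

# sort -u: also removes every duplicate, but the output is sorted.
sorted_unique=$(sort -u "$sample")

echo "$in_order"
echo "$sorted_unique"

rm -f "$sample"
```

The awk form holds every distinct line in memory, so for very large inputs sort -u may be the more practical choice when order does not matter.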
