uniq - report or omit repeated lines
Key options covered here: sort -r / -t / -u / -n / -k, and uniq -c.
What uniq does: removes adjacent duplicate lines.
[root@n1 data]# cat ip.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.9
[root@n1 data]# uniq ip.txt
10.0.0.9
10.0.0.8
10.0.0.7
10.0.0.8
10.0.0.9
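The point above is easy to miss: uniq only collapses *adjacent* duplicates. A minimal standalone sketch (feeding the same data with printf, so no ip.txt file is assumed):

```shell
# The two 10.0.0.9 lines are NOT adjacent, so uniq keeps both of them;
# only the adjacent pair of 10.0.0.8 is collapsed.
printf '10.0.0.9\n10.0.0.8\n10.0.0.8\n10.0.0.9\n' | uniq
# → 10.0.0.9
#   10.0.0.8
#   10.0.0.9
```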
What sort does: makes identical lines adjacent.
- Make identical lines adjacent:
[root@n1 data]# sort ip.txt
10.0.0.7
10.0.0.7
10.0.0.8
10.0.0.8
10.0.0.8
10.0.0.9
10.0.0.9
- Remove the (now adjacent) duplicate lines, method 1:
[root@n1 data]# sort ip.txt | uniq
10.0.0.7
10.0.0.8
10.0.0.9
- Method 2:
[root@n1 data]# sort -u ip.txt
10.0.0.7
10.0.0.8
10.0.0.9
Deduplicate and count occurrences
[root@n1 data]# sort ip.txt | uniq -c
      2 10.0.0.7
      3 10.0.0.8
      2 10.0.0.9
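The same dedupe-and-count pipeline as a standalone sketch (data inlined with printf so it runs without ip.txt):

```shell
# sort groups identical lines together, then uniq -c prefixes each
# surviving line with how many times it occurred.
printf '10.0.0.9\n10.0.0.8\n10.0.0.8\n10.0.0.9\n10.0.0.8\n' | sort | uniq -c
# → 3 10.0.0.8
#   2 10.0.0.9   (counts are right-padded by uniq)
```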
Exercise [Baidu/Sohu interview question]: count how many times each URL appears.
maotai.log:
http://www.maotai.com/index.html
http://www.maotai.com/1.html
http://post.maotai.com/index.html
http://mp3.maotai.com/3.html
http://www.maotai.com/1.html
http://post.maotai.com/2.html
- Extract the host part of each URL:
[root@n1 data]# awk -F / '{print $3}' url.txt
www.maotai.com
www.maotai.com
post.maotai.com
mp3.maotai.com
www.maotai.com
post.maotai.com
- sort + uniq to count occurrences:
[root@n1 data]# awk -F / '{print $3}' url.txt | sort | uniq -c
      1 mp3.maotai.com
      2 post.maotai.com
      3 www.maotai.com
- Sort in descending order:
Method 1: awk
[root@n1 data]# awk -F / '{print $3}' url.txt | sort | uniq -c | sort -r
      3 www.maotai.com
      2 post.maotai.com
      1 mp3.maotai.com
Method 2: cut
[root@n1 data]# cut -d / -f3 url.txt | sort | uniq -c | sort -r
      3 www.maotai.com
      2 post.maotai.com
      1 mp3.maotai.com
Optimization:
[root@n1 data]# cut -d / -f3 url.txt | sort -r | uniq -c
      3 www.maotai.com
      2 post.maotai.com
      1 mp3.maotai.com
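One caveat worth knowing: a plain `sort -r` compares lines as text, which happens to work above because every count is a single digit. Once a count reaches two digits, `sort -rn` (numeric) is needed. A small sketch with made-up counts:

```shell
# Lexical descending puts "9 a" before "12 b", because "9" > "1" as text.
printf '9 a\n12 b\n3 c\n' | sort -r
# → 9 a
#   3 c
#   12 b

# Numeric descending compares the counts as numbers.
printf '9 a\n12 b\n3 c\n' | sort -rn
# → 12 b
#   9 a
#   3 c
```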
Sorting by the second column
sort -t sets the field separator, similar to awk's -F (fields taken with $1, $2) or cut's -d (fields taken with -f N); -k chooses which column to sort by.
[root@n1 test]# cat ip.txt
10.0.0.9 o
10.0.0.9 a
10.0.0.8 z
10.0.0.8 k
10.0.0.8 c
10.0.0.7 n
10.0.0.7 f
[root@n1 test]# sort -t " " -k2 ip.txt
10.0.0.9 a
10.0.0.8 c
10.0.0.7 f
10.0.0.8 k
10.0.0.7 n
10.0.0.9 o
10.0.0.8 z
Note: the default separator is whitespace, so -t can be omitted:
[root@n1 test]# sort -k2 ip.txt
[root@n1 test]# sort -rk2 ip.txt    # descending order
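The same second-column sort as a standalone sketch (a subset of the ip.txt rows inlined with printf):

```shell
# -k2 sorts by the key starting at field 2; the IP in field 1 is ignored.
printf '10.0.0.9 o\n10.0.0.8 c\n10.0.0.7 f\n' | sort -k2
# → 10.0.0.8 c
#   10.0.0.7 f
#   10.0.0.9 o
```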
sort -runtk
  -r --reverse              sort in reverse (descending) order
  -u --unique               output only unique lines
  -n --numeric-sort         compare by numeric value
  -t --field-separator=SEP  use SEP as the field separator
  -k --key=KEYDEF           sort by the given key
uniq
  -c --count                prefix each line with its occurrence count
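A sketch combining -t, -k, and -n in one command, on hypothetical passwd-style rows (the names and uid values here are made up for illustration):

```shell
# -t: splits fields on ":", -k3,3n sorts by field 3 only, numerically.
printf 'bob:x:20\nann:x:3\ncid:x:101\n' | sort -t: -k3,3n
# → ann:x:3
#   bob:x:20
#   cid:x:101
```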
Exercise: sort the IPs by the third octet in descending order; when the third octet is equal, sort by the fourth octet in descending order.
[root@n1 test]# cat arp.txt
192.168.0.3 00:e0:4c:41:d2:a5
192.168.2.2 00:e0:4c:41:d1:7d
192.168.3.7 00:50:bf:11:94:60
192.168.3.5 00:e0:4c:43:a3:46
192.168.2.4 00:0a:eb:6d:08:10
192.168.1.2 00:01:6c:99:37:47
192.168.4.9 00:0a:e6:b5:d1:4b
192.168.0.4 00:0e:1f:51:74:24
192.168.6.7 00:1d:72:40:b2:e1
192.168.8.4 00:01:6c:36:5d:64
192.168.1.22 00:e0:4c:41:ce:73
192.168.0.15 00:e0:4c:41:d7:0e
192.168.2.9 00:e0:4c:41:d1:8b
192.168.0.122 00:16:ec:c5:46:45
192.168.9.115 00:01:6c:98:f7:07
192.168.7.111 00:17:31:b6:6e:a9
sort -t. -k3.1,3.1nr -k4.1,4.3nr arp.txt
  -t.        split fields on "."
  -k3.1,3.1  key from character 1 of field 3 through character 1 of field 3 (the third octet, a single digit here)
  -k4.1,4.3  key from character 1 of field 4 through character 3 of field 4 (the last octet, up to three digits)
  n          compare numerically; r reverses (descending)
[root@n1 test]# sort -t. -k3.1,3.1nr -k4.1,4.3nr arp.txt
192.168.9.115 00:01:6c:98:f7:07
192.168.8.4 00:01:6c:36:5d:64
192.168.7.111 00:17:31:b6:6e:a9
192.168.6.7 00:1d:72:40:b2:e1
192.168.4.9 00:0a:e6:b5:d1:4b
192.168.3.7 00:50:bf:11:94:60
192.168.3.5 00:e0:4c:43:a3:46
192.168.2.9 00:e0:4c:41:d1:8b
192.168.2.4 00:0a:eb:6d:08:10
192.168.2.2 00:e0:4c:41:d1:7d
192.168.1.22 00:e0:4c:41:ce:73
192.168.1.2 00:01:6c:99:37:47
192.168.0.122 00:16:ec:c5:46:45
192.168.0.15 00:e0:4c:41:d7:0e
192.168.0.4 00:0e:1f:51:74:24
192.168.0.3 00:e0:4c:41:d2:a5
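A related trick, not in the original note: GNU sort has -V ("version sort"), which orders dotted numbers naturally, so whole IPs sort correctly without spelling out per-octet keys. A sketch on a few of the addresses above (ascending; GNU coreutils assumed):

```shell
# Plain lexical sort would put 192.168.0.122 before 192.168.0.15;
# -V compares each dotted numeric component as a number.
printf '192.168.0.15\n192.168.0.3\n192.168.0.122\n' | sort -V
# → 192.168.0.3
#   192.168.0.15
#   192.168.0.122
```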
Exercise [Baidu/Sohu interview question]: count how many times each URL appears --- solved with awk.
maotai.log:
http://www.maotai.com/index.html
http://www.maotai.com/1.html
http://post.maotai.com/index.html
http://mp3.maotai.com/3.html
http://www.maotai.com/1.html
http://post.maotai.com/2.html
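The note ends here without showing the awk solution, so the following is only a plausible sketch: count hosts in an awk associative array instead of relying on sort | uniq, then sort the counts descending (the log lines are inlined with printf so no file is assumed):

```shell
# cnt[$3]++ accumulates per-host counts; the END block prints "count host".
printf '%s\n' \
  'http://www.maotai.com/index.html' \
  'http://www.maotai.com/1.html' \
  'http://post.maotai.com/index.html' \
  'http://mp3.maotai.com/3.html' \
  'http://www.maotai.com/1.html' \
  'http://post.maotai.com/2.html' \
  | awk -F/ '{cnt[$3]++} END{for (h in cnt) print cnt[h], h}' \
  | sort -rn
# → 3 www.maotai.com
#   2 post.maotai.com
#   1 mp3.maotai.com
```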