A Practical Application of gawk's gsub Function

This article walks through a practical application of gawk's gsub function. The method is simple, quick, and useful in day-to-day work.

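While working on a data-cleaning task, I needed to find rows in two tables that are duplicates of each other across several columns. The rough idea is to use an exists clause, something like:
select *
  from a
 where exists (select 1
          from b
         where a.col1 = b.col1
           and a.col2 = b.col2);
The awkward part is that there are too many columns to match: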
a.INFOCODE,a.SOURCENAME,a.SOURCETYPE,a.PUBLISHTYPE,a.NOTICEDATE,a.ENDDATE,a.NOTICETITLE,a.LANGUAGE,a.IMPORTLEVEL,a.SOURCEURL,a.ATTACHTYPE,a.ATTACHNAME,a.ATTACHSIZE,a.FORM,a.ACCESSORYNUM,a.NOTICESTATE,a.PUBLISHDATE,a.FILENUMBER
This can be solved with Linux text-processing tools.
First, put the column list into a text file:
root@bd-dev-mingshuo-183:/tmp#more 1
a.INFOCODE,a.SOURCENAME,a.SOURCETYPE,a.PUBLISHTYPE,a.NOTICEDATE,a.ENDDATE,a.NOTICETITLE,a.LANGUAGE,a.IMPORTLEVEL,a.SOURCEURL,a.ATTACHTYPE,a.ATTACHNAME,a.ATTACHSIZE,a.FORM,a.ACCESSORYNUM,
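a.NOTICESTATE,a.PUBLISHDATE,a.FILENUMBER
Here is a brief introduction to gawk's gsub function.
gsub replaces every substring that matches a regular expression, much like sed 's/regexp/replacement/g'.
The syntax is:
gsub(regular expression, substitution string, target string);
The third argument is the target string to operate on (it defaults to $0 when omitted), the first argument is the pattern to match, and every match is replaced with the second argument.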
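
As a minimal sketch of the argument order described above, the following snippet applies gsub to a variable instead of $0 and prints the return value, which is the number of replacements made. The sample strings and the variable names `line`, `result`, and `matched` are made up for illustration. It calls plain `awk`, since gsub is part of standard awk; `gawk` can be substituted directly.

```shell
# Hypothetical sample input; any comma-separated string works.
line='a.COL1,a.COL2,a.COL3'

# gsub modifies its third argument in place and returns the replacement count.
# Here the target is the variable s, so $0 itself is left untouched.
result=$(printf '%s\n' "$line" | awk '{ s = $0; n = gsub(/,/, " | ", s); print n ": " s }')
echo "$result"

# Used as a bare pattern, gsub selects only records where it replaced something:
# the second input line has no comma, so gsub returns 0 and the line is not printed.
matched=$(printf 'x,y\nz\n' | awk 'gsub(/,/, "\n")')
echo "$matched"
```

The second command is worth keeping in mind when gsub is used as a pattern without an action: input lines that contain no match are silently dropped from the output.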

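Turn the single line of text into multiple lines. Because gsub returns the number of replacements it made, using it as a pattern with no action prints every record in which at least one replacement occurred: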
root@bd-dev-mingshuo-183:/tmp#more 1|gawk 'gsub(/,/,"\n",$0)'
a.INFOCODE
a.SOURCENAME
a.SOURCETYPE
a.PUBLISHTYPE
a.NOTICEDATE
a.ENDDATE
a.NOTICETITLE
a.LANGUAGE
a.IMPORTLEVEL
a.SOURCEURL
a.ATTACHTYPE
a.ATTACHNAME
a.ATTACHSIZE
a.FORM
a.ACCESSORYNUM
a.NOTICESTATE
a.PUBLISHDATE
a.FILENUMBER
Duplicate each column name:
root@bd-dev-mingshuo-183:/tmp#more 1|gawk 'gsub(/,/,"\n",$0)'|gawk -F'\n' '{print "on",$0,"=",$0,"and"}'
on a.INFOCODE = a.INFOCODE and
on a.SOURCENAME = a.SOURCENAME and
on a.SOURCETYPE = a.SOURCETYPE and
on a.PUBLISHTYPE = a.PUBLISHTYPE and
on a.NOTICEDATE = a.NOTICEDATE and
on a.ENDDATE = a.ENDDATE and
on a.NOTICETITLE = a.NOTICETITLE and
on a.LANGUAGE = a.LANGUAGE and
on a.IMPORTLEVEL = a.IMPORTLEVEL and
on a.SOURCEURL = a.SOURCEURL and
on a.ATTACHTYPE = a.ATTACHTYPE and
on a.ATTACHNAME = a.ATTACHNAME and
on a.ATTACHSIZE = a.ATTACHSIZE and
on a.FORM = a.FORM and
on a.ACCESSORYNUM = a.ACCESSORYNUM and
on a.NOTICESTATE = a.NOTICESTATE and
on a.PUBLISHDATE = a.PUBLISHDATE and
on a.FILENUMBER = a.FILENUMBER and

Replace the table alias on the right-hand side with b:
root@bd-dev-mingshuo-183:/tmp#more 1|gawk 'gsub(/,/,"\n",$0)'|gawk -F'\n' '{print $0,"=",$0,"and"}'|sed 's/= a/= b/g'     
a.INFOCODE = b.INFOCODE and
a.SOURCENAME = b.SOURCENAME and
a.SOURCETYPE = b.SOURCETYPE and
a.PUBLISHTYPE = b.PUBLISHTYPE and
a.NOTICEDATE = b.NOTICEDATE and
a.ENDDATE = b.ENDDATE and
a.NOTICETITLE = b.NOTICETITLE and
a.LANGUAGE = b.LANGUAGE and
a.IMPORTLEVEL = b.IMPORTLEVEL and
a.SOURCEURL = b.SOURCEURL and
a.ATTACHTYPE = b.ATTACHTYPE and
a.ATTACHNAME = b.ATTACHNAME and
a.ATTACHSIZE = b.ATTACHSIZE and
a.FORM = b.FORM and
a.ACCESSORYNUM = b.ACCESSORYNUM and
a.NOTICESTATE = b.NOTICESTATE and
a.PUBLISHDATE = b.PUBLISHDATE and
a.FILENUMBER = b.FILENUMBER and

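The process itself is simple; the points to take away are how gawk's gsub function is used and the overall approach. Note that the last generated line still ends with "and", which has to be removed before the conditions are pasted into the where clause.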
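
The steps above can also be folded into a single awk invocation. The sketch below is one possible variant, not the method used in the article: it splits on commas with `-F`, uses `sub` (the single-replacement sibling of gsub) to swap the table alias, and omits "and" on the last line. The shortened column list and the variable names `cols` and `conditions` are assumptions for illustration.

```shell
# Hypothetical shortened column list; the article's file "1" holds the full one.
cols='a.INFOCODE,a.SOURCENAME,a.SOURCETYPE'

conditions=$(printf '%s\n' "$cols" | awk -F',' '{
    for (i = 1; i <= NF; i++) {
        left = $i
        right = $i
        sub(/^a\./, "b.", right)          # same column, but on table b
        sep = (i < NF) ? " and" : ""      # no trailing "and" on the last line
        print left " = " right sep
    }
}')
echo "$conditions"
```

Anchoring the pattern with `^a\.` limits the replacement to the alias prefix, so a column name that happens to contain "a." elsewhere is left alone.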

Source: http://kswsj.cn/article/iepejd.html