提取Word中特定字符中的文本到Excel

提取Word中特定字符中的文本到Excel
文章图片
image.png 要求:提取Word中每个《》之间的文本。(代码在Excel中)
代码1:查找替换

Sub 提取特定字符中间的内容() '查找替换 Dim wdapp As Object, wdoc As Object'声明wdapp和wdoc变量为对象类型 Set wdapp = CreateObject("Word.Application")'将新建word程序对象赋给变量wdapp Dim Findchar As String'要查找的字符 Findchar = "《*》" Set wdoc = wdapp.Documents.Open(ThisWorkbook.Path & "\范例.docx") '打开word文档 With wdoc.Content.Find '此处针对全文档 .MatchWildcards = True '使用通配符 Do While .Execute(FindText:=Findchar) = True '将内容返回到Excel k = k + 1 Cells(k, 1) = .Parent'此代码没有处理符号 Cells(k, 2) = Replace(Replace(.Parent, "《", ""), "》", "")'此代码去除符号 Loop End With wdoc.Close False'关闭word文档,不保存更改。 wdapp.Quit '关闭word程序 Set wdapp = Nothing'释放内存 Set wdoc = Nothing'释放内存 End Sub

【提取Word中特定字符中的文本到Excel】代码2:Split方法
Sub 提取特定字符中间的内容1() 'split Dim wdapp As Object, wdoc As Object'声明wdapp和wdoc变量为对象类型 Set wdapp = CreateObject("Word.Application")'将新建word程序对象赋给变量wdapp Dim schar As String'字符串 Set wdoc = wdapp.Documents.Open(ThisWorkbook.Path & "\范例.docx") '打开word文档 schar = wdoc.Content '将文档内容赋值给字符串 wdoc.Close False'关闭word文档,不保存更改。 wdapp.Quit '关闭word程序 Set wdapp = Nothing'释放内存 Set wdoc = Nothing'释放内存 Dim s s = Split(schar, "《") For Each s1 In s If InStr(s1, "》") > 0 Then k = k + 1 Cells(k, 1) = Split(s1, "》")(0) End If Next End Sub

代码3:正则表达式,三个表达式都能够实现结果,注意submatchse就行。
Sub 提取特定字符中间的内容2() '正则 Dim wdapp As Object, wdoc As Object'声明wdapp和wdoc变量为对象类型 Set wdapp = CreateObject("Word.Application")'将新建word程序对象赋给变量wdapp Dim schar As String'字符串 Set wdoc = wdapp.Documents.Open(ThisWorkbook.Path & "\范例.docx") '打开word文档 schar = wdoc.Content '将文档内容赋值给字符串 wdoc.Close False'关闭word文档,不保存更改。 wdapp.Quit '关闭word程序 Set wdapp = Nothing'释放内存 Set wdoc = Nothing'释放内存 With CreateObject("vbscript.regexp") .Global = True '.Pattern = "《([^》]+)》" '.Pattern = "《(.*?)》" .Pattern = "[^《]+(?=》)" Set matc = .Execute(schar) For Each mat In matc k = k + 1 'Cells(k, 1) = mat.submatchse(0) Cells(k, 1) = mat.Value Next End With End Sub

结果:

提取Word中特定字符中的文本到Excel
文章图片
image.png

    推荐阅读