Article overview
- Tika OOXMLParser constructor
- OOXMLParser example
Tika OOXMLParser constructor
Constructor | Description |
---|---|
public OOXMLParser() | Instantiates the parser. |
Method | Description |
---|---|
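public Set<MediaType> getSupportedTypes(ParseContext context) | Returns the set of media types this parser supports. |
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException | Parses a document stream into a sequence of XHTML SAX events. |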
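The parse method reports document content to a ContentHandler as SAX events. To give a rough idea of what such a handler receives, the sketch below uses only the JDK: its built-in SAX parser stands in for OOXMLParser, and a hand-written XHTML string stands in for the events a Tika parser would emit. The names SaxEventSketch, RecordingHandler, and replay are hypothetical and are not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventSketch {

    /** Hypothetical handler: records element names and collects character data. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Called once per opening tag, in document order.
            elements.add(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Called for text between tags; may arrive in several chunks.
            text.append(ch, start, length);
        }
    }

    /** Feeds an XHTML string through the JDK SAX parser (a stand-in for a Tika parser). */
    static RecordingHandler replay(String xhtml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        InputStream in = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8));
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, loosely modelled on a one-sheet workbook.
        String xhtml = "<html><body><h1>Sheet1</h1><p>603</p></body></html>";
        RecordingHandler handler = replay(xhtml);
        System.out.println("Elements: " + handler.elements);
        System.out.println("Text: " + handler.text);
    }
}
```

Tika's BodyContentHandler plays a similar role to RecordingHandler here: it accumulates the character events so that the text can be read back with toString() after parsing.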
OOXMLParser example
package tikaexample;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class MSOfficeExample {
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        // Collects the body text emitted by the parser.
        BodyContentHandler handler = new BodyContentHandler();
        OOXMLParser parser = new OOXMLParser();
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();
        // Load the sample workbook from the classpath; OOXMLParser expects OOXML (Office 2007+) content.
        try (InputStream stream = MSOfficeExample.class.getResourceAsStream("srcmini.xls")) {
            parser.parse(stream, handler, metadata, pcontext);
            System.out.println("Document Content:" + handler.toString());
            System.out.println("Document Metadata:");
            // Print every metadata key with its value.
            String[] metadatas = metadata.names();
            for (String data : metadatas) {
                System.out.println(data + ":" + metadata.get(data));
            }
        } catch (Exception e) {
            System.out.println("Exception message: " + e.getMessage());
        }
    }
}
Our file contains the following content.
[Image: contents of the input file]
Output
Document Content:Sheet1
Employee Manual Punch In Time Out Time Device Total Minute Total Time Working Minutes
01-Nov-17 8:27:00 AM 01-Nov-17 6:30:00 PM 1 603 540 -63
02-Nov-17 8:09:00 AM 02-Nov-17 6:30:00 PM 1 621 540 -81
03-Nov-17 8:25:00 AM 03-Nov-17 6:30:00 PM 1 605 540 -65
Document Metadata:
date:2018-05-06T11:20:06Z
cp:revision:1
custom:DocSecurity:0
dc:creator:Reception
dcterms:created:2017-12-03T08:38:57Z
language:en-IN
Last-Modified:2018-05-06T11:20:06Z
dcterms:modified:2018-05-06T11:20:06Z
Last-Save-Date:2018-05-06T11:20:06Z
Template:
protected:false
meta:save-date:2018-05-06T11:20:06Z
Application-Name:LibreOffice/5.1.6.2$Linux_X86_64 LibreOffice_project/10m0$Build-2
modified:2018-05-06T11:20:06Z
custom:LinksUpToDate:false
Content-Type:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
creator:Reception
dc:language:en-IN
meta:author:Reception
meta:creation-date:2017-12-03T08:38:57Z
extended-properties:Application:LibreOffice/5.1.6.2$Linux_X86_64 LibreOffice_project/10m0$Build-2
custom:ShareDoc:false
custom:ScaleCrop:false
Creation-Date:2017-12-03T08:38:57Z
custom:HyperlinksChanged:false
Revision-Number:1
extended-properties:Template:
custom:AppVersion:12.0000