How to parse and read complex Word, Excel, and PPT files on Android

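A while ago I tried building a universal viewer for Android that could open files in all kinds of formats, which naturally included the most common Office documents. After some research I found that the most traditional way to parse Word and Excel files directly on Android relies on third-party Java libraries such as tm-extractors-0.4.jar and jxl.jar. The code and screenshots follow.
Reading Word uses tm-extractors-0.4.jar. The code is as follows: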

package com.example.readword;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;

import org.textmining.text.extraction.WordExtractor;

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class MainActivity extends Activity {
    private TextView text;

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        text = (TextView) findViewById(R.id.text);
        String str = readWord("/storage/emulated/0/ArcGIS/localtilelayer/11.doc");
        // readWord returns null when extraction fails, so guard before trimming
        text.setText(str == null ? "" : str.trim());
    }

    public String readWord(String file) {
        String text = null;
        // Open an input stream on the doc file; try-with-resources closes it
        try (FileInputStream in = new FileInputStream(new File(file))) {
            // Create the WordExtractor and extract the text of the doc file
            WordExtractor extractor = new WordExtractor();
            text = extractor.extractText(in);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return text;
    }
}

The result is shown below:

[Screenshot]

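This is just a document I downloaded at random. As you can see, it can be read, but the formatting is poor, only doc is supported (not docx), and images inside the doc cannot be read. On top of that, once the doc has been opened with WPS it can no longer be read, which is a disaster for me since WPS is the only office suite I have installed.
Next is the code that reads Excel with jxl. It is incomplete: it only does the parsing, extracting every row and column of the sheet so that you can rework the data yourself. The code is as follows: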

package com.readexl;

import java.io.FileInputStream;
import java.io.InputStream;

import android.app.Activity;
import android.os.Bundle;
import android.os.Environment;
import android.text.method.ScrollingMovementMethod;
import android.view.Menu;
import android.widget.TextView;

import jxl.*;

public class MainActivity extends Activity {
    TextView txt = null;
    public String filePath_xls = Environment.getExternalStorageDirectory() + "/case.xls";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        txt = (TextView) findViewById(R.id.txt_show);
        txt.setMovementMethod(ScrollingMovementMethod.getInstance());
        readExcel();
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }

    public void readExcel() {
        try {
            // To do later: images and other data types inside the workbook
            InputStream is = new FileInputStream(filePath_xls);
            // Workbook book = Workbook.getWorkbook(new File("mnt/sdcard/test.xls"));
            Workbook book = Workbook.getWorkbook(is);
            int num = book.getNumberOfSheets();
            txt.setText("the num of sheets is " + num + "\n");
            // Get the first sheet
            Sheet sheet = book.getSheet(0);
            int rows = sheet.getRows();
            int cols = sheet.getColumns();
            txt.append("the name of sheet is " + sheet.getName() + "\n");
            txt.append("total rows is " + rows + "\n");
            txt.append("total cols is " + cols + "\n");
            for (int i = 0; i < cols; ++i) {
                for (int j = 0; j < rows; ++j) {
                    // getCell(col, row) returns the cell at that position
                    txt.append("contents:" + sheet.getCell(i, j).getContents() + "\n");
                }
            }
            book.close();
            is.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

The result is shown below:

[Screenshot]

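Admittedly this is only half-finished, but the approach itself definitely works.
After all that, the point is plain: I am not really satisfied with either approach. Before going on, let me explain the difference between doc and docx (xls vs. xlsx and ppt vs. pptx differ in much the same way).
As is well known, doc is the format saved by Word 2003 and earlier, while docx is the format saved by Word 2007 and later. Put simply, doc still uses Microsoft's binary storage format, whereas docx uses XML and is in fact a packaged ZIP archive. Unpacking a doc yields only extension-less file fragments, while unpacking a docx yields an XML file and several folders containing the document's information. The upshot of the comparison is that docx is smaller and makes images easier to read. (See http://www.zhihu.com/question/21547795)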
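The binary-versus-ZIP distinction shows up in the first few bytes of a file. OLE2 compound files (doc, xls, ppt) begin with the signature D0 CF 11 E0 A1 B1 1A E1, and ZIP archives (docx, xlsx, pptx) begin with the bytes "PK" 03 04. Below is a minimal sketch of checking this. The class and method names are hypothetical and not part of any library, and a ZIP signature only shows that the file is an archive, not that it is a valid Office document.

```java
final class OfficeContainerSniffer {
    enum Container { OLE2_BINARY, ZIP_BASED, UNKNOWN }

    // Signature of an OLE2 compound file (doc, xls, ppt)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature of a ZIP archive (docx, xlsx, pptx)
    private static final byte[] ZIP_MAGIC = {'P', 'K', 0x03, 0x04};

    /** Classifies a file by the first bytes of its content. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2_BINARY;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP_BASED;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Passing the first eight bytes of a file to sniff is enough. This also catches files whose extension does not match their real format.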
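Back to the main topic. How can we parse Word, Excel, and similar files while preserving the original formatting and also extracting the images, tables, attachments, and other content inside? The answer is HTML, of course. HTML is undeniably powerful at presenting pages and tables, and results like that are hard to produce with native views. I went through a lot of material and code from various experts online, made my own modifications on top of it, and the result turned out quite well.
The library used is Apache POI (a powerful set of packages that can parse almost all Office file formats; doc, docx, xls, and xlsx serve as examples here).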

After the file is read, it is handled according to its type.

public void read() {
    if (!myFile.exists()) {
        if (this.nameStr.endsWith(".doc")) {
            this.getRange();
            this.makeFile();
            this.readDOC();
        }
        if (this.nameStr.endsWith(".docx")) {
            this.makeFile();
            this.readDOCX();
        }
        if (this.nameStr.endsWith(".xls")) {
            try {
                this.makeFile();
                this.readXLS();
            } catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        if (this.nameStr.endsWith(".xlsx")) {
            try {
                this.makeFile();
                this.readXLSX();
            } catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
    returnPath = "file:///" + myFile;
    // this.view.loadUrl("file:///" + this.htmlPath);
    System.out.println("htmlPath" + this.htmlPath);
}
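The endsWith checks in read() are case-sensitive, so a file named REPORT.DOCX would fall through every branch. One possible alternative is to normalize the extension first and dispatch on the result. The sketch below is not part of the original code, and the names OfficeTypes, Kind, and kindOf are hypothetical.

```java
import java.util.Locale;

final class OfficeTypes {
    enum Kind { DOC, DOCX, XLS, XLSX, UNSUPPORTED }

    /** Maps a file name to a document kind, ignoring the case of the extension. */
    static Kind kindOf(String fileName) {
        if (fileName == null) {
            return Kind.UNSUPPORTED;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Kind.UNSUPPORTED; // no extension at all
        }
        // Locale.ROOT avoids locale-specific lowercasing surprises
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc":
                return Kind.DOC;
            case "docx":
                return Kind.DOCX;
            case "xls":
                return Kind.XLS;
            case "xlsx":
                return Kind.XLSX;
            default:
                return Kind.UNSUPPORTED;
        }
    }
}
```

A caller could then switch on the returned Kind instead of chaining endsWith tests. That also gives one obvious place to handle unsupported files.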

First, the shared helper, which mainly sets where the generated HTML file is saved:

public void makeFile() {
    // Query the state of external storage
    String sdStateString = android.os.Environment.getExternalStorageState();
    // Proceed only if external storage (the SD card) is mounted
    if (sdStateString.equals(android.os.Environment.MEDIA_MOUNTED)) {
        try {
            // Root directory of external storage
            File sdFile = android.os.Environment.getExternalStorageDirectory();
            // Absolute path of the "library" folder on external storage
            String path = sdFile.getAbsolutePath() + File.separator + "library";
            File dirFile = new File(path);
            if (!dirFile.exists()) {
                dirFile.mkdir(); // create the folder if it does not exist
            }
            // Target HTML file, named after the source document
            File myFile = new File(path + File.separator + filename + ".html");
            if (!myFile.exists()) {
                myFile.createNewFile(); // create the file if it does not exist
            }
            this.htmlPath = myFile.getAbsolutePath(); // remember the path
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
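makeFile() ignores the return value of mkdir(), so a failure to create the folder only shows up later, when htmlPath is used. A stricter variant might report the failure right away. The sketch below uses plain java.io and takes the base directory as a parameter so that it does not depend on Android. HtmlOutputFiles and prepare are hypothetical names, not part of the original code.

```java
import java.io.File;
import java.io.IOException;

final class HtmlOutputFiles {
    /**
     * Ensures that baseDir/library/name.html exists and returns it.
     * Throws IOException instead of failing silently.
     */
    static File prepare(File baseDir, String name) throws IOException {
        File dir = new File(baseDir, "library");
        // mkdirs() returns false if the directory already exists, so test that first
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
        File html = new File(dir, name + ".html");
        // createNewFile() is a no-op (returns false) when the file already exists
        html.createNewFile();
        if (!html.isFile()) {
            throw new IOException("Cannot create file " + html);
        }
        return html;
    }
}
```

On a device, the base directory would be whatever storage location the app is allowed to write to.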

Next, reading doc:

private void getRange() {
    // try-with-resources closes the stream once the document has been loaded
    try (FileInputStream in = new FileInputStream(nameStr)) {
        POIFSFileSystem pfs = new POIFSFileSystem(in);
        hwpf = new HWPFDocument(pfs);
        // Only touch the document after it has loaded, to avoid a NullPointerException
        range = hwpf.getRange();
        pictures = hwpf.getPicturesTable().getAllPictures();
        tableIterator = new TableIterator(range);
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public void readDOC() {
    try {
        myFile = new File(htmlPath);
        output = new FileOutputStream(myFile);
        presentPicture = 0;
        String head = "<html><meta charset=\"utf-8\"><body>";
        String tagBegin = "<p>";
        String tagEnd = "</p>";
        output.write(head.getBytes());
        int numParagraphs = range.numParagraphs(); // total number of paragraphs
        for (int i = 0; i < numParagraphs; i++) { // walk through the paragraphs
            Paragraph p = range.getParagraph(i); // current paragraph
            if (p.isInTable()) {
                int temp = i;
                if (tableIterator.hasNext()) {
                    String tableBegin = "<table style=\"border-collapse:collapse\" border=1 bordercolor=\"black\">";
                    String tableEnd = "</table>";
                    String rowBegin = "<tr>";
                    String rowEnd = "</tr>";
                    String colBegin = "<td>";
                    String colEnd = "</td>";
                    Table table = tableIterator.next();
                    output.write(tableBegin.getBytes());
                    int rows = table.numRows();
                    for (int r = 0; r < rows; r++) {
                        output.write(rowBegin.getBytes());
                        TableRow row = table.getRow(r);
                        int cols = row.numCells();
                        int rowNumParagraphs = row.numParagraphs();
                        int colsNumParagraphs = 0;
                        for (int c = 0; c < cols; c++) {
                            output.write(colBegin.getBytes());
                            TableCell cell = row.getCell(c);
                            int max = temp + cell.numParagraphs();
                            colsNumParagraphs = colsNumParagraphs + cell.numParagraphs();
                            for (int cp = temp; cp < max; cp++) {
                                Paragraph p1 = range.getParagraph(cp);
                                output.write(tagBegin.getBytes());
                                writeParagraphContent(p1);
                                output.write(tagEnd.getBytes());
                                temp++;
                            }
                            output.write(colEnd.getBytes());
                        }
                        int max1 = temp + rowNumParagraphs;
                        for (int m = temp + colsNumParagraphs; m < max1; m++) {
                            temp++;
                        }
                        output.write(rowEnd.getBytes());
                    }
                    output.write(tableEnd.getBytes());
                }
                i = temp;
            } else {
                output.write(tagBegin.getBytes());
                writeParagraphContent(p);
                output.write(tagEnd.getBytes());
            }
        }
        String end = "</body></html>";
        output.write(end.getBytes());
        output.close();
    } catch (Exception e) {
        System.out.println("readAndWrite Exception:" + e.getMessage());
        e.printStackTrace();
    }
}
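Text copied from a document into HTML should be escaped. Otherwise a paragraph that happens to contain characters such as < or & can break the generated page. writeParagraphContent is not shown here, so whether it escapes its output is unknown. A helper along these lines could be applied to text before it is written. HtmlText and escape are hypothetical names, and the sketch covers only the characters that matter in element content and quoted attribute values.

```java
final class HtmlText {
    /** Escapes the characters that have special meaning in HTML text and attributes. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For instance, output.write(HtmlText.escape(textOfRun).getBytes()) would keep a literal "a < b" in the document from being read as the start of a tag. Here textOfRun stands for whatever string is about to be written.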

Reading docx:

public void readDOCX() {
    String river = "";
    try {
        this.myFile = new File(this.htmlPath); // the HTML output file
        this.output = new FileOutputStream(this.myFile); // stream that writes the HTML file
        presentPicture = 0;
        // Document head; utf-8 is declared here, otherwise the text comes out garbled
        String head = "<!DOCTYPE><html><meta charset=\"utf-8\"><body>";
        String end = "</body></html>";
        String tagBegin = "<p>"; // paragraph start
        String tagEnd = "</p>"; // paragraph end
        String tableBegin = "<table style=\"border-collapse:collapse\" border=1 bordercolor=\"black\">";
        String tableEnd = "</table>";
        String rowBegin = "<tr>";
        String rowEnd = "</tr>";
        String colBegin = "<td>";
        String colEnd = "</td>";
        String style = "style=\"";
        this.output.write(head.getBytes()); // write the head
        ZipFile xlsxFile = new ZipFile(new File(this.nameStr));
        ZipEntry sharedStringXML = xlsxFile.getEntry("word/document.xml");
        InputStream inputStream = xlsxFile.getInputStream(sharedStringXML);
        XmlPullParser xmlParser = Xml.newPullParser();
        xmlParser.setInput(inputStream, "utf-8");
        int evtType = xmlParser.getEventType();
        boolean isTable = false; // inside a table
        boolean isSize = false; // font size pending
        boolean isColor = false; // color pending
        boolean isCenter = false; // centered
        boolean isRight = false; // right-aligned
        boolean isItalic = false; // italic
        boolean isUnderline = false; // underlined
        boolean isBold = false; // bold
        boolean isR = false; // inside an r (run) element
        boolean isStyle = false;
        int pictureIndex = 1; // image names in the docx archive start at image1, so the index starts at 1
        while (evtType != XmlPullParser.END_DOCUMENT) {
            switch (evtType) {
            // Start tag
            case XmlPullParser.START_TAG:
                String tag = xmlParser.getName();
                if (tag.equalsIgnoreCase("r")) {
                    isR = true;
                }
                if (tag.equalsIgnoreCase("u")) { // underline
                    isUnderline = true;
                }
                if (tag.equalsIgnoreCase("jc")) { // alignment
                    String align = xmlParser.getAttributeValue(0);
                    if (align.equals("center")) {
                        this.output.write("<center>".getBytes());
                        isCenter = true;
                    }
                    if (align.equals("right")) {
                        this.output.write("<div align=\"right\">".getBytes());
                        isRight = true;
                    }
                }
                if (tag.equalsIgnoreCase("color")) { // color
                    String color = xmlParser.getAttributeValue(0);
                    this.output.write(("<span style=\"color:" + color + ";\">").getBytes());
                    isColor = true;
                }
                if (tag.equalsIgnoreCase("sz")) { // font size
                    if (isR) {
                        int size = decideSize(Integer.valueOf(xmlParser.getAttributeValue(0)));
                        this.output.write(("<font size=" + size + ">").getBytes());
                        isSize = true;
                    }
                }
                // Table handling
                if (tag.equalsIgnoreCase("tbl")) { // tbl found: a table begins
                    this.output.write(tableBegin.getBytes());
                    isTable = true;
                }
                if (tag.equalsIgnoreCase("tr")) { // row
                    this.output.write(rowBegin.getBytes());
                }
                if (tag.equalsIgnoreCase("tc")) { // cell
                    this.output.write(colBegin.getBytes());
                }
                if (tag.equalsIgnoreCase("pic")) { // pic found: an image
                    String entryName_jpeg = "word/media/image" + pictureIndex + ".jpeg";
                    String entryName_png = "word/media/image" + pictureIndex + ".png";
                    String entryName_gif = "word/media/image" + pictureIndex + ".gif";
                    String entryName_wmf = "word/media/image" + pictureIndex + ".wmf";
                    ZipEntry sharePicture = null;
                    InputStream pictIS = null;
                    // Look up the image in the archive, trying each extension in turn
                    sharePicture = xlsxFile.getEntry(entryName_jpeg);
                    if (sharePicture == null) {
                        sharePicture = xlsxFile.getEntry(entryName_png);
                    }
                    if (sharePicture == null) {
                        sharePicture = xlsxFile.getEntry(entryName_gif);
                    }
                    if (sharePicture == null) {
                        sharePicture = xlsxFile.getEntry(entryName_wmf);
                    }
                    if (sharePicture != null) {
                        // Read the image into a byte array
                        pictIS = xlsxFile.getInputStream(sharePicture);
                        ByteArrayOutputStream pOut = new ByteArrayOutputStream();
                        byte[] bt = null;
                        byte[] b = new byte[1000];
                        int len = 0;
                        while ((len = pictIS.read(b)) != -1) {
                            pOut.write(b, 0, len);
                        }
                        pictIS.close();
                        pOut.close();
                        bt = pOut.toByteArray();
                        Log.i("byteArray", "" + bt);
                        writeDOCXPicture(bt);
                    }
                    pictureIndex++; // move on to the next image
                }
                if (tag.equalsIgnoreCase("b")) { // bold
                    isBold = true;
                }
                if (tag.equalsIgnoreCase("p")) { // p found
                    if (!isTable) { // ignored inside a table
                        this.output.write(tagBegin.getBytes());
                    }
                }
                if (tag.equalsIgnoreCase("i")) { // italic
                    isItalic = true;
                }
                // Text element
                if (tag.equalsIgnoreCase("t")) {
                    if (isBold) { // bold
                        this.output.write("<b>".getBytes());
                    }
                    if (isUnderline) { // underline pending: open <u>
                        this.output.write("<u>".getBytes());
                    }
                    if (isItalic) { // italic pending: open <i>
                        output.write("<i>".getBytes());
                    }
                    river = xmlParser.nextText();
                    this.output.write(river.getBytes()); // write the text
                    if (isItalic) { // close </i> after the text and reset the flag
                        this.output.write("</i>".getBytes());
                        isItalic = false;
                    }
                    if (isUnderline) { // close </u> after the text and reset the flag
                        this.output.write("</u>".getBytes());
                        isUnderline = false;
                    }
                    if (isBold) { // bold
                        this.output.write("</b>".getBytes());
                        isBold = false;
                    }
                    if (isSize) { // font size was set: close the tag
                        this.output.write("</font>".getBytes());
                        isSize = false;
                    }
                    if (isColor) { // color was set: close the tag
                        this.output.write("</span>".getBytes());
                        isColor = false;
                    }
                    if (isCenter) { // centered: close the tag
                        this.output.write("</center>".getBytes());
                        isCenter = false;
                    }
                    if (isRight) { // there is no <right> tag; a div may cause issues, but it will do for now
                        this.output.write("</div>".getBytes());
                        isRight = false;
                    }
                }
                break;
            // End tag
            case XmlPullParser.END_TAG:
                String tag2 = xmlParser.getName();
                if (tag2.equalsIgnoreCase("tbl")) { // table ended: reset the table flag
                    this.output.write(tableEnd.getBytes());
                    isTable = false;
                }
                if (tag2.equalsIgnoreCase("tr")) { // row ended
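Because a docx is a ZIP archive, the word/document.xml entry that readDOCX() parses can also be pulled out with java.util.zip alone. This is handy for inspecting the XML off the device before writing parser code against it. Below is a minimal sketch. DocxEntryReader and readEntry are hypothetical names, and the method reads from any InputStream rather than from a ZipFile.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class DocxEntryReader {
    /**
     * Scans a ZIP stream and returns the bytes of the entry with the given
     * name, or null if the archive has no such entry.
     */
    static byte[] readEntry(InputStream zipStream, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

Calling readEntry(new FileInputStream(path), "word/document.xml") returns the raw XML bytes. The same call with a name under word/media/ returns an embedded image, if the archive has one by that name.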
