Java LZMA Compression on Disk and in Memory

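LZMA (Lempel-Ziv-Markov chain algorithm) is a compression algorithm that refines and optimizes the ideas behind Deflate and LZ77. It uses a dictionary coding scheme similar to LZ77, usually achieves a higher compression ratio than bzip2, and supports a variable dictionary size of up to 4 GB.
The internals of LZMA are fairly involved; interested readers can look them up on their own.
This article demonstrates compression and decompression in two settings, on disk and in memory. The demonstration covers only a flat, single-level layout; nested directories just need the same operation applied recursively.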
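For the nested-directory case, the recursion can be left to `Files.walk` from the JDK. The following is only a minimal sketch: the class name `LzmaTreeWalker`, its methods, and the `.lzma` suffix convention are hypothetical and not part of the original utility classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds every regular file under a root directory, at any depth. */
final class LzmaTreeWalker {

    /** Returns all regular files below root, sorted for a stable order. */
    static List<Path> listFiles(Path root) throws IOException {
        // Files.walk descends into subdirectories; the stream must be closed after use.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Maps a source file to a sibling target with an ".lzma" suffix (naming is an assumption). */
    static Path targetFor(Path source) {
        return source.resolveSibling(source.getFileName().toString() + ".lzma");
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : listFiles(root)) {
            // A real run would call a per-file compressor here instead of printing.
            System.out.println(file + " -> " + targetFor(file));
        }
    }
}
```

Each path returned by `listFiles` could then be handed to a single-file compression method, one call per file.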
· Maven dependency

<dependency>
    <groupId>com.github.jponge</groupId>
    <artifactId>lzma-java</artifactId>
    <version>1.3</version>
</dependency>

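· Compressing and decompressing on disk
Unless there is a special requirement, the work is done on disk: the files are stored in a directory and then compressed. LZMA itself compresses a single stream, so the utility class below handles one file per call: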
package com.arhorchin.securitit.compress.lzma;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaDiskUtil {

    /**
     * Compresses a file with LZMA.
     * @param srcFilePath path of the file to compress.
     * @param tarFilePath path of the compressed output file.
     * @throws Exception on I/O or encoding failure.
     */
    public static void lzmaCompress(String srcFilePath, String tarFilePath) throws Exception {
        File srcFile = new File(srcFilePath);
        try (FileInputStream srcFis = new FileInputStream(srcFile);
                FileOutputStream tarFos = new FileOutputStream(new File(tarFilePath))) {
            Encoder encoder = new Encoder();
            // The exact length goes into the header, so no end marker is written.
            encoder.setEndMarkerMode(false);
            // Header part 1: 5 bytes of coder properties.
            encoder.writeCoderProperties(tarFos);
            // Header part 2: uncompressed length as 8 little-endian bytes.
            // File.length() replaces available(), which is only an int-sized estimate.
            long fileSize = srcFile.length();
            for (int i = 0; i < 8; i++) {
                tarFos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(srcFis, tarFos, -1, -1, null);
        }
    }

    /**
     * Decompresses an LZMA file.
     * @param srcFilePath path of the file to decompress.
     * @param tarFilePath path of the decompressed output file.
     * @throws Exception on I/O failure or corrupt input.
     */
    public static void lzmaDecompress(String srcFilePath, String tarFilePath) throws Exception {
        try (FileInputStream srcFis = new FileInputStream(new File(srcFilePath));
                FileOutputStream tarFos = new FileOutputStream(new File(tarFilePath))) {
            Decoder decoder = new Decoder();
            // Read back the 5 bytes of coder properties.
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (srcFis.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            // Read back the 8-byte little-endian uncompressed length.
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = srcFis.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(srcFis, tarFos, outSize)) {
                throw new IOException("Error in data stream");
            }
        }
    }
}
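Both methods above rely on the same 13-byte header: 5 bytes of coder properties followed by the uncompressed length as 8 little-endian bytes. The sketch below reproduces only that header arithmetic with plain JDK code. The class `LzmaHeaderSketch` and its method names are hypothetical, and the sample properties byte `0x5D` (lc=3, lp=0, pb=2) is an assumption about typical encoder defaults, not something the article states.

```java
/** Sketch of the .lzma header fields: 1 properties byte, 4-byte dictionary size, 8-byte length. */
final class LzmaHeaderSketch {

    /** Encodes a length as 8 little-endian bytes, mirroring the loop in the compress method. */
    static byte[] encodeSize(long size) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (size >>> (8 * i));
        }
        return out;
    }

    /** Decodes 8 little-endian bytes into a length, mirroring the loop in the decompress method. */
    static long decodeSize(byte[] in) {
        long size = 0;
        for (int i = 0; i < 8; i++) {
            size |= (in[i] & 0xFFL) << (8 * i);
        }
        return size;
    }

    /** Splits the first properties byte into {lc, lp, pb}; the byte equals (pb * 5 + lp) * 9 + lc. */
    static int[] decodeProps(int propsByte) {
        int lc = propsByte % 9;
        int rest = propsByte / 9;
        return new int[] {lc, rest % 5, rest / 5};
    }

    public static void main(String[] args) {
        System.out.println(decodeSize(encodeSize(1234567890123L))); // prints 1234567890123
        int[] p = decodeProps(0x5D); // assumed sample value, see the note above
        System.out.println("lc=" + p[0] + " lp=" + p[1] + " pb=" + p[2]); // prints lc=3 lp=0 pb=2
    }
}
```

Writing the length explicitly is what lets the encoder skip the end marker: the decoder stops after producing exactly that many bytes.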

The test code is as follows:
package com.arhorchin.securitit.com.compress;

import com.arhorchin.securitit.compress.lzma.LzmaDiskUtil;

public class LzmaDiskUtilTester {

    public static void main(String[] args) throws Exception {
        // Compress test.xml, then decompress the result to a third file for comparison.
        String srcFilePath = "C:/Users/Administrator/Downloads/个人文件/test.xml";
        String tarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-lzma.xml";
        LzmaDiskUtil.lzmaCompress(srcFilePath, tarFilePath);

        String vTarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-unlzma.xml";
        LzmaDiskUtil.lzmaDecompress(tarFilePath, vTarFilePath);
    }
}

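· Compressing and decompressing in memory
In practice, some requirements call for generating several files and then compressing them. When the files are small, few in number, and fairly fixed, frequent disk access adds avoidable overhead. In that case the data can be compressed in memory to produce the bytes of an .lzma stream directly. The utility class is as follows: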
package com.arhorchin.securitit.compress.lzma;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaMemoryUtil {

    /**
     * Compresses a byte array with LZMA.
     * @param fileBytes data to compress.
     * @return compressed data.
     * @throws Exception on encoding failure.
     */
    public static byte[] lzmaCompress(byte[] fileBytes) throws Exception {
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            Encoder encoder = new Encoder();
            // The exact length goes into the header, so no end marker is written.
            encoder.setEndMarkerMode(false);
            // Header part 1: 5 bytes of coder properties.
            encoder.writeCoderProperties(baos);
            // Header part 2: uncompressed length as 8 little-endian bytes.
            long fileSize = fileBytes.length;
            for (int i = 0; i < 8; i++) {
                baos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(bais, baos, -1, -1, null);
            return baos.toByteArray();
        }
    }

    /**
     * Decompresses LZMA data.
     * @param fileBytes data to decompress.
     * @return decompressed data.
     * @throws Exception on corrupt or truncated input.
     */
    public static byte[] lzmaDecompress(byte[] fileBytes) throws Exception {
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            Decoder decoder = new Decoder();
            // Read back the 5 bytes of coder properties.
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (bais.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            // Read back the 8-byte little-endian uncompressed length.
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = bais.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(bais, baos, outSize)) {
                throw new IOException("Error in data stream");
            }
            return baos.toByteArray();
        }
    }
}

The test code is as follows:
package com.arhorchin.securitit.com.compress;

import java.io.File;

import org.apache.commons.io.FileUtils;

import com.arhorchin.securitit.compress.lzma.LzmaMemoryUtil;

public class LzmaMemoryUtilTester {

    public static void main(String[] args) throws Exception {
        // Read with an explicit charset so the result does not depend on the platform default.
        String txt = FileUtils.readFileToString(
                new File("C:/Users/Administrator/Downloads/个人文件/test-002.xml"), "UTF-8");
        byte[] bts = txt.getBytes("UTF-8");
        System.out.println("==== Length before compression: ====" + bts.length);
        bts = LzmaMemoryUtil.lzmaCompress(bts);
        System.out.println("==== Length after compression: ====" + bts.length);
        // System.out.println("==== Compressed data, Base64-encoded: ====" + Base64.encodeBase64String(bts));
        System.out.println("==== Length before decompression: ====" + bts.length);
        bts = LzmaMemoryUtil.lzmaDecompress(bts);
        System.out.println("==== Length after decompression: ====" + bts.length);
        txt = new String(bts, "UTF-8");
    }
}
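The commented-out line in the test hints at Base64-encoding the compressed bytes, which is handy when they have to travel through a text-only channel such as JSON or XML. A minimal sketch of that step is below. It swaps in the JDK's `java.util.Base64` for the `Base64.encodeBase64String` call in the comment, and the class `Base64TransportSketch` with its method names is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: moves compressed bytes through a text-only channel. */
final class Base64TransportSketch {

    /** Encodes binary data (for example a compressed byte array) as Base64 text. */
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes Base64 text back into the original binary data. */
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for real compressed output.
        byte[] compressed = {0x5D, 0x00, 0x00, 0x40, 0x00, 0x7F, (byte) 0xFF};
        String text = toText(compressed);
        System.out.println(text);
        System.out.println(Arrays.equals(compressed, fromText(text))); // prints true
    }
}
```

Base64 grows the payload by roughly a third, so it offsets part of the gain from compression and is only worth it when a binary channel is not available.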

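· Summary
LZMA is one of the compression algorithms used by 7z, so the conclusions here match those in this blog's earlier post on 7z: the LZMA format can achieve a higher compression ratio. That comes with conditions, of course. The ratio varies with the type and content of the files being compressed and does not stay at one fixed level. In short, talking about performance or efficiency without stating the conditions is meaningless. The format is a good fit where file size matters for transfer or storage. Keep in mind, though, the costs that come with 7z's high compression ratio, typically more CPU time and memory during compression, so that the risks can be anticipated and mitigated early, during system or feature design.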
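How strongly the ratio depends on content is easy to see with a small experiment. The JDK has no built-in LZMA codec, so the sketch below uses `java.util.zip.Deflater` purely as a stand-in; the absolute numbers will differ from LZMA, but the contrast between repetitive and random input carries over. The class `RatioSketch` and its methods are hypothetical.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Sketch: compares compressed/original size ratios, using Deflate as a stand-in for LZMA. */
final class RatioSketch {

    /** Returns the number of bytes Deflate produces for the given input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            // Drain the compressor; only the byte count matters here, not the data.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Compressed size divided by original size; smaller means better compression. */
    static double ratio(byte[] input) {
        return (double) deflatedSize(input) / input.length;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[100_000]; // all zeros
        byte[] noise = new byte[100_000];
        new Random(42).nextBytes(noise); // pseudo-random, hard to compress
        System.out.println("repetitive: " + ratio(repetitive));
        System.out.println("random:     " + ratio(noise));
    }
}
```

Running the same measurement on representative production files, with the actual LZMA encoder, is the reliable way to decide whether the format pays off for a given workload.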
