Java LZMA: Compression on Disk and in Memory
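LZMA (Lempel-Ziv-Markov chain Algorithm) is a compression algorithm that refines and optimizes the approach of Deflate and LZ77. It uses a dictionary-coding scheme similar to LZ77, generally achieves a higher compression ratio than bzip2, and supports a variable dictionary size of up to 4 GB.
The principles behind LZMA are fairly involved; interested readers can look up the details on their own.
This article demonstrates compression and decompression in two ways: on disk and in memory. The demos cover only a single level of files; multi-level directories can be handled by applying the same operations recursively.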
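The recursive handling mentioned above can be sketched with the JDK alone. This is a minimal sketch, not part of the original utilities: `DirectoryWalker`, `FileOp`, and the `suffix` parameter are hypothetical names, and the callback stands in for a per-file method such as `LzmaDiskUtil.lzmaCompress` from the next section.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: applies a per-file operation to every regular file under a root. */
final class DirectoryWalker {

    /** Per-file operation, e.g. (src, dst) -> LzmaDiskUtil.lzmaCompress(src, dst). */
    interface FileOp {
        void apply(String srcFilePath, String tarFilePath) throws Exception;
    }

    /**
     * Walks srcRoot recursively, mirrors its layout under tarRoot, and calls op
     * once per regular file. Returns the number of files processed.
     */
    static int processTree(Path srcRoot, Path tarRoot, String suffix, FileOp op) throws Exception {
        List<Path> files;
        // Collect the file list first so the walk stream is closed before any output is written.
        try (Stream<Path> walk = Files.walk(srcRoot)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path src : files) {
            // Same relative path under tarRoot, with the suffix appended to the file name.
            Path target = tarRoot.resolve(srcRoot.relativize(src).toString() + suffix);
            Files.createDirectories(target.getParent());
            op.apply(src.toString(), target.toString());
        }
        return files.size();
    }
}
```

A call might look like `DirectoryWalker.processTree(srcDir, outDir, ".lzma", LzmaDiskUtil::lzmaCompress)`, which produces one compressed file per input file rather than a single archive.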
· Maven dependency
<dependency>
    <groupId>com.github.jponge</groupId>
    <artifactId>lzma-java</artifactId>
    <version>1.3</version>
</dependency>
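· Compression and decompression on disk
Unless there is a special requirement, all operations happen on disk: place the files in a directory, then compress them. The utility class is as follows: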
package com.arhorchin.securitit.compress.lzma;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaDiskUtil {

    /**
     * Compresses a file with LZMA.
     * @param srcFilePath path of the file to compress.
     * @param tarFilePath path of the compressed output file.
     * @throws Exception if reading, writing, or encoding fails.
     */
    public static void lzmaCompress(String srcFilePath, String tarFilePath) throws Exception {
        File srcFile = new File(srcFilePath);
        Encoder encoder = new Encoder();
        // try-with-resources closes both streams even if encoding fails.
        try (FileInputStream srcFis = new FileInputStream(srcFile);
                FileOutputStream tarFos = new FileOutputStream(new File(tarFilePath))) {
            // The size is stored in the header, so no end-of-stream marker is written.
            encoder.setEndMarkerMode(false);
            // Header part 1: 5 bytes of coder properties.
            encoder.writeCoderProperties(tarFos);
            // Header part 2: uncompressed size as 8 little-endian bytes.
            // File.length() replaces available(), which is only an int-sized estimate.
            long fileSize = srcFile.length();
            for (int i = 0; i < 8; i++) {
                tarFos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(srcFis, tarFos, -1, -1, null);
        }
    }

    /**
     * Decompresses a file with LZMA.
     * @param srcFilePath path of the file to decompress.
     * @param tarFilePath path of the decompressed output file.
     * @throws Exception if reading, writing, or decoding fails.
     */
    public static void lzmaDecompress(String srcFilePath, String tarFilePath) throws Exception {
        Decoder decoder = new Decoder();
        // try-with-resources closes both streams even if decoding fails.
        try (FileInputStream srcFis = new FileInputStream(new File(srcFilePath));
                FileOutputStream tarFos = new FileOutputStream(new File(tarFilePath))) {
            // Header part 1: 5 bytes of coder properties.
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (srcFis.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            // Header part 2: uncompressed size as 8 little-endian bytes.
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = srcFis.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(srcFis, tarFos, outSize)) {
                throw new IOException("Error in data stream");
            }
        }
    }
}
The test code is as follows:
package com.arhorchin.securitit.com.compress;

import com.arhorchin.securitit.compress.lzma.LzmaDiskUtil;

public class LzmaDiskUtilTester {

    public static void main(String[] args) throws Exception {
        // Compress test.xml into test-lzma.xml.
        String srcFilePath = "C:/Users/Administrator/Downloads/个人文件/test.xml";
        String tarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-lzma.xml";
        LzmaDiskUtil.lzmaCompress(srcFilePath, tarFilePath);
        // Decompress the result into test-unlzma.xml.
        String vTarFilePath = "C:/Users/Administrator/Downloads/个人文件/test-unlzma.xml";
        LzmaDiskUtil.lzmaDecompress(tarFilePath, vTarFilePath);
    }
}
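The disk utility above writes and reads a 13-byte header by hand: 5 bytes of coder properties followed by the uncompressed size as 8 little-endian bytes. The sketch below isolates that byte layout using only the JDK. `LzmaHeader` and its method names are hypothetical, and the decoding of the properties assumes the classic .lzma layout, in which the first byte packs `lc`, `lp`, and `pb` as `(pb * 5 + lp) * 9 + lc` and the next four bytes hold the dictionary size.

```java
/** Hypothetical helper illustrating the 13-byte .lzma header layout. */
final class LzmaHeader {
    final int lc;                 // literal context bits
    final int lp;                 // literal position bits
    final int pb;                 // position bits
    final long dictSize;          // dictionary size in bytes (unsigned 32-bit value)
    final long uncompressedSize;  // size field written by lzmaCompress

    private LzmaHeader(int lc, int lp, int pb, long dictSize, long uncompressedSize) {
        this.lc = lc;
        this.lp = lp;
        this.pb = pb;
        this.dictSize = dictSize;
        this.uncompressedSize = uncompressedSize;
    }

    /** Encodes a size as 8 little-endian bytes, mirroring the loop in lzmaCompress. */
    static byte[] encodeSize(long size) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (size >>> (8 * i));
        }
        return out;
    }

    /** Reads count little-endian bytes starting at offset, mirroring the loop in lzmaDecompress. */
    static long readLittleEndian(byte[] data, int offset, int count) {
        long value = 0;
        for (int i = 0; i < count; i++) {
            value |= (data[offset + i] & 0xFFL) << (8 * i);
        }
        return value;
    }

    /** Parses the first 13 bytes of an .lzma stream. */
    static LzmaHeader parse(byte[] data) {
        if (data.length < 13) {
            throw new IllegalArgumentException("need at least 13 header bytes");
        }
        int props = data[0] & 0xFF;
        if (props >= 9 * 5 * 5) {
            throw new IllegalArgumentException("invalid properties byte");
        }
        int lc = props % 9;
        int rest = props / 9;
        int lp = rest % 5;
        int pb = rest / 5;
        long dictSize = readLittleEndian(data, 1, 4);
        long size = readLittleEndian(data, 5, 8);
        return new LzmaHeader(lc, lp, pb, dictSize, size);
    }
}
```

Parsing the header of a file produced by `lzmaCompress` is a quick way to confirm that the stored size matches the source file length before attempting a full decompression.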
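· Compression and decompression in memory
In practice, some requirements call for generating a number of files and then compressing them. When the files are small, few in number, and fairly fixed, frequent disk access adds unnecessary overhead. In that case the data can be compressed in memory to obtain the LZMA-compressed bytes directly. The utility class is as follows: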
package com.arhorchin.securitit.compress.lzma;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import lzma.sdk.lzma.Decoder;
import lzma.sdk.lzma.Encoder;

public class LzmaMemoryUtil {

    /**
     * Compresses a byte array with LZMA.
     * @param fileBytes content to compress.
     * @return compressed content, including the 13-byte header.
     * @throws Exception if encoding fails.
     */
    public static byte[] lzmaCompress(byte[] fileBytes) throws Exception {
        Encoder encoder = new Encoder();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            // The size is stored in the header, so no end-of-stream marker is written.
            encoder.setEndMarkerMode(false);
            // Header part 1: 5 bytes of coder properties.
            encoder.writeCoderProperties(baos);
            // Header part 2: uncompressed size as 8 little-endian bytes.
            long fileSize = fileBytes.length;
            for (int i = 0; i < 8; i++) {
                baos.write((int) (fileSize >>> (8 * i)) & 0xFF);
            }
            encoder.code(bais, baos, -1, -1, null);
            return baos.toByteArray();
        }
    }

    /**
     * Decompresses a byte array with LZMA.
     * @param fileBytes content to decompress, including the 13-byte header.
     * @return decompressed content.
     * @throws Exception if the header is invalid or decoding fails.
     */
    public static byte[] lzmaDecompress(byte[] fileBytes) throws Exception {
        Decoder decoder = new Decoder();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            // Header part 1: 5 bytes of coder properties.
            int propertiesSize = 5;
            byte[] properties = new byte[propertiesSize];
            if (bais.read(properties, 0, propertiesSize) != propertiesSize) {
                throw new IOException("input .lzma file is too short");
            }
            if (!decoder.setDecoderProperties(properties)) {
                throw new IOException("Incorrect stream properties");
            }
            // Header part 2: uncompressed size as 8 little-endian bytes.
            long outSize = 0;
            for (int i = 0; i < 8; i++) {
                int v = bais.read();
                if (v < 0) {
                    throw new IOException("Can't read stream size");
                }
                outSize |= ((long) v) << (8 * i);
            }
            if (!decoder.code(bais, baos, outSize)) {
                throw new IOException("Error in data stream");
            }
            return baos.toByteArray();
        }
    }
}
The test code is as follows:
package com.arhorchin.securitit.com.compress;

import java.io.File;

import org.apache.commons.io.FileUtils;

import com.arhorchin.securitit.compress.lzma.LzmaMemoryUtil;

public class LzmaMemoryUtilTester {

    public static void main(String[] args) throws Exception {
        // FileUtils comes from Apache commons-io, which must be on the classpath.
        String txt = FileUtils.readFileToString(
                new File("C:/Users/Administrator/Downloads/个人文件/test-002.xml"), "UTF-8");
        byte[] bts = txt.getBytes("UTF-8");
        System.out.println("====Length before compression:====" + bts.length);
        bts = LzmaMemoryUtil.lzmaCompress(bts);
        System.out.println("====Length after compression:====" + bts.length);
        // System.out.println("====Compressed data, Base64-encoded:====" + Base64.encodeBase64String(bts));
        System.out.println("====Length before decompression:====" + bts.length);
        bts = LzmaMemoryUtil.lzmaDecompress(bts);
        System.out.println("====Length after decompression:====" + bts.length);
        txt = new String(bts, "UTF-8");
    }
}
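The test above only prints lengths, which would not reveal corrupted content of the right size. Comparing the decompressed bytes with the original is a stronger check. A minimal sketch follows; `RoundTripCheck` and `Codec` are hypothetical names, and any pair of compress and decompress functions can be passed in.

```java
import java.util.Arrays;

/** Hypothetical helper: verifies that decompress(compress(x)) equals x. */
final class RoundTripCheck {

    /** A byte[] -> byte[] transformation that may throw, e.g. LzmaMemoryUtil::lzmaCompress. */
    interface Codec {
        byte[] apply(byte[] input) throws Exception;
    }

    /**
     * Runs compress then decompress and fails if the result differs from the input.
     * Returns compressedLength / originalLength (0.0 for empty input).
     */
    static double verify(byte[] original, Codec compress, Codec decompress) throws Exception {
        byte[] packed = compress.apply(original);
        byte[] restored = decompress.apply(packed);
        if (!Arrays.equals(original, restored)) {
            throw new IllegalStateException("round trip changed the data");
        }
        return original.length == 0 ? 0.0 : (double) packed.length / original.length;
    }
}
```

With the utility class from this section, the call would be `RoundTripCheck.verify(bts, LzmaMemoryUtil::lzmaCompress, LzmaMemoryUtil::lzmaDecompress)`.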
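· Summary
Because LZMA is one of the compression algorithms used by 7z, the conclusion here is similar to that of this blog's earlier post on 7z: the LZMA format can achieve a higher compression ratio. That said, everything has its preconditions. The ratio varies with the type and content of the files being compressed and does not stay at a fixed level. In short, discussing performance or efficiency without stating the conditions is meaningless. LZMA suits scenarios where file size matters for transfer or storage. At the same time, keep in mind the downsides that come with 7z's high compression ratio, so that the risks can be anticipated and mitigated early in system or feature design.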
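The dependence of the ratio on content is easy to observe. The sketch below uses the JDK's built-in `java.util.zip.Deflater` as a stand-in rather than LZMA, so that it runs without extra dependencies; the absolute numbers will differ from LZMA's, but the contrast between repetitive and random input is the same in kind. `RatioDemo` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

/** Hypothetical demo: compression ratio depends heavily on the input content. */
final class RatioDemo {

    /** Compresses non-empty data with Deflate (not LZMA) and returns compressed / original length. */
    static double deflateRatio(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return (double) out.size() / data.length;
        } finally {
            // Release the native resources held by the Deflater.
            deflater.end();
        }
    }

    /** Highly repetitive input: the same short XML-like fragment over and over. */
    static byte[] repetitive(int length) {
        byte[] pattern = "<item>value</item>".getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = pattern[i % pattern.length];
        }
        return data;
    }

    /** Pseudo-random input, which is essentially incompressible. */
    static byte[] random(int length, long seed) {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }
}
```

Comparing `RatioDemo.deflateRatio(RatioDemo.repetitive(100000))` with `RatioDemo.deflateRatio(RatioDemo.random(100000, 42L))` shows the repetitive input shrinking to a small fraction of its size while the random input stays at roughly its original size.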