A Deep Dive into FST: Features and Internals of a Fast, Compact Java Serialization Library

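What FST Is
FST, short for Fast Serialization Tool, is a replacement implementation of Java serialization. It substantially improves on the two serious shortcomings of Java serialization discussed earlier. Its main features are: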

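Up to 10 times faster than the serialization provided by the JDK, with output more than 3-4 times smaller
Supports off-heap maps, including persistence of off-heap maps
Supports serialization to JSON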

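Using FST Serialization
FST can be used in two ways: through a shortcut API, or through ObjectOutput and ObjectInput.
The shortcut is to call the serialization and deserialization methods that FSTConfiguration provides directly: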

public static void serialSample() {
    FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();
    User object = new User();
    object.setName("huaijin");
    object.setAge(30);
    System.out.println("serialization, " + object);

    // object -> byte[] in a single call, no stream handling needed
    byte[] bytes = conf.asByteArray(object);
    // byte[] -> object
    User newObject = (User) conf.asObject(bytes);
    System.out.println("deSerialization, " + newObject);
}

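FSTConfiguration also provides an interface for registering the classes of the objects to be serialized. If a class is not registered, its class name is written to the stream by default. The shortcut API is both convenient and efficient: it returns a byte[] directly, without going through a ByteArrayOutputStream.
Using ObjectOutput and ObjectInput gives finer control over how data is written and read: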

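static FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();

static void writeObject(OutputStream outputStream, User user) throws IOException {
    FSTObjectOutput out = conf.getObjectOutput(outputStream);
    // pass the same class hint that readObject uses below
    out.writeObject(user, User.class);
    // streams obtained from conf are reused by FST: flush them, and close the underlying stream instead
    out.flush();
    outputStream.close();
}

static User readObject(InputStream inputStream) throws Exception {
    FSTObjectInput input = conf.getObjectInput(inputStream);
    User fstObject = (User) input.readObject(User.class);
    // as above, close the underlying stream rather than the reused FSTObjectInput
    inputStream.close();
    return fstObject;
}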

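FST in Dubbo

Dubbo wraps FST's object input and output in its own FstObjectInput and FstObjectOutput classes, which fixes null-pointer problems during serialization and deserialization.
It also adds an FstFactory class that creates FstObjectInput and FstObjectOutput instances through the factory pattern. The factory is a singleton, so the whole application shares a single FSTConfiguration, and at initialization it registers every class that needs to be serialized with that FSTConfiguration.
A unified serialization interface, FstSerialization, is exposed to callers and provides the serialize and deserialize operations.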

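FST Serialization and Deserialization
FST Serialization Storage Format

Almost every serialized object stored as bytes has a similar layout. Class files, so files and dex files all look alike in this respect. There is little innovation in the format itself; at most the field contents are compressed or otherwise optimized, and the UTF-8 encoding we use every day works the same way.

FST's serialized layout does nothing unusual compared with other byte-oriented formats either. Take the following FST-serialized byte file: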

00000000: 0001 0f63 6f6d 2e66 7374 2e46 5354 4265
00000010: 616e f701 fc05 7630 7374 7200

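Format:

Header | class name length | class name (String) | field 1 type (1 byte) | [length] | content | field 2 type (1 byte) | [length] | content | …

0000: type tag of the serialized data; 00 denotes OBJECT
0001: class name encoding; 00 denotes UTF, 01 denotes ASCII
0002: length of the class name (1 byte) = 15
0003~0011: class name string (15 bytes), here "com.fst.FSTBean"
0012: Integer type tag, 0xf7
0013: Integer value = 1
0014: String type tag, 0xfc
0015: String length = 5
0016~001a: String value "v0str"
001b~001c: END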
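The layout above can be checked by walking the dumped bytes by hand. The sketch below is not FST's decoder: the class FstDumpWalk and its methods are made up for illustration, it follows only the field list given here, and it assumes the String value is stored one byte per character.

```java
import java.nio.charset.StandardCharsets;

public class FstDumpWalk {

    // The 28 bytes of the hex dump, offsets 0x00 to 0x1b.
    public static final byte[] DUMP = hex(
            "00010f636f6d2e6673742e4653544265" +
            "616ef701fc05763073747200");

    // Turns a hex string into bytes (helper for this sketch only).
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Reads the fields in the order listed above and returns
    // { class name, Integer value, String value }.
    public static String[] walk(byte[] b) {
        int pos = 0;
        if (b[pos++] != 0x00) throw new IllegalStateException("expected OBJECT tag");
        if (b[pos++] != 0x01) throw new IllegalStateException("expected ASCII class name");
        int nameLen = b[pos++] & 0xff;
        String className = new String(b, pos, nameLen, StandardCharsets.US_ASCII);
        pos += nameLen;
        // 0xf7 is -9 when read as a signed byte
        if (b[pos++] != (byte) 0xf7) throw new IllegalStateException("expected Integer tag");
        int intValue = b[pos++];          // the value fits in a single byte here
        // 0xfc is -4 when read as a signed byte
        if (b[pos++] != (byte) 0xfc) throw new IllegalStateException("expected String tag");
        int strLen = b[pos++] & 0xff;
        String strValue = new String(b, pos, strLen, StandardCharsets.US_ASCII);
        return new String[] { className, Integer.toString(intValue), strValue };
    }

    public static void main(String[] args) {
        String[] r = walk(DUMP);
        System.out.println(r[0] + " { " + r[1] + ", \"" + r[2] + "\" }");
    }
}
```

The whole bean, class name included, fits in 28 bytes, and 15 of those are the class name itself.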

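As shown above, the Integer takes only one byte after serialization (its value is 1) rather than the 4 bytes it occupies in memory, so values are clearly compressed according to some rule. For the details, see how FSTObjectInput#instantiateSpecialTag reads each type. The tag values for the different types are defined as constants in FSTObjectOutput; the 0xf7 and 0xfc bytes in the dump are BIG_INT (-9) and STRING (-4) written as unsigned bytes: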

public class FSTObjectOutput implements ObjectOutput {

    private static final FSTLogger LOGGER = FSTLogger.getLogger(FSTObjectOutput.class);

    public static Object NULL_PLACEHOLDER = new Object() {
        public String toString() { return "NULL_PLACEHOLDER"; }
    };

    public static final byte SPECIAL_COMPATIBILITY_OBJECT_TAG = -19; // see issue 52
    public static final byte ONE_OF = -18;
    public static final byte BIG_BOOLEAN_FALSE = -17;
    public static final byte BIG_BOOLEAN_TRUE = -16;
    public static final byte BIG_LONG = -10;
    public static final byte BIG_INT = -9;
    public static final byte DIRECT_ARRAY_OBJECT = -8;
    public static final byte HANDLE = -7;
    public static final byte ENUM = -6;
    public static final byte ARRAY = -5;
    public static final byte STRING = -4;
    public static final byte TYPED = -3; // var class == object written class
    public static final byte DIRECT_OBJECT = -2;
    public static final byte NULL = -1;
    public static final byte OBJECT = 0;

    protected FSTEncoder codec;
    ...
}
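A one-byte Integer in the dump points to some form of variable-length integer encoding. The sketch below shows one such scheme to make the idea concrete. It is an assumption for illustration, not FST's actual encoder: the class CompactInt and the marker bytes -128 and -127 are chosen here, and FST's real rules live in its codec classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CompactInt {

    // Small values take 1 byte; larger ones take a marker byte plus 2 or 4 bytes.
    public static byte[] encode(int v) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        try {
            if (v > -127 && v <= 127) {
                out.writeByte(v);                 // the value itself, 1 byte
            } else if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
                out.writeByte(-128);              // marker: a short follows
                out.writeShort(v);
            } else {
                out.writeByte(-127);              // marker: a full int follows
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reverses encode(): the first byte is either the value or a marker.
    public static int decode(byte[] bytes) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        try {
            byte head = in.readByte();
            if (head == -128) return in.readShort();
            if (head == -127) return in.readInt();
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int v : new int[] { 1, 1000, 100000 }) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

Under this scheme the value 1 occupies a single byte, which matches the byte at offset 0013 in the dump, while rarely used large values pay one extra marker byte.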

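How FST Serialization and Deserialization Work
Serializing an Object to bytes amounts to persisting it. If the Bean definition has changed by the time the bytes are deserialized, the deserializer needs a compatibility strategy. In JDK serialization and deserialization, serialVersionUID plays an important role in version control. FST solves the problem by ordering fields according to the @Version annotation.
During deserialization, FST first obtains all members of the object's Class through reflection and sorts them. This ordering is the key to compatibility, and it is how @Version works. FSTClazzInfo defines a comparator, defFieldComparator, that sorts all the Fields of a Bean: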

public final class FSTClazzInfo {

    public static final Comparator<FSTFieldInfo> defFieldComparator = new Comparator<FSTFieldInfo>() {
        @Override
        public int compare(FSTFieldInfo o1, FSTFieldInfo o2) {
            int res = 0;

            // lower versions always come first
            if (o1.getVersion() != o2.getVersion()) {
                return o1.getVersion() < o2.getVersion() ? -1 : 1;
            }

            // order: version, boolean, primitives, conditionals, object references
            if (o1.getType() == boolean.class && o2.getType() != boolean.class) {
                return -1;
            }
            if (o1.getType() != boolean.class && o2.getType() == boolean.class) {
                return 1;
            }

            if (o1.isConditional() && !o2.isConditional()) {
                res = 1;
            } else if (!o1.isConditional() && o2.isConditional()) {
                res = -1;
            } else if (o1.isPrimitive() && !o2.isPrimitive()) {
                res = -1;
            } else if (!o1.isPrimitive() && o2.isPrimitive())
                res = 1;
            //if (res == 0) // 64 bit / 32 bit issues
            //    res = (int) (o1.getMemOffset() - o2.getMemOffset());

            // tie-breakers: type name, field name, declaring class
            if (res == 0)
                res = o1.getType().getSimpleName().compareTo(o2.getType().getSimpleName());
            if (res == 0)
                res = o1.getName().compareTo(o2.getName());
            if (res == 0) {
                return o1.getField().getDeclaringClass().getName().compareTo(o2.getField().getDeclaringClass().getName());
            }
            return res;
        }
    };
    ...
}

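The implementation compares the Field's version first and the Field's type second, so a Field with a higher version always sorts later. To see why the ordering matters, look at the FSTObjectInput#instantiateAndReadNoSer method: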

public class FSTObjectInput implements ObjectInput {

    protected Object instantiateAndReadNoSer(Class c, FSTClazzInfo clzSerInfo, FSTClazzInfo.FSTFieldInfo referencee, int readPos) throws Exception {
        Object newObj;
        newObj = clzSerInfo.newInstance(getCodec().isMapBased());
        ...
        } else {
            FSTClazzInfo.FSTFieldInfo[] fieldInfo = clzSerInfo.getFieldInfo();
            readObjectFields(referencee, clzSerInfo, fieldInfo, newObj, 0, 0);
        }
        return newObj;
    }

    protected void readObjectFields(FSTClazzInfo.FSTFieldInfo referencee, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo[] fieldInfo, Object newObj, int startIndex, int version) throws Exception {
        if ( getCodec().isMapBased() ) {
            readFieldsMapBased(referencee, serializationInfo, newObj);
            if ( version >= 0 && newObj instanceof Unknown == false )
                getCodec().readObjectEnd();
            return;
        }
        if ( version < 0 )
            version = 0;
        int booleanMask = 0;
        int boolcount = 8;
        final int length = fieldInfo.length;
        int conditional = 0;
        for (int i = startIndex; i < length; i++) { // note this loop
            try {
                FSTClazzInfo.FSTFieldInfo subInfo = fieldInfo[i];
                if (subInfo.getVersion() > version ) { // move on to the next version
                    int nextVersion = getCodec().readVersionTag(); // next version found in the object stream
                    if ( nextVersion == 0 ) // old object read
                    {
                        oldVersionRead(newObj);
                        return;
                    }
                    if ( nextVersion != subInfo.getVersion() ) { // a Field's version must not change, and version changes must stay in step with the stream
                        throw new RuntimeException("read version tag "+nextVersion+" fieldInfo has "+subInfo.getVersion());
                    }
                    readObjectFields(referencee, serializationInfo, fieldInfo, newObj, i, nextVersion); // recurse into the next version
                    return;
                }
                if (subInfo.isPrimitive()) {
                    ...
                } else {
                    if ( subInfo.isConditional() ) {
                        ...
                    }
                    // object: store the value read from the stream into the new object through FSTFieldInfo
                    Object subObject = readObjectWithHeader(subInfo);
                    subInfo.setObjectValue(newObj, subObject);
                }
                ...

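This code is enough to explain how FST keeps serialization and deserialization compatible. Note the loop: it iterates over the Fields in sorted order, and each FSTFieldInfo records details such as its position in the object stream and its type:
Serialization:

Sort all Fields of the Bean by version (static and transient members are excluded). A Field without a @Version annotation defaults to version=0. Fields with the same version are ordered as boolean, primitives, conditionals, object references.
Write the Bean's Fields to the output stream one by one in sorted order.
A @Version value may only increase, never decrease. If a new Field reuses an existing version, the default ordering rules can make the Field order in the stream differ from the order of the in-memory FSTFieldInfo[] array, so values end up injected into the wrong Fields.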
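The ordering rule can be tried out without FST on the classpath. The sketch below uses a made-up FieldDesc class as a stand-in for FSTFieldInfo and a simplified comparator that leaves out conditional fields and the declaring-class tie-breaker; it is meant only to show how version and type decide the write order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FieldOrderSketch {

    // Hypothetical stand-in for FSTFieldInfo: a name, a type and a @Version value.
    public static final class FieldDesc {
        final String name;
        final Class<?> type;
        final int version;

        public FieldDesc(String name, Class<?> type, int version) {
            this.name = name;
            this.type = type;
            this.version = version;
        }
    }

    // Simplified ordering: version, booleans, other primitives, references,
    // then type name and field name as tie-breakers.
    static final Comparator<FieldDesc> ORDER = (a, b) -> {
        if (a.version != b.version) return Integer.compare(a.version, b.version);
        boolean aBool = a.type == boolean.class;
        boolean bBool = b.type == boolean.class;
        if (aBool != bBool) return aBool ? -1 : 1;
        if (a.type.isPrimitive() != b.type.isPrimitive()) return a.type.isPrimitive() ? -1 : 1;
        int res = a.type.getSimpleName().compareTo(b.type.getSimpleName());
        return res != 0 ? res : a.name.compareTo(b.name);
    };

    // Fields in declaration order; two of them were added later with version 1.
    public static List<FieldDesc> sample() {
        return Arrays.asList(
                new FieldDesc("name", String.class, 0),
                new FieldDesc("age", int.class, 0),
                new FieldDesc("active", boolean.class, 0),
                new FieldDesc("email", String.class, 1),
                new FieldDesc("score", double.class, 1),
                new FieldDesc("id", long.class, 0));
    }

    // Returns the field names in the order they would be written to the stream.
    public static List<String> writeOrder(List<FieldDesc> fields) {
        List<FieldDesc> sorted = new ArrayList<>(fields);
        sorted.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (FieldDesc f : sorted) {
            names.add(f.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(writeOrder(sample()));
        // prints [active, age, id, name, score, email]
    }
}
```

All version 0 Fields come out first regardless of declaration order, so Fields added later with a higher version are always appended to the end of the stream.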

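Deserialization:

Deserialization parses the object stream according to its format; the Field order stored in the stream must match the order of the in-memory FSTFieldInfo array.
A Field of an existing version is present in the stream but missing from the in-memory Bean: may throw an exception (a backward-compatibility problem)
The stream contains higher-version Fields that the in-memory Bean does not have: works (old code can read newer streams)
A Field of an existing version is missing from the stream but present in the in-memory Bean: throws an exception
The same Field has different versions in the stream and in the in-memory Bean: throws an exception
The in-memory Bean gains a Field whose version is not higher than the current maximum: throws an exception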
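A small model of the version-tag protocol makes some of these cases concrete. Everything in this sketch is hypothetical: Field is a stand-in for FSTFieldInfo, all values are ints, and the stream is a plain queue. It assumes the writer emits a version tag whenever the version rises and a closing 0 tag, which is what the reading loop in readObjectFields implies.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VersionedReadSketch {

    // Hypothetical field descriptor: a name and its @Version value.
    public static final class Field {
        final String name;
        final int version;

        public Field(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    // Writes values in field order (fields must already be sorted by version),
    // with a tag before each new version and a closing 0 tag.
    public static Deque<Integer> write(List<Field> fields, Map<String, Integer> values) {
        Deque<Integer> stream = new ArrayDeque<>();
        int current = 0;
        for (Field f : fields) {
            if (f.version != current) {
                current = f.version;
                stream.add(current);
            }
            stream.add(values.get(f.name));
        }
        stream.add(0);
        return stream;
    }

    // Reads a stream into the fields of the in-memory class definition.
    public static Map<String, Integer> read(List<Field> fields, Deque<Integer> stream) {
        Map<String, Integer> bean = new LinkedHashMap<>();
        for (Field f : fields) {
            bean.put(f.name, 0);                  // default values
        }
        int current = 0;
        for (Field f : fields) {
            if (f.version > current) {
                int tag = stream.remove();
                if (tag == 0) {
                    return bean;                  // old stream: newer fields keep defaults
                }
                if (tag != f.version) {
                    throw new IllegalStateException(
                            "read version tag " + tag + " fieldInfo has " + f.version);
                }
                current = tag;
            }
            bean.put(f.name, stream.remove());
        }
        stream.remove();                          // consume the closing 0 tag
        return bean;
    }

    public static void main(String[] args) {
        List<Field> v1 = Arrays.asList(new Field("x", 0), new Field("y", 0));
        List<Field> v2 = Arrays.asList(new Field("x", 0), new Field("y", 0), new Field("added", 1));
        Map<String, Integer> values = new HashMap<>();
        values.put("x", 7);
        values.put("y", 9);
        // a stream written by the old class is read by the new class
        System.out.println(read(v2, write(v1, values)));
        // prints {x=7, y=9, added=0}
    }
}
```

Changing the version of "added" from 1 to 2 on the reading side makes read throw when it meets the tag 1, which corresponds to the case where the same Field has different versions in the stream and in the Bean.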

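The usage rule follows from this logic: every time you add a Field, annotate it with @Version and set the value to the current maximum version plus one. Deleting Fields is not allowed.
The Javadoc on the @Version annotation also states explicitly that it exists for backward compatibility: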

package org.nustaq.serialization.annotations;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD})
/**
 * support for adding fields without breaking compatibility to old streams.
 * For each release of your app increment the version value. No Version annotation means version=0.
 * Note that each added field needs to be annotated.
 *
 * e.g.
 *
 * class MyClass implements Serializable {
 *
 *     // fields on initial release 1.0
 *     int x;
 *     String y;
 *
 *     // fields added with release 1.5
 *     @Version(1) String added;
 *     @Version(1) String alsoAdded;
 *
 *     // fields added with release 2.0
 *     @Version(2) String addedv2;
 *     @Version(2) String alsoAddedv2;
 *
 * }
 *
 * If an old class is read, new fields will be set to default values. You can register a VersionConflictListener
 * at FSTObjectInput in order to fill in defaults for new fields.
 *
 * Notes/Limits:
 * - Removing fields will break backward compatibility. You can only Add new fields.
 * - Can slow down serialization over time (if many versions)
 * - does not work for Externalizable or Classes which make use of JDK-special features such as readObject/writeObject
 *   (AKA does not work if fst has to fall back to 'compatible mode' for an object).
 * - in case you use custom serializers, your custom serializer has to handle versioning
 *
 */
public @interface Version {
    byte value();
}
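To see how such an annotation can be picked up at runtime, the sketch below scans a class with reflection. It declares a local stand-in Version annotation so that it runs without the FST jar; VersionScan, MyClass and both helper methods are invented for this example and are not part of FST.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class VersionScan {

    // Local stand-in for org.nustaq.serialization.annotations.Version.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Version {
        byte value();
    }

    // Hypothetical bean that has gone through three releases.
    public static class MyClass implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        String y;
        @Version(1) String added;
        @Version(2) String addedv2;
        transient int cache;   // transient: not serialized, so not versioned
    }

    // Maps each serializable field to its version; no annotation means version 0.
    public static Map<String, Integer> versions(Class<?> c) {
        Map<String, Integer> out = new TreeMap<>();
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.isSynthetic()) {
                continue;
            }
            Version v = f.getAnnotation(Version.class);
            out.put(f.getName(), v == null ? 0 : (int) v.value());
        }
        return out;
    }

    // The rule from the text: a newly added field gets the current maximum plus one.
    public static int nextVersion(Class<?> c) {
        int max = 0;
        for (int v : versions(c).values()) {
            max = Math.max(max, v);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        System.out.println(versions(MyClass.class));
        System.out.println("next field should use @Version(" + nextVersion(MyClass.class) + ")");
    }
}
```

A check like nextVersion could run in a unit test to catch a new Field that was added without a version bump.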

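public class FSTBean implements Serializable {
    /** serialVersionUID */
    private static final long serialVersionUID = -2708653783151699375L;

    private Integer v0int;
    private String v0str;

    // setters used by the test below
    public void setV0int(Integer v0int) { this.v0int = v0int; }
    public void setV0str(String v0str) { this.v0str = v0str; }
}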

Prepare the serialization and deserialization methods

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class FSTSerial {

    // FstSerializer is the wrapper class used by this test; its definition is not shown in this article.
    private static void serialize(FstSerializer fst, String fileName) {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            FSTBean fstBean = new FSTBean();
            fstBean.setV0int(1);
            fstBean.setV0str("v0str");
            byte[] v1 = fst.serialize(fstBean);
            fos.write(v1, 0, v1.length);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void deserialize(FstSerializer fst, String fileName) {
        try (FileInputStream fis = new FileInputStream(fileName)) {
            // read the whole file into memory before deserializing
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int length;
            while ((length = fis.read(buf)) > 0) {
                baos.write(buf, 0, length);
            }
            FSTBean deserial = fst.deserialize(baos.toByteArray(), FSTBean.class);
            System.out.println(deserial);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        FstSerializer fst = new FstSerializer();
        serialize(fst, "byte.bin");
        deserialize(fst, "byte.bin");
    }
}