Storage|Storage Format

Overview (0.9.0)
Data in Druid is stored in a custom column format known as a segment. Segments are composed of different types of columns. Column.java and the classes that extend it are a good place to start when looking into the storage format.
Basic classes
ValueType
An enum with four values:

  1. Float
  2. Long
  3. String
  4. Complex
IndexedInts
It has three main methods:
    int size();
    int get(int index);
    void fill(int index, int[] toFill);

The main implementations are:
  1. EmptyIndexedInts
  2. IntBufferIndexedInts
  3. ListBasedIndexedInts
  4. VSizeIndexedInts
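size() returns the number of elements still available to read or write in the underlying Buffer;
get(index) reads the element at position index in the Buffer;
fill(index, toFill) is a bulk read into the toFill array; none of the implementations currently support it.
Of these, ListBasedIndexedInts is backed by a List.
As the list shows, some of the implementations use Java NIO to work with native memory.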
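To make the buffer-backed variant concrete, here is a minimal sketch of an IntBuffer-based implementation. It is not Druid's code: SimpleIndexedInts, BufferBacked and fromValues are hypothetical names, and the interface mirrors only size() and get(), since fill() is unsupported anyway.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

final class IndexedIntsSketch {
    // Hypothetical stand-in for IndexedInts; only the two supported methods are mirrored.
    interface SimpleIndexedInts {
        int size();
        int get(int index);
    }

    // Reads ints through an IntBuffer view instead of copying them into an int[].
    static final class BufferBacked implements SimpleIndexedInts {
        private final IntBuffer buffer;

        BufferBacked(IntBuffer buffer) {
            this.buffer = buffer;
        }

        @Override
        public int size() {
            // Number of elements between the buffer's position and its limit.
            return buffer.remaining();
        }

        @Override
        public int get(int index) {
            // Absolute read relative to the current position; the position is not moved.
            return buffer.get(buffer.position() + index);
        }
    }

    static SimpleIndexedInts fromValues(int... values) {
        // allocateDirect keeps the bytes outside the Java heap (native memory).
        ByteBuffer bytes = ByteBuffer.allocateDirect(values.length * Integer.BYTES);
        IntBuffer ints = bytes.asIntBuffer();
        ints.put(values);
        ints.flip(); // switch from writing to reading
        return new BufferBacked(ints);
    }

    public static void main(String[] args) {
        SimpleIndexedInts ints = fromValues(7, 11, 13);
        for (int i = 0; i < ints.size(); i++) {
            System.out.println(i + " -> " + ints.get(i));
        }
    }
}
```

A List-backed variant would expose the same two methods over a List of Integer, trading the off-heap storage for simplicity.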
ColumnCapabilities
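Fields:
    private ValueType type = null;
    private boolean dictionaryEncoded = false;  // whether the column is dictionary-encoded
    private boolean runLengthEncoded = false;   // whether the column is run-length encoded; not actually implemented, can be ignored
    private boolean hasInvertedIndexes = false; // whether the column has inverted indexes
    private boolean hasSpatialIndexes = false;  // whether the column has spatial indexes
    private boolean hasMultipleValues = false;  // whether the column can hold multiple values per row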

DictionaryEncodedColumn
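Main methods:
    public int length();                             // total length of the dictionary-encoded column
    public boolean hasMultipleValues();              // whether rows can hold multiple values
    public int getSingleValueRow(int rowNum);        // get the single value (dictionary id) of a row
    public IndexedInts getMultiValueRow(int rowNum); // get the multiple values of a row
    public String lookupName(int id);                // look up the value for a dictionary id; note that null and empty are both converted to null
    public int lookupId(String name);                // look up the dictionary id for a value
    public int getCardinality();                     // cardinality, i.e. the size of the dictionary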
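The idea behind these methods can be illustrated with a small single-value sketch. This is an assumption-laden toy, not SimpleDictionaryEncodedColumn: the class name is hypothetical, the dictionary is a sorted String array so that the array index serves as the id, and null handling and multi-value rows are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class DictionaryColumnSketch {
    private final String[] dictionary; // sorted distinct values; the array index is the id
    private final int[] rows;          // one dictionary id per row

    DictionaryColumnSketch(List<String> values) {
        // TreeSet sorts and de-duplicates; it rejects null, which this sketch does not handle.
        dictionary = new TreeSet<>(values).toArray(new String[0]);
        rows = new int[values.size()];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = Arrays.binarySearch(dictionary, values.get(i));
        }
    }

    int length() {
        return rows.length;
    }

    int getSingleValueRow(int rowNum) {
        return rows[rowNum];
    }

    String lookupName(int id) {
        return dictionary[id];
    }

    int lookupId(String name) {
        // Binary search works because the dictionary is sorted; negative when absent.
        return Arrays.binarySearch(dictionary, name);
    }

    int getCardinality() {
        return dictionary.length;
    }

    public static void main(String[] args) {
        DictionaryColumnSketch column =
            new DictionaryColumnSketch(Arrays.asList("b", "a", "b", "c"));
        for (int row = 0; row < column.length(); row++) {
            int id = column.getSingleValueRow(row);
            System.out.println("row " + row + ": id=" + id + " value=" + column.lookupName(id));
        }
    }
}
```

Storing small integer ids per row and the strings only once is what makes the row data compact; the sorted dictionary is what makes the reverse lookup cheap.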

The only implementation is SimpleDictionaryEncodedColumn, which has three fields:
    private final IndexedInts column;
    private final IndexedMultivalue<IndexedInts> multiValueColumn;
    private final CachingIndexed<String> cachedLookups;

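The interesting one is cachedLookups, which stores the dictionary.
CachingIndexed
The concrete class that holds the dictionary. It implements the Indexed interface, whose other main implementations are: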
  1. GenericIndexed
  2. ArrayIndexed
  3. BufferIndexed
  4. ListIndexed
  5. VSizeIndexed
CachingIndexed wraps a given GenericIndexed and uses an LRU map (SizedLRUMap) to store cachedValues.
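The wrapping-plus-LRU pattern can be sketched as follows. This is not the CachingIndexed source: the class name is hypothetical, an IntFunction stands in for the wrapped GenericIndexed, and the cache is bounded by entry count using LinkedHashMap in access order, which is an assumption and may differ from how SizedLRUMap bounds its size.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class CachingLookupSketch {
    private final IntFunction<String> delegate;      // stands in for the wrapped lookup
    private final Map<Integer, String> cachedValues; // least recently used entry is evicted first
    private int delegateCalls = 0;                   // counts cache misses, for illustration only

    CachingLookupSketch(IntFunction<String> delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true makes iteration order follow recency of access.
        this.cachedValues = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    String get(int index) {
        String cached = cachedValues.get(index);
        if (cached != null) {
            return cached;
        }
        delegateCalls++;
        String value = delegate.apply(index);
        if (value != null) {
            cachedValues.put(index, value); // null results are not cached in this sketch
        }
        return value;
    }

    int delegateCalls() {
        return delegateCalls;
    }

    public static void main(String[] args) {
        String[] dictionary = {"a", "b", "c"};
        CachingLookupSketch cache = new CachingLookupSketch(i -> dictionary[i], 2);
        cache.get(0);
        cache.get(0); // served from the cache
        System.out.println("delegate calls: " + cache.delegateCalls());
    }
}
```

The cache pays off when decoding a value from the underlying buffer is more expensive than a map lookup, as is the case when each lookup has to deserialize a String.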
GenericIndexed
A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
V1 Storage Format:
  • byte 1: version (0x1)
  • byte 2 == 0x1 => allowReverseLookup
  • bytes 3-6 => numBytesUsed
  • bytes 7-10 => numElements
  • bytes 10-((numElements * 4) + 10): integers representing 'end' offsets of byte serialized values
  • bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
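The layout above can be exercised with a small round-trip sketch. Several details are assumptions rather than facts from the listing: values are UTF-8 strings, integers are big-endian (the ByteBuffer default), numBytesUsed counts everything after that field, and each 'end' offset is measured from the start of the value area and includes the 4-byte length prefixes. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class GenericIndexedV1Sketch {
    static ByteBuffer write(String[] values, boolean allowReverseLookup) {
        byte[][] encoded = new byte[values.length][];
        int valueBytes = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            valueBytes += Integer.BYTES + encoded[i].length; // length prefix + payload
        }
        // Assumption: numBytesUsed covers numElements, the offset table and the values.
        int numBytesUsed = Integer.BYTES + values.length * Integer.BYTES + valueBytes;
        ByteBuffer out = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        out.put((byte) 0x1);                              // version
        out.put((byte) (allowReverseLookup ? 0x1 : 0x0)); // allowReverseLookup flag
        out.putInt(numBytesUsed);
        out.putInt(values.length);                        // numElements
        int end = 0;
        for (byte[] e : encoded) {                        // table of 'end' offsets
            end += Integer.BYTES + e.length;
            out.putInt(end);
        }
        for (byte[] e : encoded) {                        // length-prefixed values
            out.putInt(e.length);
            out.put(e);
        }
        out.flip();
        return out;
    }

    static String get(ByteBuffer buf, int index) {
        int numElements = buf.getInt(6); // "bytes 7-10" in the 1-based listing
        int offsetsStart = 10;
        int valuesStart = offsetsStart + numElements * Integer.BYTES;
        // A value starts where the previous one ends; the first starts at 0.
        int start = index == 0 ? 0 : buf.getInt(offsetsStart + (index - 1) * Integer.BYTES);
        int length = buf.getInt(valuesStart + start);
        byte[] bytes = new byte[length];
        ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
        view.position(valuesStart + start + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = write(new String[] {"apple", "kiwi", "pear"}, true);
        for (int i = 0; i < 3; i++) {
            System.out.println(i + " -> " + get(buf, i));
        }
    }
}
```

Because the offset table has fixed-width entries, any element can be located with two integer reads, which is what allows array-like access and, for sorted input, binary search without decoding the whole buffer.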
Fields:
    private final ByteBuffer theBuffer;        // the underlying ByteBuffer storage
    private final ObjectStrategy<T> strategy;
    private final boolean allowReverseLookup;
    private final int size;                    // the int read from theBuffer at its current position
    private final int valuesOffset;
    private final BufferIndexed bufferIndexed; // inner class BufferIndexed

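Column class
An interface; see the implementing class for details.
SimpleColumn class
Fields:
    private final ColumnCapabilities capabilities;
    private final Supplier<DictionaryEncodedColumn> dictionaryEncodedColumn;
    private final Supplier<RunLengthColumn> runLengthColumn;
    private final Supplier<GenericColumn> genericColumn;
    private final Supplier<ComplexColumn> complexColumn;
    private final Supplier<BitmapIndex> bitmapIndex;
    private final Supplier<SpatialIndex> spatialIndex;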
