InfluxDB -- Retention Policy Explained

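A Retention Policy (RP) defines how long data is kept: once data is older than a given age, it is deleted automatically.
Combining CQs (Continuous Queries) with RPs lets you keep historical data at low resolution and recent data at high resolution, which reduces storage usage.
RP syntax: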

CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]

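Where:
  • DURATION sets how long data is retained; data older than this is deleted automatically;
  • REPLICATION sets the number of copies of each shard; it is 1 on a single node, while a cluster needs >= 2 for redundancy;
  • SHARD DURATION sets the time range covered by each shardGroup; it is optional, and if omitted the system derives a value from DURATION;
  • DEFAULT makes this RP the database's default RP: writes and queries that do not name an RP use it;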
InfluxDB ships with a built-in default policy, autogen:
  • duration=0s means the data never expires;
  • shardGroupDuration=168h means each shardGroup holds 7 days of data;
```
> show retention policies;
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
```

shardGroup vs shard:
  • a shardGroup contains one or more shards;
  • a shardGroup covers one time range, and the data in every shard under it falls within that range;

Here are the shard groups of a single-node InfluxDB with rp=autogen and replica=1:
```
> show shard groups;
name: shard groups
id database retention_policy start_time           end_time             expiry_time
-- -------- ---------------- ----------           --------             -----------
25 falcon   autogen          2020-04-27T00:00:00Z 2020-05-04T00:00:00Z 2020-05-04T00:00:00Z
33 falcon   autogen          2020-05-04T00:00:00Z 2020-05-11T00:00:00Z 2020-05-11T00:00:00Z
42 falcon   autogen          2020-05-11T00:00:00Z 2020-05-18T00:00:00Z 2020-05-18T00:00:00Z
51 falcon   autogen          2020-05-18T00:00:00Z 2020-05-25T00:00:00Z 2020-05-25T00:00:00Z
60 falcon   autogen          2020-05-25T00:00:00Z 2020-06-01T00:00:00Z 2020-06-01T00:00:00Z
69 falcon   autogen          2020-06-01T00:00:00Z 2020-06-08T00:00:00Z 2020-06-08T00:00:00Z
78 falcon   autogen          2020-06-08T00:00:00Z 2020-06-15T00:00:00Z 2020-06-15T00:00:00Z
```

Each shardGroup holds 7 days of data and contains 1 shard:
```
> show shards;
name: falcon
id database retention_policy shard_group start_time           end_time             expiry_time          owners
-- -------- ---------------- ----------- ----------           --------             -----------          ------
25 falcon   autogen          25          2020-04-27T00:00:00Z 2020-05-04T00:00:00Z 2020-05-04T00:00:00Z
33 falcon   autogen          33          2020-05-04T00:00:00Z 2020-05-11T00:00:00Z 2020-05-11T00:00:00Z
42 falcon   autogen          42          2020-05-11T00:00:00Z 2020-05-18T00:00:00Z 2020-05-18T00:00:00Z
51 falcon   autogen          51          2020-05-18T00:00:00Z 2020-05-25T00:00:00Z 2020-05-25T00:00:00Z
60 falcon   autogen          60          2020-05-25T00:00:00Z 2020-06-01T00:00:00Z 2020-06-01T00:00:00Z
69 falcon   autogen          69          2020-06-01T00:00:00Z 2020-06-08T00:00:00Z 2020-06-08T00:00:00Z
78 falcon   autogen          78          2020-06-08T00:00:00Z 2020-06-15T00:00:00Z 2020-06-15T00:00:00Z
```

1. How many shards does a shardGroup contain?
```go
// replicaN is the replica count given when the RP was created
shardN := len(data.DataNodes) / replicaN
```

For example, with 3 data nodes and 2 replicas per shard, each shardGroup holds only 3/2 = 1 shard (integer division);
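The formula above as a self-contained Go sketch. Clamping replicaN to the range [1, number of data nodes] before dividing is an assumption based on my reading of the InfluxDB 1.x meta code; shardCount is a hypothetical name.

```go
package main

import "fmt"

// shardCount returns how many shards each new shardGroup gets.
// Assumption: replicaN is clamped to [1, dataNodes] first.
func shardCount(dataNodes, replicaN int) int {
	if dataNodes < 1 {
		return 0 // no data nodes, nothing to place
	}
	if replicaN < 1 {
		replicaN = 1
	} else if replicaN > dataNodes {
		replicaN = dataNodes
	}
	return dataNodes / replicaN // integer division: 3/2 == 1
}

func main() {
	fmt.Println(shardCount(1, 1)) // single node, autogen
	fmt.Println(shardCount(3, 2)) // the example above
	fmt.Println(shardCount(4, 2))
}
```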
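2. When a point is written, its timestamp first decides which shardGroup it goes into. How is the shard within that group chosen?
The series key of the point is hashed, and hash % shardN decides which shard it goes into: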
```go
// HashID returns a non-cryptographic checksum of the point's key.
func (p *point) HashID() uint64 {
	h := NewInlineFNV64a()
	h.Write(p.key) // p.key is the series key: measurement + tags
	sum := h.Sum64()
	return sum
}

func (sgi *ShardGroupInfo) ShardFor(hash uint64) ShardInfo {
	return sgi.Shards[hash%uint64(len(sgi.Shards))]
}
```
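The same routing can be reproduced with Go's standard library. Assumption: hash/fnv's New64a produces the same FNV-1a 64-bit value as InfluxDB's NewInlineFNV64a (both implement FNV-1a), so this sketch should pick the same shard index; shardIndex and the series key in main are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex picks the shard inside a shardGroup for one series key:
// FNV-1a 64-bit hash of the key, modulo the number of shards.
func shardIndex(seriesKey []byte, shardN int) int {
	h := fnv.New64a()
	h.Write(seriesKey) // hash.Hash's Write never returns an error
	return int(h.Sum64() % uint64(shardN))
}

func main() {
	// Hypothetical series key: measurement plus sorted tags.
	key := []byte("cpu,host=server01,region=us-west")
	fmt.Println(shardIndex(key, 2))
}
```

Since only the series key is hashed, every point of one series lands in the same shard of a given shardGroup; the timestamp only selects the group.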

Each shard corresponds to a directory on the OS, named after its shardId, an increasing integer:
```
/var/lib/influxdb/data/mydatabase/six_month # ls -alh
total 0
drwx------ 6 root root 42 Sep  6 08:00 .
drwx------ 4 root root 38 Sep  3 14:59 ..
drwxr-xr-x 3 root root 68 Sep  6 08:41 1
drwxr-xr-x 3 root root 68 Sep  3 16:48 3
drwxr-xr-x 3 root root 68 Sep  3 17:02 4
drwxr-xr-x 3 root root 68 Sep 15 09:59 6
```

A shard's directory is datapath/database/retentionPolicy/shardId, as built in CreateShard:
```go
func (s *Store) CreateShard(database, retentionPolicy string, shardID uint64, enabled bool) error {
	// ......
	path := filepath.Join(s.path, database, retentionPolicy, strconv.FormatUint(shardID, 10))
	shard := NewShard(shardID, path, walPath, sfile, opt)
	// ......
}
```
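A minimal Go sketch of the same path construction, handy for locating a shard's files from its ID. shardPath is hypothetical, and the data directory is an assumption (the dir setting under [data] in influxdb.conf, /var/lib/influxdb/data on a typical Linux install).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// shardPath rebuilds datapath/database/retentionPolicy/shardId.
func shardPath(dataDir, database, retentionPolicy string, shardID uint64) string {
	return filepath.Join(dataDir, database, retentionPolicy, strconv.FormatUint(shardID, 10))
}

func main() {
	fmt.Println(shardPath("/var/lib/influxdb/data", "mydatabase", "six_month", 6))
}
```

On Linux this prints /var/lib/influxdb/data/mydatabase/six_month/6, matching the directory listing above.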

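3. How the default shardGroupDuration is computed:
If SHARD DURATION is not specified, InfluxDB derives the shardGroup duration from the RP's DURATION (6 months here means 180 days):
RP DURATION                    shardGroup duration
< 2 days                       1h
>= 2 days and < 6 months       1 day
>= 6 months, or 0 (infinite)   7 days
The implementation: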
```go
// shardGroupDuration returns the duration for a shard group based on a policy duration.
func shardGroupDuration(d time.Duration) time.Duration {
	if d >= 180*24*time.Hour || d == 0 { // 6 months or 0
		return 7 * 24 * time.Hour
	} else if d >= 2*24*time.Hour { // 2 days
		return 1 * 24 * time.Hour
	}
	return 1 * time.Hour
}
```
