InfluxDB memory optimization for a Prometheus monitoring system

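1. Background
While looking at the k8s monitoring data in Grafana one day, I noticed that a month of data had gone missing. After some investigation, it turned out that the InfluxDB series count had reached its upper limit, so new data could no longer be written to the database.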

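After the series limit was set to unlimited, a new problem appeared: InfluxDB immediately consumed all available memory and triggered an OOM, with InfluxDB's memory usage reaching 99%.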

2. InfluxDB database optimization
After InfluxDB has been storing Prometheus data for a while, clear performance problems appear:

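The series count exceeds the limit, so points cannot be written, which leads to missing data
The database consumes a great deal of memory and frequently hangs, which affects how the data is displayed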

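2.1. InfluxDB series limit optimization
max-series-per-database is the maximum number of series allowed per database. The default is one million. Set it to 0 to allow an unlimited number of series per database.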

max-series-per-database = 1000000

Check the current series cardinality of the database:

[root@prometheus-10-90 ~]# influx
Connected to http://localhost:8086 version 1.5.2
InfluxDB shell version: 1.5.2
> use prometheus
Using database prometheus
> SHOW SERIES EXACT CARDINALITY
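To notice an approaching limit before writes start failing, the cardinality reported by the session above can be compared with the configured limit. The following is a minimal sketch; `series_usage_pct` is a hypothetical helper, not part of InfluxDB, and it takes both numbers as plain arguments because the exact output layout of the `influx` shell may differ between versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how much of max-series-per-database is in use.
# $1 = current series cardinality, $2 = configured limit (0 means unlimited).
series_usage_pct() {
  local count="$1" limit="$2"
  if [ "$limit" -eq 0 ]; then
    echo "unlimited"
    return 0
  fi
  # Integer percentage, rounded down.
  echo $(( count * 100 / limit ))
}
```

For example, `series_usage_pct 950000 1000000` prints 95, which would suggest the default limit is about to be hit.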

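If a point causes the number of series in a database to exceed max-series-per-database, InfluxDB does not write the point; it returns a 500 with the following error: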

err="\\"error\\":\\"partial write: max-series-per-database limit exceeded: (1000000) dropped=36\\"\\n"

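max-values-per-tag is the maximum number of tag values allowed per tag key. The default is 100000. Set it to 0 to allow an unlimited number of tag values. If a tag value causes the number of tag values for a tag key to exceed max-values-per-tag, InfluxDB does not write the point and returns a partial write error.
Any existing tag keys whose tag values already exceed max-values-per-tag continue to accept writes, but writes that create a new tag value fail.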

max-values-per-tag = 100000

Both of the parameters above need to be set for this InfluxDB setup.

1. Set the tuning parameters
[root@prometheus-10-90 ~]# vim /etc/influxdb/influxdb.conf
max-series-per-database = 0
max-values-per-tag = 0

2. Restart InfluxDB
[root@prometheus-10-90 ~]# systemctl restart influxdb
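The same change can be made without an interactive editor, which helps when several hosts need it. This is only a sketch: `set_influx_option` is a hypothetical helper, it relies on GNU sed, and it assumes the key is already present in the file (possibly commented out), as it is in the stock influxdb.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in an InfluxDB config file,
# uncommenting the line if it was commented out. Requires GNU sed (-i, -E).
# Assumption: the key already appears in the file; nothing is appended.
# $1 = config file, $2 = key, $3 = value
set_influx_option() {
  local file="$1" key="$2" value="$3"
  # Keep the original indentation (\1), drop an optional leading "#".
  sed -i -E "s|^([[:space:]]*)#?[[:space:]]*${key}[[:space:]]*=.*|\1${key} = ${value}|" "$file"
}

# Usage (followed by the restart shown in the steps above):
#   set_influx_option /etc/influxdb/influxdb.conf max-series-per-database 0
#   set_influx_option /etc/influxdb/influxdb.conf max-values-per-tag 0
```

Backing up the file before running this is advisable, since `sed -i` rewrites it in place.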

2.2. Database memory optimization
1) Switch the index from the default in-memory index (inmem) to tsi1, so that the index is stored in files on disk

[root@prometheus-10-90 ~]# vim /etc/influxdb/influxdb.conf
index-version = "tsi1"

2) After tuning the series limits and memory, also add the tuning parameters to the systemd unit file

[root@prometheus-10-90 ~]# cat /usr/lib/systemd/system/influxdb.service
# If you modify this, please also make sure to edit init.sh

[Unit]
Description=InfluxDB is an open-source, distributed, time series database
Documentation=https://docs.influxdata.com/influxdb/
After=network-online.target

[Service]
User=influxdb
Group=influxdb
LimitNOFILE=65536
EnvironmentFile=-/etc/default/influxdb
ExecStart=/usr/bin/influxd -config /etc/influxdb/influxdb.conf $INFLUXD_OPTS
KillMode=control-group
Restart=on-failure
######### The following lines are required ########
# systemd ignores bare KEY=value lines in [Service], so each variable is set with Environment=
Environment="GOMAXPROCS=2"
Environment="INFLUXDB_DATA_MAX_CONCURRENT_COMPACTIONS=1"
Environment="GODEBUG=madvdontneed=1"
Environment="GOGC=10"
Environment="INFLUXDB_DATA_INDEX_VERSION=tsi1"

[Install]
WantedBy=multi-user.target
Alias=influxd.service
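A unit file under /usr/lib/systemd/system can be replaced when the package is upgraded, so one alternative is to keep the same variables in a systemd drop-in, which systemd merges into the unit. The sketch below assumes a hypothetical helper name (`write_influxdb_dropin`) and the file name override.conf; the variable values are the ones listed in the unit file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the tuning variables as a systemd drop-in.
# $1 = drop-in directory, e.g. /etc/systemd/system/influxdb.service.d
write_influxdb_dropin() {
  local dir="$1"
  mkdir -p "$dir"
  # systemd passes variables to the service through Environment= entries.
  cat > "$dir/override.conf" <<'EOF'
[Service]
Environment="GOMAXPROCS=2"
Environment="INFLUXDB_DATA_MAX_CONCURRENT_COMPACTIONS=1"
Environment="GODEBUG=madvdontneed=1"
Environment="GOGC=10"
Environment="INFLUXDB_DATA_INDEX_VERSION=tsi1"
EOF
}

# Usage (as root):
#   write_influxdb_dropin /etc/systemd/system/influxdb.service.d
#   systemctl daemon-reload && systemctl restart influxdb
```

`systemctl daemon-reload` is needed so that systemd picks up the new drop-in before the restart.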

3) After changing the index type, the TSI index must be rebuilt
See the official configuration documentation

1. Stop InfluxDB
[root@prometheus-10-90 ~]# systemctl stop influxdb

2. Delete all _series directories
# Delete all _series directories. By default, _series directories are stored at /data/<dbName>/_series
[root@prometheus-10-90 ~]# cd /data/influxdb/data/
[root@prometheus-10-90 data]# find . -name "_series"
./_internal/_series
./prometheus/_series
./k8s_prometheus/_series
[root@prometheus-10-90 data]# find . -name "_series" | xargs rm -rf

3. Delete all index directories
# Every shard directory under each database contains an index directory
# Delete all index directories. By default, index directories are stored at /data/<dbName>/<rpName>/<shardID>/index (database / retention policy / shard ID).
[root@prometheus-10-90 data]# find . -name "index"
······
./prometheus/autogen/2/index
./prometheus/autogen/5/index
./prometheus/autogen/14/index
./prometheus/autogen/23/index
./prometheus/autogen/32/index
./prometheus/autogen/41/index
./prometheus/autogen/50/index
./prometheus/autogen/59/index
./prometheus/autogen/68/index
./prometheus/autogen/77/index
./prometheus/_series/00/index
./prometheus/_series/01/index
./prometheus/_series/02/index
./prometheus/_series/03/index
./prometheus/_series/04/index
./prometheus/_series/05/index
./prometheus/_series/06/index
./prometheus/_series/07/index
·······
[root@prometheus-10-90 data]# find . -name "index" | xargs rm -rf

4. Rebuild the TSI index
# Must be run as the influxdb user
[root@prometheus-10-90 data]# sudo -H -u influxdb bash -c "influx_inspect buildtsi -datadir /data/influxdb/data/ -waldir /data/influxdb/wal/"
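Because the steps above pipe `find` straight into `rm -rf`, it can be worth listing the targets first. The sketch below is a dry run only and deletes nothing; `list_tsi_targets` is a hypothetical helper, and it assumes the directory layout described in the comments above (`<db>/_series` and `<db>/<rp>/<shardID>/index`) and a `find` that supports `-mindepth` and `-maxdepth`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list what the rebuild steps would delete (dry run).
# $1 = InfluxDB data directory, e.g. /data/influxdb/data
list_tsi_targets() {
  local datadir="$1"
  # _series directories sit directly under each database directory.
  find "$datadir" -mindepth 2 -maxdepth 2 -type d -name "_series" | sort
  # Shard index directories: <db>/<rp>/<shardID>/index.
  # -type d skips entries named "index" that are plain files, which is
  # assumed to be the case for the ones inside _series/NN/.
  find "$datadir" -mindepth 4 -maxdepth 4 -type d -name "index" | sort
}

# Usage:
#   list_tsi_targets /data/influxdb/data
```

Restricting the match by depth and type keeps the list to the two kinds of directories the procedure is meant to remove, so an unrelated file or directory that happens to be named "index" elsewhere in the tree is not picked up.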

2.3. Optimized InfluxDB configuration file
See the official configuration documentation

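[root@prometheus-10-90 ~]# vim /etc/influxdb/influxdb.conf
reporting-disabled = true  # Disable usage reporting

[meta]
  dir = "/data/influxdb/meta"  # Directory for meta data

[data]
  dir = "/data/influxdb/data"  # Directory for the TSM files that hold the final data
  wal-dir = "/data/influxdb/wal"  # Directory for write-ahead log (WAL) files
  index-version = "tsi1"  # Set to tsi1 to use the disk-based Time Series Index (TSI); the default is inmem, an in-memory shard index
  trace-logging-enabled = false  # Whether to enable trace logging; default: false
  cache-max-memory-size = "1g"  # Maximum size of a shard's cache; writes are rejected once it is exceeded
  cache-snapshot-memory-size = "25m"  # Snapshot size; above this the cache is flushed to a TSM file; default: 25MB
  compact-full-write-cold-duration = "24h"  # How long a shard must go without writes or deletes before the TSM engine compacts all of its TSM files
  max-index-log-file-size = "1g"  # Threshold at which an index write-ahead log (WAL) file is compacted into an index file
  max-series-per-database = 0  # Limit on the number of series per database; 0 removes the limit; default: 1000000
  series-id-set-cache-size = 100  # Size of the internal TSI cache for previously computed series results; 0 disables the cache
  max-values-per-tag = 0  # Maximum number of values per tag; 0 removes the limit; default: 100000

[coordinator]
  write-timeout = "10s"  # Write timeout
  max-concurrent-queries = 0  # Maximum number of concurrent queries
  query-timeout = "60s"  # Query timeout
  log-queries-after = "0s"  # Slow-query threshold
  max-select-point = 0  # Maximum number of points a SELECT statement can process
  max-select-series = 0  # Maximum number of series a SELECT statement can process
  max-select-buckets = 0  # Maximum number of "GROUP BY time()" buckets a SELECT statement can process

[retention]

[shard-precreation]
  advance-period = "30m"  # Maximum time in advance that shards are pre-created; default: 30m

[monitor]
  store-enabled = false  # Disable the monitoring module

[http]
  log-enabled = true  # Whether to log HTTP requests; default: true
  max-row-limit = 10000  # Maximum number of rows returned by a non-chunked query; the default (0) allows an unlimited number of rows

[ifql]

[logging]

[subscriber]

[[graphite]]

[[collectd]]

[[opentsdb]]

[[udp]]

[continuous_queries]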
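Before restarting with a configuration like the one above, it can help to confirm that the key settings are really active and not still commented out. The following is a rough sketch; `conf_value` is a hypothetical helper that returns the first uncommented match for a key and does not take the section into account, so it is only suitable for keys that appear once in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first uncommented "key = value"
# line in an InfluxDB config file. Sections are ignored (an assumption that
# only holds for keys that are unique across the file).
# $1 = config file, $2 = key
conf_value() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                      # skip commented-out lines
    index($0, "=") == 0 { next }             # skip section headers and blanks
    {
      k = $0; sub(/=.*/, "", k); gsub(/[ \t]/, "", k)
      if (k != key) next
      v = $0; sub(/^[^=]*=[ \t]*/, "", v)    # drop "key = "
      sub(/[ \t]*#.*$/, "", v)               # drop a trailing comment
      gsub(/"/, "", v)                       # drop surrounding quotes
      print v; exit
    }' "$1"
}

# Usage:
#   [ "$(conf_value /etc/influxdb/influxdb.conf index-version)" = "tsi1" ] \
#     || echo "index-version is not tsi1"
```

This only inspects the file; environment variables such as INFLUXDB_DATA_INDEX_VERSION can still override what it reports.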

2.4. Memory usage after all optimizations
Only about 30%

[root@prometheus-10-90 data]# ps aux | grep influxdb
influxdb 16721344 30.7 68972828 7579240 ? Ssl 15:02 4:32 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
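The memory share reported by `ps` can also be recomputed from the resident set size, which is handy for a periodic check. This is a small sketch with a hypothetical helper (`mem_percent`); the usage lines assume a Linux host with procps `ps`, /proc/meminfo, and a single influxd process.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print RSS as a percentage of total memory.
# $1 = resident set size in KiB, $2 = total memory in KiB
mem_percent() {
  awk -v rss="$1" -v total="$2" 'BEGIN { printf "%.1f\n", rss * 100 / total }'
}

# Usage (assumes exactly one influxd process):
#   rss_kb="$(ps -o rss= -C influxd)"
#   total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
#   mem_percent "$rss_kb" "$total_kb"
```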