Building a Traefik Monitoring System

Background

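  • The earlier article "Learning Traefik" covered how to use Traefik. A gateway is not really complete, though, without a dashboard that visualizes its traffic status and access log. This article shows how to build a monitoring center for Traefik with Prometheus + Grafana + Promtail + Loki.
  • Prometheus is the de facto standard for monitoring system and service state in the cloud-native era. It collects time-series data by pulling over HTTP, finds its scrape targets through service discovery or static configuration, runs as a single autonomous server node, and supports several kinds of graphs and dashboards. In this article, Prometheus collects Traefik's metrics.
  • Grafana is an open-source suite for metrics analytics and visualization. Its browser-based frontend queries data sources (such as InfluxDB or Prometheus) and renders custom reports and charts. The UI is flexible and there is a rich plugin ecosystem. In this article, Grafana displays the data coming from Prometheus and Loki.
  • Promtail is a log collection agent that ships the contents of local logs to a Loki instance. It is usually deployed on every machine or container that runs an application to be monitored. Its main jobs are discovering targets, attaching labels to log streams, and pushing the logs to Loki. In this article, Promtail collects the Traefik access log.
  • Grafana Loki is a set of components that can be composed into a fully featured logging stack. Unlike other logging systems, Loki does not index the raw log messages; it indexes only the set of labels attached to each log stream. This makes Loki cheaper to operate and can make it orders of magnitude more efficient. In one phrase: like Prometheus, but for logs. In this article, Loki aggregates the log data pushed by Promtail.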
Traefik configuration
  • The key part of the Traefik configuration is enabling metrics and the access log. The static configuration file traefik.toml looks like this:
    [log]
      level = "WARN"
      format = "common"
      filePath = "/logs/traefik.log"

    [accessLog]
      filePath = "/logs/access.log"
      bufferingSize = 100
      format = "json"

      [accessLog.fields.names]
        "StartUTC" = "drop"

      [accessLog.filters]
        retryAttempts = true
        minDuration = "10ms"

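    • Only the log-related settings are shown here.
    • Dropping the StartUTC field makes the access log use local time instead of UTC. It works together with the TZ environment variable, which selects the time zone.
  • The Docker Compose file used to deploy Traefik, traefik.yaml, looks like this: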
    version: '3'
    services:
      reverse-proxy:
        image: traefik
        restart: always
        environment:
          - TZ=Asia/Shanghai
        ports:
          - "80:80"
          - "443:443"
        networks:
          - traefik
        volumes:
          - ./traefik.toml:/etc/traefik/traefik.toml
          - /var/run/docker.sock:/var/run/docker.sock
          - ./config/:/etc/traefik/config/:ro
          - ./acme.json:/letsencrypt/acme.json
          - ./logs:/logs/:rw
        container_name: traefik
        # Gateway health check
        healthcheck:
          test: ["CMD-SHELL", "wget -q --spider --proxy off localhost:8080/ping || exit 1"]
          interval: 3s
          timeout: 5s
    # Create the external network beforehand: docker network create traefik
    networks:
      traefik:
        external: true

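    • The key parts are:
      • the TZ environment variable, which sets the time zone used in the logs
      • the bind mount of the local log directory ./logs
Setting up the monitoring stack
  • The Prometheus configuration file prometheus-conf.yaml looks like this: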
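
With format = "json" in the [accessLog] section above, each line written to /logs/access.log is a standalone JSON object. The following is a minimal sketch of reading one such line in Python. The sample line and the summarize helper are hypothetical and only illustrate a few of the field names Traefik emits; a real line carries many more fields. Note that Traefik records Duration in nanoseconds.

```python
import json

# Hypothetical sample line; StartUTC is absent because the config drops it.
SAMPLE_LINE = (
    '{"ClientHost":"10.0.0.5","DownstreamStatus":200,"Duration":15000000,'
    '"RequestMethod":"GET","RequestPath":"/api/users",'
    '"RouterName":"api@docker","StartLocal":"2021-09-01T10:00:00+08:00"}'
)


def summarize(line: str) -> dict:
    """Pick a few fields out of one JSON access-log line."""
    entry = json.loads(line)
    return {
        "status": entry.get("DownstreamStatus"),
        "method": entry.get("RequestMethod"),
        "path": entry.get("RequestPath"),
        "router": entry.get("RouterName"),
        # Duration is in nanoseconds; convert it to milliseconds.
        "duration_ms": entry.get("Duration", 0) / 1_000_000,
        # False when the StartUTC field has been dropped.
        "has_start_utc": "StartUTC" in entry,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LINE))
```

A helper like this is handy for checking by hand that the log file contains what Promtail is expected to ship, before looking for the data in Grafana.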
    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'codelab-monitor'
    scrape_configs:
      - job_name: 'node'
        scrape_interval: 5s
        static_configs:
          - targets: ['traefik:8080']

  • The Loki configuration file loki.yaml:
    auth_enabled: false
    server:
      http_listen_port: 3100
    ingester:
      wal:
        dir: /loki/wal
      lifecycler:
        address: 127.0.0.1
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
      max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
      chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
      chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
      max_transfer_retries: 0     # Chunk transfers disabled
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/boltdb-shipper-active
        cache_location: /loki/boltdb-shipper-cache
        cache_ttl: 24h            # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: filesystem
      filesystem:
        directory: /loki/chunks
    compactor:
      working_directory: /loki/boltdb-shipper-compactor
      shared_store: filesystem
    limits_config:
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    chunk_store_config:
      max_look_back_period: 0s
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
    ruler:
      storage:
        type: local
        local:
          directory: /loki/rules
      rule_path: /loki/rules-temp
      alertmanager_url: http://localhost:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true
    frontend:
      max_outstanding_per_tenant: 2048

  • The Promtail configuration file promtail.yaml:
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    positions:
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: app
        static_configs:
          - targets:
              - localhost
            labels:
              job: app
              __path__: /var/log/*log

    • Note that the job label under labels is set to app. This value is needed later when configuring the dashboard in Grafana.
  • The Docker Compose file prometheus.yaml looks like this:
    version: "3"
    services:
      prometheus:
        restart: always
        image: prom/prometheus:v2.28.0
        container_name: prometheus
        volumes:
          - ./:/etc/prometheus/
        command:
          - "--config.file=/etc/prometheus/prometheus-conf.yaml"
          - "--storage.tsdb.path=/prometheus"
          - "--web.console.libraries=/etc/prometheus/console_libraries"
          - "--web.console.templates=/etc/prometheus/consoles"
          - "--storage.tsdb.retention.time=720h"
          - "--web.enable-lifecycle"
        ports:
          - 9090:9090
      grafana:
        image: grafana/grafana:8.1.2
        container_name: grafana
        restart: always
        ports:
          - 3000:3000
        depends_on:
          - prometheus
          - loki
      loki:
        image: grafana/loki
        expose:
          - "3100"
        volumes:
          - ./loki.yaml:/etc/loki/local-config.yaml
          - loki_data:/loki
        command: -config.file=/etc/loki/local-config.yaml
      promtail:
        image: grafana/promtail
        depends_on:
          - loki
        volumes:
          - /root/traefik/logs:/var/log
          - ./promtail.yaml:/etc/promtail/config.yml
        command: -config.file=/etc/promtail/config.yml
    networks:
      default:
        external:
          name: traefik
    volumes:
      loki_data:

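Grafana configuration
  • Once Grafana is reachable, log in with admin:admin.
Configuring the data sources
  • Prometheus: in the left-hand menu, go to Configuration → Data Sources, open (or edit) the default Prometheus data source, set the URL to http://prometheus:9090, then click Save & Test.
  • Loki: in the left-hand menu, go to Configuration → Data Sources, open (or edit) the default Loki data source, set the URL to http://loki:3100, then click Save & Test.
  • After adding both data sources, use Explore in the left-hand panel to check that data can be queried. Taking Loki as an example: choose Log browser, select the log file, then click Show Logs. The collected log entries should appear.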
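
The same check can be approximated outside the Grafana UI by asking Loki's HTTP API for recent entries of the job="app" stream. The sketch below only builds the request URL for the /loki/api/v1/query_range endpoint; the loki_query_url helper is hypothetical, and the base URL http://loki:3100 assumes the request is made from a container attached to the same Docker network, since the Compose file above does not publish port 3100 on the host.

```python
from urllib.parse import urlencode


def loki_query_url(base_url: str, logql: str, limit: int = 10) -> str:
    """Build a query_range URL for a LogQL selector (hypothetical helper)."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    # The selector matches the job label set in promtail.yaml.
    print(loki_query_url("http://loki:3100", '{job="app"}'))
```

Fetching the printed URL with any HTTP client should return a JSON document; if its result list is empty, Promtail has most likely not pushed anything yet.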
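Configuring the dashboards
  • In the left-hand panel, choose Create -> Import to import, by ID, one of the dashboards from the Grafana dashboard marketplace that supports Traefik metrics.
    • After starring a dashboard, you can set it as the home dashboard under Configuration -> Preferences -> Home Dashboard.
  • In the left-hand panel, choose Create -> Import and use ID 13713 to import the Traefik Via Loki dashboard, which displays the Traefik logs.
    • This dashboard needs two adjustments before it works:
      • Right after the import, the dashboard shows no data. Click the settings (gear) icon in the upper-right corner to open the dashboard settings, go to JSON Model, and replace every {job="/var/log/traefik.log"} in the JSON with {job="app"} (the value comes from the Promtail configuration above). Then click Save Changes -> Save Dashboard.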

      • The Request Route section of the dashboard reports the error Panel plugin not found: grafana-piechart-panel. Install the plugin inside the container with docker exec -i grafana sh -c 'grafana-cli plugins install grafana-piechart-panel', then restart the container with docker restart grafana.
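
The selector replacement described above can also be applied to an exported copy of the dashboard JSON before importing it. The following is a minimal sketch under stated assumptions: the replace_selector helper and the tiny model in the usage example are hypothetical, and the real dashboard has many more panels. It walks the parsed JSON rather than the raw text, because the quotes inside query expressions are escaped in the serialized file.

```python
import json

OLD_SELECTOR = '{job="/var/log/traefik.log"}'
NEW_SELECTOR = '{job="app"}'  # job label set in promtail.yaml


def replace_selector(node):
    """Recursively replace the old stream selector in every string value."""
    if isinstance(node, str):
        return node.replace(OLD_SELECTOR, NEW_SELECTOR)
    if isinstance(node, list):
        return [replace_selector(item) for item in node]
    if isinstance(node, dict):
        return {key: replace_selector(value) for key, value in node.items()}
    return node  # numbers, booleans, and null are left untouched


if __name__ == "__main__":
    # Hypothetical, heavily trimmed dashboard model.
    raw = json.dumps({
        "title": "Traefik Via Loki",
        "panels": [
            {"targets": [
                {"expr": 'count_over_time({job="/var/log/traefik.log"}[5m])'}
            ]}
        ],
    })
    patched = replace_selector(json.loads(raw))
    print(json.dumps(patched, indent=2))
```

Only exact occurrences of the old selector are rewritten; if a panel combines the job label with other matchers inside the same braces, that expression would still need to be edited by hand.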
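Other notes
  • If Grafana is only for your own use, exposing the service to the public internet is not recommended. You can use port mapping instead to map the service to your local machine.
References
  • Traefik 2 monitoring: Grafana, Prometheus, Promtail, and Loki working together
  • From ELK/EFK to PLG: a container logging solution on EKS based on Promtail + Loki + Grafana
  • Traefik Logs
  • loki mkdir wal: permission denied
  • Are you trying to mount a directory onto a file (or vice-versa)?
    • Note: when this error appears, first check whether the path you specified is wrong, for example because of a misspelled name.
  • Grafana Plugin Install over Docker
  • Traefik Via Loki
  • Reference for an alternative approach: ElasticSearch + FileBeat + Grafana
  • Datasource proxy returning "too many outstanding requests"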
