Prometheus.yml Configuration File Explained

This article walks through the Prometheus.yml configuration file section by section; hopefully it will be of help.
Configuration file sections:
global: global configuration (overridden by any per-section settings)
alerting: alerting integration; this is where the alertmanager component is configured.
rule_files: alert rules; rule files are loaded and evaluated on the configured schedule and define custom alerting rules, while notification channels and route routing are handled by alertmanager.
scrape_configs: scrape configuration; defines the data sources, grouped by job_name with concrete targets, configured either statically or via service discovery.

Original configuration file:

# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
1. global settings explained:

# my global config
global:
  scrape_interval: 15s     # global interval between each data collection; 15s here
  evaluation_interval: 15s # rule evaluation interval, 15 seconds; defaults to 1 minute if unset
  scrape_timeout: 5s       # scrape timeout
  external_labels:         # labels used when talking to external systems, not applied to scraped metrics data
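For illustration, external_labels might be filled in as below; the label names and values are assumptions, and such labels are attached to series only when this Prometheus instance communicates with external systems (Alertmanager, federation, remote_write):

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 5s
  external_labels:
    region: cn-hz           # assumed label: where this Prometheus runs
    replica: prometheus-1   # assumed label: distinguishes HA replicas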
2. alerting explained
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
This block defines the alertmanager component integrated with Prometheus and used for alert notification. The alertmanager setup itself, its configuration options, notification channels and route routing rules will be covered in a separate write-up.
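For reference, with an Alertmanager instance listening on its default port 9093 (the host name below is assumed), the commented-out target would be filled in like this:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'   # assumed host:port of the Alertmanager instance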



3. rule_files explained
This section defines the alerting rules, i.e. which metric conditions trigger an alert (similar to a trigger). Once rules are set here, Prometheus evaluates them at the interval given by the global evaluation_interval parameter; after a rule file is changed, reload Prometheus for the change to take effect. Notification channels and route routing are handled by the alertmanager plugin. A minimal example rule file is sketched after the snippet below.
# Load rules once and periodically evaluate them according to the global \'evaluation_interval\'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
4. scrape_configs default configuration:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
Supported settings:
job_name: the job name; think of it as a group, where each group contains concrete target members.
scrape_interval: 5s # if set at the job level, this overrides the global setting; here the scrape interval would be 5s (see the sketch below)
metrics_path: # URL path of the metrics endpoint, e.g. https://prometheus.21yunwei.com/metrics (reverse-proxied through a front-end web server)
targets: # endpoint addresses of the monitored targets
Note: the above is a static configuration without service discovery. In this setup, adding a host means editing the rules by hand and reloading the corresponding process (e.g. via supervisor). That is the drawback of static rules: every addition requires restarting the Prometheus service, which is not friendly to automated operations.
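As a sketch of how these per-job settings fit together (the job name and scheme below are assumptions; the host matches the reverse-proxied example above):

scrape_configs:
  - job_name: 'cn-hz-21yunwei-web'            # assumed job name
    scrape_interval: 5s                       # overrides the global 15s for this job only
    metrics_path: /metrics                    # the default, shown explicitly for clarity
    scheme: https                             # assumed: metrics served behind an HTTPS reverse proxy
    static_configs:
      - targets: ['prometheus.21yunwei.com']  # host taken from the metrics_path example above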


Prometheus supports service discovery.
① File-based service discovery. File-based service discovery does not depend on any platform or third-party service: you simply add the new target information, in YAML or JSON format, to the target files, and Prometheus periodically reads the target information from the specified files and updates it.
Benefits:
(1) Targets do not have to be added to the main configuration file one by one; it is enough to drop JSON or YAML files into the watched directory;
(2) Easier to maintain, and the Prometheus server does not need to be restarted every time.
Example:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'cn-hz-21yunwei-devops'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # static rules
    static_configs:
      - targets: ['localhost:9090']

  # obtain targets from a file; the 21yunwei project is used as the example here
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
      - files:
          - file_config/21yunwei/host.json
Contents of the JSON file:
[
  {
    "targets": [
      "1.1.1.1:9010"
    ],
    "labels": {
      "group": "21yunwei",
      "app": "web",
      "hostname": "cn-hz-21yunwei-web"
    }
  },
  {
    "targets": [
      "2.2.2.2:9010"
    ],
    "labels": {
      "group": "21yunwei",
      "app": "devops",
      "hostname": "cn-hz-21yunwei-devops"
    }
  }
]
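Since file_sd_configs accepts YAML as well as JSON, the same targets could equally be kept in a YAML file (say host.yml, referenced from the files list instead of host.json):

- targets:
    - '1.1.1.1:9010'
  labels:
    group: 21yunwei
    app: web
    hostname: cn-hz-21yunwei-web
- targets:
    - '2.2.2.2:9010'
  labels:
    group: 21yunwei
    app: devops
    hostname: cn-hz-21yunwei-devops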
[Screenshot: the resulting Prometheus Targets page]

Below is a more complete, annotated prometheus.yml found online:
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration: alerting setup, integrating the alertmanager plugin
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule/*.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'cn-hz-21yunwei-devops'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['cn-hz-21yunwei-devops:9100']

  # obtain targets from a file; records the 21yunwei web hosts
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
      - files:
          - file_config/21yunwei/host.json

  # for alerting on the probes below, query probe_success

  ## TCP port check
  - job_name: "tcp_port_check"
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [tcp_connect]

    file_sd_configs:
      - files:
          - check/port/*_port.json

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ******:9115

  ## to check HTTP status codes, query probe_http_status_code

  ## HTTP endpoint (URL) check
  - job_name: 'http_url_check'
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.

    file_sd_configs:
      - files:
          - check/url/*_url.json

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: *******:9115

  ### ICMP check
  - job_name: 'icmp_check'
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [icmp]

    file_sd_configs:
      - files:
          - check/icmp/*_icmp.json

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ******:9115
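The relabel_configs blocks above all follow the standard blackbox_exporter probing pattern: the discovered target address becomes the probe's target URL parameter and its instance label, while the actual scrape is redirected to the blackbox_exporter itself. A sketch with an assumed exporter address (the real addresses are masked above):

    relabel_configs:
      # 1. copy the discovered target (e.g. 1.1.1.1:22) into the ?target= URL parameter
      - source_labels: [__address__]
        target_label: __param_target
      # 2. keep the original target as the instance label on the resulting series
      - source_labels: [__param_target]
        target_label: instance
      # 3. scrape the blackbox_exporter itself instead of the target (address assumed)
      - target_label: __address__
        replacement: blackbox-exporter:9115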

Simple but practical, this configuration:
(1) sets the global parameters (scrape interval and rule evaluation interval);
(2) integrates the alertmanager plugin for subsequent alerting;
(3) sets the directory from which alert rule files are loaded;
(4) defines the scrape targets, using both static configuration and service discovery (with service discovery, later target changes only require editing the target files, without restarting the Prometheus daemon);
(5) defines the functional checks: icmp, tcp_port and url probes, each implemented by calling blackbox_exporter.

(1)配置global参数(采集周期以及规则扫描周期);
(2)集成alertmanager插件,用于后续报警操作;
(3)设定报警rule 加载目录;
(4)设定采集对象。这里既有静态设置也有设置服务发现。(服务发现用于后续target更改只需要进行规则修改即可,不需要进行prometheus守护进程重启)
(5)设定功能检测。 这里定义了icmp、tcp_port、url三种check,分别通过调用blackbox_exporter来实现。




    推荐阅读