Deploying prometheus-operator

This post follows the way the Kubespray project deploys prometheus-operator and then reproduces that deployment by hand.

How Kubespray deploys prometheus-operator
Deployment flow:

  • deploy the prometheus-operator Deployment;
  • deploy the other Prometheus components, such as node-exporter and kube-state-metrics.
# cat tasks/prometheus.yml
---
- name: Kubernetes Apps | Make sure {{ prometheus_config_dir }} exists
  file:
    path: "{{ prometheus_config_dir }}"
    state: directory

- name: Kubernetes Apps | Render templates for Prometheus-operator-deployment
  template:
    src: "{{ item }}.yaml.j2"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  with_items:
    - prometheus-operator-deployment

- name: copy prometheus operators to {{ kube_config_dir }}
  copy:
    src: "{{ item }}.yaml"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  with_items:
    - 0namespace-namespace
    - prometheus-operator-0alertmanagerCustomResourceDefinition
    - prometheus-operator-0podmonitorCustomResourceDefinition
    - prometheus-operator-0prometheusCustomResourceDefinition
    - prometheus-operator-0prometheusruleCustomResourceDefinition
    - prometheus-operator-0servicemonitorCustomResourceDefinition
    - prometheus-operator-0thanosrulerCustomResourceDefinition
    - prometheus-operator-clusterRoleBinding
    - prometheus-operator-clusterRole
    - prometheus-operator-serviceAccount
    - prometheus-operator-service
    - prometheus-rules

- name: Kubernetes Apps | apply prometheus-operator
  kube:
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ prometheus_config_dir }}/{{ item }}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{ prometheus_operators }}"

- name: Kubernetes Apps | Render templates for Prometheus
  template:
    src: "{{ item }}.yaml.j2"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  register: prometheus_reg
  with_items:
    - alertmanager-alertmanager
    - alertmanager-secret
    - alertmanager-serviceAccount
    - alertmanager-serviceMonitor
    - alertmanager-service
    - kube-state-metrics-clusterRoleBinding
    - kube-state-metrics-clusterRole
    - kube-state-metrics-deployment
    - kube-state-metrics-serviceAccount
    - kube-state-metrics-serviceMonitor
    - kube-state-metrics-service
    - node-exporter-clusterRoleBinding
    - node-exporter-clusterRole
    - node-exporter-daemonset
    - node-exporter-serviceAccount
    - node-exporter-serviceMonitor
    - node-exporter-service
    - prometheus-adapter-apiService
    - prometheus-adapter-clusterRoleAggregatedMetricsReader
    - prometheus-adapter-clusterRoleBindingDelegator
    - prometheus-adapter-clusterRoleBinding
    - prometheus-adapter-clusterRoleServerResources
    - prometheus-adapter-clusterRole
    - prometheus-adapter-configMap
    - prometheus-adapter-deployment
    - prometheus-adapter-roleBindingAuthReader
    - prometheus-adapter-serviceAccount
    - prometheus-adapter-serviceMonitor
    - prometheus-adapter-service
    - prometheus-clusterRoleBinding
    - prometheus-clusterRole
    - prometheus-kubeControllerManagerPrometheusDiscoveryService
    - prometheus-kubeSchedulerPrometheusDiscoveryService
    - prometheus-operator-serviceMonitor
    - prometheus-prometheus
    - prometheus-roleBindingConfig
    - prometheus-roleBindingSpecificNamespaces
    - prometheus-roleConfig
    - prometheus-roleSpecificNamespaces
    - prometheus-serviceAccount
    - prometheus-serviceMonitorApiserver
    - prometheus-serviceMonitorCoreDNS
    - prometheus-serviceMonitorKubeControllerManager
    - prometheus-serviceMonitorKubelet
    - prometheus-serviceMonitorKubeScheduler
    - prometheus-serviceMonitor
    - prometheus-service

- name: Kubernetes Apps | Add policies, roles, bindings for Prometheus
  kube:
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ prometheus_config_dir }}/{{ item.item }}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{ prometheus_reg.results }}"
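
The two apply tasks in the playbook retry on failure (retries: 10, delay: 6). When reproducing the flow by hand, a small shell helper can play a similar role. This is only a minimal sketch: it assumes plain kubectl apply is an acceptable stand-in for Kubespray's kube module (which is Kubespray's own wrapper and may pass different flags), and the names apply_with_retry, KUBECTL, RETRIES and DELAY are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Kubespray: apply one manifest and retry
# on failure, loosely mirroring "retries: 10" / "delay: 6" in the playbook.
KUBECTL="${KUBECTL:-kubectl}"   # assumption: kubectl is on PATH
RETRIES="${RETRIES:-10}"        # maximum number of attempts
DELAY="${DELAY:-6}"             # seconds to wait between attempts

apply_with_retry() {
    manifest="$1"
    attempt=1
    while [ "$attempt" -le "$RETRIES" ]; do
        if "$KUBECTL" apply -f "$manifest"; then
            return 0
        fi
        echo "attempt $attempt/$RETRIES failed for $manifest" >&2
        attempt=$((attempt + 1))
        # Only sleep if another attempt will follow.
        if [ "$attempt" -le "$RETRIES" ]; then
            sleep "$DELAY"
        fi
    done
    return 1
}
```

It could be called once per rendered file, for example apply_with_retry ./operator/prometheus-operator-deployment.yaml.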

Deploying prometheus-operator manually
  1. Label the master node in advance
Prometheus is scheduled onto the master node, so the node needs this label first:
kubectl label nodes k8s-master node-role.kubernetes.io/master=
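
Before moving on, it can be worth confirming that the label actually landed. A minimal sketch, assuming kubectl is on PATH; has_master_label and KUBECTL are hypothetical names used only for this example.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"   # assumption: kubectl is on PATH

# Succeeds only if at least one node carries the master role label.
has_master_label() {
    nodes=$("$KUBECTL" get nodes -l node-role.kubernetes.io/master -o name) || return 1
    [ -n "$nodes" ]
}
```

For example: has_master_label && echo "master label present".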

  2. Deploy the prometheus-operator Deployment
kubectl create -f .
// file list
[root@k8s-master prometheus]# tree ./operator/
./operator/
├── 0namespace-namespace.yaml
├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
├── prometheus-operator-clusterRoleBinding.yaml
├── prometheus-operator-clusterRole.yaml
├── prometheus-operator-deployment.yaml
├── prometheus-operator-serviceAccount.yaml
├── prometheus-operator-service.yaml
└── prometheus-rules.yaml

0 directories, 13 files
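
The next step creates custom resources (ServiceMonitor, Prometheus, Alertmanager and so on), which only works once the CRDs from this step are registered. A hedged sketch of waiting for them with kubectl wait: the CRD names are derived from the six CRD files listed above under the monitoring.coreos.com group, the 60-second timeout is an arbitrary choice, and wait_for_operator_crds and KUBECTL are hypothetical names.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"   # assumption: kubectl is on PATH

# Block until each prometheus-operator CRD reports the Established condition.
wait_for_operator_crds() {
    for kind in alertmanagers podmonitors prometheuses prometheusrules \
                servicemonitors thanosrulers; do
        "$KUBECTL" wait --for=condition=Established --timeout=60s \
            "crd/${kind}.monitoring.coreos.com" || return 1
    done
}
```

Running wait_for_operator_crds between steps 2 and 3 avoids creating custom resources before the API server knows their kinds.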

  3. Deploy the other Prometheus components
kubectl create -f .
// file list
[root@k8s-master prometheus]# tree ./prometheus/
./prometheus/
├── alertmanager-alertmanager.yaml
├── alertmanager-secret.yaml
├── alertmanager-serviceAccount.yaml
├── alertmanager-serviceMonitor.yaml
├── alertmanager-service.yaml
├── kube-state-metrics-clusterRoleBinding.yaml
├── kube-state-metrics-clusterRole.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-serviceAccount.yaml
├── kube-state-metrics-serviceMonitor.yaml
├── kube-state-metrics-service.yaml
├── node-exporter-clusterRoleBinding.yaml
├── node-exporter-clusterRole.yaml
├── node-exporter-daemonset.yaml
├── node-exporter-serviceAccount.yaml
├── node-exporter-serviceMonitor.yaml
├── node-exporter-service.yaml
├── prometheus-adapter-apiService.yaml
├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
├── prometheus-adapter-clusterRoleBindingDelegator.yaml
├── prometheus-adapter-clusterRoleBinding.yaml
├── prometheus-adapter-clusterRoleServerResources.yaml
├── prometheus-adapter-clusterRole.yaml
├── prometheus-adapter-configMap.yaml
├── prometheus-adapter-deployment.yaml
├── prometheus-adapter-roleBindingAuthReader.yaml
├── prometheus-adapter-serviceAccount.yaml
├── prometheus-adapter-serviceMonitor.yaml
├── prometheus-adapter-service.yaml
├── prometheus-clusterRoleBinding.yaml
├── prometheus-clusterRole.yaml
├── prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
├── prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
├── prometheus-operator-serviceMonitor.yaml
├── prometheus-prometheus.yaml
├── prometheus-roleBindingConfig.yaml
├── prometheus-roleBindingSpecificNamespaces.yaml
├── prometheus-roleConfig.yaml
├── prometheus-roleSpecificNamespaces.yaml
├── prometheus-serviceAccount.yaml
├── prometheus-serviceMonitorApiserver.yaml
├── prometheus-serviceMonitorCoreDNS.yaml
├── prometheus-serviceMonitorKubeControllerManager.yaml
├── prometheus-serviceMonitorKubelet.yaml
├── prometheus-serviceMonitorKubeScheduler.yaml
├── prometheus-serviceMonitor.yaml
└── prometheus-service.yaml

0 directories, 47 files

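  4. Problem: the Alertmanager cluster fails to connect
After the commands above finish, the Alertmanager cluster fails to start and reports that it cannot find its peers:
alertmanager-main-0.alertmanager-operated:9094
alertmanager-main-1.alertmanager-operated:9094
alertmanager-main-2.alertmanager-operated:9094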

Start a busybox pod and resolve one of the names with nslookup:
kubectl run -i --tty --image busybox:1.28.3 dns-test --restart=Never --rm /bin/sh
# nslookup alertmanager-main-1.alertmanager-operated.monitoring
## resolution fails with an error
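
To test all three peers rather than just one, the names can be generated instead of typed. The sketch below prints the fully qualified form of each peer name, following the Kubernetes pattern pod.service.namespace.svc.cluster-domain for pods behind a headless Service. The monitoring namespace comes from the nslookup command above; cluster.local is an assumption (the default cluster domain), and peer_fqdns is a hypothetical helper name.

```shell
#!/bin/sh
# Print the fully qualified DNS name of each of the three Alertmanager peers.
# $1: namespace (default: monitoring)
# $2: cluster domain (default: cluster.local, an assumption about the cluster)
peer_fqdns() {
    namespace="${1:-monitoring}"
    domain="${2:-cluster.local}"
    for i in 0 1 2; do
        echo "alertmanager-main-${i}.alertmanager-operated.${namespace}.svc.${domain}"
    done
}
```

Each printed name can then be passed to nslookup inside the busybox pod.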

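Name resolution fails. In Kubernetes, CoreDNS handles name resolution, while kube-proxy maintains the forwarding rules for Service endpoints (pods reach CoreDNS itself through a Service, so a broken kube-proxy can also break DNS). The CoreDNS logs show nothing wrong, so check the kube-proxy logs:
# kubectl logs kube-proxy-krzkc -n kube-system
## there are many errors here
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]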
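
A quick way to tell whether a given kube-proxy pod is affected is to count these lines in its log. A minimal sketch that reads log text on standard input; count_ipvs_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Count log lines reporting the IPVS listing failure; prints 0 if there are none.
count_ipvs_errors() {
    # grep -c exits non-zero when nothing matches, so mask the exit status.
    grep -c 'Failed to list IPVS destinations' || true
}
```

For example: kubectl logs kube-proxy-krzkc -n kube-system | count_ipvs_errors. Running the same check after the fix confirms whether the errors have stopped.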

  5. Fix: downgrade kube-proxy so the Alertmanager cluster can form
There are two ways to fix this:
  • upgrade CentOS to 8.2;
  • downgrade kube-proxy.
    Here kube-proxy is downgraded:
# kubectl edit ds kube-proxy -n kube-system
## change its image
## from 1.18.0 to 1.17.6
image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.6
imagePullPolicy: IfNotPresent
name: kube-proxy
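
The same change can be made without opening an editor. This is a sketch rather than the original author's method: it uses kubectl set image and kubectl rollout status, takes the container name kube-proxy and the image repository from the snippet above, and introduces downgrade_kube_proxy, KUBECTL and IMAGE_REPO as hypothetical names.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"   # assumption: kubectl is on PATH
IMAGE_REPO="registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy"

# Point the kube-proxy DaemonSet at the given image tag (default v1.17.6)
# and wait for the rollout to finish.
downgrade_kube_proxy() {
    version="${1:-v1.17.6}"
    "$KUBECTL" -n kube-system set image daemonset/kube-proxy \
        "kube-proxy=${IMAGE_REPO}:${version}" || return 1
    "$KUBECTL" -n kube-system rollout status daemonset/kube-proxy
}
```

Calling downgrade_kube_proxy with no argument applies the v1.17.6 tag used in this post.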

[Deploying prometheus-operator] Reference: https://blog.csdn.net/cw03192...
