Using prometheus-operator (Part 2): Monitoring kube-proxy with a ServiceMonitor

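The previous article explained that a ServiceMonitor is an abstraction for monitoring a Service. This article uses kube-proxy as an example and walks through how to monitor it with a ServiceMonitor object.
1. How kube-proxy is deployed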

    # kubectl get all -A | grep proxy
    kube-system   pod/kube-proxy-bn64j           1/1   Running   0   30m
    kube-system   pod/kube-proxy-jcl54           1/1   Running   0   30m
    kube-system   pod/kube-proxy-n44bh           1/1   Running   0   30m
    kube-system   daemonset.apps/kube-proxy      3   3   3   3   3   kubernetes.io/os=linux   217d

As shown, kube-proxy is deployed as a DaemonSet with 3 Pods, and it has no Service.
2. Expose a port for kube-proxy's /metrics
Each kube-proxy Pod contains a single container, and its configuration file binds the metrics endpoint to 127.0.0.1:
    # kubectl edit ds kube-proxy -n kube-system
    ......
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: 178.104.162.39:443/dev/kubernetes/amd64/kube-proxy:v1.18.0
        imagePullPolicy: IfNotPresent
        name: kube-proxy

Inspect its configuration file, /var/lib/kube-proxy/config.conf:
    # kubectl exec -it kube-proxy-4vxsf -n kube-system -- /bin/sh
    # cat /var/lib/kube-proxy/config.conf
    bindAddress: 0.0.0.0
    ......
    healthzBindAddress: 0.0.0.0:10256
    ......
    metricsBindAddress: 127.0.0.1:10249

The metrics endpoint is bound to 127.0.0.1:10249, so it only accepts connections from the node itself.
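To make the point about the bind address concrete, here is a minimal Python sketch that checks whether a `host:port` value from config.conf is a loopback address. The function name `reachable_from_other_hosts` is made up for this illustration; only the two addresses come from the config shown above.

```python
import ipaddress


def reachable_from_other_hosts(bind_address: str) -> bool:
    """Return True if a host:port bind address accepts non-local connections.

    A loopback host (127.0.0.0/8 or ::1) only accepts connections that
    originate on the same host, so a remote scraper cannot reach it.
    """
    host, _, _port = bind_address.rpartition(":")
    host = host.strip("[]")  # tolerate bracketed IPv6 such as [::1]:10249
    return not ipaddress.ip_address(host).is_loopback


# Values taken from /var/lib/kube-proxy/config.conf above.
print(reachable_from_other_hosts("127.0.0.1:10249"))  # metricsBindAddress
print(reachable_from_other_hosts("0.0.0.0:10256"))    # healthzBindAddress
```

The first call reports that the metrics address is loopback-only, which is why something has to forward it before Prometheus can scrape it.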
To make /metrics reachable from outside, that port has to be forwarded. Here we use a sidecar: add a kube-rbac-proxy container that proxies the metrics port of the kube-proxy container:
    # kubectl edit ds kube-proxy -n kube-system
    # add the following to the containers list
      - args:
        - --logtostderr
        - --secure-listen-address=[$(IP)]:10249
        - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
        - --upstream=http://127.0.0.1:10249/
        env:
        - name: IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 178.104.162.39:443/dev/kubernetes/amd64/kube-rbac-proxy:v0.4.1
        imagePullPolicy: IfNotPresent
        name: kube-rbac-proxy
        ports:
        - containerPort: 10249
          hostPort: 10249
          name: https
          protocol: TCP
        resources:
          limits:
            cpu: 20m
            memory: 40Mi
          requests:
            cpu: 10m
            memory: 20Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

At the end of the DaemonSet spec, the serviceAccount is also specified:
      serviceAccount: kube-proxy
      serviceAccountName: kube-proxy

After the DaemonSet has been modified, verify that port 10249 is listening:
    # netstat -nalp | grep 10249 | grep LISTEN
    tcp   0   0 178.104.163.38:10249   0.0.0.0:*   LISTEN   16930/./kube-rbac-p
    tcp   0   0 127.0.0.1:10249        0.0.0.0:*   LISTEN   16735/kube-proxy
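If you want to script this check rather than read the output by eye, the following Python sketch parses `netstat -nlp` style lines and lists who is listening on a given port. The helper name `listeners_on_port` and the `SAMPLE` constant are illustrative only; the sample rows are the two lines from the output above.

```python
from typing import List, Tuple


def listeners_on_port(netstat_output: str, port: int) -> List[Tuple[str, str]]:
    """Return (local_address, pid/program) for each LISTEN socket on `port`."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local = fields[3]
        if local.rsplit(":", 1)[-1] == str(port):
            found.append((local, fields[6]))
    return found


SAMPLE = """\
tcp   0   0 178.104.163.38:10249   0.0.0.0:*   LISTEN   16930/./kube-rbac-p
tcp   0   0 127.0.0.1:10249        0.0.0.0:*   LISTEN   16735/kube-proxy
"""

for address, program in listeners_on_port(SAMPLE, 10249):
    print(address, program)
```

Two listeners on the same port is the expected result here: kube-proxy keeps the loopback address, and the sidecar listens on the Pod IP.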

3. Create the Service and ServiceMonitor for kube-proxy
kube-proxy has no Service, and a ServiceMonitor is built on top of a Service, so create the Service first.
kube-proxy-service.yaml defines a Service named kube-proxy that:
  • selects Pods carrying the label k8s-app=kube-proxy;
  • labels itself with k8s-app=kube-proxy (the ServiceMonitor uses this label).
    # cat kube-proxy-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/name: kube-proxy
        app.kubernetes.io/version: v0.18.1
        k8s-app: kube-proxy
      name: kube-proxy
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: https
        port: 10249
        targetPort: https
      selector:
        k8s-app: kube-proxy

kube-proxy-serviceMonitor.yaml defines a ServiceMonitor named kube-proxy that:
  • selects Services carrying the label k8s-app=kube-proxy.
    # cat kube-proxy-serviceMonitor.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kube-proxy
      namespace: monitoring
      labels:
        k8s-app: kube-proxy
    spec:
      jobLabel: kube-proxy
      endpoints:
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        interval: 15s
        port: https
        relabelings:
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __meta_kubernetes_pod_node_name
          targetLabel: instance
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
      selector:
        matchLabels:
          k8s-app: kube-proxy
      namespaceSelector:
        matchNames:
        - kube-system
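The selection logic is simple enough to sketch in a few lines of Python. This is only an illustration of how `selector.matchLabels` and `namespaceSelector.matchNames` combine, not the operator's actual code; the function `selects` and the two dictionaries are hypothetical stand-ins for the manifests above.

```python
def selects(service_monitor: dict, service: dict) -> bool:
    """Sketch of ServiceMonitor-to-Service matching (matchLabels + matchNames only)."""
    wanted = service_monitor["selector"]["matchLabels"]
    namespaces = service_monitor["namespaceSelector"]["matchNames"]
    labels = service["labels"]
    # The Service must live in a listed namespace and carry every wanted label.
    return service["namespace"] in namespaces and all(
        labels.get(key) == value for key, value in wanted.items()
    )


SERVICE_MONITOR = {
    "selector": {"matchLabels": {"k8s-app": "kube-proxy"}},
    "namespaceSelector": {"matchNames": ["kube-system"]},
}

KUBE_PROXY_SERVICE = {
    "namespace": "kube-system",
    "labels": {
        "app.kubernetes.io/name": "kube-proxy",
        "app.kubernetes.io/version": "v0.18.1",
        "k8s-app": "kube-proxy",
    },
}

print(selects(SERVICE_MONITOR, KUBE_PROXY_SERVICE))
```

Note that the ServiceMonitor itself lives in the monitoring namespace; it is the namespaceSelector that lets it reach the Service in kube-system.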

Once the ServiceMonitor is defined, the kube-proxy target shows up in the Prometheus dashboard.
[Figure: kube-proxy target in the Prometheus dashboard]

At the same time, a matching service discovery job for kube-proxy is added to the prometheus-server configuration file:
    - job_name: monitoring/kube-proxy/0
      honor_labels: false
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - kube-system
      scrape_interval: 15s
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: keep    ## keep only Services labeled k8s_app=kube-proxy
        source_labels:
        - __meta_kubernetes_service_label_k8s_app
        regex: kube-proxy
      - action: keep    ## keep only endpoints whose port name is https
        source_labels:
        - __meta_kubernetes_endpoint_port_name
        regex: https
      - source_labels:
        - __meta_kubernetes_endpoint_address_target_kind
        - __meta_kubernetes_endpoint_address_target_name
        separator: ;
        regex: Node;(.*)
        replacement: ${1}
        target_label: node
      - source_labels:
        - __meta_kubernetes_endpoint_address_target_kind
        - __meta_kubernetes_endpoint_address_target_name
        separator: ;
        regex: Pod;(.*)
        replacement: ${1}
        target_label: pod
      .....

The configuration above has two main filters:
  • keep only Services labeled k8s-app=kube-proxy;
  • keep only endpoints whose port name is https.
Both match the definition of the kube-proxy Service.
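The filtering can be imitated with a small Python sketch. It assumes the usual Prometheus relabeling semantics (source label values joined with the separator, regex anchored to the whole string) and uses Python's `re` in place of Prometheus's RE2 engine, which behaves the same for these simple patterns. The helper names and the `DISCOVERED` sample target are invented for illustration.

```python
import re
from typing import Dict, List, Optional

Labels = Dict[str, str]

KIND = "__meta_kubernetes_endpoint_address_target_kind"
NAME = "__meta_kubernetes_endpoint_address_target_name"


def _joined(labels: Labels, source_labels: List[str], separator: str = ";") -> str:
    """Concatenate the source label values, as relabeling does before matching."""
    return separator.join(labels.get(name, "") for name in source_labels)


def keep(labels: Labels, source_labels: List[str], regex: str) -> bool:
    """'keep' action: the target survives only if the regex matches fully."""
    return re.fullmatch(regex, _joined(labels, source_labels)) is not None


def replace(labels: Labels, source_labels: List[str], regex: str, target: str) -> Labels:
    """'replace' action with replacement ${1}: copy capture group 1 into `target`."""
    match = re.fullmatch(regex, _joined(labels, source_labels))
    if match is None:
        return labels
    updated = dict(labels)
    updated[target] = match.group(1)
    return updated


def relabel(labels: Labels) -> Optional[Labels]:
    """Apply the four rules from the job above; None means the target is dropped."""
    if not keep(labels, ["__meta_kubernetes_service_label_k8s_app"], "kube-proxy"):
        return None
    if not keep(labels, ["__meta_kubernetes_endpoint_port_name"], "https"):
        return None
    labels = replace(labels, [KIND, NAME], "Node;(.*)", "node")
    labels = replace(labels, [KIND, NAME], "Pod;(.*)", "pod")
    return labels


# Hypothetical discovered endpoint for one of the kube-proxy Pods.
DISCOVERED = {
    "__meta_kubernetes_service_label_k8s_app": "kube-proxy",
    "__meta_kubernetes_endpoint_port_name": "https",
    KIND: "Pod",
    NAME: "kube-proxy-bn64j",
}

print(relabel(DISCOVERED))
```

An endpoint whose port is named anything other than https returns None here, which mirrors Prometheus dropping that target.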
4. Add RBAC configuration for kube-proxy
The DaemonSet runs under the ServiceAccount kube-proxy. That ServiceAccount needs a ClusterRole and a ClusterRoleBinding that allow it to create tokenreviews and subjectaccessreviews; otherwise scraping /metrics fails with 401 Unauthorized.
    # cat kube-proxy-clusterRole.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kube-proxy
    rules:
    - apiGroups:
      - authentication.k8s.io
      resources:
      - tokenreviews
      verbs:
      - create
    - apiGroups:
      - authorization.k8s.io
      resources:
      - subjectaccessreviews
      verbs:
      - create
    # kubectl apply -f kube-proxy-clusterRole.yaml

    # cat kube-proxy-clusterRoleBinding.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kube-proxy
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-proxy
    subjects:
    - kind: ServiceAccount
      name: kube-proxy
      namespace: kube-system
    # kubectl apply -f kube-proxy-clusterRoleBinding.yaml
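Why these two permissions? The kube-rbac-proxy sidecar checks each incoming request by asking the API server to review the caller's token (TokenReview) and then its access rights (SubjectAccessReview). The Python sketch below only builds the rough shape of those two objects so the ClusterRole rules are easier to relate to. The `v1` API versions and the user name are assumptions for illustration; the version the v0.4.1 sidecar actually sends may differ.

```python
import json
from typing import List


def token_review(token: str) -> dict:
    """Object created under authentication.k8s.io / tokenreviews."""
    return {
        "apiVersion": "authentication.k8s.io/v1",  # assumed version
        "kind": "TokenReview",
        "spec": {"token": token},
    }


def subject_access_review(user: str, groups: List[str], path: str, verb: str) -> dict:
    """Object created under authorization.k8s.io / subjectaccessreviews."""
    return {
        "apiVersion": "authorization.k8s.io/v1",  # assumed version
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "groups": groups,
            "nonResourceAttributes": {"path": path, "verb": verb},
        },
    }


# Hypothetical caller identity; substitute the account your Prometheus runs as.
review = subject_access_review(
    user="system:serviceaccount:monitoring:example-prometheus",
    groups=["system:serviceaccounts"],
    path="/metrics",
    verb="get",
)
print(json.dumps(token_review("<bearer token>"), indent=2))
print(json.dumps(review, indent=2))
```

Both objects are submitted with the `create` verb, which is exactly what the two rules in the ClusterRole grant.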

5. Check the metrics with curl from inside the cluster
    # curl --header "Authorization: Bearer $TOKEN" --insecure https://178.104.163.38:10249/metrics
    # HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
    # TYPE apiserver_audit_event_total counter
    apiserver_audit_event_total 0
    # HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
    # TYPE apiserver_audit_requests_rejected_total counter
    apiserver_audit_requests_rejected_total 0
    # HELP go_gc_duration_seconds A summary of the GC invocation durations.
    # TYPE go_gc_duration_seconds summary
    ......
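To sanity-check a response like this in a script, a few lines of Python are enough to pull the sample values out of the text format. This sketch is deliberately minimal: it skips `# HELP` and `# TYPE` lines and assumes each sample line is `name value` with no trailing timestamp, which holds for the excerpt above. The names `parse_samples` and `SAMPLE` are illustrative.

```python
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Map each metric line (name plus any labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


SAMPLE = """\
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
"""

for name, value in parse_samples(SAMPLE).items():
    print(name, value)
```

For anything beyond a quick check, a real exposition-format parser is the safer choice, since it also handles timestamps and escaped label values.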
