kubeadm HA Setup: A Complete Record

  • Reference solution: kubeadm-ha
  • The cluster-info part of this article is original work
  • The way workloads are kept off the masters differs from the reference article
  1. Pre-installation preparation
  • 8 machines running CentOS Linux release 7.4.1708 (Core): 3 masters (master1, master2, master3) and 5 nodes (node1~node5).
  • If conditions allow, prepare one VIP for the master cluster.
    Host IP
    master1 172.25.16.120
    master2 172.25.16.121
    master3 172.25.16.122
    node1 172.25.16.167
    node2 172.25.16.168
    node3 172.25.16.169
    node4 172.25.16.170
    node5 172.25.16.171
    VIP 172.25.16.228
  • Install docker-ce 17.09.0-ce, kubeadm 1.7.5, and kubelet 1.7.5 on all machines.
    • Note: the recommended Docker version is 1.12. If your Docker version is newer than 1.12, run iptables -P FORWARD ACCEPT on every machine after installing Docker.
  • Any Docker version newer than 1.12 also seems to need the following change; otherwise checking the kubelet status reports an error:
    $ vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    # Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
    Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
    $ systemctl daemon-reload && systemctl restart kubelet

  • Install kubectl 1.7.5 on master1, master2, and master3.
  • Configure a proxy on every machine for both yum and Docker. For yum, set proxy=http://SERVER:PORT in /etc/yum.conf; for Docker, add the following under the [Service] section of /usr/lib/systemd/system/docker.service (then reload Docker as shown after the snippet):
    Environment="NO_PROXY=localhost,127.0.0.0/8,172.0.0.0/24"
    Environment="HTTP_PROXY=http://SERVER:PORT/"
    Environment="HTTPS_PROXY=http://SERVER:PORT/"
  2. etcd cluster
  • On master1, start etcd in Docker:
#!/bin/bash
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
  --restart always \
  -v /etc/ssl/certs:/etc/ssl/certs \
  -v /var/lib/etcd-cluster:/var/lib/etcd \
  -p 4001:4001 \
  -p 2380:2380 \
  -p 2379:2379 \
  --name etcd \
  gcr.io/google_containers/etcd-amd64:3.0.17 \
  etcd --name=etcd0 \
  --advertise-client-urls=http://172.25.16.120:2379,http://172.25.16.120:4001 \
  --listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --initial-advertise-peer-urls=http://172.25.16.120:2380 \
  --listen-peer-urls=http://0.0.0.0:2380 \
  --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
  --initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
  --initial-cluster-state=new \
  --auto-tls \
  --peer-auto-tls \
  --data-dir=/var/lib/etcd

  • On master2, start etcd in Docker:
#!/bin/bash
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
  --restart always \
  -v /etc/ssl/certs:/etc/ssl/certs \
  -v /var/lib/etcd-cluster:/var/lib/etcd \
  -p 4001:4001 \
  -p 2380:2380 \
  -p 2379:2379 \
  --name etcd \
  gcr.io/google_containers/etcd-amd64:3.0.17 \
  etcd --name=etcd1 \
  --advertise-client-urls=http://172.25.16.121:2379,http://172.25.16.121:4001 \
  --listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --initial-advertise-peer-urls=http://172.25.16.121:2380 \
  --listen-peer-urls=http://0.0.0.0:2380 \
  --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
  --initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
  --initial-cluster-state=new \
  --auto-tls \
  --peer-auto-tls \
  --data-dir=/var/lib/etcd

  • On master3, start etcd in Docker (as the sketch after this block notes, the three scripts differ only in the member name and the local IP):
#!/bin/bash
docker stop etcd && docker rm etcd
rm -rf /var/lib/etcd-cluster
mkdir -p /var/lib/etcd-cluster
docker run -d \
  --restart always \
  -v /etc/ssl/certs:/etc/ssl/certs \
  -v /var/lib/etcd-cluster:/var/lib/etcd \
  -p 4001:4001 \
  -p 2380:2380 \
  -p 2379:2379 \
  --name etcd \
  gcr.io/google_containers/etcd-amd64:3.0.17 \
  etcd --name=etcd2 \
  --advertise-client-urls=http://172.25.16.122:2379,http://172.25.16.122:4001 \
  --listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --initial-advertise-peer-urls=http://172.25.16.122:2380 \
  --listen-peer-urls=http://0.0.0.0:2380 \
  --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
  --initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
  --initial-cluster-state=new \
  --auto-tls \
  --peer-auto-tls \
  --data-dir=/var/lib/etcd
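  The three scripts above are identical except for the etcd member name and the local IP. A minimal parameterized sketch (the variable names are illustrative and not part of the original article; run it on each master with that host's own values):
    #!/bin/bash
    ETCD_NAME=etcd0          # etcd0 on master1, etcd1 on master2, etcd2 on master3
    HOST_IP=172.25.16.120    # this master's own IP address
    docker run -d --restart always \
      -v /etc/ssl/certs:/etc/ssl/certs \
      -v /var/lib/etcd-cluster:/var/lib/etcd \
      -p 4001:4001 -p 2380:2380 -p 2379:2379 \
      --name etcd \
      gcr.io/google_containers/etcd-amd64:3.0.17 \
      etcd --name=${ETCD_NAME} \
      --advertise-client-urls=http://${HOST_IP}:2379,http://${HOST_IP}:4001 \
      --listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 \
      --initial-advertise-peer-urls=http://${HOST_IP}:2380 \
      --listen-peer-urls=http://0.0.0.0:2380 \
      --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
      --initial-cluster=etcd0=http://172.25.16.120:2380,etcd1=http://172.25.16.121:2380,etcd2=http://172.25.16.122:2380 \
      --initial-cluster-state=new \
      --auto-tls --peer-auto-tls \
      --data-dir=/var/lib/etcd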

  • Check the etcd status on master1, master2, and master3:
$ docker exec -ti etcd ash
$ etcdctl member list
19dcd68c1a5b8d7d: name=etcd2 peerURLs=http://172.25.16.122:2380 clientURLs=http://172.25.16.122:2379,http://172.25.16.122:4001 isLeader=true
688e88a7e1b4e844: name=etcd0 peerURLs=http://172.25.16.120:2380 clientURLs=http://172.25.16.120:2379,http://172.25.16.120:4001 isLeader=false
692a555d87ac214c: name=etcd1 peerURLs=http://172.25.16.121:2380 clientURLs=http://172.25.16.121:2379,http://172.25.16.121:4001 isLeader=false
$ etcdctl cluster-health
member 19dcd68c1a5b8d7d is healthy: got healthy result from http://172.25.16.122:2379
member 688e88a7e1b4e844 is healthy: got healthy result from http://172.25.16.120:2379
member 692a555d87ac214c is healthy: got healthy result from http://172.25.16.121:2379
cluster is healthy

  3. Install with kubeadm on master1
  • Configuration file kubeadm-init-v1.7.5.yaml:
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.7.5
networking:
  podSubnet: 10.244.0.0/16
apiServerCertSANs:
- centos-master-1
- centos-master-2
- centos-master-3
- 172.25.16.120
- 172.25.16.121
- 172.25.16.122
- 172.25.16.228
etcd:
  endpoints:
  - http://172.25.16.120:2379
  - http://172.25.16.121:2379
  - http://172.25.16.122:2379

  • Run kubeadm init --config=kubeadm-init-v1.7.5.yaml.
  • Edit /etc/kubernetes/manifests/kube-apiserver.yaml and adjust the admission controllers (comment out the original line; Initializers and NodeRestriction are removed):
    # - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds

  • Restart the services: systemctl restart docker kubelet
  • Set the KUBECONFIG environment variable for kubectl:
$ vi ~/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
$ source ~/.bashrc
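  At this point kubectl on master1 should be able to reach the apiserver. A quick sanity check (not part of the original article; exact pod names and states will vary):
    kubectl get pods -n kube-system -o wide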

  4. Install the flannel components
  • It is recommended to fetch the configuration files from the upstream project.
  • kubectl create -f flannel-rbac.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system

  • kubectl create -f flannel.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.8.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.8.0-amd64
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg

  • Run kubectl get pods --all-namespaces -o wide and wait until all pods are Running.
  • At this point, the single-master Kubernetes setup is complete.
  5. Master HA configuration
  • Copy /etc/kubernetes/ from master1 to master2 and master3:
scp -r /etc/kubernetes/ master2:/etc/
scp -r /etc/kubernetes/ master3:/etc/

  • On master2 and master3, restart the kubelet service and confirm that its status is active (running):
    systemctl daemon-reload && systemctl restart kubelet
  • Configure the KUBECONFIG environment variable for kubectl on master2 and master3, as on master1 (see the sketch below).
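  A minimal sketch, assuming the same admin.conf location as on master1 (these exact commands are not in the original article):
    $ echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc
    $ source ~/.bashrc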
  • On master2 and master3, check the node status; the new masters should already have joined the cluster (pulling images takes time, so wait a while until they show Ready). For example:
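  A quick check (actual output will differ):
    kubectl get nodes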
  6. Modify the master configuration
    • On master2 and master3, edit kube-apiserver.yaml and change ${HOST_IP} to the local machine's IP:
    $ vi /etc/kubernetes/manifests/kube-apiserver.yaml
    - --advertise-address=${HOST_IP}

  • On master2 and master3, edit kubelet.conf and change ${HOST_IP} to the local machine's IP:
$ vi /etc/kubernetes/kubelet.conf
server: https://${HOST_IP}:6443

  • On master2 and master3, edit admin.conf and change ${HOST_IP} to the local machine's IP:
$ vi /etc/kubernetes/admin.conf
server: https://${HOST_IP}:6443

  • On master2 and master3, edit controller-manager.conf and change ${HOST_IP} to the local machine's IP:
$ vi /etc/kubernetes/controller-manager.conf
server: https://${HOST_IP}:6443

  • On master2 and master3, edit scheduler.conf and change ${HOST_IP} to the local machine's IP (a combined sketch for these four conf files follows below):
$ vi /etc/kubernetes/scheduler.conf
server: https://${HOST_IP}:6443
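  The four conf-file edits above can also be scripted. A minimal sketch, assuming HOST_IP holds the local machine's address (the loop and variable name are illustrative, not from the original article):
    HOST_IP=172.25.16.121   # example: master2's address; use the local IP on each host
    for f in kubelet.conf admin.conf controller-manager.conf scheduler.conf; do
        # rewrite the apiserver endpoint so it points at the local apiserver
        sed -i "s#server: https://.*:6443#server: https://${HOST_IP}:6443#" /etc/kubernetes/$f
    done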

  • Restart all services on master1, master2, and master3:
$ systemctl daemon-reload && systemctl restart docker kubelet

  7. Install keepalived on master1, master2, and master3
  • Install:
yum install -y keepalived
systemctl enable keepalived && systemctl restart keepalived

  • On master1, master2, and master3, set up an apiserver health-check script; when the apiserver check fails, the keepalived service is stopped so the virtual IP fails over to another master:
$ vi /etc/keepalived/check_apiserver.sh
#!/bin/bash
err=0
for k in $(seq 1 10)
do
    check_code=$(ps -ef | grep kube-apiserver | wc -l)
    if [ "$check_code" = "1" ]; then
        err=$(expr $err + 1)
        sleep 5
        continue
    else
        err=0
        break
    fi
done
if [ "$err" != "0" ]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi

$ chmod a+x /etc/keepalived/check_apiserver.sh

  • On master1, master2, and master3, look up the network interface name:
    $ ip a | grep 172.25.16
  • On master1, master2, and master3, configure keepalived. The parameters are explained below (an example of concrete values for master1 follows after the config):
  • state ${STATE}: MASTER or BACKUP; only one node may be MASTER
  • interface ${INTERFACE_NAME}: the interface this machine should bind to (found with the ip a command above)
  • mcast_src_ip ${HOST_IP}: this machine's IP address
  • priority ${PRIORITY}: the priority, e.g. 102, 101, 100; a higher priority is more likely to be elected MASTER, and priorities must not be equal
    ${VIRTUAL_IP}: the VIP address, set to 172.25.16.228 here.
$ vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 2
    weight -5
    fall 3
    rise 2
}
vrrp_instance VI_1 {
    state ${STATE}
    interface ${INTERFACE_NAME}
    mcast_src_ip ${HOST_IP}
    virtual_router_id 51
    priority ${PRIORITY}
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass 4be37dc3b4c90194d1600c483e10ad1d
    }
    virtual_ipaddress {
        ${VIRTUAL_IP}
    }
    track_script {
        chk_apiserver
    }
}
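  For illustration only, the placeholders on master1 might be filled in roughly like this (the interface name ens192 is an assumption; use whatever ip a reports, and adjust values per host; this snippet is not part of the original article):
    # hypothetical values for master1; master2/master3 would use BACKUP and lower priorities
    STATE=MASTER INTERFACE_NAME=ens192 HOST_IP=172.25.16.120 PRIORITY=102 VIRTUAL_IP=172.25.16.228
    sed -i \
        -e "s/\${STATE}/${STATE}/" \
        -e "s/\${INTERFACE_NAME}/${INTERFACE_NAME}/" \
        -e "s/\${HOST_IP}/${HOST_IP}/" \
        -e "s/\${PRIORITY}/${PRIORITY}/" \
        -e "s/\${VIRTUAL_IP}/${VIRTUAL_IP}/" \
        /etc/keepalived/keepalived.conf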

  • On master1, master2, and master3, restart the keepalived service and verify that the virtual IP address works:
$ systemctl restart keepalived
$ ping 172.25.16.228

  8. kube-proxy configuration
  • On master1, edit configmap/kube-proxy so that server points to keepalived's virtual IP address:
$ kubectl edit -n kube-system configmap/kube-proxy
server: https://172.25.16.228:6443

  • On master1, delete all kube-proxy pods so that they are recreated with the new configuration, for example:
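  One way to do this (a sketch; the k8s-app=kube-proxy label is the one kubeadm normally applies, so verify it in your cluster first):
    kubectl -n kube-system delete pod -l k8s-app=kube-proxy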
  • Restart the docker, kubelet, and keepalived services on master1, master2, and master3:
systemctl restart docker kubelet keepalived

  • Edit cluster-info and change ${HOST_IP} to the VIP:
kubectl edit configmaps cluster-info -n kube-public
server: https://${HOST_IP}:6443
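  A quick way to confirm the change took effect (not part of the original article):
    kubectl -n kube-public get configmap cluster-info -o yaml | grep server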

  • At this point, Master HA is complete.
  9. Join the nodes
  • On master1, look up the token with kubeadm token list.
  • On node1~node5, run kubeadm join --token ${TOKEN} 172.25.16.228:6443.
  • On master1, check the nodes with kubectl get node; everything is fine once their status is Ready.
  10. Prevent workloads from being scheduled on master2 and master3
kubectl taint nodes master2 node-role.kubernetes.io/master=true:NoSchedule
kubectl taint nodes master3 node-role.kubernetes.io/master=true:NoSchedule
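  To verify the taints (a quick check, not from the original article):
    kubectl describe node master2 | grep -i taint
    kubectl describe node master3 | grep -i taint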
