This article walks through Calico BGP Full Mesh cross-node communication; hopefully it serves as a useful reference.
Introduction

Full Mesh (full interconnection) mode: once BGP is enabled, Calico's default behavior is to create a full mesh of internal BGP (iBGP) sessions in which every node peers with every other node. This lets Calico run on any L2 network, public or private cloud alike, or (if IPIP is configured) as an overlay on any network that does not block IPIP traffic. For VXLAN overlays, Calico does not use BGP.

Full-mesh mode works well for small and medium deployments of roughly 100 nodes or fewer, but it scales poorly beyond that, since the number of sessions grows quadratically with node count (n(n-1)/2); for larger clusters Calico officially recommends route reflectors.

Two properties of BGP worth keeping in mind: it propagates incremental updates rather than full-table refreshes, and it is an application-layer protocol running over TCP port 179.
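As for the route-reflector recommendation above: moving a larger cluster off the full mesh means disabling the node-to-node mesh and adding explicit BGPPeer resources. A minimal sketch, assuming calicoctl access and a hypothetical reflector at 192.168.0.90 (disable the mesh only after the reflector peering is in place, or pod routing will be interrupted):

# Disable the automatic node-to-node mesh via the cluster-wide BGPConfiguration
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
EOF

# Peer every node with the route reflector instead
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-rr
spec:
  peerIP: 192.168.0.90   # hypothetical route reflector address
  asNumber: 64512
EOF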
Installation and deployment

Deploying BGP Full Mesh simply means taking the calico-ipip setup and changing the parameter below: it turns off IPIP encapsulation, after which Calico falls back to its default BGP Full Mesh behavior.
# Change Always to Never
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
You can inspect the IP pool with calicoctl:
[root@master ~]# calicoctl get ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Never      Never       false      false              all()
Checking Calico BGP Full Mesh status

After deployment, we can use calicoctl to confirm that the full mesh to node1 (192.168.0.81) and node2 (192.168.0.82) has been established; the session state is Established.
[root@master ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.0.81 | node-to-node mesh | up    | 13:17:26 | Established |
| 192.168.0.82 | node-to-node mesh | up    | 13:17:26 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
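Since these mesh sessions are ordinary TCP connections to BGP's well-known port 179 (it is an application-layer protocol, as noted above), they can also be observed with plain socket tools on any node; a quick check, assuming ss is available:

# Each established mesh session appears as a TCP connection on port 179
[root@master ~]# ss -tnp | grep :179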
Checking the Calico BGP AS number

By default, all Calico nodes use 64512 as their autonomous system (AS) number, unless a per-node AS has been specified.
# The AS number shown is 64512
[root@master ~]# calicoctl get node -o wide
NAME               ASN       IPV4              IPV6
master.whale.com   (64512)   192.168.0.80/24
node1.whale.com    (64512)   192.168.0.81/24
node2.whale.com    (64512)   192.168.0.82/24
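The parentheses around (64512) indicate the value is inherited from the global default rather than pinned per node. That default can be changed cluster-wide through the BGPConfiguration resource; a minimal sketch (64513 is only an example value):

cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  asNumber: 64513   # example: replaces the 64512 default for all nodes
EOF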
Packet capture test in BGP mode

pod1   10.244.42.65    on node1 (192.168.0.81)
pod2   10.244.103.65   on node2 (192.168.0.82)
[root@master ~]# kubectl create deploy cni-test --image=burlyluo/nettoolbox --replicas=2
[root@master ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE              NOMINATED NODE   READINESS GATES
cni-test-777bbd57c8-ggfsp   1/1     Running   0          26s   10.244.42.65    node1.whale.com   <none>           <none>
cni-test-777bbd57c8-gxv5q   1/1     Running   0          22s   10.244.103.65   node2.whale.com   <none>           <none>
[root@master ~]# kubectl get pod -o wide -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE   IP               NODE               NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-846cc9f754-zqz8h   1/1     Running   0          59m   10.244.152.128   master.whale.com   <none>           <none>
kube-system   calico-node-5xplj                          1/1     Running   0          59m   192.168.0.82     node2.whale.com    <none>           <none>
kube-system   calico-node-bntq7                          1/1     Running   0          59m   192.168.0.81     node1.whale.com    <none>           <none>
kube-system   calico-node-cqc5t                          1/1     Running   0          59m   192.168.0.80     master.whale.com   <none>           <none>
[root@master ~]# kubectl exec -it cni-test-777bbd57c8-ggfsp -- ping -c 1 10.244.103.65
pod1.cap
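The capture files referenced in this walkthrough (pod1.cap here, pod1-node.cap and pod2.cap below) can be reproduced with tcpdump, run inside the pod and on the node uplink respectively; a sketch, assuming the nettoolbox image and the nodes both ship tcpdump:

# Inside pod1: capture the ICMP exchange on the pod's eth0
kubectl exec -it cni-test-777bbd57c8-ggfsp -- tcpdump -i eth0 -nn -w pod1.cap icmp

# On node1: the same packets leave the physical NIC with no outer encapsulation
tcpdump -i ens33 -nn -w pod1-node.cap host 10.244.103.65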
Mapping the pod interfaces to the node interfaces, we find no encapsulation devices such as vxlan or ipip on the nodes, so cross-node traffic must be forwarded purely by the node routing tables. The pod MTU of 1500 below points the same way: with IPIP enabled it would typically be lowered (e.g. to 1480) to leave room for the outer IP header.
[root@master ~]# kubectl exec -it cni-test-777bbd57c8-ggfsp -- ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 4E:DB:29:E6:AC:A8
          inet addr:10.244.42.65  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:810 (810.0 B)  TX bytes:364 (364.0 B)
[root@master ~]# kubectl exec -it cni-test-777bbd57c8-ggfsp -- ethtool -S eth0
NIC statistics:
peer_ifindex: 7
[root@node1 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:d8:6c:fb brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:ef:24:ce:c2 brd ff:ff:ff:ff:ff:ff
7: cali2009c1121bd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
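The peer_ifindex of 7 reported inside the pod matches interface index 7 on node1, cali2009c1121bd@if3, which is therefore the host-side end of pod1's veth pair. A hypothetical one-liner to automate that lookup on the node, using only values from this lab:

# On node1: resolve the host-side veth from the pod's peer_ifindex
idx=7   # value from "ethtool -S eth0" inside pod1
ip -o link show | awk -F': ' -v i="$idx" '$1 == i {print $2}'
# prints: cali2009c1121bd@if3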
pod1-node.cap
[root@master ~]# kubectl -n kube-system exec -it calico-node-bntq7 -- bash
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init)
[root@node1 /]# birdcl
BIRD v0.3.3+birdv1.6.8 ready.
bird> show route
0.0.0.0/0          via 192.168.0.1 on ens33 [kernel1 13:17:25] * (10)
10.244.152.128/26  via 192.168.0.80 on ens33 [Mesh_192_168_0_80 13:17:26] * (100/0) [i]
192.168.0.0/24     dev ens33 [direct1 13:17:24] * (240)
10.244.103.64/26   via 192.168.0.82 on ens33 [Mesh_192_168_0_82 13:34:33] * (100/0) [i]
172.17.0.0/16      dev docker0 [direct1 13:17:24] * (240)
10.244.42.64/26    blackhole [static1 13:34:33] * (200)
10.244.42.65/32    dev cali2009c1121bd [kernel1 13:42:00] * (10)
bird> show route for 10.244.103.64/26 all
10.244.103.64/26   via 192.168.0.82 on ens33 [Mesh_192_168_0_82 13:34:33] * (100/0) [i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 192.168.0.82
BGP.local_pref: 100
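Beyond the routing table, birdcl can also show per-peer session detail (state, hold timers, route counts); for instance, this standard BIRD command inspects the mesh session to node2:

bird> show protocols all Mesh_192_168_0_82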
[root@node1 /]# cat /etc/calico/confd/config/bird.cfg
function apply_communities ()
{
}

# Generated by confd
include "bird_aggr.cfg";
include "bird_ipam.cfg";

router id 192.168.0.81;

# Configure synchronization between routing tables and kernel.
protocol kernel {
  learn;             # Learn all alien routes from the kernel
  persist;           # Don't remove routes on bird shutdown
  scan time 2;       # Scan kernel routing table every 2 seconds
  import all;
  export filter calico_kernel_programming;  # Default is export none
  graceful restart;  # Turn on graceful restart to reduce potential flaps in
                     # routes when reloading BIRD configuration.  With a full
                     # automatic mesh, there is no way to prevent BGP from
                     # flapping since multiple nodes update their BGP
                     # configuration at the same time, GR is not guaranteed to
                     # work correctly in this scenario.
  merge paths on;    # Allow export multipath routes (ECMP)
}

# Watch interface up/down events.
protocol device {
  debug { states };
  scan time 2;       # Scan interfaces every 2 seconds
}

protocol direct {
  debug { states };
  interface -"cali*", -"kube-ipvs*", "*";  # Exclude cali* and kube-ipvs* but
                                           # include everything else.  In
                                           # IPVS-mode, kube-proxy creates a
                                           # kube-ipvs0 interface. We exclude
                                           # kube-ipvs0 because this interface
                                           # gets an address for every in use
                                           # cluster IP. We use static routes
                                           # for when we legitimately want to
                                           # export cluster IPs.
}

# Template for all BGP clients
template bgp bgp_template {
  debug { states };
  description "Connection to BGP peer";
  local as 64512;
  multihop;
  gateway recursive;  # This should be the default, but just in case.
  import all;         # Import all routes, since we don't know what the upstream
                      # topology is and therefore have to trust the ToR/RR.
  export filter calico_export_to_bgp_peers;  # Only want to export routes for workloads.
  add paths on;
  graceful restart;   # See comment in kernel section about graceful restart.
  connect delay time 2;
  connect retry time 5;
  error wait time 5,30;
}

# ------------- Node-to-node mesh -------------
# For peer /host/master.whale.com/ip_addr_v4
protocol bgp Mesh_192_168_0_80 from bgp_template {
  neighbor 192.168.0.80 as 64512;
  source address 192.168.0.81;  # The local address we use for the TCP connection
}

# For peer /host/node1.whale.com/ip_addr_v4
# Skipping ourselves (192.168.0.81)

# For peer /host/node2.whale.com/ip_addr_v4
protocol bgp Mesh_192_168_0_82 from bgp_template {
  neighbor 192.168.0.82 as 64512;
  source address 192.168.0.81;  # The local address we use for the TCP connection
  passive on;  # Mesh is unidirectional, peer will connect to us.
}

# ------------- Global peers -------------
# No global peers configured.

# ------------- Node-specific peers -------------
# No node-specific peers configured.
[root@node1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask          Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0          UG    100    0        0 ens33
10.244.42.64    0.0.0.0         255.255.255.192  U     0      0        0 *                 # blackhole route (local pod block)
10.244.42.65    0.0.0.0         255.255.255.255  UH    0      0        0 cali2009c1121bd
10.244.103.64   192.168.0.82    255.255.255.192  UG    0      0        0 ens33             # route to the destination pod block
10.244.152.128  192.168.0.80    255.255.255.192  UG    0      0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0      U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0    U     100    0        0 ens33
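To confirm which of these entries the kernel actually selects for pod2's address, query ip route get on node1; based on the table above it should resolve to the node2 next hop:

[root@node1 ~]# ip route get 10.244.103.65
# expected: 10.244.103.65 via 192.168.0.82 dev ens33 src 192.168.0.81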
The above covers the calico-node configuration on pod1's node (node1) and its iBGP route state; next, the pod2 side.
pod2.cap
[root@master ~]# kubectl exec -it cni-test-777bbd57c8-gxv5q -- ethtool -S eth0
NIC statistics:
peer_ifindex: 7
[root@node2 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:9f:1b:88 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:ff:bb:4a:42 brd ff:ff:ff:ff:ff:ff
7: cali9a5d1678aea@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
pod2-node.cap
node2's configuration mirrors node1's, so refer to the node1 section for the details.
[root@node2 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask          Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0          UG    100    0        0 ens33
10.244.42.64    192.168.0.81    255.255.255.192  UG    0      0        0 ens33
10.244.103.64   0.0.0.0         255.255.255.192  U     0      0        0 *
10.244.103.65   0.0.0.0         255.255.255.255  UH    0      0        0 cali9a5d1678aea
10.244.152.128  192.168.0.80    255.255.255.192  UG    0      0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0      U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0    U     100    0        0 ens33
Because the capture window was short, only a single BGP KEEPALIVE message from node2 to master was caught, but even that confirms BGP here behaves just as it would in a production environment.
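To see more of the BGP conversation than a single KEEPALIVE, capture on BGP's TCP port 179 over a longer window; a sketch on any node:

# OPEN/UPDATE/KEEPALIVE messages exchanged between the mesh peers
[root@node2 ~]# tcpdump -i ens33 -nn tcp port 179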