1.2 Introduction to the distributed deployment mode of GreatDB
GreatDB, Wanli's secure database software, is a relational database that supports both centralized and distributed deployment; this article deals with the distributed deployment mode.
The distributed deployment mode uses a shared-nothing architecture: data redundancy and replica management ensure the database has no single point of failure; data sharding and distributed parallel computing give the database system high performance; and data nodes can be scaled out dynamically without limit to meet business needs.
The overall architecture is shown in the figure below:
(Figure: overall architecture of the GreatDB distributed deployment)
2. Environment preparation
2.1 Installing Chaos Mesh
Before installing Chaos Mesh, make sure Helm and Docker are already installed and that a Kubernetes environment is available.
1) Add the Chaos Mesh repository to the Helm repositories:
helm repo add chaos-mesh https://charts.chaos-mesh.org
2) Check which Chaos Mesh versions are available to install:
helm search repo chaos-mesh
3) Create the namespace in which Chaos Mesh will be installed:
kubectl create ns chaos-testing
4) Install Chaos Mesh in a Docker environment:
helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing
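If you want to pin the chart to one of the versions listed by the search command above, Helm's standard --version flag can be added; the version number below is only an example, not a recommendation:
helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing --version 2.0.3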
Verify the installation
Run the following command to check the status of Chaos Mesh:
kubectl get pod -n chaos-testing
The expected output looks like this:
NAME                                       READY   STATUS    RESTARTS   AGE
chaos-controller-manager-d7bc9ccb5-dbccq   1/1     Running   0          26d
chaos-daemon-pzxc7                         1/1     Running   0          26d
chaos-dashboard-5887f7559b-kgz46           1/1     Running   1          26d
If all three pods are in the Running state, Chaos Mesh has been installed successfully.
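Optionally, the Chaos Mesh dashboard can be opened for a visual view of experiments; assuming the default service name and port used by the Helm chart, a simple port-forward is enough:
# Forward the dashboard service to the local machine, then open http://localhost:2333
kubectl port-forward -n chaos-testing svc/chaos-dashboard 2333:2333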
2.2 Preparing the images needed for the tests
2.2.1 Prepare the mysql image
Normally the official mysql 5.7 image is used, and mysqld-exporter serves as the MySQL metrics collector; both can be pulled directly from Docker Hub:
docker pull mysql:5.7
docker pull prom/mysqld-exporter
2.2.2 Prepare the zookeeper image
The official zookeeper 3.5.5 image is used. The monitoring components involved for zookeeper are jmx-prometheus-exporter and zookeeper-exporter, both pulled from Docker Hub:
docker pull zookeeper:3.5.5
docker pull sscaling/jmx-prometheus-exporter
docker pull josdotso/zookeeper-exporter
2.2.3 Prepare the GreatDB image
Pick a GreatDB tar package and extract it to obtain a ./greatdb directory, then copy the greatdb-service-docker.sh file into that extracted ./greatdb directory:
cp greatdb-service-docker.sh ./greatdb/
Place the greatdb Dockerfile at the same level as the ./greatdb folder, then run the following command to build the GreatDB image:
docker build -t greatdb/greatdb:tag2021 .
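Note that the image tag used here should match the -greatdb-tag value referenced later when deploying the cluster (tag202110 in section 3.1). The greatdb Dockerfile itself ships with the package and is not reproduced in this article; purely as an illustration, a minimal sketch could look like the following, where the base image, paths, and entrypoint are assumptions rather than the official Dockerfile:
# Hypothetical sketch only -- not the official GreatDB Dockerfile
FROM debian:buster-slim
# Copy the unpacked ./greatdb directory (which now contains greatdb-service-docker.sh) into the image
COPY ./greatdb /greatdb
RUN chmod +x /greatdb/greatdb-service-docker.sh
WORKDIR /greatdb
# Assume the service wrapper script is the container entrypoint
ENTRYPOINT ["/greatdb/greatdb-service-docker.sh"]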
2.2.4 Prepare the image for deploying/cleaning up the GreatDB distributed cluster
Download the cluster deployment script cluster-setup, the cluster initialization script init-zk, and the cluster helm charts package (available from the 4.0 development/testing team).
Put these materials in the same directory and write the following Dockerfile:
FROM debian:buster-slim as init-zk
COPY ./init-zk /root/init-zk
RUN chmod +x /root/init-zk

FROM debian:buster-slim as cluster-setup
# Set aliyun repo for speed
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list
RUN apt-get -y update && \
    apt-get -y install \
    curl \
    wget
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/v1.20.1/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && \
    chmod +x /usr/local/bin/kubectl && \
    mkdir /root/.kube && \
    wget https://get.helm.sh/helm-v3.5.3-linux-amd64.tar.gz && \
    tar -zxvf helm-v3.5.3-linux-amd64.tar.gz && \
    mv linux-amd64/helm /usr/local/bin/helm
COPY ./config /root/.kube/
COPY ./helm /helm
COPY ./cluster-setup /
Run the following commands to build the required images:
docker build --target init-zk -t greatdb/initzk:latest .
docker build --target cluster-setup -t greatdb/cluster-setup:v1 .
2.2.5 Prepare the test case images
The test cases currently supported include bank, bank2, pbank, tpcc, flashback, and others; each test case is a single executable file.
Taking the flashback test case as an example, download the executable locally, then write a Dockerfile with the following content in the same directory:
FROM debian:buster-slim
COPY ./flashback /
RUN cd / && chmod +x ./flashback
Run the following command to build the test case image:
docker build -t greatdb/testsuite-flashback:v1 .
2.3 Push the prepared images to a private registry
For how to create a private registry and push images to it, see: https://zhuanlan.zhihu.com/p/...
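For reference, pushing one of the images built above typically amounts to tagging it with the registry address and pushing it; the registry address below is a placeholder for your own private registry:
# Placeholder registry address -- replace with your own private registry
docker tag greatdb/cluster-setup:v1 192.168.1.100:5000/greatdb/cluster-setup:v1
docker push 192.168.1.100:5000/greatdb/cluster-setup:v1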
3. Using Chaos Mesh
3.1 Setting up the GreatDB distributed cluster
In the cluster-setup directory from section 2.2.4 above, run the following command block to set up the test cluster:
./cluster-setup \
-clustername=c0 \
-namespace=test \
-enable-monitor=true \
-mysql-image=mysql:5.7 \
-mysql-replica=3 \
-mysql-auth=1 \
-mysql-normal=1 \
-mysql-global=1 \
-mysql-partition=1 \
-zookeeper-repository=zookeeper \
-zookeeper-tag=3.5.5 \
-zookeeper-replica=3 \
-greatdb-repository=greatdb/greatdb \
-greatdb-tag=tag202110 \
-greatdb-replica=3 \
-greatdb-serviceHost=172.16.70.249
Output:
liuxinle@liuxinle-OptiPlex-5060:~/k8s/cluster-setup$ ./cluster-setup \
> -clustername=c0 \
> -namespace=test \
> -enable-monitor=true \
> -mysql-image=mysql:5.7 \
> -mysql-replica=3 \
> -mysql-auth=1 \
> -mysql-normal=1 \
> -mysql-global=1 \
> -mysql-partition=1 \
> -zookeeper-repository=zookeeper \
> -zookeeper-tag=3.5.5 \
> -zookeeper-replica=3 \
> -greatdb-repository=greatdb/greatdb \
> -greatdb-tag=tag202110 \
> -greatdb-replica=3 \
> -greatdb-serviceHost=172.16.70.249
INFO[2021-10-14T10:41:52+08:00] SetUp the cluster ...                        NameSpace=test
INFO[2021-10-14T10:41:52+08:00] create namespace ...
INFO[2021-10-14T10:41:57+08:00] copy helm chart templates ...
INFO[2021-10-14T10:41:57+08:00] setup ...                                    Component=MySQL
INFO[2021-10-14T10:41:57+08:00] exec helm install and update greatdb-cfg.yaml ...
INFO[2021-10-14T10:42:00+08:00] waiting mysql pods running ...
INFO[2021-10-14T10:44:27+08:00] setup ...                                    Component=Zookeeper
INFO[2021-10-14T10:44:28+08:00] waiting zookeeper pods running ...
INFO[2021-10-14T10:46:59+08:00] update greatdb-cfg.yaml
INFO[2021-10-14T10:46:59+08:00] setup ...                                    Component=greatdb
INFO[2021-10-14T10:47:00+08:00] waiting greatdb pods running ...
INFO[2021-10-14T10:47:21+08:00] waiting cluster running ...
INFO[2021-10-14T10:47:27+08:00] waiting prometheus server running...
INFO[2021-10-14T10:47:27+08:00] Dump Cluster Info
INFO[2021-10-14T10:47:27+08:00] SetUp success.                               ClusterName=c0 NameSpace=test
When the c0-zookeeper-initzk-7hbfs pod shows the status Completed and all the other pods are Running, the cluster has been set up successfully.
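The pod status can be checked at any time with a plain pod listing in the cluster's namespace (the same query used with -o wide in section 3.2 below):
# List all pods of the c0 test cluster
kubectl get pod -n test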
3.2 Running chaos tests on the GreatDB distributed cluster with Chaos Mesh
The fault types Chaos Mesh can inject in a Kubernetes environment include simulated Pod faults, simulated network faults, simulated stress scenarios, and more. Here we take pod-kill, one of the simulated Pod faults, as an example.
Write the experiment configuration into a file named pod-kill.yaml, for example:
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos                    # type of fault to inject
metadata:
  name: pod-failure-example
  namespace: test                 # namespace of the test cluster pods
spec:
  action: pod-kill                # specific fault action to inject
  mode: all                       # how targets are selected; all means every Pod matching the selector
  duration: '30s'                 # duration of the experiment
  selector:
    labelSelectors:
      "app.kubernetes.io/component": "greatdb"   # label of the target pods, taken from the Labels field in the output of kubectl describe pod c0-greatdb-1 -n test
Create the chaos experiment with the following command:
kubectl create -n test -f pod-kill.yaml
After the experiment has been created, run kubectl get pod -n test -o wide; the result looks like this:
NAME                                    READY   STATUS              RESTARTS   AGE     IP             NODE                     NOMINATED NODE   READINESS GATES
c0-auth0-mysql-0                        2/2     Running             0          14m     10.244.87.18   liuxinle-optiplex-5060   <none>           <none>
c0-auth0-mysql-1                        2/2     Running             0          14m     10.244.87.54   liuxinle-optiplex-5060   <none>           <none>
c0-auth0-mysql-2                        2/2     Running             0          13m     10.244.87.57   liuxinle-optiplex-5060   <none>           <none>
c0-greatdb-0                            0/2     ContainerCreating   0          2s      <none>         liuxinle-optiplex-5060   <none>           <none>
c0-greatdb-1                            0/2     ContainerCreating   0          2s      <none>         liuxinle-optiplex-5060   <none>           <none>
c0-glob0-mysql-0                        2/2     Running             0          14m     10.244.87.51   liuxinle-optiplex-5060   <none>           <none>
c0-glob0-mysql-1                        2/2     Running             0          14m     10.244.87.41   liuxinle-optiplex-5060   <none>           <none>
c0-glob0-mysql-2                        2/2     Running             0          13m     10.244.87.60   liuxinle-optiplex-5060   <none>           <none>
c0-nor0-mysql-0                         2/2     Running             0          14m     10.244.87.29   liuxinle-optiplex-5060   <none>           <none>
c0-nor0-mysql-1                         2/2     Running             0          14m     10.244.87.4    liuxinle-optiplex-5060   <none>           <none>
c0-nor0-mysql-2                         2/2     Running             0          13m     10.244.87.25   liuxinle-optiplex-5060   <none>           <none>
c0-par0-mysql-0                         2/2     Running             0          14m     10.244.87.55   liuxinle-optiplex-5060   <none>           <none>
c0-par0-mysql-1                         2/2     Running             0          14m     10.244.87.13   liuxinle-optiplex-5060   <none>           <none>
c0-par0-mysql-2                         2/2     Running             0          13m     10.244.87.21   liuxinle-optiplex-5060   <none>           <none>
c0-prometheus-server-6697649b76-fkvh9   2/2     Running             0          9m24s   10.244.87.37   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-0                          1/1     Running             1          12m     10.244.87.44   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-1                          1/1     Running             0          11m     10.244.87.30   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-2                          1/1     Running             0          10m     10.244.87.49   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-initzk-7hbfs               0/1     Completed           0          12m     10.244.87.17   liuxinle-optiplex-5060   <none>           <none>
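The experiment object itself can be inspected or removed with the usual kubectl commands against the PodChaos resource created above, for example:
# Inspect the status and events of the experiment
kubectl describe podchaos pod-failure-example -n test
# Delete the experiment once the test is done
kubectl delete -n test -f pod-kill.yaml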
4. Orchestrating the test flow with Argo
Argo is an open-source, container-native workflow engine for getting work done on Kubernetes; it models a multi-step workflow as a series of tasks, which is what we use to orchestrate the test flow.
We use Argo to define a test job; the basic test flow is fixed, as shown below:
(Figure: the test workflow)
Step 1 of the test flow deploys the test cluster. Two parallel tasks are then started: step 2 runs the test case to simulate the business workload, while step 3 injects faults with Chaos Mesh. Once the test case in step 2 finishes, step 4 stops the fault injection, and finally step 5 cleans up the cluster environment.
4.1 Orchestrating a chaos test workflow with Argo (using the flashback test case as an example)
1) Edit the image information in cluster-setup.yaml, replacing it with the name and tag of the cluster deployment/cleanup image you pushed in section 2.2 "Preparing the images needed for the tests".
2) Edit the image information in testsuite-flashback.yaml, replacing it with the name and tag of the test case image you pushed in section 2.2.
3) Create the resources for the cluster deployment, test case, and tool template YAML files with kubectl apply -n argo -f xxx.yaml (these files define Argo templates that are reused when writing workflows; a quick way to confirm they were registered is shown after step 4):
kubectl apply -n argo -f cluster-setup.yaml
kubectl apply -n argo -f testsuite-flashback.yaml
kubectl apply -n argo -f tools-template.yaml
4) Make a copy of the workflow template file workflow-template.yaml, change the parts flagged by comments in the template to your own settings, and then run the following command to create the chaos test workflow:
kubectl apply -n argo -f workflow-template.yaml
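Assuming the template files from step 3 define Argo WorkflowTemplate resources, which is what the templateRef entries in the workflow below refer to, their registration can be confirmed with:
# The three templates applied in step 3 should be listed here
kubectl get workflowtemplate -n argo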
Here is a sample workflow template file:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: chaostest-c0-0-
  name: chaostest-c0-0
  namespace: argo
spec:
  entrypoint: test-entry   # test entry point; the test parameters are passed in here: clustername, namespace, host, the greatdb image name and tag, and other basic settings
  serviceAccountName: argo
  arguments:
    parameters:
      - name: clustername
        value: c0
      - name: namespace
        value: test
      - name: host
        value: 172.16.70.249
      - name: port
        value: 30901
      - name: password
        value: Bgview@2020
      - name: user
        value: root
      - name: run-time
        value: 10m
      - name: greatdb-repository
        value: greatdb/greatdb
      - name: greatdb-tag
        value: tag202110
      - name: nemesis
        value: kill_mysql_normal_master,kill_mysql_normal_slave,kill_mysql_partition_master,kill_mysql_partition_slave,kill_mysql_auth_master,kill_mysql_auth_slave,kill_mysql_global_master,kill_mysql_global_slave,kill_mysql_master,kill_mysql_slave,net_partition_mysql_normal,net_partition_mysql_partition,net_partition_mysql_auth,net_partition_mysql_global
      - name: mysql-partition
        value: 1
      - name: mysql-global
        value: 1
      - name: mysql-auth
        value: 1
      - name: mysql-normal
        value: 2
  templates:
    - name: test-entry
      steps:
        - - name: setup-greatdb-cluster   # step.1 deploy the cluster; set the parameters correctly, mainly the mysql and zookeeper image names and tags
            templateRef:
              name: cluster-setup-template
              template: cluster-setup
            arguments:
              parameters:
                - name: namespace
                  value: "{{workflow.parameters.namespace}}"
                - name: clustername
                  value: "{{workflow.parameters.clustername}}"
                - name: mysql-image
                  value: mysql:5.7.34
                - name: mysql-replica
                  value: 3
                - name: mysql-auth
                  value: "{{workflow.parameters.mysql-auth}}"
                - name: mysql-normal
                  value: "{{workflow.parameters.mysql-normal}}"
                - name: mysql-partition
                  value: "{{workflow.parameters.mysql-partition}}"
                - name: mysql-global
                  value: "{{workflow.parameters.mysql-global}}"
                - name: enable-monitor
                  value: false
                - name: zookeeper-repository
                  value: zookeeper
                - name: zookeeper-tag
                  value: 3.5.5
                - name: zookeeper-replica
                  value: 3
                - name: greatdb-repository
                  value: "{{workflow.parameters.greatdb-repository}}"
                - name: greatdb-tag
                  value: "{{workflow.parameters.greatdb-tag}}"
                - name: greatdb-replica
                  value: 3
                - name: greatdb-serviceHost
                  value: "{{workflow.parameters.host}}"
                - name: greatdb-servicePort
                  value: "{{workflow.parameters.port}}"
        - - name: run-flashbacktest   # step.2 run the test case; replace this with the template of the test case you want to run and set the parameters correctly, mainly the number and size of the tables used by the test
            templateRef:
              name: flashback-test-template
              template: flashback
            arguments:
              parameters:
                - name: user
                  value: "{{workflow.parameters.user}}"
                - name: password
                  value: "{{workflow.parameters.password}}"
                - name: host
                  value: "{{workflow.parameters.host}}"
                - name: port
                  value: "{{workflow.parameters.port}}"
                - name: concurrency
                  value: 16
                - name: size
                  value: 10000
                - name: tables
                  value: 10
                - name: run-time
                  value: "{{workflow.parameters.run-time}}"
                - name: single-statement
                  value: true
                - name: manage-statement
                  value: true
          - name: invoke-chaos-for-flashabck-test   # step.3 inject faults; set the parameters correctly. run-time and interval define how long and how often faults are injected, so the separate step that stops fault injection is omitted here
            templateRef:
              name: chaos-rto-template
              template: chaos-rto
            arguments:
              parameters:
                - name: user
                  value: "{{workflow.parameters.user}}"
                - name: host
                  value: "{{workflow.parameters.host}}"
                - name: password
                  value: "{{workflow.parameters.password}}"
                - name: port
                  value: "{{workflow.parameters.port}}"
                - name: k8s-config
                  value: /root/.kube/config
                - name: namespace
                  value: "{{workflow.parameters.namespace}}"
                - name: clustername
                  value: "{{workflow.parameters.clustername}}"
                - name: prometheus
                  value: ''
                - name: greatdb-job
                  value: greatdb-monitor-greatdb
                - name: nemesis
                  value: "{{workflow.parameters.nemesis}}"
                - name: nemesis-duration
                  value: 1m
                - name: nemesis-mode
                  value: default
                - name: wait-time
                  value: 5m
                - name: check-time
                  value: 5m
                - name: nemesis-scope
                  value: 1
                - name: nemesis-log
                  value: true
                - name: enable-monitor
                  value: false
                - name: run-time
                  value: "{{workflow.parameters.run-time}}"
                - name: interval
                  value: 1m
                - name: monitor-log
                  value: false
                - name: enable-rto
                  value: false
                - name: rto-qps
                  value: 0.1
                - name: rto-warm
                  value: 5m
                - name: rto-time
                  value: 1m
                - name: log-level
                  value: debug
        - - name: flashbacktest-output   # output whether the test case passed
            templateRef:
              name: tools-template
              template: output-result
            arguments:
              parameters:
                - name: info
                  value: "flashback test pass, with nemesis: {{workflow.parameters.nemesis}}"
        - - name: clean-greatdb-cluster   # step.4 clean up the test cluster; the parameters here match those of step.1
            templateRef:
              name: cluster-setup-template
              template: cluster-setup
            arguments:
              parameters:
                - name: namespace
                  value: "{{workflow.parameters.namespace}}"
                - name: clustername
                  value: "{{workflow.parameters.clustername}}"
                - name: mysql-image
                  value: mysql:5.7
                - name: mysql-replica
                  value: 3
                - name: mysql-auth
                  value: "{{workflow.parameters.mysql-auth}}"
                - name: mysql-normal
                  value: "{{workflow.parameters.mysql-normal}}"
                - name: mysql-partition
                  value: "{{workflow.parameters.mysql-partition}}"
                - name: mysql-global
                  value: "{{workflow.parameters.mysql-global}}"
                - name: enable-monitor
                  value: false
                - name: zookeeper-repository
                  value: zookeeper
                - name: zookeeper-tag
                  value: 3.5.5
                - name: zookeeper-replica
                  value: 3
                - name: greatdb-repository
                  value: "{{workflow.parameters.greatdb-repository}}"
                - name: greatdb-tag
                  value: "{{workflow.parameters.greatdb-tag}}"
                - name: greatdb-replica
                  value: 3
                - name: greatdb-serviceHost
                  value: "{{workflow.parameters.host}}"
                - name: greatdb-servicePort
                  value: "{{workflow.parameters.port}}"
                - name: clean
                  value: true
        - - name: echo-result
            templateRef:
              name: tools-template
              template: echo
            arguments:
              parameters:
                - name: info
                  value: "{{item}}"
            withItems:
              - "{{steps.flashbacktest-output.outputs.parameters.result}}"
Enjoy GreatSQL :)