node_exporter: a source-code analysis of host disk monitoring and a problem diagnosis
node_exporter is deployed as a Pod and monitors host metrics such as CPU, memory, and disk.
A Pod's isolated runtime environment would interfere with host monitoring, so node_exporter should share namespaces with the host as much as possible. The usual configuration is:
hostNetwork: true
hostPID: true
This article focuses on how node_exporter measures host disk partition usage.
The user node_exporter runs as
In a Dockerfile, USER specifies the user the process runs as; if unspecified, it defaults to root.
node_exporter's Dockerfile shows that its default user is nobody, with UID 65534:
......
COPY ./node_exporter /bin/node_exporter
EXPOSE 9100
USER nobody
ENTRYPOINT [ "/bin/node_exporter" ]
In node_exporter's daemonset.yaml, the securityContext is configured as:
......
hostNetwork: true
hostPID: true
securityContext:
  runAsNonRoot: true
  runAsUser: 65534
......
About these securityContext settings:
- runAsUser: 65534 makes the container process run as UID 65534 (nobody), overriding the image's USER if they differ;
- runAsNonRoot: true makes the kubelet validate at runtime that the container does not run as UID 0, and refuse to start it if it does.
# kubectl explain daemonset.spec.template.spec.securityContext.runAsNonRoot
KIND:     DaemonSet
VERSION:  apps/v1

FIELD:    runAsNonRoot

DESCRIPTION:
Indicates that the container must run as a non-root user. If true, the
Kubelet will validate the image at runtime to ensure that it does not run
as UID 0 (root) and fail to start the container if it does. If unset or
false, no such validation will be performed. May also be set in
SecurityContext. If set in both SecurityContext and PodSecurityContext, the
value specified in SecurityContext takes precedence.
So node_exporter explicitly runs as the non-root user nobody.
How node_exporter reads host disk partitions

1) Mount the host's /proc and / into the container
- the host's /proc directory is mounted at /host/proc inside the container;
- the host's / directory is mounted at /host/root inside the container:
spec:
  template:
    spec:
      containers:
      - name: node-exporter
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
      volumes:
      - hostPath:
          path: /proc
        name: proc
      - hostPath:
          path: /
        name: root
2) node_exporter reads the partition list
It reads /host/proc/1/mounts inside the container, i.e. the mount information of PID 1, which (thanks to hostPID) is a host process:
// node_exporter/collector/filesystem_linux.go
func mountPointDetails() ([]filesystemLabels, error) {
	file, err := os.Open(procFilePath("1/mounts"))
	if os.IsNotExist(err) {
		// Fallback to `/proc/mounts` if `/proc/1/mounts` is missing due to hidepid.
		log.Debugf("Got %q reading root mounts, falling back to system mounts", err)
		file, err = os.Open(procFilePath("mounts"))
	}
	if err != nil {
		return nil, err
	}
	defer file.Close()

	return parseFilesystemLabels(file)
}
File contents:
cat /host/proc/1/mounts
/dev/sda1 /root/workspace xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/nvme0n1p2 /boot ext3 rw,relatime 0 0
/dev/nvme0n1p1 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=936,iocharset=cp936,shortname=winnt,errors=remount-ro 0 0
/dev/nvme0n1p3 / xfs rw,relatime,attr2,inode64,noquota 0 0
Parsing the file contents:
func parseFilesystemLabels(r io.Reader) ([]filesystemLabels, error) {
	var filesystems []filesystemLabels

	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		parts := strings.Fields(scanner.Text())
		if len(parts) < 4 {
			return nil, fmt.Errorf("malformed mount point information: %q", scanner.Text())
		}

		// Ensure we handle the translation of \040 and \011
		// as per fstab(5).
		parts[1] = strings.Replace(parts[1], "\\040", " ", -1)
		parts[1] = strings.Replace(parts[1], "\\011", "\t", -1)

		filesystems = append(filesystems, filesystemLabels{
			device:     parts[0],
			mountPoint: parts[1],
			fsType:     parts[2],
			options:    parts[3],
		})
	}
	return filesystems, scanner.Err()
}
3) Query partition size and usage
- first, read the list of mount points;
- then, for each mount point, call the statfs() system call to get its size and usage;
- if statfs() fails, record deviceError=1 for that partition.
// node_exporter/collector/filesystem_linux.go
// GetStats returns filesystem stats.
func (c *filesystemCollector) GetStats() ([]filesystemStats, error) {
	mps, err := mountPointDetails()
	if err != nil {
		return nil, err
	}
	stats := []filesystemStats{}
	for _, labels := range mps {
		......
		// The success channel is used to tell the "watcher" that the stat
		// finished successfully. The channel is closed on success.
		success := make(chan struct{})
		go stuckMountWatcher(labels.mountPoint, success)

		// Call statfs() on the mount point and store the result in buf.
		buf := new(syscall.Statfs_t)
		err = syscall.Statfs(rootfsFilePath(labels.mountPoint), buf)
		close(success)
		if err != nil {
			stats = append(stats, filesystemStats{
				labels:      labels,
				deviceError: 1,
			})
			log.Debugf("Error on statfs() system call for %q: %s", rootfsFilePath(labels.mountPoint), err)
			continue
		}

		var ro float64
		for _, option := range strings.Split(labels.options, ",") {
			if option == "ro" {
				ro = 1
				break
			}
		}

		stats = append(stats, filesystemStats{
			labels:    labels,
			size:      float64(buf.Blocks) * float64(buf.Bsize),
			free:      float64(buf.Bfree) * float64(buf.Bsize),
			avail:     float64(buf.Bavail) * float64(buf.Bsize),
			files:     float64(buf.Files),
			filesFree: float64(buf.Ffree),
			ro:        ro,
		})
	}
	return stats, nil
}
The problem: node_exporter cannot report the host partition /root/workspace
On the host, /dev/sda1 is mounted at /root/workspace, but querying node_filesystem_size_bytes returns nothing for this partition.
However, node_filesystem_device_error does report it:
node_filesystem_device_error{device="/dev/sda1",endpoint="https",fstype="xfs",instance="master1",job="node-exporter",mountpoint="/root/workspace",namespace="monitoring",pod="node-exporter-69hpl",service="node-exporter"}1
From the code above, the partition was read from the mount list, but the statfs() call on it failed.
Looking inside the container:
- the host's / is mounted at the container's /host/root;
- so the host's /root/workspace should appear at /host/root/root/workspace:
/host/root $ ls root
ls: can't open 'root': Permission denied
The cause: the container's user has no permission to read the host's /root directory.
The fix: run node_exporter as root
Modify node-exporter-daemonset.yaml so the container runs as root:
......
securityContext:
  runAsUser: 0
......
This resolves the problem, though running node_exporter as root carries a security risk. Note that runAsNonRoot must also be removed or set to false; otherwise, per the kubectl explain output above, the kubelet will refuse to start a container running as UID 0.
Note that if a host partition is not mounted under /root/..., node_exporter does not need to run as root, and the configuration above does not need to change, since statfs() can then read the partition's size and usage.
In practice, partitions are rarely mounted under /root/..., so this problem is seldom encountered.