【HBase | "Slow ReadProcessor read fields" in the HBase logs】In one environment, the HBase logs kept reporting "Slow ReadProcessor read fields". The relevant explanation is that this problem is mainly caused by HDFS: HBase is only acting as an HDFS client while persisting its data, so the message has little to do with HBase itself. To find out which part of the HDFS write pipeline is slow, analyze the DataNode logs with the following command:
egrep -o "Slow.*?(took|cost)" /path/to/current/datanode/log | sort | uniq -c
Typical output looks like this:
23 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow BlockReceiver write data to disk cost
30 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow BlockReceiver write packet to mirror took
42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow flushOrSync took
63 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow BlockReceiver write data to disk cost
273 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow BlockReceiver write packet to mirror took
97 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow flushOrSync took
3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow manageWriterOsCache took
3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow PacketResponder send ack to upstream took
11 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow BlockReceiver write data to disk cost
232 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow BlockReceiver write packet to mirror took
87 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow flushOrSync took
3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow PacketResponder send ack to upstream took
936 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow BlockReceiver write data to disk cost
401432 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow BlockReceiver write packet to mirror took
42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow flushOrSync took
4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow manageWriterOsCache took
8 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow PacketResponder send ack to upstream took
128 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow BlockReceiver write data to disk cost
46404 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow BlockReceiver write packet to mirror took
68 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow flushOrSync took
4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow manageWriterOsCache took
9 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow PacketResponder send ack to upstream took
70 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow BlockReceiver write data to disk cost
143 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow BlockReceiver write packet to mirror took
28 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow flushOrSync took
12 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow manageWriterOsCache took
4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow PacketResponder send ack to upstream took
92 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow BlockReceiver write data to disk cost
187 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow BlockReceiver write packet to mirror took
181 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow flushOrSync took
15 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow manageWriterOsCache took
3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow PacketResponder send ack to upstream took
24 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow BlockReceiver write data to disk cost
61 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow BlockReceiver write packet to mirror took
102 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow flushOrSync took
14 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow BlockReceiver write data to disk cost
42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow BlockReceiver write packet to mirror took
74 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow flushOrSync took
19 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow BlockReceiver write data to disk cost
42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow BlockReceiver write packet to mirror took
65 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow flushOrSync took
2 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow PacketResponder send ack to upstream took
10 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow BlockReceiver write data to disk cost
177 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow BlockReceiver write packet to mirror took
2 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow flushOrSync took
Slow BlockReceiver write data to disk cost: indicates a delay while writing the block to the OS cache or to disk
Slow BlockReceiver write packet to mirror took: indicates a delay while writing the block over the network (to the next DataNode in the pipeline)
Slow manageWriterOsCache took: indicates a delay while writing the block to the OS cache or to disk
Slow flushOrSync took: indicates a delay while writing the block to the OS cache or to disk
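The counts above only say how often each category fires; it also helps to see how long the slow operations actually took. A minimal sketch for pulling out the worst durations per category, assuming the usual DataNode log format where the duration appears right after the message (e.g. "took 1254ms (threshold=300ms)"); the log path is the same placeholder used above:

# For each slow-message category, list the five longest delays seen in the logs.
# Assumes lines like "... Slow flushOrSync took 1254ms (threshold=300ms)".
for pattern in "write data to disk" "write packet to mirror" "flushOrSync" "manageWriterOsCache"; do
  echo "===== $pattern ====="
  grep "Slow" /path/to/current/datanode/log* \
    | grep "$pattern" \
    | grep -oE '[0-9]+ms' \
    | sort -n \
    | tail -5
done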
Analysis approach: if one or more categories of "Slow" messages on a single node are an order of magnitude more frequent than on the other hosts, check that node's underlying hardware. In the output above, for example, "Slow BlockReceiver write packet to mirror took" dominates (401432 occurrences in log.out.3 alone), which points toward the network rather than the local disks.
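To make that cross-host comparison concrete, the same egrep can be run on every DataNode and the totals compared side by side. A rough sketch, assuming passwordless ssh; the hostnames and log path are placeholders:

# Count slow-message categories on each DataNode (hostnames and log path are placeholders).
for host in dn01.example.com dn02.example.com dn03.example.com; do
  echo "===== $host ====="
  ssh "$host" 'egrep -o "Slow.*?(took|cost)" /path/to/current/datanode/log* | sort | uniq -c | sort -rn'
done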
If the most frequent message is "Slow BlockReceiver write packet to mirror took", investigate possible network problems with the output of the following commands (a combined triage sketch follows this list):
- ifconfig -a (check the problem host periodically for growing errors and dropped counts, which usually point to the NIC, the cable, or the upstream network)
- netstat -s (compare against a healthy node; look for large numbers of retransmitted packets or other abnormally high counters)
- netstat -s | grep -i retrans (run across the whole cluster; look for counts that are higher than normal on one or more nodes)
For the disk-related messages (write data to disk, flushOrSync, manageWriterOsCache), check:
- iostat (high iowait percentage, above 15%)
- iostat -x and sar -d (high await or %util on specific devices/partitions)
- dmesg (disk errors)
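A rough one-shot triage wrapping the checks above, to run on a suspect node and compare against a known-good node (sketch only; the grep filters and iostat sampling are illustrative):

# Quick triage of the network and disk indicators listed above.
echo "--- NIC error/drop counters ---"
ifconfig -a | grep -iE 'errors|dropped'      # rerun plain ifconfig -a to see which interface
echo "--- TCP retransmissions ---"
netstat -s | grep -i retrans
echo "--- Per-device await / %util (3 samples, 5s apart) ---"
iostat -x 5 3
echo "--- Recent kernel messages mentioning errors ---"
dmesg | grep -iE 'error|fail' | tail -20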
Run a health check on the disks with smartctl: stop all Hadoop processes on the affected node, then run sudo smartctl -H /dev/ for each disk that HDFS uses.
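A sketch of that smartctl pass, with placeholder device names; the real devices are whichever block devices back the DataNode data directories (dfs.datanode.data.dir):

# SMART health check for each disk used by HDFS (device list is a placeholder;
# stop all Hadoop processes on the node before running).
for dev in /dev/sda /dev/sdb /dev/sdc; do
  echo "===== $dev ====="
  sudo smartctl -H "$dev"
done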