HBase | HBase logs report "Slow ReadProcessor read fields"

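In one environment, the HBase logs reported "Slow ReadProcessor read fields". According to the available explanations, this warning is mainly caused by HDFS: HBase acts as a client that writes data to HDFS for persistence, so the problem has little to do with HBase itself. To find out which part of HDFS is responsible, analyze the DataNode logs with the following command: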

egrep -o "Slow.*?(took|cost)" /path/to/current/datanode/log | sort | uniq -c

Typical output, here from a run over several rotated log files (hence the file name prefix on each line):
     23 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow BlockReceiver write data to disk cost
     30 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow BlockReceiver write packet to mirror took
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.10:Slow flushOrSync took
     63 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow BlockReceiver write data to disk cost
    273 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow BlockReceiver write packet to mirror took
     97 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow flushOrSync took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow manageWriterOsCache took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.1:Slow PacketResponder send ack to upstream took
     11 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow BlockReceiver write data to disk cost
    232 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow BlockReceiver write packet to mirror took
     87 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow flushOrSync took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.2:Slow PacketResponder send ack to upstream took
    936 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow BlockReceiver write data to disk cost
 401432 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow BlockReceiver write packet to mirror took
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow flushOrSync took
      4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow manageWriterOsCache took
      8 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.3:Slow PacketResponder send ack to upstream took
    128 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow BlockReceiver write data to disk cost
  46404 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow BlockReceiver write packet to mirror took
     68 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow flushOrSync took
      4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow manageWriterOsCache took
      9 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.4:Slow PacketResponder send ack to upstream took
     70 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow BlockReceiver write data to disk cost
    143 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow BlockReceiver write packet to mirror took
     28 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow flushOrSync took
     12 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow manageWriterOsCache took
      4 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.5:Slow PacketResponder send ack to upstream took
     92 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow BlockReceiver write data to disk cost
    187 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow BlockReceiver write packet to mirror took
    181 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow flushOrSync took
     15 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow manageWriterOsCache took
      3 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.6:Slow PacketResponder send ack to upstream took
     24 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow BlockReceiver write data to disk cost
     61 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow BlockReceiver write packet to mirror took
    102 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.7:Slow flushOrSync took
     14 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow BlockReceiver write data to disk cost
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow BlockReceiver write packet to mirror took
     74 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.8:Slow flushOrSync took
     19 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow BlockReceiver write data to disk cost
     42 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow BlockReceiver write packet to mirror took
     65 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow flushOrSync took
      2 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out.9:Slow PacketResponder send ack to upstream took
     10 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow BlockReceiver write data to disk cost
    177 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow BlockReceiver write packet to mirror took
      2 hadoop-cmf-hdfs-DATANODE-phjrdnode02.esgyn.cn.log.out:Slow flushOrSync took

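Slow BlockReceiver write data to disk cost: indicates a delay when writing the block to the OS cache or to disk
Slow BlockReceiver write packet to mirror took: indicates a delay when writing the block over the network
Slow manageWriterOsCache took: indicates a delay when writing the block to the OS cache or to disk
Slow flushOrSync took: indicates a delay when writing the block to the OS cache or to disk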
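
Because the counts are reported per rotated log file, it can help to merge the files and total each message type before reading the results. The following is a minimal sketch and not part of the original procedure: slow_summary is a helper name made up for illustration, and the path in the usage comment is the same placeholder used in the command above.

```shell
# Total each "Slow ..." message type across all DataNode log files passed in.
# slow_summary is a hypothetical helper name, not an HDFS tool.
# The body runs in a subshell, so nothing leaks into the calling shell.
slow_summary() (
  # -h drops the "file:" prefix so rotated files are merged,
  # -o keeps only the matched text, e.g. "Slow flushOrSync took".
  grep -Eho 'Slow.*(took|cost)' "$@" | sort | uniq -c | sort -rn
)

# Example (placeholder path):
#   slow_summary /path/to/current/datanode/log*
```

The most frequent message type ends up on the first line, which is the one that decides whether to look at the network or at the disks.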
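Some analysis approaches

If a single node shows one or more categories of "Slow" messages at counts that are orders of magnitude higher than on the other hosts, check the underlying hardware of that node for problems.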
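
The cross-host comparison can be sketched as follows. This is only an illustration: flag_outliers is a hypothetical helper, the default factor of 10 is one reading of "orders of magnitude", and the host names in the example are invented. It reads lines of the form "host count" and prints the hosts whose count is at least the given factor times the median.

```shell
# Reads "host count" lines on stdin; prints hosts whose count is at least
# factor x the median count. The factor of 10 is an assumed default.
flag_outliers() (
  factor="${1:-10}"
  sort -k2,2n | awk -v f="$factor" '
    { host[NR] = $1; cnt[NR] = $2 }
    END {
      if (NR == 0) exit
      med = cnt[int((NR + 1) / 2)]   # lower median of the sorted counts
      if (med < 1) med = 1           # avoid a zero baseline (assumption)
      for (i = 1; i <= NR; i++)
        if (cnt[i] >= f * med) print host[i], cnt[i]
    }'
)

# Example with invented hosts:
#   printf '%s\n' 'node01 12' 'node02 4100' 'node03 9' | flag_outliers
```

Using the median rather than the mean keeps one very noisy node from raising the baseline it is compared against.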
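If the most frequent Slow message is "Slow BlockReceiver write packet to mirror took", investigate possible network problems using the output of the following commands:
  1. ifconfig -a (check periodically whether the errors and dropped counts on the affected host keep increasing; growth usually points to a problem with the NIC, the cable, or the upstream network)
  2. netstat -s (compared with a healthy node, look for a large number of retransmitted packets or other abnormally high metrics)
  3. netstat -s | grep -i retrans (run on the whole cluster; look for counts that are higher than normal on one or more nodes)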
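
To watch whether the error and drop counters keep growing, comparable per-interface counters can also be read from Linux sysfs. A minimal sketch, assuming a Linux host: nic_counters is a hypothetical helper and eth0 in the usage comment is a placeholder interface name.

```shell
# Print RX/TX error and drop counters for one interface from Linux sysfs.
# $1 = interface name, $2 = sysfs root (defaults to /sys/class/net).
# nic_counters is a hypothetical helper; the body runs in a subshell.
nic_counters() (
  iface="$1"; root="${2:-/sys/class/net}"; out=""
  for name in rx_errors tx_errors rx_dropped tx_dropped; do
    out="$out$name=$(cat "$root/$iface/statistics/$name") "
  done
  printf '%s\n' "${out% }"   # strip the trailing space
)

# Example: sample once a minute and look for counters that keep increasing.
#   while sleep 60; do date; nic_counters eth0; done
```

A single reading says little, since the counters are cumulative since boot; what matters is whether they rise between samples.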
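If the most frequent Slow messages are of any other type, check for disk problems with the following commands:
  1. iostat (high iowait percentage, above 15%)
  2. iostat -x and sar -d (high await or %util on specific partitions)
  3. dmesg (disk errors)
  4. smartctl, to health-check the disks: stop all Hadoop processes on the affected node, then run sudo smartctl -H /dev/ to check each disk used by HDFS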
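
The iostat -x check can be partly automated by picking out the devices whose %util is above a cutoff. A minimal sketch: busy_disks is a hypothetical helper, the default cutoff of 80 is an assumption rather than a value from this article, and the column is located by its header name because the iostat layout differs between sysstat versions.

```shell
# Reads `iostat -x` output on stdin; prints "device %util" for every device
# whose %util is above the cutoff ($1, default 80, an assumed value).
# busy_disks is a hypothetical helper; the body runs in a subshell.
busy_disks() (
  cutoff="${1:-80}"
  awk -v t="$cutoff" '
    /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    NF == 0   { col = 0; next }          # a blank line ends the device table
    col && NF >= col && $col + 0 > t { print $1, $col }
  '
)

# Example: three reports at 5-second intervals, flag devices above 90 %util.
#   iostat -x 5 3 | busy_disks 90
```

The first iostat report shows averages since boot, so a device that only appears in the later reports is busy right now rather than historically.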
