Hadoop Source Code Analysis (3): Startup and Script Walkthrough

Contents

  • 1. Startup
  • 2. Script analysis
    • The content of start-all.sh is as follows:
    • The content of start-dfs.sh is as follows:
    • The hadoop-daemons.sh script called to start the roles above is as follows:
    • Next we look at the hadoop-daemon.sh script.
    • Here we can see that it is actually the hdfs file under Hadoop's bin directory

1. Startup

Hadoop is started through the scripts in its sbin directory. The startup-related scripts are:

  • start-all.sh
  • start-dfs.sh
  • start-yarn.sh
  • hadoop-daemon.sh
  • yarn-daemon.sh

hadoop-daemon.sh starts the HDFS-related services.
yarn-daemon.sh starts the YARN-related services.
start-dfs.sh starts the HDFS cluster.
start-yarn.sh starts the YARN cluster.
start-all.sh starts both the HDFS and YARN clusters.
The scripts whose names begin with start all do their work by calling the two daemon scripts.
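Since start-all.sh is deprecated and simply delegates to the other two start scripts (as we will see below), calling them directly is equivalent. A minimal sketch, assuming a hypothetical installation directory /opt/hadoop:

# /opt/hadoop is a made-up install path; adjust to your environment
cd /opt/hadoop

# equivalent to start-all.sh:
sbin/start-dfs.sh      # HDFS roles: namenode, datanode, secondarynamenode/journalnode/zkfc
sbin/start-yarn.sh     # YARN roles: resourcemanager, nodemanager

# the daemon scripts start a single role on the local machine only:
sbin/hadoop-daemon.sh start namenode
sbin/yarn-daemon.sh start resourcemanager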

2. Script analysis

We start with start-all.sh and then work through the scripts it calls, step by step.

The content of start-all.sh is as follows:
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Start all hadoop daemons.  Run this on master node.

echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh"

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

# start hdfs daemons if hdfs is present
if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then
  "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR
fi

# start yarn daemons if yarn is present
if [ -f "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh ]; then
  "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR
fi

The important part of this script is the two if blocks at the end: the first calls start-dfs.sh and the second calls start-yarn.sh, each only if the corresponding script exists.
We will take start-dfs.sh as the example and continue the analysis downwards.

The content of start-dfs.sh is as follows:
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Start hadoop dfs daemons.
# Optinally upgrade or rollback dfs state.
# Run this on master node.

usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# get arguments
if [[ $# -ge 1 ]]; then
  startOpt="$1"
  shift
  case "$startOpt" in
    -upgrade)
      nameStartOpt="$startOpt"
    ;;
    -rollback)
      dataStartOpt="$startOpt"
    ;;
    *)
      echo $usage
      exit 1
    ;;
  esac
fi

#Add other possible options
nameStartOpt="$nameStartOpt $@"

#---------------------------------------------------------
# namenodes

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

echo "Starting namenodes on [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt

#---------------------------------------------------------
# datanodes (using default slaves file)

if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo \
    "Attempting to start secure cluster, skipping datanodes. " \
    "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --script "$bin/hdfs" start datanode $dataStartOpt
fi

#---------------------------------------------------------
# secondary namenodes (if any)

SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)

if [ -n "$SECONDARY_NAMENODES" ]; then
  echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"

  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$SECONDARY_NAMENODES" \
      --script "$bin/hdfs" start secondarynamenode
fi

#---------------------------------------------------------
# quorumjournal nodes (if any)

SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)

case "$SHARED_EDITS_DIR" in
qjournal://*)
  JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
  echo "Starting journal nodes [$JOURNAL_NODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$JOURNAL_NODES" \
      --script "$bin/hdfs" start journalnode ;;
esac

#---------------------------------------------------------
# ZK Failover controllers, if auto-HA is enabled
AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)
if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then
  echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --hostnames "$NAMENODES" \
    --script "$bin/hdfs" start zkfc
fi

# eof

The script first resolves Hadoop's paths: it locates the bin and libexec directories and sources hdfs-config.sh.
It then parses the arguments passed to the script (-upgrade or -rollback). After that, the HDFS roles are started one by one.
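The options are simply forwarded: -upgrade ends up in $nameStartOpt (namenodes only) and -rollback in $dataStartOpt (datanodes only). A quick sketch:

sbin/start-dfs.sh -upgrade    # forwarded to the namenodes as $nameStartOpt
sbin/start-dfs.sh -rollback   # forwarded to the datanodes as $dataStartOpt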
The namenodes are started first, by calling the hadoop-daemons.sh script. The difference between it and the hadoop-daemon.sh script mentioned earlier is that hadoop-daemons.sh can start a role on the other machines in the cluster, while hadoop-daemon.sh only starts roles on the current machine. In fact, hadoop-daemons.sh itself works by calling hadoop-daemon.sh, which we will analyze shortly. Of the arguments passed to it, the most important are start namenode, which tell it to start the namenode.
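The namenode host list comes from hdfs getconf -namenodes, so the call built by start-dfs.sh expands roughly as follows (nn1 and nn2 are made-up host names):

$ bin/hdfs getconf -namenodes
nn1 nn2

# start-dfs.sh therefore effectively runs (with $NAMENODES expanded):
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" \
    --hostnames "nn1 nn2" --script "$bin/hdfs" start namenode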
Next the datanodes are started, in the same way as the namenodes, except that no --hostnames list is passed, so the default slaves file decides which hosts run a datanode.
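Since no --hostnames option is given here, slaves.sh (see below) falls back to the slaves file in the configuration directory. A hypothetical example of that file:

# etc/hadoop/slaves -- one datanode host per line (made-up host names)
worker1
worker2
worker3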
Then the secondary namenodes are started, if any. When namenode HA is configured, no secondary namenode is started.
Then the journal nodes are started. They are only started when namenode HA is configured, i.e. when dfs.namenode.shared.edits.dir is a qjournal:// URI.
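The journal node host list is extracted from dfs.namenode.shared.edits.dir by the sed expression shown above. With a made-up value the transformation looks like this:

$ echo "qjournal://node1:8485;node2:8485;node3:8485/mycluster" \
    | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g'
node1 node2 node3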
Finally, the ZK failover controllers (zkfc) are started on the namenode hosts. These too are only started when automatic-failover HA is enabled.
With the HA setup configured in part (2) of this series, the roles started here are: namenode, datanode, journalnode and zkfc.
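Whether the zkfc block runs can be checked with the same getconf call the script itself makes (the output below assumes automatic failover is enabled):

$ bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled
true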

The hadoop-daemons.sh script called to start the roles above is as follows:
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Run a Hadoop command on all slave hosts.

usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

The important part of this script is the last line, which involves two scripts: slaves.sh and hadoop-daemon.sh. slaves.sh logs in to the specified servers via ssh and runs the hadoop-daemon.sh script there. We will not analyze slaves.sh in detail here; a simplified sketch of what it does follows.
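For reference, this is only a simplified sketch, not the real slaves.sh (which additionally honours HADOOP_SSH_OPTS, a --hosts override and HADOOP_SLAVE_SLEEP): it reads the slaves file and runs the remaining arguments on each host over ssh.

# simplified sketch of slaves.sh
HOSTLIST="${HADOOP_SLAVES:-${HADOOP_CONF_DIR}/slaves}"
for slave in $(sed 's/#.*$//;/^$/d' "$HOSTLIST"); do
  ssh "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
done
wait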

Next we look at the hadoop-daemon.sh script.
Its content is as follows:
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Runs a Hadoop command as a daemon.
#
# Environment Variables
#
#   HADOOP_CONF_DIR      Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.
#   HADOOP_LOG_DIR       Where log files are stored.  PWD by default.
#   HADOOP_MASTER        host:path where hadoop code should be rsync'd from
#   HADOOP_PID_DIR       The pid files are stored. /tmp by default.
#   HADOOP_IDENT_STRING  A string representing this instance of hadoop. $USER by default
#   HADOOP_NICENESS      The scheduling priority for daemons. Defaults to 0.
##

usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>"

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

# get arguments

#default value
hadoopScript="$HADOOP_PREFIX"/bin/hadoop
if [ "--script" = "$1" ]
  then
    shift
    hadoopScript=$1
    shift
fi
startStop=$1
shift
command=$1
shift

hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
    num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
    while [ $num -gt 1 ]; do
        prev=`expr $num - 1`
        [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
        num=$prev
    done
    mv "$log" "$log.$num";
    fi
}

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi

#Determine if we're starting a privileged NFS, if so, redefine the appropriate variables
if [ "$command" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
  starting_privileged_nfs="true"
fi

if [ "$HADOOP_IDENT_STRING" = "" ]; then
  export HADOOP_IDENT_STRING="$USER"
fi

# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"
fi

if [ ! -w "$HADOOP_LOG_DIR" ] ; then
  mkdir -p "$HADOOP_LOG_DIR"
  chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
fi

if [ "$HADOOP_PID_DIR" = "" ]; then
  HADOOP_PID_DIR=/tmp
fi

# some variables
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}
export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"INFO,RFAS"}
export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"INFO,NullAppender"}
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
HADOOP_STOP_TIMEOUT=${HADOOP_STOP_TIMEOUT:-5}

# Set default scheduling priority
if [ "$HADOOP_NICENESS" = "" ]; then
    export HADOOP_NICENESS=0
fi

case $startStop in

  (start)

    [ -w "$HADOOP_PID_DIR" ] ||  mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_PREFIX"
    fi

    hadoop_rotate_log $log
    echo starting $command, logging to $log
    cd "$HADOOP_PREFIX"
    case $command in
      namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)
        if [ -z "$HADOOP_HDFS_HOME" ]; then
          hdfsScript="$HADOOP_PREFIX"/bin/hdfs
        else
          hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs
        fi
        nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
      ;;
      (*)
        nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
      ;;
    esac
    echo $! > $pid
    sleep 1
    head "$log"
    # capture the ulimit output
    if [ "true" = "$starting_secure_dn" ]; then
      echo "ulimit -a for secure datanode user $HADOOP_SECURE_DN_USER" >> $log
      # capture the ulimit info for the appropriate user
      su --shell=/bin/bash $HADOOP_SECURE_DN_USER -c 'ulimit -a' >> $log 2>&1
    elif [ "true" = "$starting_privileged_nfs" ]; then
        echo "ulimit -a for privileged nfs user $HADOOP_PRIVILEGED_NFS_USER" >> $log
        su --shell=/bin/bash $HADOOP_PRIVILEGED_NFS_USER -c 'ulimit -a' >> $log 2>&1
    else
      echo "ulimit -a for user $USER" >> $log
      ulimit -a >> $log 2>&1
    fi
    sleep 3;
    if ! ps -p $! > /dev/null ; then
      exit 1
    fi
    ;;

  (stop)

    if [ -f $pid ]; then
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
      rm -f $pid
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac

The important part of this script is the case $startStop in block at the end. This is the code that actually starts a service: when the script is called, it receives two important arguments, start/stop and the command (role) to operate on, which are used to start or stop the given service. Taking start as the example, the key line is the nohup invocation of $hdfsScript; that variable is assigned in the inner case branch just above the nohup call.
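Putting it together, for start namenode the start branch boils down to roughly the following (paths use the defaults from the script and are illustrative):

# what hadoop-daemon.sh effectively runs for "start namenode"
nohup nice -n 0 bin/hdfs --config etc/hadoop namenode \
    > logs/hadoop-$USER-namenode-$HOSTNAME.out 2>&1 < /dev/null &
echo $! > /tmp/hadoop-$USER-namenode.pid    # pid file later read by the stop branch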

Here we can see that it is actually the hdfs file under Hadoop's bin directory.
The content of that file is as follows:
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Environment Variables
#
#   JSVC_HOME      home directory of jsvc binary.  Required for starting secure
#                  datanode.
#
#   JSVC_OUTFILE   path to jsvc output file.  Defaults to
#                  $HADOOP_LOG_DIR/jsvc.out.
#
#   JSVC_ERRFILE   path to jsvc error file.  Defaults to $HADOOP_LOG_DIR/jsvc.err.

bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin" > /dev/null; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

function print_usage(){
  echo "Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND"
  echo "       where COMMAND is one of:"
  echo "  dfs                  run a filesystem command on the file systems supported in Hadoop."
  echo "  classpath            prints the classpath"
  echo "  namenode -format     format the DFS filesystem"
  echo "  secondarynamenode    run the DFS secondary namenode"
  echo "  namenode             run the DFS namenode"
  echo "  journalnode          run the DFS journalnode"
  echo "  zkfc                 run the ZK Failover Controller daemon"
  echo "  datanode             run a DFS datanode"
  echo "  dfsadmin             run a DFS admin client"
  echo "  haadmin              run a DFS HA admin client"
  echo "  fsck                 run a DFS filesystem checking utility"
  echo "  balancer             run a cluster balancing utility"
  echo "  jmxget               get JMX exported values from NameNode or DataNode."
  echo "  mover                run a utility to move block replicas across"
  echo "                       storage types"
  echo "  oiv                  apply the offline fsimage viewer to an fsimage"
  echo "  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage"
  echo "  oev                  apply the offline edits viewer to an edits file"
  echo "  fetchdt              fetch a delegation token from the NameNode"
  echo "  getconf              get config values from configuration"
  echo "  groups               get the groups which users belong to"
  echo "  snapshotDiff         diff two snapshots of a directory or diff the"
  echo "                       current directory contents with a snapshot"
  echo "  lsSnapshottableDir   list all snapshottable dirs owned by the current user"
  echo "                       Use -help to see options"
  echo "  portmap              run a portmap service"
  echo "  nfs3                 run an NFS version 3 gateway"
  echo "  cacheadmin           configure the HDFS cache"
  echo "  crypto               configure HDFS encryption zones"
  echo "  storagepolicies      list/get/set block storage policies"
  echo "  version              print the version"
  echo ""
  echo "Most commands print help when invoked w/o parameters."
  # There are also debug commands, but they don't show up in this listing.
}

if [ $# = 0 ]; then
  print_usage
  exit
fi

COMMAND=$1
shift

case $COMMAND in
  # usage flags
  --help|-help|-h)
    print_usage
    exit
    ;;
esac

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_SECURE_DN_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
    fi

    if [ -n "$HADOOP_SECURE_DN_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
    fi

    HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
    starting_secure_dn="true"
  else
    echo "It looks like you're trying to start a secure DN, but \$JSVC_HOME"\
      "isn't set. Falling back to starting insecure DN."
  fi
fi

# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables
if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
    fi

    if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
    fi

    HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
    starting_privileged_nfs="true"
  else
    echo "It looks like you're trying to start a privileged NFS server, but"\
      "\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."
  fi
fi

if [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "zkfc" ] ; then
  CLASS='org.apache.hadoop.hdfs.tools.DFSZKFailoverController'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_ZKFC_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
elif [ "$COMMAND" = "journalnode" ] ; then
  CLASS='org.apache.hadoop.hdfs.qjournal.server.JournalNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOURNALNODE_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "haadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSHAAdmin
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "fsck" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSck
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "balancer" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"
elif [ "$COMMAND" = "mover" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.mover.Mover
  HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_MOVER_OPTS}"
elif [ "$COMMAND" = "storagepolicies" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.StoragePolicyAdmin
elif [ "$COMMAND" = "jmxget" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.JMXGet
elif [ "$COMMAND" = "oiv" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB
elif [ "$COMMAND" = "oiv_legacy" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer
elif [ "$COMMAND" = "oev" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer
elif [ "$COMMAND" = "fetchdt" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher
elif [ "$COMMAND" = "getconf" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.GetConf
elif [ "$COMMAND" = "groups" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.GetGroups
elif [ "$COMMAND" = "snapshotDiff" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.snapshot.SnapshotDiff
elif [ "$COMMAND" = "lsSnapshottableDir" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.snapshot.LsSnapshottableDir
elif [ "$COMMAND" = "portmap" ] ; then
  CLASS=org.apache.hadoop.portmap.Portmap
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_PORTMAP_OPTS"
elif [ "$COMMAND" = "nfs3" ] ; then
  CLASS=org.apache.hadoop.hdfs.nfs.nfs3.Nfs3
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NFS3_OPTS"
elif [ "$COMMAND" = "cacheadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.CacheAdmin
elif [ "$COMMAND" = "crypto" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.CryptoAdmin
elif [ "$COMMAND" = "version" ] ; then
  CLASS=org.apache.hadoop.util.VersionInfo
elif [ "$COMMAND" = "debug" ]; then
  CLASS=org.apache.hadoop.hdfs.tools.DebugAdmin
elif [ "$COMMAND" = "classpath" ]; then
  if [ "$#" -gt 0 ]; then
    CLASS=org.apache.hadoop.util.Classpath
  else
    # No need to bother starting up a JVM for this simple case.
    if $cygwin; then
      CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)
    fi
    echo $CLASSPATH
    exit 0
  fi
else
  CLASS="$COMMAND"
fi

# cygwin path translation
if $cygwin; then
  CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)
  HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_LOG_DIR" 2>/dev/null)
  HADOOP_PREFIX=$(cygpath -w "$HADOOP_PREFIX" 2>/dev/null)
  HADOOP_CONF_DIR=$(cygpath -w "$HADOOP_CONF_DIR" 2>/dev/null)
  HADOOP_COMMON_HOME=$(cygpath -w "$HADOOP_COMMON_HOME" 2>/dev/null)
  HADOOP_HDFS_HOME=$(cygpath -w "$HADOOP_HDFS_HOME" 2>/dev/null)
  HADOOP_YARN_HOME=$(cygpath -w "$HADOOP_YARN_HOME" 2>/dev/null)
  HADOOP_MAPRED_HOME=$(cygpath -w "$HADOOP_MAPRED_HOME" 2>/dev/null)
fi

export CLASSPATH=$CLASSPATH

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi

  JSVC=$JSVC_HOME/jsvc
  if [ ! -f $JSVC ]; then
    echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes. "
    echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\
      "and set JSVC_HOME to the directory containing the jsvc binary."
    exit
  fi

  if [[ ! $JSVC_OUTFILE ]]; then
    JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"
  fi

  if [[ ! $JSVC_ERRFILE ]]; then
    JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"
  fi

  exec "$JSVC" \
           -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
           -errfile "$JSVC_ERRFILE" \
           -pidfile "$HADOOP_SECURE_DN_PID" \
           -nodetach \
           -user "$HADOOP_SECURE_DN_USER" \
           -cp "$CLASSPATH" \
           $JAVA_HEAP_MAX $HADOOP_OPTS \
           org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
elif [ "$starting_privileged_nfs" = "true" ] ; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_PRIVILEGED_NFS_PID="/tmp/hadoop_privileged_nfs3.pid"
  else
    HADOOP_PRIVILEGED_NFS_PID="$HADOOP_PID_DIR/hadoop_privileged_nfs3.pid"
  fi

  JSVC=$JSVC_HOME/jsvc
  if [ ! -f $JSVC ]; then
    echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run privileged NFS gateways. "
    echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\
      "and set JSVC_HOME to the directory containing the jsvc binary."
    exit
  fi

  if [[ ! $JSVC_OUTFILE ]]; then
    JSVC_OUTFILE="$HADOOP_LOG_DIR/nfs3_jsvc.out"
  fi

  if [[ ! $JSVC_ERRFILE ]]; then
    JSVC_ERRFILE="$HADOOP_LOG_DIR/nfs3_jsvc.err"
  fi

  exec "$JSVC" \
           -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
           -errfile "$JSVC_ERRFILE" \
           -pidfile "$HADOOP_PRIVILEGED_NFS_PID" \
           -nodetach \
           -user "$HADOOP_PRIVILEGED_NFS_USER" \
           -cp "$CLASSPATH" \
           $JAVA_HEAP_MAX $HADOOP_OPTS \
           org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter "$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
fi

The important parts of this script are the long if/elif chain in the middle and the exec at the very end. The if/elif chain looks long, but its logic is simple: based on the COMMAND that was passed in, it assigns CLASS and HADOOP_OPTS. The final exec then runs that class. Taking namenode as the example, CLASS is set to org.apache.hadoop.hdfs.server.namenode.NameNode; this is a Java class, and executing it starts the namenode.
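So for the namenode, the whole call chain finally reduces to a plain java invocation, roughly like this (classpath and most -D options omitted; HADOOP_OPTS already contains HADOOP_NAMENODE_OPTS at this point):

exec "$JAVA" -Dproc_namenode $JAVA_HEAP_MAX $HADOOP_OPTS \
    org.apache.hadoop.hdfs.server.namenode.NameNode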
If you want to debug the HDFS source code, this is the best place to hook in remote debugging: each service has its own class and startup options here, so you can target exactly the service you need.
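One common way to do that (a sketch beyond what this article shows; port 8888 is an arbitrary choice) is to append the JDWP agent options to the per-service *_OPTS variable in etc/hadoop/hadoop-env.sh, which the hdfs script then folds into HADOOP_OPTS as shown above:

# etc/hadoop/hadoop-env.sh
# suspend=y makes the namenode wait until a debugger attaches on port 8888
export HADOOP_NAMENODE_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8888 $HADOOP_NAMENODE_OPTS"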
The above is the full content of Hadoop Source Code Analysis (3): Startup and Script Walkthrough. The next article in this series is Hadoop Source Code Analysis (4): Remote Debugging. For more material on Hadoop source code analysis, please keep following 脚本之家 and its other related articles!
