Life is finite, but knowledge is infinite. This article walks step by step through the official Hadoop WordCount example; I hope it helps you.
Contents
- 1. Requirement
- 2. Requirement Analysis
- 3. Project Structure
- 4. Project Dependencies
- 5. Writing the Mapper
- 6. Writing the Reducer
- 7. Writing the Driver
- 8. Run Results
1. Requirement
Count and output the total number of occurrences of each word in a given text file.
hello.txt
hadoop hadoop
ss ss
cls cls
jiao
banzhang
xue
2. Requirement Analysis
(figure: requirement analysis)
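In outline: the Mapper splits each input line on spaces and emits a (word, 1) pair for every word; the framework's shuffle phase then groups the pairs by word and sorts the keys; the Reducer sums each group into a final count. Tracing hello.txt through those stages by hand (a worked check, not program output) gives:
banzhang 1
cls 2
hadoop 2
jiao 1
ss 2
xue 1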
3. Project Structure
(figure: project structure)
4. Project Dependencies
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>
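A note on versions: the three Hadoop artifacts are deliberately kept on the same version (2.7.2); mixing Hadoop versions on the classpath is a common source of NoSuchMethodError-style failures. hadoop-client also transitively pulls in the MapReduce client jars, so no separate mapreduce dependency is needed here.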
5. Writing the Mapper
package wordcount_hdfs;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // 0. Create these objects once as fields, so map() does not recreate them for every record
    private Text text = new Text();
    private IntWritable i = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Convert Hadoop's Text type to a plain String for easier handling
        String string = value.toString();
        // 2. Split the line into words
        String[] split = string.split(" ");
        // 3. For each word in the array, emit the pair (word, 1)
        for (String s : split) {
            text.set(s);
            context.write(text, i);
        }
    }
}
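Two details worth noting: text and i are fields set once and reused via text.set(s), so map() does not allocate new Writable objects for every line; and the Mapper only emits (word, 1) pairs, so for the line "hadoop hadoop" it writes (hadoop, 1) twice, with no counting done at this stage.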
6. Writing the Reducer
package wordcount_hdfs;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Accumulate the values of every (key, value) pair for this key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total.set(sum);
        // Finally, write the (word, count) pair to the output file
        context.write(key, total);
    }
}
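Before reduce() is called, the framework has already grouped the map output by key, so for the key hadoop the values iterable holds (1, 1) and the method writes (hadoop, 2). The total field is reused across calls, mirroring the object-reuse pattern in the Mapper.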
7. Writing the Driver
package wordcount_hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 0 Specify the paths; either of these two styles works:
        args = new String[]{"E:\\Hadoop\\src\\main\\resources\\input", "E:\\Hadoop\\src\\main\\resources\\ouput"};
        //args = new String[]{"E:/Hadoop/src/main/resources/", "E:/Hadoop/src/main/resources/"};

        // 1 Get the configuration and create the Job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2 Set the Driver jar load path: setJarByClass
        job.setJarByClass(WordCountDriver.class);

        // 3 Set the Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5 Set the final output key/value types (the Reducer's output types)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6 Set the local input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7 Submit the job and wait for completion
        boolean completion = job.waitForCompletion(true);
        System.exit(completion ? 0 : 1);
    }
}
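One practical caveat when rerunning locally: FileOutputFormat throws an exception if the output directory already exists, so a second run fails until you delete that directory. A minimal sketch of an optional guard that could go right before step 6 (my addition, not part of the original driver):

// Hypothetical addition, not in the original driver: remove a pre-existing
// output directory so reruns don't fail on FileAlreadyExistsException.
// Requires: import org.apache.hadoop.fs.FileSystem;
FileSystem fs = FileSystem.get(configuration);
Path outputPath = new Path(args[1]);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // true = delete recursively
}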
8. Run Results
(figure: run results)
If the console shows output like the following, everything worked; then check the result file.
"D:\\Program Files\\Java\\bin\\java.exe" "-javaagent:D:\\office\\Program Files\\IntelliJ2018.2.6\\lib\\idea_rt.jar=53527:D:\\office\\Program Files\\IntelliJ2018.2.6\\bin" -Dfile.encoding=UTF-8 -classpath "D:\\Program Files\\Java\\jre\\lib\\charsets.jar;
D:\\Program Files\\Java\\jre\\lib\\deploy.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\access-bridge-64.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\cldrdata.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\dnsns.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\jaccess.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\jfxrt.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\localedata.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\nashorn.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\sunec.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\sunjce_provider.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\sunmscapi.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\sunpkcs11.jar;
D:\\Program Files\\Java\\jre\\lib\\ext\\zipfs.jar;
D:\\Program Files\\Java\\jre\\lib\\javaws.jar;
D:\\Program Files\\Java\\jre\\lib\\jce.jar;
D:\\Program Files\\Java\\jre\\lib\\jfr.jar;
D:\\Program Files\\Java\\jre\\lib\\jfxswt.jar;
D:\\Program Files\\Java\\jre\\lib\\jsse.jar;
D:\\Program Files\\Java\\jre\\lib\\management-agent.jar;
D:\\Program Files\\Java\\jre\\lib\\plugin.jar;
D:\\Program Files\\Java\\jre\\lib\\resources.jar;
D:\\Program Files\\Java\\jre\\lib\\rt.jar;
E:\\Hadoop\\target\\classes;
C:\\Users\\Administrator\\.m2\\repository\\junit\\junit\\4.13.1\\junit-4.13.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\hamcrest\\hamcrest-core\\1.3\\hamcrest-core-1.3.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\logging\\log4j\\log4j-core\\2.8.2\\log4j-core-2.8.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\logging\\log4j\\log4j-api\\2.8.2\\log4j-api-2.8.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-common\\2.7.2\\hadoop-common-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-annotations\\2.7.2\\hadoop-annotations-2.7.2.jar;
D:\\Program Files\\Java\\lib\\tools.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\google\\guava\\guava\\11.0.2\\guava-11.0.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-cli\\commons-cli\\1.2\\commons-cli-1.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\commons\\commons-math3\\3.1.1\\commons-math3-3.1.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\xmlenc\\xmlenc\\0.52\\xmlenc-0.52.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-httpclient\\commons-httpclient\\3.1\\commons-httpclient-3.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-codec\\commons-codec\\1.4\\commons-codec-1.4.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-io\\commons-io\\2.4\\commons-io-2.4.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-net\\commons-net\\3.1\\commons-net-3.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-collections\\commons-collections\\3.2.2\\commons-collections-3.2.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\javax\\servlet\\servlet-api\\2.5\\servlet-api-2.5.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\mortbay\\jetty\\jetty\\6.1.26\\jetty-6.1.26.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\mortbay\\jetty\\jetty-util\\6.1.26\\jetty-util-6.1.26.jar;
C:\\Users\\Administrator\\.m2\\repository\\javax\\servlet\\jsp\\jsp-api\\2.1\\jsp-api-2.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\sun\\jersey\\jersey-core\\1.9\\jersey-core-1.9.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\sun\\jersey\\jersey-json\\1.9\\jersey-json-1.9.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\codehaus\\jettison\\jettison\\1.1\\jettison-1.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\sun\\xml\\bind\\jaxb-impl\\2.2.3-1\\jaxb-impl-2.2.3-1.jar;
C:\\Users\\Administrator\\.m2\\repository\\javax\\xml\\bind\\jaxb-api\\2.2.2\\jaxb-api-2.2.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\javax\\xml\\stream\\stax-api\\1.0-2\\stax-api-1.0-2.jar;
C:\\Users\\Administrator\\.m2\\repository\\javax\\activation\\activation\\1.1\\activation-1.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\codehaus\\jackson\\jackson-jaxrs\\1.8.3\\jackson-jaxrs-1.8.3.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\codehaus\\jackson\\jackson-xc\\1.8.3\\jackson-xc-1.8.3.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\sun\\jersey\\jersey-server\\1.9\\jersey-server-1.9.jar;
C:\\Users\\Administrator\\.m2\\repository\\asm\\asm\\3.1\\asm-3.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-logging\\commons-logging\\1.1.3\\commons-logging-1.1.3.jar;
C:\\Users\\Administrator\\.m2\\repository\\log4j\\log4j\\1.2.17\\log4j-1.2.17.jar;
C:\\Users\\Administrator\\.m2\\repository\\net\\java\\dev\\jets3t\\jets3t\\0.9.0\\jets3t-0.9.0.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\httpcomponents\\httpclient\\4.1.2\\httpclient-4.1.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\httpcomponents\\httpcore\\4.1.2\\httpcore-4.1.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\jamesmurty\\utils\\java-xmlbuilder\\0.4\\java-xmlbuilder-0.4.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-lang\\commons-lang\\2.6\\commons-lang-2.6.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-configuration\\commons-configuration\\1.6\\commons-configuration-1.6.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-digester\\commons-digester\\1.8\\commons-digester-1.8.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-beanutils\\commons-beanutils\\1.7.0\\commons-beanutils-1.7.0.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-beanutils\\commons-beanutils-core\\1.8.0\\commons-beanutils-core-1.8.0.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\slf4j\\slf4j-api\\1.7.10\\slf4j-api-1.7.10.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\slf4j\\slf4j-log4j12\\1.7.10\\slf4j-log4j12-1.7.10.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\codehaus\\jackson\\jackson-core-asl\\1.9.13\\jackson-core-asl-1.9.13.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\codehaus\\jackson\\jackson-mapper-asl\\1.9.13\\jackson-mapper-asl-1.9.13.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\avro\\avro\\1.7.4\\avro-1.7.4.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\thoughtworks\\paranamer\\paranamer\\2.3\\paranamer-2.3.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\xerial\\snappy\\snappy-java\\1.0.4.1\\snappy-java-1.0.4.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\google\\protobuf\\protobuf-java\\2.5.0\\protobuf-java-2.5.0.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\google\\code\\gson\\gson\\2.2.4\\gson-2.2.4.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-auth\\2.7.2\\hadoop-auth-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\directory\\server\\apacheds-kerberos-codec\\2.0.0-M15\\apacheds-kerberos-codec-2.0.0-M15.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\directory\\server\\apacheds-i18n\\2.0.0-M15\\apacheds-i18n-2.0.0-M15.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\directory\\api\\api-asn1-api\\1.0.0-M20\\api-asn1-api-1.0.0-M20.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\directory\\api\\api-util\\1.0.0-M20\\api-util-1.0.0-M20.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\curator\\curator-framework\\2.7.1\\curator-framework-2.7.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\jcraft\\jsch\\0.1.42\\jsch-0.1.42.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\curator\\curator-client\\2.7.1\\curator-client-2.7.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\curator\\curator-recipes\\2.7.1\\curator-recipes-2.7.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\google\\code\\findbugs\\jsr305\\3.0.0\\jsr305-3.0.0.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\htrace\\htrace-core\\3.1.0-incubating\\htrace-core-3.1.0-incubating.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\zookeeper\\zookeeper\\3.4.6\\zookeeper-3.4.6.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\commons\\commons-compress\\1.4.1\\commons-compress-1.4.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\tukaani\\xz\\1.0\\xz-1.0.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-client\\2.7.2\\hadoop-client-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-mapreduce-client-app\\2.7.2\\hadoop-mapreduce-client-app-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-mapreduce-client-common\\2.7.2\\hadoop-mapreduce-client-common-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-yarn-client\\2.7.2\\hadoop-yarn-client-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-yarn-server-common\\2.7.2\\hadoop-yarn-server-common-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-mapreduce-client-shuffle\\2.7.2\\hadoop-mapreduce-client-shuffle-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-yarn-api\\2.7.2\\hadoop-yarn-api-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-mapreduce-client-core\\2.7.2\\hadoop-mapreduce-client-core-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-yarn-common\\2.7.2\\hadoop-yarn-common-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\com\\sun\\jersey\\jersey-client\\1.9\\jersey-client-1.9.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-mapreduce-client-jobclient\\2.7.2\\hadoop-mapreduce-client-jobclient-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\apache\\hadoop\\hadoop-hdfs\\2.7.2\\hadoop-hdfs-2.7.2.jar;
C:\\Users\\Administrator\\.m2\\repository\\commons-daemon\\commons-daemon\\1.0.13\\commons-daemon-1.0.13.jar;
C:\\Users\\Administrator\\.m2\\repository\\io\\netty\\netty\\3.6.2.Final\\netty-3.6.2.Final.jar;
C:\\Users\\Administrator\\.m2\\repository\\io\\netty\\netty-all\\4.0.23.Final\\netty-all-4.0.23.Final.jar;
C:\\Users\\Administrator\\.m2\\repository\\xerces\\xercesImpl\\2.9.1\\xercesImpl-2.9.1.jar;
C:\\Users\\Administrator\\.m2\\repository\\xml-apis\\xml-apis\\1.3.04\\xml-apis-1.3.04.jar;
C:\\Users\\Administrator\\.m2\\repository\\org\\fusesource\\leveldbjni\\leveldbjni-all\\1.8\\leveldbjni-all-1.8.jar" KVText.KVTextDriver
2020-10-21 14:41:01,541 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-10-21 14:41:01,551 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2020-10-21 14:41:02,916 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-10-21 14:41:02,936 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2020-10-21 14:41:03,236 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2020-10-21 14:41:03,256 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2020-10-21 14:41:03,326 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local297471183_0001
2020-10-21 14:41:03,476 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2020-10-21 14:41:03,476 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local297471183_0001
2020-10-21 14:41:03,476 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2020-10-21 14:41:03,486 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-10-21 14:41:03,486 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-10-21 14:41:03,536 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2020-10-21 14:41:03,536 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local297471183_0001_m_000000_0
2020-10-21 14:41:03,566 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-10-21 14:41:03,576 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2020-10-21 14:41:03,616 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6cec6fbe
2020-10-21 14:41:03,626 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: file:/E:/Hadoop/src/main/resources/input/englishconment.txt:0+80
2020-10-21 14:41:03,666 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2020-10-21 14:41:03,666 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2020-10-21 14:41:03,666 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2020-10-21 14:41:03,666 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2020-10-21 14:41:03,666 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2020-10-21 14:41:03,666 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-10-21 14:41:03,676 INFO [org.apache.hadoop.mapred.LocalJobRunner] -
2020-10-21 14:41:03,676 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2020-10-21 14:41:03,676 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2020-10-21 14:41:03,676 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 48; bufvoid = 104857600
2020-10-21 14:41:03,676 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
2020-10-21 14:41:03,796 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 0
2020-10-21 14:41:03,806 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local297471183_0001_m_000000_0 is done. And is in the process of committing
2020-10-21 14:41:03,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - file:/E:/Hadoop/src/main/resources/input/englishconment.txt:0+80
2020-10-21 14:41:03,826 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local297471183_0001_m_000000_0' done.
2020-10-21 14:41:03,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local297471183_0001_m_000000_0
2020-10-21 14:41:03,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2020-10-21 14:41:03,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for reduce tasks
2020-10-21 14:41:03,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local297471183_0001_r_000000_0
2020-10-21 14:41:03,836 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-10-21 14:41:03,836 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2020-10-21 14:41:04,115 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@2f6f6193
2020-10-21 14:41:04,115 INFO [org.apache.hadoop.mapred.ReduceTask] - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@15648f8c
2020-10-21 14:41:04,135 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - MergerManager: memoryLimit=648858816, maxSingleShuffleLimit=162214704, mergeThreshold=428246848, iosortFactor=10, memToMemMergeOutputsThreshold=10
2020-10-21 14:41:04,135 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - attempt_local297471183_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2020-10-21 14:41:04,165 INFO [org.apache.hadoop.mapreduce.task.reduce.LocalFetcher] - localfetcher#1 about to shuffle output of map attempt_local297471183_0001_m_000000_0 decomp: 58 len: 62 to MEMORY
2020-10-21 14:41:04,175 INFO [org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput] - Read 58 bytes from map-output for attempt_local297471183_0001_m_000000_0
2020-10-21 14:41:04,175 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - closeInMemoryFile -> map-output of size: 58, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory -> 58
2020-10-21 14:41:04,175 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - EventFetcher is interrupted.. Returning
2020-10-21 14:41:04,175 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2020-10-21 14:41:04,175 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2020-10-21 14:41:04,195 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2020-10-21 14:41:04,195 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 47 bytes
2020-10-21 14:41:04,205 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merged 1 segments, 58 bytes to disk to satisfy reduce memory limit
2020-10-21 14:41:04,215 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 1 files, 62 bytes from disk
2020-10-21 14:41:04,215 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 0 segments, 0 bytes from memory into reduce
2020-10-21 14:41:04,215 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2020-10-21 14:41:04,215 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 47 bytes
2020-10-21 14:41:04,215 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2020-10-21 14:41:04,215 INFO [org.apache.hadoop.conf.Configuration.deprecation] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2020-10-21 14:41:04,225 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local297471183_0001_r_000000_0 is done. And is in the process of committing
2020-10-21 14:41:04,225 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2020-10-21 14:41:04,225 INFO [org.apache.hadoop.mapred.Task] - Task attempt_local297471183_0001_r_000000_0 is allowed to commit now
2020-10-21 14:41:04,235 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - Saved output of task 'attempt_local297471183_0001_r_000000_0' to file:/E:/Hadoop/src/main/resources/ouput/_temporary/0/task_local297471183_0001_r_000000
2020-10-21 14:41:04,235 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
2020-10-21 14:41:04,235 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local297471183_0001_r_000000_0' done.
2020-10-21 14:41:04,235 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local297471183_0001_r_000000_0
2020-10-21 14:41:04,235 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce task executor complete.
2020-10-21 14:41:04,485 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local297471183_0001 running in uber mode : false
2020-10-21 14:41:04,485 INFO [org.apache.hadoop.mapreduce.Job] - map 100% reduce 100%
2020-10-21 14:41:04,485 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local297471183_0001 completed successfully
2020-10-21 14:41:04,515 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
File System Counters
FILE: Number of bytes read=672
FILE: Number of bytes written=583822
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=4
Map output records=4
Map output bytes=48
Map output materialized bytes=62
Input split bytes=124
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=62
Reduce input records=4
Reduce output records=2
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=12
Total committed heap usage (bytes)=374865920
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=80
File Output Format Counters
Bytes Written=32
Process finished with exit code 0
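With the default single reduce task, the actual word counts are written to a file named part-r-00000 inside the output directory (E:\Hadoop\src\main\resources\ouput in this setup); for hello.txt its contents should match the six counts worked out in section 2.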