Environment
Linux: Hadoop 2.x
Windows: JDK 1.8, Maven 3, IDEA 2021
Steps
Programming analysis
Programming analysis covers:
1. Data flow analysis: how the data moves from input to output.
2. Data type analysis: the input and output types of the Map phase, and the input and output types of the Reduce phase.
This analysis determines how we write the code.
Create a Maven project
Open IDEA, then click File, New, Project.
Select Maven and click Next.
Choose an empty directory as the project directory and give it a name, e.g. wordcount (avoid Chinese characters and spaces in the path), then click Finish.
Add the dependencies
Edit pom.xml and add the following dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.3</version>
    </dependency>
</dependencies>

Then load (reimport) the dependencies.
Create a package
Under src\main\java, create a new package named org.example.
Create the classes
Under the org.example package, create three classes: MyMapper, MyReducer, and MyMain.
Write the Map program
Edit the MyMapper class in the following steps:
1. Extend Mapper.
2. Override the map() method.
3. Write the Map logic:
   1. Convert v1 from Text to String.
   2. Split the line into words with the split(" ") method.
   3. Emit (k2, v2).
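The Map steps above can be sketched as follows. This is a minimal version (package declaration omitted), assuming the Hadoop 2.7.3 API from the pom and the usual WordCount types: k1/v1 are the line offset and line text, k2/v2 are the word and the count 1.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// k1: line offset (LongWritable), v1: line text (Text)
// k2: word (Text), v2: the constant count 1 (IntWritable)
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Convert v1 from Text to String
        String line = value.toString();
        // 2. Split the line into words on spaces
        String[] words = line.split(" ");
        // 3. Emit (k2, v2) = (word, 1) for each word
        for (String w : words) {
            word.set(w);
            context.write(word, ONE);
        }
    }
}
```

The Text and IntWritable objects are reused across calls rather than allocated per record, which is the idiomatic Hadoop pattern.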
Write the Reduce program
Edit the MyReducer class in the following steps:
1. Extend Reducer.
2. Override the reduce() method.
3. Write the Reduce logic:
   1. k4 = k3.
   2. v4 = the sum of the elements of v3.
   3. Emit (k4, v4).
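A sketch of the Reduce steps above, under the same assumptions as the Mapper: k3/v3 are the word and its list of counts from the shuffle, k4/v4 are the word and its total count.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// k3: word (Text), v3: the word's counts from the shuffle (IntWritable)
// k4: word (Text), v4: total count (IntWritable)
public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // 1. k4 = k3 (the key is passed through unchanged)
        // 2. v4 = sum of the elements of v3
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        total.set(sum);
        // 3. Emit (k4, v4)
        context.write(key, total);
    }
}
```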
Write the Main program (the Driver)
Edit the MyMain class in the following steps:
1. Create a job and set its entry point (the main class).
2. Set the job's mapper and its output types (k2, v2).
3. Set the job's reducer and its output types (k4, v4).
4. Set the job's input and output paths.
5. Run the job.
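The five Driver steps can be sketched like this. It assumes the input and output paths arrive as the two command-line arguments, matching the hadoop jar invocation used later in this tutorial.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMain {
    public static void main(String[] args) throws Exception {
        // 1. Create a job and set its entry point (the main class)
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(MyMain.class);

        // 2. Set the mapper and its output types (k2, v2)
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 3. Set the reducer and its output types (k4, v4)
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 4. Set the input and output paths (from the command line)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 5. Run the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the output path must not already exist on HDFS, or the job will fail at submission.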
Questions to consider
After the code is written, can it first be run locally on Windows?
Package the project
When BUILD SUCCESS appears, packaging has succeeded. The resulting jar is under the project's target directory.
Run on the Hadoop cluster
1. Upload the jar built in the previous step to Linux.
2. Start the Hadoop cluster:
start-all.sh
3. Run the jar.
Upload a file from the local Linux filesystem to HDFS:
hdfs dfs -put 1.txt /input/1.txt
Check the input data in HDFS.
Run the jar:
hadoop jar wordcount-1.0-SNAPSHOT.jar org.example.MyMain /input/1.txt /output/wordcount
A normal run produces output like the following:
[hadoop@node1 ~]$ hadoop jar wordcount-1.0-SNAPSHOT.jar org.example.MyMain /input/1.txt /output/wordcount
22/03/29 00:23:59 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.193.140:8032
22/03/29 00:23:59 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
22/03/29 00:24:00 INFO input.FileInputFormat: Total input paths to process : 1
22/03/29 00:24:00 INFO mapreduce.JobSubmitter: number of splits:1
22/03/29 00:24:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1648484275192_0001
22/03/29 00:24:01 INFO impl.YarnClientImpl: Submitted application application_1648484275192_0001
22/03/29 00:24:01 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1648484275192_0001/
22/03/29 00:24:01 INFO mapreduce.Job: Running job: job_1648484275192_0001
22/03/29 00:24:08 INFO mapreduce.Job: Job job_1648484275192_0001 running in uber mode : false
22/03/29 00:24:08 INFO mapreduce.Job: map 0% reduce 0%
22/03/29 00:24:12 INFO mapreduce.Job: map 100% reduce 0%
22/03/29 00:24:17 INFO mapreduce.Job: map 100% reduce 100%
22/03/29 00:24:19 INFO mapreduce.Job: Job job_1648484275192_0001 completed successfully
22/03/29 00:24:19 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=55
		FILE: Number of bytes written=237261
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=119
		HDFS: Number of bytes written=25
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2290
		Total time spent by all reduces in occupied slots (ms)=2516
		Total time spent by all map tasks (ms)=2290
		Total time spent by all reduce tasks (ms)=2516
		Total vcore-milliseconds taken by all map tasks=2290
		Total vcore-milliseconds taken by all reduce tasks=2516
		Total megabyte-milliseconds taken by all map tasks=2344960
		Total megabyte-milliseconds taken by all reduce tasks=2576384
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=41
		Map output materialized bytes=55
		Input split bytes=94
		Combine input records=0
		Combine output records=0
		Reduce input groups=3
		Reduce shuffle bytes=55
		Reduce input records=4
		Reduce output records=3
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=103
		CPU time spent (ms)=1200
		Physical memory (bytes) snapshot=425283584
		Virtual memory (bytes) snapshot=4223356928
		Total committed heap usage (bytes)=277348352
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=25
	File Output Format Counters
		Bytes Written=25
[hadoop@node1 ~]$
View the output in HDFS.
Questions to consider
If the run fails with an error like the one shown, how would you resolve it?
Can the code be optimized further, and how?
Done. Enjoy it!