Much of this article draws on the official guide "Hadoop: Setting up a Single Node Cluster".
Note: the test environment should consist of 3 virtual machines, each with its own NIC UUID, IP address, and hostname. The information below is the configuration of only one of them.
Virtual machine information
CentOS Linux release 7.6.1810 (Core)
Derived from Red Hat Enterprise Linux 7.6 (Source)
Linux hadoop10 3.10.0-957.27.2.el7.x86_64
Account information
Username | Password |
---|---|
sun | 123456 |
hadoop | 123456 |
root | 123456 |
- Always work as the hadoop user when testing Hadoop.
Network configuration details
Host-side (VMware) configuration
Environment: VMware® Workstation 15 Pro 15.0.0 build-10134415
- Virtual network (VMnet) settings
Name | Type | External connection | Host connection | DHCP | Subnet address |
---|---|---|---|---|---|
VMnet8 | NAT mode | NAT mode | Connected | - | 192.168.100.0 |
DHCP service disabled
Subnet IP: 192.168.100.0  Subnet mask: 255.255.255.0
Gateway IP: 192.168.100.2
IPv6 disabled
In-guest configuration
ip addr
```
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:f1:16:d2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.10/24 brd 192.168.100.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::711a:c906:a3e3:aed6/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```
Key network information
/etc/sysconfig/network-scripts/ifcfg-ens33
```
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.100.10
NETMASK=255.255.255.0
GATEWAY=192.168.100.2
DNS1=114.114.114.114
DNS2=8.8.8.8
```
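After editing ifcfg-ens33, the new static address still has to be applied and checked. A minimal sketch, assuming the classic network service is in use on CentOS 7:
```bash
# Re-read ifcfg-ens33 and apply the static configuration
sudo systemctl restart network

# Verify address, default route, and outbound connectivity
ip addr show ens33
ip route | grep default
ping -c 3 114.114.114.114
```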
Hostname
/etc/sysconfig/network
```
NETWORKING=yes
HOSTNAME=hadoop10
```
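On CentOS 7 the HOSTNAME line in /etc/sysconfig/network is a legacy setting; the static hostname is normally set with hostnamectl as well. A small sketch:
```bash
# Writes /etc/hostname and updates the running system
sudo hostnamectl set-hostname hadoop10
hostnamectl status
```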
HOSTS
/etc/hosts
```
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.10 hadoop10
192.168.100.11 hadoop11
192.168.100.12 hadoop12
```
The addresses for hadoop11 and hadoop12 are assumed to follow the same pattern as hadoop10 (last octet matching the hostname suffix).
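With the hosts file distributed to every node, name resolution can be spot-checked with a quick sketch:
```bash
# Each name should answer from its static address
ping -c 1 hadoop10
ping -c 1 hadoop11
ping -c 1 hadoop12
```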
Environment variables
Hadoop and JDK installation directory (note: everything under /opt must be owned by the hadoop user)
tree -L 1 /opt/modules/
```
/opt/modules/
├── hadoop-3.1.2
└── jdk1.8.0_201
```
Environment variables
cat /etc/profile | tail -n 9
```
#java
export JAVA_HOME=/opt/modules/jdk1.8.0_201
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib
#hadoop
export HADOOP_HOME=/opt/modules/hadoop-3.1.2
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
```
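After appending these lines, the profile has to be reloaded in the current shell; a quick verification sketch:
```bash
# Reload /etc/profile for this session
source /etc/profile

# Both should print version information if the paths are correct
java -version
hadoop version
```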
Running the examples
1. Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
Ⅰ. Grep
Create the input files (pwd: /opt/modules/hadoop-3.1.2/)
```
[hadoop@hadoop10 hadoop-3.1.2]$ mkdir input
[hadoop@hadoop10 hadoop-3.1.2]$ cp etc/hadoop/*.xml input
```
Directory structure
```
[hadoop@hadoop10 hadoop-3.1.2]$ tree input/
input/
├── capacity-scheduler.xml
├── core-site.xml
├── hadoop-policy.xml
├── hdfs-site.xml
├── httpfs-site.xml
├── kms-acls.xml
├── kms-site.xml
├── mapred-site.xml
└── yarn-site.xml
```
Run the example (note: the job fails if the output directory already exists)
```
[hadoop@hadoop10 hadoop-3.1.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
```
Result (every match of the given regular expression)
```
[hadoop@hadoop10 hadoop-3.1.2]$ cat output/part-r-00000
1	dfsadmin
```
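To re-run the example, the local output directory has to be removed first; a one-line sketch:
```bash
# MapReduce refuses to write into an existing output directory
rm -rf output
```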
Ⅱ. WordCount
Create the input file (pwd: /opt/modules/hadoop-3.1.2/)
```
[hadoop@hadoop10 hadoop-3.1.2]$ mkdir wcinput
[hadoop@hadoop10 hadoop-3.1.2]$ cd wcinput
[hadoop@hadoop10 hadoop-3.1.2]$ touch wc.input
```
The following content is added to wc.input:
```
hadoop mark
hadoop yarn
sun key
sun key
```
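The article does not show how the text gets into wc.input; one way to populate it (a sketch, run from the wcinput directory) is a heredoc:
```bash
# Write the four sample lines into wc.input
cat > wc.input <<'EOF'
hadoop mark
hadoop yarn
sun key
sun key
EOF
```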
Run the example
```
[hadoop@hadoop10 hadoop-3.1.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount wcinput wcoutput
```
Result
```
[hadoop@hadoop10 hadoop-3.1.2]$ cat wcoutput/part-r-00000
hadoop	2
key	2
mark	1
sun	2
yarn	1
```
2. Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration (pwd: /opt/modules/hadoop-3.1.2/)
vim etc/hadoop/core-site.xml
```
<configuration>
    <!--
      Address of the HDFS NameNode.
      Note: example 1 above relies on the local Linux filesystem, so once this
      property is set, example 1 can no longer be run.
    -->
    <property>
        <name>fs.defaultFS</name>
        <!-- By convention, replace localhost with the hostname -->
        <value>hdfs://hadoop10:9000</value>
    </property>
    <!-- Directory for the files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-3.1.2/data/tmp</value>
    </property>
</configuration>
```
etc/hadoop/hdfs-site.xml
```
<configuration>
    <!-- Number of block replicas; the default is 3 -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```
- Configure Hadoop's JAVA_HOME
```
vim etc/hadoop/hadoop-env.sh
```
Add or change to:
```
export JAVA_HOME=/opt/modules/jdk1.8.0_201
```
Configure passwordless SSH
Now check that you can ssh to the localhost without a passphrase:
```
ssh localhost
```
If you cannot ssh to localhost without a passphrase, execute the following commands:
```
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```
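A quick check that key-based login now works (a sketch):
```bash
# Should print the hostname without prompting for a password
ssh localhost hostname
```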
Format the filesystem
Note: the filesystem only needs to be formatted the first time the cluster is set up.
Because the environment variables for Hadoop's bin/ and sbin/ directories were configured earlier, the bin/ and sbin/ path prefixes could be omitted here.
```
bin/hdfs namenode -format
```
Start the NameNode and DataNode daemons
```
[hadoop@hadoop10 hadoop-3.1.2]$ start-dfs.sh
Starting namenodes on [hadoop10]
Starting datanodes
Starting secondary namenodes [hadoop10]
```
Check that startup succeeded:
```
jps
3080 SecondaryNameNode
2746 NameNode
3213 Jps
2863 DataNode
```
Browse the web interface for the NameNode; by default it is available at:
- NameNode - http://localhost:9870/
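From inside the VM, the web UI can also be reached by hostname; a quick reachability sketch (9870 is the default NameNode HTTP port in Hadoop 3.x):
```bash
# Expect 200 if the NameNode web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop10:9870/
```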
HDFS file operations
```
# Create a directory
hdfs dfs -mkdir -p /home/hadoop/input
# List the directory tree
hdfs dfs -ls -R /
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:30 /home
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:30 /home/hadoop
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:30 /home/hadoop/input
# Upload a local Linux file into HDFS
hdfs dfs -put wcinput/wc.input /home/hadoop/input
# List the directory tree again
hdfs dfs -ls -R /
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:30 /home
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:30 /home/hadoop
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:34 /home/hadoop/input
-rw-r--r--   1 hadoop supergroup         42 2019-09-22 01:34 /home/hadoop/input/wc.input
```
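Files can also be copied back out of HDFS for inspection (a sketch; /tmp/wc.input.copy is just an arbitrary local path):
```bash
# Download the file from HDFS and confirm it matches the local original
hdfs dfs -get /home/hadoop/input/wc.input /tmp/wc.input.copy
diff wcinput/wc.input /tmp/wc.input.copy && echo identical
```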
WordCount example
```
# Run the example (remember to delete the output directory first)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /home/hadoop/input/wc.input /home/hadoop/output
# Check the generated files
hdfs dfs -ls -R /
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:30 /home
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:43 /home/hadoop
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:34 /home/hadoop/input
-rw-r--r--   1 hadoop supergroup         42 2019-09-22 01:34 /home/hadoop/input/wc.input
drwxr-xr-x   - hadoop supergroup          0 2019-09-22 01:43 /home/hadoop/output
-rw-r--r--   1 hadoop supergroup          0 2019-09-22 01:43 /home/hadoop/output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         35 2019-09-22 01:43 /home/hadoop/output/part-r-00000
# View the output
hdfs dfs -cat /home/hadoop/output/part*
hadoop	2
key	2
mark	1
sun	2
yarn	1
```
3. YARN on a Single Node
Configuration (pwd: /opt/modules/hadoop-3.1.2/)
vim etc/hadoop/mapred-site.xml
```
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
```
vim etc/hadoop/yarn-site.xml
```
<configuration>
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop10</value>
    </property>
</configuration>
```
Start the ResourceManager and NodeManager daemons
start-yarn.sh
Check the processes
```
[hadoop@hadoop10 hadoop-3.1.2]$ jps
10800 Jps
9988 ResourceManager
5432 SecondaryNameNode
5230 DataNode
5119 NameNode
10095 NodeManager
```
Browse the web interface for the ResourceManager; by default it is available at:
- ResourceManager - http://localhost:8088/
WordCount example
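The /home/hadoop/output directory created by the previous run still exists in HDFS at this point, and the job fails if its output directory already exists, so it has to be removed first; a one-line sketch:
```bash
# Remove the old output directory in HDFS before re-running the job
hdfs dfs -rm -r /home/hadoop/output
```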
```
[hadoop@hadoop10 hadoop-3.1.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /home/hadoop/input/wc.input /home/hadoop/output
```
View the result
```
[hadoop@hadoop10 hadoop-3.1.2]$ hdfs dfs -cat /home/hadoop/output/part-r-00000
hadoop	2
key	2
mark	1
sun	2
yarn	1
```
4. Job History Server
Configure mapred-site.xml
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop10:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop10:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
```
Run the JobHistory server
```
[hadoop@hadoop10 hadoop-3.1.2]$ mapred --daemon start historyserver
```
Check the processes, then visit http://hadoop10:19888/ to view the job history.
```
[hadoop@hadoop10 hadoop-3.1.2]$ jps
1408 NameNode
2096 NodeManager
3216 JobHistoryServer
1970 ResourceManager
1525 DataNode
1736 SecondaryNameNode
3309 Jps
```
5. Log Aggregation
Just a brief note here; the procedure is essentially the same as before.
Configure yarn-site.xml
```
<configuration>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Keep aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop10</value>
    </property>
</configuration>
```
Stop YARN (HDFS and the JobHistory server do not need to be restarted here).
I also deleted the tmp files YARN had generated in HDFS, as well as the output directory from the earlier WordCount run.
Restart YARN.
Run the WordCount example again (the command sequence is sketched below).
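A sketch of the stop/clean/restart/re-run sequence described above. The output path matches the one used earlier in this article; deleting /tmp in HDFS as the "tmp files YARN generated" is an assumption about where that staging data ended up:
```bash
# Stop YARN; HDFS and the JobHistory server can stay up
stop-yarn.sh

# Clean up the old WordCount output and (assumed location) YARN's staging data in HDFS
hdfs dfs -rm -r /home/hadoop/output
hdfs dfs -rm -r /tmp

# Restart YARN and re-run the example
start-yarn.sh
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /home/hadoop/input/wc.input /home/hadoop/output
```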
Visit http://hadoop10:19888/jobhistory
Find the WordCount job that just finished; there should be only one.
Open its logs.
View the aggregated log output.