Hadoop Distributed Deployment and Loading Data into GBase


1. Create the hadoop user on all nodes and configure passwordless SSH trust between them.
The detailed steps are omitted here; a minimal sketch follows.
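
For reference, a sketch of what this step usually looks like (run useradd as root on every node; generate a key as the hadoop user and copy it to every node, then repeat on the other nodes if full mutual trust is wanted; host names match the cluster used below):

[root@hadoopnode1 ~]# useradd hadoop
[root@hadoopnode1 ~]# passwd hadoop
[hadoop@hadoopnode1 ~]$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
[hadoop@hadoopnode1 ~]$ ssh-copy-id hadoop@hadoopnode1
[hadoop@hadoopnode1 ~]$ ssh-copy-id hadoop@hadoopnode2
[hadoop@hadoopnode1 ~]$ ssh-copy-id hadoop@hadoopnode3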

2. Configure the Java and Hadoop environment variables on all nodes.
The Java and Hadoop packages are already installed; the environment variables below are placed on every node:
[hadoop@hadoopnode1 ~]$ cat .bash_profile
# Source /root/.bashrc if user has one
[ -f ~/.bashrc ] && . ~/.bashrc
export JAVA_HOME=/home/hadoop/hadoop/jdk1.8.0_333
export HADOOP_HOME=/data1/hadoop/hadoop-3.4.2
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
alias cexec='bash /home/hadoop/ssh_cmd.sh'
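
The cexec alias used throughout this article points to a small helper script whose content is not shown; a possible implementation that matches the output format seen later (the script body is an assumption) would be:

[hadoop@hadoopnode1 ~]$ cat /home/hadoop/ssh_cmd.sh
#!/bin/bash
# assumed helper: run the given command on every node over ssh
for host in 192.168.28.201 192.168.28.202 192.168.28.203; do
    echo ":::: ${host} ::::"
    ssh "${host}" "$1"
done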


3. Modify /etc/hosts on all nodes, including all GBase 8a database nodes:
[root@hadoopnode1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.28.201 hadoopnode1
192.168.28.202 hadoopnode2
192.168.28.203 hadoopnode3
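
If /etc/hosts is edited on one node, it can simply be pushed to the remaining nodes as root, for example:

[root@hadoopnode1 ~]# scp /etc/hosts root@192.168.28.202:/etc/hosts
[root@hadoopnode1 ~]# scp /etc/hosts root@192.168.28.203:/etc/hosts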

4. First, modify the Hadoop configuration files on the head node:

The configuration file directory is as follows:
[hadoop@hadoopnode1 hadoop]$ cd ${HADOOP_HOME}/etc/hadoop/
[hadoop@hadoopnode1 hadoop]$ ls -l *.xml
-rw-r--r-- 1 hadoop hadoop  9213 Aug 20 18:48 capacity-scheduler.xml
-rw-r--r-- 1 hadoop hadoop  1006 Sep  7 08:47 core-site.xml
-rw-r--r-- 1 hadoop hadoop 14007 Aug 20 18:30 hadoop-policy.xml
-rw-r--r-- 1 hadoop hadoop   683 Aug 20 18:41 hdfs-rbf-site.xml
-rw-r--r-- 1 hadoop hadoop  1561 Sep  7 08:48 hdfs-site.xml
-rw-r--r-- 1 hadoop hadoop   620 Aug 20 18:35 httpfs-site.xml
-rw-r--r-- 1 hadoop hadoop  3518 Aug 20 18:31 kms-acls.xml
-rw-r--r-- 1 hadoop hadoop   682 Aug 20 18:31 kms-site.xml
-rw-r--r-- 1 hadoop hadoop  1272 Sep  7 08:18 mapred-site.xml
-rw-r--r-- 1 hadoop hadoop   690 Aug 20 18:48 yarn-site.xml
[hadoop@hadoopnode1 hadoop]$ pwd
/data1/hadoop/hadoop-3.4.2/etc/hadoop
[hadoop@hadoopnode1 hadoop]$

Configure the temporary data directory and the HDFS address in core-site.xml as follows:

[hadoop@hadoopnode1 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://hadoopnode1:9000</value>
   </property>
   <property>
       <name>hadoop.tmp.dir</name>
       <value>file:/data1/hadoop/data/tmp</value>
   </property>
</configuration>
[hadoop@hadoopnode1 hadoop]$

Configure the HDFS storage settings in hdfs-site.xml as follows:

[hadoop@hadoopnode1 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
       <property>
               <name>dfs.namenode.secondary.http-address</name>
               <value>hadoopnode2:9001</value>
                <description>SecondaryNameNode address and port</description>
       </property>
       <property>
               <name>dfs.namenode.name.dir</name>
               <value>file:/data1/hadoop/data/namenode</value>
                <description>Directory that holds the FsImage, i.e. where the NameNode keeps its metadata</description>
       </property>
       <property>
               <name>dfs.datanode.data.dir</name>
               <value>file:/data1/hadoop/data/data</value>
                <description>Directory for HDFS data files, i.e. where the DataNode keeps its data blocks</description>
       </property>
       <property>
               <name>dfs.replication</name>
               <value>3</value>
                <description>Number of block replicas; the default is 3, set it according to the number of nodes</description>
       </property>
</configuration>
[hadoop@hadoopnode1 hadoop]$
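
The local directories referenced above all sit under /data1/hadoop/data; creating the parent directory on every node ahead of time avoids permission surprises, for example with the cexec helper:

[hadoop@hadoopnode1 ~]$ cexec 'mkdir -p /data1/hadoop/data'
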
Modify mapred-site.xml as follows; in practice only mapreduce.framework.name needs to be added:
[hadoop@hadoopnode1 hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
       <property>
               <name>mapreduce.framework.name</name>
               <value>yarn</value>
                <description>Run MapReduce on YARN</description>
       </property>
       <property>
               <name>mapreduce.jobhistory.address</name>
               <value>hadoopnode1:10020</value>
                <description>MapReduce JobHistory server IPC address</description>
       </property>
       <property>
               <name>mapreduce.jobhistory.webapp.address</name>
               <value>hadoopnode1:19888</value>
                <description>MapReduce JobHistory server web UI address</description>
       </property>
</configuration>
[hadoop@hadoopnode1 hadoop]$
List the DataNode hosts in the workers file:
[hadoop@hadoopnode1 hadoop]$ cat workers
hadoopnode1
hadoopnode2
hadoopnode3
[hadoop@hadoopnode1 hadoop]$


5. Push the configuration files to the other nodes:
scp *.xml 192.168.28.202:/data1/hadoop/hadoop-3.4.2/etc/hadoop
scp *.xml 192.168.28.203:/data1/hadoop/hadoop-3.4.2/etc/hadoop
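
A quick way to confirm the files are identical on every node afterwards, again using the cexec helper:

cexec 'md5sum /data1/hadoop/hadoop-3.4.2/etc/hadoop/core-site.xml'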


6. Format the NameNode (run once, on the NameNode host only): hdfs namenode -format

7. Start the cluster with the start-all.sh script:
[hadoop@hadoopnode1 ~]$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [hadoopnode1]
Starting datanodes
Starting secondary namenodes [hadoopnode2]
Starting resourcemanager
Starting nodemanagers
Check with jps:
[hadoop@hadoopnode1 ~]$ jps
53024 JournalNode
86519 NameNode
87432 Jps
87259 NodeManager
86685 DataNode
86958 ResourceManager
[hadoop@hadoopnode1 ~]$
Check all nodes:
[hadoop@hadoopnode1 ~]$ cexec 'source ~/.bash_profile;jps '
:::: 192.168.28.201 ::::
53024 JournalNode
86519 NameNode
87259 NodeManager
86685 DataNode
87551 Jps
86958 ResourceManager
:::: 192.168.28.202 ::::
78419 DataNode
78628 NodeManager
50757 JournalNode
78823 Jps
78538 SecondaryNameNode
:::: 192.168.28.203 ::::
78295 NodeManager
78495 Jps
[hadoop@hadoopnode1 ~]$ hdfs dfs -ls /

8. Deployment is complete; create some data for testing:
[hadoop@hadoopnode1 ~]$ vim hdfs_put_test.txt
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$ cat hdfs_put_test.txt
1234567
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$ hdfs dfs -mkdir -p /mytest
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2025-09-07 08:54 /mytest
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$ hdfs dfs -put /home/hadoop/hdfs_put_test.txt /mytest
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$ hdfs dfs -ls /mytest
Found 1 items
-rw-r--r--   2 hadoop supergroup          8 2025-09-07 08:55 /mytest/hdfs_put_test.txt
[hadoop@hadoopnode1 ~]$
[hadoop@hadoopnode1 ~]$ hadoop fs -ls hdfs://hadoopnode1:9000//mytest/hdfs_put_test.txt
-rw-r--r--   2 hadoop supergroup          8 2025-09-07 08:55 hdfs://hadoopnode1:9000/mytest/hdfs_put_test.txt
[hadoop@hadoopnode1 ~]$
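
The file content can also be read back directly before moving on to the database load:

[hadoop@hadoopnode1 ~]$ hdfs dfs -cat /mytest/hdfs_put_test.txt
1234567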

9. Load the data on GBase 8a:

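The target table must already exist on the GBase side; given the single-value test file created above, a minimal definition could look like the following (the column definition is an assumption, the original article does not show the DDL):

gbase> create table hdfs_load_test (c1 int);
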
gbase> load data infile 'hdfs://hadoop:hadoop@hadoopnode1:9870/mytest/hdfs_put_test.txt' into table hdfs_load_test data_format 3;
Query OK, 1 row affected (Elapsed: 00:00:00.42)
Task 30 finished, Loaded 1 records, Skipped 0 records

gbase>
Done.