1. Download Hadoop
Download from the official Hadoop website, or fetch it directly with wget. (Hadoop download link)
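For an unattended download, the tarball URL can be built from the version number. The sketch below assumes the standard Apache archive layout (archive.apache.org/dist/hadoop/common/...) — verify the URL against the download page before relying on it:

```shell
# Build the download URL for a given Hadoop version.
# NOTE: the archive.apache.org path is an assumption based on the usual
# Apache release layout; confirm it on the official Hadoop download page.
HADOOP_VERSION="2.6.0"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
echo "$HADOOP_URL"
# Then fetch it, e.g.:
#   wget "$HADOOP_URL"
```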
2. Extract, Move, and Create Symlink
$ tar -zxf hadoop-2.6.0.tar.gz
$ sudo mv hadoop-2.6.0/ /usr/local/
$ cd /usr/local
$ sudo ln -s hadoop-2.6.0/ hadoop
3. Create a Dedicated Hadoop User
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo
$ sudo chown -R hduser:hadoop /usr/local/hadoop/
4. Configure SSH Passwordless Login
$ su hduser
$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys && chmod 700 ~/.ssh
5. Adjust /etc/sysctl.conf File
Add the following at the bottom of the file:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
After editing, apply the settings with $ sudo sysctl -p, or reboot: $ sudo shutdown -r now
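The three settings can be appended in one step with a heredoc. The sketch below writes to a temporary file for illustration — on the real system, target /etc/sysctl.conf with sudo and then apply with sudo sysctl -p:

```shell
# Append the IPv6-disabling settings to a sysctl configuration file.
# Using a temporary copy here for illustration; on the real system,
# target /etc/sysctl.conf (with sudo) and then run: sudo sysctl -p
SYSCTL_CONF="$(mktemp)"
cat >> "$SYSCTL_CONF" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
# Confirm all three entries landed in the file
grep -c 'disable_ipv6 = 1' "$SYSCTL_CONF"   # prints 3
```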
6. Test SSH Connection
$ su hduser
$ ssh localhost
When prompted 'Are you sure you want to continue connecting? (yes/no)', answer yes.
7. Update and Install Java
$ sudo apt-get update
$ sudo apt-get install default-jdk
If Java is already installed, locate the JAVA_HOME path:
$ which java | sed -e 's/\(.*\)\/bin\/java/\1/g'
Switch to hduser ($ su hduser) and edit ~/.bashrc, adding the following at the bottom:
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr
Reload the configuration: $ source ~/.bashrc
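The sed expression above simply strips the trailing /bin/java from the java binary's path to obtain JAVA_HOME. A quick check with a sample path (the path below is illustrative; on a real system feed it the output of which java):

```shell
# Derive JAVA_HOME by stripping "/bin/java" from a java binary path.
# The sample path is illustrative; on a real system use: JAVA_BIN="$(which java)"
JAVA_BIN="/usr/lib/jvm/java-8-openjdk-amd64/bin/java"
JAVA_HOME_GUESS="$(echo "$JAVA_BIN" | sed -e 's/\(.*\)\/bin\/java/\1/g')"
echo "$JAVA_HOME_GUESS"   # prints /usr/lib/jvm/java-8-openjdk-amd64
```

With the default-jdk package, which java resolves to /usr/bin/java, so the expression yields /usr — which is why the tutorial sets JAVA_HOME=/usr.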
8. Adjust Hadoop Configuration Files
Create the HDFS directory
$ su hduser
$ mkdir /usr/local/hadoop/data
Edit hadoop-env.sh
Change JAVA_HOME=${JAVA_HOME} to JAVA_HOME=/usr, and modify HADOOP_OPTS:
hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=${HADOOP_PREFIX}/lib"
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
Edit yarn-env.sh
yarn-env.sh
export HADOOP_CONF_LIB_NATIVE_DIR=${HADOOP_PREFIX:-"/lib/native"}
export HADOOP_OPTS="-Djava.library.path=${HADOOP_PREFIX}/lib"
Edit core-site.xml
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
Edit mapred-site.xml
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit hdfs-site.xml
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Edit yarn-site.xml
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
</configuration>
9. Format Namenode and Start Hadoop
$ /usr/local/hadoop/bin/hdfs namenode -format
$ /usr/local/hadoop/sbin/start-dfs.sh
$ /usr/local/hadoop/sbin/start-yarn.sh
10. Verify Installation
Run $ jps to confirm that all processes are running.
Open a browser and go to http://localhost:50070 to view the web interface.
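On a pseudo-distributed Hadoop 2.6 setup, jps should list the five daemons checked below. The sketch scans a captured jps output for each expected name — the sample output (process IDs included) is illustrative; on the real system capture it with JPS_OUTPUT="$(jps)":

```shell
# Check that all expected Hadoop daemons appear in jps output.
# The sample output below is illustrative; on the real system use:
#   JPS_OUTPUT="$(jps)"
JPS_OUTPUT="2134 NameNode
2280 DataNode
2455 SecondaryNameNode
2601 ResourceManager
2735 NodeManager
2890 Jps"
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
  if echo "$JPS_OUTPUT" | grep -qw "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: MISSING"
  fi
done
```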
Originally published on Medium.