hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/YARN.html
Why not use a ready-made distribution?
- Training. Similar articles often begin with a recommendation to download a virtual machine image with a Cloudera or HortonWorks distribution. As a rule, a distribution is a complex ecosystem with many components, and it is not easy for a beginner to understand what lives where and how the pieces interact. Starting from scratch gives you the chance to look at the components one at a time, which lowers the barrier to entry.
- Functional tests and benchmarks. There is always some lag between the release of a new version of a product and the moment it appears in a distribution. If you need to test the new features of a freshly released version, a ready-made distribution will not do. It is also hard to compare the performance of two versions of the same software, because a packaged distribution usually gives you no way to update a single component while leaving everything else untouched.
- Just for fun.
Why compile from source? After all, Hadoop binary builds are available too.
Part of the Hadoop code is written in C/C++. It is unclear which system the development team builds on, but the C libraries shipped with the Hadoop binary builds depend on a libc version that neither RHEL nor Debian/Ubuntu provides. Non-working Hadoop C libraries are not critical in general, but some features simply will not work without them.
Why rewrite everything that is already in the official documentation?
This article is meant to save you time. The official documentation does not contain quick-start instructions of the "run this, then run that" kind. If for whatever reason you need to build vanilla Hadoop and have no time for trial and error, you have come to the right place.
Building
According to Cloudera, most clusters run on RHEL and its derivatives (CentOS, Oracle Linux). Version 7 is the best fit, because its repositories already contain the version of the protobuf library that the build needs. If you use CentOS 6, you will have to build protobuf yourself.
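If you want to confirm what your repositories provide before going further, you can ask yum; a minimal check, assuming stock CentOS 7 repositories (Hadoop 2.7.x builds against protobuf 2.5.0, which is what CentOS 7 ships):
# Show the protobuf version available in the enabled repositories
yum info protobuf | grep -i version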
(To keep the article simple, the build and all further experiments are done as root.)
About 95% of the Hadoop code is written in Java. To build it we need the Oracle JDK and Maven.
Download the latest JDK from the Oracle site and unpack it into /opt. We also add the JAVA_HOME variable (Hadoop uses it) and put /opt/java/bin on the root user's PATH (for convenience):
cd ~
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u112-b15/jdk-8u112-linux-x64.tar.gz
tar xvf ~/jdk-8u112-linux-x64.tar.gz
mv ~/jdk1.8.0_112 /opt/java
echo "PATH=\"/opt/java/bin:\$PATH\"" >> ~/.bashrc
echo "export JAVA_HOME=\"/opt/java\"" >> ~/.bashrc
Install Maven. It is needed only at build time, so we install it in the home directory (everything left there can be deleted once the build is finished):
cd ~
wget http://apache.rediris.es/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
tar xvf ~/apache-maven-3.3.9-bin.tar.gz
mv ~/apache-maven-3.3.9 ~/maven
echo "PATH=\"/root/maven/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc
4-5% of the Hadoop code is written in C/C++. Install the compiler and the other packages needed for the build:
yum -y install gcc gcc-c++ autoconf automake libtool cmake
We will also need a few third-party libraries:
yum -y install zlib-devel openssl openssl-devel snappy snappy-devel bzip2 bzip2-devel protobuf protobuf-devel
The system is ready. Download Hadoop, build it, and install it in /opt:
cd ~
wget http://apache.rediris.es/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
tar -xvf ~/hadoop-2.7.3-src.tar.gz
mv ~/hadoop-2.7.3-src ~/hadoop-src
cd ~/hadoop-src
mvn package -Pdist,native -DskipTests -Dtar
tar -C /opt -xvf ~/hadoop-src/hadoop-dist/target/hadoop-2.7.3.tar.gz
mv /opt/hadoop-* /opt/hadoop
echo "PATH=\"/opt/hadoop/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc
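Since working native libraries were one of the reasons for building from source, it is worth verifying them right away. Hadoop ships a checknative subcommand for this; what it reports as true or false will depend on which -devel packages were present at build time:
# List the native libraries this Hadoop build is able to load
hadoop checknative -a
# Expect "hadoop: true" for libhadoop itself, plus entries for zlib, snappy, lz4, bzip2 and openssl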
Initial configuration
Hadoop has roughly a thousand parameters. Fortunately, about forty of them are enough to start Hadoop and take the first steps toward mastering it; the rest can stay at their defaults.
Let's get started. As you remember, we installed Hadoop in /opt/hadoop; all configuration files live in /opt/hadoop/etc/hadoop. In total we will need to edit six configuration files. All the settings below are given in the form of commands, so anyone building Hadoop along with this article can simply copy and paste them into a console.
First, set the JAVA_HOME environment variable in hadoop-env.sh and yarn-env.sh, so that every component knows where Java is installed:
sed -i '1iJAVA_HOME=/opt/java' /opt/hadoop/etc/hadoop/hadoop-env.sh
sed -i '1iJAVA_HOME=/opt/java' /opt/hadoop/etc/hadoop/yarn-env.sh
In core-site.xml, configure the HDFS URL. It consists of the hdfs:// prefix, the host on which the NameNode runs, and the port. If this is not done, Hadoop will not use the distributed file system at all and will work with the local file system of your machine instead (the default URL is file:///).
cat << EOF > /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>
</configuration>
EOF
In hdfs-site.xml we configure four parameters. Since our "cluster" consists of a single node, we set the number of replicas to 1. We also configure the directories where the NameNode, DataNode, and SecondaryNameNode store their data:
cat << EOF > /opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/data/dfs/nn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/data/dfs/dn</value></property>
  <property><name>dfs.namenode.checkpoint.dir</name><value>/data/dfs/snn</value></property>
</configuration>
EOF
That completes the HDFS configuration. We could already start the NameNode and DataNode and work with the file system, but let's leave that for the next section and move on to the YARN configuration:
cat << EOF > /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>localhost</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>4096</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
  <property><name>yarn.nodemanager.local-dirs</name><value>/data/yarn</value></property>
  <property><name>yarn.nodemanager.log-dirs</name><value>/data/yarn/log</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
</configuration>
EOF
There are quite a few options here. Let's go through them in order.
The yarn.resourcemanager.hostname parameter specifies the host on which the ResourceManager service runs.
The yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores parameters are probably the most important. With them we tell the cluster how much memory and how many CPU cores each node may use for running containers.
The yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores parameters set how much memory and how many cores may be allocated to an individual container. It is easy to see that with this configuration our single-node "cluster" can run four containers at the same time (1 GB of memory each).
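Restating that arithmetic with the values from yarn-site.xml above (nothing here beyond the numbers we just configured):
# containers per node limited by memory: 4096 MB per node / 1024 MB per container
echo $((4096 / 1024))    # prints 4
# containers per node limited by vcores: 4 vcores per node / 1 vcore per container
echo $((4 / 1))          # prints 4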
Setting the yarn.nodemanager.vmem-check-enabled parameter to false disables the check on the amount of virtual memory used. As noted above, each container gets rather little memory, and with such a configuration most applications would immediately exceed the allowed amount of virtual memory.
The yarn.nodemanager.local-dirs parameter specifies where the containers' temporary data is stored (the jar with the application bytecode, configuration files, temporary data generated at run time, and so on).
The yarn.nodemanager.log-dirs parameter specifies where the logs of each task are stored locally.
The yarn.log-aggregation-enable parameter tells YARN to save logs to HDFS. After an application finishes, its logs are moved from yarn.nodemanager.log-dirs on every node to HDFS (by default, into the /tmp/logs directory).
The yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce_shuffle.class parameters specify the auxiliary shuffle service for the MapReduce framework.
That is probably all for YARN. Next comes the configuration for MapReduce (one of the possible distributed computing frameworks). It has lost some popularity lately with the rise of Spark, but there are still plenty of places where it is used:
cat << EOF > /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>localhost:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>localhost:19888</value></property>
  <property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.8</value></property>
  <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>1</value></property>
  <property><name>yarn.app.mapreduce.am.resource.mb</name><value>1024</value></property>
  <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx768m</value></property>
  <property><name>mapreduce.map.cpu.vcores</name><value>1</value></property>
  <property><name>mapreduce.map.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.map.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx768m</value></property>
  <property><name>mapreduce.reduce.cpu.vcores</name><value>1</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.reduce.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx768m</value></property>
</configuration>
EOF
The mapreduce.framework.name parameter says that MapReduce jobs should run in YARN (the default value, local, is used only for debugging: all tasks run in a single JVM on the same machine).
The mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address parameters give the name of the node on which the JobHistory service will run.
The mapreduce.job.reduce.slowstart.completedmaps parameter tells the framework not to start the reduce phase until 80% of the map phase has completed.
The remaining parameters set the maximum possible values of memory, CPU cores, and JVM heap for the mappers, the reducers, and the application master. As you can see, they must not exceed the corresponding limits of the YARN containers defined in yarn-site.xml. The JVM heap values are usually set to 75% of the *.memory.mb parameters.
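For example, with the 1024 MB containers configured above, 75% works out to 768 MB, which is where the -Xmx768m values in mapred-site.xml come from; a trivial check:
# 75% of a 1024 MB container, i.e. the -Xmx value used above
echo $((1024 * 75 / 100))    # prints 768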
Startup
Create the /data directory, where HDFS data and the temporary files of the YARN containers will be stored:
mkdir /data
Format HDFS:
hadoop namenode -format
And finally, start all the services of our "cluster":
/opt/hadoop/sbin/hadoop-daemon.sh start namenode
/opt/hadoop/sbin/hadoop-daemon.sh start datanode
/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
If everything went well (you can check the logs in /opt/hadoop/logs for error messages), Hadoop is deployed and ready to go...
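A quick way to confirm that all five daemons are actually up is jps from the JDK, which lists running JVMs by their main class; a sketch of what to expect, assuming nothing else Java-based is running on this machine:
jps
# Expected process names (PIDs will differ):
#   NameNode
#   DataNode
#   ResourceManager
#   NodeManager
#   JobHistoryServer
#   Jps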
Health check
Let's look at the Hadoop directory structure:
/opt/hadoop/
├── bin
├── etc
│   └── hadoop
├── include
├── lib
│   └── native
├── libexec
├── logs
├── sbin
└── share
    ├── doc
    │   └── hadoop
    └── hadoop
        ├── common
        ├── hdfs
        ├── httpfs
        ├── kms
        ├── mapreduce
        ├── tools
        └── yarn
Hadoop itself (the executable Java bytecode) lives in the share directory and is split into components (hdfs, yarn, mapreduce, and so on). The lib directory contains the libraries written in C.
The purpose of the other directories is intuitive: bin holds the command-line utilities for working with Hadoop, sbin the startup scripts, etc the configuration, logs the logs. We are mainly interested in two utilities from the bin directory: hdfs and yarn.
As you remember, we have already formatted HDFS and started all the necessary processes. Let's see what is in HDFS:
hdfs dfs -ls -R /
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history/done_intermediate
We did not create this directory structure explicitly; it was created by the JobHistory service (the last daemon we started: mr-jobhistory-daemon.sh start historyserver).
Let's see what is in the /data directory:
/data/
├── dfs
│   ├── dn
│   │   ├── current
│   │   │   ├── BP-1600342399-192.168.122.70-1483626613224
│   │   │   │   ├── current
│   │   │   │   │   ├── finalized
│   │   │   │   │   ├── rbw
│   │   │   │   │   └── VERSION
│   │   │   │   ├── scanner.cursor
│   │   │   │   └── tmp
│   │   │   └── VERSION
│   │   └── in_use.lock
│   └── nn
│       ├── current
│       │   ├── edits_inprogress_0000000000000000001
│       │   ├── fsimage_0000000000000000000
│       │   ├── fsimage_0000000000000000000.md5
│       │   ├── seen_txid
│       │   └── VERSION
│       └── in_use.lock
└── yarn
    ├── filecache
    ├── log
    ├── nmPrivate
    └── usercache
As you can see, in /data/dfs/nn the NameNode has created the fsimage file and the first edits file, and in /data/dfs/dn the DataNode has created a directory for storing data blocks, although there is no data yet.
Copy a file from the local file system into HDFS:
hdfs dfs -put /var/log/messages /tmp/
hdfs dfs -ls /tmp/messages
-rw-r--r--   1 root supergroup     375974 2017-01-05 09:33 /tmp/messages
Let's look at the contents of /data once more:
/data/dfs/dn
├── current
│   ├── BP-1600342399-192.168.122.70-1483626613224
│   │   ├── current
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir0
│   │   │   │           ├── blk_1073741825
│   │   │   │           └── blk_1073741825_1001.meta
│   │   │   ├── rbw
│   │   │   └── VERSION
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock
Hooray! The first block and its checksum have appeared.
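If you want to see the file-to-block mapping from the shell as well, hdfs fsck can print it for the file we just uploaded; the block ID it reports should match the blk_1073741825 above:
# Show how /tmp/messages maps onto HDFS blocks and which DataNodes hold them
hdfs fsck /tmp/messages -files -blocks -locations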
To make sure YARN works properly, let's try running an application, for example pi from the hadoop-mapreduce-examples.jar package:
yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 3 100000
...
Job Finished in 37.837 seconds
Estimated value of Pi is 3.14168000000000000000
If you look at the contents of /data/yarn while the application is running, you can learn a lot of interesting things about how YARN applications are executed:
/data/yarn/
├── filecache
├── log
│   └── application_1483628783579_0001
│       ├── container_1483628783579_0001_01_000001
│       │   ├── stderr
│       │   ├── stdout
│       │   └── syslog
│       ├── container_1483628783579_0001_01_000002
│       │   ├── stderr
│       │   ├── stdout
│       │   └── syslog
│       ├── container_1483628783579_0001_01_000003
│       │   ├── stderr
│       │   ├── stdout
│       │   └── syslog
│       └── container_1483628783579_0001_01_000004
│           ├── stderr
│           ├── stdout
│           └── syslog
├── nmPrivate
│   └── application_1483628783579_0001
│       ├── container_1483628783579_0001_01_000001
│       │   ├── container_1483628783579_0001_01_000001.pid
│       │   ├── container_1483628783579_0001_01_000001.tokens
│       │   └── launch_container.sh
│       ├── container_1483628783579_0001_01_000002
│       │   ├── container_1483628783579_0001_01_000002.pid
│       │   ├── container_1483628783579_0001_01_000002.tokens
│       │   └── launch_container.sh
│       ├── container_1483628783579_0001_01_000003
│       │   ├── container_1483628783579_0001_01_000003.pid
│       │   ├── container_1483628783579_0001_01_000003.tokens
│       │   └── launch_container.sh
│       └── container_1483628783579_0001_01_000004
│           ├── container_1483628783579_0001_01_000004.pid
│           ├── container_1483628783579_0001_01_000004.tokens
│           └── launch_container.sh
└── usercache
    └── root
        ├── appcache
        │   └── application_1483628783579_0001
        │       ├── container_1483628783579_0001_01_000001
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── jobSubmitDir
        │       │   │   ├── job.split -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/12/job.split
        │       │   │   └── job.splitmetainfo -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/10/job.splitmetainfo
        │       │   ├── job.xml -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/13/job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       │       └── Jetty_0_0_0_0_37883_mapreduce____.rposvq
        │       │           └── webapp
        │       │               └── webapps
        │       │                   └── mapreduce
        │       ├── container_1483628783579_0001_01_000002
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       ├── container_1483628783579_0001_01_000003
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       ├── container_1483628783579_0001_01_000004
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       ├── filecache
        │       │   ├── 10
        │       │   │   └── job.splitmetainfo
        │       │   ├── 11
        │       │   │   └── job.jar
        │       │   │       └── job.jar
        │       │   ├── 12
        │       │   │   └── job.split
        │       │   └── 13
        │       │       └── job.xml
        │       └── work
        └── filecache

42 directories, 50 files
In particular, you can see that the logs are written to /data/yarn/log (the yarn.nodemanager.log-dirs parameter in yarn-site.xml).
When the application finishes, /data/yarn returns to its original state:
/data/yarn/
├── filecache
├── log
├── nmPrivate
└── usercache
    └── root
        ├── appcache
        └── filecache
Looking at the contents of HDFS again, we can see that log aggregation works (the logs of the application we just ran were moved from the local FS /data/yarn/log to HDFS /tmp/logs).
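Because log aggregation is enabled, the aggregated logs can also be read back with the yarn logs utility rather than browsing /tmp/logs by hand (substitute the application ID from your own run):
# Print the aggregated container logs of the finished application
yarn logs -applicationId application_1483628783579_0001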
We can also see that the JobHistory service saved information about our application in /tmp/hadoop-yarn/staging/history/done:
hdfs dfs -ls -R /
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp/hadoop-yarn/staging
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05/000000
-rwxrwx---   1 root supergroup      46338 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05/000000/job_1483628783579_0001-1483629144632-root-QuasiMonteCarlo-1483629179995-3-1-SUCCEEDED-default-1483629156270.jhist
-rwxrwx---   1 root supergroup     117543 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05/000000/job_1483628783579_0001_conf.xml
drwxrwxrwt   - root supergroup          0 2017-01-05 10:12 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done_intermediate/root
drwx------   - root supergroup          0 2017-01-05 10:12 /tmp/hadoop-yarn/staging/root
drwx------   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/root/.staging
drwxrwxrwt   - root supergroup          0 2017-01-05 10:12 /tmp/logs
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp/logs/root
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp/logs/root/logs
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/logs/root/logs/application_1483628783579_0001
-rw-r-----   1 root supergroup      65829 2017-01-05 10:13 /tmp/logs/root/logs/application_1483628783579_0001/master.local_37940
drwxr-xr-x   - root supergroup          0 2017-01-05 10:12 /user
drwxr-xr-x   - root supergroup          0 2017-01-05 10:13 /user/root
Testing a distributed cluster
You have probably noticed that so far I have been putting the word "cluster" in quotes: after all, everything runs on a single machine. Let's fix that annoying shortcoming and test Hadoop on a genuinely distributed cluster.
First, let's adjust the Hadoop configuration. Right now the hostname in it is given as localhost. If we simply copied this configuration to other nodes, every node would try to find the NameNode, ResourceManager, and JobHistory service on its own host. So we decide in advance which host will run these services and change the configuration accordingly.
In my case, all of the master services listed above (NameNode, ResourceManager, JobHistory) will run on the host master.local. Replace localhost with master.local in the configuration:
cd /opt/hadoop/etc/hadoop
sed -i 's/localhost/master.local/' core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml
Now I clone the virtual machine we have just built twice, which gives us two slave nodes. A unique hostname has to be set on each slave (in my case slave1.local and slave2.local). Also, on all three nodes of our cluster, configure /etc/hosts so that every machine in the cluster can reach the others by hostname. In my case it looks like this (the same content on all three machines):
cat /etc/hosts
...
192.168.122.70  master.local
192.168.122.59  slave1.local
192.168.122.217 slave2.local
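For reference, on CentOS 7 the hostname of each clone can be set with hostnamectl (this assumes the clones run the same CentOS 7 image as the master; substitute your own names):
# on the first clone
hostnamectl set-hostname slave1.local
# on the second clone
hostnamectl set-hostname slave2.local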
Additionally, on the slave1.local and slave2.local nodes we need to clear the contents of /data/dfs/dn (the cloned data directories still carry the DataNode identity of the original machine):
rm -rf /data/dfs/dn/*
Everything is ready. On master.local we start all the services:
/opt/hadoop/sbin/hadoop-daemon.sh start namenode
/opt/hadoop/sbin/hadoop-daemon.sh start datanode
/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
On slave1.local and slave2.local we start only the DataNode and NodeManager:
/opt/hadoop/sbin/hadoop-daemon.sh start datanode
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager
Let's make sure that our cluster now consists of three nodes.
For HDFS, look at the output of the dfsadmin -report command and check that all three machines appear in the list of live datanodes:
hdfs dfsadmin -report
...
Live datanodes (3):
...
Name: 192.168.122.70:50010 (master.local)
...
Name: 192.168.122.59:50010 (slave1.local)
...
Name: 192.168.122.217:50010 (slave2.local)
Or open the NameNode web page:
master.local:50070/dfshealth.html#tab-datanode
For YARN, look at the output of the node -list command:
yarn node -list -all
17/01/06 06:17:52 INFO client.RMProxy: Connecting to ResourceManager at master.local/192.168.122.70:8032
Total Nodes:3
         Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
slave2.local:39694       RUNNING  slave2.local:8042                             0
slave1.local:36880       RUNNING  slave1.local:8042                             0
master.local:44373       RUNNING  master.local:8042                             0
Or open the ResourceManager web page:
master.local:8088/cluster/nodes
All nodes should appear in the list with the status RUNNING.
Finally, let's make sure that a running MapReduce application uses resources on all three nodes. Run the familiar Pi application from hadoop-mapreduce-examples.jar:
yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 30 1000
While it runs, look again at the output of yarn node -list -all:
...
         Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
slave2.local:39694       RUNNING  slave2.local:8042                             4
slave1.local:36880       RUNNING  slave1.local:8042                             4
master.local:44373       RUNNING  master.local:8042                             4
The number of running containers is 4 on each node.
You can also open master.local:8088/cluster/nodes and see the total number of cores and the amount of memory used by all applications on each node.
Conclusion
We compiled Hadoop from source code, installed it, configured it, and tested it both on a single machine and on a distributed cluster. If the topic interests you and you would like to build other services from the Hadoop ecosystem in a similar way, here is a link to a script I maintain for my own needs:
github.com/hadoopfromscratch/hadoopfromscratch
With it you can install ZooKeeper, Spark, Hive, HBase, Cassandra, and Flume. If you find any errors or inaccuracies, please write to me. I would really appreciate it.