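Dear readers, hello!
Some time ago we published a translation of O'Reilly's foundational book on the Hadoop framework.
The editors now face a difficult choice: translate the new fourth edition of the book, or reprint the existing one.
So we decided to publish a translation of an article by Anand Krishnaswamy that appeared on the Thoughtworks blog in 2013. In it, the author tries to work out when using Hadoop is appropriate and when it is unnecessary.
We hope you find the material interesting, that it sparks debate, and that you will share your impressions of working with Hadoop and take part in the survey.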
Hadoop is often positioned as a universal framework that helps an organization deal decisively with its problems. Just mention "big data" or "analytics" and the ready answer comes back: "Hadoop!" Yet the Hadoop framework was designed to solve a very specific class of problems. For everything else it is a poor fit at best and a plain mistake at worst. Data transformation (broadly, ETL operations: extract, transform, load) benefits greatly from Hadoop, but if your business has at least one of the following five properties, you should not use Hadoop.
1. Thirst for big data
Many companies believe that the data at their disposal qualifies as "big", but unfortunately in most cases that estimate is inflated. The research paper "Nobody ever got fired for buying a cluster" gives a sense of how much data is commonly believed to be "big". Its authors conclude that although Hadoop was built to process terabytes and petabytes of data, the input for most practical jobs does not exceed 100 GB: the median job size at Microsoft and Yahoo is under 14 GB, and 90% of jobs at Facebook are well under 100 GB. The authors of the paper therefore consider it more appropriate to dedicate a separate server and scale it up vertically than to scale out the infrastructure that Hadoop runs on as the need arises.
Ask yourself:
• Do you have more than a few terabytes of data?
• Do you have a steady, very large flow of data?
• How much data will you actually be working with?
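One rough way to answer the last question is to measure the input before committing to a cluster. The sketch below is only an illustration: the helper names are made up, and the 100 GB cutoff is an assumption borrowed from the figure quoted above, not a rule.

```python
import os

# Assumed cutoff: 100 GB, taken from the figure quoted in the text above.
SINGLE_SERVER_LIMIT_BYTES = 100 * 1024**3


def total_input_bytes(root):
    """Sum the sizes of all regular files under the directory `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def fits_single_server(size_bytes, limit_bytes=SINGLE_SERVER_LIMIT_BYTES):
    """Return True when the input is small enough to try one scaled-up server first."""
    return size_bytes <= limit_bytes
```

Calling `fits_single_server(total_input_bytes("/path/to/input"))` on a real input directory (the path here is a placeholder) gives a first, crude indication of whether scaling up is worth trying before scaling out.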
2. Waiting in line
When you submit a job, Hadoop's minimum latency is about one minute. The system therefore needs a minute or more to produce recommendations in response to a customer's order. Only a very loyal and patient customer will stare at the screen for over 60 seconds waiting for an answer. Alternatively, you can use Hadoop to precompute, a priori, the related items for every item already in the catalog, and give the website or mobile app instant (one-second) access to the stored results. Hadoop is an excellent engine for this kind of precomputation and simplifies working with big data. Of course, the more complex the typical response becomes, the less effective it is to precompute every result in full.
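The precompute-then-look-up pattern can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: the function names, the sample orders, and the co-purchase counting rule are all hypothetical, and in practice the batch step would run as a Hadoop job and write to a key-value store.

```python
from collections import Counter, defaultdict
from itertools import permutations


def precompute_related(orders, top_n=3):
    """Batch step: count how often items are bought together, keep the top_n per item."""
    co_counts = defaultdict(Counter)
    for basket in orders:
        # Every ordered pair of distinct items in a basket counts as one co-purchase.
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return {
        item: [
            other
            for other, _count in sorted(
                counts.items(), key=lambda kv: (-kv[1], kv[0])
            )[:top_n]
        ]
        for item, counts in co_counts.items()
    }


def recommend(related, item):
    """Online step: a constant-time lookup in the precomputed table."""
    return related.get(item, [])


orders = [
    ["tent", "stove", "lamp"],
    ["tent", "stove"],
    ["lamp", "batteries"],
]
related = precompute_related(orders)
print(recommend(related, "tent"))  # ['stove', 'lamp']
```

The slow part runs offline over all orders; the request path only reads a dictionary, so its latency does not depend on the size of the order history.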
Ask yourself:
• What do users expect of the program's response time?
• Which tasks can be grouped into batches?
3. Your call will be answered in...
Hadoop is not meant for cases that require a real-time response to requests. A job passes through the map-reduce cycle and also spends time in the shuffle cycle. Neither cycle is bounded in time, which makes developing real-time applications on Hadoop very difficult. Trading on the volume-weighted average price is a practical example where the system must respond promptly for transactions to complete.
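For reference, the volume-weighted average price of a series of trades is the sum of price times volume divided by the total volume. The sketch below (the class and method names are illustrative assumptions) keeps two running totals so that each new trade updates the value in constant time, which is the kind of per-event responsiveness a batch job cannot offer.

```python
class RunningVWAP:
    """Incrementally maintained volume-weighted average price."""

    def __init__(self):
        self._notional = 0.0  # running sum of price * volume
        self._volume = 0.0    # running sum of volume

    def add_trade(self, price, volume):
        """Fold one trade into the totals and return the updated VWAP."""
        if volume <= 0:
            raise ValueError("volume must be positive")
        self._notional += price * volume
        self._volume += volume
        return self.value

    @property
    def value(self):
        """Current VWAP, or None before the first trade."""
        if self._volume == 0:
            return None
        return self._notional / self._volume


vwap = RunningVWAP()
vwap.add_trade(10.0, 100)
print(vwap.add_trade(20.0, 300))  # 17.5
```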
Analysts cannot do without SQL. Hadoop is poorly suited to random access to data sets, even with Hive, which actually generates MapReduce jobs from your queries. Google's Dremel architecture (and, of course, BigQuery) was designed specifically to support ad hoc queries over huge row sets within seconds. SQL, moreover, lets you express joins between tables. Other promising alternatives are Shark, developed at AmpLab at the University of California, Berkeley, and the Stinger initiative led by Hortonworks.
Ask yourself:
• How closely do your users and analysts need to interact with the data?
• Do you need interactivity over terabytes of data, or only over some subset of the information?
So Hadoop operates in batch mode. This means that when new information is added, a job has to sift through the entire data set again, and analysis time grows accordingly. Yet chunks of data, whether plain updates or small changes, may arrive in real time, and the business often needs to make decisions based on these events. No matter how quickly new data is loaded into the system, Hadoop will still process it in batch mode. This problem may be solved in the future with the help of YARN. Twitter's Storm is already a popular and affordable alternative. Combining Storm with a distributed messaging system such as Kafka opens up many possibilities for stream aggregation and data processing. Storm, however, sorely lacks load balancing, a feature that Yahoo's S4 does have.
Ask yourself:
• What is the "shelf life" of your data?
• How quickly must the business derive value from incoming data?
• How important is it for the business to respond to changes or updates in real time?
Real-time advertising and tracking data from sensors require real-time processing of streaming input. But Hadoop and the tools built on top of it are not the only option. At a recent Indy 500 race, for example, SAP's in-memory HANA database was used together with MATLAB in McLaren's ATLAS analytics tool to run models during the race and respond to telemetry. Many analysts believe that Hadoop's future lies in interactivity and real-time work.
4. I just closed my account on my favorite social network
Hadoop, and MapReduce in particular, works best with data that can be decomposed into key-value pairs without risk of losing context or implicit relationships. Graphs carry implicit relationships (edges, subtrees, parent and child relations, weights, and so on), and not all of those relationships can reside on one particular node. Most graph algorithms therefore need to process the graph in whole or in part at every iteration, which is often impossible or very difficult in MapReduce. Choosing a strategy for partitioning the data across nodes is a further problem. If your primary data structure is a graph or a network, you are probably better off with a graph database such as Neo4J or Dex. You can also look into newer developments such as Google's Pregel or Apache Giraph.
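To see why iteration is awkward here, consider breadth-first search written as a series of rounds. In the sketch below (a toy in-memory graph, with names chosen purely for illustration), every round needs the frontier produced by the previous one; on MapReduce each round would typically become a separate job that reads the graph again.

```python
def bfs_rounds(graph, source):
    """Breadth-first search over an adjacency dict.

    Returns (distances, rounds), where rounds is the number of passes over
    the frontier, including the final pass that discovers nothing new.
    """
    distances = {source: 0}
    frontier = {source}
    rounds = 0
    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in distances:
                    distances[neighbor] = distances[node] + 1
                    next_frontier.add(neighbor)
        # The next round cannot start until this one has finished.
        frontier = next_frontier
        rounds += 1
    return distances, rounds


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
distances, rounds = bfs_rounds(graph, "a")
print(distances["e"], rounds)  # 3 4
```

Even this five-node graph takes four dependent rounds; systems in the Pregel family keep the graph resident between rounds instead of rereading it.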
Ask yourself:
• Is the underlying structure of your data as important as the data itself?
• Is the information you need about the data's structure as important as, or more important than, the data itself?
5. The MapReduce model
Some tasks, jobs, and algorithms simply do not fit the MapReduce programming model. One such class of problems has already been described above. Another is tasks that need the results of intermediate steps to compute their output (the academic example is calculating the Fibonacci sequence). Some machine learning algorithms, for example those based on gradient descent or expectation maximization, do not fully fit the MapReduce paradigm either. Various researchers have proposed specific optimization strategies and options for each of these problems (global state, passing data structures for reference, and so on), but implementing them remains complex and unintuitive.
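Gradient descent shows the mismatch concretely. In the sketch below, one pass over the data is a natural map (per-record gradient) followed by a reduce (sum), but the updated weight is global state that the next pass depends on, so the passes cannot run independently. The one-parameter model y = w * x, the function names, and the learning rate are assumptions made for illustration.

```python
from functools import reduce


def gradient_pass(points, w):
    """One pass over the data: map each record to its gradient term, then reduce."""
    # "Map": gradient of the squared error (w * x - y)**2 with respect to w.
    partials = map(lambda p: 2 * (w * p[0] - p[1]) * p[0], points)
    # "Reduce": sum the per-record terms and average them.
    return reduce(lambda a, b: a + b, partials, 0.0) / len(points)


def fit_slope(points, learning_rate=0.05, iterations=200):
    """Fit y = w * x; every iteration needs the w produced by the previous one."""
    w = 0.0
    for _ in range(iterations):
        w -= learning_rate * gradient_pass(points, w)
    return w


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(round(fit_slope(points), 6))  # 2.0
```

On Hadoop each of those 200 iterations would be a separate job with its own startup and shuffle cost, which is why such algorithms are usually described as a poor fit.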
Ask yourself:
• Does your company place serious emphasis on very specific algorithms or domain-specific processes?
• Can your technical team analyze whether the algorithms you use fit MapReduce, and handle each case appropriately?
You should also consider practical cases in which the data set is not large, or in which the total volume is large but the set consists of billions of small files that cannot be concatenated (for example, you need to scan many image files and pick out those containing a particular shape). As noted above, if a task does not fit the "split and aggregate" MapReduce paradigm, using Hadoop to solve it is a dubious undertaking.
Now that we have looked at the cases where Hadoop is not the best solution, let us discuss when using Hadoop is appropriate.
Ask yourself:
Does your organization...
1. Want to extract information from huge volumes of text logs?
2. Want to transform mostly unstructured or semi-structured data into a convenient, organized format?
3. Have tasks that process the entire data set and can run overnight (the way credit card companies process the day's transactions)?
4. Rely on conclusions from a single processing run that stay valid until the next scheduled run (unlike, say, stock market values, which change far more often than once per trading day)?
If so, you should almost certainly take a look at Hadoop.
Many business tasks fit the Hadoop model well, and solving them matters a great deal. As a rule, such tasks come down to processing large volumes of unstructured or semi-structured data in order to summarize its content or to convert the observations into a structured form for later use by other components of the system. In these cases the Hadoop model is a good fit. If the collected data contains elements that can readily serve as identifiers for the corresponding values (in Hadoop these are called key-value pairs), that simple association can be used for several kinds of aggregation at once.
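The key-value idea can be illustrated without a cluster. The sketch below imitates the map, shuffle, and reduce phases in plain Python and derives several aggregates per key in one pass; the "user status bytes" log format and all function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def map_line(line):
    """Map: parse 'user status bytes' and emit a (key, value) pair keyed by user."""
    user, status, size = line.split()
    is_server_error = 1 if status.startswith("5") else 0
    return user, (1, int(size), is_server_error)


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_values(values):
    """Reduce: compute several aggregates for one key."""
    return {
        "requests": sum(v[0] for v in values),
        "bytes": sum(v[1] for v in values),
        "errors": sum(v[2] for v in values),
    }


log_lines = [
    "alice 200 512",
    "bob 500 128",
    "alice 404 64",
    "bob 200 256",
]
groups = shuffle(map(map_line, log_lines))
summary = {key: reduce_values(values) for key, values in groups.items()}
print(summary["bob"])  # {'requests': 2, 'bytes': 384, 'errors': 1}
```

Because every record stands on its own once it has a key, the map step can run on any number of machines, which is the property the Hadoop model relies on.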
The most important thing, then, is to understand clearly the business resources available to you and the problem you are trying to solve. We hope that these considerations and the recommendations above help you choose exactly the right tool for your business.
It may well turn out to be Hadoop.