[@tsafin - Turing Award winner Michael Stonebraker needs no introduction. He and his students at Berkeley and MIT appear to have built most of the relational and non-relational databases of the past few decades. Ingres and Postgres, C-Store and Vertica, H-Store and VoltDB: these are only some of the projects and companies that Michael and his students influenced directly, and there are many more forks and derivatives...
So when he criticizes something, whether NoSQL or Hadoop, the industry should at least listen, and perhaps try to change.
His views on Hadoop, as expressed in articles from 2012 and 2014, are interesting in their own right, and so is the chance to follow how a classic's position evolved over such a short time.
The first article, "Possible Hadoop Trajectories", was published in Communications of the ACM (http://cacm.acm.org/blogs/blog-cacm/149074-possible-hadoop-trajectories/fulltext) in May 2012. Stonebraker co-wrote it with Jeremy Kepner, who at the time was a senior technical staff member at MIT and a researcher in the MIT mathematics department and the MIT Computer Science and AI Lab. Compared with the second article, which Stonebraker wrote on his own two years later, this collaborative piece reads as more impassioned (and, in my opinion, the first article is also the better written of the two). The context has changed a great deal over the last few years, and it would be unfair not to acknowledge that for the Hadoop/HDFS ecosystem.
Broadly speaking, the criticism of Hadoop in the first article concerns only the implementation of the MapReduce API, and in the years since, the Hadoop industry has done a great deal to address those problems. Even so, it has not come any closer to solving the problems of HPC applications described in the first article.]
Possible Hadoop Trajectories
Michael Stonebraker, Jeremy Kepner, May 2, 2012
Over the past few years, Hadoop has become the parallel computing platform for Java. In doing so, it has fully achieved the goal of bringing parallel computing into the everyday practice of the millions of programmers in the Java community. All earlier attempts to do this (Java Grande, JavaHPC) were not very successful, and Hadoop deserves praise for its success, which is due mainly to the simplicity and accessibility of the environment it created.
Nevertheless, in scientific settings such as Lincoln Lab, where at least one of us works, many improvements are needed before it can be used seriously in production installations. For the most part, Hadoop is used in our environment for parallel computing (scientific analytics, information aggregation) and for data management.
Let us look at these two use cases in more detail.
Scientific Computing in Hadoop
In many cases, code that performs scientific computation organizes its nodes into a two-dimensional (or three-dimensional, or N-dimensional) rectangular partitioning grid (lattice). Each node then runs something like the following:
Until end condition {
    Local computation on the local data partition
    Emit the [changed] state
    Send data to, and receive data from, the subset of nodes that hold other data partitions
}
This template describes most computational fluid dynamics (CFD) algorithms, all atmospheric and ocean simulation models, linear algebra operations, sparse graph operations, image processing, and signal processing. Considering this class of problems on Hadoop raises the following issues.
Local computation almost always carries state from one iteration to the next. Preserving state between MapReduce steps requires writing it to the file system, which is often very expensive. In addition, such algorithms often need direct communication between nodes, which the MapReduce infrastructure does not support. Finally, such algorithms often bind code to the same grid node across successive iterations of the computation.
Once again, MapReduce does not support this model directly.
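To make the template concrete, here is a minimal single-process sketch in Python. It is only an illustration under stated assumptions: the `Node` class, the smoothing rule, and the two-node layout are invented for this example and do not come from the article. The point is to show the three properties the text lists: each node keeps its partition in memory between iterations, neighbours exchange boundary values directly, and the same computation stays bound to the same node throughout.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One grid node (hypothetical): owns a partition and keeps it across iterations."""
    values: list              # local data partition, held in memory as persistent state
    left_halo: float = 0.0    # boundary value received from the left neighbour
    right_halo: float = 0.0   # boundary value received from the right neighbour

    def local_compute(self):
        # Toy smoothing rule: each cell becomes the mean of its two neighbours.
        padded = [self.left_halo] + self.values + [self.right_halo]
        self.values = [(padded[i - 1] + padded[i + 1]) / 2.0
                       for i in range(1, len(padded) - 1)]


def exchange(nodes):
    """Direct node-to-node step: every node receives its neighbours' edge cells."""
    for i, node in enumerate(nodes):
        node.left_halo = nodes[i - 1].values[-1] if i > 0 else 0.0
        node.right_halo = nodes[i + 1].values[0] if i < len(nodes) - 1 else 0.0


def run(nodes, iterations):
    for _ in range(iterations):      # "until end condition"
        exchange(nodes)              # send/receive boundary data
        for node in nodes:
            node.local_compute()     # local computation on the local partition
    return [v for node in nodes for v in node.values]   # emitted state


def reference(values, iterations):
    """The same computation on one undivided array, for comparison."""
    for _ in range(iterations):
        padded = [0.0] + values + [0.0]
        values = [(padded[i - 1] + padded[i + 1]) / 2.0
                  for i in range(1, len(padded) - 1)]
    return values


data = [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 4.0, 0.0]
nodes = [Node(data[0:4]), Node(data[4:8])]
result = run(nodes, iterations=3)
```

In a MapReduce formulation, each pass of the loop in `run` would become a separate job, with `values` written to and re-read from the file system between jobs. That is the cost the authors describe.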
We estimate that MapReduce works well for only 5% of the users at Lincoln Lab. The remaining 95% pay with a slowdown of one to two orders of magnitude for trying to force their algorithms into the ill-fitting MapReduce model. Few scientists will agree to pay that price.
Many people may argue that performance does not matter in the long run [@tsafin - WTF?]. That may hold for low-end [machines] only. For the data volumes we see both at Lincoln Lab and at the other labs we know of, performance matters a great deal, and there are never enough computing resources. For example, our organization is investing $100 million to site its next-generation supercomputing center next to a hydroelectric dam, in order to cut its carbon dioxide emissions by an order of magnitude. The performance loss that comes with a Hadoop implementation is therefore an unacceptable extra cost. Even at small volumes, using a system as inefficient as Hadoop is hardly an environmentally friendly step, and in many cases it simply wastes energy.
In short, when Hadoop is used in [scientific] computing environments, we have observed the following sequence of steps:
- Step 1. Try Hadoop in a pilot project.
- Step 2. Scale Hadoop up for production use.
- Step 3. Hit the wall because of the problems described above.
- Step 4. Change the shape of the solution to work around the limitations.
At Lincoln Lab there are projects at each of these four stages.
For Hadoop to survive in our environment, its parallel computing model needs a major overhaul, ideally complemented by the recent Hadoop work on changing the task scheduler. Once these problems are solved, we expect today's Hadoop to be unrecognizable in the resulting system.
In other shops, the mix of tasks that matter to users may be more compatible with the MapReduce infrastructure. Even so, our sense is that we are the norm rather than the exception. Google's move from MapReduce to other models supports this concern. We therefore expect dramatic changes to the Hadoop infrastructure.
Data Management in Hadoop
Forty years of DBMS research and industrial practice have confirmed what Ted Codd's paper argued in 1970: programmer and system efficiency are generally higher when work is expressed in a high-level language with high-level data manipulation operations, and [lower] when it has to be written in a language that manipulates one record at a time. Within Hadoop, Hive is far higher-level than a record-at-a-time language, and it is easier to express a data query in Hive than to use MapReduce directly. It therefore seems likely that all Hadoop data management tools will move to high-level languages such as SQL and SQL-like languages.
For example, according to a talk by David DeWitt [1-1], Facebook's Hadoop cluster is programmed almost entirely in Hive, a high-level data access language that closely resembles SQL. Lincoln Lab has chosen a different high-level language (not Hive), one with a fairly high-level algebraic interface for accessing sparse data, but the trajectory is very similar [1-2, 1-3].
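The difference between the two styles can be shown with a small Python sketch. The data, the `map_phase` and `reduce_phase` helpers, and the `events` table are all made up for illustration. The SQL runs on Python's built-in SQLite and only stands in for a SQL-like language such as HiveQL. It is not Hive itself.

```python
import sqlite3
from collections import defaultdict

# Hypothetical event log: (user, event type, count).
rows = [
    ("alice", "click", 3),
    ("bob", "view", 5),
    ("alice", "view", 2),
    ("bob", "click", 7),
    ("alice", "click", 4),
]


def map_phase(records):
    """Record-at-a-time style: inspect each record and emit (key, value) pairs."""
    for user, event, count in records:
        if event == "click":
            yield user, count


def reduce_phase(pairs):
    """Group the emitted pairs by key and sum the values by hand."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


hand_written = reduce_phase(map_phase(rows))

# High-level style: state what is wanted and let the engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, event TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
declarative = dict(conn.execute(
    "SELECT user, SUM(n) FROM events WHERE event = 'click' GROUP BY user"))
conn.close()
```

Both paths compute the same per-user click totals. Only the first one requires the programmer to spell out filtering, grouping, and summation one record at a time.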
As a result, MapReduce is turning into an internal interface [encapsulated inside] the DBMS.
In other words, Hive users care little about what lies beneath a HiveQL query, so the MapReduce interface drops out of sight and sinks into the depths of the DBMS internals. After all, how many people worry about which protocol a parallel DBMS uses to communicate over the network between processors on different nodes?
Incidentally, one of the authors of this article has built five parallel DBMSs and is, of course, well acquainted with the communication protocols between a query coordinator and multiple executors on different nodes. He also knows that worker nodes must interact with other worker nodes and pass intermediate data to one another. In such a scenario, building a high-performance system requires the following system properties:
- Worker nodes must be able to keep state, so that data can be reused between iterations of a distributed query plan.
- Node-to-node communication must be supported.
- It must be possible to bind query processing to the data local to a node.
In general, a DBMS needs the same set of conditions as the scientific computing algorithms described earlier. The upshot is that MapReduce is being pushed into the role of an internal interface for parallel DBMSs, yet it is not a very suitable interface for a DBMS.
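The three requirements listed above can be seen together in a toy distributed aggregation. This is a sketch only: the `Worker` class, the `owner_of` partitioning function, and the sample rows are hypothetical, and the whole thing runs in one Python process rather than over a network. Each worker scans only its own rows, keeps its partial result in memory, and hands intermediate data straight to the peer that owns each key.

```python
from collections import defaultdict


def owner_of(key, n_workers):
    # Deterministic toy partitioning function (hypothetical); real systems hash the key.
    return sum(key.encode()) % n_workers


class Worker:
    def __init__(self, worker_id, local_rows):
        self.worker_id = worker_id
        self.local_rows = local_rows   # rows stored on this node
        self.cache = {}                # state kept between steps of the plan
        self.inbox = []                # intermediate data received from peers

    def scan_local(self):
        """Step 1: aggregate only the rows stored on this node."""
        partial = defaultdict(int)
        for key, value in self.local_rows:
            partial[key] += value
        self.cache["partial"] = dict(partial)

    def shuffle(self, peers):
        """Step 2: send each partial sum directly to the peer that owns the key."""
        for key, value in self.cache["partial"].items():
            peers[owner_of(key, len(peers))].inbox.append((key, value))

    def merge(self):
        """Step 3: combine what the peers sent into final sums for the owned keys."""
        final = defaultdict(int)
        for key, value in self.inbox:
            final[key] += value
        return dict(final)


workers = [
    Worker(0, [("eu", 10), ("us", 5)]),
    Worker(1, [("us", 7), ("asia", 1)]),
    Worker(2, [("eu", 2), ("asia", 4)]),
]
for w in workers:
    w.scan_local()
for w in workers:
    w.shuffle(workers)

totals = {}
for w in workers:
    totals.update(w.merge())   # each key is owned by exactly one worker
```

Nothing here touches a file system between steps. Under plain MapReduce, the hand-off from `scan_local` to `merge` would go through materialized intermediate files instead.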
In 2009 one of us wrote a paper comparing parallel DBMS technology with Hadoop [1-4]. Roughly speaking, a DBMS is one to two orders of magnitude faster than Hadoop. The advantage comes from using indexes on the data, from sending the query only to the nodes where the data lives (and not the other way around), from compression, and from better node-to-node protocols. As far as we know, the situation in 2012 has not changed much since 2009: by informal estimates, Hadoop is still one to two orders of magnitude slower. For example, one large web property runs a 5-petabyte Hadoop cluster deployed on 2,700 nodes, while in another case a comparable 5-petabyte installation is managed by a commercial DBMS on just 200 nodes. Note that this is roughly 13 times fewer.
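The node-count comparison is simple arithmetic, worked through here in Python. The only assumption added is the decimal convention of 1 PB = 1000 TB. The node counts and data volume are the ones quoted in the text.

```python
hadoop_nodes = 2700   # Hadoop cluster size quoted in the text
dbms_nodes = 200      # commercial DBMS installation quoted in the text
data_pb = 5           # both installations hold about 5 PB

# How many times more nodes the Hadoop cluster uses for the same volume.
node_ratio = hadoop_nodes / dbms_nodes

# Data held per node, assuming 1 PB = 1000 TB (decimal units).
tb_per_hadoop_node = data_pb * 1000 / hadoop_nodes
tb_per_dbms_node = data_pb * 1000 / dbms_nodes
```

The ratio comes out at 13.5, which the article rounds to 13. Put differently, each DBMS node holds 25 TB, against a little under 2 TB per Hadoop node.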
Put simply, at present we observe the following trajectory when data is managed through Hadoop:
- Step 1. Try Hadoop in a pilot project.
- Step 2. Scale Hadoop up for production use.
- Step 3. Run into unacceptable performance overhead.
- Step 4. Change the shape of the solution by moving to a parallel DBMS.
At present, most Hadoop installations sit between steps 2 and 3, so the "hit the wall" stage is still ahead of them. Either Hadoop will grow into a real parallel DBMS in the limited time available, or users will switch to other solutions: replacing part of their Hadoop stack, using an interface to Hadoop that exposes it as external data, or finding some other route. Given how little progress we have seen over the past three years, our money is on the second outcome [switching to other solutions].
One final remark. The Gartner Group is the originator of the famous "hype cycle" curve [1-5] [@tsafin - usually translated into Russian as the "technology maturity curve"]. The Hadoop ecosystem is currently being presented as "the best thing since sliced bread". We hope that over time the limitations described here will be removed and that it will come a little closer to what has been promised.
References
- [1-1] http://searchsqlserver.techtarget.com/news/2240126683/Cloaked-in-secrecy-Microsoft-project-aims-to-wed-SQL-NoSQL-databases
- [1-2] Jeremy Kepner et al., "Dynamic Distributed Dimensional Data Model (D4M) Database and Computation System," ICASSP, March 25-30, 2012.
- [1-3] http://accumulo.apache.org
- [1-4] Andy Pavlo et al., "A Comparison of Approaches to Large-Scale Data Analysis," Proc. SIGMOD 2009, Providence, RI, June 2009.
- [1-5] http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
Hadoop at a Crossroads
Michael Stonebraker, August 5, 2014
[@tsafin - The second article was written two years later, in August 2014, and appeared in the same journal, Communications of the ACM: http://cacm.acm.org/blogs/blog-cacm/177467-hadoop-at-a-crossroads/fulltext]
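A lot of water has flowed under the bridge since my earlier article [2-1], written in 2012 together with Jeremy Kepner. I feel I should not only comment on several important announcements but also point out some of the facts that have emerged and the opinions that have formed. I then close the article with a prediction of where the Hadoop stack is heading.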
The first announcement worth mentioning is the release of a new DBMS, Cloudera Impala [2-2], which runs on top of HDFS. Put simply, Impala is built like every other shared-nothing parallel SQL DBMS [@tsafin - it is still not clear what the established translation of the term SN is; I recommend sticking with Sergey Kuznetsov's version, "architecture without shared resources"] aimed at the data warehouse market. What is especially notable is that the MapReduce layer has been removed, and removed deliberately. As many of us have been pointing out for years, MapReduce is not the best internal interface for a SQL (or Hive) DBMS [2-3, 2-4]. Impala was built by developers who know this. In fact, similar Impala-like efforts are already under way at both HortonWorks and Facebook. This presents the Hadoop vendors with a dilemma. Historically, "Hadoop" was the open-source implementation of MapReduce created by Yahoo, yet Impala leaves that layer out of the solution stack.
How can you remain a Hadoop vendor when Hadoop is no longer at the core of the stack?
The answer is simple: redefine what "Hadoop" means, and that is what the Hadoop vendors have ended up doing. "Hadoop" now refers to the whole stack. HDFS sits at the bottom, with Impala, MapReduce, or other systems running on top of it. Higher-level solutions such as Mahout run on those systems in turn. The term "Hadoop" has thus come to mean the entire resulting collection of solutions.
In another recent announcement, Google said that MapReduce is already "yesterday's news" and that it now builds its solutions on more suitable systems such as Dremel, BigTable, and F1/Spanner [2-5]. Google must be having a good laugh. It invented MapReduce in 2004 to support the crawler for its search engine, and a few years ago it replaced MapReduce there with a BigTable implementation. It needed an interactive storage system, and MapReduce works only in batch mode. In other words, we know that the main driving force behind MapReduce abandoned it some time ago. Now Google reports that it has no further need for MapReduce at all.
It is indeed ironic that Hadoop chose to support this paradigm five years after Google had abandoned it and moved on. The rest of the world, it turns out, is following Google into Hadoop with a considerable lag of about ten years. Google abandoned it long ago. How long will it take the world to recognize that fact?
Ultimately, it turns out that the Hadoop vendors are on a collision course with the data warehouse vendors. They are now implementing (or have already implemented) essentially the same architecture as the data warehouse vendors. After a few years spent hardening those implementations, they will be able to deliver adequate performance. Note that most data warehouse vendors already support HDFS today, and many of them offer implementations for semi-structured data. I am therefore convinced that the data warehouse market and the Hadoop vendor market will soon merge. May the best system win that head-to-head street fight!
Next, let us look at HDFS, which has become one of the main components of the Hadoop stack. HDFS is essentially a file system that stores bytes of data, which is something any computing platform takes for granted. There are two views of where HDFS may go in the future.
Seen through the eyes of the file system world, users want a common distributed file system, and from that point of view HDFS is an ideal candidate.
Seen from the point of view of a parallel SQL/Hive DBMS, HDFS faces "a fate worse than death". A DBMS always and everywhere wants to send the query (a few kilobytes) to the data (many gigabytes), and not the other way around. Hiding the location of the data from the underlying DBMS engine is therefore fatal, and a DBMS will always go to great lengths to get around such a restriction. Every parallel DBMS, whether it comes from a data warehouse vendor or a Hadoop vendor, will switch off location transparency and turn HDFS into a mere collection of Linux file systems, one file system per node.
In the same way, no DBMS relies on file system replication. A detailed discussion of this topic can be found in [2-6]. In short, a DBMS prefers to use its own replication system throughout, for load balancing and for optimizing query and transaction processing.
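The cost of location transparency can be illustrated with a small Python sketch that counts bytes moved under each approach. Everything in it is an assumption made for the example: the `StorageNode` class, the sample log rows, and the 32-byte query size are invented, and real systems move data in blocks over a network rather than in Python lists.

```python
class StorageNode:
    """A node that stores rows and can evaluate a filter locally (hypothetical)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def run_filter(self, predicate):
        return [row for row in self.rows if predicate(row)]


def bytes_of(rows):
    return sum(len(row) for row in rows)


def ship_data_to_query(nodes, predicate):
    """Location-transparent access: pull every row to the coordinator, then filter."""
    moved = 0
    pulled = []
    for node in nodes:
        moved += bytes_of(node.rows)
        pulled.extend(node.rows)
    return [row for row in pulled if predicate(row)], moved


def ship_query_to_data(nodes, predicate, query_size):
    """Location-aware access: send the query to each node, return only the matches."""
    moved = 0
    result = []
    for node in nodes:
        moved += query_size              # the query text travels to the node
        matches = node.run_filter(predicate)
        moved += bytes_of(matches)       # only matching rows travel back
        result.extend(matches)
    return result, moved


nodes = [
    StorageNode("a", [b"ok" * 50, b"error: disk full", b"ok" * 50]),
    StorageNode("b", [b"ok" * 50, b"ok" * 50]),
]


def is_error(row):
    return row.startswith(b"error")


pulled_result, pulled_bytes = ship_data_to_query(nodes, is_error)
pushed_result, pushed_bytes = ship_query_to_data(nodes, is_error, query_size=32)
```

Both calls return the same single matching row, but the second moves far fewer bytes. The gap grows with the size of the data, which is why a DBMS engine wants to know where each piece of data lives.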
If, over time, the DBMS vendors' view turns out to prevail in the market, HDFS will be gutted. The DBMS vendors will stop using it. In their world, every node already has a local file system, the parallel DBMS supports a high-level query language, and there are many tools and extensions defined through user-defined functions. In this scenario Hadoop turns into a standard shared-nothing DBMS, and a large number of competing DBMS vendors fight it out in the market.
If, on the other hand, the file system view prevails, HDFS will live on with a variety of tools running on top of the file system. The standard features of a DBMS environment, such as load balancing, auditing, resource governors, data independence, data integrity, high availability, concurrency control, and quality of service, will reach file system users only gradually. In this scenario there will be no high-level standard interface. In other words, the DBMS view of the world offers a broad range of additional useful services, and users should take this as advance warning to be very careful before committing to low-level interfaces.
In either scenario, the file system is a generic component, and the Hadoop vendors will sell tools based on the file system or tools based on the DBMS (one or the other, or both). As a result, they will join the ranks of software vendors that sell software and services. And may the best product win!
References
- [2-1] http://cacm.acm.org/blogs/blog-cacm/149074-possible-hadoop-trajectories/fulltext
- [2-2] http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
- [2-3] http://dl.acm.org/citation.cfm?id=1629197
- [2-4] Pavlo, A. et al., "A Comparison of Approaches to Large-Scale Data Analysis," SIGMOD 2009.
- [2-5] http://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system/
- [2-6] Stonebraker, M. et al., "Enterprise Data Applications and the Cloud: A Difficult Road Ahead," Proc. IEEE IC2E, Boston, Massachusetts, March 2014.