Last week two posts about Dryad, the distributed computing framework from Microsoft Research, appeared on Habr. They described in detail the concepts and architecture of the main Dryad components: the Dryad runtime and the DryadLINQ query language.
The logical conclusion of the Dryad series is a comparison of the Dryad framework with other tools familiar to developers of MPP applications: relational DBMSs (including parallel ones), GPU computing, and the Hadoop platform.
One caveat up front: Dryad can realistically be proposed only for research projects, because at present only an academic license is available.
As is well known, Dryad is proprietary software. That fact does not make the platform's principles and architecture any less interesting or useful for professional growth (my own included).
Part 1 uses Dryad as an example to outline the advantages of distributed computing frameworks, which give developers a higher level of abstraction than low-level parallelization tools such as MPI and GPU computing.
A comparison detached from context (that is, from a specific task) is not strictly correct, but it is acceptable for our purpose: showing the cases where a distributed application execution framework is the appropriate choice.
Part 2 compares Dryad with RDBMSs and parallel DBMSs. Within the space of one article it is, of course, impossible to compare Dryad separately with MySQL and separately with SQL Server 2012 Parallel Data Warehouse (and would there be any point?). So I take the liberty of using the "average temperature across the hospital" for DBMSs: I discuss the general problems of solutions built on relational databases and treat Dryad as a continuation of the best ideas from the DBMS world.
The last part compares Dryad with the Hadoop software platform (the latter probably needs no introduction).
Hadoop has two big advantages (there are more, of course): a new framework, described in more detail later, that provides an API for implementing custom distributed algorithms, and a rich ecosystem. Paradoxically, these are also Hadoop's main weaknesses: the new framework is still in beta (development began in 2008), and using Hadoop in the enterprise segment together with many components from its ecosystem (installation, training, support) is no simple task.
Dryad is therefore compared with the plain Hadoop release branch, with constant reference to the capabilities offered by the Hadoop ecosystem and to how a given problem (if it exists) is addressed in Hadoop v2.0.
1. Dryad vs. GPU. Dryad vs. MPI
Given Dryad's academic license, I became interested in whether the framework could be used for research computations (I am a graduate student). Historically, however, in the academic environment (at least at my university) the main platforms for "scientific computing" have been MPI (Message Passing Interface) and GPU computing.
Unlike MPI, the Dryad platform is built on a shared-nothing architecture in which processes share no data, so there is no need for synchronization primitives. This improves not only the scalability of a Dryad cluster but also the time efficiency of solving problems with data-parallel algorithms.
In addition, infrastructure tasks such as performance monitoring and failover are usually the responsibility of the MPI developer, whereas in Dryad they are the responsibility of the framework.
As for GPU computing, it is worth noting that, unlike Dryad, GPU development is closely tied to the hardware on which the application runs. NVIDIA and AMD each provide their own SDK for developing on their graphics cards (CUDA and APP, respectively). Obviously, these are different and mutually incompatible development platforms.
Microsoft tried to unify the GPU development process by releasing C++ AMP. That very fact, however, is further evidence that a GPU developer has to keep "looking over their shoulder" at the graphics adapter's hardware. Moreover, the hardware-level "roots" run so deep into the code that an application can become hard to run after a change of graphics card model, let alone a change of vendor. Naturally, this creates additional difficulties both in debugging and in moving to a more productive hardware platform that suits the task better.
Ultimately, all of this forces researchers to deal with infrastructure issues (hardware, debugging, deployment, and support) instead of solving the applied problems of their research field.
Unlike GPU toolkits, the Dryad framework hides the hardware level from the developer of a distributed application, although it does impose quite specific requirements on the hardware platform that runs the application (the requirements were described in the first article of the series).
2. Dryad vs. Parallel DBMSs
The main fundamental difference between Dryad and a DBMS is that in Dryad the storage layer, the execution layer, and the programming model are not tightly coupled, whereas in a DBMS they are. This difference is shown in the figure in the introduction.
Even so, Dryad has absorbed many ideas from the worlds of both traditional and parallel DBMSs.
Like many parallel DBMSs (Teradata, IBM DB2 Parallel Edition), Dryad uses a shared-nothing architecture, sharding (horizontal partitioning), dynamic repartitioning, and the usual partitioning strategies: hash partitioning, range partitioning, and round-robin.
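The three partitioning strategies named above can be sketched in a few lines. This is only an illustration of the general techniques, not code from Dryad or any DBMS: the class and method names (`PartitioningSketch`, `hashPartition`, `rangePartition`, `roundRobinPartition`, `splitRoundRobin`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative partitioners; all names are hypothetical, not Dryad or DBMS APIs. */
final class PartitioningSketch {

    /** Hash partitioning: equal keys always land in the same partition. */
    static int hashPartition(Object key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /**
     * Range partitioning: upperBounds must be sorted ascending; partition i
     * holds keys <= upperBounds[i], and one extra partition holds the rest.
     */
    static int rangePartition(int key, int[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    /** Round-robin: the n-th record goes to partition n mod partitions. */
    static int roundRobinPartition(long recordIndex, int partitions) {
        return (int) (recordIndex % partitions);
    }

    /** Splits records into partitions using round-robin placement. */
    static <T> List<List<T>> splitRoundRobin(List<T> records, int partitions) {
        List<List<T>> out = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            out.get(roundRobinPartition(i, partitions)).add(records.get(i));
        }
        return out;
    }
}
```

Hash partitioning co-locates equal keys (useful for joins and group-by), range partitioning preserves key order across partitions (useful for sorting), and round-robin only balances record counts.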
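From the world of traditional DBMSs Dryad borrowed the concepts of the query optimizer and the execution plan, although in considerably transformed form: the DryadLINQ planner produces an execution plan graph, which is then modified dynamically according to policies and to statistics collected during execution.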
Like every DBMS, Dryad has a data query language; in Dryad that role is played by the DryadLINQ programming model. Unlike SQL, however, DryadLINQ:
+ was designed from the start to work with data structures and complex types;
+ is a high-level abstraction that does not tie the application layer to the storage layer;
+ natively supports common programming patterns such as iteration;
- does not support transactions or update operations.
Moreover, SQL is fundamentally ill-suited to writing machine learning algorithms, parsing sequences of facts (logs, genome databases), and analyzing graphs, just as Dryad is ineffective for problems whose algorithms require random access to data.
The following table compares solutions based on relational DBMSs with solutions based on distributed computing frameworks.
In conclusion, the biggest obstacle to wide adoption of solutions based on parallel DBMSs is their cost, which typically runs to hundreds of thousands of dollars. The cost of a Dryad-based solution is, in my opinion, an order of magnitude lower.
3. Dryad vs. Hadoop
The map/reduce paradigm is a very elegant way of describing data-parallel algorithms. The arrival of Hadoop, which provides both a runtime infrastructure and a programming model for writing map/reduce applications, was a revolutionary leap in solving Big Data problems.
* Directed acyclic graph (DAG).
** Available only as a beta version (as of June 2013).
*** Any CLS-compliant, statically typed programming language.
**** Infrastructure for deploying a Hadoop cluster and running Hadoop jobs.
***** Available only when third-party components of the Hadoop ecosystem are installed.
3.1. Hadoop
By discarding everything unnecessary, Hadoop's ideologists and developers created a very effective, if limited, platform for developing MPP applications, one that is simple and understandable to the widest possible circle of developers.
Hadoop is ideal for map/reduce but, so far, does not stand up to criticism as a platform for developing other kinds of distributed algorithms (we are waiting for YARN). Hence the huge number of auxiliary Hadoop tools, which are either built on the Hadoop MapReduce computing framework (such as Pig) or are separate computing frameworks in their own right (Hive, Storm, Apache Giraph). Instead of a single general-purpose tool for log parsing, PageRank computation, and graph analysis alike, all these tools offer overlapping solutions to narrow problems (in effect, workarounds for limitations).
Naturally, installing, configuring, and supporting the whole Hadoop ecosystem needed for everyday analytical tasks costs considerable time and, as a result, money, and that money is spent on infrastructure problems rather than on the tasks of the business or the researcher. As a partial solution, distributors of prebuilt Hadoop platforms have appeared (the largest being Cloudera and Hortonworks). That does not solve the problem, however; it only confirms that the problem exists.
The YARN framework, which gives developers the components and API they need to build distributed algorithms other than map/reduce, should be an evolutionary leap (so far only "should be"). YARN addresses many problems of Hadoop v1.0, such as low resource utilization and the scalability ceiling, which currently stands at about 4,000 compute nodes (in 2011 Dryad was already running on 10,000 nodes).
As of May 2013, YARN had not yet reached a release version. Given the unhurried pace of the Apache community, one should expect the gap between the release of YARN itself and release-quality implementations of non-map/reduce distributed algorithms written against the YARN API to be measured in years.
3.2. Dryad
From the outset, the Dryad framework allowed developers to implement arbitrary distributed algorithms. The Hadoop MapReduce programming model (v1.0) is thus merely a special case of the more general Dryad programming model.
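To make the "special case" claim concrete, here is a toy sketch in which a job is a graph of vertices and map/reduce is just one particular graph shape: a layer of map vertices feeding a single reduce vertex. It is a single-process illustration under stated assumptions, not the Dryad API; `DagSketch`, `Vertex`, `run`, and `wordCount` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Toy DAG of vertices; all names are hypothetical, not the Dryad API. */
final class DagSketch {

    /** A vertex consumes the outputs of its input vertices and emits records. */
    static final class Vertex {
        final List<Vertex> inputs;
        final Function<List<List<String>>, List<String>> body;

        Vertex(List<Vertex> inputs, Function<List<List<String>>, List<String>> body) {
            this.inputs = inputs;
            this.body = body;
        }
    }

    /** Evaluates a vertex after its inputs, caching results so each vertex runs once. */
    static List<String> run(Vertex v, Map<Vertex, List<String>> cache) {
        List<String> cached = cache.get(v);
        if (cached != null) {
            return cached;
        }
        List<List<String>> in = new ArrayList<>();
        for (Vertex input : v.inputs) {
            in.add(run(input, cache));
        }
        List<String> out = v.body.apply(in);
        cache.put(v, out);
        return out;
    }

    /** Word count as a two-stage graph: one map vertex per partition, one reduce vertex. */
    static List<String> wordCount(List<List<String>> partitions) {
        List<Vertex> mappers = new ArrayList<>();
        for (List<String> partition : partitions) {
            // Source vertex: no inputs, it just emits the lines of its partition.
            Vertex source = new Vertex(List.of(), in -> partition);
            // Map vertex: splits each line into words.
            mappers.add(new Vertex(List.of(source), in -> {
                List<String> words = new ArrayList<>();
                for (String line : in.get(0)) {
                    for (String w : line.split("\\s+")) {
                        if (!w.isEmpty()) {
                            words.add(w);
                        }
                    }
                }
                return words;
            }));
        }
        // Reduce vertex: receives the output of every mapper and counts words.
        Vertex reducer = new Vertex(mappers, in -> {
            Map<String, Integer> counts = new TreeMap<>();
            for (List<String> words : in) {
                for (String w : words) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
            List<String> result = new ArrayList<>();
            counts.forEach((w, c) -> result.add(w + "=" + c));
            return result;
        });
        return run(reducer, new HashMap<>());
    }
}
```

Nothing in `Vertex` or `run` is specific to map/reduce: adding a third stage, or a join vertex with two different input stages, only changes how the graph is wired.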
Hadoop's problems with join operations, the efficiency of PageRank computation, other limitations of the platform, and the ways around them are deliberately left outside the scope of this article. Instead, I describe the features of the Dryad framework that have no counterpart in the Hadoop platform.
Dryad has an impressive set of tools for planning the execution of a distributed application, described in the earlier articles of the series. A parallel compiler translates expressions written in DryadLINQ into an execution plan graph (EPG). The EPG passes through optimization stages both before execution (the static optimizer) and during execution (dynamic optimization based on policies and on statistics collected at run time).
The parallel compiler, the run-time graph, and the ability to optimize that graph both statically and dynamically together improve the scheduling and execution of distributed applications.
The directed acyclic graph concept makes it possible to solve many problems of fault tolerance, monitoring, scheduling, and resource management far more elegantly than Hadoop does (I wrote about this in the first article of the series).
Handling of compute node failures avoids restarting an entire stage.
Handling of "slow" compute nodes means that nodes which have finished need not wait for the slowest one (for example, before the Reduce phase can start).
Dryad's dynamic aggregation avoids saturating network bandwidth before the next stage (for example, a reduce) begins.
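The bandwidth argument behind dynamic aggregation can be illustrated with a small model: if partial results are combined inside each rack first, only one value per rack has to cross the rack boundary. The sketch below is an assumption-laden toy (the rack layout, `AggregationTreeSketch`, and its methods are hypothetical) and does not show how Dryad itself decides where to insert aggregation vertices.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of tree aggregation; names and the rack layout are hypothetical. */
final class AggregationTreeSketch {

    /** Sums a list of partial results into one value. */
    static long combine(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    /** Flat plan: every node ships its partial result straight to the final vertex. */
    static int transfersFlat(List<List<Long>> racks) {
        int transfers = 0;
        for (List<Long> rack : racks) {
            transfers += rack.size();
        }
        return transfers;
    }

    /** Tree plan: partials are combined per rack, so one value per rack is shipped. */
    static List<Long> aggregatePerRack(List<List<Long>> racks) {
        List<Long> perRack = new ArrayList<>();
        for (List<Long> rack : racks) {
            perRack.add(combine(rack));
        }
        return perRack;
    }
}
```

With two racks holding three and two partial results, the flat plan ships five values to the final vertex while the tree plan ships two, and both plans produce the same total because addition is associative.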
Another interesting feature that Hadoop lacks is the abstraction of the channel concept. Thanks to this abstraction, TCP connections, temporary files, and shared-memory FIFOs can all serve as Dryad channels. This lets algorithms such as PageRank exchange data between iterations over low-latency channels (for example, a shared-memory FIFO). In Hadoop, data transfer between iterations always goes through a TCP channel, whose latency is considerably higher than that of shared memory. (I have seen reports that this behavior will be fixed in YARN, but I have not yet seen it confirmed in practice.)
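The idea of a channel abstraction can be sketched as an interface with interchangeable transports. This is a minimal illustration, not the Dryad channel API: `Channel`, `MemoryFifoChannel`, `TempFileChannel`, and `ChannelSketch` are hypothetical names, an in-process queue stands in for a shared-memory FIFO, and a TCP-backed implementation is omitted for brevity.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical channel abstraction; not the Dryad API. */
interface Channel {
    void write(String record) throws IOException;

    /** Returns the next record, or null when no record is left. */
    String read() throws IOException;
}

/** In-memory FIFO channel, standing in for a shared-memory FIFO. */
final class MemoryFifoChannel implements Channel {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    @Override
    public void write(String record) {
        queue.add(record); // throws IllegalStateException if the FIFO is full
    }

    @Override
    public String read() {
        return queue.poll(); // null when the FIFO is empty
    }
}

/** Temporary-file channel: one record per line; write everything before reading. */
final class TempFileChannel implements Channel {
    private final Path file;
    private final BufferedWriter writer;
    private BufferedReader reader;

    TempFileChannel() throws IOException {
        file = Files.createTempFile("channel", ".txt");
        file.toFile().deleteOnExit();
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    @Override
    public void write(String record) throws IOException {
        writer.write(record);
        writer.newLine();
    }

    @Override
    public String read() throws IOException {
        if (reader == null) {
            writer.close(); // flush pending records before the first read
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        }
        return reader.readLine();
    }
}

/** A vertex body that is unaware of the transport behind its channels. */
final class ChannelSketch {
    static int copyUpperCase(Channel in, Channel out) throws IOException {
        int copied = 0;
        for (String r = in.read(); r != null; r = in.read()) {
            out.write(r.toUpperCase());
            copied++;
        }
        return copied;
    }
}
```

Because `copyUpperCase` depends only on the interface, a scheduler is free to connect two vertices on the same machine with the in-memory FIFO and two vertices on different machines with a file or a network channel, without any change to the vertex code.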
Illustration source: [7]
Some of Dryad's architectural decisions (attaching metadata to the execution graph, described in the previous article), together with the framework's native support for high-level statically typed programming languages, make it possible to work with strongly typed data when developing Dryad applications. For a Hadoop developer, the usual approach is to parse the input data and then cast it (not the safest operation) to the expected type.
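The difference between the two styles can be shown without any framework at all. The sketch below contrasts a row handled as an array of objects, where every field access needs a cast that is checked only at run time, with a typed record whose fields are checked by the compiler. `TypedRecordSketch` and `Reading` are hypothetical names used only for illustration.

```java
import java.util.List;

/** Contrast between untyped field access and a typed record; names are hypothetical. */
final class TypedRecordSketch {

    /** Untyped style: a row is an array of objects, and every access needs a cast. */
    static double sumUntyped(List<Object[]> rows) {
        double sum = 0;
        for (Object[] row : rows) {
            // Throws ClassCastException at run time if field 1 is not a Double.
            sum += (Double) row[1];
        }
        return sum;
    }

    /** Typed style: the compiler checks the type of every field. */
    static final class Reading {
        final String sensor;
        final double value;

        Reading(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    static double sumTyped(List<Reading> rows) {
        double sum = 0;
        for (Reading r : rows) {
            sum += r.value;
        }
        return sum;
    }
}
```

In the untyped version a malformed row surfaces only when the job is already running, possibly hours into a large computation; in the typed version the same mistake is a compile error.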
3.3. Practice
Below are listings of applications that compute an arithmetic mean in Hadoop and in Dryad using their low-level APIs.
Listing 1. Computing the arithmetic mean in Hadoop (Java). Source: [1].
// InitialReduce: input is a sequence of raw data tuples;
// produces a single intermediate result as output
static public class Initial extends EvalFunc<Tuple> {
    @Override
    public void exec(Tuple input, Tuple output) throws IOException {
        try {
            output.appendField(new DataAtom(sum(input)));
            output.appendField(new DataAtom(count(input)));
        } catch(RuntimeException t) {
            throw new RuntimeException([...]);
        }
    }
}

// Combiner: input is a sequence of intermediate results;
// produces a single (coalesced) intermediate result
static public class Intermed extends EvalFunc<Tuple> {
    @Override
    public void exec(Tuple input, Tuple output) throws IOException {
        combine(input.getBagField(0), output);
    }
}

// FinalReduce: input is one or more intermediate results;
// produces final output of aggregation function
static public class Final extends EvalFunc<DataAtom> {
    @Override
    public void exec(Tuple input, DataAtom output) throws IOException {
        Tuple combined = new Tuple();
        if(input.getField(0) instanceof DataBag) {
            combine(input.getBagField(0), combined);
        } else {
            throw new RuntimeException([...]);
        }
        double sum = combined.getAtomField(0).numval();
        double count = combined.getAtomField(1).numval();
        double avg = 0;
        if (count > 0) {
            avg = sum / count;
        }
        output.setValue(avg);
    }
}

static protected void combine(DataBag values, Tuple output) throws IOException {
    double sum = 0;
    double count = 0;
    for (Iterator it = values.iterator(); it.hasNext();) {
        Tuple t = (Tuple) it.next();
        sum += t.getAtomField(0).numval();
        count += t.getAtomField(1).numval();
    }
    output.appendField(new DataAtom(sum));
    output.appendField(new DataAtom(count));
}

static protected long count(Tuple input) throws IOException {
    DataBag values = input.getBagField(0);
    return values.size();
}

static protected double sum(Tuple input) throws IOException {
    DataBag values = input.getBagField(0);
    double sum = 0;
    for (Iterator it = values.iterator(); it.hasNext();) {
        Tuple t = (Tuple) it.next();
        sum += t.getAtomField(0).numval();
    }
    return sum;
}
Listing 2. Computing the arithmetic mean in Dryad (C#). Source: [1].
public static IntPair InitialReduce(IEnumerable<int> g)
{
    return new IntPair(g.Sum(), g.Count());
}

public static IntPair Combine(IEnumerable<IntPair> g)
{
    return new IntPair(g.Select(x => x.first).Sum(),
                       g.Select(x => x.second).Sum());
}

[AssociativeDecomposable("InitialReduce", "Combine")]
public static IntPair PartialSum(IEnumerable<int> g)
{
    return InitialReduce(g);
}

public static double Average(IEnumerable<int> g)
{
    IntPair final = g.Aggregate(x => PartialSum(x));
    if (final.second == 0) return 0.0;
    return (double)final.first / (double)final.second;
}
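Both listings rely on the same decomposition: an average cannot be combined directly, but a (sum, count) pair can, and the division is postponed to the very last step. The framework-free sketch below shows only that decomposition. It is not the Pig or DryadLINQ API, and `DecomposableAverageSketch`, `SumCount`, and the method names are hypothetical, chosen to mirror the roles in Listing 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a decomposable average; all names are hypothetical. */
final class DecomposableAverageSketch {

    /** Intermediate result: a running sum and a running count. */
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Initial reduce: turns one partition of raw values into a (sum, count) pair. */
    static SumCount initialReduce(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SumCount(sum, values.size());
    }

    /** Combine: merges intermediate pairs; the operation is associative and commutative. */
    static SumCount combine(List<SumCount> partials) {
        long sum = 0;
        long count = 0;
        for (SumCount p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new SumCount(sum, count);
    }

    /** Final step: the only place where the division happens. */
    static double average(List<List<Integer>> partitions) {
        List<SumCount> partials = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            partials.add(initialReduce(partition));
        }
        SumCount total = combine(partials);
        return total.count == 0 ? 0.0 : (double) total.sum / total.count;
    }
}
```

For the partitions [1, 2, 3] and [4], averaging the per-partition averages would give (2 + 4) / 2 = 3, whereas combining the pairs (6, 3) and (4, 1) gives 10 / 4 = 2.5, which is the correct mean. That is why the intermediate result has to carry both the sum and the count.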
3.4. Accessibility for Developers
Dryad is a proprietary system closed off from the community, with a vague future (or, more precisely, with none at all). Hadoop, by contrast, is an open-source project with a huge community, a clear licensing model, and several large distributors (Cloudera, Hortonworks, and others).
To close the chapter on the comparison with Hadoop, note that at the current level of cloud services it is not hard to obtain a Hadoop cluster: Amazon Web Services provides one through the Amazon Elastic MapReduce service, and the Windows Azure cloud platform provides one through Microsoft HDInsight.
With "Hadoop + {WA | AWS}", the Hadoop platform has become highly accessible to startups and researchers. There is little point in discussing the availability of Dryad: there is no commercial license, and academic use is hardly ever heard of.
Hadoop is the de facto standard for working with Big Data. There is hope that, after the upcoming YARN release, nobody will doubt that the platform deserves that status. Dryad as a project, meanwhile, seems to live on in "reincarnations", one of which is Naiad (an incremental Dryad). And I am certain that the principles laid down in Dryad will continue not only in Microsoft Research projects but also in the open-source community.
Conclusion
Built on the concept of the directed acyclic graph, the Dryad framework layered onto that concept the latest ideas from the worlds of distributed application execution frameworks and of traditional and parallel DBMSs. Dividing the responsibilities for the runtime, distributed storage, and the programming model among separate modules kept Dryad a highly flexible system. Tight integration with the existing software stack of the .NET developer (.NET Framework, C#, Visual Studio) greatly reduces the time needed to start using the framework.
Its simple and elegant concept, innovative ideas, beautiful architecture, and familiar technology stack make Dryad an effective tool for working with Big Data: more effective than hardware-bound GPU computing, than poorly scalable solutions based on traditional DBMSs, and than solutions based on parallel DBMSs, which are expensive and limited by the primitiveness of SQL. Dryad also outperforms Hadoop, which is "rooted" in the map/reduce model and which, until YARN arrives, suffers from a single point of failure, in some cases from low resource utilization, and from simple community inertia.
At the same time, all of Dryad's obvious advantages are easily cancelled out by the nature of the product: it is a proprietary Microsoft product for internal use, and its fate is decided by Microsoft alone.
None of this, however, stops Dryad from being what it is: a fresh and interesting look at, and Microsoft Research's innovative vision of, a system for executing distributed applications.
References
[1] Y. Yu, P. K. Gunda, M. Isard. Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations. 2009.
[2] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Proceedings of the European Conference on Computer Systems (EuroSys), 2007.
[3] Tom White. Hadoop: The Definitive Guide, 3rd edition. O'Reilly Media / Yahoo Press, 2012.
[4] Arun C Murthy. The Next Generation of Apache Hadoop MapReduce. Yahoo, 2011.
[5] D. DeWitt and J. Gray. Parallel Database Systems: The Future of High Performance Database Processing. Communications of the ACM, 36(6), 1992.
[6] David Tarditi, Sidd Puri, Jose Oglesby. Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Boston, Massachusetts, October 2006.
[7] Dryad/DryadLINQ slides, adapted from the 2009 slides of Yuan Yu and Michael Isard.