Last year Badoo began actively using the Hadoop + Spark combination and built its own system for collecting and processing millions of metrics with Spark Streaming.
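To broaden our knowledge and keep up with the latest innovations in this field, at the end of May this year our BI (Business Intelligence) developers went to London for the latest conference in the Strata + Hadoop World series, which is dedicated to big data analytics.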
We were also drawn by the match with our own technology stack: the Spark track featured a whole lineup of passionate speakers from Cloudera, Databricks, Hortonworks and IBM, including Spark's creators, active contributors and book authors.
The conference was organized by O'Reilly, the IT publisher known for the animals on its book covers, and Cloudera, an IT company specializing in Hadoop-based solutions.
This year's venue was ExCeL London, a huge exhibition centre. Its size is impressive: it stretches between two DLR stations, and if you don't know which part of the building your session is in, you can spend 15 to 20 minutes walking from one end to the other.
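The scale of the event was just as ambitious. The conference ran for four days, Tuesday to Friday. The first two days were devoted to introductory lectures, seminars and master classes, and the conference proper took place on the following two days. As you would expect from a conference about big data, the amount of information was enormous: the program opened with a series of eight (!) keynotes, after which more than ten parallel tracks ran for another six hours of talks.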
With such a packed schedule, the organizers should pay more attention to making it easy to navigate. For example, they could label the topic of each track or group the talks by target audience (that is, split them into talks for engineers, analysts and business people).
Instead, at registration we were handed an A3 sheet with the schedule printed in the tiniest font. To be fair, you could also use the mobile app, which showed what was happening at the moment and let you build your own schedule.
The first day opened with a series of short keynotes, in which speakers spent 10 to 15 minutes each sharing their views on the main trends in artificial intelligence, data analytics, machine learning and security. After these talks, the parallel tracks began. Below are the talks we found most interesting.
Spark 2.0: what's next?
The Apache Spark track opened with Tathagata Das, a developer at Databricks (he quickly realized that nobody could pronounce his name, so everyone simply called him TD). His talk covered what to expect from the Apache Spark 2.0 release.
TD said the major release was scheduled for June this year. At the time of writing, only an unstable preview release is available to all users. The speaker also assured us that, despite the scale of the release, backward compatibility with 1.x would be almost fully preserved.
Now to the promised breakthrough features of this release:
- Tungsten phase 2. Project Tungsten is a set of optimizations aimed at making Spark use memory and CPU more efficiently. The new version promises a 5 to 10x speedup, achieved by optimizing code generation and improving the in-memory algorithms. Where a query made of several consecutive operations used to require a chain of virtual calls, it is now compiled into a single piece of code.
- Structured Streaming. After receiving a lot of feedback from developers, the Spark team substantially reworked the streaming model. The new version introduces interactive streaming, called Structured Streaming, which lets you run various queries against existing streams and build machine learning models, all at runtime. In essence, it is a high-level API built on top of the SQL API, so streaming should also benefit from all of Tungsten's optimizations.
- Datasets and DataFrames. In the new version these two APIs are merged: a DataFrame is simply a Dataset[Row]. As a result, Dataset operations such as map and filter can also be applied to DataFrame objects. The Dataset functionality is marked as experimental, and this is one of the places where compatibility with the 1.x versions breaks.
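The Tungsten phase 2 point above, a chain of virtual calls versus one compiled piece of code, can be illustrated without Spark. The sketch below is plain Python, not the Java code Spark generates, and the operator class names are hypothetical. It only shows the shape of the idea: in the chained plan every row passes through a method call per operator, while the fused plan computes the same result in one loop.

```python
# Plain-Python illustration of operator-at-a-time execution versus a single
# fused loop. This is NOT Spark code; the class names are hypothetical.


class Scan:
    """Leaf operator: yields the input rows one by one."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        yield from self.rows


class Filter:
    """Pulls each row from its child and calls the predicate on it."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def __iter__(self):
        for row in self.child:
            if self.predicate(row):
                yield row


class Project:
    """Pulls each row from its child and applies a function to it."""

    def __init__(self, child, fn):
        self.child = child
        self.fn = fn

    def __iter__(self):
        for row in self.child:
            yield self.fn(row)


def chained_plan(rows):
    """SELECT sum(x * 2) WHERE x % 3 = 0, built as a chain of operators."""
    plan = Project(Filter(Scan(rows), lambda x: x % 3 == 0), lambda x: x * 2)
    return sum(plan)


def fused_plan(rows):
    """The same query collapsed into one loop, as generated code would be."""
    total = 0
    for x in rows:
        if x % 3 == 0:
            total += x * 2
    return total


if __name__ == "__main__":
    data = range(100_000)
    assert chained_plan(data) == fused_plan(data)
    print(fused_plan(range(10)))  # 36 (0 + 6 + 12 + 18)
```

Both functions return the same result. The per-row overhead that whole-stage code generation removes in Spark is the cost of the nested calls in the chained version.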
If you develop applications on Spark, make sure to set aside time this summer to prepare for the new version and upgrade: the API improvements and performance gains are impressive.
A very similar talk from Spark Summit by Matei Zaharia, CTO of Databricks and the creator of Spark, can be watched here: Spark 2.0.
The future of streaming in Spark: Structured Streaming
TD's second talk at the conference went into more detail about how Spark Streaming will evolve. According to the speaker, more than half of the developers who use Spark consider Spark Streaming the most important component of their system.
The main conclusion the developers drew over the three years Spark Streaming has existed is that stream processing should not happen in isolation. Users need not only to receive a data stream, process it and store it in a database for later use, but in most cases also to monitor it, connect to the stream, collect data for machine learning, and so on.
So the developers began to think more broadly: they added all of the above capabilities to the framework and started calling the new project not just streaming but continuous applications.
TD walked through the main problem areas of the current D-Streams model and introduced the new solution, called Structured Streaming.
Structured Streaming is a new concept that lets you treat a stream as an unbounded table.
The data in this table can be queried with SQL or through the DataFrames API. Depending on what the user needs, a query can run over all of the data or only over the delta of newly received data.
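The "stream as an unbounded table" idea, and the choice between querying all data or only the delta, can be modeled in a few lines. This is a plain-Python toy under stated assumptions, not Spark code: `UnboundedTable` and its methods are hypothetical names. Each arriving batch is appended to the table. A running aggregate updated per batch gives the same answer as rescanning everything, which is what makes incremental execution possible.

```python
from collections import Counter


class UnboundedTable:
    """Toy model of a stream viewed as an append-only, ever-growing table.

    Hypothetical class for illustration only; it is not a Spark API.
    """

    def __init__(self):
        self.rows = []            # every row that has arrived so far
        self.running = Counter()  # aggregate maintained incrementally

    def append_batch(self, batch):
        """Append newly arrived rows and return them as the delta."""
        delta = list(batch)
        self.rows.extend(delta)
        self.running.update(delta)
        return delta

    def query_all(self):
        """Count rows per key by rescanning the whole table."""
        return Counter(self.rows)

    @staticmethod
    def query_delta(delta):
        """Count rows per key over the latest batch only."""
        return Counter(delta)


if __name__ == "__main__":
    table = UnboundedTable()
    for batch in (["click", "view"], ["click"], ["view", "view", "buy"]):
        delta = table.append_batch(batch)
        print("delta:", dict(table.query_delta(delta)),
              "all:", dict(table.query_all()))
    # Updating the result per batch gives the same answer as a full rescan.
    assert table.running == table.query_all()
```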
Combining the DStreams and DataFrames APIs makes it possible to run operations that join data from a stream with a static dataset.
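The stream-to-static join mentioned above can be sketched without Spark. Each micro-batch of events is joined against a lookup table that does not change. The code is plain Python and assumes an inner join on a user id. `USER_COUNTRY` and `enrich` are hypothetical names, and the data is made up for the example. In Spark 2.x the equivalent would be a join between a streaming DataFrame and a static one.

```python
# Hypothetical static reference data, e.g. loaded once from a table in HDFS.
USER_COUNTRY = {1: "GB", 2: "RU", 3: "JP"}


def enrich(micro_batch, static_table):
    """Inner-join each (user_id, event) pair with the static table."""
    joined = []
    for user_id, event in micro_batch:
        country = static_table.get(user_id)
        if country is not None:  # inner join: drop events without a match
            joined.append((user_id, event, country))
    return joined


if __name__ == "__main__":
    # Each inner list stands for one micro-batch arriving from the stream.
    stream = [
        [(1, "login"), (4, "login")],
        [(2, "purchase"), (3, "logout")],
    ]
    for batch in stream:
        print(enrich(batch, USER_COUNTRY))
```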
The talk also looked at how the new system works under the hood and, most importantly, how it achieves fault tolerance.
If you are building complex systems on top of Spark Streaming, you should definitely look at the new Structured Streaming concept: according to its developers, it gives you fast, fault-tolerant streaming through a greatly simplified API.
The specification is available in the document Structured Streaming Programming Abstraction, Semantics, and APIs.
A recording of a similar talk by the same speaker at Spark Summit is available here: A Deep Dive into Structured Streaming.
Lunch
Between talks the organizers laid on perfectly standard coffee breaks, and around midday there was lunch. Unlike at many IT conferences, lunch at Strata + Hadoop World was not a paid extra: it was provided by the conference sponsors. On the first day, for example, lunch came from Teradata and was light and fresh; on the second day it came from IBM and was hearty.
Beyond Shuffling, by Holden Karau
To the jingle of the metal rings around her neck, the flamboyant speaker told us many interesting things about Spark internals.
You run a Spark job, the shuffle stage comes along, and... OOM error!
At that point all you want is happiness and kittens. Speaking of cats, the talk had plenty of them: she loves them. Not important, just adorable.
So where do excessive memory consumption and poor performance come from?
- Unnecessary grouping by key. Where possible, use reduceByKey instead of groupByKey: values are combined within each partition before the shuffle, so much less data is sent over the network.
- Unevenly distributed data. If one key has far more data than the others, the shuffle sends all of that data to a single reducer and... the OOM killer is right there! So sometimes it helps to change how the data is shuffled.
- Joins with other datasets. This is a disaster for map-reduce algorithms in general, because a join usually cannot reduce the amount of data, only increase it. You have to watch out for explosive growth.
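The three pitfalls in the list above can be made concrete by counting records instead of running Spark. The sketch below is plain Python, not the Spark API, and all function names are hypothetical. Random key salting is shown as one common way to change how a skewed key is shuffled. The talk may well have recommended other approaches.

```python
import random
from collections import defaultdict


def shuffled_records_group_by_key(partitions):
    """groupByKey-style: every (key, value) record goes through the shuffle."""
    return sum(len(part) for part in partitions)


def shuffled_records_reduce_by_key(partitions):
    """reduceByKey-style: values are combined inside each partition first,
    so at most one record per distinct key per partition is shuffled."""
    return sum(len({key for key, _ in part}) for part in partitions)


def salt_hot_key(records, hot_key, buckets, rng):
    """Spread one skewed key over several reducers with a random salt.

    Partial results for (hot_key, salt) must be merged in a second step.
    """
    salted = []
    for key, value in records:
        salt = rng.randrange(buckets) if key == hot_key else 0
        salted.append(((key, salt), value))
    return salted


def inner_join_size(left, right):
    """Rows an inner join produces: for each key, n_left * n_right."""
    left_counts = defaultdict(int)
    right_counts = defaultdict(int)
    for key, _ in left:
        left_counts[key] += 1
    for key, _ in right:
        right_counts[key] += 1
    return sum(n * right_counts.get(key, 0) for key, n in left_counts.items())


if __name__ == "__main__":
    parts = [[("a", 1), ("a", 2), ("b", 3)], [("a", 4), ("a", 5)]]
    print(shuffled_records_group_by_key(parts))   # 5
    print(shuffled_records_reduce_by_key(parts))  # 3

    skewed = [("hot", i) for i in range(1000)] + [("cold", 0)]
    salted = salt_hot_key(skewed, "hot", buckets=8, rng=random.Random(42))
    print(len({key for key, _ in salted}))  # at most 9 distinct shuffle keys

    # 1,000 rows per side that share a single key become 1,000,000 rows.
    print(inner_join_size([("k", i) for i in range(1000)],
                          [("k", i) for i in range(1000)]))  # 1000000
```

The join example shows where the "explosive growth" comes from. The output for a key is the product of its row counts on both sides, so a few popular keys can dominate the result.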
All of this was explained in detail, with solutions proposed. She also showed which problems can still come up while we wait for further optimizations.
And, of course, the new features of the upcoming Spark 2.0 got a mention too.
Next came unit testing in Spark. And to top it off, an invitation to send her any code you can't manage to optimize.
Overall, even if not everything was practical or clear to everyone, it was entertaining and informative.
Slides of the talk on Slideshare
Booth demos
At the company stands you could talk to their representatives and, at some stands, to the authors of the products themselves.
For example, at the MapR stand Ted Dunning showed us how MapR-FS works. He mounted the cluster as a home directory, started periodically writing the current time to a file in one console and ran tail -f on it in another. Pretty cool! You just work with files, and the file system itself takes care of servers, replication and everything else. And you can read the data not only as files mounted on the FS client but also in Hive/Spark for processing.
There is a community edition of this FS. We think it's worth a try!
Securing Apache Spark on production Hadoop clusters
If you need to set up security for Spark or Hadoop, definitely bookmark this presentation! The speaker began with a short history of how Hadoop's security system developed and which components it includes. In short, security was not part of the original design: it was all added much later, which partly explains why it is organized the way it is.
Spark's data security is built on Hadoop's. So the story begins with Kerberos, which authenticates users, and with configuring HDFS/YARN.
So we secure everything in turn:
- user authentication;
- HDFS;
- YARN;
- Web UI;
- RPC API;
- EncryptedFS;
- encryption of data on the wire;
- JVM memory;
- encryption of temporary shuffle blocks.
I don't think I've forgotten anything. And if I have, it's all in the talk!
Next, the speaker covered the options for granting privileges at the level of files, Hive tables, and rows and columns (where that is possible), and how these options differ.
The talk also covered where Spark security is heading.
So now I know everything about security. Or so it seems.
For a similar talk from Hadoop Summit, see Securing Spark on production Hadoop clusters.
Bread and circuses
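At the end of the first day of talks, there were two after-parties. For the first, most of the speakers and attendees went to the exhibition hall, where the organizers had set out alcohol, soft drinks and hot snacks at the sponsors' stands. The second was a pub crawl, in which participants spent the evening moving from one bar to the next along a set route.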
During the day, the organizers and volunteers handed out "route sheets" listing a chain of four London pubs, where participants were supposed to "take a break" in turn. The pub snacks for the evening were provided by the sponsors. Some pubs had been booked for the whole night, while in others the participants terrorized the local regulars.
The next morning, the participants who had made it to every pub were easy to spot by their crumpled faces and prize markers.
After the stormy night, not everyone made it to the second day of the conference. We did, however, manage to attend several interesting talks. Here are our impressions.
Why is my Hadoop job slow?
An informative talk about Apache Ambari, a tool for monitoring the health of Hadoop clusters.
It showed examples of using the Ambari Metrics System and the standard dashboards for the subsystems (HDFS, YARN, HBase).
芳å¯ãããç¶æ³ã®å ·äœäŸã瀺ãããŠããŸãã
It showed what you can find in the HDFS and YARN audit logs and how, and how to work with the logs through Ambari.
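The kind of question the talk answered with audit logs (who touched which path, and which requests were refused) can be sketched with a small parser. The sketch assumes a record layout commonly seen in NameNode audit logs: a log prefix, then tab-separated `key=value` pairs such as `allowed`, `ugi`, `cmd` and `src`. Field sets differ between Hadoop versions, so treat the layout as an assumption. The sample lines and all function names are hypothetical and built only for the demo. This is not Ambari's interface.

```python
from collections import Counter


def make_sample_line(time, allowed, user, cmd, src):
    """Build a hypothetical audit record for the demo below.

    Assumed layout: a log prefix, then tab-separated key=value pairs, as
    NameNode audit logs are commonly formatted; real field sets vary.
    """
    fields = [
        f"allowed={allowed}",
        f"ugi={user} (auth:KERBEROS)",
        "ip=/10.0.0.5",
        f"cmd={cmd}",
        f"src={src}",
        "dst=null",
        "perm=null",
    ]
    return f"2016-06-02 {time} INFO FSNamesystem.audit: " + "\t".join(fields)


def parse_audit_line(line):
    """Return the key=value fields of one audit record as a dict."""
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields = {}
    for pair in payload.split("\t"):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return fields


def users_by_command(lines, cmd):
    """Count how often each user issued the given command (e.g. 'open')."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("cmd") == cmd:
            counts[fields.get("ugi", "?").split(" ")[0]] += 1
    return counts.most_common()


def denied_requests(lines):
    """Return the parsed records whose access was refused."""
    records = map(parse_audit_line, lines)
    return [r for r in records if r.get("allowed") == "false"]


if __name__ == "__main__":
    log = [
        make_sample_line("10:00:00,000", "true", "alice", "open", "/data/a"),
        make_sample_line("10:00:01,000", "true", "alice", "open", "/data/b"),
        make_sample_line("10:00:02,000", "false", "bob", "delete", "/data/c"),
    ]
    print(users_by_command(log, "open"))  # [('alice', 2)]
    print([r["src"] for r in denied_requests(log)])  # ['/data/c']
```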
䟿å©ãªããŒã«ïŒ ããã«ãããã¯ã©ã¹ã¿ãŒãäœããã®æ¹æ³ã§åäœããçç±ãã¿ã¹ã¯ã«ååãªãªãœãŒã¹ããããã©ãããå®è¡æ¹æ³ãšå®è¡å 容ããã詳现ã«ç解ã§ããŸãã
We tried to work out how we could use it, but so far the official documentation says the cluster has to be deployed through Ambari, and we already have a cluster. We're not going to tear it down, are we? So I'll keep digging.
Presentation slides (careful: the link points to a .pptx file)
Book signings
The conference was organized by O'Reilly, which of course had a big stand at the entrance to the exhibition hall. There you could buy books on big data at a discount and, if you tracked down the author at the conference, get them signed.
The book-signing schedule was also posted at the stand, and during the long breaks between talks you could receive a signed early release of a forthcoming book as a gift, straight from the author.
It was all free, of course, but these early-release books turned out to be of little use, and some were frankly just advertising.
Sales traps and marketing tricks
As at other conferences, the marketers left their mark on Strata: they cast their cunning nets over the entire venue.
Even before the conference began, experienced attendees warned us against talks marked "sponsored by X" in the schedule, since these are essentially pure advertising.
Unfortunately, there were also talks whose titles and descriptions used all the right buzzwords, but in practice the speaker promoted a product and said not a word about the technologies mentioned in the description.
Each attendee wore a badge with a personal plastic card containing an RFID tag. At every stand, sponsors handed out stickers, pens and other "souvenirs" only after scanning the badge of whoever came up, which shared all of that person's registration data with the sponsor.
At some stands they would scan your badge before even answering your question.
Because of these tricks, I later had to spend quite some time unsubscribing from all the sponsors' mailing lists.
In lieu of a conclusion
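Strata + Hadoop World conferences are held five times a year in various countries around the world; the next one, for example, takes place in Beijing in early August. If you haven't yet decided whether to attend a conference in this series, it is hard to give a definite recommendation.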
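On the one hand, if you are mainly interested in engineering talks and specific technologies, look at more specialized events such as Spark Summit. Judging by the descriptions, it attracts many engineers, and many of the speakers themselves want feedback and feature requests from the audience to guide future development.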
On the other hand, if you have a fairly large BI team, the very broad range of talks means there is plenty of interest for everyone, and you certainly won't waste your time. You also get a very professional organization, a friendly atmosphere, and the chance to chat with the creators of all the popular products in data science, machine learning and business analytics.
Vadim Babaev, BI Software Engineer
Valery Starynin, BI Software Engineer