In this first article, we explain how our current event delivery system works and share some of the lessons we learned from operating it. In the next article, we look at the design of the new system and at why we chose Cloud Pub/Sub as the transport mechanism for all events. In the third and final article, we describe how we process all events with Dataflow and how fast that processing is.
Events delivered through the event delivery system have many uses. Most of our product design decisions are based on the results of A/B tests, and those results must rest on rich, accurate data. The Discover Weekly playlist, released in 2015, quickly became one of Spotify's most used features. It is built on music playback data. Year in Music, Spotify Party, and many other Spotify features are built on this data as well. In addition, Spotify data is one of the sources used to compile the Billboard charts.
The event delivery system is one of the foundational parts of Spotify's data infrastructure. Its key requirement is to deliver complete data with predictable latency and to make that data available to developers through a well-documented interface. Usage data can be described as a set of structured events, each generated at some point in time in response to a predefined action.
Most of the events used at Spotify are generated directly by Spotify clients in response to specific user actions. Each time an event occurs in a Spotify client, information about it is sent to one of the Spotify gateways, which writes the event to syslog. There the event is assigned the timestamp that is used throughout the event delivery system. Because we have no control over events before they reach our servers, we decided to use the syslog timestamp, rather than the client-side timestamp, so that we can give guarantees about delivery latency and completeness.
In Spotify's case, all data must be delivered to a central Hadoop cluster. The Spotify servers from which we collect data are located in several data centers on two continents. Bandwidth between data centers is a scarce resource, so data transfer has to be handled with particular care.
The data interface is defined by where the data is stored in Hadoop and by the format in which it is stored. All data delivered by our service is written to HDFS in Avro format. Delivered data is split into 60-minute (hourly) partitions. This is a relic of the past, when the first event delivery system was based on the scp command and copied hourly syslog files from all servers to Hadoop. Because every data processing job at Spotify today expects hourly data, this interface will stay with us for the foreseeable future.
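To make the hourly partitioning concrete, here is a minimal sketch of how a syslog timestamp could be mapped to its 60-minute partition. The function names, the partition key format, and the `/events/...` directory layout are assumptions made for illustration; the article does not describe the actual HDFS paths.

```python
from datetime import datetime, timezone


def hourly_bucket(syslog_ts: datetime) -> str:
    """Return the key of the hourly partition that a timestamp falls into.

    Expects a timezone-aware datetime; it is normalized to UTC first.
    """
    ts = syslog_ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H")


def bucket_path(event_type: str, syslog_ts: datetime) -> str:
    """Build a partition directory (hypothetical layout, not the real one)."""
    return f"/events/{event_type}/{hourly_bucket(syslog_ts)}"
```

Every event stamped between 13:00:00 and 13:59:59 lands in the same partition, and an event stamped at 14:00:00 opens the next one.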
Most data jobs at Spotify read the data in an hourly partition only once. The output of some jobs serves as input to other jobs, forming long chains of transformations. Once a job has processed an hour of data, it no longer checks whether the source data for that hour has changed. If the data does change, the only way to propagate the change downstream is to manually rerun every dependent job (and the jobs that depend on those) for that specific hour. This is an expensive and time-consuming process, so we imposed a requirement on the event delivery service: once it has exposed an hourly partition, it may no longer add data to it. This problem, known as the data completeness problem, conflicts with the requirement to keep data processing latency to a minimum. An interesting perspective on the data completeness problem is presented in Google's Dataflow paper.
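The cost of a late change is easier to see with a small sketch that collects every job that would have to be rerun when one hour of source data changes. The dependency graph format and the job names in the usage note are hypothetical; this is an illustration of the idea, not Spotify's scheduler.

```python
from collections import deque


def jobs_to_rerun(downstream: dict[str, list[str]], changed: str) -> list[str]:
    """Collect all direct and transitive downstream jobs of `changed`.

    `downstream` maps a dataset or job name to the jobs that read its output.
    Jobs are returned in the order they are discovered (breadth-first),
    each one exactly once.
    """
    seen: set[str] = set()
    found: list[str] = []
    queue = deque(downstream.get(changed, []))
    while queue:
        job = queue.popleft()
        if job in seen:
            continue
        seen.add(job)
        found.append(job)
        queue.extend(downstream.get(job, []))
    return found
```

With a graph such as `{"events": ["sessions", "plays"], "sessions": ["report"], "plays": ["report"]}`, a change to one hour of `events` forces three reruns for that hour, and the list grows with every additional link in the chain.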
The Original Event Delivery System
System Design
The first event delivery system was built on top of Kafka 0.7.
In this design, the event delivery system is built around the abstraction of hourly files. It is designed to stream log files containing events from the service machines to HDFS. Once all log files for a given hour have been transferred to HDFS, they are converted from tab-separated text to Avro format.
When the system was first built, one feature missing from Kafka 0.7 was the ability of a Kafka Broker cluster to act as reliable persistent storage. This led to an important design decision: no persistent state is kept between the producer of the data, the Kafka Syslog Producer, and Hadoop. An event is considered reliably persisted only once it has been written to a file on HDFS.
The problem with events being reliably persisted only once they reach Hadoop is that the Hadoop cluster becomes a single point of failure for the event delivery system. If Hadoop goes down, the whole delivery system stalls. To cope with this, we have to make sure that every service from which we collect events has enough disk space. When Hadoop comes back, we have to "catch up" by transferring all the data as quickly as possible. Recovery time is limited mainly by the bandwidth available between data centers.
The Producer is a daemon that runs on every host that sends events to Hadoop. It tails log files and sends batches of log lines to the Kafka Syslog Consumer. The Producer knows nothing about event types or event properties. From its point of view, an event is just a line in a file, and all lines are forwarded over the same channel. In other words, events of every type contained in one log file are sent over a single channel. In this system, Kafka topics are used as the channels for sending events. After the Producer sends log lines to the Consumer, it has to wait for an acknowledgment (ACK) that the Consumer has successfully persisted them to HDFS. Only after receiving the ACK for the lines it sent does the Producer consider them reliably persisted and move on to the next lines.
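The send-then-wait-for-ACK behaviour can be sketched as a small loop that advances its position in the log only after a batch has been acknowledged. This is a simplified illustration rather than the actual Producer: `send_batch` is a hypothetical callable that returns `True` when the Consumer ACKs a batch, and the bounded retry count is an assumption that the article does not mention.

```python
from typing import Callable, List


def send_with_ack(
    lines: List[str],
    batch_size: int,
    send_batch: Callable[[List[str]], bool],
    max_attempts: int = 3,
) -> int:
    """Send log lines in batches and return how many lines were ACKed.

    The offset (`confirmed`) moves forward only after an ACK, so an
    unacknowledged batch is never skipped.
    """
    confirmed = 0
    while confirmed < len(lines):
        batch = lines[confirmed:confirmed + batch_size]
        for _ in range(max_attempts):
            if send_batch(batch):  # True stands in for an ACK from the Consumer
                confirmed += len(batch)
                break
        else:
            # No ACK after max_attempts: stop here and keep the offset so a
            # later run can resume from the first unconfirmed line.
            break
    return confirmed
```

Because a batch may be resent when its ACK is lost, a scheme like this delivers each line at least once, which means the receiving side has to tolerate duplicates.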
To get from the Producer to the Consumer, events pass through Kafka Brokers and then through Kafka Groupers. Kafka Brokers are a standard Kafka component, while Kafka Groupers are a component that we wrote ourselves. Groupers consume all event streams from their local data center and republish them, compressed and efficiently batched, to a single topic from which the Consumers pull.
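As a rough illustration of what a Grouper does, the sketch below merges lines from several local streams into fixed-size batches and compresses each batch before it would be republished to a single topic. The function names and the use of `zlib` are assumptions; the article does not say which compression or batching scheme the Groupers use, and no Kafka client calls are shown.

```python
import zlib
from typing import Dict, Iterable, Iterator, List


def regroup(streams: Dict[str, Iterable[str]], batch_size: int) -> Iterator[bytes]:
    """Merge lines from all local streams into compressed batches."""
    batch: List[str] = []
    for lines in streams.values():
        for line in lines:
            batch.append(line)
            if len(batch) == batch_size:
                yield zlib.compress("\n".join(batch).encode("utf-8"))
                batch = []
    if batch:  # flush the final, partially filled batch
        yield zlib.compress("\n".join(batch).encode("utf-8"))


def unpack(blob: bytes) -> List[str]:
    """Inverse of the packing above, as a Consumer might apply it."""
    return zlib.decompress(blob).decode("utf-8").split("\n")
```

Batching before compression matters for the cross-continent link: many similar log lines compress far better together than one at a time.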
An Extract, Transform, Load (ETL) job converts the data from the plain tab-separated format to Avro. The job is an ordinary Hadoop MapReduce job, implemented with the Crunch framework, that operates on hourly partitions. Before it starts working on a given hour, it has to make sure that all files have been fully transferred.
All Producers continuously send checkpoints, which may contain end-of-file markers. A marker is sent only once, when the Producer concludes that the whole file has been reliably persisted to Hadoop. The liveness monitor continuously polls the service discovery systems in all data centers to learn which service machines were running during a given hour. To check whether all files for that hour have been fully transferred, the ETL job compares the list of servers it expects data from with the end-of-file markers it has received. If it detects a mismatch, meaning the transfer is incomplete, it delays processing of that hour's data.
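The completeness check described above boils down to a set comparison. Here is a minimal sketch, assuming hosts are identified by plain strings; the host names in the usage note are made up.

```python
def missing_hosts(expected_hosts: set[str], eof_markers: set[str]) -> set[str]:
    """Hosts that were alive during the hour but sent no end-of-file marker."""
    return expected_hosts - eof_markers


def hour_is_complete(expected_hosts: set[str], eof_markers: set[str]) -> bool:
    """True when every expected host has delivered its end-of-file marker."""
    return not missing_hosts(expected_hosts, eof_markers)
```

For example, with expected hosts `{"gw1", "gw2"}` and markers from `{"gw1"}` only, `hour_is_complete` returns `False` and the ETL job would postpone that hour. The check also shows the weakness of the scheme: a host that dies before sending its marker keeps the hour incomplete indefinitely.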
To make the most of the available mappers and reducers, the ETL job, which is an ordinary Hadoop MapReduce job, needs to know how to shard the input data. The number of mappers and reducers is calculated from the size of the input data. The optimal sharding is calculated from event counts that the Consumers report continuously.
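One simple way to turn reported event counts into a shard count is to divide the total by a target number of events per shard. This is only a sketch of the idea: the article does not give the actual formula, and `events_per_shard` is a hypothetical tuning parameter.

```python
import math


def shard_count(event_counts: dict[str, int], events_per_shard: int) -> int:
    """Number of shards needed so that no shard exceeds the target size.

    `event_counts` maps a Consumer (or any source) to the number of events
    it reported for the hour. At least one shard is always returned.
    """
    total = sum(event_counts.values())
    return max(1, math.ceil(total / events_per_shard))
```

With counts of 1,500,000 and 600,000 events and a target of 1,000,000 events per shard, the sketch yields 3 shards.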
Lessons Learned
One of the main problems with this design is that a local Producer has to confirm that data has been persisted to HDFS at the central location before it can consider the data reliably delivered. This means that a Producer on a server on the US West Coast has to know that the data has been written to disk in London. Most of the time this works fine, but when data transfer slows down, it causes delivery delays that are hard to recover from.
Compare this with a design in which the handover point is in the local data center. The network between hosts within a data center is usually very reliable, which simplifies the design of the Producer.
Setting these problems aside, we are very happy with a system that can reliably deliver more than 700,000 events per second across the world. Redesigning the system also gave us an opportunity to improve our software development process.
By sending all events together over a single channel, we lost the flexibility to manage event streams with different quality of service (QoS). It also limited real-time use cases, because any real-time consumer has to tap into the single channel that carries the entire stream and filter out only the events it needs.
Sending unstructured data adds unnecessary latency, because it requires an additional ETL transformation. At present, the ETL job adds about 30 minutes of latency to event delivery. If the data were sent in Avro format, it would be available as soon as it was written to HDFS.
Having the sender keep track of when an hour is complete causes problems. For example, if a machine dies, it cannot send its end-of-file marker. If an end-of-file marker is missing, the job waits forever unless someone intervenes manually. As the number of machines grows, this problem becomes more and more pressing.
Next Steps
The number of events delivered at Spotify is constantly growing. As the load increased, we started to see more problems. Over time, the number of outages began to worry us. We realized that neither we nor the system could keep up with the growing load. In the next article, we describe how we decided to change the system.
The number of events processed by the system at a given point in time.