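So as not to put it off any longer, here is the topic.
It continues with the further MapReduce examples promised in "MapReduce without Zaum" (that is, without abstruse jargon). (If you don't yet fully understand what MapReduce is, read that topic first; without it this one won't make sense.)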
Here we'll talk about counting nationalities in cities, students' average grades and police records, TIC, PageRank, inbound links, sub-keywords, synonyms, social networks and mutual friends. I'll try to do without mathematical notation and without zaum.
That said, the subject itself is complicated and you will have to strain your brain. Once it clicks, though, it becomes very simple.
Inbound links
Suppose we have the Internet. Sites on the Internet have outbound links.
At the input we have data on outbound links collected by a spider:
habrahabr.ru -> thematicmedia.ru, apple.ru, microsoft.com, ubuntu.com, yandex.ru
thematicmedia.ru -> habrahabr.ru, autokadabra.ru
autokadabra.ru -> habrahabr.ru, yandex.ru
In other words, we know that Habr loves Apple, MS, Ubuntu and Yandex, but who loves Habr? Yes, the question is primitive, but let's break it down with MapReduce anyway; we'll need this example later for more interesting ones.
The Map step (which we write ourselves) does the following:
(Once again: if you can't follow what is going on, read "MapReduce without Zaum".)
In the examples, everything after // is a comment; it only explains where something came from and has nothing to do with MapReduce itself.
map("habrahabr.ru -> thematicmedia.ru, apple.ru, microsoft.com, ubuntu.com")
->
["thematicmedia.ru", "habrahabr.ru"] // thematicmedia.ru is linked to by habrahabr.ru
["apple.ru", "habrahabr.ru"]
.....
map("thematicmedia.ru -> habrahabr.ru, autokadabra.ru")
->
["habrahabr.ru", "thematicmedia.ru"]
["autokadabra.ru", "thematicmedia.ru"]
....
The Reduce step does nothing, because at its input it already receives the following:
["autokadabra.ru", ["thematicmedia.ru"]]
["habrahabr.ru", ["thematicmedia.ru", "autokadabra.ru"]] <-- the sites that link to Habr
....
And there is the answer: the inbound links.
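The inversion above can be sketched in a few lines of Python. This is only an illustration, not the reference implementation from the earlier topic: `map_reduce` is a hypothetical in-memory helper that groups mapper output by key, and `link_mapper` and `link_reducer` are names invented here.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


def link_mapper(line):
    """Turn 'source -> t1, t2' into (target, source) pairs."""
    source, targets = line.split(" -> ")
    for target in targets.split(", "):
        yield target, source


def link_reducer(site, sources):
    """Nothing to do: the grouped values already are the inbound links."""
    return sources


outbound = [
    "habrahabr.ru -> thematicmedia.ru, apple.ru, microsoft.com, ubuntu.com, yandex.ru",
    "thematicmedia.ru -> habrahabr.ru, autokadabra.ru",
    "autokadabra.ru -> habrahabr.ru, yandex.ru",
]

inbound = map_reduce(outbound, link_mapper, link_reducer)
print(inbound["habrahabr.ru"])  # ['thematicmedia.ru', 'autokadabra.ru']
```

On a real cluster the grouping is done by the framework between the two steps; the helper only imitates that on one machine.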
Which sites give you TIC or PageRank?
Some background: TIC and link trading (circa 2010). Let's call the conventional unit of rating MegaRank (it could be TIC, PageRank and so on; it makes no difference).
Assume the following theory: every site has some MegaRank, and a site can "pass on" a certain amount of MegaRank, pass on none, or even "kill" it altogether. If we only know the MegaRank of the destination sites, how do we find out which of the linking sites pass MegaRank on?
Again, we have initial data: site A has a MegaRank of 100 and inbound links from sites B, C and D. Site B has a MegaRank of 0 (we can't rule the theory out, so let's assume its MegaRank was "killed" by bad links) and inbound links from sites A, D, E, F, and so on...
We break the task down as follows. Site A did not necessarily receive equal amounts from B, C and D, but we'll assume that each of B, C and D gave site A 33 units of MegaRank. Site B had its MegaRank killed, but we'll pretend that its MegaRank is -500 rather than zero. That way we heavily mark down whoever "kills" MegaRank (say it is site A, which has poor content and has been pessimized by the search engine). It follows that each of sites A, D, E and F gave site B -500/4 = -125 units of MegaRank.
How to turn outbound links into inbound ones I have already described above; let's assume we have done that already...
So we have the input data:
A (100) <- B,C,D // site A has MegaRank=100 and is linked to by B, C, D
B (-500) <- A,D,E,F
Map does the following:
A (100) <- B,C,D:
B, 100/3 = 33 // site B passes on 33 MegaRank, because B links to A
C, 33
D, 33
B (-500) <- A,D,E,F:
A, -500/4 = -125
D, -125
E, -125
F, -125
At the Reduce step we get:
A (-125) // site "A" passes on -125 MegaRank
B (33)
C (33)
D (33, -125) // site "D" passes on 33-125 = -92 MegaRank
E (-125)
F (-125)
After Reduce we see that the least valuable sites are A, E and F: in the absence of other data, they link only to a site whose MegaRank is zero. D, despite linking to both A and B, most likely still comes out better than A.
Now they can be sorted in descending order of MegaRank passed on, using the example from "MapReduce without Zaum" (sorting by popularity).
This example shows nicely why MapReduce is needed: when analyzing millions of sites with billions of links at once, you run into the memory limits of a single machine, and here you can plug in more machines and scale up the parallel processing.
(And before you ask: yes, I tested this theory on TIC, and yes, it works :) I do have some TIC, though nothing impressive.)
So, given only data on the outbound links from sites and on a particular parameter (TIC, PageRank) of the sites those links point to, we can find the sites that contribute to that parameter.
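Here is a minimal Python sketch of the MegaRank calculation, using the A and B figures from the example. The `map_reduce` helper is a hypothetical in-memory stand-in, the function names are invented for illustration, and the even split of a site's rank among its linkers is the article's simplifying assumption. The shares are kept as fractions rather than rounded to 33.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# site -> (its MegaRank, the sites that link to it)
inbound = {
    "A": (100, ["B", "C", "D"]),
    "B": (-500, ["A", "D", "E", "F"]),
}


def rank_mapper(item):
    """Credit every linking site with an equal share of the target's rank."""
    _site, (rank, linkers) = item
    share = rank / len(linkers)
    for linker in linkers:
        yield linker, share


def rank_reducer(site, shares):
    """Add up everything a site is assumed to have passed on."""
    return sum(shares)


passed_on = map_reduce(inbound.items(), rank_mapper, rank_reducer)

# Worst donors first; ties keep the order in which sites were first seen.
worst_first = sorted(passed_on, key=passed_on.get)
print(worst_first)  # ['A', 'E', 'F', 'D', 'B', 'C']
```

D ends up at about -91.7 (100/3 - 125), which matches the -92 in the listing once the share is rounded to 33.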
Population shares in cities
Suppose we have information about city districts: the Gdetto district of Gorod1 is home to 150 Russians, 190 Belarusians and 3 Udmurts, and the Tutto district of Gorod1 is home to 3 Russians and 5 Belarusians.
The task is to compute the population shares for each city (we only know them per district). The problem: there are millions of districts, the data is not grouped by city, everything is jumbled together and it does not fit in memory...
With MapReduce it's very easy:
Map:
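map("District Gdetto, city Gorod1: 150 Russians, 190 Belarusians, 3 Udmurts")
->
"gorod1", [ (150, "Russian"), (190, "Belarusian"), (3, "Udmurt") ]
map("District Tutto, city Gorod1: 3 Russians, 5 Belarusians")
->
"gorod1", [ (3, "Russian"), (5, "Belarusian") ]
map("District Butto, city Gorod2: 1 …")
->
"gorod2", [ (1, "…") ]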
Reduce:
At the input:
"gorod1", [ (150, "Russian"), (190, "Belarusian"), (3, "Udmurt"), (3, "Russian"), (5, "Belarusian") ]
"gorod2", [ (1, "…") ]
I think it's obvious that we can now simply add up these values, which already fit in memory, and divide by the number of people in each city.
There are other ways to do this; you can come up with them yourself.
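As a rough Python sketch of this calculation, using only the two Gorod1 districts from the text: the record layout (a tuple with a dict of counts), the `map_reduce` helper and the function names are assumptions made for illustration. Unlike the listing above, the mapper here emits one pair per nationality rather than one list per district; the reducer sees the same values either way.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# (city, district, people per nationality); assumed, already-parsed layout
districts = [
    ("Gorod1", "Gdetto", {"Russian": 150, "Belarusian": 190, "Udmurt": 3}),
    ("Gorod1", "Tutto", {"Russian": 3, "Belarusian": 5}),
]


def city_mapper(record):
    """Drop the district and key every count by its city."""
    city, _district, counts = record
    for nationality, count in counts.items():
        yield city.lower(), (count, nationality)


def city_reducer(city, pairs):
    """Sum the counts per nationality, then divide by the city total."""
    totals = defaultdict(int)
    for count, nationality in pairs:
        totals[nationality] += count
    population = sum(totals.values())
    return {name: count / population for name, count in totals.items()}


shares = map_reduce(districts, city_mapper, city_reducer)
for name, share in shares["gorod1"].items():
    print(f"{name}: {share:.1%}")
```

Per city the reducer only holds one small dictionary of totals, which is why the data no longer has to fit in memory all at once.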
Who has at least two friends in common?
Let A, B, C, D, E be users of a social network with 30 million users, called, say, "Fingers in the Socket". Suppose we know the following:
A is friends with B, D, E
B is friends with A, D, E
C is friends with B, E
How do we find the people who have at least two friends in common?
(You guessed it!) With MapReduce it's very easy. For each person we enumerate every pair of their friends by brute force and put the pair first:
Map:
(B,D), A // "A" is friends with both "B" and "D"
(B,E), A
(D,E), A
(A,D), B
(A,E), B
(D,E), B
(B,E), C
Reduce receives the following (and does nothing except select every row whose second element has more than one value):
(A,D), (B) // "A" and "D" are both friends of "B"
(A,E), (B)
(B,D), (A)
(B,E), (A,C) <-- found: "B" and "E" are both friends of "A" and of "C"
(D,E), (A,B) <--
The answer:
A and C have common friends B, E
A and B have common friends D, E
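Done! The algorithm's complexity is O(n^2), which, for those who are out of the loop, means that with a lot of input data the job will most likely finish shortly before the Sun goes out. In practice it will finish sooner, of course, but quadratic algorithms are bad.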
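The pair enumeration can be sketched in Python like this. The `map_reduce` helper is a hypothetical in-memory stand-in and the function names are invented; the friend lists are the three given in the text.

```python
from collections import defaultdict
from itertools import combinations


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


friends = {
    "A": ["B", "D", "E"],
    "B": ["A", "D", "E"],
    "C": ["B", "E"],
}


def pair_mapper(item):
    """Emit every pair of a person's friends, with the pair as the key."""
    person, friend_list = item
    for pair in combinations(sorted(friend_list), 2):
        yield pair, person


def pair_reducer(pair, people):
    """Nothing to compute: just hand back everyone who knows both."""
    return people


grouped = map_reduce(friends.items(), pair_mapper, pair_reducer)

# Keep only the pairs that are shared by more than one person.
answer = {pair: people for pair, people in grouped.items() if len(people) > 1}
print(answer)  # {('B', 'E'): ['A', 'C'], ('D', 'E'): ['A', 'B']}
```

Sorting each friend list before pairing makes sure (B,E) and (E,B) land under the same key. The quadratic cost is visible in `combinations`: a person with k friends produces k(k-1)/2 pairs.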
Finding "sub-keys" in keywords
Suppose we have several keywords:
Let's try to pick out the ones we could write something about. We'll treat as such "sub-keys" the words that occur in at least two different keywords and are keywords in their own right. For example, given "coffeemakers" and "babrababr coffeemakers", "coffeemakers" is a sub-key. "Garage" is not a sub-key, because it is not a keyword by itself. Nor is a keyword a sub-key if no other keyword contains it. Well? Ready to find them without MapReduce? :)
And it's very simple. Split each line into words (better still, into n-grams, but that's up to you). For each of them emit "FULL" if the word makes up the whole line, or "PART" if it makes up only part of it. That is, for the example above:
Map:
, PART // " "
, PART // " "
, FULL // ""
, FULL
, FULL
, PART
, PART
, PART
, FULL
, FULL
At the input, Reduce receives the following:
, (PART, FULL)
, (FULL, PART, PART)
, (PART)
, (PART, FULL, FULL)
, (FULL)
Now look for the entries that have both PART and FULL: "babrababr", "gates" and "coffeemakers". These are the sub-keys we were looking for.
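A small Python sketch of the FULL/PART trick follows. The keyword list is hypothetical: it is made up here to be consistent with the words mentioned in the text ("drill" in particular is invented), and it is not the article's original list. The `map_reduce` helper is likewise a hypothetical in-memory stand-in.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical keyword list, for illustration only.
keywords = [
    "coffeemakers",
    "babrababr coffeemakers",
    "garage gates",
    "gates",
    "babrababr",
    "drill",
]


def word_mapper(keyword):
    """Tag each word FULL if it is the whole keyword, otherwise PART."""
    words = keyword.split()
    tag = "FULL" if len(words) == 1 else "PART"
    for word in words:
        yield word, tag


def tag_reducer(word, tags):
    """Only the distinct tags matter, so collapse them into a set."""
    return set(tags)


tags_by_word = map_reduce(keywords, word_mapper, tag_reducer)

# A sub-key is a keyword in its own right AND part of a longer keyword.
sub_keys = sorted(w for w, tags in tags_by_word.items() if tags == {"FULL", "PART"})
print(sub_keys)  # ['babrababr', 'coffeemakers', 'gates']
```

With this list, "garage" only ever gets PART and "drill" only FULL, so neither qualifies.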
Finding pseudo-synonyms
Suppose we have the Internet. It contains lots of phrases, and among them we notice ones like "bought a big computer", "bought an expensive computer"... Here we'll consider "expensive" and "big" to be pseudo-synonyms. We need this to churn out junk sites and earn WebCash until the Gotoooo search engine bans them.
The problem is that there are billions of words and not enough memory. And yes, it may turn out that someone "bought a computer yesterday", and "yesterday" is not "big" at all; but that will generally be a one-off coincidence, and we'll get past it... What do we do? The algorithm we'll use is the crudest one available (I know you've all guessed it). Let's go!
We take the input data and keep only what we need, selecting phrases of three words each.
" "
" "
" "
" "
" "
Stage 1 (we need a cascade of two stages!):
Map:
" * ", "" // " "
" * ", "" // " "
" * ", ""
" * ", ""
" * ", ""
The asterisk is just an asterisk character; we use it to mean "any word can go here". Reduce therefore receives:
" * ", ["", "", ""]
// "" ""
" * ", ["", ""]
Reduce does nothing.
Stage 2 (the cascade):
Map receives as input what is written just above and emits pairs by exhaustively enumerating the second value (["big", "yesterday", "expensive"]).
, // ["", "", ""]
, // ["", "", ""]
, // ["", "", ""]
, // ["", "", ""]
, // ["", "", ""]
, // ["", "", ""]
, // ["", ""]
, // ["", ""]
At the input of Reduce we then have:
, (, , )
, (, )
, (, , )
Pay no attention to the quotes being missing in places; there is no secret meaning in that. Imagine them wherever they are missing.
Now Reduce only has to count the words in parentheses (the second element of its argument), which by this point fit in memory, and discard the elements that occur only once.
- (2) // "" - "", 2
- (2)
Done! The complexity is O(n^2) again!
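The two-stage cascade might look like this in Python. The five phrases are hypothetical: the "bought ... computer" ones follow the text, while the "sold ... house" ones are invented so that "big" and "expensive" meet in a second context. The `map_reduce` helper is again a hypothetical in-memory stand-in.

```python
from collections import Counter, defaultdict
from itertools import permutations


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical three-word phrases, for illustration only.
phrases = [
    "bought big computer",
    "bought yesterday computer",
    "bought expensive computer",
    "sold big house",
    "sold expensive house",
]


def template_mapper(phrase):
    """Stage 1 Map: replace the middle word with '*' and emit it as the value."""
    first, middle, last = phrase.split()
    yield f"{first} * {last}", middle


def pair_mapper(item):
    """Stage 2 Map: emit every ordered pair of words that shared a template."""
    _template, words = item
    for left, right in permutations(words, 2):
        yield left, right


def count_reducer(word, partners):
    """Stage 2 Reduce: count partners and drop those seen only once."""
    counts = Counter(partners)
    return {partner: n for partner, n in counts.items() if n > 1}


stage1 = map_reduce(phrases, template_mapper, lambda template, words: words)
stage2 = map_reduce(stage1.items(), pair_mapper, count_reducer)
print(stage2["big"])        # {'expensive': 2}
print(stage2["yesterday"])  # {}
```

"yesterday" shares a template with the other two words only once, so the count filter removes it, which is the one-off coincidence the text expects to survive.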
Join
Suppose we already have two MapReduce jobs whose results need to be combined. One computes each student's average grade from the teachers' class registers, and the other counts how many times each student was brought in to the police over the past year, based on data from hundreds of police stations. How do we merge the two MapReduce results into one, that is, how do we do an INNER JOIN or an OUTER JOIN?
The first MapReduce produces (grades):
, ("", 3.5)
And the second MapReduce produces (police records):
, ("", 2)
Then we simply pass all of this straight through, one after another:
, ("", 3.5)
, ("", 2)
Reduce then receives:
, [ ("", 3.5), ("", 2) ]
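A minimal Python sketch of such a join, under stated assumptions: the student identifiers, the tag strings "grade" and "police", and the second student's figures are all hypothetical, and `map_reduce` is an in-memory stand-in rather than a real framework.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce run."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Outputs of the two earlier jobs, keyed by a hypothetical student id.
grades = [("student_1", ("grade", 3.5)), ("student_2", ("grade", 4.2))]
police = [("student_1", ("police", 2))]


def passthrough_mapper(record):
    """Map changes nothing: it forwards each tagged value under its key."""
    key, tagged_value = record
    yield key, tagged_value


def join_reducer(student, tagged_values):
    """Merge the tagged values for one student into a single row."""
    return dict(tagged_values)


# OUTER JOIN: every student that appears in either input.
outer = map_reduce(grades + police, passthrough_mapper, join_reducer)

# INNER JOIN: only students present in both inputs.
inner = {s: row for s, row in outer.items() if "grade" in row and "police" in row}

print(outer)
print(inner)  # {'student_1': {'grade': 3.5, 'police': 2}}
```

The tag on each value is what lets the reducer tell the two sources apart after they have been mixed into one stream.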
Instead of a conclusion
Clearly MegaRank, sub-keys, friends, synonyms, nationalities and failing students are far from the only uses of MapReduce, but for now I think this is enough to start thinking about how to use it. I have found plenty of applications for it, and I use it all the time.
As I have already said, almost any SQL query can be decomposed into MapReduce; it just takes a little practice. Why bother? For speed, and because SQL doesn't always have all the functions you need, for example that same generator of n-grams (multi-word phrases) from an input string... I may be wrong, of course, but the fact is that MapReduce is simple, understandable and very useful (and at the same time described in amazingly convoluted ways on the Net). I hope MapReduce has now become a little easier to get into.
The previous topic contains reference implementations of MapReduce in Python and PHP.
Yoi Haji
as always, from Habr
2010
(someday I'll learn to explain things simply....)