Introduction
Finding a minimum spanning tree (MST) is a common task in many research areas: recognition of various objects, computer vision, analysis and construction of networks (telephone, electrical, computer, road, and so on), chemistry, biology, and many others. There are at least three well-known algorithms for this problem: Borůvka's, Kruskal's, and Prim's. Processing large graphs (several GB in size) takes a central processing unit (CPU) a considerable amount of time and is in demand today. Graphics accelerators (GPUs), which can deliver much higher performance than CPUs, are becoming widespread. However, like many graph-processing tasks, MST does not map well onto the GPU architecture. This article describes an implementation of the algorithm on a GPU. It also shows how to use the CPU to build a hybrid implementation of the algorithm in the shared memory of a single node (consisting of a GPU and several CPUs).
Graph representation format
We briefly review the storage structure for an undirected weighted graph, since it is referred to and transformed later. The graph is stored in CSR (Compressed Sparse Row) format [1]. This format is widely used for storing sparse matrices and graphs. A graph with N vertices and M edges needs three arrays: X, A, and W. Array X has size N + 1; the other two have size 2 * M, because in an undirected graph both the forward and the reverse arc must be stored for every connected pair of vertices. Array X stores the start and end of each neighbor list held in array A: the full neighbor list of vertex J lies in array A from index X[J] to X[J + 1] (the latter not included). The weights of the edges leaving vertex J are stored at the same indices in W. For illustration, the figure below shows a graph of 6 vertices described with an adjacency matrix on the left and in CSR format on the right (edge weights are omitted for simplicity).
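As a minimal sketch of this layout (host-side C++, not the author's code), the function below builds X, A, and W from an edge list. The names `Edge`, `CsrGraph`, and `build_csr` are illustrative assumptions, and the weight type is assumed to be `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration; they do not come from the article.
struct Edge { unsigned u, v; double w; };

struct CsrGraph {
    std::vector<unsigned> X;  // size N + 1: neighbors of J are A[X[J]] .. A[X[J + 1] - 1]
    std::vector<unsigned> A;  // size 2 * M: neighbor vertex numbers
    std::vector<double>   W;  // size 2 * M: weight of the edge to the matching entry of A
};

// Builds the CSR arrays for an undirected graph with n vertices.
// Every edge is stored twice: as arc u -> v and as arc v -> u.
CsrGraph build_csr(unsigned n, const std::vector<Edge>& edges) {
    CsrGraph g;
    g.X.assign(n + 1, 0);
    for (const Edge& e : edges) {            // count the degree of every vertex
        assert(e.u < n && e.v < n);
        ++g.X[e.u + 1];
        ++g.X[e.v + 1];
    }
    for (unsigned j = 0; j < n; ++j)         // a prefix sum turns counts into offsets
        g.X[j + 1] += g.X[j];
    g.A.resize(2 * edges.size());
    g.W.resize(2 * edges.size());
    std::vector<unsigned> next(g.X.begin(), g.X.end() - 1);  // next free slot per vertex
    for (const Edge& e : edges) {
        g.A[next[e.u]] = e.v;
        g.W[next[e.u]++] = e.w;
        g.A[next[e.v]] = e.u;
        g.W[next[e.v]++] = e.w;
    }
    return g;
}
```

For a path 0 - 1 - 2 this yields X = {0, 1, 3, 4} and A = {1, 0, 2, 1}: vertex 1 owns the two middle entries of A.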
Test graphs
Because the descriptions of the transformations and of the MST algorithm require knowledge of the structure of the graphs in question, we describe the test graphs first. Two types of synthetic graphs are used to evaluate performance: RMAT graphs and SSCA2 graphs. R-MAT graphs model real graphs from social networks and the Internet well [2]. Here we consider RMAT graphs with an average vertex degree of 32 and a number of vertices that is a power of two. Such an RMAT graph has one large connected component and many small connected components or isolated vertices. An SSCA2 graph is a large set of independent components connected to each other by edges [3]. SSCA2 graphs are generated so that the average vertex degree is close to 32 and the number of vertices is a power of two. Thus two completely different kinds of graphs are considered.
Input transformations
The algorithm is tested on RMAT and SSCA2 graphs obtained from generators, so several transformations are needed to improve its performance. None of the transformations is included in the performance measurements.
- Local sorting of vertex lists
For each vertex, the adjacency list is sorted by weight in ascending order. This partly simplifies choosing the minimum edge at each iteration of the algorithm. Because the sort is local, it does not solve the problem completely.
- Renumbering all graph vertices
The vertices are numbered so that the most strongly connected vertices get the closest numbers. As a result, in each connected component the difference between the largest and smallest vertex number is minimal, which makes the best use of the GPU's small cache. For RMAT graphs this renumbering has no significant effect, because even after this optimization the graph contains a very large component that does not fit in the cache. For SSCA2 graphs the effect is more noticeable, because the graph contains many small components.
- Mapping graph weights to integers
In this problem no arithmetic on the edge weights is needed; we only need to be able to compare the weights of two edges. For that purpose integers can be used instead of double-precision numbers, because single-precision processing on a GPU is much more than twice as fast as double-precision processing. This transformation is possible for graphs whose number of unique edges does not exceed 2^32 (the maximum number of distinct values that fit in an unsigned int). With an average vertex degree of 32, the largest graph that can be processed with this transformation has 2^28 vertices and occupies 64 GB of memory. To date, the largest memory available on the NVidia Tesla K40 [4] / NVidia Titan X [5] and AMD FirePro W9100 [6] accelerators is 12 GB and 16 GB respectively. Therefore a single GPU can process fairly large graphs with this transformation.
- Vertex compression
This transformation applies only to SSCA2 graphs because of their structure. In this task the performance of every memory level plays a decisive role, from global memory down to the L1 cache. To reduce traffic between global memory and the L2 cache, vertex information can be stored in compressed form. Initially the vertex information is represented by two arrays: array X stores the start and end of the adjacency list in array A (example for a single vertex):
Vertex J has 10 neighbor vertices. If each neighbor number is stored as an unsigned int, storing the neighbor list of vertex J takes 10 * sizeof(unsigned int) bytes, and the whole graph takes 2 * M * sizeof(unsigned int) bytes. We assume sizeof(unsigned int) = 4 bytes, sizeof(unsigned short) = 2 bytes, and sizeof(unsigned char) = 1 byte. This vertex therefore needs 40 bytes to store its adjacency list.
It is easy to see that the difference between the largest and smallest vertex number in this list is 8, so only 4 bits are needed to store it. Based on the observation that the difference between the largest and smallest vertex number can be much smaller than the range of unsigned int, each vertex number can be represented as follows:
base_J + 256 * k + short_endV
Here base_J is, for example, the smallest vertex number in the whole adjacency list. In this example it is 1. This variable has type unsigned int, and there are as many such variables as there are vertices in the graph. Next we compute the difference between a vertex number and the chosen base. Since the smallest vertex is chosen as the base, the difference is never negative. For SSCA2 graphs this difference fits in an unsigned short. short_endV is the remainder of dividing the difference by 256; it is stored as an unsigned char. k is the integer quotient of dividing by 256; 2 bits are allotted for k (that is, k ranges from 0 to 3). This representation is sufficient for the graphs in question. In bit form it looks like this:
Thus storing the vertex list in this example takes (1 + 0.25) * 10 + 4 = 16.5 bytes instead of 40 bytes, and the whole graph takes (2 * M + 4 * N + 2 * M / 4) bytes instead of 2 * M * 4. With N = 2 * M / 32, the total volume decreases by a factor of
(8 * M) / (2 * M + 8 * M / 32 + 2 * M / 4) = 8 / 2.75 ≈ 2.9
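The base_J + 256 * k + short_endV representation can be sketched in host-side C++ as follows. This is an illustration under assumptions, not the article's code: the names `PackedList`, `pack`, and `unpack` are hypothetical, and the 2-bit values of k are assumed to be packed four per byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one compressed adjacency list (illustration only).
struct PackedList {
    std::uint32_t base = 0;          // base_J: smallest neighbor number in the list
    std::vector<std::uint8_t> low;   // short_endV: (v - base) % 256, one byte per neighbor
    std::vector<std::uint8_t> high;  // k: (v - base) / 256, 2 bits per neighbor, 4 per byte
};

// Packs a non-empty neighbor list; requires max - min < 1024 so that k fits in 2 bits.
PackedList pack(const std::vector<std::uint32_t>& neighbors) {
    assert(!neighbors.empty());
    PackedList p;
    p.base = *std::min_element(neighbors.begin(), neighbors.end());
    p.low.resize(neighbors.size());
    p.high.assign((neighbors.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        std::uint32_t diff = neighbors[i] - p.base;
        assert(diff < 1024);
        p.low[i] = static_cast<std::uint8_t>(diff % 256);
        p.high[i / 4] |= static_cast<std::uint8_t>((diff / 256) << (2 * (i % 4)));
    }
    return p;
}

// Recovers neighbor i as base_J + 256 * k + short_endV.
std::uint32_t unpack(const PackedList& p, std::size_t i) {
    std::uint32_t k = (p.high[i / 4] >> (2 * (i % 4))) & 3u;
    return p.base + 256u * k + p.low[i];
}
```

Because this sketch rounds the 2-bit part up to whole bytes per list, a 10-neighbor list takes 10 + 3 + 4 = 17 bytes here rather than the 16.5 bytes of the estimate above.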
General description of the algorithm
Borůvka's algorithm was chosen to implement MST. A basic description of Borůvka's algorithm and an illustration of its iterations are given well here [7].
According to the algorithm, all vertices are initially included in the minimum spanning tree, each as a separate tree. Then the following steps are performed:
- Find the minimum edges between all trees for their subsequent merging. If no edges were selected at this step, the answer to the problem has been obtained.
- Merge the corresponding trees. This step is split into two stages: removing cycles, since two trees may point at each other as merge candidates, and the merge itself, in which the number of the tree that will contain the merged subtrees is chosen. For definiteness, the smallest number is chosen. If only one tree remains after merging, the answer to the problem has been obtained.
- Renumber the resulting trees and go to the first step (so that all trees are numbered from 0 to k).
Algorithm stages
In general, the implemented algorithm looks as follows:
The whole algorithm terminates in two cases: when all vertices have been merged into a single tree after some number of iterations, or when no minimum edge can be found from any tree (in which case the minimum spanning trees, one per connected component, have been found).
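The iteration just described can be sketched as a sequential reference in host-side C++. This is a simplified illustration, not the GPU implementation: it works on a plain edge list instead of CSR, assumes integer weights, breaks ties by edge index (an assumption that guarantees only two-tree cycles can appear), and the names `WEdge` and `boruvka` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct WEdge { unsigned u, v, w; };  // hypothetical edge type with an integer weight

// Returns the total weight of the minimum spanning forest.
unsigned long long boruvka(unsigned n, const std::vector<WEdge>& edges) {
    const std::size_t NONE = edges.size();
    std::vector<unsigned> tree(n);               // tree[v]: number of the tree containing v
    for (unsigned v = 0; v < n; ++v) tree[v] = v;
    unsigned trees = n;
    unsigned long long total = 0;
    while (trees > 1) {
        // Step 1: minimum outgoing edge of every tree (ties go to the lower edge index).
        std::vector<std::size_t> best(trees, NONE);
        for (std::size_t e = 0; e < edges.size(); ++e) {
            assert(edges[e].u < n && edges[e].v < n);
            unsigned a = tree[edges[e].u], b = tree[edges[e].v];
            if (a == b) continue;                // both ends are already in one tree
            for (unsigned t : {a, b})
                if (best[t] == NONE || edges[e].w < edges[best[t]].w) best[t] = e;
        }
        // F[t]: the tree that t wants to merge into (itself if it has no outgoing edge).
        std::vector<unsigned> F(trees);
        bool any = false;
        for (unsigned t = 0; t < trees; ++t) {
            F[t] = t;
            if (best[t] == NONE) continue;
            any = true;
            unsigned a = tree[edges[best[t]].u], b = tree[edges[best[t]].v];
            F[t] = (a == t) ? b : a;
        }
        if (!any) break;                         // no edges between trees: finished
        // Step 2: remove two-tree cycles in favor of the smaller number.
        for (unsigned t = 0; t < trees; ++t)
            if (F[F[t]] == t && t < F[t]) F[t] = t;
        // Every tree that still points elsewhere contributes its selected edge once.
        for (unsigned t = 0; t < trees; ++t)
            if (F[t] != t) total += edges[best[t]].w;
        // Step 3: merge trees by following links up to the root.
        for (unsigned t = 0; t < trees; ++t) {
            unsigned r = F[t];
            while (F[r] != r) r = F[r];
            F[t] = r;
        }
        // Step 4: renumber the roots consecutively from 0 and relabel the vertices.
        std::vector<unsigned> id(trees);
        unsigned next = 0;
        for (unsigned t = 0; t < trees; ++t)
            if (F[t] == t) id[t] = next++;
        for (unsigned v = 0; v < n; ++v) tree[v] = id[F[tree[v]]];
        trees = next;
    }
    return total;
}
```

Both exit conditions appear here: the loop ends when a single tree remains, and the `break` fires when no tree has an outgoing edge.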
1. Finding the minimum edge.
First, each vertex of the graph is placed in a separate tree. Then an iterative tree-merging process takes place, consisting of the four procedures described above. The minimum-edge procedure selects exactly those edges that will be part of the minimum spanning tree. As described above, the input to this procedure is the transformed graph stored in CSR format. Since the edges in the adjacency lists are partially sorted by weight, choosing the minimum edge reduces to scanning the adjacency list and choosing the first vertex that belongs to another tree. Assuming the graph has no self-loops, at the first step of the algorithm choosing the minimum edge means taking the first vertex in the adjacency list of each vertex under consideration, because the list of neighbor vertices (which together with the vertex under consideration form the graph's edges) is sorted by increasing weight and every vertex is in its own tree. At every other step, the list of adjacent vertices must be scanned in order until a vertex belonging to another tree is found.
Why can't we simply take the second vertex from the adjacency list and consider that edge the minimum one? After the tree-merging procedure (discussed later), some vertices from the adjacency list may end up in the same tree as the vertex under consideration, so the edge becomes a loop within that tree, whereas the algorithm requires choosing the minimum edge to other trees.
Union-Find [8] structures are well suited to vertex processing: find, union, and merging lists. Unfortunately, not all structures are handled optimally on a GPU. For this task (as in most other cases) the most beneficial approach is to use contiguous arrays in GPU memory instead of linked lists. Below we consider such array-based algorithms for finding the minimum edge in the graph, merging segments, and removing cycles.
Consider the algorithm for finding the minimum edge. It can be expressed as two steps:
- selecting the minimum outgoing edge of each vertex (belonging to a segment) of the graph under consideration;
- selecting the minimum-weight edge for each tree.
To avoid moving the vertex information stored in CSR format, two auxiliary arrays are used that store the start and end indices of adjacency lists in array A. These two arrays describe the segments of vertex lists belonging to one tree. For example, at the first step the array of start (lower) values holds elements 0..N of array X (end excluded), and the array of end (upper) values holds elements 1..N + 1 of array X (end excluded). After the tree-merging procedure (discussed below), these segments are shuffled, but the neighbor array A in memory is not changed.
Both steps can be executed in parallel. To perform the first step, the adjacency list of each vertex (or each segment) must be scanned and the first edge belonging to another tree selected. One warp (consisting of 32 threads) can be assigned to scan the adjacency list of each vertex. It is worth remembering that several segments of the neighbor array A may not lie next to each other and yet belong to one tree (segments belonging to tree 0 are highlighted in red, and those of tree 1 in green):
Because each segment of the adjacency list is sorted, not all vertices need to be scanned. Since a warp consists of 32 threads, the scan proceeds in portions of 32 vertices. After 32 vertices have been examined, the results must be combined, and if nothing was found, the next 32 vertices are examined. A scan algorithm can be used to combine the results [9]. This algorithm can be implemented within one warp using shared memory or the newer shfl instructions [10] (available since the Kepler architecture), which let the threads of one warp exchange data in a single instruction. Experiments showed that shfl instructions speed up the whole algorithm by about a factor of two. Thus this operation can be performed with shfl instructions, for example as follows:
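```cuda
unsigned idx = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
unsigned lidx = idx % 32;                             // lane index within the warp
// val holds this lane's flag; the loop turns it into an inclusive prefix sum.
#pragma unroll
for (int offset = 1; offset <= 16; offset *= 2)
{
    tmpv = __shfl_up(val, (unsigned)offset);          // value of lane (lidx - offset)
    if (lidx >= offset)
        val += tmpv;
}
tmpv = __shfl(val, 31); // value of the last lane: if it is at least 1, some lane found
                        // a minimum edge; otherwise the search has to continue
```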
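To make the data flow of the shuffle loop easier to follow, here is a host-side C++ emulation of one warp. It is only a sketch: the array index plays the role of the lane number, and the helper `first_found_lane`, which picks the first lane whose flag is set, is an assumed use of the scan that the article does not spell out.

```cpp
#include <array>
#include <cassert>

constexpr int kWarp = 32;

// After the call, val[lane] holds the sum of the inputs of lanes 0..lane
// (an inclusive prefix sum), which is what the shuffle loop computes.
void warp_inclusive_scan(std::array<unsigned, kWarp>& val) {
    for (int offset = 1; offset <= 16; offset *= 2) {
        std::array<unsigned, kWarp> tmp = val;      // all lanes read before any lane writes
        for (int lane = offset; lane < kWarp; ++lane)
            val[lane] += tmp[lane - offset];        // value fetched from lane - offset
    }
}

// Returns the first lane whose flag is 1, or -1 if the total in lane 31 is zero.
int first_found_lane(const std::array<unsigned, kWarp>& flags) {
    std::array<unsigned, kWarp> scan = flags;
    warp_inclusive_scan(scan);
    if (scan[kWarp - 1] == 0) return -1;            // nothing found: examine the next 32
    for (int lane = 0; lane < kWarp; ++lane)
        if (flags[lane] == 1 && scan[lane] == 1) return lane;
    return -1;
}
```

Five doubling steps (offsets 1, 2, 4, 8, 16) are enough because 2^5 = 32 covers the whole warp.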
As a result of this step, the following information is recorded for each segment: the index in array A of the vertex that belongs to the minimum-weight edge, and the weight of the edge itself. If nothing was found, a value such as N + 2 can be written as the vertex number.
The second step is needed to reduce the selected information, namely to select the minimum-weight edge for each tree. It is required because segments belonging to the same tree are scanned in parallel and independently, and a minimum-weight edge is selected for each segment separately. In this step one warp can reduce the information for each tree (over several segments), and shfl instructions can be used for the reduction. After this step it is known which tree each tree is connected to by its minimum edge (if one exists). To record this information, two more auxiliary arrays are introduced: one stores the number of the tree that the minimum edge leads to, and the other stores the number of the vertex in the original graph that serves as the root of the vertices in the tree. The result of this step is illustrated below:
Working with indices requires two more arrays, which convert original indices to new ones and recover an original index from a new one. These index-translation tables are updated at every iteration of the algorithm. The table that gives the new index for an original index has size N, the number of vertices in the graph. The table that gives the original index for a new index shrinks at every iteration and has a size equal to the number of trees at the current iteration (at the first iteration this table also has size N).
2. Removing cycles.
This procedure is needed to remove cycles between two trees. Such a situation arises when the minimum edge of tree N1 leads to tree N2 and the minimum edge of tree N2 leads to tree N1. In the figure above there is a cycle only between the two trees numbered 2 and 4. Since the number of trees decreases at every iteration, we choose the smaller of the two tree numbers forming the cycle. In this case 2 will point to itself and 4 will keep pointing to 2. Such cycles can be detected and eliminated in favor of the smaller number with the following check:
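```cuda
unsigned i = blockIdx.x * blockDim.x + threadIdx.x; // tree number handled by this thread
unsigned local_f = F[i];                            // tree that i points to
if (F[local_f] == i)                                // i and local_f point at each other
{
    if (i < local_f)
    {
        F[i] = i;                                   // the smaller number becomes its own root
        // . . . . . . .
    }
}
```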
This procedure can run in parallel, because every vertex can be processed independently and the writes to the new array of cycle-free vertex numbers do not overlap.
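A sequential host-side version of the same check, written as a sketch, makes it easy to try on a small array. The function name `remove_cycles` is hypothetical, and writing into a separate output array is an assumption made here so that every iteration reads only the unmodified input.

```cpp
#include <cassert>
#include <vector>

// F[i] is the tree that tree i points to through its minimum edge
// (F[i] == i if tree i has no outgoing edge). Returns a copy in which every
// mutual pair i <-> j is broken by making the smaller number its own root.
std::vector<unsigned> remove_cycles(const std::vector<unsigned>& F) {
    std::vector<unsigned> out(F);
    for (unsigned i = 0; i < F.size(); ++i) {   // one independent unit of work per tree
        unsigned local_f = F[i];
        assert(local_f < F.size());
        if (F[local_f] == i && i < local_f)
            out[i] = i;
    }
    return out;
}
```

With F = {1, 2, 4, 2, 2}, trees 2 and 4 point at each other, so the result is {1, 2, 2, 2, 2}: tree 2 becomes a root and tree 4 keeps pointing to 2.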
3. Merging trees.
This procedure merges trees into larger ones. The procedure for removing cycles between two trees is essentially preprocessing for this one; it prevents infinite looping when trees are merged. Merging trees is a process of choosing a new root by changing links. For example, if tree 0 points to tree 1 and tree 1 points to tree 3, the link of tree 0 can be changed from tree 1 to tree 3. Such a link change is worth making if it does not create a cycle between two trees. For the example above, after removing cycles and merging trees only one tree, number 2, remains. The merging process can be pictured like this:
The structure of the graph and the way it is processed are such that this procedure never loops and can also be executed in parallel.
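The link-changing idea (often called pointer jumping) can be sketched on the host as follows. This is an illustration under assumptions rather than the article's kernel: the name `merge_trees` is hypothetical, and each pass writes into a fresh array so that all reads see the links of the previous pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeatedly replaces every link F[i] by F[F[i]] until nothing changes, so that
// each tree ends up pointing directly at the root of its group.
// Assumes F has no cycles other than self-loops at the roots (F[r] == r).
// Returns the number of passes that changed at least one link.
unsigned merge_trees(std::vector<unsigned>& F) {
    unsigned passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<unsigned> next(F.size());
        for (std::size_t i = 0; i < F.size(); ++i) {  // independent per i
            assert(F[i] < F.size());
            next[i] = F[F[i]];
            if (next[i] != F[i]) changed = true;
        }
        F.swap(next);
        if (changed) ++passes;
    }
    return passes;
}
```

For the chain 0 -> 1 -> 3 with root 3, a single pass redirects tree 0 from 1 to 3. Each pass halves the distance to the root, so long chains collapse in a logarithmic number of passes.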
4. Renumbering vertices (trees).
After the merge procedure, the resulting trees must be renumbered so that their numbers run from 0 to P. By construction, new numbers must be given to the array elements that satisfy F[i] == i (in the example above only the element with index 2 satisfies this condition). Thus, using atomic operations, the whole array can be marked with new values 1 ... (P + 1). Then the tables for obtaining the original index from a new index and the new index from an original index are filled in:
Working with these tables is described in the minimum-edge procedure. The next iteration cannot run correctly without updating these tables. All the operations described are executed in parallel on the GPU.
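A sequential sketch of the renumbering and of filling both tables is shown below. It rests on assumptions: the names `IndexTables` and `renumber` are hypothetical, a plain counter stands in for the atomic increment, and numbers start at 0 rather than 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct IndexTables {               // hypothetical names for the two translation tables
    std::vector<unsigned> new_of;  // new_of[i]: new tree number of old tree i
    std::vector<unsigned> old_of;  // old_of[p]: old number of the root that became tree p
};

// F must already be flattened: F[i] is the root of i and F[root] == root.
IndexTables renumber(const std::vector<unsigned>& F) {
    IndexTables t;
    t.new_of.resize(F.size());
    for (unsigned i = 0; i < F.size(); ++i) {    // roots receive consecutive numbers
        assert(F[i] < F.size());
        if (F[i] == i) {
            t.new_of[i] = static_cast<unsigned>(t.old_of.size());
            t.old_of.push_back(i);
        }
    }
    for (unsigned i = 0; i < F.size(); ++i)      // every tree inherits its root's number
        t.new_of[i] = t.new_of[F[i]];
    return t;
}
```

The table `old_of` shrinks to the number of surviving trees, while `new_of` keeps one entry per tree of the current iteration.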
To summarize briefly: all 4 procedures are executed in parallel on the graphics accelerator. The work is done on one-dimensional arrays. The only difficulty is that all of these procedures involve indirect indexing. To reduce cache misses in such array processing, the graph transformations described at the beginning were used. Unfortunately, the loss due to indirect indexing cannot be reduced for every graph. As shown later, with this approach RMAT graphs do not reach very high performance. Finding the minimum edge takes up to 80% of the running time of the whole algorithm, and everything else accounts for the remaining 20%. This is because the merging, cycle-removal, and renumbering procedures work on arrays whose length constantly decreases from iteration to iteration. For the graphs considered, about 7 to 8 iterations are needed, which means that already after the first step the number of vertices being processed becomes much smaller than N / 2. The main minimum-edge procedure, in contrast, works on the vertex array A and the weight array W (although only certain elements are selected).
In addition to the graph storage, several more arrays of length N were used:
- the array of lower values and the array of upper values, used to work with segments of array A;
- the table array that gives the original index for a new index;
- the table array that gives the new index for an original index;
- the array of vertex numbers and the corresponding array of weights, used in the second step of the minimum-edge procedure;
- an auxiliary array that lets the first step of the procedure find the minimum edge for the tree to which a given vertex or segment belongs.
Hybrid implementation of the minimum-edge procedure.
The algorithm described above ends up performing reasonably well on a single GPU. The solution is also organized so that this procedure can be parallelized on the CPU. Of course, this can be done only in shared memory; the OpenMP standard was used for this, and data was transferred between the CPU and the GPU over the PCIe bus. If we picture the execution of the procedures on a timeline for one iteration, the picture with a single GPU looks roughly like this:
Initially all graph data is stored on both the CPU and the GPU. For the CPU to be able to compute, information about the segments moved during tree merging has to be sent to it. Likewise, for the GPU to continue the iteration of the algorithm, the computed data has to be returned. It is logical to use asynchronous copies between host and accelerator:
The algorithm on the CPU repeats the one used on the GPU; only OpenMP is used to parallelize the loops [11]. As expected, the CPU does not compute as fast as the GPU, and the copy overhead also interferes. For the CPU to finish its part in time, the data has to be split in a ratio of 1:5, that is, no more than 20% to 25% is handed to the CPU and the rest is computed on the GPU. It is not beneficial to compute the remaining procedures in both places, because they take little time and the overhead and the slower CPU would only increase the running time of the algorithm. The copy speed between the CPU and the GPU is also very important. The test platform supported PCIe 3.0, which allowed about 12 GB/s to be reached.
To date, the amounts of RAM on the GPU and the CPU differ significantly, in favor of the latter. The test platform had 6 GB of GDDR5 on the GPU, while the CPU had 48 GB. The GPU memory limit prevents large graphs from being processed. Here the CPU and Unified Memory technology [12] help, which lets the GPU access CPU memory. Since the graph information is needed only in the minimum-edge procedure, large graphs can be handled as follows: first place all auxiliary arrays in GPU memory, then place part of the graph arrays (the neighbor array A, array X, and the weight array W) in GPU memory, and put what did not fit in CPU memory. During computation the data can then be split so that the part that did not fit on the GPU is processed on the CPU, and the GPU makes minimal use of accesses to CPU memory (access from the graphics accelerator to CPU memory goes over the PCIe bus at no more than 15 GB/s). Since the proportion in which the data is split is known in advance, it is enough to introduce a constant indicating the point at which the arrays are split; a single check in the GPU algorithm then determines which memory, GPU or CPU, to address. The location of these arrays in memory can be pictured roughly like this:
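The boundary-constant idea can be illustrated with a small host-side C++ sketch. Everything here is hypothetical: two ordinary vectors stand in for GPU memory and CPU memory, and the names `SplitArray` and `split` are not from the article; no CUDA memory API is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view over an array whose first `boundary` elements live in one
// buffer (standing in for GPU memory) and the rest in another (CPU memory).
struct SplitArray {
    std::vector<unsigned> device_part;  // elements [0, boundary)
    std::vector<unsigned> host_part;    // elements [boundary, size)
    std::size_t boundary = 0;

    unsigned at(std::size_t i) const {  // the single check described in the text
        assert(i < boundary + host_part.size());
        return i < boundary ? device_part[i] : host_part[i - boundary];
    }
};

// Splits `data` so that at most `device_capacity` elements go to the device part.
SplitArray split(const std::vector<unsigned>& data, std::size_t device_capacity) {
    SplitArray s;
    s.boundary = data.size() < device_capacity ? data.size() : device_capacity;
    auto mid = data.begin() + static_cast<std::ptrdiff_t>(s.boundary);
    s.device_part.assign(data.begin(), mid);
    s.host_part.assign(mid, data.end());
    return s;
}
```

Because the boundary is fixed before the computation starts, the check costs one comparison per access and needs no lookup table.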
Thus graphs that do not fit on the GPU even with the compression described above can still be processed, but at a lower speed, because PCIe throughput is very limited.
Test results
Testing was carried out on an NVidia GTX Titan GPU, which has 14 SMXs with 192 CUDA cores each (2688 in total), and an Intel Xeon E5 v1660 processor with 6 cores (12 hardware threads) at 3.7 GHz. The graphs used for testing are described above. Only some of their characteristics are given here:
Scale (2^N) | Number of vertices | Number of arcs (2 * M), RMAT | Number of arcs (2 * M), SSCA2 | Graph size, GB |
---|---|---|---|---|
16 | 65,536 | 2,097,152 | ~2,100,000 | ~0.023 |
21 | 2,097,152 | 67,108,864 | ~67,200,000 | ~0.760 |
24 | 16,777,216 | 536,870,912 | ~537,000,000 | ~6.3 |
25 | 33,554,432 | 1,073,741,824 | ~1,075,000,000 | ~12.5 |
26 | 67,108,864 | 2,147,483,648 | ~2,150,000,000 | ~25.2 |
27 | 134,217,728 | 4,294,967,296 | ~4,300,000,000 | ~51.2 |
The table shows that the scale-16 graph is quite small (about 25 MB) and, even without transformations, fits easily in the cache of a single modern Intel Xeon processor. Since graph weights take up 2/3 of the total, only about 8 MB actually has to be processed, which is only about 5 times the size of the GPU's L2 cache. Large graphs, however, require a sufficient amount of memory, and even the scale-24 graph no longer fits in the memory of the tested GPU without compression. Given the graph representation, scale 26 is the last one at which the number of edges fits in an unsigned int, which limits further scaling of the algorithm. This limitation is easily bypassed by widening the data type. It does not seem very relevant so far, because processing 32-bit values (unsigned int) is several times faster than processing 64-bit values (unsigned long long), and the amount of GPU memory is still quite small. Performance is measured as the number of edges processed per second (traversed edges per second, TEPS).
Compilation was done with NVidia CUDA Toolkit 7.0 with options -O3 -arch=sm_35 and with Intel Composer 2015 with option -O3. The maximum performance of the implemented algorithm can be seen in the chart below:
The chart shows that with all optimizations SSCA2 graphs perform well: the larger the graph, the better the performance. This growth continues as long as all data fits in GPU memory. At scales 25 and 26 the Unified Memory mechanism was used, which made it possible to obtain a result, although at a lower speed (but, as shown below, still faster than on the CPU alone). If the computation were run on a Tesla K40 with 12 GB of memory and ECC disabled, together with an Intel Xeon E5 V2/V3 processor, it might be possible to reach about 3000 MTEPS on the scale-25 SSCA2 graph and to try processing not only the scale-26 graph but also scale 27. Such an experiment was not carried out for RMAT graphs because of their complex structure and the poor fit of the algorithm.
Performance comparison of different algorithms
This problem was solved as part of the competition at the GraphHPC 2015 conference. I would like to compare with the program written by Alexander Daryin, who, according to the authors, took first place in this competition.
Since the overall table contains results on the test platform provided by the authors, it is appropriate to present charts for the CPU and the GPU on the platform described here (GTX Titan + Xeon E5 v2). Below are the results for two graphs:
The charts show that the algorithm described in this article is better optimized for SSCA2 graphs, while the algorithm implemented by Alexander Daryin is well optimized for RMAT graphs. In this case it is impossible to say unambiguously which implementation is best, because each has its own strengths and weaknesses. The criterion by which the algorithms should be evaluated is also unclear. If we talk about processing large graphs, the fact that the algorithm can process graphs of scale 24 to 26 is a big plus. If we talk about the average processing speed for graphs of any size, it is unclear which average to consider. Only one thing is clear: one algorithm handles SSCA2 graphs well and the other handles RMAT graphs well. If the two implementations were combined, the average performance would be about 3200 MTEPS at scale 23. A presentation describing some of the optimizations in Alexander Daryin's algorithm is also available.
Among foreign papers, the following can be singled out:
1) [13] Some ideas from this paper were used in implementing the described algorithm. The results obtained by its authors cannot be compared directly, because their testing was done on the old NVidia Tesla S1070. The performance achieved by the authors on the GPU ranges from 18 to 36 MTEPS. Published in 2009 and 2013.
2) [14] An implementation of Prim's algorithm on a GPU.
3) [15] An implementation of k-NN Borůvka on a GPU.
There are also several parallel implementations for the CPU. However, I was unable to find high performance figures in foreign papers. Perhaps one of the readers can tell me whether I have missed something. It is also worth noting that there are almost no publications on this topic in Russia (with the exception of Vadim Zaitsev), which is very sad.
About the competition, in place of a conclusion
I would like to say a few words about the past competition for the best MST implementation mentioned above. These notes are optional reading and express my personal opinion. Others may well think quite differently.
For all participants, the basis for solving the problem turned out to be the same Borůvka algorithm. The task was thereby somewhat simplified, since the other algorithms (Kruskal's and Prim's) are computationally very complex and are slow or map poorly onto parallel architectures such as GPUs. From the name of the conference it logically follows that one should write an algorithm that handles large graphs well, such as graphs of scale 22 and above that occupy 1 GB or more in memory. Unfortunately, for some reason the authors did not take this into account, and the whole competition came down to writing an algorithm that works well in cache, because the test platform contained 2 CPUs with a total cache of 50 MB (graphs up to scale 17 take <= 50 MB). Only one participant showed an acceptable result: Vadim Zaitsev obtained a fairly high average on 2 CPUs on the scale-22 graph. But, as it turned out during the conference, this participant has been working on the MST problem for quite a long time. Most likely, the speed of the other implemented algorithms on large graphs would not be high and could differ greatly (for the worse) from the figures published on the contest website. It is also worth noting that the graph structures are very different, and it is not entirely clear why the average performance, and specifically the arithmetic mean, should be considered at all. The size of the processed graph was not taken into account. Another unpleasant "feature" of the provided system (containing 2x Intel Xeon E5-2690 and NVidia Tesla K20x) was that PCIe 3.0 did not work, although the GPU supports it and it is present on the server board. As a result, the speed over PCIe 2.0 was almost 3 times lower, and it was not possible to make use of the two fairly fast processors (though no faster than the Xeon E5).
Note that solving such problems on a GPU is not easy, because graph processing is hard to parallelize on a GPU due to its architectural features. Perhaps these contests should contribute to the development of specialists in writing algorithms that use unstructured grids for graphics processors. However, judging by this year's and last year's results, the use of GPUs in such tasks is, unfortunately, very limited.
References:
[1] en.wikipedia.org/wiki/Sparse_matrix
[2] www.dislab.org/GraphHPC-2014/rmat-siam04.pdf
[3] www.dislab.org/GraphHPC-2015/SSCA2-TechReport.pdf
[4] www.nvidia.ru/object/tesla-supercomputer-workstations-ru.html
[5] www.nvidia.ru/object/geforce-gtx-titan-x-ru.html#pdpContent=2
[6] www.amd.com/ru-ru/products/graphics/workstation/firepro-3d/9100
[7] en.wikipedia.org/wiki/Bor%C5%AFvka's_algorithm
[8] www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
[9] habrahabr.ru/company/epam_systems/blog/247805
[10] on-demand.gputechconf.com/gtc/2013/presentations/S3174-Kepler-Shuffle-Tips-Tricks.pdf
[11] openmp.org/wp
[12] devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6
[13] stanford.edu/~vibhavv/papers/old/Vibhav09Fast.pdf
[14] ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5678261&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5678261
[15] link.springer.com/chapter/10.1007%2F978-3-642-31125-3_6