ãªãããã
èšäºã®ã¿ã€ãã«ãé©åããŸããã§ãã-ãããã®çµæã¯ã Graph500ã®è©äŸ¡ã«ãããšãã®ããã«èŠãªãããŸãã ãŸãã調æ»äžã«å®éšçãªæã¡äžããè¡ãããã«æäŸããããªãœãŒã¹ã«ã€ããŠãIBMãšRSCã«æè¬ããŸãã
ã¯ããã«
å¹ åªå æ€çŽ¢ïŒBFSïŒã¯ãäž»èŠãªã°ã©ããã©ããŒãµã«ã¢ã«ãŽãªãºã ã®1ã€ã§ãããå€ãã®é«ã¬ãã«ã°ã©ãåæã¢ã«ãŽãªãºã ã®åºæ¬ã§ãã ã°ã©ãå šäœã®æ€çŽ¢ã¯ãäžèŠåãªã¡ã¢ãªã¢ã¯ã»ã¹ãšäžèŠåãªããŒã¿äŸåæ§ãæã€ã¿ã¹ã¯ã§ãããæ¢åã®ãã¹ãŠã®ã¢ãŒããã¯ãã£ãžã®äžŠååãå€§å¹ ã«è€éã«ããŸãã ãã®èšäºã§ã¯ãããŸããŸãªã¢ãŒããã¯ãã£ïŒIntel x86ãIBM Power8 +ãIntel KNLãNVidia GPUïŒã§å€§èŠæš¡ãªã°ã©ããåŠçããããã®å¹ åªå æ€çŽ¢ã¢ã«ãŽãªãºã ïŒ Graph500è©äŸ¡ã®ã¡ã€ã³ãã¹ãïŒã®å®è£ ã«ã€ããŠèª¬æããŸãã å ±æã¡ã¢ãªã§ã®ã¢ã«ãŽãªãºã ã®å®è£ ã®ç¹åŸŽãããã³ã°ã©ãå€æã«ããããã¹ãŠã®ã·ã³ã°ã«ããŒãã¬ãŒãã£ã³ã°ã·ã¹ãã Graph500ããã³GreenGraph500ã§ãã®ã¢ã«ãŽãªãºã ã§èšé²çãªããã©ãŒãã³ã¹ãšãšãã«ã®ãŒå¹çã®ææšãéæã§ããããã«ãªããŸã ã
æè¿ãã°ã©ãã£ãã¯ã¢ã¯ã»ã©ã¬ãŒã¿ïŒGPUïŒã¯ãéã°ã©ãã£ã«ã«ã³ã³ãã¥ãŒãã£ã³ã°ã§ãŸããŸãéèŠãªåœ¹å²ãæãããŠããŸãã ãããã®äœ¿çšã®å¿ èŠæ§ã¯ããããã®æ¯èŒçé«ãçç£æ§ãšäœã³ã¹ãã«ãããã®ã§ãã ãåç¥ã®ããã«ãGPUããã³äžå€®åŠçè£ çœ®ïŒCPUïŒã§ã¯ã䞊ååŠçãäœããã®æ¹æ³ã§ç°¡åã«åºå¥ãããæ§é çãªéåžžã®ã°ãªããã§åé¡ãååã«è§£æ±ºãããŸãã ãã ãã倧容éãå¿ èŠãšããéæ§é åã°ãªãããŸãã¯ããŒã¿ã䜿çšããã¿ã¹ã¯ããããŸãã ãã®ãããªã¿ã¹ã¯ã®äŸã¯æ¬¡ã®ãšããã§ããåäžã®æçãœãŒã¹ãã¹åé¡ïŒ SSSP ïŒ-éã¿ä»ãã°ã©ãå ã®ç¹å®ã®é ç¹ããä»ã®ãã¹ãŠãžã®æçãã¹ãèŠã€ããã¿ã¹ã¯ãå¹ åªå æ€çŽ¢ã¿ã¹ã¯ïŒBFS [1]ïŒ-ç¡åã°ã©ãã®å¹ åªå æ¢çŽ¢ã¿ã¹ã¯ãæå°ã¹ããã³ã°ããªãŒïŒäŸãã°ãMSTã å€éšããã³ç§ã®å®è£ ïŒ-匷ãé¢é£ããã³ã³ããŒãã³ããªã©ãèŠã€ããã¿ã¹ã¯ã
ãããã®ã¿ã¹ã¯ã¯ãå€ãã®ã°ã©ãã¢ã«ãŽãªãºã ã®åºæ¬ã§ãã çŸåšãBFSããã³SSSPã¢ã«ãŽãªãºã ã¯ãGraph500ããã³GreenGraph500ã®è©äŸ¡ã§ã³ã³ãã¥ãŒã¿ãŒãã©ã³ã¯ä»ãããããã«äœ¿çšãããŠããŸãã BFSïŒå¹ åªå æ€çŽ¢ãŸãã¯å¹ åªå æ€çŽ¢ïŒã¢ã«ãŽãªãºã ã¯ãæãéèŠãªã°ã©ãåæã¢ã«ãŽãªãºã ã®1ã€ã§ãã ç¹å®ã®ã°ã©ãå ã®ããŒãéã®æ¥ç¶ã®ããããã£ãååŸããããã«äœ¿çšãããŸãã åºæ¬çã«ãBFSã¯ãªã³ã¯ãšããŠäœ¿çšãããŸããããšãã°ãæ¥ç¶ã³ã³ããŒãã³ãã®æ€çŽ¢[2]ãæ倧ãããŒã®æ€çŽ¢[3]ãäžå¿ã³ã³ããŒãã³ãã®æ€çŽ¢ïŒäžéäžå¿æ§ïŒ[4ã5]ãã¯ã©ã¹ã¿ãªã³ã°[6]ãªã©ã®ã¢ã«ãŽãªãºã ã§äœ¿çšãããŸãã
BFSã¢ã«ãŽãªãºã ã®ç·åœ¢èšç®ã®è€éãã¯OïŒn + mïŒã§ããããã§ãnã¯é ç¹ã®æ°ãmã¯ã°ã©ãã®ãšããžã®æ°ã§ãã ãã®èšç®ã®è€éãã¯ãé 次å®è£ ã«æé©ã§ãã ãã ããé次å®è£ ïŒããšãã°ã ãã€ã¯ã¹ãã©ã¢ã«ãŽãªãºã ãäœ¿çš ïŒã«ã¯ããŒã¿ã®äŸåé¢ä¿ãããããã䞊ååã劚ããããããã®èšç®ã®è€éãã®è©äŸ¡ã¯äžŠåå®è£ ã«ã¯é©çšãããŸããã ãŸãããã®ã¢ã«ãŽãªãºã ã®ããã©ãŒãã³ã¹ã¯ãç¹å®ã®ã¢ãŒããã¯ãã£ã®ã¡ã¢ãªããã©ãŒãã³ã¹ã«ãã£ãŠå¶éãããŸãã ãããã£ãŠããã¹ãŠã®ã¬ãã«ã®ã¡ã¢ãªã䜿çšããŠäœæ¥ãæ¹åããããšãç®çãšããæé©åãæãéèŠã§ãã
æ¢åã®ãœãªã¥ãŒã·ã§ã³ãšGraph500ã¬ãŒãã£ã³ã°ã®æŠèŠ
Graph500ããã³GreenGraph500
Graph500ã¬ãŒãã£ã³ã°ã¯ã Top500ã¬ãŒãã£ã³ã°ã®ä»£æ¿ãšããŠäœæãããŸããã ãã®è©äŸ¡ã¯ãåŸè ãšã¯ç°ãªããäžèŠåãªã¡ã¢ãªã¢ã¯ã»ã¹ã䜿çšããã¢ããªã±ãŒã·ã§ã³ã§ã³ã³ãã¥ãŒã¿ãŒãã©ã³ã¯ä»ãããããã«äœ¿çšãããŸãã Graph500ã¬ãŒãã£ã³ã°ã®ãã¹ã察象ã¢ããªã±ãŒã·ã§ã³ã§ã¯ãã¡ã¢ãªãšéä¿¡ã®åž¯åå¹ ãæãéèŠãªåœ¹å²ãæãããŸãã GreenGraph500è©äŸ¡ã¯Green500è©äŸ¡ã®ä»£æ¿ã§ããã Graph500ã«å ããŠäœ¿çšãããŸãã
Graph500ã¯ã¡ããªãã¯-ã°ã©ãã®åŠçããããšããžã®1ç§ãããã®æ°ïŒTEPS-èµ°æ»ããããšããž/ç§ïŒã䜿çšããGreenGraph500ã¯ã¡ããªãã¯-ã°ã©ãã®åŠçããããšããžã®1ç§ãããã®ã¯ããæ°ã䜿çšããŸãã ãããã£ãŠãæåã®è©äŸ¡ã§ã¯ã³ã³ãã¥ãŒã¿ãŒã®é床ãèšç®é床ã§ã©ã³ã¯ä»ãããã2çªç®ã®è©äŸ¡ã§ã¯ãšãã«ã®ãŒå¹çãã©ã³ã¯ä»ããããŸãã ãããã®è©äŸ¡ã¯6ãæããšã«æŽæ°ãããŸãã
æ¢åã®ãœãªã¥ãŒã·ã§ã³
å¹ åªå ã®æ€çŽ¢ã¢ã«ãŽãªãºã ã¯ã50幎以äžåã«äœãããŸããã ãŸããããŸããŸãªããã€ã¹ã§ã®å¹æçãªäžŠåå®è£ ã®ããã®ç 究ãé²è¡äžã§ãã ãã®ã¢ã«ãŽãªãºã ã¯ãã³ã³ãã¥ãŒã¿ãŒã®ã¡ã¢ãªããã³éä¿¡ç°å¢ã§ã®äœæ¥ãã©ãã ãããŸãæ§æãããŠãããã瀺ããŸãã x86ã·ã¹ãã [7-11]ããã³GPU [12-13]ã§ãã®ã¢ã«ãŽãªãºã ã䞊ååããäœæ¥ã¯æ°å€ããããŸãã ãŸããå®è£ ãããã¢ã«ãŽãªãºã ã®å®è£ ã®è©³çŽ°ãªçµæã¯ãGreen500ããã³GreenGraph500ã®è©äŸ¡ã§ç¢ºèªã§ããŸãã æ®å¿µãªãããå€ãã®å¹æçãªå®è£ ã®ã¢ã«ãŽãªãºã ã¯ãå€åœã®æ å ±æºã§ã¯å ¬éãããŠããŸããã
Graph500è©äŸ¡ã§åäžããŒãã·ã¹ãã ã®ã¿ãéžæããå Žåã次ã®è¡šã«ç€ºãããŠããããã€ãã®ããŒã¿ãååŸãããŸãã ãã®ããŒããŒã§èª¬æããçµæã¯å€ªåã§ããŒã¯ãããŠããŸãã ããŒãã«ã«ã¯ã2 25ãè¶ ããé ç¹ã®æ°ãæã€ã°ã©ããå«ãŸããŠããŸããã åŸãããããŒã¿ãããçŸæç¹ã§ã¯ããã®èšäºã§ææ¡ããããã1ã€ã®ããŒãã®ã¿ã䜿çšããå¹ççãªå®è£ ã¯ãªãããšãæããã§ãã ãã詳现ãªåæã«ã€ããŠã¯ããçµæã®åæãã»ã¯ã·ã§ã³ãåç §ããŠãã ããã
åœ¹è· | ã·ã¹ãã | ãµã€ãº2 N | Gteps | ã¯ãã |
---|---|---|---|---|
50 | GPU NVidia Tesla P100 | 26 | 204 | 175 |
67 | GPU NVidia GTXã¿ã€ã¿ã³ | 25 | 114 | 212 |
76 | 4 x Intel Xeon E7-4890 v2 | 32 | 55.9 | 1153 |
86 | GPU NVidia Tesla P100 | 30 | 41.7 | 235 |
103 | GPU NVidia GTXã¿ã€ã¿ã³ | 25 | 17.2 | 233.8 |
104 | Intel Xeon E5 2699 v3 | 30 | 16.3 | 145 |
106 | 4åºã®AMD Radeon R9 Nano GPU | 25 | 15.8 | |
112 | IBM POWER8 + | 30 | 13.2 | 200 |
ã°ã©ãä¿å圢åŒ
éæåRMATã°ã©ãã¯ãBFSã¢ã«ãŽãªãºã ã®ããã©ãŒãã³ã¹ãè©äŸ¡ããããã«äœ¿çšãããŸãã RMATã°ã©ãã¯ããœãŒã·ã£ã«ãããã¯ãŒã¯ãã€ã³ã¿ãŒãããããã®å®éã®ã°ã©ããããŸãã¢ãã«åããŠããŸãã ãã®å Žåãé ç¹16ã®å¹³åé£çµåºŠãæã€RMATã°ã©ããæ€èšããé ç¹ã®æ°ã¯2ã®ã¹ãä¹ã§ãã ãã®ãããªRMATã°ã©ãã«ã¯ã1ã€ã®å€§ããªæ¥ç¶ã³ã³ããŒãã³ããšãå€æ°ã®å°ããªæ¥ç¶ã³ã³ããŒãã³ããŸãã¯ã¶ãäžããé ç¹ããããŸãã ã³ã³ããŒãã³ãã匷åã«æ¥ç¶ãããŠãããããã°ã©ãããã£ãã·ã¥ã¡ã¢ãªã«åãŸããµãã°ã©ãã«åå²ããããšã¯ã§ããŸããã
ã°ã©ããäœæããã«ã¯ãGraph500è©äŸ¡ã®éçºè ã«ãã£ãŠæäŸããããžã§ãã¬ãŒã¿ãŒã䜿çšãããŸãã ãã®ãžã§ãã¬ãŒã¿ãŒã¯ãRââMAT圢åŒã§ç¡åã°ã©ããäœæããåºåã¯ã°ã©ããšããžã®ã»ãããšããŠè¡šç€ºãããŸãã ãã®ãããªåœ¢åŒã¯ãåé ç¹ã®éçŽæ å ±ãã€ãŸãã©ã®é ç¹ãäžããããé ç¹ã®è¿åã§ããããå¿ èŠãªãããã°ã©ãã¢ã«ãŽãªãºã ã®å¹ççãªäžŠåå®è£ ã«ã¯ããŸã䟿å©ã§ã¯ãããŸããã ãã®ãã¬ãŒã³ããŒã·ã§ã³ã«äŸ¿å©ãªåœ¢åŒã¯ãCSRïŒCompressed Sparse RowsïŒãšåŒã°ããŸãã
ãã®åœ¢åŒã¯ãçè¡åãšã°ã©ãã®ä¿åã«åºã䜿çšãããŠããŸãã Nåã®é ç¹ãšMåã®ãšããžãæã€ç¡åã°ã©ãã®å ŽåãXïŒé£æ¥ããé ç¹ãžã®ãã€ã³ã¿ãŒã®é åïŒãšAïŒé£æ¥ããé ç¹ã®ãªã¹ãã®é åïŒã®2ã€ã®é åãå¿ èŠã§ãã é åXã®ãµã€ãºã¯N + 1ã§ãé åAã¯2 * Mã§ããããã¯ãä»»æã®é ç¹ãã¢ã®ç¡åã°ã©ãã§ã¯ãçŽæ¥ã¢ãŒã¯ãšéã¢ãŒã¯ãä¿åããå¿ èŠãããããã§ãã é åXã¯ãé åAã«ããé£æ¥ãªã¹ãã®å é ãšæ«å°Ÿãæ ŒçŽããŸããã€ãŸããé ç¹Jã®é£æ¥ãªã¹ãå šäœã¯ãã€ã³ããã¯ã¹X [J]ããX [J + 1]ãŸã§ã®é åAã«ãããŸãã 説æã®ããã«ãäžã®å³ã¯ãå·ŠåŽã«é£æ¥è¡åã䜿çšããŠèšè¿°ããã4ã€ã®é ç¹ã®ã°ã©ããšãå³åŽã«CSR圢åŒã§ç€ºãããŠããŸãã
ã°ã©ããCSR圢åŒã«å€æããåŸãã³ã³ãã¥ãŒãã£ã³ã°ããã€ã¹ã®ãã£ãã·ã¥ãšã¡ã¢ãªã®å¹çãåäžãããã«ã¯ãå
¥åã°ã©ãã§ããã«äœæ¥ãè¡ãå¿
èŠããããŸãã 以äžã«èª¬æããå€æãå®è¡ããåŸãã°ã©ãã¯åãCSR圢åŒã®ãŸãŸã§ãããå®è¡ãããå€æã«é¢é£ããããã€ãã®ããããã£ãååŸããŸãã
å°å ¥ãããå€æã«ãããCSR圢åŒã®ã»ãšãã©ã®ã°ã©ããã©ããŒãµã«ã¢ã«ãŽãªãºã ã«æé©ãªåœ¢åŒã§ã°ã©ããæ§ç¯ã§ããŸãã ã°ã©ãã«æ°ããé ç¹ãè¿œå ããŠãããã¹ãŠã®å€æãæ°ãã«å®è¡ãããããã§ã¯ãããŸãããã«ãŒã«ã«åŸã£ãŠé ç¹ãè¿œå ããã°ãé ç¹ã®äžè¬çãªé åºã«éåããããšã¯ãããŸããã
é ç¹ã®ãªã¹ããããŒã«ã«ã§äžŠã¹æ¿ãã
åé ç¹ã«ã€ããŠãé£æ¥ãªã¹ããå¢ãããŠãœãŒãããŸãã 䞊ã¹æ¿ãããŒãšããŠã䞊ã¹æ¿ããããåé ç¹ã®è¿åã®æ°ã䜿çšããŸãã ãã®äžŠã¹æ¿ããå®è¡ããåŸãåé ç¹ã§è¿åã®ãªã¹ããã¯ããŒã«ããããšã«ãããæåã«æãéãé ç¹ïŒå€æ°ã®è¿åãæã€é ç¹ïŒãåŠçããŸãã ãã®ãœãŒãã¯ãé ç¹ããšã«ç¬ç«ããŠäžŠè¡ããŠå®è¡ã§ããŸãã ãã®ãœãŒããå®è¡ããåŸãã¡ã¢ãªå ã®ã°ã©ãã®é ç¹ã®æ°ã¯å€ãããŸããã
é ç¹ãªã¹ãã®ã°ããŒãã«ãœãŒã
ã°ã©ãã®ãã¹ãŠã®é ç¹ã®ãªã¹ãã«ã€ããŠã¯ãæé ã§äžŠã¹æ¿ããŸãã ããŒãšããŠãåé ç¹ã®è¿åæ°ã䜿çšããŸãã ããŒã«ã«ãœãŒããšã¯ç°ãªãããªã¹ãå ã®é ç¹ã®äœçœ®ãå€ããããããã®ãœãŒãã§ã¯çµæã®é ç¹ã®çªå·ãä»ãçŽãå¿ èŠããããŸãã ãœãŒãæé ã¯è€é床OïŒN * logïŒNïŒïŒã§ãããé çªã«å®è¡ãããŸãããé ç¹ã®çªå·ãå€æŽããæé ã¯äžŠè¡ããŠå®è¡ã§ããã¡ã¢ãªã®ããã»ã¯ã·ã§ã³ãå¥ã®ã»ã¯ã·ã§ã³ã«ã³ããŒããã®ã«ããã£ãæéãšé床ãåçã§ãã
ã°ã©ãã®ãã¹ãŠã®é ç¹ã®çªå·ãä»ãçŽã
æãæ¥ç¶ãããé ç¹ãæãè¿ãçªå·ãæã€ããã«ãã°ã©ãã®é ç¹ã«çªå·ãä»ããŸãã ãã®æé ã¯æ¬¡ã®ããã«æ§æãããŠããŸãã æåã«ãçªå·ãä»ãçŽãããã«ãªã¹ãããæåã®é ç¹ãååŸãããŸãã çªå·0ãååŸããŸãã次ã«ãåé¡ã®é ç¹ãæã€ãã¹ãŠã®é£æ¥ããé ç¹ãçªå·ãä»ãçŽãããã«ãã¥ãŒã«è¿œå ãããŸãã åçªå·ä»ããªã¹ãã®æ¬¡ã®é ç¹ã¯ãçªå·1ãååŸããŸãã ãã®æäœã®çµæãæ¥ç¶ãããåã³ã³ããŒãã³ãã§ãæ倧é ç¹æ°ãšæå°é ç¹æ°ã®å·®ãæå°ã«ãªããã³ã³ãã¥ãŒãã£ã³ã°ããã€ã¹ã®å°ããªãã£ãã·ã¥ãµã€ãºãæ倧éã«æŽ»çšã§ããŸãã
ã¢ã«ãŽãªãºã ã®å®è£
ç¡åã°ã©ãã®å¹ åªå æ¢çŽ¢ã¢ã«ãŽãªãºã ã¯ã次ã®ããã«æ§æãããŠããŸãã å ¥åã§ã¯ãã°ã©ãå ã®æªå®çŸ©ã®é ç¹ã®åæé ç¹ïŒæ€çŽ¢ã®ã«ãŒãé ç¹ïŒãæäŸãããŸãã ã¢ã«ãŽãªãºã ã¯ãã«ãŒãé ç¹ããéå§ããŠãã°ã©ãå ã®åé ç¹ãäœçœ®ããã¬ãã«ã決å®ããå¿ èŠããããŸãã ã¬ãã«ã¯ãã«ãŒãé ç¹ããã«ãŒããšã¯ç°ãªãé ç¹ã«å°éããããã«å æããªããã°ãªããªããšããžã®æå°æ°ãæããŸãã ãŸããã«ãŒããé€ãåé ç¹ã«ã€ããŠã芪ã®é ç¹ã決å®ããå¿ èŠããããŸãã 1ã€ã®é ç¹ã«è€æ°ã®èŠªé ç¹ãå«ããããšãã§ããããããããã®ãããããåçãšããŠåãå ¥ããããŸãã
å¹ åªå æ€çŽ¢ã¢ã«ãŽãªãºã ã«ã¯ããã€ãã®å®è£ ããããŸãã æãå¹æçãªå®è£ ã¯ãã¬ãã«åæã䜿çšããå埩ã°ã©ããã©ããŒãµã«ã§ãã åã¹ãããã¯ãã¬ãã«Jã®æ å ±ãã¬ãã«J + 1ã«è»¢éãããã¢ã«ãŽãªãºã ã®å埩ã§ãã ã·ãŒã±ã³ã·ã£ã«ã¢ã«ãŽãªãºã ã®æ¬äŒŒã³ãŒããããã«ç€ºããŸã ã
䞊åå®è£ ã¯ããã®èšäºã®èè ã«ãã£ãŠææ¡ããããããããŠã³ïŒTDïŒããã³ããã ã¢ããïŒBUïŒããã·ãŒãžã£ã§æ§æããããã€ããªããã¢ã«ãŽãªãºã ã«åºã¥ããŠããŸã[11]ã ãã®ã¢ã«ãŽãªãºã ã®æ¬è³ªã¯æ¬¡ã®ãšããã§ãã TDããã·ãŒãžã£ã䜿çšãããšãã°ã©ãã®é ç¹ãçŽæ¥ã®é åºã§ç§»åã§ããŸããã€ãŸããé ç¹ã䞊ã¹æ¿ããé¢ä¿ãèæ ®ããŸãã 芪åãšããŠã 2çªç®ã®æé BUã䜿çšãããšãé ç¹ãéã®é åºã§ãã€ãã¹ã§ããŸããã€ãŸããé ç¹ã䞊ã¹æ¿ããŠãé¢ä¿ãèæ ®ããŸãã 芪ã®åå«ãšããŠã
TD-BUãã€ããªããã¢ã«ãŽãªãºã ã®é 次å®è£ ãæ€èšããŠãã ããããã®æ¬äŒŒã³ãŒãã以äžã«ç€ºããŸãã é ç¹ã°ã©ããåŠçããã«ã¯ãçŸåšã®ã¬ãã«ã®é ç¹ã®ã»ãã-Q curr ãããã³æ¬¡ã®ã¬ãã«ã®é ç¹ã®ã»ãã-Q nextãå«ã2ã€ã®è¿œå ã®ãã¥ãŒé åãäœæããå¿ èŠããããŸãã ãã¥ãŒå ã®é ç¹ã®ååšãããé«éã«ãã§ãã¯ããã«ã¯ã蚪åããé ç¹ã®é åãå ¥åããå¿ èŠããããŸãã ããããã¢ã«ãŽãªãºã ã®äœæ¥ã®çµæãšããŠãåé ç¹ãã©ã®ã¬ãã«ã«ãããã«ã€ããŠã®æ å ±ãååŸããå¿ èŠãããããããã®é åã¯ã蚪åããããŒã¯ãããé ç¹ã®ã€ã³ãžã±ãŒã¿ãšããŠã䜿çšã§ããŸãã
ã·ãªã¢ã«BFSãã€ããªããã¢ã«ãŽãªãºã ïŒ
void bfs_hybrid(G, N, M, Vstart) { Levels <- (-1); Parents <- (N + 1); Qcurr+=Vstart; CountQ=lvl=1; while (CountQ) { CountQ = 0; vis = 0; inLvl = 0; if (state == TD) foreach Vi in Qcurr foreach Vk in G.Edges(Vi) inLvl++; if (Levels[Vk] == -1) Qnext += Vk; Levels[Vk] = lvl; Parents[Vk] = Vi; vis++; else if (state == BU) foreach Vi in G if (Levels[Vi] == -1) foreach Vk in G.Edges(Vi) inLvl++; if (Levels[Vk] == lvl - 1) Qnext += Vi; Levels[Vi] = lvl; Parents[Vi] = Vk; vis++; break; change_state(Qcurr, Qnext, vis, inLvl, G); Qcurr <- Qnext; CountQ = Qnext.size(); } }
ã¢ã«ãŽãªãºã ã®ã¡ã€ã³ã«ãŒãã¯ãQ currãã¥ãŒå ã®åé ç¹ã®é 次åŠçã§æ§æãããŸãã Q currãã¥ãŒã«é ç¹ãæ®ã£ãŠããªãå Žåãã¢ã«ãŽãªãºã ã¯åæ¢ããå¿çãåä¿¡ãããŸãã
ãã¥ãŒã«ã¯é ç¹ã1ã€ããå«ãŸããŠããªããããã¢ã«ãŽãªãºã ã®æåã®æ®µéã§TDããã·ãŒãžã£ãæ©èœãå§ããŸãã TDããã·ãŒãžã£ã§ã¯ããã¥ãŒQ currããã®åé ç¹V iã«ã€ããŠããã®é ç¹V kã®é£æ¥ãªã¹ãã調ã¹ããŸã 蚪åæžã¿ãšããŠããŒã¯ãããŠããªããã®ããã¥ãŒQã«è¿œå ããŸãã ãŸãããã®ãããªé ç¹V kã¯ãã¹ãŠãçŸåšã®ã¬ãã«ã®çªå·ãšèŠªé ç¹V iãåãåããŸãã Q currãã¥ãŒãããã¹ãŠã®é ç¹ã衚瀺ããåŸã次ã®ç¶æ ãéžæããæé ãéå§ãããŸããããã¯ã次ã®å埩ã®ããã«TDæé ã«æ®ãããæé ãBUã«å€æŽããããšãã§ããŸãã
BUããã·ãŒãžã£ã§ã¯ãQ currãã¥ãŒããã§ã¯ãªãããŸã ããŒã¯ãããŠããªãé ç¹ã調ã¹ãŸãã ãã®æ å ±ã¯ã¬ãã«ã®é åã«å«ãŸããŠããŸãã ãã®ãããªé ç¹V iããŸã ã©ãã«ä»ããããŠããªãå Žåããã®ãã¹ãŠã®è¿åV kãééããV iã®èŠªã§ãããããã®é ç¹ãåã®ã¬ãã«ã«ããå Žåãé ç¹V iã¯ãã¥ãŒQ nextã«å ¥ããŸãã TDããã·ãŒãžã£ãšã¯ç°ãªãããã®ããã·ãŒãžã£ã§ã¯ã芪é ç¹ãèŠã€ããã ãã§ååãªã®ã§ãé£æ¥ããé ç¹V kã®è¡šç€ºãäžæã§ããŸãã
TDããã·ãŒãžã£ã®ã¿ã§æ€çŽ¢ããå Žåãã¢ã«ãŽãªãºã ã®æåŸã®å埩ã§ã¯ãåŠçããå¿ èŠã®ããé ç¹ã®ãªã¹ããéåžžã«å€§ãããªããããŒã¯ãããŠããªãé ç¹ãããªãå€ããªããŸãã ãããã£ãŠãããã·ãŒãžã£ã¯äžèŠãªã¢ã¯ã·ã§ã³ãšäžèŠãªã¡ã¢ãªã¢ã¯ã»ã¹ãå®è¡ããŸãã BUããã·ãŒãžã£ã®ã¿ã§æ€çŽ¢ããå Žåãã¢ã«ãŽãªãºã ã®æåã®å埩ã§æªå²ãåœãŠã®é ç¹ãå€ããªããTDããã·ãŒãžã£ãšåæ§ã«ãè¿œå ã®ã¢ã¯ã·ã§ã³ãšè¿œå ã®ã¡ã¢ãªã¢ã¯ã»ã¹ãå®è¡ãããŸãã
æåã®æé ã¯BFSã¢ã«ãŽãªãºã ã®æåã®å埩ã§æå¹ã§ããã2çªç®ã®æé ã¯æåŸã«æå¹ã§ããããšãããããŸãã äž¡æ¹ã®æé ã䜿çšãããšãæ倧ã®å¹æãåŸãããããšã¯æããã§ãã ããããã·ãŒãžã£ããå¥ã®ããã·ãŒãžã£ã«åãæ¿ããå¿ èŠãããææãèªåçã«å€æããããã«ããã®èšäºã®èè ã«ãã£ãŠææ¡ãããã¢ã«ãŽãªãºã ïŒchange_stateããã·ãŒãžã£ïŒã䜿çšããŸã[11]ã ãã®ã¢ã«ãŽãªãºã ã¯ã2ã€ã®é£æ¥ããå埩ã§åŠçãããé ç¹ã®æ°ã«é¢ããæ å ±ã«åºã¥ããŠããã©ããŒã¹ã®åäœã®æ§è³ªãç解ããããšããŸãã ãã®ã¢ã«ãŽãªãºã ã¯ãåŠçãããã°ã©ãã«å¿ããŠã1ã€ã®æé ããå¥ã®æé ãžã®åãæ¿ããæ§æã§ãã2ã€ã®ä¿æ°ãå°å ¥ããŸãã
ç¶æ é·ç§»æé ã¯ãTDãBUã«å€æããã ãã§ãªããBUãTDã«æ»ãããšãã§ããŸãã ç¶æ ã®æåŸã®å€æŽã¯ã衚瀺ããå¿ èŠãããé ç¹ã®æ°ãéåžžã«å°ãªãå Žåã«åœ¹ç«ã¡ãŸãã ãã®ããã«ãããŒã¯ä»ãããŒã¯ãšããŒã¯ãªãããŒã¯ã®ç«ã¡äžããããã³ããšæžè¡°ããã³ãã®æŠå¿µãå°å ¥ãããŠããŸãã 以äžã«ç€ºã次ã®æ¬äŒŒã³ãŒãã¯ã調æŽãããã¢ã«ãã¡ããã³ããŒã¿ä¿æ°ã«å¿ããŠãã°ã©ããã©ããŒãµã«ã®ç¹å®ã®å埩ã§åŸãããç¹æ§ã«å¿ããŠç¶æ ã®å€æŽãå®è¡ããŸãã ãã®é¢æ°ã¯ã èŠå ã«å¿ããŠä»»æã®å ¥åã°ã©ãã§æ§æã§ããŸãïŒèŠå ãšã¯ãã°ã©ãã®é ç¹ã®å¹³åæ¥ç¶æ§ãæããŸãïŒã
ç¶æ å€æŽæ©èœïŒ
state change_state(Qcurr, Qnext, vis, inLvl, G) { new_state = old_state; factor = GM / GN / 2; if(Qcurr.size() < Qnext.size()) // Growing phase if(old_state == TOP_DOWN) if(inLvl < ((GN - vis) * factor + GN) / alpha) new_state = TOP_DOWN; else new_state = BOTTOM_UP; else // Shrinking phase if (Qnext.size() < ((GN - vis) * factor + GN) / (factor * beta)) new_state = TOP_DOWN; else new_state = BOTTOM_UP; return new_state; }
äžèšã®BFSã¢ã«ãŽãªãºã ã®ãã€ããªããå®è£ ã®æŠå¿µã¯ããã®ãããªã·ã¹ãã ã®CPUãšGPUã®äž¡æ¹ã®äžŠåå®è£ ã«äœ¿çšãããŸããã ãã ããåŸã§èª¬æããããã€ãã®éãããããŸãã
å ±æã¡ã¢ãªäžã®CPUã§ã®äžŠåå®è£
CPUã·ã¹ãã ïŒPower 8 +ãIntel KNLãããã³Intel x86ïŒã®äžŠåå®è£ ã¯ã OpenMPã䜿çšããŠå®è¡ãããŸããã åãã³ãŒããå®è¡ã«äœ¿çšãããŸããããåãã©ãããã©ãŒã ã«ã¯OpenMPãã£ã¬ã¯ãã£ãã®ç¬èªã®èšå®ããããŸãããããšãã°ãã¹ã¬ããéã§ã®è² è·åæ£ã®ç°ãªãã¢ãŒãïŒã¹ã±ãžã¥ãŒã«ïŒãèšå®ãããŸããã OpenMPã䜿çšããŠCPUã«å®è£ ããããã«ãå¥ã®ã°ã©ãå€æãå®è¡ã§ããŸããã€ãŸããè¿é£ã®é ç¹ã®ãªã¹ããå§çž®ããŸãã
å§çž®ã¯ãé åAããåèŠçŽ ã®éèŠã§ãªããŒãããããåé€ããããšã§æ§æããããã®å€æã¯åç¯å²[X i ãX i + 1 ïŒã«å¯ŸããŠåå¥ã«è¡ãããŸãã é åAã®èŠçŽ ã¯å§çž®ããããã®å§çž®ã«ããã2 30é ç¹ãš2 34ãšããžã®ãªãŒããŒã®å€§ããªã°ã©ãã®å Žåãã°ã©ãã®ä¿åã«äœ¿çšãããã¡ã¢ãªéãçŽ30ïŒ åæžãããŸãã å°ããã°ã©ãã®å Žåãã°ã©ãã®æ倧é ç¹æ°ãå ãããããæ°ãæžå°ããããããã®ãããªå€æã«ããç¯çŽã¯æ¯äŸããŠå¢å ããŸãã
ãã®ãããªã°ã©ãå€æã¯ãé ç¹åŠçã«ããã€ãã®å¶éã課ããŸãã ãŸãããã¹ãŠã®é£æ¥ããé ç¹ã¯å§çž®ãããç¹å®ã®æ¹æ³ã§èŠçŽ ã®ã·ãŒã±ã³ã¹ã«ãšã³ã³ãŒããããŠãããããé çªã«åŠçããå¿ èŠããããŸãã 第äºã«ãå§çž®èŠçŽ ã解åããããã«è¿œå ã®æé ãå®è¡ããå¿ èŠããããŸãã ãã®æé ã¯ç°¡åã§ã¯ãªããPower8 + CPUã®å Žåãå¹æãåŸãããšãã§ããŸããã§ããã ãã®çç±ã¯ãã³ã³ãã€ã©ãŒã®æé©åãäžååã§ããããIntelãšã¯ç°ãªãããŒããŠã§ã¢ã§ããå¯èœæ§ããããŸãã
ã¢ã«ãŽãªãºã ã®1ã€ã®å埩ã䞊è¡ããŠå®è¡ããã«ã¯ãOpenMPã¹ããªãŒã ããšã«ç¬èªã®Q th_nextãã¥ãŒãäœæããå¿ èŠããããŸãã ãããŠããã¹ãŠã®ã«ãŒããå®äºããåŸãåä¿¡ãããã¥ãŒãããŒãžããŸãã ãŸããçž®çŽäŸåããããã¹ãŠã®å€æ°ãããŒã«ã©ã€ãºããå¿ èŠããããŸãã æé©åãšããŠãQ currãã¥ãŒå ã®é ç¹ã®æ°ãæå®ã®ãããå€ïŒããšãã°ã300æªæºïŒæªæºã®å ŽåãTDããã·ãŒãžã£ã¯ã·ãŒã±ã³ã·ã£ã«ã¢ãŒãã§å®è¡ãããŸãã ããŸããŸãªãµã€ãºã®ã°ã©ãã®å Žåãããã³ã¢ãŒããã¯ãã£ã«ãäŸåããããããã®ãããå€ã«ã¯ããŸããŸãªå€ãèšå®ã§ããŸãã TDããã·ãŒãžã£ãŒã®å Žåã¯ã«ãŒãïŒQcurrã®Foreach ViïŒãBUããã·ãŒãžã£ãŒã®å Žåã¯ïŒQcurã®Vi ForeachïŒã«ãŒãã®åã«äžŠåãã£ã¬ã¯ãã£ããé 眮ãããŸããã
GPU䞊åå®è£
GPUã®äžŠåå®è£ ã¯ã CUDAãã¯ãããžãŒã䜿çšããŠå®è¡ãããŸããã ç¹å®ã®ããã·ãŒãžã£ã®å®è¡äžã«ããŒã¿ã«ã¢ã¯ã»ã¹ããããã®ã¢ã«ãŽãªãºã ãå€§å¹ ã«ç°ãªããããGPUã䜿çšããå ŽåãTDããã³BUããã·ãŒãžã£ã®å®è£ ã¯å€§å¹ ã«ç°ãªããŸãã
TDããã·ãŒãžã£ã¯ã CUDAåçåæå®è¡æ§ã䜿çšããŠå®è£ ãããŸããã ãã®æ©èœã«ãããè² è·åæ£ã«é¢é£ããããã€ãã®äœæ¥ãGPUæ©åšã«è»¢éã§ããŸãã ãã¥ãŒQ currããã®åé ç¹V iã«ã¯ã以åã¯æªç¥ã®æ°ã®ãã€ããŒã«å¯ŸããŠå®å šã«ç°ãªããã®ãå«ããããšãã§ããŸãã ãã®ãããCUDAã§ã¯åºå®ãµã€ãºã®ã¹ã¬ããã®ãããã¯ã䜿çšã§ãããããäžé£ã®ã¹ã¬ããã§ãµã€ã¯ã«å šäœãçŽæ¥è¡šç€ºãããšã匷ãè² è·ã®äžåè¡¡ãçºçããŸãã
説æãããŠããåé¡ã¯ãåç䞊è¡æ§ã䜿çšããããšã§è§£æ±ºãããŸãã æåã«ããã¹ã¿ãŒã¹ã¬ãããèµ·åãããŸãã Q currãã¥ãŒã«ããé ç¹ãšåãæ°ã®ã¹ã¬ããããããŸãã 次ã«ãåãã¹ã¿ãŒã¹ã¬ããã¯ãé ç¹V iã«é£æ¥ããã¹ã¬ãããšåãæ°ã®è¿œå ã®ã¹ã¬ãããå®è¡ããŸãã ãããã£ãŠã䜿çšãããã¹ã¬ããã®æ°ã¯ãå ¥åããŒã¿ã«å¿ããŠãããã°ã©ã ã®å®è¡äžã«ãã€ããã¯ã¹ã§åœ¢æãããŸãã
ãã®æé ã¯ãåç䞊ååŠçã䜿çšããå¿ èŠããããããGPUã§ã®å®è¡ã«ã¯äžäŸ¿ã§ãã ãã¹ã¿ãŒã¹ã¬ããããäœæããå¿ èŠãããã¹ã¬ããã®ç·æ°ãå€ããšããªãŒããŒãããã倧ãããªããŸãã ãããã£ãŠãBUããã·ãŒãžã£ãžã®åãæ¿ãã¯ãCPUãããæ©ãå®è¡ãããŸãã
è¿œå ã®ããŒã¿å€æãè¡ãå ŽåãBUããã·ãŒãžã£ã¯GPUã§ã®å®è¡ã«é©ããŠããŸãã ãã®æé ã¯ãã°ã©ãã®é£ç¶ãããã¹ãŠã®é ç¹ã§ééãå®è¡ããããšããç¹ã§ãTDæé ãšã¯å€§ããç°ãªããŸãã ãããã£ãŠãçµç¹åããããµã€ã¯ã«ã«ãããã¡ã¢ãªãžã®é©åãªã¢ã¯ã»ã¹ã®ããã®ããŒã¿æºåãå®è¡ã§ããŸãã
å€æã¯æ¬¡ã®ãšããã§ãã 1ã€ã®ã¯ãŒãã®é£æ¥ããã¹ã¬ãããåœä»€ãåæçãã€äžŠåã«å®è¡ããããšãç¥ãããŠããŸãã ãŸããå¹æçãªã¡ã¢ãªã¢ã¯ã»ã¹ã§ã¯ãã¯ãŒãå ã®é£æ¥ã¹ã¬ãããã¡ã¢ãªå ã®é£æ¥ã»ã«ã«ã¢ã¯ã»ã¹ããå¿ èŠããããŸãã ããšãã°ãã¯ãŒãã®ã¹ã¬ããæ°ã2ã«èšå®ããŸããåã¹ã¬ãããã«ãŒãã®1ã€ã®é ç¹ã«é¢é£ä»ããããŠããå ŽåïŒGã®ViããšïŒãã«ãŒãå ã®ãã€ããŒã®åŠçäžïŒGã®VkããšããšããžïŒViïŒïŒãåã¹ã¬ããã¯ã¡ã¢ãªé åã«ç§»åããŸããé£æ¥ããã¹ã¬ãããã¡ã¢ãªå ã®ééã®åºãã»ã«ãåŠçãããããããã©ãŒãã³ã¹ã«æªåœ±é¿ãåãŒããŸãã ç¶æ³ãä¿®æ£ããã«ã¯ãé åAã®èŠçŽ ãæ··åããŠãV 0ãšV 1ã®æåã®2ã€ã®é£æ¥èŠçŽ ã«æé©ãªæ¹æ³ã§ã¢ã¯ã»ã¹ããŸããé£æ¥èŠçŽ ã¯ã¡ã¢ãªå ã®é£æ¥ã»ã«ã«é 眮ãããŸãã ããã«ã2çªç®ã3çªç®ãªã©ãåæ§ã«ã¡ã¢ãªå ã«ãããŸãã èŠçŽ ã
ãã®æŽåã«ãŒã«ã¯ãã¯ãŒãã¹ã¬ããã®ã°ã«ãŒãïŒ32ã¹ã¬ããïŒã«é©çšãããŸããé£æ¥ããã¹ã¬ãããæ··åãããŠããŸããæåã®32åã®èŠçŽ ãæåã«é 眮ããã次ã«2çªç®ã®32åã®èŠçŽ ãé 眮ãããŸãã ã°ã©ãã¯è¿é£ã®æ°ã®éé ã§äžŠã¹æ¿ããããŠãããããé£æ¥ããé ç¹ã®ã°ã«ãŒãã¯ãè¿é£ã®é ç¹ã®æ°ãããªãè¿ããªããŸãã
BUããã·ãŒãžã£ã®æäœäžãå éšã«ãŒãã«æ©æçµäºããããããã°ã©ãå šäœã§ãã®æ¹æ³ã§èŠçŽ ãæ··åšãããããšã¯ã§ããŸããã äžèšã®ã°ããŒãã«ããã³ããŒã«ã«ã®é ç¹ãœãŒãã«ããããã®ãµã€ã¯ã«ãããªãæ©ãçµäºã§ããŸãã ãããã£ãŠãã°ã©ãã®ãã¹ãŠã®é ç¹ã®40ïŒ ã®ã¿ãæ··åãããŸãã ãã®å€æã§ã¯ãæ··åã°ã©ããä¿åããããã«è¿œå ã®ã¡ã¢ãªãå¿ èŠã«ãªããŸãããããã©ãŒãã³ã¹ãå€§å¹ ã«åäžããŸãã 以äžã¯ãã¡ã¢ãªå ã®èŠçŽ ã®æ··åé 眮ã䜿çšããBUããã·ãŒãžã£ã®ã«ãŒãã«æ¬äŒŒã³ãŒãã§ãã
BUæé ã®äžŠåã«ãŒãã«ïŒ
__global__ void bu_align( ... ) { idx = blockDim.x * blockIdx.x + threadIdx.x; countQNext = 0; inlvl = 0; for(i = idx; i < N; i += stride) if (levels[i] == 0) start_k = rowsIndices[i]; end_k = rowsIndices[i + N]; for(k = start_k; k < start_k + 32 * end_k; k += 32) inlvl++; vertex_id_t endk = endV[k]; if (levels[endk] == lvl - 1) parents[i] = endk; levels_out[i] = lvl; countQNext++; break; atomicAdd(red_qnext, countQNext); atomicAdd(red_lvl, inlvl); }
çµæã®åæ
ãã¹ããããããã°ã©ã ã¯ãIntel Xeon PhiïŒXeon KNL 7250ïŒãIntel x86ïŒXeon E5 2699 v3ïŒãIBM Power8 +ïŒPower 8+ s822lcïŒãããã³NVidia Tesla P100 GPUã®4ã€ã®ç°ãªããã©ãããã©ãŒã ã§ããã«å®è¡ãããŸããã 察象ã®ãããã®ããã€ã¹ã®ç¹æ§ãæ¯èŒããŸã
äžã®è¡šã
次ã®ã³ã³ãã€ã©ã䜿çšãããŸããã Intel Xeon E5ã®å Žå-gcc 5.3ãIntel KNLã®å Žå-icc v17ãIBMã®å Žå-gccã®Powerã¢ãŒããã¯ãã£ãŒãNVidiaã®å Žå-CUDA 9.0ã
ä»å ¥å | ã³ã¢/ã¹ã¬ãã | åšæ³¢æ°ãGHz | RAMãGB / s | ããã¯ã¹ TDPãW | TransããBillion |
---|---|---|---|---|---|
Xeon KNL 7250 | 68/272 | 1.4 | 115/400 | 215 | ã8 |
Xeon E5 2699 v3 | 18/36 | 2.3 | 68 | 145 | ã5.69 |
Power 8+ s822lc | 10/80 | 3.5 | 205 | 270 | ã6 |
ãã¹ã©P100 | 56/3584 | 1.4 | 40/700 | 300 | ã15.3 |
æè¿ãã¡ãŒã«ãŒã¯ã¡ã¢ãªåž¯åå¹ ã«ã€ããŠãŸããŸãèããŠããŸãã ãã®çµæãRAMãžã®äœéã¢ã¯ã»ã¹ã®åé¡ã«å¯ŸããããŸããŸãªè§£æ±ºçãçŸããŸãã æ€èšäžã®ãã©ãããã©ãŒã ã®ãã¡ã2ã€ã«ã¯2ã¬ãã«ã®RAMæ§é ããããŸãã
ãããã®æåã®-ã€ã³ãã«KNLã¯ãã¢ã¯ã»ã¹é床ãçŽ400 GB / sã®ãããäžã®é«éã¡ã¢ãªãšãã¢ã¯ã»ã¹é床ã115 GB / s以äžã®ããé ãããªãã¿ã®DDR4ãå«ãã§ããŸãã é«éã¡ã¢ãªã®ãµã€ãºã¯ããªãå°ããã16GBãããããŸããããéåžžã®ã¡ã¢ãªã¯æ倧384GBã§ãã ãã¹ããããµãŒããŒã«96 GBã®ã¡ã¢ãªãã€ã³ã¹ããŒã«ãããŸããã 2çªç®ã®ãã€ããªãããœãªã¥ãŒã·ã§ã³ãã©ãããã©ãŒã ã¯ãPower + NVidia Teslaã§ãã ãã®ãœãªã¥ãŒã·ã§ã³ã¯æ°ããNVlinkãã¯ãããžãŒã«åºã¥ããŠãããéåžžã®ã¡ã¢ãªãžã®ã¢ã¯ã»ã¹ã¯40 GB /ç§ã®é床ã§ãé«éã¡ã¢ãªãžã®ã¢ã¯ã»ã¹ã¯700 GB /ç§ã®é床ã§å®è¡ãããŸãã é«éã¡ã¢ãªã®éã¯ãIntel KNL-16GBãšåãã§ãã
ãããã®ãœãªã¥ãŒã·ã§ã³ã¯ãçµç¹ã®ç¹ã§äŒŒãŠããŸã-ãµã€ãºã®å°ããé«éã¡ã¢ãªãšãµã€ãºã®å€§ããäœéã¡ã¢ãªããããŸãã 倧ããªã°ã©ããåŠçãããšãã«2ã¬ãã«ã®ã¡ã¢ãªã䜿çšããã·ããªãªã¯æããã§ããé«éã¡ã¢ãªã¯çµæãšäžéé åãæ ŒçŽããããã«äœ¿çšããããã®ãµã€ãºã¯å ¥åããŒã¿ã«æ¯ã¹ãŠéåžžã«å°ãããå ã®ã°ã©ãã¯äœéã¡ã¢ãªããèªã¿åãããŸãã
å®è£ ã®èŠ³ç¹ããããŠãŒã¶ãŒã¯æ¬¡ã®ããŒã«ã䜿çšã§ããŸãã Intel KNLã®å Žåãéåžžã®mallocã®ä»£ããã«ä»ã®ã¡ã¢ãªå²ãåœãŠé¢æ°hbm_mallocã䜿çšããã ãã§ååã§ãã ããã°ã©ã ãmallocæŒç®åã䜿çšããå Žåããã®æ©èœã䜿çšããã«ã¯ã1ã€ã®å®çŸ©ã宣èšããã ãã§ååã§ãã NVidia Teslaã®å Žåãä»ã®ã¡ã¢ãªå²ãåœãŠé¢æ°ã䜿çšããå¿ èŠããããŸã-cudaMallocã®ä»£ããã«cudaMallocHostã䜿çšããŸãã ãããã®ã³ãŒãã®ä¿®æ£ã¯ååã§ãããããã°ã©ã ã®èšç®éšåã®ä¿®æ£ãå¿ èŠãšããŸããã
å®éšã¯ã2 25 ïŒ4 GBïŒãã2 30 ïŒ128 GBïŒã§çµããããŸããŸãªãµã€ãºã®ã°ã©ãã«å¯ŸããŠè¡ãããŸããã Graph500ã®è©äŸ¡ã§ã¯ãå¹³åçãªæ¥ç¶ã®åºŠåããšã°ã©ãã®çš®é¡ã¯ã°ã©ããžã§ãã¬ãŒã¿ãŒããååŸããŸããã ãã®ãžã§ãã¬ãŒã¿ãŒã¯ãå¹³åé£çµåºŠ16ããã³ä¿æ°A = 0.57ãB = 0.19ãC = 0.19ã®ã¯ãããã«ãŒã°ã©ããäœæããŸãã
ãã®ã¿ã€ãã®ã°ã©ãã¯ãè©äŸ¡è¡šã®ãã¹ãŠã®åå è ã䜿çšãããããåå è éã§å®è£ ãæ£ããæ¯èŒã§ããŸãã ããã©ãŒãã³ã¹å€ã¯ãGraph500ããŒãã«ã®TEPSã¡ããªãã¯ãšGreenGraph500ããŒãã«ã®TEPS / wã«åŸã£ãŠèšç®ãããŸãã ãã®ç¹æ§ãèšç®ããããã«ãç°ãªãéå§é ç¹ããã®BFSã¢ã«ãŽãªãºã ã®64åã®éå§ãå®è¡ãããå¹³åå€ãååŸãããŸãã ã¢ã«ãŽãªãºã ã®æ¶è²»éãèšç®ããããã«ãã¡ã¢ãªæ¶è²»éãèæ ®ã«å ¥ããŠãã¢ã«ãŽãªãºã æäœæã®çŸåšã®ã·ã¹ãã æ¶è²»éãååŸãããŸãã
次ã®è¡šã¯ããã¹ãããããã¹ãŠã®ã°ã©ãã®GTEPSã§ã®çµæã®ããã©ãŒãã³ã¹ã瀺ããŠããŸãã è¡šã«ã¯2ã€ã®å€ã瀺ãããŠããŸããåã°ã©ãã§éæãããããã©ãŒãã³ã¹ã®æå°å€/æ倧å€ã§ãã ãŸããIntel KNLã䜿çšããå Žåãã¢ã«ãŽãªãºã ã®çµæã¯DDR4ã¡ã¢ãªã®ã¿ã䜿çšããŠååŸãããŸããã æ®å¿µãªããããã¹ãŠã®ããŒã¿å§çž®ã¢ã«ãŽãªãºã ã䜿çšããå Žåã§ããæäŸããããµãŒããŒäžã®Intel KNLã§2 30ã®é ç¹ãæã€ã°ã©ããå®è¡ããããšã¯ã§ããŸããã§ããã ããããIntelããã»ããµãŒã®å®å®æ§ãšIntelã³ã³ãã€ã©ãŒã®ãã¯ãããžãŒãèãããšãã°ã©ããµã€ãºã倧ãããªã£ãŠãããã©ãŒãã³ã¹ã¯å€ãããªããšæ³å®ã§ããŸãïŒIntel Xeon E5ã§èŠãããããã«ïŒã
GTEPSã§ã®çµæã®ããã©ãŒãã³ã¹ïŒ
ã«ãŠã³ãïŒ | 2 25 | 2 26 | 2 27 | 2 28 | 2 29 | 2 30 |
---|---|---|---|---|---|---|
KNL 7250 | 10.7 / 30.6 | 12.9 / 41 | 8.4 / 43.3 | 4.6 / 40.2 | 6.2 / 42.6 | N / a |
KNL 7250 DDR4 | 6.7 / 25.2 | 4.3 / 27 | 4.9 / 28.4 | 5.7 / 31.6 | 10.8 / 38.8 | N / a |
E5 2699 v3 | 11 / 16.5 | 11.8 / 17.3 | 12.7 / 18.5 | 13.1 / 18.3 | 12.1 / 18.0 | 12.4 / 21.1 |
P8 + s822lc | 8.8 / 22.5 | 9.02 / 23.3 | 7.98 / 23.4 | 10.4 / 23.7 | 10.1 / 24.6 | 7.59 / 14.8 |
ãã¹ã©P100 | 41/282 | 99/333 | 34/324 | 50/274 | 7.2 / 61 | 6.5 / 52 |
以äžã®ã°ã©ãã¯ããã¹ãããããã©ãããã©ãŒã ã®å¹³åããã©ãŒãã³ã¹ã瀺ããŠããŸãã 64 GBãã128 GBã®ã°ã©ãã«åãæ¿ãããšãPower 8+ã®å®å®æ§ãããŸãè¯ããªãããšãããããŸãã ããã¯ããããã2ã€ã®é¡äŒŒããããã»ããµãã2ã€ã®ãœã±ããã䜿çšããåããã»ããµã«128 GBã®ã¡ã¢ãªããã£ãããã§ãã ãŸãããã倧ããªã°ã©ããåŠçããå ŽåãããŒã¿ã®äžéšã¯ããœã±ããã«å±ããªãã¡ã¢ãªã«é 眮ãããŸããã ãŸããæéã®CPUããã€ã¹ãšGPUã®å·®ã¯çŽ10åã§ããããããã®ã°ã©ãã§ã¯ãããå°ããã°ã©ãã§ã®Tesla P100ã®ããã©ãŒãã³ã¹ã¯è¡šç€ºãããŸããã ã°ã©ããéåžžã«å€§ãããªããã£ãã·ã¥ã«åãŸãããNVlinkãä»ããŠã°ã©ãã«ã¢ã¯ã»ã¹ãããšããã®å éã¯æ¥æ¿ã«äœäžããŸãã ãããããã®å¶éã«ãããããããGPUã®ããã©ãŒãã³ã¹ã¯äŸç¶ãšããŠãã¹ãŠã®CPUããã€ã¹ãäžåã£ãŠããŸãã ãã®åäœã¯ãCUDAã䜿çšãããšã³ã³ãã¥ãŒãã£ã³ã°ãšã¡ã¢ãªã¢ã¯ã»ã¹ãããé©åã«å¶åŸ¡ã§ããGPUã®äžŠåã³ã³ãã¥ãŒãã£ã³ã°ãžã®é©å¿æ§ãåäžãããšããäºå®ã«ãã£ãŠèª¬æãããŸãã
以äžã®è¡šã¯ããã¹ãããããã¹ãŠã®ã°ã©ãã®GTEPS / wã§ã®çµæã®ããã©ãŒãã³ã¹ã瀺ããŠããŸãã è¡šã«ã¯ãåã°ã©ãã®å¹³åæ¶è²»é»åãšéæãããå¹³åããã©ãŒãã³ã¹ã瀺ãããŠããŸãã NVidia Tesla P100ã§å2 28ããå2 29ã«åãæ¿ãããšãã®ããã©ãŒãã³ã¹ãšãšãã«ã®ãŒå¹çã®æ¥æ¿ãªäœäžã¯ãæãé »ç¹ã«ã¢ã¯ã»ã¹ãããã°ã©ãã®æŽåéšåãé
眮ããã®ã«ååãªé«éã¡ã¢ãªããªããšããäºå®ã«ãã£ãŠèª¬æãããŸãã ããå€ãã®ã¡ã¢ãªïŒ32 GBãªã©ïŒã䜿çšããNVlink 2.0 CPUãšã®éä¿¡ãã£ãã«ãå¢ãããšã倧ããªã°ã©ãã®åŠçå¹çã倧å¹
ã«é«ããããšãã§ããŸãã
çµæãšããŠçãããšãã«ã®ãŒå¹çã¯MTEPS / wã§è¡šãããŸãã
ã°ã©ããµã€ãº | 2 25 | 2 26 | 2 27 | 2 28 | 2 29 | 2 30 |
---|---|---|---|---|---|---|
KNL 7250 | 121.4 | 149.3 | 142.33 | 136.56 | 130.96 | N / a |
E5 2699 v3 | 95.56 | 98.65 | 100 | 101.9 | 91.12 | 84.46 |
P8 + s822lc | 93.8 | 97.04 | 93.2 | 95.28 | 92.41 | 53.23 |
ãã¹ã©P100 | 1228.57 | 1165.71 | 1235.96 | 1016.57 | 195.61 | 177.45 |
ãããŠæåŸã«
è¡ãããäœæ¥ã®çµæãšããŠã2ã€ã®äžŠåBFSã¢ã«ãŽãªãºã ããåæ§ã®ã·ã¹ãã ã®CPUãšGPUã«å®è£ ãããŸããã IBM Power8 +ãIntel x86ãIntel Xeon PhiïŒKNLïŒãNVidia Tesla P100ãªã©ã®ããŸããŸãªãã©ãããã©ãŒã ã«å®è£ ãããã¢ã«ãŽãªãºã ã®ããã©ãŒãã³ã¹ãšãšãã«ã®ãŒå¹çã®èª¿æ»ãè¡ãããŸããã ãããã®ãã©ãããã©ãŒã ã«ã¯ãããŸããŸãªã¢ãŒããã¯ãã£æ©èœããããŸãã ããã«ãããããããæåã®3ã€ã¯æ§é ãéåžžã«äŒŒãŠããŸãã ããã«ãããOpenMPã¢ããªã±ãŒã·ã§ã³ã¯ãããã®ãã©ãããã©ãŒã ã§å€§ããªå€æŽãªãã«èµ·åã§ããŸãã ããã©ããããGPUã¢ãŒããã¯ãã£ã¯åæ§ã®ãã©ãããã©ãŒã ã®CPUãšã¯éåžžã«ç°ãªããèšç®ã³ãŒããå®è£ ããããã«CUDAã¢ãŒããã¯ãã£ãšããç°ãªãæŠå¿µã䜿çšããŠããŸãã
Graph500ã¬ãŒãã£ã³ã°ã®ãžã§ãã¬ãŒã¿ãŒã®åŸã«ååŸãããã°ã©ããèæ ®ãããŸããã 2ã€ã®ããŒã¿ã¯ã©ã¹ã§ã®åã¢ãŒããã¯ãã£ã®ããã©ãŒãã³ã¹ã調æ»ããŸããã æåã®ã¯ã©ã¹ã«ã¯ãã³ã³ãã¥ãŒã¿ãŒã®æéã®ã¡ã¢ãªãŒã«é 眮ãããã°ã©ããå«ãŸããŸãã 2çªç®ã®ã¯ã©ã¹ã«ã¯ãå šäœãšããŠé«éã¡ã¢ãªã«é 眮ã§ããªã倧ããªã°ã©ããå«ãŸããŸãã GreenGraph500ã¡ããªãã¯ã¯ããšãã«ã®ãŒå¹çãå®èšŒããããã«äœ¿çšãããŸããã ããã°ããŒã¿ã¯ã©ã¹ã®GreenGraph500è©äŸ¡ã§èæ ®ãããæå°ã°ã©ãã¯ã2 30ã®é ç¹ãš2 34ã®ãšããžãå«ã¿ãå ã®åœ¢åŒã§128 GBã®ã¡ã¢ãªãå æããŸãã 2 30 2 34 , , .
Graph500 GreenGraph500 NVidia Tesla P100 ( 220 GTEPS 1235.96 MTEPS/w), ( 41.7 GTEPS 177.45 MTEPS/w). NVLink, IBM Power8+.
NVidia Volta NVlink 2.0, .
[1] EF Moore. The shortest path through a maze. In Int. Symp. on Th.
of Switching, pp. 285â292, 1959
[2] Cormen, T., Leiserson, C., Rivest, R.: Introduction to Algorithms. MIT Press,
Cambridge (1990)
[3] Edmonds, J., Karp, RM: Theoretical improvements in algorithmic efficiency for
network flow problems. Journal of the ACM 19(2), 248â264 (1972)
[4] Brandes, U.: A Faster Algorithm for Betweenness Centrality. J. Math. Sociol. 25(2),
163â177 (2001)
[5] Frasca, M., Madduri, K., Raghavan, P.: NUMA-Aware Graph Mining Techniques
for Performance and Energy Efficiency. In: Proc. ACM/IEEE Int. ç¢ºèª High Performance
Computing, Networking, Storage and Analysis (SC 2012), pp. 1â11. IEEE
Computer Society (2012)
[6] Girvan, M., Newman, MEJ: Community structure in social and biological networks.
æç¶ã Natl. Acad. Sci. USA 99, 7821â7826 (2002)
[7] DA Bader and K. Madduri. Designing multithreaded algorithms for
breadth-first search and st-connectivity on the Cray MTA-2. In ICPP,
pp. 523â530, 2006.
[8] RE Korf and P. Schultze. Large-scale parallel breadth-first search.
In AAAI, pp. 1380â1385, 2005.
[9] A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and
U. Catalyurek. A scalable distributed parallel breadth-first search
algorithm on BlueGene/L. In SC '05, p. 25, 2005.
[10] Y. Zhang and EA Hansen. Parallel breadth-first heuristic search on a shared-memory architecture. In AAAI Workshop on Heuristic
Search, Memory-Based Heuristics and Their Applications, 2006.
[11] Yasui Y., Fujisawa K., Sato Y. (2014) Fast and Energy-efficient Breadth-First Search on a Single NUMA System. In: Kunkel JM, Ludwig T., Meuer HW (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham
[12] Hiragushi T., Takahashi D. (2013) Efficient Hybrid Breadth-First Search on GPUs. In: Aversa R., KoÅodziej J., Zhang J., Amato F., Fortino G. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2013. Lecture Notes in Computer Science, vol 8286. Springer, Cham
[13] Merrill, D., Garland, M., Grimshaw, A.: Scalable GPU graph traversal. In: Proc. 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2012), pp. 117â128 (2012)