DL_MESO is a scientific package for simulating condensed matter at the mesoscopic level (chemists and physicists, forgive me if the translation is off). The package was developed at Daresbury Laboratory and is widely used both in the research community and in industry (Unilever, Syngenta, Infineum). Companies use the software to search for optimal formulations of shampoos, fertilizers, and fuel additives. This process is called "Computer Aided Formulation" (CAF); I rendered it as "CAD for developing chemical formulations".
CAF simulations are very resource-intensive computations, so the developers were keen from the start on getting the best possible performance. DL_MESO was also chosen as one of the joint projects of the Intel Parallel Computing Center (IPCC) between Intel and Hartree.
The DL_MESO developers wanted to exploit the hardware's vector parallelism, because upcoming technologies such as AVX-512 can potentially make double-precision code up to 8 times faster than non-vectorized code.
This post describes how scientists at Daresbury used Vectorization Advisor to analyze the Lattice Boltzmann equation code in DL_MESO, which specific problems they found, and how they fixed the code to make it 2.5 times faster.
Survey profile
What is Vectorization Advisor? See the first article. Here we go straight to the Survey profile of the Lattice Boltzmann component. About half of the total execution time falls on 10 hot loops, and there is no clear leader among them: each loop takes no more than 12% of the total program time. This picture is close to a "flat profile", which is usually bad news for a programmer, because each loop has to be optimized individually to get a noticeable speedup.
Fortunately for the Daresbury developers, Vectorization Advisor can quickly sort the loops into the following categories:
- Vectorizable but not yet vectorized loops that need only minimal code changes (usually OpenMP 4.x directives) to enable SIMD code generation. The first 4 loops (by CPU time) fall into this category.
- Vectorized loops whose performance can be improved with simple changes.
- Vectorized loops whose performance is limited by the data layout. Optimizing such loops requires more serious code rework. As we will see later, once the problems in the first two categories are fixed, these become the two hottest loops.
- Vectorized loops that already perform well.
- Everything else (including loops that cannot be vectorized).
Vectorization Advisor does more than report information about loops. The "Recommendations" and "Compiler Diagnostic Details" tabs describe specific problems and how to solve them.
In this case, the third hotspot (a "hot loop"; may the guardians of language purity forgive me), located in fGetSpeedSite, could not be vectorized because the compiler was unable to estimate its iteration count. The Compiler Diagnostic Details tab explains the nature of the problem with an example and suggests a fix: add the "#pragma loop_count" directive. After following this advice, the loop was quickly vectorized and moved from category 2 to category 4.
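As a rough illustration of that suggestion, the sketch below attaches a trip-count hint to a loop whose bound is only known at run time. The function `scale_speeds` and its arguments are hypothetical and are not taken from DL_MESO. `#pragma loop_count` is an Intel compiler directive, so it is guarded here and other compilers simply skip it.

```c
/* Hypothetical kernel (not DL_MESO code): the bound nq is only known at
 * run time, so the compiler cannot tell whether vectorizing pays off. */
void scale_speeds(double *restrict out, const double *restrict f,
                  const double *restrict w, int nq)
{
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
    /* Hint only: "expect about 19 iterations" (the D3Q19 case).
     * The result stays correct if nq differs at run time. */
#pragma loop_count(19)
#endif
    for (int i = 0; i < nq; i++) {
        out[i] = f[i] * w[i];
    }
}
```

The hint changes only how the compiler estimates the cost of the loop; it does not alter what the loop computes.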
Even when code can be vectorized, that does not automatically translate into better performance (categories 2 and 3). It is therefore important to examine how efficient the already vectorized loops are.
Simple optimization: padding the "equilibrium distribution" kernel
    int fGetEquilibriumF(double *feq, double *v, double rho)
    {
        double modv = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
        double uv;
        for (int i = 0; i < lbsy.nq; i++) {
            uv = lbv[i*3] * v[0] + lbv[i*3+1] * v[1] + lbv[i*3+2] * v[2];
            feq[i] = rho * lbw[i] * (1 + 3.0 * uv + 4.5 * uv * uv - 1.5 * modv);
        }
        return 0;
    }
The lbv array stores the lattice velocities in every dimension. The lbsy.nq variable holds the number of velocities. This model describes a three-dimensional lattice with 19 velocities (the D3Q19 scheme), so lbsy.nq equals 19. The resulting equilibrium values are stored in the feq[i] array.
In the initial analysis the loop was scalar, not vectorized. Simply adding "#pragma omp simd" before the loop vectorized it, and its share of the total CPU time dropped from 13% to 9%. Even after this change, there was still room for improvement.
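A minimal sketch of that change is shown here. To keep it self-contained, the lattice data is passed in as parameters instead of being read from the globals lbsy, lbv and lbw; the name `equilibrium_simd` and this signature are assumptions made for illustration, not DL_MESO's actual interface.

```c
/* Same arithmetic as fGetEquilibriumF, with the lattice arrays passed in
 * as parameters (an assumption to keep the sketch self-contained). */
int equilibrium_simd(double *feq, const double *v, double rho,
                     const double *lbv, const double *lbw, int nq)
{
    const double modv = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];

    /* Ask the compiler to vectorize the loop. This needs OpenMP 4.x SIMD
     * support (for example -fopenmp-simd or -qopenmp-simd); without it
     * the pragma is ignored and the loop stays as the compiler sees fit. */
#pragma omp simd
    for (int i = 0; i < nq; i++) {
        /* uv is declared inside the loop so every SIMD lane has its own copy. */
        const double uv = lbv[i * 3] * v[0] + lbv[i * 3 + 1] * v[1]
                        + lbv[i * 3 + 2] * v[2];
        feq[i] = rho * lbw[i] * (1 + 3.0 * uv + 4.5 * uv * uv - 1.5 * modv);
    }
    return 0;
}
```

Moving `uv` into the loop body is a design choice of this sketch: it avoids having to add a `private(uv)` clause to the directive.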
The new Advisor XE results showed that the compiler had generated two loops instead of one:
- A vectorized loop body with a vector length (VL) of 4, that is, four doubles in a 256-bit AVX register.
- A scalar remainder that consumes 30% of the loop's time.
This scalar remainder is unnecessary overhead, and its presence hurts parallel efficiency. The remainder is so "heavy" because the iteration count is not evenly divisible by VL. The compiler generates vector instructions for iterations 0 to 15, while the remaining iterations 16 to 18 run in the scalar remainder. Although there are only 3 of them, these sequentially executed iterations take 30% of the loop's time. Ideally, all iterations would run in vector code and the remainder would not run at all.
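The split described above is plain integer arithmetic, sketched here with hypothetical helper names (they do not come from DL_MESO or Advisor XE):

```c
/* Lanes per vector register: register width divided by element width. */
int vector_length(int register_bits, int element_bits)
{
    return register_bits / element_bits;
}

/* Iterations that fit into whole vectors. */
int vector_body_iterations(int trip_count, int vl)
{
    return (trip_count / vl) * vl;
}

/* Iterations left over for the scalar remainder loop. */
int remainder_iterations(int trip_count, int vl)
{
    return trip_count % vl;
}
```

For D3Q19 with doubles in 256-bit AVX registers this gives VL = 256 / 64 = 4, a vector body covering 16 iterations (0 to 15) and a remainder of 19 mod 4 = 3 iterations (16 to 18).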
The data padding technique can be applied to this loop: increase the iteration count to 20, which is a multiple of VL (4). Advisor XE gives exactly this advice in the Recommendations tab.
To pad the data, the feq[], lbv[] and lbw[] arrays have to be enlarged so that the extra iterations do not cause a segmentation fault. The table at the end of the article shows the code changes. The lbsy.nqpad value is the sum of the original iteration count and the padding value (NQPAD_COUNT).
In addition, the DL_MESO developers added a "#pragma loop_count" directive before the loop. Once the compiler is told explicitly that the iteration count is a multiple of VL, it generates vector code and the remainder is never executed.
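The padding step might look roughly like the sketch below. The names `pad_lattice` and `equilibrium_padded`, the parameter-passing style, and the choice of zero-filled padding are assumptions for illustration; only NQPAD_COUNT and the 19-to-20 rounding come from the article. The Intel-specific `#pragma loop_count` is guarded so that other compilers skip it, and whether the remainder loop actually disappears depends on the compiler and its flags.

```c
#include <string.h>

#define NQ          19   /* D3Q19 trip count, from the article */
#define NQPAD_COUNT 1    /* assumed padding: 19 + 1 = 20 = 5 * VL for VL = 4 */
#define NQPAD       (NQ + NQPAD_COUNT)

/* Copy the NQ real entries and zero the padded tail. With a zero weight the
 * padded feq entry evaluates to zero, so computing it is harmless. */
void pad_lattice(double lbv_pad[3 * NQPAD], double lbw_pad[NQPAD],
                 const double lbv[3 * NQ], const double lbw[NQ])
{
    memset(lbv_pad, 0, sizeof(double) * 3 * NQPAD);
    memset(lbw_pad, 0, sizeof(double) * NQPAD);
    memcpy(lbv_pad, lbv, sizeof(double) * 3 * NQ);
    memcpy(lbw_pad, lbw, sizeof(double) * NQ);
}

/* feq, lbv and lbw must all be allocated with the padded sizes. */
int equilibrium_padded(double *restrict feq, const double *restrict v,
                       double rho, const double *restrict lbv,
                       const double *restrict lbw, int nqpad)
{
    const double modv = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
    /* Tell the compiler to expect 20 iterations, a multiple of VL = 4. */
#pragma loop_count(20)
#endif
    for (int i = 0; i < nqpad; i++) {
        const double uv = lbv[i * 3] * v[0] + lbv[i * 3 + 1] * v[1]
                        + lbv[i * 3 + 2] * v[2];
        feq[i] = rho * lbw[i] * (1 + 3.0 * uv + 4.5 * uv * uv - 1.5 * modv);
    }
    return 0;
}
```

Callers then read only the first NQ entries of feq and ignore the padded one.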
The DL_MESO code contains many similar constructs that can be improved in the same way. Three more loops in the same source file were modified this way, and each of them became 15% faster.
Balancing overhead against optimization trade-offs
The padding technique used for the first two loops comes at a price:
- In terms of performance, we removed the overhead of the scalar remainder but introduced extra computation in the vector part.
- In terms of code maintenance, we had to rework the allocation of the data structures, and the values that may end up in compiler directives now have to change with the input data size and the type of workload.
In our case the performance gain outweighed the cost, and the added code complexity was acceptable.
Data layout optimization: structure of arrays
Vectorization, data padding and data alignment improved the performance of hotspot No. 1 by 25-30%, and the vectorization efficiency reported by Advisor XE rose to 56%.
Since 56% is still far from 100%, the developers decided to find out what was holding performance back. Looking at the Vector Issues/Recommendations column again, they found a new issue: "Possible inefficient memory access patterns present". The corresponding recommendation is to run a "Memory Access Patterns" (MAP) analysis. The same advice also appeared in the Traits column.
To start a MAP analysis, the developer marks the loops of interest and clicks the [Start MAP] button on the [Workflow] panel.
The stride distribution in the MAP results shows both unit-stride (sequential) accesses and non-unit "constant stride" accesses, that is, memory accesses with a fixed step.
The MAP data shown next to the source code reveals strides of 3 (for the original scalar version) and 12 (for the padded, vectorized version) when the lbv array is accessed (see the table at the end).
The stride-3 access applies to the elements of the lbv velocity array: every new iteration moves forward by 3 array elements. Where the 3 comes from is obvious from the index expression lbv[i * 3 + X], which determines this memory access pattern.
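One plausible reading of the two reported strides is sketched below, assuming the vectorized loop advances four iterations (VL = 4) at a time. The helper names are hypothetical and exist only to make the index arithmetic explicit.

```c
/* Index of the X component that iteration i reads in the interleaved
 * lbv array (layout x0, y0, z0, x1, y1, z1, ...). */
int lbv_x_index(int i)
{
    return i * 3;
}

/* Distance, in array elements, between the X components read by two
 * loop steps that are `step` iterations apart. */
int lbv_stride(int step)
{
    return lbv_x_index(step) - lbv_x_index(0);
}
```

Under that assumption, a scalar loop (one iteration per step) sees a stride of 3, while a 4-wide vector loop sees 4 * 3 = 12 between successive steps.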
Such non-contiguous access is a poor fit for vectorization, because the elements cannot all be loaded into a vector register with a single mov-type instruction (the vector versions of the instruction go by other names). However, constant-stride access can often be turned into sequential access by converting an "array of structures" into a "structure of arrays". Note that after the MAP analysis the Recommendations window advises exactly this (AoS -> SoA) to solve the inefficient memory access problem; see the screenshot above.
The developers applied this transformation to the lbv array. The lbv array, which originally held the X, Y and Z velocities interleaved, was split into three arrays: lbvx, lbvy and lbvz.
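A minimal sketch of this AoS -> SoA split follows. The array names lbvx, lbvy and lbvz come from the article; the functions `split_lbv` and `equilibrium_soa`, and passing the arrays as parameters, are assumptions made to keep the example self-contained.

```c
/* Split interleaved velocities (x0, y0, z0, x1, y1, z1, ...) into three
 * separate arrays, one per dimension. */
void split_lbv(const double *lbv, double *lbvx, double *lbvy, double *lbvz,
               int nq)
{
    for (int i = 0; i < nq; i++) {
        lbvx[i] = lbv[i * 3];
        lbvy[i] = lbv[i * 3 + 1];
        lbvz[i] = lbv[i * 3 + 2];
    }
}

/* Equilibrium kernel on the SoA layout: every array is now read with
 * unit stride, so consecutive iterations touch consecutive elements. */
int equilibrium_soa(double *restrict feq, const double *restrict v, double rho,
                    const double *restrict lbvx, const double *restrict lbvy,
                    const double *restrict lbvz, const double *restrict lbw,
                    int nq)
{
    const double modv = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
#pragma omp simd
    for (int i = 0; i < nq; i++) {
        const double uv = lbvx[i] * v[0] + lbvy[i] * v[1] + lbvz[i] * v[2];
        feq[i] = rho * lbw[i] * (1 + 3.0 * uv + 4.5 * uv * uv - 1.5 * modv);
    }
    return 0;
}
```

In a real code base the split would normally happen once, where the lattice is set up, so that the hot loops never see the interleaved layout.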
The DL_MESO developers said that the array-of-structures conversion took more time than the padding, but the result was worth the effort: the loop in fGetEquilibriumF became twice as fast, and several other loops that work with the lbv array improved in a similar way.
Results of the data structure conversion and loop optimization (padding, alignment), together with the performance metrics and Memory Access Patterns from Advisor XE:
Evolution of the source code of the loop in question (vectorization directive, alignment, padding, AoS -> SoA conversion) and the MAP results from Advisor XE:
Summary
By analyzing DL_MESO with Vectorization Advisor and adding a few directives to the code, the developers cut the time of the three hottest loops by 10-19%. All of the optimizations were based on Advisor's recommendations. The work consisted of enabling vectorization and improving loop performance with padding. Applying the same techniques to a few more loops made the whole application 18% faster.
The next big improvement came from converting the data from an "array of structures" to a "structure of arrays". Once again, this was based on an Advisor XE recommendation.
Moreover, all of the work and testing was first done on a server with Xeon processors (Advisor does not yet run on the coprocessor). When the same changes were applied to the code running on the Xeon Phi coprocessor, we got roughly the same gains, so the coprocessor was optimized with no extra effort.
The chart below shows the speedups obtained on a regular server (labeled AVX) and on a Xeon Phi card (labeled KNC): 2.5x on the Xeon CPU and 4.1x on Xeon Phi.
To quote a DL_MESO developer: "I'm already sold on this tool; what I need now is a version for Xeon Phi!" (Luke Mason, computational scientist at Daresbury Laboratory).
This post is a translation of an article by Zakhar Matveev (Intel Corporation).