This article reflects lecture and practical material given to students in the continuing education course "Programming Texas Instruments C66x multicore digital signal processors", held annually at Ryazan State Radio Engineering University. The article was originally meant for publication in a scientific and technical journal. Because of how detailed the topics turned out to be, we decided instead to collect the material into a training manual on multicore DSPs. Until then, the material is being gathered here and may be freely available on the Internet. Feedback and suggestions are welcome.
Introduction
The industry that produces high-performance processor components is going through a characteristic stage: the transition to multicore architectures [1, 2]. This transition is a forced measure rather than a natural step in processor evolution. Semiconductor technology could no longer keep shrinking and raising clock frequencies to gain performance, because energy efficiency dropped sharply along that path. Processor manufacturers saw multicore architectures as the logical way out, since they allowed processing capability to keep growing. This stage affects processor technology in general. It affects digital signal processors in particular, because they serve specific application areas and face special requirements for computational efficiency, internal and external data transfer efficiency, low power consumption, size and price.
For a developer of real-time signal processing systems, the move to multicore DSP architectures raises three main problems. The first is the hardware platform itself: its functionality, the allocation of specific blocks and their operating modes, all determined by the manufacturer [1]. The second is adapting processing algorithms, and the way systems are organized, for implementation on multicore DSPs [3]. The third is developing digital signal processing software for multicore DSPs. Such software differs fundamentally from traditional single-core applications in several respects: distributing code fragments among cores, separating data, synchronizing cores, exchanging data and service information between cores, synchronizing caches, and so on.
Open Multi-Processing (OpenMP) is one of the most attractive options for porting existing "single-core" software to a multicore platform or for developing new "parallel" software. OpenMP is a set of compiler directives, functions and environment variables. They can be embedded in standard programming languages, chiefly C, and extend them with means for organizing parallel computation. This is the main advantage of the OpenMP approach: there is no need to invent or learn a new parallel programming language. Adding simple, clear compiler directives to standard code is enough to turn a single-core program into a multicore one. The only requirement is that the compiler for the given processor supports OpenMP. In other words, the processor manufacturer must make sure its compiler understands the standard OpenMP directives and translates them into the corresponding assembly code.
The OpenMP standard was developed by an association of several major computer manufacturers and is maintained by the OpenMP Architecture Review Board (ARB) [4]. It is therefore universal and not tied to any particular manufacturer's hardware platform. The ARB publishes the specifications of successive versions of the standard [5]. The OpenMP quick reference guide [6] is also useful.
Many recent works discuss the use of OpenMP in various applications and on various platforms [7-12]. Of particular interest are books that teach the basics of OpenMP thoroughly; in the domestic literature, these are [13-16].
This paper describes OpenMP directives, functions and environment variables, with a focus on digital signal processing tasks. The examples that illustrate each directive target a multicore DSP. The chosen hardware platform is the Texas Instruments TMS320C6678 multicore DSP [17], which contains 8 DSP cores. This multicore DSP platform is one of the most advanced and widely used on the domestic market. The paper also discusses how OpenMP mechanisms are organized internally, and optimization issues that matter for real-time signal processing.
Problem statement
Suppose the processing task is to form an output signal as the sum of two input signals of the same length:
z(n) = x(n) + y(n), n = 0, 1, …, N-1
A "single-core" implementation of this task in standard C/C++ looks like this:
void vecsum(float *x, float *y, float *z, int N)
{
    for (int i = 0; i < N; i++)
        z[i] = x[i] + y[i];
}
Now suppose we have the 8-core TMS320C6678 processor. The question is how to implement this program so that it exploits the multicore architecture.
One solution is to develop 8 separate programs and load one onto each of the 8 cores. That means 8 separate projects, each of which must respect the rules of joint execution: where the arrays are located in memory, how the arrays are split among the cores, and so on. Extra code is also needed to synchronize the cores, because one core finishing its part of the array does not mean the whole array is ready. You must either check by hand that all cores have finished, or have every core send a flag to one "main" core, which then reports that the output array is ready.
説æããã¢ãããŒãã¯æ£ç¢ºãã€å¹æçã§ãããå®è£ ããã®ã¯éåžžã«é£ããããããã«ããŠãéçºè ã¯æ¢åã®ãœãããŠã§ã¢ãå€§å¹ ã«æ¹è¯ããå¿ èŠããããŸãã ãœãŒã¹ã³ãŒããžã®æå°éã®å€æŽã§ãã·ã³ã°ã«ã³ã¢ãããã«ãã³ã¢ãžã®å®è£ ã«ç§»è¡ã§ããããã«ããããšèããŠããŸãã ãããOpenMPã解決ããåé¡ã§ãã
Initial OpenMP setup
Before a program can use OpenMP, OpenMP has to be connected to the project. For the TMS320C6678 this means modifying the project configuration file and the platform used, and adding references to the OpenMP components in the project properties. This article does not cover such platform-specific settings. Instead, we consider the more general initial OpenMP setup.
Because OpenMP is an extension of the C language, a program that uses its directives and functions must include the header that declares them:
#include <ti/omp/omp.h>
次ã«ãåŠçããã³ã¢ã®æ°ãã³ã³ãã€ã©ãŒïŒããã³OpenMPæ©èœïŒã«äŒããå¿ èŠããããŸãã OpenMPã¯ã«ãŒãã«ã§ã¯ãªãã䞊åã¹ã¬ããã§åäœããããšã«æ³šæããŠãã ããã 䞊åãããŒã¯è«ççãªæŠå¿µã§ãããã³ã¢ã¯ç©ççãªããŒããŠã§ã¢ã§ãã ç¹ã«ãè€æ°ã®äžŠåã¹ã¬ããã1ã€ã®ã³ã¢ã«å®è£ ã§ããŸãã åæã«ãã³ãŒãã®çã®äžŠåå®è¡ã¯ãåœç¶ã䞊åã¹ã¬ããã®æ°ãã³ã¢ã®æ°ãšäžèŽããåã¹ã¬ãããç¬èªã®ã³ã¢ã«å®è£ ãããŠããããšãæå³ããŸãã å°æ¥çã«ã¯ãããããŸãã«ç¶æ³ã®ããã«èŠãããšä»®å®ããŸãã ãã ãã䞊åã¹ã¬ããã®æ°ãšãã®å®è£ ã®ã«ãŒãã«çªå·ã¯äžèŽããå¿ èŠããªãããšã«æ³šæããŠãã ããïŒ
As part of the initial OpenMP setup, the number of parallel threads is set with the following OpenMP function:
omp_set_num_threads(8);
This sets the number of cores (threads) to 8.
䞊åãã£ã¬ã¯ãã£ã
We now want the program above to run on 8 cores. In OpenMP, it is enough to add the parallel directive to the code:
#include <ti/omp/omp.h>

void vecsum(float *x, float *y, float *z, int N)
{
    omp_set_num_threads(8);
    #pragma omp parallel
    {
        for (int i = 0; i < N; i++)
            z[i] = x[i] + y[i];
    }
}
Every OpenMP directive is written as a construct of the following form:
#pragma omp <directive-name> [<option>[(<parameters>)] [[,] <option>[(<parameters>)]] …]
In our case no options are used. The parallel directive marks the code fragment that follows it, enclosed in curly braces, as a parallel region. That region must run on the specified number of cores rather than on one.
The result is a program that runs on one main, or master, core. The fragment marked with the parallel directive runs on the specified number of cores, including both the master core and the slave cores. In the resulting implementation, each of the 8 cores executes the entire vector-addition loop.
Fig. 1 shows the typical organization of parallel computation in OpenMP.
Figure 1. The principle of parallel computation in OpenMP
Program execution always begins with a sequential region, executed on one core by the master thread. Where a parallel region begins, marked by the corresponding OpenMP directive, the code that follows is executed in parallel by a team of threads. For simplicity, the figure shows only 4 parallel threads. At the end of the parallel region the threads join: they wait for one another to finish, and then the sequential region continues.
So we can use 8 cores to run the program, but this kind of parallelization is pointless because every core does the same job. The 8 cores produced the output array 8 times over, and processing time did not decrease. Clearly, the work has to be divided among the cores.
Consider an analogy. Suppose we form a team of 8 people: one chief and seven assistants. The team receives requests for various jobs. The chief accepts each order and carries it out, bringing in assistants where possible. The team's first job was to translate a text from English into Russian. The chief accepted the job, took the source text, prepared dictionaries, made a copy of the text for each assistant, and handed everyone the same text. The translation was completed and the task was solved correctly. Having 7 assistants, however, brought no benefit. Quite the opposite: if they had to share a dictionary, a computer or the source text, the job may even have taken longer. This is exactly how OpenMP behaved in our first example. The work has to be divided, so each employee must be told which part of the common text to translate.
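You can observe this duplicated work directly. The sketch below illustrates the point; it is not code from the article. It assumes a desktop OpenMP compiler (for example GCC with -fopenmp) and the standard <omp.h> instead of TI's <ti/omp/omp.h>, and it supplies single-thread stubs when OpenMP is disabled. The hypothetical function iterations_without_worksharing() wraps a loop in a plain parallel region with no work sharing and counts the iterations each thread executes.

```c
/* Illustrative sketch. Assumption: standard <omp.h> (e.g. GCC -fopenmp)
   instead of TI's <ti/omp/omp.h>; stubs are used when OpenMP is off. */
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64

/* Wraps a loop of N iterations in a plain "parallel" region (no "for")
   and returns the total number of iterations executed by all threads.
   *nthreads_out receives the size of the thread team. */
long iterations_without_worksharing(int N, int *nthreads_out)
{
    long count[MAX_THREADS] = {0};  /* one counter per thread: no shared updates */
    int nthreads = 1;

    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        #pragma omp single
        nthreads = omp_get_num_threads();

        if (t < MAX_THREADS)
            for (int i = 0; i < N; i++)  /* every thread runs the whole loop */
                count[t]++;
    }

    long total = 0;
    for (int t = 0; t < MAX_THREADS; t++)
        total += count[t];
    *nthreads_out = nthreads;
    return total;
}
```

With a team of 8 threads, iterations_without_worksharing(100, &t) should return 800 rather than 100, because the whole loop is executed once per thread.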
For array summation, the obvious way to divide the work is to distribute the loop iterations among the cores according to how many cores there are. Inside the parallel region, it is enough to find out which core is running the code and set the loop's iteration range from that core's number:
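#include <ti/omp/omp.h>

void vecsum(float *x, float *y, float *z, int N)
{
    omp_set_num_threads(8);
    #pragma omp parallel
    {
        /* declared inside the parallel region, so each thread has its own copy */
        int core_num = omp_get_thread_num();
        int a = (N / 8) * core_num;   /* first iteration of this core */
        int b = a + N / 8;            /* one past its last iteration */
        for (int i = a; i < b; i++)
            z[i] = x[i] + y[i];
    }
}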
The number is read with the OpenMP function omp_get_thread_num(). Strictly speaking, this function returns the number of the parallel thread, which we use here as the core index. Every core executes the function in the same way inside the parallel region, yet each core gets a different result, and that is what lets us divide the work inside the region. For simplicity, we assume the number of loop iterations N is a multiple of the number of cores. The core number can also be read from hardware. Each TMS320C6678 core has a special core-number register, DNUM, accessible through assembly instructions, functions of the CSL (Chip Support Library), and other means. Here, though, we can simply use the function provided by OpenMP. We must stress again that in OpenMP the core number and the parallel thread number are different concepts. For example, the third parallel thread might run on, say, the fifth core. In the next parallel region, or on another pass through the same region, the third thread might run on, say, the fourth core, and so on.
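We now have a program that runs on 8 cores. Each core processes its own part of the input arrays and produces the corresponding section of the output array. In the analogy, each employee translates his own 1/8 of the text, so ideally the task is solved 8 times faster.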
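The text assumes N is a multiple of the number of cores. Below is a minimal sketch of the same manual split that also handles a remainder. It rests on a few assumptions. block_range() and vecsum_manual() are hypothetical names, the team size comes from omp_get_num_threads() rather than the constant 8, and the standard <omp.h> with serial stubs replaces <ti/omp/omp.h>. The first N % T threads each get one extra iteration, so block sizes differ by at most one.

```c
/* Illustrative sketch. Assumption: standard <omp.h> (e.g. GCC -fopenmp)
   instead of TI's <ti/omp/omp.h>; stubs are used when OpenMP is off. */
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Half-open range [*begin, *end) of the iterations given to thread t
   out of T threads for a loop of N iterations. The first N % T threads
   get one extra iteration, so block sizes differ by at most one. */
void block_range(int N, int T, int t, int *begin, int *end)
{
    int base = N / T;
    int rem  = N % T;
    *begin = t * base + (t < rem ? t : rem);
    *end   = *begin + base + (t < rem ? 1 : 0);
}

/* Manual split of the vector sum across the thread team, valid for any N */
void vecsum_manual(const float *x, const float *y, float *z, int N)
{
    #pragma omp parallel
    {
        int a, b;  /* declared inside the region, hence private */
        block_range(N, omp_get_num_threads(), omp_get_thread_num(), &a, &b);
        for (int i = a; i < b; i++)
            z[i] = x[i] + y[i];
    }
}
```

For example, with N = 10 and 4 threads the blocks are [0, 3), [3, 6), [6, 8) and [8, 10).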
The for and parallel for directives
So far we have used the simplest directive, parallel, which selects a fragment of code to run concurrently on several cores. With this directive alone, however, all cores execute the same code and the work is not divided. We had to divide it ourselves.
䞊åé åå ã®äœæ¥ãã«ãŒãã«éã§ã©ã®ããã«åå²ãããããèªåçã«ç€ºããå Žåã«ãã£ãŠã¯è¿œå ã®forãã£ã¬ã¯ãã£ãã䜿çšããŸãã ãã®ãã£ã¬ã¯ãã£ãã¯foråã®ã«ãŒãã®çŽåã®äžŠåé åå ã§äœ¿çšãããã«ãŒãã«éã§ã«ãŒãã®ç¹°ãè¿ããåæ£ããå¿ èŠãããããšã瀺ããŸãã 䞊åãã£ã¬ã¯ãã£ããšforãã£ã¬ã¯ãã£ãã¯å¥ã ã«äœ¿çšã§ããŸãã
#pragma omp parallel
#pragma omp for
or, to shorten the code, combined into a single directive:
#pragma omp parallel for
With the parallel for directive, the array example becomes:
#include <ti/omp/omp.h>

void vecsum(float *x, float *y, float *z, int N)
{
    int i;
    omp_set_num_threads(8);
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        z[i] = x[i] + y[i];
}
Compared with the original single-core implementation, the differences are minimal: we included the omp.h header, set the number of parallel threads, and added a single line, the parallel for directive.
泚é1.æšè«ã§æå³çã«é ããã1ã€ã®éãã¯ãå€æ°iã®å®£èšãã«ãŒãããé¢æ°å€æ°ãèšè¿°ããã»ã¯ã·ã§ã³ã«ãããæ£ç¢ºã«ã¯ã³ãŒãã®äžŠåé åããé 次é åã«è»¢éããããšã§ãã ãã®ã¢ã¯ã·ã§ã³ã説æããã«ã¯ææå°æ©ã§ãããããã¯åºæ¬çãªãã®ã§ããããã©ã€ããŒããªãã·ã§ã³ãšå ±æãªãã·ã§ã³ã«é¢ããã»ã¯ã·ã§ã³ã§åŸã»ã©èª¬æããŸãã
Note 2. We said that loop iterations are divided among the cores but not exactly how. Which iterations run on which core? OpenMP has means to set rules for distributing iterations among parallel threads, which we will discuss later. Binding a specific core to specific iterations, however, is only possible manually, as shown earlier. Such binding is usually unnecessary anyway. If the number of iterations is not a multiple of the number of cores, the iterations are still spread across all cores so that the load is as even as possible.
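One way to see which iterations land on which thread is to record the owner of each iteration. The sketch below is only an illustration. It uses the schedule(static) option, which requests contiguous blocks assigned in thread order, and it assumes the standard <omp.h> with serial stubs. record_owners() is a hypothetical name.

```c
/* Illustrative sketch. Assumption: standard <omp.h> (e.g. GCC -fopenmp)
   instead of TI's <ti/omp/omp.h>; a stub is used when OpenMP is off. */
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
#endif

/* For a work-shared loop of N iterations, stores in owner[i] the number of
   the thread that ran iteration i and in hits[i] how many times it ran. */
void record_owners(int N, int *owner, int *hits)
{
    for (int i = 0; i < N; i++)
        hits[i] = 0;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) {
        owner[i] = omp_get_thread_num();
        hits[i] += 1;  /* iteration i belongs to exactly one thread: no race */
    }
}
```

With 8 threads and N = 20, owner[] typically holds runs such as 0 0 0 1 1 1 2 2 2 ..., while hits[] is 1 everywhere, so each iteration is executed exactly once.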
The sections and parallel sections directives
Work can be divided among cores in two ways: by dividing the data or by dividing the tasks. Recall our analogy. If all employees do the same thing (translate text) but each works on a different text, that is the first kind of division, data partitioning. Now suppose the employees do different things: one translates the whole text, another looks up words in the dictionary for him, and a third types up the translation. That is task partitioning. The parallel and for directives studied so far share work by dividing data. To divide tasks among cores, use the sections directive. Like the for directive, it can be written separately from the parallel directive or combined with it to shorten the code:
#pragma omp parallel
#pragma omp sections
and
#pragma omp parallel sections
As an example, here is a program that uses 3 processor cores. Each core runs its own algorithm on the input signal x:
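#include <ti/omp/omp.h>

/* Processing algorithms, defined elsewhere */
void Algorithm1(float *x);
void Algorithm2(float *x);
void Algorithm3(float *x);

void sect_example(float *x)
{
    omp_set_num_threads(3);
    #pragma omp parallel sections
    {
        #pragma omp section
        Algorithm1(x);
        #pragma omp section
        Algorithm2(x);
        #pragma omp section
        Algorithm3(x);
    }
}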
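Here is a runnable variant of this example. Three hypothetical functions, signal_mean(), signal_energy() and signal_peak(), stand in for Algorithm1 to Algorithm3. Each section writes its result to a different member of a shared structure, so the sections do not interfere with each other. As before, the standard <omp.h> is assumed in place of <ti/omp/omp.h>.

```c
/* Illustrative sketch. Assumption: standard <omp.h> (e.g. GCC -fopenmp)
   instead of TI's <ti/omp/omp.h>; the three algorithms are hypothetical. */
#ifdef _OPENMP
#include <omp.h>
#endif

static float signal_mean(const float *x, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s / n;
}

static float signal_energy(const float *x, int n)
{
    float e = 0.0f;
    for (int i = 0; i < n; i++)
        e += x[i] * x[i];
    return e;
}

static float signal_peak(const float *x, int n)
{
    float p = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > p)
            p = a;
    }
    return p;
}

typedef struct { float mean, energy, peak; } analysis_t;

/* Task parallelism: each section runs a different algorithm on the same input
   and writes a different member of the shared result r. */
analysis_t sect_example(const float *x, int n)
{
    analysis_t r;
#ifdef _OPENMP
    omp_set_num_threads(3);
#endif
    #pragma omp parallel sections
    {
        #pragma omp section
        r.mean = signal_mean(x, n);
        #pragma omp section
        r.energy = signal_energy(x, n);
        #pragma omp section
        r.peak = signal_peak(x, n);
    }
    return r;
}
```

For the input {1, -3, 2}, this returns mean 0, energy 14 and peak 3, whichever core runs which section.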
The shared, private and default options
Let us take a new example: the dot (scalar) product of two vectors. A simple C program that computes it looks like this:
float x[N];
float y[N];

void dotp(void)
{
    int i;
    float sum;
    sum = 0;
    for (i = 0; i < N; i++)
        sum = sum + x[i] * y[i];
}
Running it on 16-element test arrays gave:
[TMS320C66x_0] sum = 331.0
Now let us write a parallel version of this program with the parallel for directive:
float x[N];
float y[N];

void dotp(void)
{
    int i;
    float sum;
    sum = 0;
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        sum = sum + x[i] * y[i];   /* every thread updates the same sum */
}
Result:
[TMS320C66x_0] sum= 6.0
The program gives a wrong result! Why?
To answer this, we need to understand how the values of variables in the sequential and parallel regions are related. Let us look at the OpenMP logic in more detail.
The dotp() function starts running as a sequential region on processor core 0. At that point the arrays x and y, along with the variables i and sum, are allocated in processor memory. When the parallel directive is reached, OpenMP's service functions take over and organize the cores for parallel operation. The cores are initialized and synchronized, data are prepared, and they all start together. What happens to the variables and arrays?
Every object (variable or array) in OpenMP is either shared or private. Shared objects live in shared memory, and all cores use them on an equal footing inside the parallel region. A shared object is the very same object as the one of that name in the sequential region. It passes from the sequential region into the parallel region and back unchanged, keeping its meaning. Because all cores access such objects on an equal footing inside the parallel region, access conflicts can occur. In our example, the arrays x and y turned out to be shared by default, and so did the variable sum: every core used the same sum in memory. As a result, several cores sometimes read the same current value of sum at the same moment. Each then adds its own partial contribution and writes the new value back, and the core that writes last overwrites the others' results. For example, suppose sum holds 5 and two cores both read 5. One adds 2 and writes 7, the other adds 3 and writes 8, and the contribution of 2 is lost. This is why the example produced a wrong result.
Fig. 2 shows how shared and private variables are handled.
Figure 2. OpenMP operation with shared and private variables
Private objects are copies of the original objects, created separately for each core. These copies are created dynamically when the parallel region is initialized. In our example the variable i, as the loop iteration counter, is private by default. When the parallel directive is reached, 8 copies of it (one per parallel thread) are created in processor memory. Private variables are placed in each core's private memory. They may reside in local memory, or elsewhere, depending on how the variable is declared and how memory is configured. A private copy is never associated with the source object in the sequential region. By default, the source object's value is not carried into the parallel region, so a private copy's contents at the start of the region are undefined. At the end of the parallel region, the private copies' values are simply lost, unless special measures are taken to pass them back to the sequential region. We discuss those measures later.
To tell the compiler explicitly which objects are private and which are shared, OpenMP directives take the private and shared options. The objects in each class are listed, separated by commas, in parentheses after the corresponding option. In our case the variables i and sum must be private, and the arrays x and y shared. So we use a construct of the form
#pragma omp parallel for private(i, sum) shared(x, y)
䞊åé åãéããšãã ããã§ãåã³ã¢ã«ã¯ç¬èªã®ããããªãŒããããèç©ã¯äºãã«ç¬ç«ããŠè¡ãããŸãã ããã«ãåæå€ãäžæãªã®ã§ãããããªãŒããŒãã«ãªã»ããããå¿ èŠããããŸãã ããã«ãåã³ã¢ã§åŸãããç¹å®ã®çµæãã©ã®ããã«çµã¿åãããããšããåé¡ãçããŸãã 1ã€ã®ãªãã·ã§ã³ã¯ã8ã»ã«ã®ç¹æ®ãªå ±éé åã䜿çšããããšã§ããåã³ã¢ã¯çµæã䞊åé åå ã«é 眮ãã䞊åé åãé¢ããåŸãã¡ã€ã³ã³ã¢ã¯ãã®é åã®èŠçŽ ãåèšããŠæçµçµæã圢æããŸãã 次ã®ããã°ã©ã ã³ãŒããååŸããŸãã
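float x[N];
float y[N];
float z[8];   /* one cell per core for the partial results */

void dotp(void)
{
    int i, core_num;
    float sum;
    sum = 0;
    #pragma omp parallel private(i, sum, core_num) shared(x, y, z)
    {
        core_num = omp_get_thread_num();
        sum = 0;                  /* the private copy starts with an undefined value */
        #pragma omp for
        for (i = 0; i < N; i++)
            sum = sum + x[i] * y[i];
        z[core_num] = sum;        /* store this core's partial result */
    }
    for (i = 0; i < 8; i++)       /* sequential region: combine the partial results */
        sum = sum + z[i];
}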
Result:
[TMS320C66x_0] sum= 331.0
The program works correctly, but it is somewhat cumbersome. A way to simplify it is described below.
Interestingly, when an array name is listed as a private object as the parallel region starts, it behaves just like a variable: private copies of the array are created dynamically. A simple experiment confirms this. Declare an array with the private option and print the pointer to that array in both the sequential and the parallel region. With 8 cores, you will see 9 different addresses: one in the sequential region and one private copy per core.
次ã«ãé åã®èŠçŽ ã®å€ãäºãã«é¢é£ããŠããªãããšã確èªã§ããŸãã ãŸããåã䞊åé åãç¶ããŠå ¥åãããšãé åã®ãã©ã€ããŒãã³ããŒã®ã¢ãã¬ã¹ãç°ãªãå Žåããããããã©ã«ãã§ã¯èŠçŽ å€ã¯ä¿åãããŸããã ããã¯ãã¹ãŠã䞊åé åãéãããéãããããOpenMPãã£ã¬ã¯ãã£ããéåžžã«é¢åã§ãããç¹å®ã®å®è¡æéãå¿ èŠãšãããšããäºå®ã«ã€ãªãããŸãã
䞊åé åãéãããã®ãã£ã¬ã¯ãã£ãã§ãªããžã§ã¯ãã®ã¿ã€ãïŒãããªãã¯/ãã©ã€ããŒãïŒãæ瀺çã«ç€ºãããŠããªãå ŽåãOpenMPã¯[5]ã§èª¬æãããŠããç¹å®ã®ã«ãŒã«ã«åŸã£ãŠãåäœãããŸãã OpenMPãªããžã§ã¯ãã¯ããã©ã«ããšããŠèª¬æãããŠããŸããã ã¿ã€ãããã©ã€ããŒãã§ãããå ±æã§ãããã¯ãOpenMPæäœã®ãã©ã¡ãŒã¿ãŒã®1ã€ã§ããç°å¢å€æ°ã«ãã£ãŠæ±ºãŸããŸãããã®ãã©ã¡ãŒã¿ãŒã¯ãæäœäžã«èšå®ããã³å€æŽã§ããŸããäŸå€ã¯ãã«ãŒãå埩ã«ãŠã³ã¿ãŒãšããŠäœ¿çšãããå€æ°ã§ããããã©ã«ãã§ã¯ãã©ã€ããŒããšèŠãªãããŸãã確ãã«ããã®èŠåã¯forãparallel forãªã©ã®ãã£ã¬ã¯ãã£ãã«ã®ã¿é©çšãããããããããã®å€æ°ã«ã¯ç¹ã«æ³šæãæãããšããå§ãããŸãã
The default option helps here. It specifies the rule, the default type, for objects that are not listed explicitly. In C/C++ (up to OpenMP 5.0) it takes shared or none. Choosing none means no default type is assumed, so the type of every object used in the parallel region must be stated explicitly.
#pragma omp parallel private(sum, core_num) shared(x, y, z) default(shared)
or:
#pragma omp parallel private(i, sum, core_num) shared(x, y, z) default(none)
The reduction option
The 8-core dot-product example has one drawback: combining the cores' partial results required substantial code changes, which is cumbersome and inconvenient. Yet the OpenMP concept aims for maximum transparency when moving from a single-core to a multicore implementation and back. The reduction option simplifies the program from the previous section.
The reduction option tells the compiler that the cores' results must be combined, and it sets the rule for combining them. It covers many of the most common situations. Its syntax is:
reduction(<identifier> : <list of objects>)
identifier: specifies the operation that combines the private results. It also determines the initial value of the private variables that hold the partial results.
list of objects: the names of the variables in which each core accumulates its partial result.
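Table 1 lists all the identifiers the OpenMP standard currently provides for the reduction option, with the initial value of the private copies for each (max and min are available in C/C++ since OpenMP 3.1).

Table 1. Reduction identifiers and initial values
Identifier   Initial value of the private copies
+            0
*            1
-            0
&            ~0 (all bits set)
|            0
^            0
&&           1
||           0
max          smallest representable value of the type
min          largest representable value of the type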
In the dot-product program, we apply the reduction option with the + identifier to the variable sum:
float x[N];
float y[N];

void dotp(void)
{
    int i;
    float sum = 0;   /* the partial results are added to this initial value */
    #pragma omp parallel for private(i) shared(x, y) reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += x[i] * y[i];
}
Result:
[TMS320C66x_0] sum= 331.0
The program gives the correct result, stays very compact, and differs only minimally from the original "sequential" code!
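One loop can carry several reductions. The sketch below uses the hypothetical name signal_stats() and is illustrative only. It computes a signal's energy and peak absolute value in one pass, using reduction(+:...) and reduction(max:...). Note that max and min reductions in C require OpenMP 3.1 or later, so whether a given compiler accepts them depends on its OpenMP version. The sketch assumes a desktop compiler such as GCC with -fopenmp.

```c
/* Illustrative sketch. Assumption: an OpenMP 3.1+ compiler (e.g. GCC with
   -fopenmp) for the max reduction; without OpenMP it runs serially. */

/* One pass over x[0..n-1]: energy (sum of squares) and peak |x[i]|.
   Each thread accumulates its own private energy and peak; at the end of the
   loop OpenMP adds the energies and takes the maximum of the peaks, combining
   them with the values the variables had before the loop. */
void signal_stats(const float *x, int n, float *energy_out, float *peak_out)
{
    float energy = 0.0f;
    float peak = 0.0f;  /* |x[i]| >= 0, so 0 is a safe starting value */
    int i;

    #pragma omp parallel for reduction(+:energy) reduction(max:peak)
    for (i = 0; i < n; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        energy += x[i] * x[i];
        if (a > peak)
            peak = a;
    }

    *energy_out = energy;
    *peak_out = peak;
}
```

For the input {1, -4, 2, 0.5}, the energy is 21.25 and the peak is 4.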
Synchronization in OpenMP
Core synchronization is one of the main problems on multicore processors. When several cores work on one common problem at the same time, they generally need to coordinate their actions. If one core starts a step earlier than another, the result of the common work may be wrong. We already ran into a form of this problem when all cores worked with one shared variable: the lack of coordination produced a wrong result.
In general, synchronizing cores works as follows. At a certain point in the program, the synchronization point, all cores (or the required subset of them) stop and notify the others that they have reached it. None of them continues until all the others have reached the point too. After finishing one parallel fragment, the cores wait for each other and then move on to the next fragment together. Synchronizing cores (or parallel threads) means synchronizing data as well as program code. In other words, it includes cache synchronization: writing data modified in a cache back to main memory. This point is very important. In the OpenMP concept, cores work mainly with shared memory, and fragments of it are cached in each core's local memory. Because one core's cache and shared (main) memory are not kept coherent, other cores may not read correctly a shared variable that one core has modified.
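OpenMP has two kinds of synchronization: implicit and explicit. Implicit synchronization happens automatically at the end of a parallel region. It also happens at the end of some directives used inside a parallel region, including omp for and omp sections (unless the nowait option is given). Caches are synchronized automatically in these cases as well.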
Sometimes the algorithm requires the cores to synchronize at a point in the parallel region where no automatic synchronization happens. In that case the developer can use explicit synchronization: special directives tell the OpenMP compiler that synchronization is required at that point of the program. Let us look at the main ones.
The barrier directive
The barrier directive is written as follows:
#pragma omp barrier
䞊åé åå ã®äžŠåOpenMPã¹ããªãŒã ã®åæãã€ã³ããæ瀺çã«èšå®ããŸãã以äžã¯ããã£ã¬ã¯ãã£ãã®äœ¿çšäŸã§ãã
#include <stdio.h>
#include <ti/omp/omp.h>

#define CORE_NUM 8

float z[CORE_NUM];

void arr_proc(void)
{
    int i, core_num;
    float sum;
    omp_set_num_threads(CORE_NUM);
    #pragma omp parallel private(core_num, i, sum)
    {
        core_num = omp_get_thread_num();
        z[core_num] = core_num;          /* stage 1: generate data */
        #pragma omp barrier
        sum = 0;                         /* stage 2: process the whole array */
        for (i = 0; i < CORE_NUM; i++)
            sum = sum + z[i];
        #pragma omp barrier
        z[core_num] = sum;               /* stage 3: write the results back */
    }
    for (i = 0; i < CORE_NUM; i++)
        printf("z[%d] = %f\n", i, z[i]);
}
This program models a signal-processing scenario with three steps on the array z: generating data in it, processing that data, and writing the processing results back to it. In the first stage, each core writes its own number into its cell of z, which lives in shared memory. Next, every core performs the same processing on the input array: it computes the sum of the elements. Then each core writes the result into the cell of z that matches its core number. In the end, every cell of the array should hold the same value, 0 + 1 + … + 7 = 28. Without the barrier directives, however, this does not happen: the cells of z differ and are, in general, arbitrary. Moving from stage 1 to stage 2, cores would not wait for each other and would start processing data that is not ready yet. Moving from stage 2 to stage 3, some cores would start writing results into z while others are still reading it for processing. Only with both barrier directives is the program guaranteed to run correctly and write the same result into every element of z. Synchronizing the executed code also synchronizes the data, that is, the caches.
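The same produce-then-consume pattern shows up in everyday DSP code. As an illustration not taken from the article, the hypothetical function normalize() below scales a signal so that its peak magnitude becomes 1. Each thread first finds the peak of its own block. A barrier then guarantees that every partial peak has been written before any thread reads them. Assumptions: the standard <omp.h>, a team of at most 8 threads requested through the num_threads option, and serial stubs when OpenMP is off.

```c
/* Illustrative sketch. Assumption: standard <omp.h> (e.g. GCC -fopenmp)
   instead of TI's <ti/omp/omp.h>; stubs are used when OpenMP is off. */
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 8

/* Scales x[0..n-1] so that max |x[i]| becomes 1 (unless the signal is all zeros). */
void normalize(float *x, int n)
{
    float part[MAX_THREADS];  /* shared: one partial peak per thread */

    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int T = omp_get_num_threads();        /* never exceeds MAX_THREADS */
        int t = omp_get_thread_num();
        int a = (int)((long)n * t / T);       /* start of this thread's block */
        int b = (int)((long)n * (t + 1) / T); /* end of this thread's block */

        /* Stage 1: peak of the thread's own block */
        float p = 0.0f;
        for (int i = a; i < b; i++) {
            float v = x[i] < 0.0f ? -x[i] : x[i];
            if (v > p)
                p = v;
        }
        part[t] = p;

        /* Without this barrier a thread could read part[k] before thread k wrote it */
        #pragma omp barrier

        /* Stage 2: global peak from all partial peaks, then scale the own block */
        float peak = 0.0f;
        for (int k = 0; k < T; k++)
            if (part[k] > peak)
                peak = part[k];
        if (peak > 0.0f)
            for (int i = a; i < b; i++)
                x[i] /= peak;
    }
}
```

No second barrier is needed here, because after the first one the threads only read part[] and write disjoint blocks of x.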
The critical directive
The critical directive is written as follows:
#pragma omp critical [(name)]
It marks a code fragment inside a parallel region that only one core can execute at a time.
In signal processing, the processing algorithm may require that a certain code fragment never run on several cores at the same time. Such a fragment can be marked with the critical directive. Here is an example:
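#include <ti/omp/omp.h>

#define CORE_NUM 8
#define N 1000
#define M 80

void poc_A(int *A, int n);   /* processing step, defined elsewhere */

void crit_ex(void)
{
    int i, j;
    int A[N];
    int Z[N] = {0};
    omp_set_num_threads(CORE_NUM);
    #pragma omp parallel for private(A, j)
    for (i = 0; i < M; i++) {
        poc_A(A, N);                 /* runs in parallel on private copies of A */
        #pragma omp critical
        for (j = 0; j < N; j++)      /* accumulation: one core at a time */
            Z[j] = Z[j] + A[j];
    }
}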
In this program, a loop repeats M times: array A is processed (poc_A), and the result is accumulated in array Z. In the multicore implementation, the iterations of this loop are distributed among 8 cores. Array A is private, so each core processes its own copy independently. These processing steps do not depend on each other, so they can run in parallel on all cores. Accumulation, however, combines every core's results in the common array Z. Without special synchronization measures, the parallel threads would access one shared resource and corrupt each other's work. To prevent this, we can forbid parallel threads from executing this spot at the same time. The first core to take the resource (here, a piece of code) holds it until it has completed all its steps. Meanwhile, the other cores wait at the entry to the critical section for the resource to be released. In effect, the program switches to sequential processing inside the parallel region.
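Below is a self-contained version of this example whose result can be checked. The processing step is replaced by a hypothetical fill_block(A, n, k). Unlike the article's poc_A(A, N), it also receives the iteration number k, so each iteration contributes different data. Z is passed in by the caller. With A[j] = k + j and M = 80, the expected result is Z[j] = 80 j + 3160, whatever the number of threads.

```c
/* Illustrative sketch. Assumption: a desktop OpenMP compiler (e.g. GCC with
   -fopenmp); fill_block() is a hypothetical stand-in for poc_A(). */
#define N 1000
#define M 80

/* Hypothetical processing step: output depends on the iteration number k */
static void fill_block(int *A, int n, int k)
{
    for (int j = 0; j < n; j++)
        A[j] = k + j;
}

/* Accumulates M processed blocks into Z[0..N-1]. fill_block() runs in
   parallel on private copies of A; the accumulation into the shared Z is
   serialized by the critical section. */
void crit_ex(int *Z)
{
    int i, j;
    int A[N];

    for (j = 0; j < N; j++)
        Z[j] = 0;

    #pragma omp parallel for private(A, j)
    for (i = 0; i < M; i++) {
        fill_block(A, N, i);
        #pragma omp critical
        for (j = 0; j < N; j++)
            Z[j] = Z[j] + A[j];
    }
}
```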
In the crit_ex() code, replace the critical section with the following construct:
#pragma omp critical (Z1add)
for (j = 0; j < N; j++)
    Z1[j] = Z1[j] + A[j];
#pragma omp critical (Z2mult)
for (j = 0; j < N; j++)
    Z2[j] = Z2[j] * A[j];
There are now two critical sections: one combines the cores' results by summation, the other by multiplication. Only one core at a time can execute each section, but different sections can run at the same time on different cores. When the critical directive has a name, a core is kept out only if another core is currently inside a critical section of the same name. All unnamed critical sections count as one: a core cannot enter any of them while another core is inside any of them, even if the sections are unrelated.
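The named sections can be exercised with a small sketch; the names and data are hypothetical, not from the article. Here the processing step produces the values 1 and 2 alternately. After m = 8 iterations, every element of Z1 should equal 12 and every element of Z2 should equal 2^4 = 16, regardless of the order in which threads enter the two sections. Because the sections have different names, one thread can update Z1 while another updates Z2.

```c
/* Illustrative sketch. Assumption: a desktop OpenMP compiler (e.g. GCC with
   -fopenmp); fill_block() and crit_named() are hypothetical names. */
#define NMAX 1000

/* Hypothetical processing step: A[j] alternates between 1 and 2 */
static void fill_block(int *A, int n, int k)
{
    for (int j = 0; j < n; j++)
        A[j] = 1 + (k + j) % 2;
}

/* Combines m processed blocks by summation into Z1 and by multiplication
   into Z2 (n <= NMAX). Each named critical section admits one thread at a
   time, but the two names are independent of each other. */
void crit_named(int *Z1, int *Z2, int n, int m)
{
    int i, j;
    int A[NMAX];

    for (j = 0; j < n; j++) {
        Z1[j] = 0;
        Z2[j] = 1;
    }

    #pragma omp parallel for private(A, j)
    for (i = 0; i < m; i++) {
        fill_block(A, n, i);
        #pragma omp critical (Z1add)
        for (j = 0; j < n; j++)
            Z1[j] = Z1[j] + A[j];
        #pragma omp critical (Z2mult)
        for (j = 0; j < n; j++)
            Z2[j] = Z2[j] * A[j];
    }
}
```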
The atomic directive
The atomic directive is written as follows:
#pragma omp atomic [read | write | update | capture]
In the previous example, different cores were forbidden to execute code of the same region at the same time. On closer inspection, that may seem excessive. A conflict over a shared resource only arises when different cores access the same memory location at the same moment. Accessing different locations from within the same code fragment does no harm. The atomic directive therefore ties synchronization to memory elements rather than to code. It declares the memory operation in the next statement atomic, that is, indivisible. Once a core starts an operation on a memory location, the other cores cannot access that location until the first core finishes. The directive's option gives the kind of memory operation: read, write, update or capture. With the atomic directive, the example above becomes:
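#include <ti/omp/omp.h>

#define CORE_NUM 8
#define N 1000
#define M 80

void poc_A(int *A, int n);   /* processing step, defined elsewhere */

void crit_ex(void)
{
    int i, j;
    int A[N];
    int Z[N] = {0};
    omp_set_num_threads(CORE_NUM);
    #pragma omp parallel for private(A, j)   /* j must be private: every thread runs the inner loop */
    for (i = 0; i < M; i++) {
        poc_A(A, N);
        for (j = 0; j < N; j++) {
            #pragma omp atomic update
            Z[j] = Z[j] + A[j];              /* only this memory update is serialized */
        }
    }
}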
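A typical DSP use of atomic is building a histogram of signal samples, where different threads occasionally hit the same bin. The sketch below is illustrative. histogram() and NBINS are hypothetical names, and samples below lo or beyond the last bin are ignored. It uses the plain #pragma omp atomic form, which covers an update such as hist[b]++ and is accepted even by compilers that predate the update option.

```c
/* Illustrative sketch. Assumption: a desktop OpenMP compiler (e.g. GCC with
   -fopenmp); histogram() and NBINS are hypothetical names. */
#define NBINS 4

/* Counts samples x[0..n-1] into NBINS bins of the given width starting at lo.
   Only the increment of the shared bin is atomic; computing the bin index
   runs fully in parallel. */
void histogram(const float *x, int n, float lo, float width, int *hist)
{
    int i;
    for (i = 0; i < NBINS; i++)
        hist[i] = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (x[i] < lo)
            continue;  /* below the first bin */
        int b = (int)((x[i] - lo) / width);
        if (b < NBINS) {
            #pragma omp atomic
            hist[b]++;
        }
    }
}
```

For the samples {0.1, 0.6, 1.2, 1.9, 3.5, 0.2, -1.0, 4.5} with lo = 0 and width = 1, the bins hold 3, 2, 0 and 1. The samples -1.0 and 4.5 fall outside the range.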
In theory, the atomic directive should cut processing time considerably. Instead of making the whole loop sequential, it serializes only individual memory accesses, and only when different cores touch the same array element. In practice, the benefit depends on how the directive is implemented. For example, if atomic synchronization boils down to reading a flag in shared memory on every loop iteration, the loop may take much longer. Put in numbers: with the critical directive, the accumulation loop takes M × T1 processor cycles. Here M is the number of cores (not the iteration count M defined in the code) and T1 is the loop time on one core. With the atomic directive the loop takes T2 cycles. Because the loop now contains additional synchronization code, T2 can be M or more times larger than T1.
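The comparison in the previous paragraph can be summarized as follows. M is the number of cores, T1 is the single-core time of the accumulation loop under critical, and T2 is the loop time under atomic:

$$t_{\mathrm{critical}} \approx M\,T_1, \qquad t_{\mathrm{atomic}} \approx T_2, \qquad t_{\mathrm{atomic}} < t_{\mathrm{critical}} \iff T_2 < M\,T_1.$$

So atomic pays off only if its per-iteration synchronization overhead keeps T2 below M times T1.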
This article has described the main components of OpenMP. OpenMP extends high-level programming languages (C/C++) with directives that guide the compiler in parallelizing software for multicore processors. The article is distinctive in its focus on digital signal processing systems and in its sample programs, run on the Texas Instruments 8-core DSP TMS320C6678. The main advantage of OpenMP is how easily a single-core implementation becomes a multicore one. All core-interaction tasks, including data exchange and synchronization, are handled by standard OpenMP functions linked in at compile time. Convenient development, however, usually comes at the price of a less efficient solution. This article does not discuss OpenMP overhead; a separate paper on that topic is planned.
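Note, though, that the overhead of OpenMP directives is quite high, ranging from thousands to tens of thousands of clock cycles. Parallelization therefore makes sense only at a relatively coarse level. The computational load inside a parallel region should be large, and in most cases the cores should process their tasks without interacting.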
Keep in mind that the OpenMP standard defines only a common approach. How efficient OpenMP is depends on how its functions are implemented for a particular processor platform. For example, versions 1 and 2 of the OpenMP implementation that Texas Instruments developed for the TMS320C6678 differ greatly. The second version uses a number of hardware mechanisms to speed up interaction between cores and is much more efficient than the first. Future work will examine the main mechanisms behind OpenMP functions and analyze the costs of those functions. It will also estimate the execution time of OpenMP directives and give recommendations for using these mechanisms more efficiently.
References
1. G. Blake, RG Dreslinski, T. Mudge, «A survey of multicore processors,» Signal Processing Magazine, vol. 26, no. 6, pp. 26-37, Nov. 2009.
2. LJ Karam, I. AlKamal, A. Gatherer, GA Frantz, «Trends in multicore DSP platforms,» Signal Processing Magazine, vol. 26, no. 6, pp. 38-49, 2009.
3. A. Jain, R. Shankar. Software Decomposition for Multicore Architectures, Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL, 33431.
4. OpenMP Architecture Review Board (ARB) website: openmp.org.
5. OpenMP Application Programming Interface. Version 4.5 November 2015. OpenMP Architecture Review Board. P. 368.
6. OpenMP 4.5 API C/C++ Syntax Reference Guide. OpenMP Architecture Review Board, 2015.
7. J. Diaz, C. Muñoz-Caro, A. Niño, «A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era,» IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1369-1386, 2012.
8. A. Cilardo, L. Gallo, A. Mazzeo, N. Mazzocca, «Efficient and scalable OpenMP-based system-level design,» Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 988-991, 2013.
9. M. Chavarrías, F. Pescador, M. Garrido, A. Sanchez, C. Sanz, «Design of multicore HEVC decoders using actor-based dataflow models and OpenMP,» IEEE Transactions on Consumer Electronics, vol. 62, no. 3, pp. 325-333, 2016.
10. M. Sever, E. Çavus, «Parallelizing LDPC Decoding Using OpenMP on Multicore Digital Signal Processors,» 45th International Conference on Parallel Processing Workshops (ICPPW), pp. 46-51, 2016.
11. A. Kharin, S. Vityazev, V. Vityazev, N. Dahnoun, «Parallel FFT implementation on TMS320c66x multicore DSP,» 6th European Embedded Design in Education and Research Conference (EDERC), pp. 46-49, 2014.
12. D. Wang, M. Ali, «Synthetic Aperture Radar on Low Power Multi-Core Digital Signal Processor,» IEEE Conference on High Performance Extreme Computing (HPEC), pp. 1-6, 2012.
13. …, 2007, 138 pp.
14. …, 2006, 90 pp.
15. … OpenMP … 2009, 78 pp.
16. … OpenMP … 2012, 121 pp.
17. TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor, Datasheet, SPRS691E, Texas Instruments, p. 248, 2014.