CUDAã¢ãŒããã¯ãã£ã«é¢ããããã€ãã®èšè
ãŸããå ¬åŒããã¥ã¡ã³ã[1,2]ããã³ã¹ã©ã€ã[3,4] ãããŸããŸãªãµãŒãããŒãã£ãµã€ãã®è³æ[5-11]ã«åºã¥ããŠãããã°ã©ããCUDAã䜿çšãããšãã«ééããäžè¬çãªç¶æ³ãæãåºãããŠãã ããã æé«ã¬ãã«ã®æœè±¡åã§ã¯ãSIMTïŒ ã·ã³ã°ã«åœä»€ããã«ãã¹ã¬ãã ïŒã¢ãŒããã¯ãã£ãåãã䞊åã³ã³ãã¥ãŒãã£ã³ã°ã·ã¹ãã ãååŸããŸãã1ã€ã®ã³ãã³ããå€ããå°ãªããç¬ç«ããã¹ã¬ãããšäžŠè¡ããŠå®è¡ãããŸãã åäžã®ã¿ã¹ã¯ã®äžéšãšããŠå®è¡ãããããããã¹ãŠã®ã¹ã¬ããã®å šäœïŒå³1ãåç §ïŒã¯ã gridãšåŒã°ããŸãã
ã°ãªããã®äžŠåå®è¡ã¯ããŸãæåã«ãå®éã«ãããŒãå®è¡ããå€æ°ã®åäžã®ã¹ã«ã©ãŒããã»ããµããããªã«ãŒãäžã«ååšããããšã«ãã£ãŠä¿èšŒãããŸãïŒå³3ãåç §ïŒã ç©ççã«ïŒå³2ãåç §ïŒãã¹ã«ã©ãŒããã»ããµã¯ã¹ããªãŒãã³ã°ãã«ãããã»ããµ ïŒ SM ïŒã®äžéšã§ãã
ããšãã°ãTeslaã«ã¯30åã®SMããããåSMã«ã¯8åã®ã¹ã«ã©ãŒããã»ããµããããŸãã ãã ãããããã®240ã®ã³ã¢ã§ã¯ãå©çšå¯èœãªãªãœãŒã¹ïŒãããã®ã³ã¢ã®äœæ¥æéãšå©çšå¯èœãªã¡ã¢ãªã®äž¡æ¹ïŒãå ±æããããŒããŠã§ã¢ã¡ã«ããºã ã®ãããã§ãéåžžã«å€ãã®ã¹ã¬ããïŒ1ïŒããã°ãªãããå®è¡ã§ããŸãã ãããŠããããã®ã¡ã«ããºã ã ãã®å®è£ ã®ããã€ãã®æ©èœã¯ãå ±æã¡ã¢ãªã«ã¢ã¯ã»ã¹ãããšãã«ã¹ã¬ãããåæããããã®æè¡ã決å®ããŸãã
ãã®ãããªéèŠãªæ©èœã®1ã€ã¯ã ã¯ãŒãã®32åã®ãããŒã®ã°ã«ãŒãåã§ããããã¯ã倧ããªãã©ãŒã¡ãŒã·ã§ã³ã®äžéšã§ããããšãå€æããŸããã åãããã¯ã®ãã¹ãŠã®ã¹ã¬ããïŒããšãã°ãTeslaãããã¯ã®å Žåãæ倧512ã¹ã¬ããïŒ1ïŒãå«ããããšãã§ããŸãïŒã¯ãå³å¯ã«1ã€ã®SMã§å®è¡ãããããããªãœãŒã¹ã«ã®ã¿ã¢ã¯ã»ã¹ã§ããŸãã ãã ãã1ã€ã®SMã§è€æ°ã®ãããã¯ãèµ·åããããšãã§ãïŒå³3ãåç §ïŒããªãœãŒã¹ã¯ãããã®éã§åçã«åå²ãããŸãã
åSMã«ã¯ãããã»ããµæéãªãœãŒã¹ãé åžããå¶åŸ¡ãŠãããããããŸãã ããã¯ã1ã€ã®SMã®ãã¹ãŠã®ã«ãŒãã«ãåžžã«1ã€ã®ã¯ãŒããå³å¯ã«å®è¡ããããã«è¡ãããŸãã ãããŠãå®äºãããšããã®SMã«å²ãåœãŠããã次ã®ã¯ãŒãããcãªæé©ãªæ¹æ³ã§éžæãããŸãã ãããã£ãŠã1ã€ã®ã¯ãŒãã®ãããŒã¯CUDAã®ããŒããŠã§ã¢æ©èœã«ããåæãããSIMDïŒ åäžåœä»€ãè€æ°ããŒã¿ ïŒã«ããã«è¿ãæ¹æ³ã«åŸã£ãŠå®è¡ãããããšãããããŸãã ããããç°ãªãã¯ãŒãããã®1ã€ã®ãããã¯ã®ãããŒããèããåæããŠããªãå ŽåããããŸãã
ãã1ã€ã®éèŠãªæ©èœã¯ãCUDAã®ã¡ã¢ãªã®æ§æãšãããŸããŸãªéšåãžã®ã¹ã¬ããã®ã¢ã¯ã»ã¹ã§ãã ã¹ããªãŒã ã®äžè¬çãªæé«ã®ã¢ã¯ã»ã¹å¯èœæ§ã¯ãã°ã©ãã£ãã¯ã«ãŒãã«å¯å°ãããéç©åè·¯ã®åœ¢ã§ç©ççã«å®è£ ãããã°ããŒãã«ã¡ã¢ãª ïŒç©çã¡ã¢ãª ïŒã«ãã£ãŠæäŸãããŸãã ããã»ããµã®å€éšã«ããããããã®ã¿ã€ãã®ã¡ã¢ãªã¯ããããªã«ãŒãã§ã®èšç®çšã«æäŸãããŠããä»ã®ã¡ã¢ãªã«æ¯ã¹ãŠæãé ããªããŸãã ããå°ããªãã¢ã¯ã»ã¹å¯èœæ§ãã¯å ±æã¡ã¢ãªã§ããéåžžããµã€ãºã16KB ïŒ1ïŒã®åSMã«ãããããã¯ïŒå³2ãåç §ïŒã¯ããã®SMã®ã³ã¢ã§å®è¡ãããã¹ã¬ããã®ã¿ã«ã¢ã¯ã»ã¹ã§ããŸãïŒå³ãåç §ããŠãã ããïŒ 1ãå³3ïŒã 1ã€ã®SMã§ã®äžŠåå®è¡ã«è€æ°ã®ãããã¯ãå²ãåœãŠãããšãã§ãããããSMã§äœ¿çšå¯èœãªå ±æã¡ã¢ãªã®å šéããããã®ãããã¯ã«åçã«é åãããŸãã å ±æã¡ã¢ãªã¯SMã³ã¢ã«éåžžã«è¿ãå Žæã«ç©ççã«é 眮ãããŠãããããã¡ã¢ãªã®äž»ãªçš®é¡ã§ããã¬ãžã¹ã¿ã®é床ã«å¹æµããé«ãã¢ã¯ã»ã¹é床ãæã£ãŠããããšã«æ³šæããŠãã ããã åºæ¬çãªæ©æ¢°èªåœä»€ã®ãªãã©ã³ããšããŠäœ¿çšã§ããã¬ãžã¹ã¿ã§ãããæéã®ã¡ã¢ãªã§ãã 1ã€ã®SMã®ãã¹ãŠã®ãã£ãã·ã¥ã¬ãžã¹ã¿ã¯ããã®SMã§å®è¡ãããŠãããã¹ãŠã®ã¹ã¬ããã«åçã«åå²ãããŸãã ã¹ã¬ããã䜿çšããããã«å²ãåœãŠãããã¬ãžã¹ã¿ã®ã°ã«ãŒãã¯ã圌ã ãã䜿çšã§ããŸãã CUDAïŒãŸãã¯éã«çœå®³ã®èŠæš¡ïŒã®åã®å®äŸãšããŠïŒåããã¹ã©ã§ã¯ãåSMã¯16384åã®32ãããæ±çšã¬ãžã¹ã¿ãŒã®äœ¿çšãæäŸããŸãïŒ1ïŒ ã
äžèšã®ãã¹ãŠããã1ã€ã®ãããã¯ã®ãããŒéã®çžäºäœçšã¯å ±éã®é«éå ±æã¡ã¢ãªãä»ããŠã2ã€ã®ç°ãªããããã¯ã®ãããŒéã§ã¯ã°ããŒãã«ã¡ã¢ãªã®ã¿ã䜿çšããŠè©Šè¡ããããšçµè«ä»ããããšãã§ããŸãã ããã¯ãå°å ¥éšã§æŠèª¬ããåé¡ãçºçããå Žæã§ããã¡ã¢ãªé åã®èªã¿åããšæžã蟌ã¿ã«å ¬éãããŠããããŸããŸãªã¹ããªãŒã ã®ããŒã¿ã®é¢é£æ§ã远跡ããŸãã èšãæããã°ãã¹ã¬ããåæã®åé¡ã ãã§ã«è¿°ã¹ãããã«ã1ã€ã®ãããã¯å ã§ãåã¯ãŒãã®ãããŒã¯äºãã«åæããŠããŸãã ã¯ãŒãã¡ã³ããŒã·ããã«é¢ä¿ãªããããã¯ãããŒãåæããã«ã¯ãããã€ãã®ããªã¢ã¿ã€ãã®ã³ãã³ãããããŸãã
- __syncthreadsïŒïŒãæã確å®ãªæ¹æ³ã§ãã ãã®é¢æ°ã¯ãïŒaïŒ ãã®ãããã¯ã®ä»ã®ãã¹ãŠã®ã¹ã¬ããããã®ãã€ã³ãã«å°éããïŒbïŒ ãã®ãããã¯ã®ã¹ã¬ããã«ãã£ãŠå®è¡ãããå ±æã¡ã¢ãªããã³ã°ããŒãã«ã¡ã¢ãªã«ã¢ã¯ã»ã¹ãããã¹ãŠã®æäœãå®äºãããã®ãããã¯ã®ã¹ã¬ããããèŠããããã«ãªããŸã§ãåã¹ã¬ãããåŸ æ©ãããŸã ã æ¡ä»¶ä»ãifã¹ããŒãã¡ã³ãå ã«ãã®ã³ãã³ããé 眮ããå¿ èŠã¯ãããŸãããããããã¯ã®ãã¹ãŠã®ã¹ã¬ããã«ãããã®é¢æ°ãžã®ç¡æ¡ä»¶åŒã³åºããæäŸããå¿ èŠããããŸãã
- __threadfence_blockïŒïŒã¯ãå ±æã¡ã¢ãªããã³ã°ããŒãã«ã¡ã¢ãªãžã®ãã¹ãŠã®ã³ããããããã¢ã¯ã»ã¹æäœãå®äºãããã®ãããã¯ã®ã¹ã¬ããããèŠããããã«ãªããŸã§ã åŒã³åºããã¹ã¬ãããåŸ æ©ãããŸã ã
- __threadfenceïŒïŒã¯ãå ±æã¡ã¢ãªãžã®ãã¹ãŠã®ã³ããããããã¢ã¯ã»ã¹æäœããã®ãããã¯ã®ã¹ã¬ããããèŠããããã«ãªããŸã§ã¹ã¬ãããåŸ æ©ããããããã€ã¹ãäžã®ãã¹ãŠã®ã¹ã¬ããã«å¯ŸããŠã°ããŒãã«ã¡ã¢ãªã䜿çšããŸãã ãããã€ã¹ããšã¯ãã°ã©ãã£ãã¯ã¢ããã¿ãŒãæå³ããŸãã
- __threadfence_systemïŒïŒã¯__threadfenceïŒïŒã«äŒŒãŠããŸãããéåžžã«äŸ¿å©ãªããŒãžããã¯ã¡ã¢ãªã䜿çšããå ŽåãCPUïŒããã¹ããïŒäžã®ã¹ã¬ãããšã®åæãå¯èœã«ããŸãã 詳现ã«ã€ããŠã¯ã [1,2]ããã³ä»¥äžã®ãªã¹ãã«ãªã¹ããããŠããä»ã®ãœãŒã¹ãåç §ããŠãã ããã
æåã®ããŒã ã¯ã1ã€ã®ãããã¯ã®ãã¹ãŠã®ã¹ã¬ããã«1ã€ã®ããªã¢ãé 眮ããä»ã®3ã€ã¯ãåã¹ã¬ããã«ç¬èªã®ããªã¢ãé 眮ããŸãã ã°ãªããå šäœã®ãããŒãåæããã«ã¯ãä»ã®äœããèãåºãå¿ èŠããããŸãã ãã®ããŸã ããæ€èšããåã«ãæå³ã®ããCã³ãŒãã®äŸã瀺ãããšãã§ããããã«ãã¿ã¹ã¯ãæå®ããŸãã
ããå€ãã®ã¿ã¹ã¯
ãããã£ãŠãããå ·äœçã«ã¯ã次ã®äŸãæ€èšããŠãã ããã ã¢ããã¿ã®ã°ããŒãã«ã¡ã¢ãªã«2ã€ã®ã»ã¯ã·ã§ã³ãå²ãåœãŠãŸãïŒ X []ããã³P []é åã®128èŠçŽ ã é åX []ããã¹ãããïŒã³ã³ãã¥ãŒã¿ãŒã®RAMã®äžå€®åŠçè£ çœ®ã«ãã£ãŠïŒæžã蟌ãŸããããã«ããŸãã ããããã«64ã¹ã¬ããã®2ãããã¯ã®ã°ãªãããã€ãŸãåèš128ã¹ã¬ãããäœæããŸãïŒå³4ãåç §ïŒã
ããã§ãã¹ãããïŒ i ïŒãå®è¡ã§ããŸããçªå·jã®åã¹ããªãŒã ã¯ãé åX []ã®ãã¹ãŠã®èŠçŽ ãå ç®ããçµæãP [j]ã«æžã蟌ã¿ãŸãã 次ã«ãã¹ãããïŒ ii ïŒãå®è¡ããå¿ èŠããããŸããåjçªç®ã®ã¹ããªãŒã ã¯ãé åP []ã®ãã¹ãŠã®èŠçŽ ã®åèšãéå§ãã察å¿ããX [j]ã«æžã蟌ã¿ãŸãã ãã¡ãããCUDAã䜿çšããŠåãããšã128å䞊è¡ããŠå®è¡ããããšã¯ç¡æå³ã§ãããå®éã«ã¯åã¹ããªãŒã ã«ã¯å ç®ãçºçããç¬èªã®éã¿ä¿æ°ã®ã»ããããããå€æX- > P ãããã³ãã®éã P- > X-ã¯äœåºŠãçºçããŸã ãã®äŸã§ã¯ãæå¿«ããšåçŽãã®ããã«ããŠããã£ã«çããä¿æ°ãéžæããŸããããã¯äžè¬æ§ã«éåããŸããã
çè«ããå®éšã«ç§»ããŸãã ã¢ã«ãŽãªãºã ã¯éåžžã«ééçã§ããããã«ãã¹ã¬ãããæ±ã£ãããšããªã人ã¯ããã«æ¬¡ã®CUDAã«ãŒãã«ã³ãŒããææ¡ã§ããŸãã
__global__ void Kernel(float *X, float *P) { const int N = 128; // . const int index = threadIdx.x + blockIdx.x*blockDim.x; // . float a; // . . /* (i): */ a = X[0]; for(int j = 1; j < N; ++j) // , a += X[j]; P[index] = a / N; // , . /* (i). */ /* (ii): */ a = P[0]; for(int j = 1; j < N; ++j) // , a += P[j]; X[index] = a / N; // , . /* (ii). */ }
ãã®ã«ãŒãã«ãç¹°ãè¿ãå®è¡ãããšãé åP []ãæã åãã«ãªãããšãããããŸãããããã§ã¯X []ãç°ãªãå ŽåããããŸãã ããã«ãéããããå Žåãããã¯1ã€ã®èŠçŽ X [j]ã§ã¯ãªãã32ã®é£ç¶ããèŠçŽ ã®ã°ã«ãŒãã«ãªããŸãïŒ ãã®å Žåããšã©ãŒã®ãããããã¯ã®æåã®èŠçŽ ã®ã€ã³ããã¯ã¹ã32ã®åæ°ã«ãªããŸããããã¯ãéåžžã«ã¯ãŒãã®åæãšãããŸããŸãªã¯ãŒãovã®éåæã¹ããªãŒã ã®äžéšã®åæã®çŸãã§ãã äœããã®ã¹ã¬ããã§ãšã©ãŒãçºçããå Žåãããã¯åœŒã®æ®ãã®ã¯ãŒããã¹ãŠã«çºçããŸãã CUDAéçºè ãææ¡ããåæã¡ã«ããºã ãé©çšããå Žå
__global__ void Kernel(float *X, float *P) { ... /* (i). */ __syncthreads(); /* (ii): */ ... }
ãã®åŸãåãããã¯ã¹ããªãŒã ãåãçµæã«ãªãããã«ããŸãã ãããŠã©ãããééã£ãŠããå Žå-ãããã¯å šäœã ãããã£ãŠãç°ãªããããã¯ãåæããããšã¯ã©ããããããæ®ããŸãã
解決æ¹æ³
æ®å¿µãªãããç§ã¯2ã€ã®æ¹æ³ããç¥ããŸããã
- CUDAã«ãŒãã«ã¯ããã¹ãŠã®ã¹ã¬ãããçµäºããå Žåã«ã®ã¿çµäºããŸãã ãããã£ãŠã1ã€ã®ã³ã¢ã2ã€ã«åå²ããã¡ã€ã³ããã°ã©ã ããé 次åŒã³åºãããšãã§ããŸãã
- ã°ããŒãã«ã¡ã¢ãªã«ãã©ã°ã®ã·ã¹ãã ãèãåºããŸãã
ç§ã®ã¿ã¹ã¯ã§ã¯ããã®ãããªã«ãŒãã«ãé »ç¹ã«ïŒæ°ååïŒåŒã³åºãå¿ èŠããããããæåã®ãªãã·ã§ã³ã¯ããŸã奜ãã§ã¯ãããŸããã§ããããŸããã«ãŒãã«ã®éå§æã«è¿œå ã®é 延ãçºçããããšãæããçç±ããããŸãã åã³ã¢ã®éå§æã«ããã€ãã®å€æ°ãæºåããå¿ èŠãããå Žåã«ã®ã¿ãã«ãŒãã«é¢æ°ã®åŒæ°ãåŠçããŸããã倧ããªãã«ãŒãã«ã§ããã1åè¡ããšãCPUã«å¹²æžãããã°ã©ãã£ãã¯ã¢ããã¿ãŒãç¬èªã®ããŒã¿ãããžã¥ãŒã¹ã沞ããã®ã§ãããè«ççãã€é«éã«ãªããŸãã¡ã¢ãªã
ãã©ã°ã·ã¹ãã ã®2çªç®ã®ãªãã·ã§ã³ã«ã€ããŠã¯ãåæ§ã®ã¡ã«ããºã ã[1]ã®ã»ã¯ã·ã§ã³ãB.5ã¡ã¢ãªãã§ã³ã¹é¢æ°ãã«èšèŒãããŠããŸãã ãã ããããã§ã¯ãããã«ç°ãªãã«ãŒãã«ã¢ã«ãŽãªãºã ãèæ ®ãããŸãã ãããã¯åæãå®è£ ããããã«ã2ã€ã®æ©èœãå°å ¥ããŸãã1ã€ç®ã¯äœ¿çšæžã¿ãããã¯ã®ã«ãŠã³ã¿ãŒã®å€ãæºåãã2ã€ç®ã¯ããªã¢ã®åœ¹å²ãæãããŸãããã¹ãŠã®ãããã¯ãå®äºãããŸã§åã¹ã¬ãããé 延ãããŸãã ããšãã°ããããã®é¢æ°ãšãããã䜿çšããã«ãŒãã«ã¯æ¬¡ã®ããã«ãªããŸãã
__device__ unsigned int count; // - . //4 . /* -: */ __device__ void InitSyncWholeDevice(const int index) { if (index == 0) // grid` ( 0) count = 0; // . if (threadIdx.x == 0) // block` , - while (count != 0); // . // block` , : __syncthreads(); // , - . device - . } /* device: */ __device__ void SyncWholeDevice() { // : unsigned int oldc; // , gmem smem, grid`: __threadfence(); // block` ( ) //-: if (threadIdx.x == 0) { // oldc count "+1": oldc = atomicInc(&count, gridDim.x-1); // , "" gmem: __threadfence(); // ( count ), // count, , // gmem. , "", //, "" . if (oldc != (gridDim.x-1)) while (count != 0); } // , : __syncthreads(); } __global__ void Kernel_Synced(float *X, float *P) { InitSyncWholeDevice(threadIdx.x + blockIdx.x*blockDim.x); ... /* (i). */ SyncWholeDevice(); /* (ii): */ ... }
ããã ãã§ãã ãã©ã°ãå·»ãäžããããé¢æ°ãäœæãããŸããã 1çªç®ãš2çªç®ã®æ¹æ³ã®ããã©ãŒãã³ã¹ãæ¯èŒããããšã¯æ®ã£ãŠããŸãã ãã ããæ®å¿µãªããã SyncWholeDevice ïŒïŒé¢æ°ã¯ã«ãŠã³ã¿ãŒãã€ã³ã¯ãªã¡ã³ãããŸãããããªã¢é 延ãæäŸããŸããã ã©ãããŠã§ããïŒ çµå±ã whileã«ãŒãããããŸã ã ããã§ãèŠçŽã«èšèŒãããŠããèœãšãç©Žã«ç®ãåããŸããnvccã³ã³ãã€ã©ãŒ[12-14]ã«ãã£ãŠçæãããptxãã¡ã€ã«ãèŠããšã圌ã¯èŠç¹ãã空ã®ã«ãŒãã芪åã«æããŠããããšãããããŸãã å°ãªããšã2ã€ã®æ¹æ³ã§ããã®æ¹æ³ã§ã«ãŒããæé©åããªãããã«ã³ã³ãã€ã©ãŒã«åŒ·å¶ã§ããŸãã
ptxã¢ã»ã³ãã©ãŒãžã®æ瀺çãªæ¿å ¥ã¯ç¢ºå®ã«æ©èœããŸãã ããšãã°ããã®ãããªé¢æ°ã¯ããã®åŒã³åºãã§whileã«ãŒãã眮ãæããå¿ èŠããããŸã ã
__device__ void do_while_count_not_eq(int val) { asm("{\n\t" "$my_while_label: \n\t" " .reg .u32 r_count; \n\t" " .reg .pred p; \n\t" " ld.global.u32 r_count, [count]; \n\t" " setp.ne.u32 p, r_count, %0; \n\t" "@p bra $my_while_label; \n\t" "}\n\t" : : "r"(val)); }
æ§æçã«ãšã¬ã¬ã³ããªãã1ã€ã®æ¹æ³ã¯ãã«ãŠã³ã¿ãŒãã©ã°ã宣èšãããšãã«volatileæå®åã䜿çšããããšã§ãã ããã¯ãã°ããŒãã«ïŒãŸãã¯å ±æïŒã¡ã¢ãªå ã®å€æ°ããã€ã§ãã©ã®ã¹ã¬ããã§ãå€æŽã§ããããšãã³ã³ãã€ã©ãŒã«äŒããŸãã ãããã£ãŠããã®å€æ°ã«ã¢ã¯ã»ã¹ãããšãã¯ããã¹ãŠã®æé©åããªãã«ããå¿ èŠããããŸãã ã³ãŒãå ã§å€æŽããå¿ èŠãããã®ã¯2è¡ã®ã¿ã§ãã
__device__ volatile unsigned int count; // - . //4 . ... // oldc count "+1": oldc = atomicInc((unsigned int*)&count, gridDim.x-1); ...
解æ³ã®è©äŸ¡
ããã§ã2ã€ã®ãããã¯åææ¹æ³ã®ããã©ãŒãã³ã¹ã®å€§ãŸããªçè«çæšå®ãå®è¡ããŸãã åã«ãããšãã«ãŒãã«åŒã³åºãã«ã¯10ãã€ã¯ãç§çšåºŠããããŸããããã¯ãè€æ°ã®ã³ã¢åŒã³åºãã«ããåæã®ä»£åã§ãã ã«ãŒãããããªã¢ãå°å ¥ããŠåæãè¡ãå Žåãæ倧10åã®ã¹ã¬ããïŒãããã¯æ°ã«å¿ããŠïŒãã«ãŒãå ã®ã°ããŒãã«ã¡ã¢ãªå ã®1ã€ã®ã»ã«ãã€ã³ã¯ãªã¡ã³ãããŠèªã¿åããŸããåå ¥åºåæäœã«ã¯çŽ500ã¯ããã¯ãµã€ã¯ã«ããããŸãã åãããã¯ã«ãã®ãããªæäœ3ãå®è¡ãããŸãããã®åŸãåææäœã«çŽ10 * 500 * 3 = 1.5 * 10 ^ 4ãµã€ã¯ã«ãè²»ããããŸãã 1.5 GHzã®ã³ã¢åšæ³¢æ°ã§ã¯ã1.0 * 10 ^ïŒ-5ïŒç§= 10ÎŒsã«ãªããŸãã ã€ãŸãã倧ããã®é åºã¯åãã§ãã
ãããããã¡ãããå°ãªããšãããã€ãã®ãã¹ãã®çµæãèŠãããšã¯èå³æ·±ãã§ãã å³5ã§ã¯ããã¹ããªãŒããŒã¯ãã°ãªããæ§æããšã«10åç¹°ãè¿ãããX- > P- > Xã® 100åã®é£ç¶ããå€æã«è²»ããããæéã®æ¯èŒãèŠãããšãã§ããŸãã 100åã®å€æã«å¿ èŠãªæéãå¹³åããããã«ã10åã®ç¹°ãè¿ããè¡ãããŸãïŒ2ïŒ ã
æ°Žå¹³é¢ã«ã¯ãããªã¬ãŒããããããã¯ã®æ°ãšããããã®ã¹ã¬ããã®æ°ããããããããŸãã 瞊軞ã¯ã ããã«ãã«ãŒãã«èµ·åãã¡ãœããïŒ MKL ïŒã«å¯Ÿããã1ã«ãŒãã«èµ·åãã¡ãœããïŒ SKL-ã·ã³ã°ã«ã«ãŒãã«èµ·åïŒã®æéã²ã€ã³ã®å²åãè¡šããŸãã æ€èšäžã®ã°ãªããæ§æã®ã²ã€ã³ã¯ãéåžžã«å°ãããã®ã®ãã»ãŒåžžã«ãã©ã¹ã§ããããšãæ確ã«ããããŸãã ãã ãããããã¯æ°ãå€ãã»ã©ãããã©ãŒãã³ã¹ãé ããMKLã¡ãœããã¯å°ãªããªããŸãã 32ãããã¯ã®å Žåã圌ã¯SKLã¡ãœããããããããã«åªããŠããŸãã ããã¯ããããã¯ãå€ãã»ã©ãã¹ã¬ããïŒ threadIdx.x == 0ãæã€ïŒãé ãã°ããŒãã«ã¡ã¢ãªããã«ãŠã³ãå€æ°ãèªã¿åãããã§ãã ãããããäžåºŠèªãã§ããã¹ãŠã®ãããŒã«æå³ãäžããããšããã¡ã«ããºã ã¯ãããŸããã ãããã¯å ã®ã¹ã¬ããã®æ°ã«å¿ããçžå¯Ÿçãªçç£æ§ã®å€åãèæ ®ããå Žåããããã¯èªäœã®æ°ã¯äžå®ã§ãããããäžå®ã®èŠåæ§ã«ãæ°ä»ãããšãã§ããŸãã ããããããã§ã¯ããããã¯å ã®ãããŒã®åæãSMã§ã®ã¯ãŒãã®ç®¡çã«é¢é£ãããèè ã®äœæ¥ã«ã¯äžæãªå¹æããããŸãã ãããã£ãŠããã以äžã®ã³ã¡ã³ãã¯æ§ããŸãã
åãæ°ã®äœæ¥ã¹ã¬ããïŒ1024ïŒã§ããããã¯ãžã®åå²ãç°ãªãããã©ãŒãã³ã¹ãèŠãã®ã¯èå³æ·±ãã§ãã å³6ã¯ã2ã€ã®ã¡ãœããïŒMKLãšSKLïŒã®äžèšã®å€æã®100 * 10ã«è²»ããããç¹°ã蟌ã¿æéã®ã°ã©ãã瀺ããŠããŸãã
å®éãããã¯å³5ã®æãã®ãã¹ã©ã€ã¹ãã§ãã æåã¯ããããã¯ã倧ãããªããšãäž¡æ¹ã®åææ¹æ³ã®ããã©ãŒãã³ã¹ãçããåäžããããšãã¯ã£ãããšããããŸãã CUDAã®éçºè ã¯å ¬åŒããã¥ã¡ã³ãã§ãã®ãããªå¹æãèŠåããŠããŸã[2]ããèè ã¯åã³æ®å¿µãªãããã®çŸè±¡ã®ã¡ã«ããºã ã®è©³çŽ°ãç¥ããŸããã ãã§ã«è¿°ã¹ãããã«ãå€æ°countã®èªã¿åãåæ°ã®å¢å ã«äŒŽããã®ã£ããã®çž®å°ãšããããã¯ãžã®æå°ã®åå²ã«ããSKLã¡ãœããã®æ倱ããé¢é£ããŠããŸã ã
whileã«ãŒãã ptx-assembler insertã«çœ®ãæããããšã«ãããSKLã¡ãœããã®å®è£ äžã«ãã¹ããå®è¡ãããããšã«æ³šæããŠãã ããã æ®çºæ§æå®åã䜿çšãããšïŒã°ãªããæ§æã«å¿ããŠïŒãããã»ã¹ã®é床ãäœäžããå Žåããããé床ãåäžããå ŽåããããŸãã æžé床ã®å€§ããã¯0.20ïŒ ã«éããå é床ã¯0.15ïŒ ã§ãã ã»ãšãã©ã®å Žåããã®åäœã¯ãã³ã³ãã€ã©ãŒã«ããwhileã«ãŒãã®å®è£ ã®ç¹åŸŽãšã人ã«ããptx-assemblerã®æ¿å ¥ã«ãã£ãŠæ±ºå®ãããSKLã¡ãœããã®äž¡æ¹ã®å®è£ ãçããçç£çã§ãããšèããããšãã§ããŸãã
ãããã«
ãã®èšäºã§ã¯ãã¹ã¬ããåæã®åé¡ããããã¯åæã®æ¹æ³ãåºæ¬ã¬ãã«ã§æããã«ããããšããŸããã ããã€ãã®ãã¹ãã®åŸãCUDAã·ã¹ãã ã®äžè¬çãªèª¬æãåçã«äžããŸãã ããã«ããã¹ãããã°ã©ã ã®ãœãŒã¹ã³ãŒãïŒ2ïŒã§ãèªè ã¯å ±æã¡ã¢ãªå ã®ãããã¡ã®ä¿¡é Œã§ãã䜿çšã®å¥ã®äŸãèŠã€ããããšãã§ããŸãïŒã¹ã¬ããã¯__syncthreadsïŒïŒãä»ããŠåæãããŸãïŒã 誰ããããã圹ç«ã€ããšãé¡ã£ãŠããŸãã å人çã«ã¯ããã®æ å ±ã1ãæã§åéããããšã§ãã³ãŒããäœåºŠãè©ŠããŠãã°ãŒã°ã«ã§ãæ€çŽ¢ããæéãç¯çŽã§ããŸããããã¯ãããã¥ã¡ã³ããããŸã泚ææ·±ãèªãŸãªãæããªåŸåãããããã§ãã
ïŒ1ïŒ CUDA APIé¢æ°cudaGetDeviceProperties ïŒ...ïŒ [ 1-2ã15 ]ã䜿çšããŠãã³ã³ãã¥ãŒã¿ãŒã§äœ¿çšå¯èœãªã¢ããã¿ãŒã«é¢ããæè¡æ å ±ãååŸããããšããå§ãããŸãã
ïŒ2ïŒ pastebin.comã«ã¢ããããŒãããããã¹ãããã°ã©ã ã®ãœãŒã¹ã³ãŒã ã
æ å ±æºã®ãªã¹ã
[1] CUDA Cããã°ã©ãã³ã°ã¬ã€ã
[2] CUDA Cãã¹ããã©ã¯ãã£ã¹ã¬ã€ã
[3]é«åºŠãªCUDAãŠã§ãããŒïŒ ã¡ã¢ãªã®æé©å
[4] S. Tariqã GPUã³ã³ãã¥ãŒãã£ã³ã°ãšCUDAã¢ãŒããã¯ãã£ã®çŽ¹ä»
[5]ãŽã¡ã³ããŒãã«ã倧åŠãACCREã GPU Computing with CUDA
[6] OmSTUãç¡ç·å·¥åŠéšãçµ±åæ å ±ä¿è·éšã åãã¬ãŒãã³ã°ããã°ã©ã ãProgramming for GPUã
[7]å€ã®ã¹ãŒããŒã³ã³ãã¥ãŒã¿ãŒã¢ã«ãããŒã NVIDIAã°ã©ãã£ãã¯ã¢ã¯ã»ã©ã¬ãŒã¿ã䜿çšããé«æ§èœã¯ã©ã¹ã¿ãŒã³ã³ãã¥ãŒãã£ã³ã°
[8] iXBT.comïŒNVIDIA CUDA-GPUã§ã®éã°ã©ãã£ãã¯ã³ã³ãã¥ãŒãã£ã³ã°
[9] cgm.computergraphics.ruïŒCUDA ãã¯ãããžãŒã®çŽ¹ä»
[10] THG.ruïŒnVidia CUDAïŒã°ã©ãã£ãã¯ã«ãŒãã§ã®ã³ã³ãã¥ãŒãã£ã³ã°ãŸãã¯CPUã®æ»ïŒ
[11] steps3d.narod.ruïŒCUDAã®åºæ¬ãCUDA ããã°ã©ãã³ã°ïŒããŒã2ïŒ
[12] CUDAã³ã³ãã€ã©ãã©ã€ããŒïŒNVCCïŒ
[13] CUDAã§ã®ã€ã³ã©ã€ã³PTXã¢ã»ã³ããªã®äœ¿çš
[14] PTXïŒäžŠåã¹ã¬ããå®è¡ISAããŒãžã§ã³3.0
[15] CUDA APIãªãã¡ã¬ã³ã¹ããã¥ã¢ã«ïŒ PDF ã HTMLãªã³ã©ã€ã³ ïŒ