ã¯ãŒãã³ã°ã°ã«ãŒãã®æ©èœã®èª¬æ
ã¯ãŒã¯ã°ã«ãŒãã®æ©èœã«ã¯ãã¯ãŒã¯ã°ã«ãŒãã®ã¬ãã«ã®3ã€ã®å€å žçãªã¢ã«ãŽãªãºã ïŒ å€ãããŒããã£ã¹ãããªãã¥ãŒã¹ãã¹ãã£ã³ ïŒãšãã¯ãŒã¯ã°ã«ãŒãå šäœã«å¯ŸããŠå®è¡ãããæäœã®è«ççµæããã§ãã¯ãã2ã€ã®çµã¿èŸŒã¿é¢æ°ãå«ãŸããŸãã åæžããã³ã¹ãã£ã³ã¢ã«ãŽãªãºã ã¯ãå ç®ãæå°ãããã³æ倧ã®æäœããµããŒãããŠããŸãã
ã¯ãŒã¯ã°ã«ãŒãã®çµã¿èŸŒã¿é¢æ°ã®æ©èœã¯ãååããæããã§ãã
- work_group_broadcastïŒïŒã¯ãéžæããã¯ãŒã¯ã¢ã€ãã ã®å€ããã¹ãŠã®ã¯ãŒã¯ã°ã«ãŒãã¢ã€ãã ã«æ¡åŒµããŸãã
- work_group_reduceïŒïŒã¯ãã¯ãŒã¯ã°ã«ãŒãã®ãã¹ãŠã®èŠçŽ ã®åèšãæå°ããŸãã¯æ倧å€ãèšç®ããçµæã®å€ãã¯ãŒã¯ã°ã«ãŒãã®ãã¹ãŠã®èŠçŽ ã«æ¡åŒµããŸãã
- work_group_scanïŒïŒã¯ã以åã®ãã¹ãŠã®äœæ¥é ç®ã®åèšãæå°ããŸãã¯æ倧å€ãèšç®ããŸãïŒçŸåšã®äœæ¥é ç®ãå«ããããšãã§ããŸãïŒã
- work_group_allïŒïŒã¯ãåäœæ¥é ç®ã«å¯ŸããŠèšç®ãããåãè«çåŒã®è«çANDãè¿ããŸãã
- work_group_anyïŒïŒã¯work_group_allïŒïŒãšåæ§ã«æ©èœããŸãããè«çORã䜿çšããŸãã
ãªã¹ããããŠããçµã¿èŸŒã¿é¢æ°ã«é¢ããéèŠãªå¶éïŒã¹ã«ã©ãŒããŒã¿åã«ã®ã¿é©çšãããŸãïŒããšãã°ãäžè¬çãªåint4ãšfloat4ã¯ãµããŒããããŠããŸããïŒã ãŸããcharãucharãªã©ã®8ãããããŒã¿åã¯ãµããŒããããŠããŸããã
ã¯ãŒãã³ã°ã°ã«ãŒãã®æ©èœã¯ããã®ååã瀺ããšãããã¯ãŒãã³ã°ã°ã«ãŒãå šäœã§åžžã«äžŠè¡ããŠæ©èœããŸãã ããããæé»ã®çµæãçããŸããã¯ãŒãã³ã°ã°ã«ãŒãã®æ©èœã«å¯ŸãããããªãææŠãéå£ãšããŠæ©èœããŸãã
ã¯ãŒãã³ã°ã°ã«ãŒãã®æ©èœã䜿çšããã«ã¯ã2ã€ã®äž»ãªã¢ã€ãã¢ããããŸãã 第äžã«ãã¯ãŒãã³ã°ã°ã«ãŒãã®æ©èœã¯äŸ¿å©ã§ãã OpenCL 1.2ã§åãæ©èœãå®è£ ããããã«å¿ èŠãªååã«å€§ããªã³ãŒããèšè¿°ãã代ããã«ã1ã€ã®çµã¿èŸŒã¿é¢æ°ã䜿çšããæ¹ãã¯ããã«ç°¡åã§ãã 第äºã«ãã¯ãŒãã³ã°ã°ã«ãŒãã®æ©èœã¯ãæ©åšã®æé©åã䜿çšãããããçç£æ§ã®ç¹ã§ããå¹æçã§ãã
äŸãšããŠã次ã®ã¿ã¹ã¯ïŒã¢ã«ãŽãªãºã ã®äžéšã§ããå¯èœæ§ããããŸãïŒãèããŠã¿ãŸãããïŒãã倧ããªé åãšåããµã€ãºã®åŸå±é åã®æ¥é èŸã®åèšãèšç®ããŸãã ãã®ãããåã¹ã¬ãŒãé åã®åèŠçŽ ã®ãã¬ãã£ãã¯ã¹ã®åèšãèšç®ããåãããŒã¯ã¢ããã§ã¿ãŒã²ããã¡ã¢ãªé åã«ä¿åããå¿ èŠããããŸãã 次ã®å³ã«ããœãŒã¹ãšã¿ãŒã²ããã®ããŒã¿ã¬ã€ã¢ãŠãã瀺ããŸãã
ãã®ã¿ã¹ã¯ã®ã·ã³ãã«ãªOpenCLã³ã¢ã¯æ¬¡ã®ããã«ãªããŸãã
- åé åïŒå³ã®ç·ïŒã¯1ã€ã®ã¯ãŒãã³ã°ã°ã«ãŒãã«ãã£ãŠåŠçãããŸãã
- åäœæ¥é ç®ã«ã€ããŠãåã®é ç®ã®åçŽãªforïŒïŒã«ãŒãã䜿çšããŠã¹ãã£ã³ãå®è¡ããã环ç©ãã¬ãã£ãã¯ã¹å€ãè¿œå ãããçµæãå®å ã«ä¿åãããŸãã
- ã¯ãŒã¯ã°ã«ãŒãã®ãµã€ãºãå ¥åé åãããå°ããå ŽåããœãŒã¹ã€ã³ããã¯ã¹ãšãšã³ãã€ã³ããã¯ã¹ã¯ã¯ãŒã¯ã°ã«ãŒãã®ãµã€ãºã ãã·ãããããéçŽãã¬ãã£ãã¯ã¹ãæŽæ°ããããã®ããã»ã¹ããœãŒã¹è¡ã®æåŸãŸã§ç¹°ãè¿ãããŸãã
察å¿ããã³ãŒãã以äžã«ç€ºããŸãã
ã³ãŒã
__kernel void Calc_wg_offsets_naive( __global const uint* gHistArray, __global uint* gPrefixsumArray, uint bin_size ) { uint lid = get_local_id(0); uint binId = get_group_id(0); //calculate source/destination offset for workgroup uint group_offset = binId * bin_size; local uint maxval; //initialize cumulative prefix if( lid == 0 ) maxval = 0; barrier(CLK_LOCAL_MEM_FENCE); do { //perform a scan for every workitem uint prefix_sum=0; for(int i=0; i<lid; i++) prefix_sum += gHistArray[group_offset + i]; //store result gPrefixsumArray[group_offset + lid] = prefix_sum + maxval; prefix_sum += gHistArray[group_offset + lid]; //update group offset and cumulative prefix if( lid == get_local_size(0)-1 ) maxval += prefix_sum; barrier(CLK_LOCAL_MEM_FENCE); group_offset += get_local_size(0); } while(group_offset < (binId+1) * bin_size); }
ãã®ãããªåå§çãªã¢ãããŒãã¯ãã»ãšãã©ã®å Žåéåžžã«å¹æçã§ã¯ãããŸããïŒéåžžã«å°ããªã¯ãŒã¯ã°ã«ãŒããé€ãïŒã æããã«ãå éšã®forïŒïŒã«ãŒãã¯åé·ãªããŒããšè¿œå ã®æäœãå€ãããŸãã ãã®æé ã¯æããã«æé©åã§ããŸãã ããã«ãã¯ãŒãã³ã°ã°ã«ãŒãã®ãµã€ãºãå¢å ãããšãåé·æ§ãå¢å ããŸãã ããŒããŠã§ã¢ãªãœãŒã¹ãããå¹ççã«äœ¿çšããã«ã¯ãIntel HD Graphicsã«ã¯Blellochãªã©ã®ããå¹ççãªã¢ã«ãŽãªãºã ãå¿ èŠã§ãã 詳现ã«ã€ããŠã¯æ€èšããŸãããããã¯ãå€å žçãªGPU Gemsã®èšäºã§é¡èã«èª¬æãããŠããŸãã
䞊åã¹ãã£ã³ã䜿çšããOpenCL 1.2ã³ãŒãã¯æ¬¡ã®ããã«ãªããŸãã
ã³ãŒã
#define WARP_SHIFT 4 #define GRP_SHIFT 8 #define BANK_OFFSET(n) ((n) >> WARP_SHIFT + (n) >> GRP_SHIFT) __kernel void Calc_wg_offsets_Blelloch(__global const uint* gHistArray, __global uint* gPrefixsumArray, uint bin_size ,__local uint* temp ) { int lid = get_local_id(0); uint binId = get_group_id(0); int n = get_local_size(0) * 2; uint group_offset = binId * bin_size; uint maxval = 0; do { // calculate array indices and offsets to avoid SLM bank conflicts int ai = lid; int bi = lid + (n>>1); int bankOffsetA = BANK_OFFSET(ai); int bankOffsetB = BANK_OFFSET(bi); // load input into local memory temp[ai + bankOffsetA] = gHistArray[group_offset + ai]; temp[bi + bankOffsetB] = gHistArray[group_offset + bi]; // parallel prefix sum up sweep phase int offset = 1; for (int d = n>>1; d > 0; d >>= 1) { barrier(CLK_LOCAL_MEM_FENCE); if (lid < d) { int ai = offset * (2*lid + 1)-1; int bi = offset * (2*lid + 2)-1; ai += BANK_OFFSET(ai); bi += BANK_OFFSET(bi); temp[bi] += temp[ai]; } offset <<= 1; } // clear the last element if (lid == 0) { temp[n - 1 + BANK_OFFSET(n - 1)] = 0; } // down sweep phase for (int d = 1; d < n; d <<= 1) { offset >>= 1; barrier(CLK_LOCAL_MEM_FENCE); if (lid < d) { int ai = offset * (2*lid + 1)-1; int bi = offset * (2*lid + 2)-1; ai += BANK_OFFSET(ai); bi += BANK_OFFSET(bi); uint t = temp[ai]; temp[ai] = temp[bi]; temp[bi] += t; } } barrier(CLK_LOCAL_MEM_FENCE); //output scan result to global memory gPrefixsumArray[group_offset + ai] = temp[ai + bankOffsetA] + maxval; gPrefixsumArray[group_offset + bi] = temp[bi + bankOffsetB] + maxval; //update cumulative prefix sum and shift offset for next iteration maxval += temp[n - 1 + BANK_OFFSET(n - 1)] + gHistArray[group_offset + n - 1]; group_offset += n; } while(group_offset < (binId+1) * bin_size); }
ååãšããŠããã®ãããªã³ãŒãã¯ããå¹ççã«æ©èœããããŒããŠã§ã¢ãªãœãŒã¹ã«ããã»ã©é«ãè² è·ããããŸããããããã€ãã®æ³šæäºé ããããŸãã
ãã®ã³ãŒãã«ã¯ãããŒã«ã«ã¡ã¢ãªãšã°ããŒãã«ã¡ã¢ãªéã§ããŒã¿ã移åããããã®ãªãŒããŒããããšãããã€ãã®çŠæ¢äºé ããããŸãã æ¬åœã«é«ãå¹çãéæããã«ã¯ãã¢ã«ãŽãªãºã ã«ååã«å€§ããªã¯ãŒã¯ã°ã«ãŒããµã€ãºãå¿ èŠã§ãã å°ããªã¯ãŒã¯ã°ã«ãŒãïŒ<16ïŒã§ã¯ãçç£æ§ãåçŽãªãµã€ã¯ã«ã®çç£æ§ãããé«ããªãå¯èœæ§ã¯äœãã§ãã
ããã«ãã³ãŒãã®è€éããšãå ±æããŒã«ã«ã¡ã¢ãªïŒ BANK_OFFSETãã¯ããªã©ïŒã®ç«¶åãæé€ããããã«èšèšãããè¿œå ã®ããžãã¯ã«æ³šæããŠãã ããã
ã¯ãŒãã³ã°ã°ã«ãŒãã®äœ¿çšã¯ãèšåããããã¹ãŠã®åé¡ãåé¿ããŸãã æé©åãããOpenCLã³ãŒãã®å¯Ÿå¿ããããŒãžã§ã³ã以äžã«ç€ºããŸãã
ã³ãŒã
__kernel void Calc_wg_offsets_wgf( __global const uint* gHistArray, __global uint* gPrefixsumArray, uint bin_size ) { uint lid = get_local_id(0); uint binId = get_group_id(0); uint group_offset = binId * bin_size; uint maxval = 0; do { uint binValue = gHistArray[group_offset + lid]; uint prefix_sum = work_group_scan_exclusive_add( binValue ); gPrefixsumArray[group_offset + lid] = prefix_sum + maxval; maxval += work_group_broadcast( prefix_sum + binValue, get_local_size(0)-1 ); group_offset += get_local_size(0); } while(group_offset < (binId+1) * bin_size); }
äž¡æ¹ã®æé©åãããã¢ã«ãŽãªãºã ã®ããã©ãŒãã³ã¹çµæã¯ãååãªéã®å ¥åããŒã¿ã§æž¬å®ãããŸããïŒåã¯ãŒãã³ã°ã°ã«ãŒãã¯ãããŒã«ã«ãµã€ãºã«å¿ããŠãå€éšãµã€ã¯ã«ã®8192 ... 2048åã®ç¹°ãè¿ãã«å¯Ÿå¿ãã65 536èŠçŽ ãã¹ãã£ã³ããŸãïŒã
äºæ³ã©ãããããŒã«ã«ãµã€ãºã倧ãããªããšåçŽãªã«ãŒãã®åäœãéåžžã«é ããªããæé©åãããäž¡æ¹ã®ãªãã·ã§ã³ã®ããã©ãŒãã³ã¹ãåäžããŸãã
ç¹å®ã®ã¢ã«ãŽãªãºã ã«å¯ŸããŠã¯ãŒãã³ã°ã°ã«ãŒãã®æé©ãªãµã€ãºãèšå®ããå Žåãã³ã¢ã®æ¯èŒã¯æ¬¡ã®ããã«ãªããŸãã
work_group_scan_exclusive_addïŒïŒã䜿çšãããšããããããµã€ãºã®ã¯ãŒã¯ã°ã«ãŒãã®ããã©ãŒãã³ã¹ãå€§å¹ ã«åäžãããšåæã«ãã³ãŒããç°¡çŽ åãããããšã«æ³šæããŠãã ããã
ç°çš®OpenCL 2.0ã¯ãŒã¯ã°ã«ãŒã
OpenCLå®è¡ã¢ãã«ã«ã¯ãNDRangeã®åã ã®ã¯ãŒã¯ã¢ã€ãã ã®ã°ã«ãŒãã§ããã¯ãŒã¯ã°ã«ãŒãã®æŠå¿µãå«ãŸããŠããŸãã ã¢ããªã±ãŒã·ã§ã³ãOpenCL 1.xã䜿çšããå ŽåãNDRangeã®ãµã€ãºã¯å®å šã«ïŒãã¬ãŒã¹ãªãã§ïŒã¯ãŒã¯ã°ã«ãŒãã®ãµã€ãºã§é€ç®ããå¿ èŠããããŸãã clEnqueueNDRangeKernelåŒã³åºãã«ãå®å šã«åå²ãããŠããªãglobal_sizeãã©ã¡ãŒã¿ãŒãšlocal_sizeãã©ã¡ãŒã¿ãŒãå«ãŸããŠããå ŽåãåŒã³åºãã¯ãšã©ãŒã³ãŒãCL_INVALID_WORK_GROUP_SIZEãè¿ããŸãã clEnqueueNDRangeKernelåŒã³åºããlocal_sizeãã©ã¡ãŒã¿ãŒã«NULLå€ãæå®ããå®è¡å¯èœã¢ãžã¥ãŒã«ãã¯ãŒã¯ã°ã«ãŒãã®ãµã€ãºãéžæã§ããå Žåãå®è¡å¯èœã¢ãžã¥ãŒã«ã¯ãã°ããŒãã«NDRangeãµã€ãºãå®å šã«åå²ã§ãããµã€ãºãéžæããå¿ èŠããããŸãã
NDRangeã®ãµã€ãºãå®å šã«åå²ãããããã«ãã¯ãŒãã³ã°ã°ã«ãŒãã®ãã®ãããªãµã€ãºãéžæããå¿ èŠæ§ã¯ãéçºè ã«ãšã£ãŠå°é£ãåŒãèµ·ããå¯èœæ§ããããŸãã åçŽãª3x3ç»åãŒããã¢ã«ãŽãªãºã ãæ€èšããŠãã ããã ãã®ã¢ã«ãŽãªãºã ã§ã¯ãååºåãã¯ã»ã«ã¯ãé£æ¥ãã3x3é åã®å ¥åãã¯ã»ã«ã®å€ã®å¹³åå€ãšããŠèšç®ãããŸãã ç»åãã¬ãŒã ã«ããåºåãã¯ã»ã«ãåŠçããå Žåããããã®ãã¯ã»ã«ã¯å ¥åç»åã®å¢çã®å€åŽã®ãã¯ã»ã«ã«äŸåãããããåé¡ãçºçããŸãã
äžéšã®ã¢ããªã±ãŒã·ã§ã³ã§ã¯ããã¬ãŒã ã®å ¥åå€ã¯éèŠã§ã¯ãªããã¹ãããããããšãã§ããŸãã ãã®å ŽåãNDRangeã®ãµã€ãºã¯ãåºåã€ã¡ãŒãžã®ãµã€ãºãããã¬ãŒã ã®é åãåŒãããµã€ãºãšåãã§ãã å€ãã®å Žåãå®å šã«åé¢ããã®ãé£ããNDRangeãµã€ãºã«ãªããŸãã ããšãã°ã3x3ãã£ã«ã¿ãŒã1920x1080ç»åã«é©çšããã«ã¯ãäž¡åŽã«1ãã¯ã»ã«ã®åãã®ãã¬ãŒã ãå¿ èŠã§ãã ãããè¡ãæãç°¡åãªæ¹æ³ã¯ã1918x1078ã³ã¢ã䜿çšããããšã§ãã ãããã1918幎ã1078幎ããæé©ãªãµã€ãºã®ã¯ãŒãã³ã°ã°ã«ãŒããæäŸããå€ã«å®å šã«åå²ãããŠããŸããã
OpenCL 2.0ã«ã¯ãåã®ã»ã¯ã·ã§ã³ã§èª¬æããåé¡ãä¿®æ£ããæ°ããæ©èœããããŸãã ããããç°çš®ã¯ãŒã¯ã°ã«ãŒãã«ã€ããŠèª¬æããŠããŸããOpenCL2.0å®è¡å¯èœã¢ãžã¥ãŒã«ã¯ãNDRangeãä»»æã®æ¬¡å ã®ç°çš®ãµã€ãºã®ã¯ãŒã¯ã°ã«ãŒãã«åå²ã§ããŸãã éçºè ãNDRangeãµã€ãºãå®å šã«åå²ããªãã¯ãŒã¯ã°ã«ãŒãã®ãµã€ãºãæå®ãããšãå®è¡å¯èœã¢ãžã¥ãŒã«ã¯NDRangeãåå²ããŠãæå®ããããµã€ãºã®ã¯ãŒã¯ã°ã«ãŒããã§ããã ãå€ãäœæããæ®ãã®ã¯ãŒã¯ã°ã«ãŒãã¯ç°ãªããµã€ãºã«ãªããŸãã
ããã«ãããéçºè ãlocal_sizeãã©ã¡ãŒã¿ãŒã®NULLå€ãclEnqueueNDRangeKernelã«æž¡ããšãOpenCLã¯ä»»æã®NDRangeãµã€ãºã«å¯ŸããŠä»»æã®ãµã€ãºã®ã¯ãŒã¯ã°ã«ãŒãã䜿çšã§ããŸãã äžè¬ã«ãã¢ããªã±ãŒã·ã§ã³ããžãã¯ãç¹å®ã®ã¯ãŒã¯ã°ã«ãŒããµã€ãºãå¿ èŠãšããªãå Žåã local_sizeãã©ã¡ãŒã¿ãŒã§NULLå€ã䜿çšããããšã¯ãã«ãŒãã«ãå®è¡ããããã®åªå ãããæ¹æ³ã®ãŸãŸã§ãã
ã«ãŒãã«ã³ãŒãå ã§ãçµã¿èŸŒã¿ã®get_local_sizeïŒïŒé¢æ°ã¯ãåŒã³åºãå ã®ã¯ãŒã¯ã°ã«ãŒãã®å®éã®ãµã€ãºãè¿ããŸãã ã«ãŒãã«ãclEnqueueNDRangeKernelã®local_sizeãã©ã¡ãŒã¿ãŒã«æå®ãããæ£ç¢ºãªãµã€ãºãå¿ èŠãšããå Žåã get_get_enqueued_local_sizeïŒïŒçµã¿èŸŒã¿é¢æ°ã¯ãããã®å€ãè¿ããŸãã
ç°çš®ã¯ãŒã¯ã°ã«ãŒãã®äœ¿çšãæå¹ã«ããã«ã¯ãOpenCL 2.0ã®ãã®æ©èœãšä»ã®æ©èœãå«ã-cl-std = CL2.0ãã©ã°ã䜿çšããŠã«ãŒãã«ãã³ã³ãã€ã«ããå¿ èŠããããŸãã ãã®ãã©ã°ããªããšãããã€ã¹ãOpenCL 2.0ããµããŒãããŠããå Žåã§ããã³ã³ãã€ã©ã¯OpenCL 1.2ã䜿çšããŸãã ããã«ã -cl-uniform-work-group-sizeãã©ã°ã䜿çšããŠã -cl-std = CL2.0ãã©ã°çšã«ã³ã³ãã€ã«ãããã«ãŒãã«ã®ç°çš®ã¯ãŒã¯ã°ã«ãŒããç¡å¹ã«ããããšãã§ããŸãã ããã¯ãOpenCL 2.0ã«å®å šã«ç§»è¡ãããŸã§ãã¬ã¬ã·ãŒã«ãŒãã«ã³ãŒãã«åœ¹ç«ã€ããšããããŸãã
OpenCL 2.0ã®ç°çš®ã¯ãŒã¯ã°ã«ãŒãæ©èœã«ãããOpenCLã®äœ¿ãããããåäžããäžéšã®ã³ã¢ã®ããã©ãŒãã³ã¹ãåäžããŸãã éçºè ã¯ãå®å šã«å ±æãããŠããªãNDRangeãµã€ãºãæäœããããã®ã·ã¹ãã ããã³ã«ãŒãã«ã³ãŒããè¿œå ããªããªããŸããã ãã®æ©èœãå©çšããããã«äœæãããã³ãŒãã¯ãSIMDãšã¡ã¢ãªã¢ã¯ã»ã¹ã®åçåã掻çšã§ããŸãããããã®å©ç¹ã¯ãã¯ãŒã¯ã°ã«ãŒãã«é©åãªãµã€ãºãéžæããããšã«ãã£ãŠæäŸãããŸãã
ã«ãªãã¥ã©ã ã³ãŒãã¯ãäžèšã®3x3ãŒããã¢ã«ãŽãªãºã ãå®è£ ããŠããŸãã ã³ãŒãã®æãèå³æ·±ãéšåã¯ãmain.cppãã¡ã€ã«ã«ãããŸãã
ã³ãŒã
//1. . //2. OpenCL C OpenCL 1.2. // Get the box blur kernel compiled using OpenCL 1.2 (which is the // default compilation, even on an OpenCL 2.0 device). This allows // the code to show the pre-OpenCL 2.0 behavior. cl::Kernel kernel_1_2 = GetKernel(device, context); //3. OpenCL C OpenCL 2.0 ( OpenCL 2.0). // Get the box blur kernel compiled using OpenCL 2.0. OpenCL 2.0 // is required in order to use the non-uniform work-groups feature. kernel_2_0 = GetKernel(device, context, "-cl-std=CL2.0"); //4. , . // Set the size of the global NDRange, to be used in all NDRange cases. // Since this is a box blur, we use a global size that is two elements // smaller in each dimension. This creates a range which often doesn't // divide nicely by local work sizes we might commonly pick for running // kernels. cl::NDRange global_size = cl::NDRange(input.get_width() - 2, input.get_height() - 2); //5. , OpenCL 1.2, local_size NULL. // Blur the image with a NULL local range using the OpenCL 1.2 compiled // kernel. cout << "Compiled with OpenCL 1.2 and using a NULL local size:" << end1 << end1; output = RunBlurKernel(context, queue, kernel_1_2, global_size, cl::NullRange, input, true); //6. , OpenCL 1.2, local_size 16x16. // Blur the image with an even local range using the OpenCL 1.2 // compiled kernel. This won't work, even if we are running on an // OpenCL 2.0 implementation. The kernel has to be explicitly compiled // with OpenCL 2.0 compilation enabled in the compiler switches. try { cout << "Compiled with OpenCL 1.2 and using an even local size:" << end1 << end1; output = RunBlurKernel(context, queue, kernel_1_2, global_size, cl::NDRange(16, 16), input, true); cout << end1; output.Write(output_files[1]); } catch (...) { cout << "Trying to launch a non-uniform workgroup with a kernel " "compiled using" << end1 << "OpenCL 1.2 failed (as expected.)" << end1 << end1; } //7. , OpenCL 2.0, local_size NULL. // Blur the image with a NULL local range using the OpenCL 2.0 // compiled kernel. cout << "Compiled with OpenCL 2.0 and using a NULL local size:" << end1 << end1; output = RunBlurKernel(context, queue, kernel_2_0, global_size, cl::NullRange, input, true); //8. , OpenCL 2.0, local_size 16x16. // Blur the image with an even local range using the OpenCL 2.0 // compiled kernel. This will only work on an OpenCL 2.0 device // and compiler. cout << "Compiled with OpenCL 2.0 and using an even local size:" << end1 << end1; output = RunBlurKernel(context, queue, kernel_2_0, global_size, cl::NDRange(16, 16), input, true); //9. , . 2â5.
段èœã®åãªãã·ã§ã³ã 5-8ã§ã¯ãNDRangeã®åé ã®ããããã§get_local_size ïŒïŒããã³get_get_enqueued_local_sizeïŒïŒãåŒã³åºããçµæãç»é¢ã«è¡šç€ºãããŸãã ãããã£ãŠãNDRangeãã¯ãŒãã³ã°ã°ã«ãŒãã«åå²ãããæ§åãããããŸãã ãŒããã¢ã«ãŽãªãºã ãå®è£ ããã«ãŒãã«ã¯ãBoxBlur.clã«ä¿åãããŸãã éåžžã«åçŽãªå®è£ ãå«ãŸããŠããŸããããŒãããé©çšããæãå¹æçãªæ¹æ³ã§ã¯ãããŸããã
ãã®ãã¥ãŒããªã¢ã«ããã«ãããŠå®è¡ããã«ã¯ã次ã®èŠä»¶ãæºããPCãå¿ èŠã§ãã
- Broadwellãšããã³ãŒãããŒã ã®Intel®Coreâ¢ããã»ããµã·ãªãŒãºã
- Microsoft Windows * 8ãŸãã¯8.1ã
- Intel®SDK for OpenCLâ¢ã¢ããªã±ãŒã·ã§ã³ããŒãžã§ã³2014 R2以éã
- Microsoft Visual Studio * 2012以éã
ã«ãªãã¥ã©ã ã¯ãäžèšã®ã»ã¯ã·ã§ã³ã§èª¬æããåNDRangeããªã¢ã³ãã®å ¥åãããããããèªã¿åããåºåãããããããæžã蟌ãã³ã³ãœãŒã«ã¢ããªã±ãŒã·ã§ã³ã§ãã ãã®ãã¥ãŒããªã¢ã«ã§ã¯ãããã€ãã®ã³ãã³ãã©ã€ã³ãªãã·ã§ã³ããµããŒãããŠããŸãã-hã-ïŒ ïŒãã«ãããã¹ãã衚瀺ããŠçµäºïŒã-i <å ¥åãã¬ãã£ãã¯ã¹>ïŒå ¥åããããããã®ãã¬ãã£ãã¯ã¹ïŒã-o <åºåãã¬ãã£ãã¯ã¹>ïŒåºåããããããã®ãã¬ãã£ãã¯ã¹ïŒã
æäŸãããå³é¢ã®ãã¬ãŒãã³ã°ããã°ã©ã ãéå§ãããšãçµæã¯æ¬¡ã®ããã«ãªããŸãã
é衚瀺ã®ããã¹ã
Input file: input.bmp Output files: output_0.bmp, output_1.bmp, output_2.bmp, output_3.bmp Device: Intel(R) HD Graphics 5500 Vendor: Intel(R) Corporation Compiled with OpenCL 1.2 and using a NULL local size: Work Item get_global_id() get_local_size() get_enqueued_local_size() ------------------------------------------------------------------------- Top left (0,0) (1,239) undefined Top right (637,0) (1,239) undefined Bottom left (0,477) (1,239) undefined Bottom right (637,477) (1,239) undefined Compiled with OpenCL 1.2 and using an even local size: Trying to launch a non-uniform workgroup with a kernel compiled using OpenCL 1.2 failed (as expected.) Compiled with OpenCL 2.0 and using a NULL local size: Work Item get_global_id() get_local_size() get_enqueued_local_size() Top left (0,0) (1,239) (1,239) Top right (637, 0) (1,239) (1,239) Bottom left (0,477) (1,239) (1,239) Bottom right (637,477) (1,239) (1,239) Compiled with OpenCL 2.0 and using an even local size: Work Item get_global_id() get_local_size() get_enqueued_local_size() Top left (0,0) (16,16) (16,16) Top right (637,0) (14,16) (16,16) Bottom left (0,477) (16,14) (16,16) Bottom right (637,477) (14,14) (16,16) Done!
å ¥åç»åã®ãµã€ãºã¯640x480ã§ãããããããããã®å Žåã®NDRangeã®ãµã€ãºã¯638x478ã§ãã äžèšã®çµæã¯ã local_sizeãã©ã¡ãŒã¿ãŒã®NULLå€ã䜿çšããŠOpenCL 1.2ã«ãŒãã«ãèµ·åãããšãåã¯ãŒã¯ã°ã«ãŒãïŒ 1ã239ïŒã«å¥æ°ãµã€ãºã®äœ¿çšã匷å¶ãããããšã瀺ããŠããŸãã 2ã®ã¹ãä¹ã§ã¯ãªãã¯ãŒã¯ã°ã«ãŒããµã€ãºã¯ãäžéšã®ã³ã¢ã§éåžžã«é ãåäœããå ŽåããããŸãã SIMDãã€ãã©ã€ã³ã¯ã¢ã€ãã«ç¶æ ã§ããå¯èœæ§ããããåæã¡ã¢ãªã¢ã¯ã»ã¹ãæãªãããå¯èœæ§ããããŸãã
æå®ãããã¯ãŒã¯ã°ã«ãŒããµã€ãºïŒ16x16ïŒã§OpenCL 1.2ã«ãŒãã«ãå®è¡ãããšã648ã478ã16ã§å²ãåããªãããããšã©ãŒãã¹ããŒãããŸãã
NULLå€ã®local_sizeãã©ã¡ãŒã¿ãŒã§OpenCL 2.0ã«ãŒãã«ãèµ·åãããšãOpenCLå®è¡å¯èœãã¡ã€ã«ãNDRangeãä»»æã®ãµã€ãºã®ã¯ãŒã¯ã°ã«ãŒãã«åå²ã§ããŸãã çµæãäžã«ç€ºããŸããå®è¡å¯èœã¢ãžã¥ãŒã«ã¯ãOpenCL 1.2ã«ãŒãã«ã®å Žåãšåãæ¹æ³ã§ãã¯ãŒã¯ã°ã«ãŒãã®åäžãªãµã€ãºãåŒãç¶ã䜿çšããŠããããšãããããŸãã
ç¹å®ã®ã¯ãŒã¯ã°ã«ãŒããµã€ãºïŒ16x16ïŒã§OpenCL 2.0ã«ãŒãã«ãå®è¡ãããšãNDRangeãµã€ãºãç°çš®ã®ã¯ãŒã¯ã°ã«ãŒãã«åå²ãããŸãã å·Šäžã®ã¯ãŒãã³ã°ã°ã«ãŒãã¯16x16ãå³äžã¯14x16ãå·Šäžã¯16x14ãå³äžã¯14x14ã§ãã ã»ãšãã©ã®å Žåãã¯ãŒã¯ã°ã«ãŒãã®ãµã€ãºã¯16x16ã§ããããããã®ã³ã¢ã¯SIMDãã€ãã©ã€ã³ãéåžžã«å¹ççã«äœ¿çšããã¡ã¢ãªã¢ã¯ã»ã¹ã¯éåžžã«é«éã«ãªããŸãã
IDZ Webãµã€ãã®èšäºã®ãã«ããŒãžã§ã³ïŒ
è±èªã®ãªãªãžãã«èšäºïŒ