
ATI Stream SDK has been renamed to AMD Accelerated Parallel Processing (APP) SDK, and OpenCL has replaced Brook+ as the main programming language for GPGPU computing. However, few people realize that code for ATI cards can also be written with another technology: AMD Compute Abstraction Layer (CAL) / Intermediate Language (IL). CAL is designed for writing code that runs on the CPU and interacts with the GPU, while IL lets you write code that runs directly on the GPU.
This article looks at IL: its scope, its limitations, and its advantages over OpenCL. If that interests you, read on.
Introduction
To begin with, here is a brief comparison with the Nvidia CUDA SDK:
- High-level programming language:
  - Nvidia: CUDA C++ extensions
  - AMD: OpenCL 1.1 or Compute Abstraction Layer (CAL)
- Low-level programming language (pseudo-assembler*):
  - Nvidia: Parallel Thread Execution (PTX)
  - AMD: Intermediate Language (IL)
- Ratio of "performance units per second" (for example, hashes tried per second) to GPU price:
  - Nvidia: x
  - AMD: about 2x when using the CAL/IL combination
How can such a performance gain be obtained?
Features of the AMD GPU architecture
If you read the Nvidia PTX and AMD IL specifications carefully, you will notice that Nvidia PTX operands are single-component vectors (that is, plain n-bit registers), while AMD IL operands are four-component vectors of n-bit registers. This becomes clearer if you look at a multiplication in both languages:
    // Nvidia PTX
    mul.lo.u32 %r0, %r1, %r2;
    ; AMD IL
    umul r0.xyzw, r1.xyzw, r2.xyzw
So in one (well, almost one) operation an AMD GPU can modify up to four n-bit registers, whereas an Nvidia GPU can modify only one (within a single GPU thread, that is). But OpenCL also lets you declare and operate on multi-component vectors. So what is the difference, and why is IL needed at all?
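To make the scalar-versus-vector distinction concrete, here is a minimal Python sketch that emulates both instructions on the CPU. The function names are illustrative, and the sketch assumes that both instructions keep the low 32 bits of each product.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as unsigned 32-bit values


def ptx_mul_lo_u32(a, b):
    """One scalar multiply: a single 32-bit register is written."""
    return (a * b) & MASK32


def il_umul(a, b):
    """One vector multiply: all four components (x, y, z, w) are written."""
    return tuple((x * y) & MASK32 for x, y in zip(a, b))


r1 = (2, 3, 0xFFFFFFFF, 0x10000)
r2 = (5, 7, 2, 0x10000)

scalar = ptx_mul_lo_u32(r1[0], r2[0])
vector = il_umul(r1, r2)
print(scalar)  # 10
print(vector)  # (10, 21, 4294967294, 0)
```

The vector version does four multiplications per emulated instruction, which is the whole point of packing useful data into all four components.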
Differences from OpenCL
The difference is an unpleasant one: the AMD APP SDK developers apparently found it hard, or technically impossible, to write a compiler that translates code written to the OpenCL specification into AMD IL. Hence the restrictions on OpenCL support:
- OpenCL 1.0 is supported starting roughly from the Radeon HD 4000 series, and only at beta level: for example, image objects (that is, texture memory) may not be supported.
- OpenCL 1.1 is supported starting roughly from the Radeon HD 5000 series.
- OpenCL 1.2 is supported starting from the Radeon HD 7000 series, but an SDK that supports this version of the standard has not been released yet.
Note that with AMD IL you can also use cards of the Radeon HD 3000 and even Radeon HD 2000 series for GPGPU computing! (To be completely precise, these are GPUs based on the R600, RV610, RV630, and RV670 chips.)
From here on, for brevity, I will refer to all GPUs from the Radeon HD 5000 series onward as Evergreen GPUs (starting from the Radeon HD 5700), because some interesting operations are supported only on these cards.
Before moving on to the principles of writing AMD IL code, there are a few things I want to draw your attention to.
Working with memory
As mentioned above, AMD GPUs operate on four-component vectors of n-bit registers, where n = 32 (more on how 64-bit registers work later). This imposes a significant restriction on memory: it can only be allocated in multiples of 16 bytes. When loading from main memory, keep in mind that these same 16 bytes are also the minimum transfer unit. In other words, it makes no difference whether you declare the memory as consisting of four-component vectors of 1 byte each (char4) or four-component vectors of 4 bytes each (int4). The result is the same: a single memory access reads 16 bytes.
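The arithmetic behind this restriction can be sketched in a few lines of Python. The helper names are hypothetical; the only fact taken from the text is the 16-byte granularity.

```python
VECTOR_BYTES = 16  # one four-component vector of 32-bit registers


def padded_size(nbytes):
    """Round a requested allocation up to the next multiple of 16 bytes."""
    return (nbytes + VECTOR_BYTES - 1) // VECTOR_BYTES * VECTOR_BYTES


def bytes_transferred(num_elements):
    """Each element access moves a full 16 bytes, whatever the declared type."""
    return num_elements * VECTOR_BYTES


def useful_fraction(payload_bytes_per_element):
    """Share of each 16-byte transfer that carries data you actually use."""
    return payload_bytes_per_element / VECTOR_BYTES


print(padded_size(1), padded_size(16), padded_size(17))  # 16 16 32
print(bytes_transferred(1000))                           # 16000
print(useful_fraction(4), useful_fraction(16))           # 0.25 1.0
```

Under these assumptions a char4 element uses a quarter of each transfer, while an int4 element uses all of it.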
Also, unlike Nvidia GPUs, AMD GPUs allocate local memory in the global memory area (which means a very low data transfer rate), so forget about local memory. Use registers and global memory.
And finally: unlike Nvidia GPUs, there is only one global memory that works for both reading and writing (referred to below as "g[]"). Texture memory can have several sources (below "i0", "i1", and so on), as can constant memory (below "cb0", "cb1", and so on), but both are read-only.
The special feature of constant memory is that it is cached when all GPU threads access the same data area (it then works as fast as registers).
The special features of texture memory are a read cache (8 KB per stream processor) and the ability to address memory with real-valued (floating-point) coordinates. When a coordinate goes beyond the texture boundary, you can either read the boundary element or wrap around and read from the beginning (the coordinate is taken modulo the width or height of the texture).
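The two out-of-range behaviours can be illustrated with a small Python sketch for one texture row. The function names are made up for this example, and the sketch assumes unnormalized coordinates that are floored to a texel index.

```python
import math


def clamp_index(coord, size):
    """Out-of-range coordinates read the nearest boundary element."""
    return min(max(math.floor(coord), 0), size - 1)


def wrap_index(coord, size):
    """Out-of-range coordinates wrap around (coordinate modulo size)."""
    return math.floor(coord) % size


row = [10, 20, 30, 40]  # a texture row of width 4

print(row[clamp_index(5.5, len(row))])   # 40
print(row[wrap_index(5.5, len(row))])    # 20
print(row[clamp_index(-1.5, len(row))])  # 10
print(row[wrap_index(-1.5, len(row))])   # 30
```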
And now for the fun part!
Structure of AMD IL code
Working with registers
First, a short explanation of how data moves between registers in an operation.
In the output register, each component position holds either the name of that component or the symbol "_", which means that the component is left unchanged.
In each input register, each component position can hold the name of any of the four components, or "0" or "1". This means that either the named component of the input register or the constant takes part in the operation for the corresponding component of the output register. An example:
    ; r0.x = r1.z
    ; r0.y = r1.w
    ; r0.w = r1.y
    mov r0.xy_w, r1.zwyy
    ; r0.y = 1
    ; r0.z = 0
    mov r0._yz_, r1.x100
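If the masking and swizzling rules are hard to keep in your head, a tiny CPU emulation helps. The Python sketch below models a register as a 4-tuple and replays the two `mov` instructions above. The function `mov` and its string arguments are an illustration, not a real API, and the constants are modeled as the plain numbers 0 and 1 (the exact bit pattern the hardware writes is not modeled).

```python
COMPONENTS = "xyzw"


def mov(dst, mask, src, swizzle):
    """Emulate `mov dst.mask, src.swizzle` on 4-tuples."""
    out = list(dst)
    for i, (m, s) in enumerate(zip(mask, swizzle)):
        if m == "_":
            continue  # this destination component is left unchanged
        if s in "01":
            out[i] = int(s)  # constant selected instead of a component
        else:
            out[i] = src[COMPONENTS.index(s)]
    return tuple(out)


r0 = (100, 200, 300, 400)
r1 = (11, 22, 33, 44)

r0 = mov(r0, "xy_w", r1, "zwyy")
print(r0)  # (33, 44, 300, 22)

r0 = mov(r0, "_yz_", r1, "x100")
print(r0)  # (33, 1, 0, 22)
```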
Shaders
Code for AMD GPUs is written in the form of shaders. You can run both compute shaders (CS) and pixel shaders (PS). However, CS is supported only starting from the Radeon HD 4000 series. Their speed is about the same.
As you know, the number of threads launched on the GPU is determined by the launch parameters (the number of blocks and the number of threads per block). Each multiprocessor of the GPU (there are 8 of them) takes one block for execution. It then splits the requested threads of the block into chunks (warps, a multiple of 32) and gives each of its thread processors one warp to execute. So the actual number of threads running simultaneously is:
<multiprocessors_count> * <stream_processors_per_multiprocessor_count> * <warp_size>
Therefore, to run fast, the threads within one warp must all perform the same operation, without branching. The operation is then executed in a single pass.
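As a quick sanity check of the formula, here is a short Python sketch. The hardware numbers are hypothetical and chosen only to make the arithmetic visible; substitute the values for your own card.

```python
def warps_per_block(threads_per_block, warp_size):
    """Number of warps a block is split into (rounded up)."""
    return (threads_per_block + warp_size - 1) // warp_size


def concurrent_threads(multiprocessors, stream_processors_per_mp, warp_size):
    """<multiprocessors_count> * <stream_processors_per_multiprocessor_count> * <warp_size>"""
    return multiprocessors * stream_processors_per_mp * warp_size


# Hypothetical GPU: 8 multiprocessors, 16 thread processors each, warp size 32.
print(warps_per_block(64, 32))        # 2
print(warps_per_block(100, 32))       # 4
print(concurrent_threads(8, 16, 32))  # 4096
```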
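So as not to discuss a spherical horse in a vacuum, let's take a simple task: each thread computes its local identifier within the block (32 bits) and its global identifier (32 bits), reads one constant from data memory and one from instruction memory (64 bits in total), and reads an element of the texture (128 bits). It writes all of this to the output memory, so each thread needs 256 bits of output.
Note: each row of the texture contains the data for the threads of one block.
Pixel shader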
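    il_ps_2_0
    ; constant memory (cb0):
    ; cb0[0].x  - the constant that every thread writes out
    ; cb0[0].y  - number of threads per block
    ; cb0[0].zw - unused
    dcl_cb cb0[1]
    ; texture memory (i0):
    ; - coordinates are unnormalized (texel units, not float from 0 to 1)
    ; - every component is read as uint
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(uint)_fmty(uint)_fmtz(uint)_fmtw(uint)
    ; window coordinates of the current thread
    dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
    ; global memory (g[]) is used for output
    ; literal: constants embedded in the instruction stream
    dcl_literal l0, 0xFFFFFFFF, 0xABCDEF01, 0x3F000000, 2
    ; convert the thread coordinates from float to integer:
    ; r0.x - x coordinate in i0 (identifier within the block)
    ; r0.y - y coordinate in i0 (block number)
    ftoi r0.xyzw, vWinCoord0.xyxy
    ; r0.z - global identifier (uint) = y * threads_per_block + x
    umad r0.__z_, r0.wwww, cb0[0].yyyy, r0.zzzz
    ; build the first 128 bits of output:
    ; local id, global id, constant from cb0, constant from the literal
    ftoi r1.x___, vWinCoord0.xxxx
    mov r1._y__, r0.zzzz
    mov r1.__z_, cb0[0].xxxx
    mov r1.___w, l0.yyyy
    ; index into g[]: every thread writes two 128-bit elements
    umul r0.__z_, r0.zzzz, l0.wwww
    ; write the first element
    mov g[r0.z+0].xyzw, r1.xyzw
    ; read an element from i0:
    ; convert the coordinates back to float and add 0.5 (l0.z is float 0.5)
    itof r0.xy__, r0.xyyy
    add r0.xy__, r0.xyyy, l0.zzzz
    sample_resource(0)_sampler(0)_aoffimmi(0,0,0) r1, r0
    ; sample_resource(0) - read from i0
    ; _sampler(0)        - use sampler #0
    ; _aoffimmi(0,0,0)   - offset along x, y, z;
    ;   for example _aoffimmi(1,0,0) for the next element, _aoffimmi(0,1,0) for the next row
    ; write the second element
    mov g[r0.z+1].xyzw, r1.xyzw
    ; end of the main function
    endmain
    ; end of the shader
    end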
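The index arithmetic of the pixel shader is easy to get wrong, so here is a minimal Python sketch of it. It assumes that `ftoi` truncates toward zero and that window coordinates point at pixel centres (x + 0.5, y + 0.5); the function name and the returned dictionary are illustrative only.

```python
import struct

# l0.z from the shader: the bit pattern 0x3F000000 interpreted as a float.
HALF = struct.unpack("<f", struct.pack("<I", 0x3F000000))[0]


def pixel_shader_ids(win_x, win_y, threads_per_block):
    """Replay the identifier and address arithmetic for one thread."""
    x = int(win_x)                         # ftoi: identifier within the block
    y = int(win_y)                         # ftoi: block number
    global_id = y * threads_per_block + x  # umad
    g_index = global_id * 2                # umul by l0.w: two elements per thread
    tex_coord = (x + HALF, y + HALF)       # itof + add: back to the texel centre
    return {"local": x, "global": global_id, "g_index": g_index, "tex": tex_coord}


ids = pixel_shader_ids(3.5, 2.5, 64)
print(HALF)  # 0.5
print(ids)   # {'local': 3, 'global': 131, 'g_index': 262, 'tex': (3.5, 2.5)}
```

The thread in this example writes to g[262] and g[263].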
Compute shader
The only difference is in how the thread identifiers are computed; the rest is the same.
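    il_cs_2_0
    dcl_num_thread_per_group 64
    ; constant memory (cb0):
    ; cb0[0].x   - the constant that every thread writes out
    ; cb0[0].yzw - unused
    dcl_cb cb0[1]
    ; texture memory (i0):
    ; - coordinates are unnormalized (texel units, not float from 0 to 1)
    ; - every component is read as uint
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(uint)_fmty(uint)_fmtz(uint)_fmtw(uint)
    ; global memory (g[]) is used for output
    ; literal: constants embedded in the instruction stream
    dcl_literal l0, 0xFFFFFFFF, 0xABCDEF01, 0x3F000000, 2
    ; r0.y - block number
    mov r0._y__, vThreadGrpIDFlat.xxxx
    ; r0.x - identifier within the block
    mov r0.x___, vTidInGrpFlat.xxxx
    ; r0.z - global identifier
    mov r0.__z_, vAbsTidFlat.xxxx
    ; build the first 128 bits of output:
    ; local id, global id, constant from cb0, constant from the literal
    mov r1.x___, vTidInGrpFlat.xxxx
    mov r1._y__, vAbsTidFlat.xxxx
    mov r1.__z_, cb0[0].xxxx
    mov r1.___w, l0.yyyy
    ; index into g[]: every thread writes two 128-bit elements
    umul r0.__z_, r0.zzzz, l0.wwww
    ; write the first element
    mov g[r0.z+0].xyzw, r1.xyzw
    ; read an element from i0:
    ; convert the coordinates to float and add 0.5 (l0.z is float 0.5)
    itof r0.xy__, r0.xyyy
    add r0.xy__, r0.xyyy, l0.zzzz
    sample_resource(0)_sampler(0)_aoffimmi(0,0,0) r1, r0
    ; sample_resource(0) - read from i0
    ; _sampler(0)        - use sampler #0
    ; _aoffimmi(0,0,0)   - offset along x, y, z;
    ;   for example _aoffimmi(1,0,0) for the next element, _aoffimmi(0,1,0) for the next row
    ; write the second element
    mov g[r0.z+1].xyzw, r1.xyzw
    ; end of the main function
    endmain
    ; end of the shader
    end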
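When debugging a kernel like this, it helps to have a CPU reference to compare the contents of g[] against. The Python sketch below builds the expected output under two assumptions: the global identifier equals block * threads_per_block + local, and row `block` of the texture holds the data for that block's threads. The function name is hypothetical, and the example uses a tiny launch (2 blocks of 4 threads) purely to keep the output readable.

```python
LITERAL = 0xABCDEF01  # l0.y in the shader


def reference_output(num_blocks, threads_per_block, constant, texture):
    """Expected contents of g[]: two 4-component elements per thread."""
    g = [None] * (2 * num_blocks * threads_per_block)
    for block in range(num_blocks):
        for local in range(threads_per_block):
            global_id = block * threads_per_block + local
            # first element: local id, global id, cb0[0].x, literal
            g[2 * global_id] = (local, global_id, constant, LITERAL)
            # second element: texel at column `local`, row `block`
            g[2 * global_id + 1] = texture[block][local]
    return g


# Test texture: each texel encodes its own (row, column) position.
texture = [[(b, t, 0, 0) for t in range(4)] for b in range(2)]
g = reference_output(2, 4, 0x12345678, texture)

print(len(g))  # 16
print(g[10])   # (1, 5, 305419896, 2882400001)
print(g[11])   # (1, 1, 0, 0)
```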
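Differences between the shaders
Apart from being supported on different cards, the main difference between the shaders is where the number of threads per block is stored. For PS this value can be kept in memory, while for CS it has to be hard-coded in the source. On the other hand, computing the thread identifiers is simpler in CS.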
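One practical consequence can be sketched on the host side: for CS the kernel text itself has to be generated with the right thread count, while for PS the value can travel in the first constant-buffer element. The Python sketch below illustrates both. The helper names are hypothetical, the buffer layout follows the cb0 comments in the pixel shader above, and how the bytes actually reach the GPU (the CAL side) is not shown.

```python
import struct

CS_HEADER_TEMPLATE = "il_cs_2_0\ndcl_num_thread_per_group {threads}\n"


def cs_source_header(threads_per_block):
    """CS: the thread count is baked into the IL source text."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return CS_HEADER_TEMPLATE.format(threads=threads_per_block)


def ps_constant_buffer(constant, threads_per_block):
    """PS: the thread count is passed as data in cb0[0] (x, y, z, w as uint32)."""
    return struct.pack("<4I", constant, threads_per_block, 0, 0)


header = cs_source_header(64)
cb0 = ps_constant_buffer(0x12345678, 64)

print(header.splitlines()[1])        # dcl_num_thread_per_group 64
print(len(cb0))                      # 16
print(struct.unpack("<4I", cb0)[1])  # 64
```

Changing the block size for CS therefore means regenerating the kernel source, whereas for PS it only means uploading a different 16-byte constant.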
Conclusion
In this article I have tried to explain how to write simple AMD IL code that runs on the GPU itself. To conclude, a few words about optimizing for speed:
- Do not try to apply assembler-specific optimization techniques (precomputing operations on constants, interleaving independent operations). Remember that this is still a pseudo-assembler, and the compiler does the optimizing. Think about the algorithm instead.
- Load as much data into the kernel as you can. It is best to use all 32 bits of all four components of a vector.
- If you perform the same kind of computation on the input data (hash computation, for example), it is worth experimenting with the number of components used per operation: sometimes r0.x___ is faster, sometimes r0.xy__, and sometimes r0.xyzw.
- AMD claims that the number of threads per block can be any multiple of <warp_size> and the GPU will work correctly, but in practice this is not so. In practice I have only seen <warp_size> = 32 or 64, and the GPU worked correctly only when the number of threads per block was equal to <warp_size>. Moreover, a Radeon HD 4650 launched with 32 threads per block (and according to its technical data, <warp_size> = 32 for this card) produced wrong data in one of my algorithms, yet worked flawlessly with 64 threads per block. Conclusion: run your algorithm with exactly 64 threads per block (the number of blocks can still be varied).
- Evergreen GPUs support some cool features: rotate (circular shift) operations, an overflow flag, and 64-bit operations (two components are reserved for these). Unfortunately, GPUs of the families older than Evergreen do not support any of this. Then again, maybe someone can suggest how to write 64-bit operations for them.
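On the last point: one conventional workaround, not taken from any AMD documentation, is to build the missing operations out of plain 32-bit ones. The Python sketch below models a rotate as two shifts and an OR, and a 64-bit addition as two 32-bit additions with the carry detected by an unsigned comparison. Whether this maps onto a given pre-Evergreen card efficiently is something you would have to measure.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits using shifts and OR only."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit addition on (low, high) pairs of 32-bit words."""
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0  # the low word wrapped around
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


print(hex(rotl32(0x80000001, 1)))      # 0x3
print(add64(0xFFFFFFFF, 0x0, 0x1, 0x0))  # (0, 1)
```

A 64-bit value occupies two components here, which matches the way two components are reserved for 64-bit operations on Evergreen.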
How to transfer data to the kernel and collect the results from it will be covered in the second part, on AMD Compute Abstraction Layer (CAL).