// func.cpp
void benchmark_func(int* a) {
  for (int i = 0; i < 32; ++i)
    a[i] += 1;
}
Suppose we want to measure this function with some kind of microbenchmark: call it many times (to average the results) and see what happens. We can look at the generated instructions just to make sure the compiler has not optimized anything away. We can also run a few different tests to confirm that the loop is the bottleneck. Well, that is about it. But do we understand what we are measuring?
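The "call it many times and average" idea can be sketched with nothing but the standard library. This is only an illustration, not the harness used for the measurements in this article (those were taken with the Google Benchmark library); the helper name average_ns_per_call is an assumption made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The function under test, as in func.cpp.
void benchmark_func(int* a) {
  for (int i = 0; i < 32; ++i)
    a[i] += 1;
}

// Hypothetical helper: calls benchmark_func `iterations` times and returns
// the average wall-clock time per call in nanoseconds.
double average_ns_per_call(int* a, std::int64_t iterations) {
  assert(iterations > 0);
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (std::int64_t i = 0; i < iterations; ++i)
    benchmark_func(a);
  const auto stop = clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}
```

Because everything lives in one translation unit here, an optimizing compiler is free to inline or even collapse the repeated calls, which is exactly why the generated instructions should be inspected before trusting the number.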
Now imagine there is another function in the same file. Its speed is measured too, but in a separate test. That is, the file looks like this:
// func.cpp
void foo(int* a) {
  for (int i = 0; i < 32; ++i)
    a[i] += 1;
}

void benchmark_func(int* a) {
  for (int i = 0; i < 32; ++i)
    a[i] += 1;
}
Then one day your manager comes to you with a complaint from a user of your library: it does not run as fast as you promised. Yet you promised exactly what your performance measurements showed. What went wrong?
The user says they only care about benchmark_func(), so they ran the performance test for that function alone.
The numbers
I compiled this code with a recent Clang using the following options:
-O2 -march=skylake -fno-unroll-loops
I ran the code on an Intel Core i7-6700 (Skylake) processor.
All the code and build scripts are available for download. You will need the Google Benchmark library.
Let's call the version of the code with two functions "baseline" and the variant that contains only benchmark_func "no_foo". Here are the results:
$ ./baseline.sh
---------------------------------------------------------
Benchmark          CPU   Iterations  Throughput   Clockticks/iter
---------------------------------------------------------
func_bench_median  4 ns  191481954   32.5626GB/s  74.73

$ ./no_foo.sh
---------------------------------------------------------
Benchmark          CPU   Iterations  Throughput   Clockticks/iter
---------------------------------------------------------
func_bench_median  4 ns  173214907   29.5699GB/s  84.54
I computed the "Clockticks/iter" metric myself by dividing the number of clock ticks spent in benchmark_func() by the number of iterations.
Strangely, removing foo() from the source file, a function that is never called in the test, makes the remaining function about 10% slower.
Let's try to understand what is going on here.
Looking ahead a little: the assembly generated for benchmark_func() is identical in both cases. The only differences are the function's position in the binary and the alignment of the inner loop.
First, let's look at the code generated for the baseline version:
$ objdump -d a.out -M intel | grep "<_Z14benchmark_funcPi>:" -A15
00000000004046c0 <_Z14benchmark_funcPi>:
  4046c0: 48 c7 c0 80 ff ff ff    mov rax,0xffffffffffffff80
  4046c7: c5 fd 76 c0             vpcmpeqd ymm0,ymm0,ymm0
  4046cb: 0f 1f 44 00 00          nop DWORD PTR [rax+rax*1+0x0]
  4046d0: c5 fe 6f 8c 07 80 00    vmovdqu ymm1,YMMWORD PTR [rdi+rax*1+0x80]
  4046d7: 00 00
  4046d9: c5 f5 fa c8             vpsubd ymm1,ymm1,ymm0
  4046dd: c5 fe 7f 8c 07 80 00    vmovdqu YMMWORD PTR [rdi+rax*1+0x80],ymm1
  4046e4: 00 00
  4046e6: 48 83 c0 20             add rax,0x20
  4046ea: 75 e4                   jne 4046d0 <_Z14benchmark_funcPi+0x10>
  4046ec: c5 f8 77                vzeroupper
  4046ef: c3                      ret
We can see that the code is aligned on a cache line boundary (0x4046c0 mod 0x40 == 0x0). That is good. But there is more you should know about the Intel processor architecture. Skylake processors have MITE (Micro-instruction Translation Engine), which fetches 16 bytes of instructions per cycle. The important point is that this is not just any 16 bytes, but a window aligned on a 16-byte boundary. Once the instructions are fetched, the decoders translate them into a sequence of smaller micro-operations (uops). These uops are then passed on to the next stage of execution.
But there is another hardware unit called the DSB (Decoded Stream Buffer), which, as the name suggests, is a cache of micro-operations. When a sequence of instructions that was executed recently is about to run again, the processor first checks whether the corresponding uops are already in the DSB. If they are, MITE can stay idle and the instructions do not have to be fetched and decoded again, which is generally faster. However, there are certain restrictions on how uops get into the DSB (or fail to), and these are discussed below.
In the assembly listing above you can see that the code is vectorized, so the loop actually runs for only 4 iterations. That suits this example well, because otherwise the LSD (Loop Stream Detector) would detect the loop and stop fetching instructions from memory.
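Where the figure of 4 iterations comes from can be checked with a little arithmetic. The sketch below assumes what the listing shows: each vmovdqu moves one 256-bit ymm register and an int is 32 bits wide. The function name vector_trip_count is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of iterations of a fully vectorized loop over
// `elements` items, assuming one vector register is processed per iteration.
std::size_t vector_trip_count(std::size_t elements,
                              std::size_t vector_bits = 256,   // ymm width
                              std::size_t element_bits = 32) { // sizeof(int) * 8
  const std::size_t lanes = vector_bits / element_bits;  // 8 ints per ymm
  assert(lanes != 0 && elements % lanes == 0);           // no remainder loop
  return elements / lanes;
}
```

For the 32-element loop this gives vector_trip_count(32) == 4, which matches the listing: rax starts at -0x80 (128 bytes) and advances by 0x20 (32 bytes) per iteration.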
Details on all of these nuances of the Intel architecture can be found in the "Intel 64 and IA-32 Architectures Optimization Reference Manual". See also Zia Ansari's excellent presentation on this topic.
Code alignment matters
I think you have already guessed where this is going. Let's look at how benchmark_func() is laid out in the code in both cases.
"baseline" case:

"no_foo" case:

The thick rectangles in the pictures above mark 32-byte windows, and the instructions of the loop body are highlighted in yellow. The first observation is that in the second case the whole loop fits into a single 32-byte window, while in the first case it is split across two windows. Indeed, in the second case DSB misses are cut in half (FRONTEND_RETIRED.DSB_MISS_PS: 1800M vs 888M) and the penalty for switching between DSB and MITE drops to zero (DSB2MITE_SWITCHES.PENALTY_CYCLES: 888M vs 0). But then why is everything 10% slower? Probably there are other architectural features that we have not taken into account yet.
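The window arithmetic can be reproduced from the addresses in the baseline listing, where the loop body occupies the bytes from 0x4046d0 up to (but not including) 0x4046ec. The following is a minimal sketch; windows_spanned is a hypothetical helper, and since the no_foo addresses are not listed in this article, the second usage line uses an assumed 32-byte-aligned start purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many aligned `window`-byte blocks are touched by
// the half-open address range [begin, end).
std::uint64_t windows_spanned(std::uint64_t begin, std::uint64_t end,
                              std::uint64_t window) {
  assert(window != 0 && end > begin);
  const std::uint64_t first = begin / window;      // index of first window
  const std::uint64_t last = (end - 1) / window;   // index of last window
  return last - first + 1;
}
```

With the baseline addresses, windows_spanned(0x4046d0, 0x4046ec, 32) is 2, so the 28-byte loop straddles the boundary at 0x4046e0. The same 28 bytes starting at an assumed aligned address, windows_spanned(0x4046e0, 0x4046fc, 32), would need only 1 window. Passing 16 instead of 32 gives the number of 16-byte MITE fetch windows, which is also 2 for the baseline loop.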
I ran several experiments to test various hypotheses about how decoded instructions are placed in the DSB, but I am still not 100% sure that I fully understand it. I will post more experiments later.
The performance counters showed no anomalies. The only thing worth noting is the difference between the two cases in IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV (4100M vs 5200M). If you do not know what that is, see the end of the article, where all the counters used are described.
Going further
I ran two more experiments with the alignment set explicitly: -mllvm -align-all-functions=5 and -mllvm -align-all-blocks=5:
$ ./aligned_functions.sh
---------------------------------------------------------
Benchmark          CPU   Iterations  Throughput   Clockticks/iter
---------------------------------------------------------
func_bench_median  3 ns  218294614   36.8538GB/s  63.37

$ ./aligned_blocks.sh
---------------------------------------------------------
Benchmark          CPU   Iterations  Throughput   Clockticks/iter
---------------------------------------------------------
func_bench_median  3 ns  262104631   44.3106GB/s  46.25
Aligning benchmark_func() on a 32-byte boundary gives +13% performance, and aligning every basic block of benchmark_func() (including the function entry) on a 32-byte boundary gives a +36% speedup. Surprising, isn't it?
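The percentages quoted in this article can be reproduced from the Throughput column of the benchmark output. Here is a minimal sketch of that arithmetic, assuming the figures are relative throughput changes against the baseline run (32.5626 GB/s); percent_change is a hypothetical helper name.

```cpp
#include <cassert>

// Hypothetical helper: relative change of `value` against `baseline`,
// in percent. Positive means faster when applied to throughput.
double percent_change(double baseline, double value) {
  assert(baseline > 0.0);
  return (value - baseline) / baseline * 100.0;
}

// Throughput figures (GB/s) copied from the benchmark output above.
const double kBaseline = 32.5626;
const double kNoFoo = 29.5699;            // roughly -9.2%
const double kAlignedFunctions = 36.8538; // roughly +13.2%
const double kAlignedBlocks = 44.3106;    // roughly +36.1%
```

For example, percent_change(kBaseline, kNoFoo) yields about -9.2, which is the "about 10%" slowdown, while the two aligned builds come out at about +13.2 and +36.1.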
In the aligned-functions case, the placement of the function is not very different from the baseline case:

So, just as in the baseline case, we are dealing with some kind of problem in the DSB. The FRONTEND_RETIRED.DSB_MISS_PS counter (2600M vs 1800M) shows even worse DSB behavior. More important, though, is the comparison of the IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV counter: 330M vs 4100M. In the end, what really matters to us is keeping the back end fed with decoded uops.
With aligned basic blocks:

What is interesting here is the high DSB utilization and the small number of cycles in which no uops were delivered. See the table below for the specific counter values.
Performance counters used

And here are the descriptions of each column heading in that table:
FRONTEND_RETIRED.DSB_MISS_PS - counts retired instructions that experienced a lookup miss in the DSB (Decoded Stream Buffer).
DSB2MITE_SWITCHES.PENALTY_CYCLES - counts the penalty cycles incurred when switching from the DSB to MITE. A DSB lookup that does not find the required instructions and has to fall back to MITE can, in the worst case, cost up to 6 cycles during which no uops are delivered to the IDQ. As a rule, such a switch costs up to 2 cycles.
IDQ.ALL_DSB_CYCLES_4_UOPS - counts the cycles in which exactly 4 uops were delivered from the Decoded Stream Buffer (DSB) to the Instruction Decode Queue (IDQ).
IDQ.ALL_DSB_CYCLES_ANY_UOPS - counts the cycles in which any uops were delivered from the Decoded Stream Buffer (DSB) to the Instruction Decode Queue (IDQ).
IDQ_UOPS_NOT_DELIVERED.CORE - counts, per thread, the number of uops not delivered to the Resource Allocation Table (RAT): it adds "4 - x" whenever the Instruction Decode Queue (IDQ) delivers x uops to the RAT, where x is in {0, 1, 2, 3}.
IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - counts, per thread, the cycles in which no uops were delivered to the Resource Allocation Table (RAT), that is, cycles where IDQ_UOPS_NOT_DELIVERED.CORE increases by 4.
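To make the relationship between the last two counters concrete, here is a toy software model of how they accumulate. It is only a sketch under simplifying assumptions: the front end is 4 uops wide, the RAT is never stalled, and the per-cycle delivery numbers are invented input rather than anything read from hardware. The names FrontendCounters and count_undelivered are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of two front-end counters (not a hardware interface).
struct FrontendCounters {
  std::uint64_t uops_not_delivered = 0;  // models IDQ_UOPS_NOT_DELIVERED.CORE
  std::uint64_t cycles_0_uops = 0;       // models ...CYCLES_0_UOPS_DELIV.CORE
};

// `delivered_per_cycle[i]` is the number of uops (0..4) the IDQ delivered to
// the RAT in cycle i. Assumes the RAT is never stalled.
FrontendCounters count_undelivered(const std::vector<int>& delivered_per_cycle) {
  const int kWidth = 4;  // uops the RAT could accept per cycle
  FrontendCounters counters;
  for (int x : delivered_per_cycle) {
    assert(x >= 0 && x <= kWidth);
    counters.uops_not_delivered += static_cast<std::uint64_t>(kWidth - x);
    if (x == 0)
      ++counters.cycles_0_uops;  // a cycle that contributes the full 4
  }
  return counters;
}
```

For the invented trace {4, 0, 2, 3, 0, 4}, the model reports 11 undelivered uops (0 + 4 + 2 + 1 + 4 + 0) and 2 cycles with nothing delivered.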
Caveat
In this particular case, if you increase the number of iterations to, say, 1024, all of these alignment problems disappear, because at that point the loop detector (LSD) kicks in. It recognizes that we are in a loop, executing the same instructions over and over. It then stops fetching instructions from memory and starts executing them from an internal buffer. From that point on, how the instructions are placed and aligned in memory becomes completely irrelevant.
Another interesting example: when I used the gold linker, performance dropped by a further 10%. That is not because gold is bad for some reason; it is, again, a matter of code alignment.
Why not always align code?
Alignment means that the compiler inserts NOP instructions into the code. That increases the size of the binary, and if those NOPs end up inside a frequently executed loop, performance may suffer. Executing a NOP is not completely free: it still has to be fetched from memory and decoded.
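How many bytes of NOP padding a given alignment costs is simple to estimate. The sketch below assumes that the alignment value passed to the compiler options is a base-2 logarithm (so 5 means 2^5 = 32 bytes, matching the 32-byte boundaries discussed above); padding_to_align is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of padding bytes needed so that code placed at
// `address` starts on a 2^log2_alignment boundary.
std::uint64_t padding_to_align(std::uint64_t address, unsigned log2_alignment) {
  assert(log2_alignment < 64);
  const std::uint64_t alignment = std::uint64_t{1} << log2_alignment;
  return (alignment - address % alignment) % alignment;
}
```

In the baseline listing the instruction before the loop ends at 0x4046cb. padding_to_align(0x4046cb, 4) is 5, which matches the 5-byte nop that places the loop at 0x4046d0, whereas aligning the same loop to 32 bytes, padding_to_align(0x4046cb, 5), would take 21 bytes of padding.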
Conclusion
As you can see, even a tiny piece of code like this can be tricky. I do not think all of us need to be experts in microprocessor architecture, but we should at least know that problems like this can exist. Keep in mind that the performance of a function you measured once may change in the future even if the code of that function stays the same. If that performance matters to you, do not forget to take additional measurements so that you can spot problems similar to the one described in this article.