Moreover, I believe that any system of at least minimal complexity and functionality (at any rate, the systems I have personally come across in 8 years of work in the banking industry) is usually heterogeneous: like a patchwork quilt, it consists of many functional blocks, and each block is often executed by a different application, sometimes even on a different hardware platform. Why? Because it is rational and convenient. Each product is strong in its own field. Economists, for example, like to use Excel to analyze and visualize data. But hardly anyone would think of using that program to train serious artificial neural networks or to solve differential equations in real time. So powerful general-purpose packages that offer a flexible API are often purchased (or have already been acquired by the company), and separate modules are written to order. It then turns out to be more advantageous to compute the result in Matlab, store it in tables of an Oracle DBMS (running on a Linux cluster), and show the report to the user in Excel running as an OLE server on Windows. All of these components are, in turn, glued together by one of the general-purpose programming languages.
How do you choose the best implementation environment for a specific task? You face a compromise: some tools and libraries are more familiar to you, others have more features (for example, they support OOP), a third group wins on execution speed (for example, by using SSE vectorization, as C++ does), and a fourth offers high development speed (for example, Visual Basic). An impressive number of software tools is on the market today: compilers, computer algebra systems, office suites, hardware technologies (x86-x64, Cell, GPGPU), and the means of organizing their joint work (networks, clusters, cloud computing).
Add to that the trends in the use of computing resources (the mass transition to parallel computing, virtualization of applications and servers), new models of providing computing power (such as Amazon EC2), new ways of ensuring fault tolerance, and the subtleties of licensing, and you get a huge number of combinations. How do you pick the best one?
The main recommendation: first build a prototype in the environment with the highest development speed, then move on to more labor-intensive options, trying to improve the performance metric (for example, to shorten the execution time) until the time allotted to this subtask runs out. At the same time, do not waste energy on solutions that are known in advance not to scale. Do not be lazy: write the code, test it live, and check your guesses. Theoretical knowledge and fine intuition are all well and good, but as my experience with small production problems shows, reality very often differs from our models and ideas.
The other day I needed to optimize the performance of a simple algorithm: computing the sum of the sines of all integers from 1 to 200 million. The real algorithm was somewhat different, but I was looking for the best tool for a similar class of problems, and since building a full infrastructure around every prototype at an early stage makes little sense, I simplified the algorithm to the limit. If you wonder why anyone would need such an algorithm, recall that trigonometric functions can be used as threshold activation functions in artificial neural networks. Spoiler: the results of the study were completely unexpected for me!
Problem statement:
Workstation: dual Xeon E5 2670 @ 2.6 GHz with the energy-saving technology that lowers the CPU frequency at idle, 2x8 physical cores with Hyper-Threading, 128 GB of DDR3-1600 memory, running Windows Server 2008R2.
Initial precision: double precision, one of the standard formats of the x86 coprocessor, suits us perfectly. Let's see how different compilers and different computing architectures cope with the task of computing the sum of sines. The race participants: Maple 12, Maple 17, MatLab R2013a, Visual Basic 6.0, Visual Basic.NET, and Visual C++ 2012 (in general, whatever was at hand). All timings were taken after a restart and correspond to the average time.
I know that the test methodology is not the most rigorous: only one type of processor, one OS, and a simple way of measuring performance. But this is not a scientific paper, so I will limit myself to the most interesting facts. I will not go into the details of organizing communication between components (as a rule, on Windows COM components, ordinary DLLs, and shared memory access are enough); I only check which combination computes the desired result fastest. First we find out which tool produces the fastest single-threaded version, and then we parallelize it.
Let's start with Maple.
st := time():evalhf(sum(sin(i),i=1..200000000));time() - st;
Maple 12 result:
1.25023042417602160 in 54.304 seconds
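Maple 17 result:
1.25023042417610020 in 19.656 seconds
At first glance, excellent work by the MapleSoft engineers: between these two versions of the product the run became about 2.76 times faster (54.3 s down to 19.7 s). Let's see whether this procedure can be compiled in version 17. A naive call: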
cp:=Compiler:-Compile(proc(j::integer)::float;local i::integer: evalhf(sum(sin(i),i=1..j)) end proc:):
For some reason we get a procedure that returns zero under all circumstances. We try an alternative: an explicit loop.
p2 := proc(j::integer)::float;local i::integer,res::float;res:=0; for i from 1 to j do: res:=res+evalhf(sin(i)):end do; res end proc:
If you run p2 without compiling it, you will not live to see the result! I waited at least 10 minutes and gave up. Apparently, in the Maple runtime the sum function is heavily optimized compared with a loop. But:
cp2:=Compiler:-Compile(p2): st := time():cp2(200000000);time() - st;
In Maple 17 this gives an excellent result:
1.25023042417610020 in 9.360 seconds!
By stepping beyond the system's standard help and showing some persistence and ingenuity, you can get a very good performance boost :-)
Moving on to Microsoft Visual Studio 2012
I remember well how deeply disappointed I once was in the .NET platform because of the terrible slowness of its compiled code. In my test case (compiled!) VB.Net 2003 ran about 8 times slower than the VB6 code, so I had no illusions when I built the project in VB.Net 2012.
Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim i As Long, res As Double, tm As DateTime
        tm = Now
        For i = 1 To 200000000
            res = res + Math.Sin(i)
        Next
        TextBox1.Text = res & vbNewLine & Now.Subtract(tm).TotalSeconds.ToString("0.000")
    End Sub
End Class
As it turned out, I was wrong to worry!
VB.Net 2012 result:
1.25023042417527 in 11.980 seconds, not bad!
Of course, I pinned my main hopes on the optimizing Visual C++ compiler, intending to compile the following procedure into native executable code without the .NET baggage.
#include <iostream>
#include <cmath>
#include <windows.h>
using namespace std;

int main() {
    double res = 0.0;
    DWORD dw = GetTickCount();  // millisecond tick counter at the start
    for (int i = 1; i <= 200000000; i++)
        res += sin((double)i);
    cout.precision(20);
    cout << "Result: " << res << " after " << (GetTickCount() - dw);
}
The compiler switches used are /O2 and /Ot.
VC++ 2012 result:
Not impressive at all: 1.2502304241761002 in 11.404 seconds.
Old-school VB6 takes its turn
The code looks very simple:
res = 0: for i = 1 To 200000000: res = res + Sin(i) : Next i
We compile with maximum optimization:
And in VB6 we have our champion.
Result: 1.25023042417543, time used: 0:00:09 (9.092153)
So explain to me how a 12-year-old product beats the new compilers on every count. MS seems to be degrading, but that is a topic for another conversation. The disassembly listing shows that the VC++ compiler has honestly vectorized the code.
True, I do not know the details of this SSE2 implementation of the sine, but what good is it if it loses to a single-operand FPU stack instruction as used in Visual Basic 6? After such an unpleasant discovery I did not stop: I downloaded the trial version of Intel Parallel Studio XE 2013 and rebuilt the project with the Intel C++ 13 compiler and the QxSSE4.2 switch, after which the result got even worse, 13 to 15 seconds. After results like that, wild thoughts creep in: is everything all right with my server's processors? I repeated the VB6 versus VC++ 2012 comparison on another machine, an old two-core Core Duo 6600. There the SSE version lagged even further behind. The only logical explanation is that, starting with the Core architecture, Intel's engineers greatly improved the performance of FPU operations relative to SSE, and the developers of the Microsoft and Intel compilers missed that fact.
By the way, the screenshot above shows that the Vtune Amplifier profiler in hotspot mode cannot cope with timing SSE instructions. If you believe it, the most time-consuming operation in my code is incrementing the loop counter! Out of academic interest, since the Intel package was installed anyway, I ran the module through Advisor XE 2013, a product designed to point out the places in the code that can be parallelized. In a short loop with no data dependencies and no shared resources, PERFECT for parallel execution, the product could not find any such place. So how is one supposed to trust this program in more complex cases? The suspicion grows that Intel's programmers release products that are widely advertised but poorly suited to practice (and, remembering the announcements and re-announcements of Larrabee and Knights Corner, not only the programmers). Well, we still have Matlab.
Experiments with Matlab bring surprises
The timing run
tic;sum(sin(1:200000000));toc;
shows the result almost immediately:
1.250230424175050, elapsed time is 3.621641 seconds.
Well, well! Clever Matlab worked out on its own how to parallelize this expression across the physical cores of the local machine and keep them fully loaded. And rightly so. After all, I paid for powerful hardware, so I want the software (which is by no means cheap either) to use it 100%. Respect to the MathWorks engineers!
Shall we try compiling it? To do that, we move the expression into an m-file. And another surprise awaits us. Even uncompiled, tic; FastSum(200000000); toc; already gives an elapsed time of 5.607512 seconds. Has anyone run into this before, and what is the problem? To me it is a mystery. With the help of the deploytool command, and after some waiting, Matlab produces a huge 0.5 MB executable. Yes, the compiler team still has something to work on: for comparison, the VC++ executable is 46 KB, the VB.Net one 30 KB, and the VB6 one 36 KB. But what does the compiled Matlab executable give us?
1.2502, elapsed time is 10.716620 seconds.
As you can see, in the compiled version the automatic parallelization of the loop has disappeared for some reason. My heart tells me that the company wants extra money for the Parallel Computing Toolbox :-)
Either way, since the single-threaded leader is VB6, I wrote a simple ActiveX EXE server in that environment, which lets the computation be parallelized across a given number of threads.
Let's see how well the solution scales as the number of workers grows from 1 to 32. To be honest, I was sure that performance would grow only until the number of physical processor cores was reached: where would the virtual HT cores get resources for floating-point work if the FPU pipeline is already fully occupied?
Nevertheless, bringing in 14 additional logical cores shortened the execution time by 20%: from t(16) = 0.803 seconds to t(30) = 0.645. The result would probably be even more impressive if the processor did not start out in power-saving mode: in 0.6 seconds the clock frequency apparently does not have time to climb to its maximum.
GPGPU
So, for this particular case we have found the best solution for a mainstream configuration. But let's not forget about GPGPU (graphics cards capable of general-purpose computing), which today equip a growing number of servers and almost certainly every home computer and new laptop. My test server was no exception: specifically for multithreaded computing it was fitted with a former Nvidia flagship, the dual-chip GTX 590 graphics card with the Fermi architecture and support for CUDA technology.
In general, I must say that I respect Nvidia very much. First, because it is probably the only company that really promotes highly parallel computing on a large scale (conferences, seminars, events, active development and improvement of both software and architecture); second, because its hardware lets it hold the lead in the new high-performance computing sector. Yes, the AMD (ATI) solutions may be more powerful, and AMD cards may deliver no fewer gigaflops, but just try to start developing for FireStream: the AMD website has material describing the technology but hardly any real explanations. It looks as if the AMD programmers, marketers, and executives are simply riding on the work of the talented ATI engineers. So for now the choice is CUDA! By the way, if you have problems integrating CUDA 5 with Visual Studio 2012, you can use the recommendations in this article.
So what resources does this miracle device have?
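As you can see, in theory the two GPU devices can run the computation in 16 * 1536 * 2 = 49152 threads. In practice things are not so rosy: on Fermi the sine is computed in the special function units, of which there are 4 per multiprocessor (SM). So the number of sines that can be computed simultaneously is at most 16 * 4 * 2 = 128 (again, in theory).
I do not intend to go into the details of the CUDA architecture here, let alone the subtleties of CUDA optimization, which is both a science and an art in itself. So let's start with a simple prototype on the Thrust library, which is designed to raise programmer productivity through a high-level abstraction of the CUDA model, as a complement to low-level CUDA C.
According to its authors, the charm of Thrust is that developers no longer need to spend time implementing primitives themselves: arithmetic operations, sorting, all kinds of reductions (a term that probably enters the vocabulary of most programmers for good once they have worked through the CUDA manuals), searching, reorganizing arrays, and other operations. In addition, the library determines the optimal launch configuration on its own, so you no longer have to concentrate on finding the best ratio between the number of blocks and GPU threads...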
Thrust does not automatically activate all CUDA-capable devices; it selects the fastest one. Even so, I expected an unconditional victory for the Nvidia number cruncher, although I knew that performance drops for double-precision computation. After all, computing 200 million sines is one thing, but the whole array also has to be reduced efficiently to a single total.
We use the powerful transform_reduce function, which transforms and sums the elements of an array in a single logical step. Let's write a special functor, sin_op. The code is very simple.
#include <thrust/transform_reduce.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <windows.h>
#include <cmath>
#include <iostream>
using namespace std;

// Unary functor applied to every element before the reduction.
template <typename T>
struct sin_op {
    __host__ __device__ T operator()(const T& x) const { return sin(x); }
};

int main(void) {
    DWORD dw = GetTickCount();
    int n = 10;        // number of chunks, so that one chunk fits in device memory
    double res = 0.0;
    sin_op<double> tr_op;
    thrust::plus<double> red_op;
    thrust::device_vector<int> i(200000000 / n);
    for (int j = 1; j <= n; j++) {
        // Fill the chunk with consecutive integers, then transform and sum it.
        thrust::sequence(i.begin(), i.end(), 200000000 / n * (j - 1) + 1);
        res = thrust::transform_reduce(i.begin(), i.end(), tr_op, res, red_op);
    }
    cout.precision(20);
    cout << res << endl << "Total Time: " << (GetTickCount() - dw) << endl;
}
The outer loop is there so that the required amount of memory fits on the device; unfortunately, Thrust does not always manage this on its own. Logically, the threads should compute the integer indices, apply the transform functor, and keep the results in fast shared memory. So, compile and run:
1.2502304241755013 in 1.704 seconds
The disappointment knows no bounds. The result more or less hints that the library is not making the accelerator do what we imagined. Indeed, a detailed timing shows that the library first places a huge array of zeros in the device's relatively slow main memory (35% of the time), then overwrites those zeros with the natural numbers 1, 2, 3, ... (40% of the time), and only the remaining 25% goes to computing the sines and actually adding them up (a reduction with the plus operator, again in slow main memory).
In our sorrow we remember that the library has virtual iterators (fancy iterators). We look in the documentation, and indeed:
constant_iterator and counting_iterator behave like arrays but do not actually require any memory storage. Whenever one of these iterators is dereferenced, it generates the appropriate value on the fly and returns it to the calling function. The counting iterator is just what the doctor ordered.
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform_reduce.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <windows.h>
#include <cmath>
#include <iostream>
using namespace std;

// Unary functor applied to every index before the reduction.
template <typename T>
struct sin_op {
    __host__ __device__ T operator()(const T& x) const { return sin(x); }
};

int main(void) {
    DWORD dw = GetTickCount();
    double res = 0.0;
    sin_op<double> tr_op;
    thrust::plus<double> red_op;
    // The indices 1..200000000 are generated on the fly, not stored in memory.
    thrust::counting_iterator<int> first(1);
    thrust::counting_iterator<int> last = first + 200000000;
    res = thrust::transform_reduce(first, last, tr_op, res, red_op);
    cout.precision(20);
    cout << res << endl << "Total Time: " << (GetTickCount() - dw) << endl;
}
And the run time? Getting rid of the inefficient operations on main memory cut it to less than half:
1.2502304241761253 in 0.780 seconds
Keep in mind that even an empty call into an idle device context takes time: this time it was at least 0.26 seconds on average, and in some cases 0.52 seconds. Still, this is not the result one expects from a device with a hyper-parallel architecture and a peak performance of several teraflops. Let's try writing the kernel ourselves in CUDA C so that it performs a preliminary aggregation. It is not that hard... To do so, we split the computation into blocks of equal length. Each block performs a parallel reduction of its elements in fast shared memory and stores the result in the accelerator's global memory at an offset equal to the block index.
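#include <cuda_runtime.h>
#include <helper_cuda.h>        // checkCudaErrors, from the CUDA samples
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include <windows.h>
#include <algorithm>
#include <iostream>
using namespace std;

__global__ void SumOfSinuses(double *partial_res, int offset, int n) {
    // Dynamically sized shared memory: one double per thread of the block.
    extern __shared__ double sdata[];
    // Global index of the term handled by this thread.
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    sdata[threadIdx.x] = (i < n) ? sin((double)i) : 0;
    __syncthreads();
    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            sdata[threadIdx.x] += sdata[threadIdx.x + s];
        }
        __syncthreads();
    }
    // Thread 0 writes the block's partial sum to global memory.
    if (threadIdx.x == 0) partial_res[blockIdx.x] = sdata[0];
}
In theory, a single block can sum at most 1024 terms on this device. The SumOfSinuses launches therefore leave partial sums in device memory, about 200 thousand of them over all launches, which are easily added up with calls to thrust::reduce().
int main(void) {
    DWORD dw = GetTickCount();
    int N = 200000000 + 1;   // number of indices 0..200000000; sin(0) adds nothing
    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, 0));
    double *partial_res;
    int rest = N; int i = 0; double res = 0;
    int threads_per_block = 1024;  // deviceProp.maxThreadsPerBlock on Fermi
    // Cap the grid at the Fermi limit so that max_ind still fits in an int
    // on devices that report a much larger maxGridSize[0].
    int max_blocks = min(deviceProp.maxGridSize[0], 65535);
    int max_ind = max_blocks * threads_per_block;
    checkCudaErrors(cudaMalloc(&partial_res, max_blocks * sizeof(double)));
    thrust::device_ptr<double> arr_ptr(partial_res);
    do {
        int chunk = min(rest, max_ind);
        int num_blocks = (chunk + threads_per_block - 1) / threads_per_block;
        SumOfSinuses<<<num_blocks, threads_per_block, threads_per_block * sizeof(double)>>>(partial_res, i * max_ind, N);
        checkCudaErrors(cudaDeviceSynchronize());
        // Add this launch's per-block partial sums to the running total.
        res = thrust::reduce(arr_ptr, arr_ptr + num_blocks, res);
        rest -= num_blocks * threads_per_block;
        i++;
    } while (rest > 0);
    cudaFree(partial_res);
    cout.precision(20);
    cout << res << endl << "Total Time: " << (GetTickCount() - dw) << endl;
}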
1.2502304241758133 in 0.749 seconds
Because the size of the CUDA grid is limited, we have to use the do loop. In our case the kernel is therefore launched from the loop 3 times, processing about 70 million terms each time. Even so, performance is bounded by the computation of double-precision transcendental functions, so Thrust cannot be blamed for its relatively low performance, and I recommend the first option, with its clearer and more elegant code. By the way, our approach still does not scale with the number of CUDA-capable devices (out of which a local cluster can be assembled). Can that be fixed? For some reason, merely switching the context between the two devices, together with all the auxiliary plumbing (calls to cudaSetDevice, cudaStreamCreate / cudaStreamDestroy), already takes about 0.5 seconds. In other words, scaling across several CUDA devices pays off when the kernel runs long enough for the context-switching overhead to go unnoticed. That is not our case, so scaling to multiple devices stays outside the scope of this article (perhaps I should have used several streams on the host side, I do not know).
I almost forgot: Matlab has supported running code on CUDA devices for 3 years now (with some restrictions, of course). Those interested can watch a webinar in English (registration required). For the sake of objectivity it should be said that in Maple CUDA support is rudimentary, at the level of a few procedures of the linear algebra package. In this respect MatLab is far more advanced. I do not know whether the current version can create the array directly on the device, without copying it from the host (judging by the documentation, it cannot). So we take the head-on approach.
tic;
res = 0.0; n = 10; stride = 200000000/n;
for j = 1:n
    X = stride*(j-1)+1 : stride*j;
    A = gpuArray(X);
    res = res + sum(sin(A));
end
toc;
gather(res)
1.250230424175708, elapsed time is 2.872859 seconds.
I did not manage to get the code running on both processors of the dual-chip accelerator right away; the manual does not cover this topic in full. spmd works, and a composite array split into two parts is created. At some point, however, the program fails with a message that the data is no longer available. Has anyone already used several GPUs in Matlab? ;-) In any case, the Matlab version is no faster than the Thrust implementation.
I would also have liked to add a pure assembler version, optimized for cache misses and other subtleties with the help of Vtune event profiling, but I have no strength left :-) If there are volunteers, send me your results and I will add them to the article. It would also be interesting to see the results of a run on Knights Corner, but unfortunately I have no suitable hardware.
And a bonus for those who have read to the end
Many of you have probably noticed that I was cheating. This particular sum of sines can be obtained not only by adding up 200 million numbers but also by simply multiplying a few quantities:
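The formula itself has not survived in this copy. Presumably it is the standard closed form for the sum of sines of an arithmetic progression, which for integer arguments from 1 to n reads:

```latex
\sum_{k=1}^{n} \sin k \;=\; \frac{\sin\frac{n}{2}\,\sin\frac{n+1}{2}}{\sin\frac{1}{2}},
\qquad n = 200\,000\,000 .
```

A quick check with n = 1 gives sin(1/2) sin(1) / sin(1/2) = sin 1, as it should.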
Well, once we have noticed that, we feed the formula into Mathematica or Maple in symbolic form. The value we were after, to 50 exact decimal places:
1.2502304241756868163500362795713040947699040278200
The most important and most effective optimization is always done at the level of the algorithm :-) Still, the work was not in vain: for more complex patterns an analytical formula will most likely not be available, and we now know how to tackle the problem numerically. By the way, if the accuracy of the sum matters, extra measures are needed so that significant digits are not lost when small terms are added to a large running total, for example the Kahan summation algorithm. I hope you found this interesting!
Regards,
ã¢ãããªãŒã»ã¢ã¬ã¯
ã·ãŒãšã05/24/2013