Hello everyone!
The Altera SDK for OpenCL is a set of libraries and applications that compiles code written in OpenCL into a bitstream for Altera FPGAs. It lets programmers use an FPGA as an accelerator for high-performance computing without knowing an HDL, writing the same kind of code they are used to writing for GPUs.
I tried this tool on a simple example and want to tell you about it.
Plan:
- A few words about FPGAs
- What does it run on?
- Development process (workflow)
- OpenCL BSP
- Compiling the kernel
- ...and running it
- Conclusion
Details are below the cut. Warning: there are pictures!
A few words about FPGAs
An FPGA (Field-Programmable Gate Array) is a gate array that the user can program; it is one kind of programmable logic device (PLD).
These chips are built from many small blocks of logic elements. Out of such primitives you can build almost any logic, from an 8-bit microcontroller to a Bitcoin miner.
More reading about FPGAs
I recommend watching a very good video on the subject.
There is also "FPGAs for Dummies", which explains in very simple language what FPGAs are and how these chips are used.
"Classic" FPGA development looks like this:
An FPGA is sometimes seen as a more expensive kind of microcontroller: blink an LED here and there, wire up UART, SPI, I2C. That used to be partly true, because FPGAs were small (in resources and clock frequency) and there was no point talking about serious data processing or about competing with processors. Today FPGA chips keep getting bigger, and their performance is compared with that of GPUs.
An FPGA lets you control processing at the lowest level: create a cache of the right size in the right place, organize pipelines, describe explicit parallelism. You can attach all kinds of peripherals (video cameras, Ethernet ports, and so on) and run computations without a general-purpose processor.
All of this appeal is offset by one fact: if you have low-level control, you have to program at that low level. Lowering the level of abstraction always makes development and debugging harder and stretches schedules.
FPGA vendors quite reasonably thought about cutting time to market, that is, about letting programmers write for FPGAs easily and quickly. One of the standard ways to write programs for parallel computing is OpenCL. Altera decided to support it, and so the Altera SDK for OpenCL was developed.
I deliberately skip the description of OpenCL itself: there is plenty of material on the subject on the Russian-language internet, for example introductory OpenCL overviews.
What does it run on?
Not every board with an FPGA can run OpenCL. Altera set up a special partner program, and boards from participating developers receive the Altera Preferred Board for OpenCL mark.
PCIe
The FPGA chip can sit on a PCIe card that plugs into a matching slot on the motherboard (in place of a GPU, at the very least). Through DMA over PCIe, the FPGA can reach the DDR memory attached to the processor (to fetch data for computation). The card can also carry external memory that only the FPGA can use (the OS on the CPU has no access to it).
External memory may be needed to hold intermediate results: accessing it is cheaper than going to host memory over DMA. It does not have to be DDR; for some low-latency computations SRAM can be a better fit.
Data can reach a kernel not only from global memory but also from I/O channels, for example an Ethernet port. In that case the host only configures the kernel, and the data is processed with minimal latency. (When the words Ethernet, FPGA, and low latency appear side by side, high-frequency trading is almost always implied.)
SoC
The second option is an SoC, where programmable logic and an ARM processor sit on a single die.
The DDR memory (green in the original diagram) is a shared resource: on one side the CPU uses it (you can run Linux there), on the other the FPGA can read and write this memory "directly" through the SDRAM controller with minimal overhead. As with PCIe cards, external memory can be attached to the FPGA, but there is less need for it, since DDR is always at hand.
You can read more about the platforms here.
It is possible to run OpenCL on a board that does not carry the Altera Preferred Board for OpenCL mark. I will not cover that, but as a starting point I suggest the official Altera SDK for OpenCL: Custom Platform Toolkit User Guide.
Development process (workflow)
What steps does it take to get a kernel running?
- The kernel code is written in a *.cl file.
- A host application is prepared in C/C++; it allocates the required amount of memory and passes the values to the kernel.
- The aoc utility from the Altera SDK for OpenCL compiles the kernel into an aocx file. The host application is built with gcc.
- When host_app starts, the FPGA bitstream is loaded, the prepared data is transferred to the kernel, and processing begins.
- Profiling counters collect data, which ends up in the profile.mon file.
- With the aocl utility you can view the report and decide whether this implementation is good enough in terms of run time and performance.
- If it is, you can recompile the kernel without --profile, because the profiling counters consume FPGA resources. On the other hand, if you do not plan to add more kernels, you can skip the rebuild.
- If it is not, you have to optimize, write the logic by hand, pick another chip, or put up with it.
Compiling to an aocx file can take several hours!
What happens when aoc kernel.cl is invoked?
Building the aocx
- kernel.cl is fed to clang, which translates the description into IR and performs various optimizations.
- An IP core is generated as Verilog RTL. The generated files are readable (not encrypted) and can be simulated in a regular simulator such as ModelSim. Not all of the code is auto-generated, though: some modules were clearly written by people.
- The resulting IP is "plugged into" the board's default project, which yields an ordinary Quartus project.
- The project is built (Analysis & Synthesis, Fitter, Assembler). This step takes the longest (from 10 minutes to several hours), because finding a good placement for the primitives requires a lot of computation.
- The build result, board information, and so on are packed into the aocx, which is just an ELF file.
The kernel is then flashed using this aocx file.
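On the host side, flashing starts with getting the aocx bytes into memory; an OpenCL host program would then typically hand them to clCreateProgramWithBinary. A minimal sketch of the reading step in plain C follows. The helper name load_binary is hypothetical and is not part of the Altera SDK.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'ed buffer.
 * Returns NULL on failure; on success stores the byte count in *size_out.
 * The caller owns the buffer and must free() it. */
unsigned char *load_binary(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end and back. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* Allocate at least one byte so an empty file still yields a buffer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = got;
    return buf;
}
```

The returned buffer and size are exactly the pair of arguments a binary-program loader needs; error handling is deliberately coarse here.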
DE1-SoC OpenCL BSP
In words and pictures everything looks very smooth: no Verilog knowledge is required.
How does it work out in practice?
A Terasic DE1-SoC board has ended up in my hands once again. It is built around a Cyclone V SoC chip (5CSEMA5F31C6).
This board carries the Altera Preferred Board for OpenCL mark, so the OpenCL scaffolding for it is already done. This particular board needs an OpenCL BSP, which can be downloaded here.
The OpenCL BSP archive contains:
- an SD card image (it boots Linux);
- a default project in which all the pins and interfaces (fpga2sdram, lwhps2fpga, and so on) are already configured;
- simple examples.
The image is simply written to a MicroSD card with dd.
Note: a Class 10 card is recommended.
And we already have Linux:
root@socfpga:~# uname -a
Linux socfpga 3.13.0-00298-g3c7cbb9-dirty #3 SMP Fri Jul 4 15:42:32 CST 2014 armv7l GNU/Linux
root@socfpga:~# cat /etc/issue
Poky 8.0 (Yocto Project 1.3 Reference Distro) 1.3 \n \l
root@socfpga:~# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 0 (v7l)
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
processor : 1
model name : ARMv7 Processor rev 0 (v7l)
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
Hardware : Altera SOCFPGA
Revision : 0000
Serial : 0000000000000000
It also comes with precompiled examples and the OpenCL run-time environment.
A thoughtfully provided README says:
Run "source ./init_opencl.sh" to setup OpenCL Run-Time Environment, including loading driver, on this board. Do it once right after booting the board. OpenCL Run-Time Environment is pre-installed in opencl_arm32_rte folder.
init_opencl.sh itself looks very simple:
root@socfpga:~# cat init_opencl.sh
export ALTERAOCLSDKROOT=/home/root/opencl_arm32_rte
export AOCL_BOARD_PACKAGE_ROOT=$ALTERAOCLSDKROOT/board/c5soc
export PATH=$ALTERAOCLSDKROOT/bin:$PATH
export LD_LIBRARY_PATH=$ALTERAOCLSDKROOT/host/arm32/lib:$LD_LIBRARY_PATH
insmod $AOCL_BOARD_PACKAGE_ROOT/driver/aclsoc_drv.ko
Run this script, change to the helloworld directory, and launch the application of the same name:
root@socfpga:~/helloworld# ./helloworld
Querying platform for info:
==========================
CL_PLATFORM_NAME = Altera SDK for OpenCL
CL_PLATFORM_VENDOR = Altera Corporation
CL_PLATFORM_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 14.0
Querying device for info:
========================
CL_DEVICE_NAME = de1soc_sharedonly : Cyclone V SoC Development Kit
CL_DEVICE_VENDOR = Altera Corporation
CL_DEVICE_VENDOR_ID = 4466
CL_DEVICE_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 14.0
CL_DRIVER_VERSION = 14.0
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_ENDIAN_LITTLE = true
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
CL_DEVICE_GLOBAL_MEM_SIZE = 536870912
CL_DEVICE_IMAGE_SUPPORT = false
CL_DEVICE_LOCAL_MEM_SIZE = 16384
CL_DEVICE_MAX_CLOCK_FREQUENCY = 1000
CL_DEVICE_MAX_COMPUTE_UNITS = 1
CL_DEVICE_MAX_CONSTANT_ARGS = 8
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 134217728
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 8192
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0
Command queue out of order? = false
Command queue profiling enabled? = true
Using AOCX: hello_world.aocx
Kernel initialization is complete.
Launching the kernel...
Thread #2: Hello from Altera's OpenCL Compiler!
Kernel execution is complete.
So, a few specially prepared examples and files on the SD card work and print something.
What does it take to build and run a simple example of our own?
Installing the SDK
You will need:
- Quartus
- Altera SDK for OpenCL
- SoC Embedded Design Suite (a tool set for building and debugging applications on the SoC).
Installing all of these tools is straightforward, but there are some subtleties:
- Root privileges may be required, but you are only told so at the very end of the installation.
- After installation you have to add entries to PATH, ALTERAOCLSDKROOT, and QUARTUS_ROOTDIR. What exactly goes there is described in the corresponding guides.
I may have done something wrong, but in the end my script for setting the environment variables came to look like this:
export PATH=/home/ish/altera/14.1/quartus/bin:$PATH
export PATH=/home/ish/altera/14.1/hld/bin:$PATH
export PATH=/usr/local/DS-5/bin:$PATH
export PATH=/usr/local/DS-5/sw/gcc/bin:$PATH
export PATH=/home/ish/altera/14.1/hld/linux64/bin/:$PATH
export ALTERAOCLSDKROOT=/home/ish/altera/14.1/hld/
export QUARTUS_ROOTDIR=/home/ish/altera/14.1/quartus/
export LD_LIBRARY_PATH=/home/ish/altera/14.1/hld/linux64/lib/:$LD_LIBRARY_PATH
export AOCL_BOARD_PACKAGE_ROOT=/home/ish/altera/14.1/hld/board/de1soc
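A missing variable tends to surface only later as an obscure tool error, so a quick check can save time. The sketch below, in plain C, counts which of a list of variables are unset or empty; count_missing_env is an illustrative helper and not an SDK function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count how many of the given environment variables are unset or empty,
 * printing the name of each missing one to stderr. */
int count_missing_env(const char *const names[], int n)
{
    assert(names != NULL && n >= 0);

    int missing = 0;
    for (int i = 0; i < n; i++) {
        const char *value = getenv(names[i]);
        if (value == NULL || value[0] == '\0') {
            fprintf(stderr, "not set: %s\n", names[i]);
            missing++;
        }
    }
    return missing;
}
```

For the setup above one would pass the names ALTERAOCLSDKROOT, QUARTUS_ROOTDIR, and AOCL_BOARD_PACKAGE_ROOT and expect a result of 0.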
Yes, I am not on the latest Quartus; in version 15 they have probably improved a thing or two in what is shown here.
If anything has changed fundamentally on the OpenCL side, I would appreciate a PM about it.
Once all of this is set up and the licenses are sorted out, the board has to be installed.
How to do that is explained in the README.txt inside the BSP archive:
note:before the below operations,make sure you have install the opencl SDK 14.0 and SoCEDS 14.0.
1. directly unzip the de1soc_openCL_bsp.zip into %ALTERAOCLSDKROOT%/board directory.
2. set the "User variables" AOCL_BOARD_PACKAGE_ROOT to %ALTERAOCLSDKROOT%/board/de1soc
3. open the windows command window and type "aoc --list-boards", it should output "de1soc_sharedonly"
Run it and check:
ish@xmr:~$ aoc --list-boards
Board list:
de1soc_sharedonly
The board shows up in the list, which means everything was done correctly.
Building an example
To begin with, I picked a very simple example:
Z = X + Y.
X and Y are arrays of N uint (32-bit) numbers.
The vector_add kernel looks very simple:
// ACL kernel for adding two input vectors
__kernel void vector_add( __global const uint *restrict x,
                          __global const uint *restrict y,
                          __global uint *restrict z )
{
    // get index of the work item
    int index = get_global_id(0);
    // add the vector elements
    z[index] = x[index] + y[index];
}
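To make the execution model concrete: each work-item runs the kernel body once, with its own index from get_global_id(0). The following C sketch emulates a one-dimensional launch sequentially on a CPU. It is an illustration only (the function names are made up here) and says nothing about how the FPGA pipeline is actually built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Body of one work-item: the same statement as in the kernel. */
static void vector_add_item(size_t index, const uint32_t *x,
                            const uint32_t *y, uint32_t *z)
{
    z[index] = x[index] + y[index];   /* unsigned add wraps modulo 2^32 */
}

/* Sequential stand-in for a 1-D launch with global size n:
 * the loop variable plays the role of get_global_id(0). */
void vector_add_emulated(const uint32_t *x, const uint32_t *y,
                         uint32_t *z, size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t gid = 0; gid < n; gid++)
        vector_add_item(gid, x, y, z);
}
```

Because no work-item depends on another, the iterations could run in any order or all at once, which is what makes the kernel a natural fit for parallel hardware.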
I will not list the full host code; you can find it here.
What does it do?
- finds out which OpenCL devices are present;
- reprograms the FPGA using the aocx file;
- initializes the buffers for arrays X, Y, and Z;
- generates data for arrays X and Y and computes the reference answer (on the processor);
- passes pointers to the arrays to the kernel;
- starts processing;
- waits for it to finish;
- compares the reference answer with what the kernel computed.
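The last step of that list, checking the kernel's output against a reference computed on the processor, could look roughly like this in C. This is a sketch and not the actual host code; count_mismatches is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare the device output z against a reference computed on the host.
 * Returns the number of mismatching elements (0 means the check passed). */
size_t count_mismatches(const uint32_t *x, const uint32_t *y,
                        const uint32_t *z, size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);

    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t ref = x[i] + y[i];   /* reference answer on the CPU */
        if (z[i] != ref)
            bad++;
    }
    return bad;
}
```

Returning a count rather than a boolean makes it easy to tell a single corrupted element from an output buffer that was never written.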
The build is easy: run a very simple Makefile that uses the ARM cross-compiler. (The host in this case is the ARM inside the SoC.)
Getting the aocx:
ish@xmr:~/tmp/cl/vector_add$ aoc device/vector_add.cl -o bin/vector_add.aocx --board de1soc_sharedonly --profile -v
aoc: Environment checks are completed successfully.
You are now compiling the full flow!!
aoc: Selected target board de1soc_sharedonly
aoc: Running OpenCL parser....
aoc: OpenCL parser completed successfully.
aoc: Compiling....
aoc: Linking with IP library ...
aoc: First stage compilation completed successfully.
aoc: Hardware generation completed successfully.
As a reminder, the --profile flag adds profiling counters to the bitstream, and -v makes the output verbose.
It took about 10 to 15 minutes.
vector_add.aocx appeared in the bin directory, and the Quartus project, which is what was being built all that time, appeared in bin_vector_add.
Build report:
+-------------------------------------------------------------------------------+
; Fitter Summary                                                                ;
+---------------------------------+---------------------------------------------+
; Fitter Status                   ; Successful - Sat Oct 17 21:36:01 2015       ;
; Quartus II 64-Bit Version       ; 14.1.0 Build 186 12/03/2014 SJ Full Version ;
; Revision Name                   ; top                                         ;
; Top-level Entity Name           ; top                                         ;
; Family                          ; Cyclone V                                   ;
; Device                          ; 5CSEMA5F31C6                                ;
; Timing Models                   ; Final                                       ;
; Logic utilization (in ALMs)     ; 5,570 / 32,070 ( 17 % )                     ;
; Total registers                 ; 9685                                        ;
; Total pins                      ; 103 / 457 ( 23 % )                          ;
; Total virtual pins              ; 0                                           ;
; Total block memory bits         ; 127,344 / 4,065,280 ( 3 % )                 ;
; Total DSP Blocks                ; 0 / 87 ( 0 % )                              ;
; Total HSSI RX PCSs              ; 0                                           ;
; Total HSSI PMA RX Deserializers ; 0                                           ;
; Total HSSI TX PCSs              ; 0                                           ;
; Total HSSI PMA TX Serializers   ; 0                                           ;
; Total PLLs                      ; 2 / 6 ( 33 % )                              ;
; Total DLLs                      ; 1 / 4 ( 25 % )                              ;
+---------------------------------+---------------------------------------------+
The two rows that interest me most are Logic utilization and Total block memory bits.
This simple example used 5,570 ALMs. In fact, the addition itself takes less than 1% of that number; the rest is taken by the "infrastructure" that reads and writes data in DDR (plus the profiling counters). It is important to note that the Quartus project is built with default settings, which include no optimization for resources or frequency.
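The percentages in the Fitter summary are simple ratios. A small C sketch that reproduces them, assuming rounding to the nearest integer (an assumption that happens to match the values in this report; utilization_percent is an illustrative name):

```c
#include <assert.h>

/* Percentage of a resource in use, rounded to the nearest integer.
 * Adding total/2 before the integer division implements the rounding. */
int utilization_percent(long used, long total)
{
    assert(total > 0 && used >= 0);
    return (int)((used * 100 + total / 2) / total);
}
```

For the report above, 5,570 of 32,070 ALMs gives 17, and 127,344 of 4,065,280 block memory bits gives 3.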
It is also interesting that memory totalling about 128 Kbit showed up automatically from somewhere.
By the way, you can look at the sections present in vector_add.aocx:
ish@xmr:~/tmp/cl/vector_add$ readelf -a bin/vector_add.aocx
ELF Header:
  Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class: ELF32
  Data: 2's complement, little endian
  Version: 1 (current)
  OS/ABI: UNIX - System V
  ABI Version: 0
  Type: NONE (None)
  Machine: Advanced Micro Devices X86-64
  Version: 0x1
  Entry point address: 0x0
  Start of program headers: 0 (bytes into file)
  Start of section headers: 2370388 (bytes into file)
  Flags: 0x0
  Size of this header: 52 (bytes)
  Size of program headers: 0 (bytes)
  Number of program headers: 0
  Size of section headers: 40 (bytes)
  Number of section headers: 20
  Section header string table index: 1
Section Headers:
  [Nr] Name              Type     Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL     00000000 000000 000000 00      0   0   0
  [ 1] .shstrtab         STRTAB   00000000 000080 00011c 00   S  0   0 128
  [ 2]                   PROGBITS 00000000 000200 001000 00      0   0 128
  [ 3] .acl.board        PROGBITS 00000000 001200 000011 00      0   0 128
  [ 4] .acl.compileoptio PROGBITS 00000000 001280 000002 00      0   0 128
  [ 5] .acl.version      PROGBITS 00000000 001300 00000a 00      0   0 128
  [ 6] .acl.file.0       PROGBITS 00000000 001380 000030 00      0   0 128
  [ 7] .acl.source.0     PROGBITS 00000000 001400 0006c2 00      0   0 128
  [ 8] .acl.nfiles       PROGBITS 00000000 001b00 000001 00      0   0 128
  [ 9] .acl.source       PROGBITS 00000000 001b80 0006c2 00      0   0 128
  [10] .acl.opt.rpt.xml  PROGBITS 00000000 002280 000019 00      0   0 128
  [11] .acl.mav.json     PROGBITS 00000000 002300 00107f 00      0   0 128
  [12] .acl.area.json    PROGBITS 00000000 003380 0009da 00      0   0 128
  [13] .acl.profiler.xml PROGBITS 00000000 003d80 002f08 00      0   0 128
  [14] .acl.profile_base PROGBITS 00000000 006d00 0009c8 00      0   0 128
  [15] .acl.autodiscover PROGBITS 00000000 007700 000071 00      0   0 128
  [16] .acl.autodiscover PROGBITS 00000000 007780 00021e 00      0   0 128
  [17] .acl.board_spec.x PROGBITS 00000000 007a00 0003eb 00      0   0 128
  [18] .acl.fpga.bin     PROGBITS 00000000 007e00 23ab98 00      0   0 128
  [19] .acl.quartus_repo PROGBITS 00000000 242a00 000151 00      0   0 128
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There are no relocations in this file.
There are no unwind sections in this file.
No version information found in this file.
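The first bytes of the header above are enough to recognize the file type. A minimal C sketch that checks the ELF magic and the class byte of a buffer follows; elf_class_bits is an illustrative name, and the check deliberately ignores everything past the fifth byte.

```c
#include <assert.h>
#include <stddef.h>

/* Return 32 or 64 for an ELF32/ELF64 image, or 0 if the buffer
 * does not start with a recognizable ELF identification. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 5)
        return 0;
    /* Magic: 0x7f 'E' 'L' 'F' */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;
    /* Byte 4 is the class: 1 = ELFCLASS32, 2 = ELFCLASS64. */
    if (buf[4] == 1) return 32;
    if (buf[4] == 2) return 64;
    return 0;
}
```

Fed the magic bytes from the dump (7f 45 4c 46 01 ...), the function reports a 32-bit image, in line with the Class: ELF32 field.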
Running the kernel
Copy vector_add and vector_add.aocx to the board with scp and run:
root@socfpga:~/myvectoradduint# ls -l
-rwxr-xr-x 1 root root 42525 Apr 16 06:57 vector_add
-rw-r--r-- 1 root root 2371188 Apr 16 06:58 vector_add.aocx
root@socfpga:~/myvectoradduint# ./vector_add
Initializing OpenCL
Platform: Altera SDK for OpenCL
Using 1 device(s)
de1soc_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Launching for device 0 (1000000 elements)
Time: 112.475 ms
Kernel time (device 0): 7.270 ms
Verification: PASS
One million pairs of 32-bit numbers were added in 7.270 ms, or one pair every 7.27 ns. This figure is not very interesting at the moment, because the example has not been optimized for performance. (Spoiler: only one adder was used, so the computation was not parallelized at all.)
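The per-pair figure is plain arithmetic, and the same numbers give a rough feel for memory traffic. The C sketch below assumes 12 bytes moved per element (two 4-byte reads and one 4-byte write), which is an assumption and not a measured value; both function names are illustrative.

```c
#include <assert.h>

/* Average time per element in nanoseconds, given the kernel time in ms. */
double ns_per_element(double kernel_ms, double elements)
{
    assert(elements > 0.0);
    return kernel_ms * 1.0e6 / elements;
}

/* Effective global-memory traffic in MB/s (1 MB = 10^6 bytes), assuming
 * each element costs bytes_per_element bytes of reads plus writes. */
double effective_mb_per_s(double kernel_ms, double elements,
                          double bytes_per_element)
{
    assert(kernel_ms > 0.0);
    return elements * bytes_per_element / (kernel_ms * 1.0e-3) / 1.0e6;
}
```

For the run above, ns_per_element(7.270, 1000000) gives 7.27, and under the 12-byte assumption the traffic works out to roughly 1650 MB/s.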
After the run, profile.mon appears in the directory:
root@socfpga:~/myvectoradduint# ls -l
-rw-r--r-- 1 root root 170 Apr 16 06:58 profile.mon
-rwxr-xr-x 1 root root 42525 Apr 16 06:57 vector_add
-rw-r--r-- 1 root root 2371188 Apr 16 06:58 vector_add.aocx
Copy it to the development machine and look at the profiling results:
ish@xmr:~/tmp/cl/vector_add$ aocl report bin/vector_add.aocx profile.mon
The profiler showed that only about a third of the global memory bandwidth is being used.
You can also launch the visualizer:
ish@xmr:~/tmp/cl/vector_add$ aocl vis bin/vector_add.aocx
The visualizer shows that there are three blocks talking to global memory: two for reading and one for writing. Access to global memory can become the bottleneck in this case. The per-line area report shows how many FPGA resources the implementation of each line consumed. Of course, a one-line example is not representative.
Altera's YouTube channel has a video that walks through all of the steps above in detail.
There are more videos in this series as well.
Conclusion
In this article we tried out a tool that lets you describe FPGA designs at a high level without knowing an HDL. As you can see, it works (on a simple example), and nothing out of the ordinary was really required.
OpenCL on FPGAs is no silver bullet:
- you cannot describe a process with clock-cycle accuracy (and sometimes that is exactly what you want);
- it is not suitable for small chips, because the infrastructure eats a huge amount of resources.
Still, with it FPGAs can compete with GPUs in areas such as video processing (machine vision), cryptography, DSP, and simulation of various processes. As for the area I work in (generating, filtering, and switching Ethernet packets), where we want to squeeze out maximum performance through control at the lowest level, I do not see how OpenCL could be used to get comparable results.
If you need maximum performance, you have to understand very well what each language construct turns into. So it seems to me that anyone who wants to write something reasonably serious in OpenCL for FPGAs will have to learn Quartus, Qsys, and Verilog (at a reading level), at least at a basic level. Perhaps the visualizer and the profiler could be enough, but for now they look like student projects; I hope that gets fixed in newer Quartus releases.
If we are talking about real-time video processing, I recommend watching this demo.
The team at iABRA first did machine vision with OpenCL on AMD GPUs and then moved to Altera. Their programmers, by their own account, do not know VHDL, so with OpenCL they could start writing right away what they already knew from experience.
In reports comparing implementations of algorithms (cryptography, video processing) on GPUs and on FPGAs with OpenCL, the number of operations per second is roughly the same, while the FPGA draws about a tenth of the power. I have not tried it myself, so I am always a little sceptical about such benchmarks :)
With the release of the new Arria 10 and Stratix 10 families, I think more and more parallel computing will move to FPGAs. These chips will be showing up in supercomputers and data centers.
And one more video about real-world use of the Altera SDK for OpenCL:
Thanks for your attention! Questions and comments are welcome in the comments or by PM.
Useful links:
- Sample sources and the auto-generated Verilog code
- Altera SDK for OpenCL: Overview
- Altera SDK for OpenCL: Getting Started Guide
- Harnessing the Power of FPGAs using Altera's OpenCL Compiler (a very large presentation, 100+ slides, for those who like looking at pictures) :)
- Altera SDK for OpenCL: Programming Guide
- Altera SDK for OpenCL: Optimization Guide
- Altera SDK for OpenCL: Best Practices Guide
Update:
Part 2 of the article has been published: "Altera + OpenCL: opening up the kernel".