In this post we look at the functions that implement filters with a finite response: FIR (finite impulse response) filters.
FIR filters
Filtering is one of the most important areas of digital signal processing, and the IPP library naturally implements most classes of filters, including FIR (finite impulse response) filters. A detailed description of FIR filters can be found in the extensive literature or on Wikipedia. In short, a FIR filter multiplies several previous samples and the current sample of the input discrete signal by their corresponding coefficients and adds the products to obtain the current sample of the output signal. Slightly more formally: a FIR filter transforms an input vector X of N samples into an output vector Y of N samples by multiplying K samples of the input vector by K coefficients H and summing the results. The number of coefficients K is called the filter order.
Figure 1. FIR filter
Here:
tapsLen is the filter order,
numIters is the vector length.
This figure is taken from the IPP library documentation, so it uses the terminology adopted in IPP.
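Because the figure itself is an image, it may help to write the formula out. The following is a reconstruction that assumes the standard FIR convolution sum, using the IPP names tapsLen and numIters from the figure legend and the symbols h, x and y that appear later in this post:

```latex
y[n] = \sum_{k=0}^{\mathrm{tapsLen}-1} h[k]\, x[n-k], \qquad 0 \le n < \mathrm{numIters}
```

For n < tapsLen - 1 the sum reaches samples with negative indices, x[-1], x[-2] and so on, which is where the delay line discussed below comes from.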
Visually, a FIR filter can be pictured as follows.
Figure 2. Schematic of a FIR filter
As you can see, the filter order K here is 4: four filter coefficients h are multiplied by four samples of the vector x, and the sum of the products is written to one sample of the output vector y. Note that the filter coefficients h[3], h[2], h[1], h[0] lie in memory in reverse order relative to x and y, in accordance with the generally accepted formula in Figure 1.
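The reversed ordering is easy to see in code. The sketch below is plain C, not IPP code, and the name fir_sample is only illustrative. It computes a single output sample, assuming the caller passes a pointer to x[n] with at least K-1 valid samples stored before it:

```c
/* One output sample of a FIR filter: y[n] = sum over k of h[k] * x[n-k].
 * x_n must point at x[n] and have at least K-1 valid samples before it. */
float fir_sample(const float *x_n, const float *h, int K)
{
    float acc = 0.0f;
    for (int k = 0; k < K; k++) {
        acc += h[k] * x_n[-k];   /* h walks forward while x walks backward */
    }
    return acc;
}
```

With K = 4 the loop pairs h[0] with x[n], h[1] with x[n-1], h[2] with x[n-2] and h[3] with x[n-3], which is the pairing drawn in Figure 2.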
Delay line
A FIR filter is an ordinary convolution, so obtaining an output vector of N samples requires N + K - 1 input samples, where K is the length of the kernel. The first K - 1 samples are called the "delay line"; in Figure 2 these are the samples numbered x[-3], x[-2], x[-1]. The data passed to the function can be very large, so it may be split into blocks that are processed one after another. If it is an audio signal, for example, it may be buffered by the operating system; if it comes from an external device, it may arrive in portions over a communication line. The application itself may also process data in buffers because the total amount of data is not known in advance. In that case the working buffer is given some fixed length, chosen for example to fit into a certain cache level, and all the data passes through this buffer in batches. In all of these cases the delay line is very useful: it makes it simple to "glue" the blocks into one continuous stream, so that splitting the data into blocks produces no edge effects.
IPP API
Many years of experience with the IPP library showed that the FIR filter API had to be changed to satisfy the following requirements.
- A vector can be processed in sequential blocks.
- There are no hidden memory allocations.
- Processing a vector in different threads is supported.
- In-place mode is allowed, that is, the input vector can also serve as the output vector.
To satisfy all of these requirements at once, the concepts of "input" and "output" delay lines were introduced, after which the API came to look as follows.
FIR Filter API
/*
// Name:         ippsFIRSRGetSize, ippsFIRSRInit_32f, ippsFIRSRInit_64f
//               ippsFIRSR_32f, ippsFIRSR_64f
// Purpose:      Get sizes of the FIR spec structure and temporary buffer
//               initialize FIR spec structure - set taps and delay line
//               perform FIR filtering
// Parameters:
//   pTaps     - pointer to the filter coefficients
//   tapsLen   - number of coefficients
//   tapsType  - type of coefficients (ipp32f or ipp64f)
//   pSpecSize - pointer to the size of FIR spec
//   pBufSize  - pointer to the size of temporal buffer
//   algType   - mask for the algorithm type definition (direct, fft, auto)
//   pDlySrc   - pointer to the input delay line values, can be NULL
//   pDlyDst   - pointer to the output delay line values, can be NULL
//   pSpec     - pointer to the constant internal structure
//   pSrc      - pointer to the source vector.
//   pDst      - pointer to the destination vector
//   numIters  - length of the destination vector
//   pBuf      - pointer to the work buffer
// Return:
//   status    - status value returned, its value are
//     ippStsNullPtrErr      - one of the specified pointer is NULL
//     ippStsFIRLenErr       - tapsLen <= 0
//     ippStsContextMatchErr - wrong state identifier
//     ippStsNoErr           - OK
//     ippStsSizeErr         - numIters is not positive
//     ippStsAlgTypeErr      - unsupported algorithm type
//     ippStsMismatch        - not effective algorithm.
*/
IppStatus ippsFIRSRGetSize (int tapsLen, IppDataType tapsType,
                            int* pSpecSize, int* pBufSize);
IppStatus ippsFIRSRInit_32f(const Ipp32f* pTaps, int tapsLen,
                            IppAlgType algType, IppsFIRSpec_32f* pSpec);
IppStatus ippsFIRSR_32f    (const Ipp32f* pSrc, Ipp32f* pDst, int numIters,
                            IppsFIRSpec_32f* pSpec,
                            const Ipp32f* pDlySrc, Ipp32f* pDlyDst,
                            Ipp8u* pBuf);
This API follows the standard scheme used in IPP. First, the ippsFIRSRGetSize function is called to query the memory sizes needed for the function context and for the working buffer. Next, ippsFIRSRInit is called and given the filter coefficients. It initializes the internal data tables in the pSpec structure, which speed up the work of the processing function ippsFIRSR. The contents of this structure do not change while the function runs, which is reflected in its name, Spec. The same structure can therefore be used by several threads at once, which saves memory. The pBuf parameter, by contrast, is a working buffer that the function modifies, so a separate working buffer must be allocated for each thread.
The SR suffix means "single rate" and is used for consistency with the MR (multi-rate) filters, whose description would fill a separate article. The numIters parameter is also inherited from the MR filters; here it simply means the length of the vector.
The pSrc parameter points to the start of the block being processed, x[0].
Now let us look at the meaning of the pDlySrc and pDlyDst parameters.
Figure 3. "Input" and "output" delay lines
As mentioned above, the need for x[-3], x[-2], x[-1] follows from the convolution formula. These elements are called the input delay line, pDlySrc. The samples x[N-3], x[N-2], x[N-1] are the "tail" of the processed vector, that is, its last K - 1 elements. They are called the output delay line, pDlyDst. For the next block they become the input line, and so on.
The input delay line pDlySrc can point to the K - 1 samples to the left of x[0], to some other buffer, or be NULL. If it is NULL, all elements of the input delay line are assumed to be 0, which is convenient for the first block, when there is no earlier data yet.
The "tail" of the block, that is, its last K - 1 samples, is written to the address pDlyDst. If this pointer is NULL, nothing is written.
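To make these rules concrete, here is a plain-C model of the behaviour just described. It is a sketch, not IPP code: the function name fir_block is hypothetical, the delay line is assumed to be stored in time order (oldest sample first), and source and destination are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Plain-C model of the delay-line semantics described above (not IPP code).
 * dly_src: K-1 samples x[-(K-1)] .. x[-1] in time order, or NULL for zeros.
 * dly_dst: receives the last K-1 input samples of this block, or NULL.
 * src and dst must not overlap; n >= K-1 is assumed when dly_dst != NULL. */
void fir_block(const float *src, float *dst, int n,
               const float *h, int K,
               const float *dly_src, float *dly_dst)
{
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int k = 0; k < K; k++) {
            int j = i - k;                 /* index of x[i-k], may be negative */
            float x;
            if (j >= 0)
                x = src[j];                /* sample inside the block */
            else if (dly_src != NULL)
                x = dly_src[(K - 1) + j];  /* sample from the input delay line */
            else
                x = 0.0f;                  /* NULL delay line means zeros */
            acc += h[k] * x;
        }
        dst[i] = acc;
    }
    if (dly_dst != NULL) {
        /* the "tail": last K-1 input samples, ready for the next block */
        memcpy(dly_dst, src + n - (K - 1), (size_t)(K - 1) * sizeof(float));
    }
}
```

Filtering a vector in two calls, passing the dly_dst of the first call as the dly_src of the second, gives the same output as filtering it in one call, which is the "gluing" property the delay line exists for.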
This mechanism of two delay lines makes it possible to process a vector in parallel even in in-place mode, that is, when the vector is overwritten. To do so, it is enough to copy the "tails" of the blocks into separate buffers first and then pass them to each thread as its input line. The code examples used in this article are collected in a single listing at the end.
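The in-place case can be sketched in plain C as well. This is an illustration under stated assumptions rather than the IPP implementation: the function names are hypothetical, the blocks are processed in an ordinary loop that stands in for threads, and each block is assumed to be at least K - 1 samples long.

```c
#include <stddef.h>
#include <string.h>

/* In-place FIR over one block. dly holds the K-1 samples that preceded the
 * block, in time order. Walking i downwards guarantees that every x[i-k]
 * still needed has not been overwritten yet. */
static void fir_inplace_block(float *x, int n, const float *h, int K,
                              const float *dly)
{
    for (int i = n - 1; i >= 0; i--) {
        float acc = 0.0f;
        for (int k = 0; k < K; k++) {
            int j = i - k;
            acc += h[k] * (j >= 0 ? x[j] : dly[(K - 1) + j]);
        }
        x[i] = acc;
    }
}

/* Filters x[0..len) in place as nblocks independent blocks.
 * tails needs room for nblocks*(K-1) floats; len/nblocks >= K-1 is assumed. */
void fir_inplace_blocks(float *x, int len, const float *h, int K,
                        int nblocks, float *tails)
{
    int blen = len / nblocks;
    int hist = K - 1;

    /* Step 1: save every block's input delay line before anything changes. */
    memset(tails, 0, (size_t)hist * sizeof(float));   /* block 0: zeros */
    for (int b = 1; b < nblocks; b++) {
        memcpy(tails + b * hist, x + b * blen - hist,
               (size_t)hist * sizeof(float));
    }

    /* Step 2: the blocks are now independent of each other, so each
     * iteration could run in its own thread. */
    for (int b = 0; b < nblocks; b++) {
        int n = (b == nblocks - 1) ? len - b * blen : blen;
        fir_inplace_block(x + b * blen, n, h, K, tails + b * hist);
    }
}
```

The important part is the ordering: all tails are copied before any block is filtered, because once a block has been overwritten its original samples are no longer available to its right-hand neighbour.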
Example: using an IPP FIR filter as a low-pass filter
As an example, consider how to use an IPP FIR filter to keep only the low-frequency component of a signal.
To generate the original, unfiltered signal we use the special IPP function Jaehne.
pDst[n] = magn * sin((0.5 * π * n^2) / len), 0 ≤ n < len
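This function is the workhorse on which many IPP functions are tested. We write the generated signal to a simple .csv file and plot it in Excel. The original signal looks like this.
Figure 4. 128 samples of the Jaehne signal
As an example, take a filter of order 31. Its coefficients are generated with the IPP function ippsFIRGenLowpass_64f. This function computes the coefficients only in double precision, so they are then converted to float; see the firgenlowpass() function in the appendix. After it is called, the buffer sizes are computed, the spec structure is initialized, and the main function ippsFIRSR, whose performance we are going to measure, is called.
After the low-pass filter is applied, only the low-frequency component remains in the signal. Note that the phase has shifted. This, however, follows from the properties of the FIR filter itself and has nothing to do with the IPP library.
Figure 5. 128 samples of the Jaehne signal after the low-pass filter
In these figures the FIR filter processes 128 samples. The 30 samples of the input delay line are taken to be 0, which is indicated by pDlySrc = NULL. The output line is not needed either, so pDlyDst = NULL.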
Multithreaded performance
The word "performance" is part of the IPP library's name, and performance is its top priority. So we measure the performance of the ippsFIRSR function on a processor that supports AVX2. We then implement the multithreaded code below using OpenMP, measure it as well, and put both sets of measurements on one graph.
The FIR filter API was designed so that splitting a vector across several threads is simple and logical, as shown in the figure.
Figure 6. Splitting the original vector among threads
The following way of splitting a vector among threads is proposed; see the fir_omp function.
fir_omp code
void fir_omp(Ipp32f* src, Ipp32f* dst, int len, int order,
             IppsFIRSpec_32f* pSpec, Ipp32f* pDlySrc, Ipp32f* pDlyDst,
             Ipp8u* pBuffer)
{
    int tlen, ttail;
    tlen  = len / NTHREADS;
    ttail = len % NTHREADS;
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();
        Ipp32f* s = src + id*tlen;
        Ipp32f* d = dst + id*tlen;
        int len = tlen + ((id == NTHREADS-1) ? ttail : 0);
        Ipp8u* b = pBuffer + id*bufSize;

        if (id == 0)
            ippsFIRSR_32f(s, d, len, pSpec, pDlySrc, NULL, b);
        else if (id == NTHREADS - 1)
            ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), pDlyDst, b);
        else
            ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), NULL, b);
    }
}
Let us look at what this code does. We have received the next portion of the signal to be filtered, x[0], ..., x[N-1], together with pointers to the input and output delay lines, that is, the "tail" of the previous portion and a buffer in which to place the "tail" of the current one. To speed up filtering, we split the processing of this portion into T = NTHREADS blocks, one per thread. All that is required is to specify the input and output lines correctly and to give each thread its own working buffer.
For thread 0, the input delay line passed to ippsFIRSR is the same "tail" of the previous portion. For every other thread, the input line is a pointer to its own block shifted back by order - 1 elements. Only the last thread writes the "tail" of the portion.
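The index arithmetic of this scheme can be isolated in a small helper. The sketch below is plain C with hypothetical names (fir_chunk, fir_split); it returns offsets instead of pointers, and uses -1 to mean "use the delay line supplied by the caller".

```c
/* Work assigned to one thread. */
typedef struct {
    int offset;      /* index of the block's first sample in the vector   */
    int length;      /* number of samples this thread filters             */
    int dly_offset;  /* index of its input delay line, or -1 for thread 0 */
} fir_chunk;

fir_chunk fir_split(int len, int order, int nthreads, int id)
{
    int tlen  = len / nthreads;   /* even share per thread              */
    int ttail = len % nthreads;   /* remainder goes to the last thread  */
    fir_chunk c;

    c.offset     = id * tlen;
    c.length     = tlen + ((id == nthreads - 1) ? ttail : 0);
    /* threads other than 0 read the order-1 samples just before their block */
    c.dly_offset = (id == 0) ? -1 : c.offset - (order - 1);
    return c;
}
```

One thing the arithmetic makes visible: for threads other than the first, the delay-line index stays inside the vector only when the per-thread length is at least order - 1, so very short vectors or very long filters would need a different split.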
The approach above assumes that the result vector is written to a different address than the source vector. If the data is overwritten in place, the delay lines must be copied to separate buffers beforehand.
The graph shows the performance of the single-threaded version and of the multithreaded version with 4 threads for a filter of order 31 on a processor with AVX2 support, an Intel® Core(TM) i7-4770K at 3.50 GHz. For FIR filters the unit used is cpMAC, the number of clock cycles per multiply-add operation.
cpMAC = (function execution time in clock cycles) / (vector length * filter order)
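As a small illustration of the formula, the helper below computes cpMAC when the function is called several times in a timing loop, as the listing at the end of this post does. The name cp_mac is hypothetical, and the cycle count is assumed to come from a time-stamp counter read before and after the loop.

```c
/* Clock cycles per multiply-accumulate.
 * cycles: elapsed time-stamp-counter ticks over all calls
 * loops:  number of times the filter was called
 * len:    vector length in samples
 * order:  number of filter taps */
double cp_mac(unsigned long long cycles, int loops, int len, int order)
{
    return (double)cycles / ((double)loops * (double)len * (double)order);
}
```

Lower is better: the value tells how many cycles the machine spends on each product h[k] * x[n-k] and its addition.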
Figure 7. Performance comparison of the single-threaded and multithreaded versions of the FIR filter
The function scales very well: with 4 threads, the multithreaded version runs about 3.7 times faster than the single-threaded version on sufficiently long vectors. With the new API, the criterion for switching between the single-threaded and multithreaded versions can be chosen experimentally for a specific machine, unlike the previous API, where the criterion was built into the code and the function was parallelized internally.
Comparison of the direct and FFT implementations
The correspondence between convolution and the Fourier transform is widely used in digital signal processing.
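For reference, this correspondence is the convolution theorem. Written for the discrete Fourier transform, in the notation of this post with capital letters for the transforms (the index m for frequency bins is introduced here only for illustration):

```latex
y[n] = (h * x)[n]
\quad\Longleftrightarrow\quad
Y[m] = H[m]\, X[m],
\qquad
y = \mathrm{IDFT}\bigl(\mathrm{DFT}(h) \cdot \mathrm{DFT}(x)\bigr)
```

Strictly, the DFT form gives a circular convolution of the transform length L. It matches the linear convolution that a FIR filter computes when both sequences are zero-padded so that L \ge N + K - 1.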
In addition to the direct implementation, the IPP FIR filter has an implementation based on the FFT, and its cpMAC can be better than the value that is theoretically achievable with the direct algorithm on a given CPU.
To select the algorithm, pass one of the following values in the algType parameter: ippAlgDirect, ippAlgFFT or ippAlgAuto. The last one means that the function chooses the algorithm itself according to fixed criteria for the CPU in use, which is not always optimal.
Consider the performance of the direct and FFT implementations on the same CPU for filters of various orders at vector lengths of 1024 and 128 samples.
Figure 8. Performance comparison of the direct and fft implementations at a length of 1024 samples
Figure 9. Performance comparison of the direct and fft implementations at a length of 128 samples
The FFT implementation has a characteristic stepped shape. This is because filters of several neighbouring orders use an FFT of the same order, and performance changes when the transition to the next FFT order kicks in. For maximum performance you should use whichever algorithm is lower on the graph. With the proposed API you can write code that runs both variants of the algorithm, measures them on the specific machine and picks the better one. The resulting picture looks as follows. The figure shows a two-dimensional space of size 1024x1024, with the filter order along the X axis and the vector length along the Y axis. Green means that the fft algorithm is faster than the direct one. The characteristic straight lines in the upper part of the figure correspond to Figure 9: right after the switch to the next FFT order, the fft variant runs slower for a while.
Figure 10. Comparison of the direct and fft performance of the IPP FIR filter over a 1024 x 1024 space of filter order × vector length
The picture is quite intricate, and it is clear that approximating it inside IPP for an arbitrary platform is not easy. Moreover, the pattern may differ from machine to machine. Besides the choice between direct and fft code, the number of threads can be added as another dimension, which gives a multilayered picture. Here too, the proposed API lets you pick the best option for a given platform.
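The "measure both and pick the better one" idea can be sketched as a small table-building step. The code below is plain C with hypothetical names; it assumes the cpMAC values for both algorithms have already been measured on the target machine for the same set of (order, length) points, and it does not call IPP itself.

```c
/* Which algorithm to request for one (order, length) point. These names are
 * hypothetical stand-ins for the ippAlgDirect / ippAlgFFT values. */
typedef enum { USE_DIRECT = 0, USE_FFT = 1 } alg_choice;

/* cp_direct[i] and cp_fft[i] hold cpMAC values measured on the target
 * machine for the same point i; lower is better. Fills choice[i] and
 * returns how many points favour the FFT variant. */
int choose_algorithms(const float *cp_direct, const float *cp_fft,
                      int npoints, alg_choice *choice)
{
    int fft_wins = 0;
    for (int i = 0; i < npoints; i++) {
        /* ties go to the direct algorithm */
        choice[i] = (cp_fft[i] < cp_direct[i]) ? USE_FFT : USE_DIRECT;
        fft_wins += (choice[i] == USE_FFT);
    }
    return fft_wins;
}
```

An application could build such a table once at start-up or at install time and then look up the algorithm type to pass to the initialization function for each filter it creates. Adding the number of threads as a third index extends the same idea to the multilayered case.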
Conclusion
The FIR filter API introduced in IPP 9.0 lets applications use the filters even more efficiently, by choosing the better of the direct and fft algorithms and by parallelizing whichever one is chosen. In addition, the IPP library is completely free and can be downloaded from this link: Intel Performance Primitives (IPP).
Appendix. Sample code that measures the performance of the IPP FIR filter
Sample code
#include <stdio.h>
#include <math.h>
#include <omp.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtsc */
#else
#include <x86intrin.h>   /* __rdtsc */
#endif
#include "ippcore.h"
#include "ipps.h"
/* #include "bmp.h" */   /* author's helper header, not used in this listing */

/* Write a vector to a one-column .csv file */
void save_csv(Ipp32f* pSrc, int len, const char* fName)
{
    FILE *fp;
    int i;
    if ((fp = fopen(fName, "w")) == NULL) {
        printf("Cannot open %s\n", fName);
        return;
    }
    for (i = 0; i < len; i++) {
        fprintf(fp, "%.3f\n", pSrc[i]);
    }
    fclose(fp);
}

Ipp32f* pSrc;
Ipp32f* pDft;
Ipp32f* pDst;
Ipp32f* pTaps;
Ipp64f rFreq = 0.2;
int bufSize;
int NTHREADS = 1;
IppAlgType algType = ippAlgDirect;

/* Generate low-pass taps in double precision and convert them to float */
void firgenlowpass(int order)
{
    IppStatus status;
    Ipp8u* pBuffer;
    Ipp64f* pTaps_64f;
    int size;
    int i;

    status = ippsFIRGenGetBufferSize(order, &size);
    pBuffer = ippsMalloc_8u(size);
    pTaps_64f = ippsMalloc_64f(order);
    ippsFIRGenLowpass_64f(rFreq, pTaps_64f, order, ippWinBartlett, ippTrue, pBuffer);
    for (i = 0; i < order; i++) {
        pTaps[i] = (Ipp32f)pTaps_64f[i];
    }
    ippsFree(pTaps_64f);
    ippsFree(pBuffer);   /* was not released in the original listing */
}

void fir_omp(Ipp32f* src, Ipp32f* dst, int len, int order,
             IppsFIRSpec_32f* pSpec, Ipp32f* pDlySrc, Ipp32f* pDlyDst,
             Ipp8u* pBuffer)
{
    int tlen, ttail;
    tlen  = len / NTHREADS;
    ttail = len % NTHREADS;
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();
        Ipp32f* s = src + id*tlen;
        Ipp32f* d = dst + id*tlen;
        int len = tlen + ((id == NTHREADS-1) ? ttail : 0);
        Ipp8u* b = pBuffer + id*bufSize;

        if (id == 0)
            ippsFIRSR_32f(s, d, len, pSpec, pDlySrc, NULL, b);
        else if (id == NTHREADS - 1)
            ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), pDlyDst, b);
        else
            ippsFIRSR_32f(s, d, len, pSpec, s - (order - 1), NULL, b);
    }
}

void perf(int len, int order, float* cpMAC)
{
    IppStatus status;
    IppsFIRSpec_32f* pSpec;
    Ipp8u* pBuffer;
    int specSize;
    Ipp32f* pDlySrc = NULL; /* initialize delay line with "0" */
    Ipp32f* pDlyDst = NULL; /* don't write output delay line */
    Ipp64u beg = 0, end = 0;
    int loop = 10000;

    /* allocate memory for input and output vectors */
    pSrc  = ippsMalloc_32f(len);
    pDst  = ippsMalloc_32f(len);
    pTaps = ippsMalloc_32f(order);
    /* create special vector Jaehne */
    ippsVectorJaehne_32f(pSrc, len, 128);
    /* get lowpass filter coeffs */
    firgenlowpass(order);
    /* get necessary buffer sizes for pSpec and for pBuffer */
    status = ippsFIRSRGetSize(order, ipp32f, &specSize, &bufSize);
    /* allocate memory for pSpec */
    pSpec = (IppsFIRSpec_32f*)ippsMalloc_8u(specSize);
    /* for N threads bufSize should be multiplied by N:
       allocate bufSize*NTHREADS bytes */
    pBuffer = ippsMalloc_8u(bufSize*NTHREADS);
    /* initialize pSpec */
    status = ippsFIRSRInit_32f(pTaps, order, algType, pSpec);

    /* apply FIR filter; one warm-up call precedes each timed loop */
    if (NTHREADS == 1) {
        /* measurement for the single-threaded version */
        ippsFIRSR_32f(pSrc, pDst, len, pSpec, pDlySrc, pDlyDst, pBuffer);
        beg = __rdtsc();
        for (int i = 0; i < loop; i++) {
            ippsFIRSR_32f(pSrc, pDst, len, pSpec, pDlySrc, pDlyDst, pBuffer);
        }
        end = __rdtsc();
    }
    else {
        fir_omp(pSrc, pDst, len, order, pSpec, pDlySrc, pDlyDst, pBuffer);
        beg = __rdtsc();
        for (int i = 0; i < loop; i++) {
            fir_omp(pSrc, pDst, len, order, pSpec, pDlySrc, pDlyDst, pBuffer);
        }
        end = __rdtsc();
    }
    *cpMAC = (float)((double)(end - beg) / ((double)loop * (double)len * (double)order));
    printf("%5d, %5d, %3.3f\n", len, order, *cpMAC);

    ippsFree(pSrc);
    ippsFree(pDst);
    ippsFree(pTaps);
    ippsFree(pSpec);
    ippsFree(pBuffer);
}

int main()
{
    int len = 32768;
    int order;
    float cpMAC;

    NTHREADS = 1;
    algType = ippAlgDirect;
    //algType = ippAlgFFT;
    len = 128;
    printf("\nthreads: %d\n", NTHREADS);
    printf("len, order, cpMAC\n\n");
    for (order = 1; order <= 512; order++) {
        perf(len, order, &cpMAC);
    }
    return 0;
}