This article started out as a talk at a conference held on November 24. Plenty of iPhone optimization tips can be found here. The article below expands the material of that presentation in both breadth and depth.
What is NEON? NEON is a general-purpose SIMD engine used in ARM processors. It has sixteen 128-bit registers, which can also be viewed as thirty-two 64-bit registers. NEON has its own pipeline but shares its registers with VFP. As with SSE, data should be aligned to 16 bytes. NEON can also work with unaligned data, but that is usually about twice as slow.
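Here is a minimal C sketch of the alignment requirement. It assumes C11 `aligned_alloc` is available; older iOS code would use the POSIX `posix_memalign` instead. `alloc_vec4_aligned` and `is_aligned16` are hypothetical helper names and are not part of any NEON API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate `count` four-float vectors (16 bytes each) on a 16-byte boundary.
   C11 aligned_alloc wants the size to be a multiple of the alignment, which
   count * 16 always is. Release the memory with free(). */
static float *alloc_vec4_aligned(size_t count)
{
    return (float *)aligned_alloc(16, count * 4 * sizeof(float));
}

/* True when p sits on a 16-byte boundary (its low 4 address bits are zero). */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Each 16-byte slot holds exactly one 128-bit q register's worth of data: four 32-bit floats.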
NEON works with:
- signed/unsigned 8-, 16-, 32- and 64-bit integers;
- single-precision floating point (32-bit floats).
This makes it a great fit for multimedia tasks such as games.
Let's start with the main thing: the heart of every modern mobile system, the system on a chip (SoC). iOS devices are known to use Apple's A-series SoCs: the A4, A5, A5X, A6 and A6X. The table below lists the most important specifications of these chips.
CPU specs | A4 | A5 | A5X | A6 |
---|---|---|---|---|
Architecture | ARMv7 | ARMv7 | ARMv7 | ARMv7 |
Core | Cortex-A8 | Cortex-A9 | Cortex-A9 | Custom (Apple design) |
Number of cores | 1 | 2 | 2 | 2 |
Frequency, MHz | 800 | 1000 | 1000 | 1300 |
Extensions | VFPv3 (VFPLite), NEON | VFPv3, NEON | VFPv3, NEON | VFPv4, NEON |
GPU specs | ||||
Model | PowerVR SGX 535 | PowerVR SGX 543MP2 | PowerVR SGX 543MP4 | PowerVR SGX 543MP3 |
Frequency, MHz | 200 | 200 | 200 | 266 |
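As the table shows, NEON (which runs at the CPU clock) is clocked roughly four to five times higher than the GPU. Of course, that does not mean a four- to fivefold performance advantage over the GPU, because IPC, pipelines and so on also matter. NEON does have one killer feature, though: it can process four 32-bit floats at once, whereas the PowerVR SGX handles only one. The SIMD registers of the PowerVR SGX 5 series appear to be 64 bits wide, since the GPU can process four half-precision (16-bit) floats at once. Consider an example: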
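```glsl
highp vec4 v1, v2;
highp float s1, s2;

// Worse: v1 * s1 costs 4 operations, multiplying the result by s2 costs 4 more. Total: 8.
v2 = (v1 * s1) * s2;

// Better: s1 * s2 costs 1 operation, multiplying v1 by the result costs 4. Total: 5.
v2 = v1 * (s1 * s2);
```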
次ã«ãGPUãã¯ãã«ãšã³ãžã³ã§å®è¡ãããå¥ã®äŸãèããŸãã
```glsl
mediump vec4 v1, v2, v3;
highp vec4 s1, s2, s3;

v3 = v1 * v2; // 1 operation
s3 = s1 * s2; // 4 operations
```
The highp qualifier is required for data such as vertex positions. This is where the benefit of NEON becomes obvious.
次ã«ãNEONã®å¥ã®å©ç¹ã«ç§»ããŸãããã PowerVR SGX 5ã·ãªãŒãºã«ã¯ãåŠçããã·ã§ãŒããŒã®çš®é¡ãé ç¹ããã¯ã»ã«ãåããªãã·ã§ãŒããŒããã»ããµã§ããUSSEãæèŒãããŠããŸãã ã€ãŸããããã°ã©ããŒã«ã¯äžå®ã®é»åããžã§ããããããé ç¹åŠçã«è²»ããããã¯ã»ã«åŠçã«è²»ãããã¯ããã°ã©ããŒæ¬¡ç¬¬ã§ãã ããã§NEONãå©ãã«ãªããŸã-ãããæ°ããé ç¹ããã»ããµã§ãã ããã«ãããŒã«ãã§ã€ã¹ãæ¿å ¥ããã®ãå¿ãããšæããããããŸããããããã¯ãã¹ãŠéåžžã«æ·±å»ã§ãã ã»ãŒãã¹ãŠã®ã¢ãã€ã«ã·ã¹ãã ã®ããã©ãŒãã³ã¹ã¯ãç¹ã«2Dã²ãŒã ãç¹ã«æè¿ã®ç»é¢è§£å床ã®ç«¶äºã«ãããŠããã£ã«ã¬ãŒãã«ãã£ãŠå¶éãããŸãã ãã¹ãŠã®é ç¹åŠçãNEONã«è»¢éãããšããã¯ã»ã«åŠççšã®ãªãœãŒã¹ãè§£æŸãããŸãã ããã«å ããŠãNEONã¯æç»åŒã³åºãã®åæ°ãæžããã®ã«åœ¹ç«ã¡ãŸã-1ã€ã®ãããã®ãã¹ãŠã®é ç¹ã®äœçœ®ãã¯ãŒã«ã座æšã§èšç®ãã1ã€ã®åŒã³åºãã§Nåã®ãªããžã§ã¯ããæç»ããŸãã
Enough theory? Then let's get to the hardcore! There are several ways to take advantage of NEON:
- Let the compiler vectorize the code. This is the weak option: the compiler may or may not vectorize, and even when it does, there is no guarantee the result will be the best possible code. On the other hand, it costs you no effort and can still pay off. Even so, rather than relying on the compiler blindly, you should vectorize at least the most critical code by hand.
- NEON assembler. This is the real hardcore: the true Jedi path, with everything that implies. You will have to learn the black magic and spend nights with the ARM manuals. Keep in mind that NEON code runs in both ARM and Thumb-2 modes.
- NEON intrinsics (the counterpart of SSE intrinsics on x86). Unlike inline assembler, which the compiler pastes in verbatim, intrinsics are optimized by the compiler. They are much easier to live with: you don't have to study instruction timings or reorder instructions to avoid pipeline stalls.
- Already-vectorized code: use GLKMath or math-neon.
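Here is a hedged sketch of a hypothetical `saxpy` routine (y += a * x) that contrasts the first and third approaches. On NEON builds it processes four floats per iteration with intrinsics. Everywhere else it falls back to a plain loop that a vectorizing compiler may or may not turn into SIMD code. `__ARM_NEON` / `__ARM_NEON__` are the macros ARM compilers predefine when NEON is available.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* y[i] += a * x[i] for i in [0, n). x and y must not overlap. */
static void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Intrinsics path: one q register holds four floats. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vy = vmlaq_n_f32(vy, vx, a);   /* vy + vx * a, lane by lane */
        vst1q_f32(y + i, vy);
    }
#endif
    /* Plain C path: handles everything on non-NEON builds (left to the
       compiler's auto-vectorizer) and only the leftover tail on NEON builds. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```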
Now it's time to find out the pros and cons of each approach. For this I wrote a simple demo: on every frame, each of 10,000 sprites moves to a random position on the screen. The goal is the fastest code with the lowest CPU load, because a game has plenty to compute besides the data for rendering.
All data is stored in a single VBO. The Update method multiplies the projection matrix by a ModelView matrix with a random position. Each vertex of each sprite is then multiplied by the resulting ModelViewProjection matrix. The final position of each vertex is passed straight to gl_Position in the vertex shader. All data is aligned to a 16-byte boundary.
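The article does not show its vertex type, so the following is an assumption. A layout matching the description could be declared as shown here. `Vertex`, `g_vertices` and `vertex_is_aligned` are hypothetical names, and C11 `_Alignas` stands in for whatever alignment attribute the demo actually used.

```c
#include <stdint.h>

/* Hypothetical vertex: one homogeneous position (x, y, z, w), i.e. exactly
   one 128-bit NEON q register, aligned to a 16-byte boundary. */
typedef struct {
    _Alignas(16) float pos[4];
} Vertex;

_Static_assert(sizeof(Vertex) == 16, "one vertex = one q register");
_Static_assert(_Alignof(Vertex) == 16, "vertices start on 16-byte boundaries");

enum { QUADS_COUNT = 10000, VERTS_PER_QUAD = 4 };

/* The whole batch is one contiguous array, uploaded to the single VBO with
   glBufferData(GL_ARRAY_BUFFER, sizeof(g_vertices), g_vertices, ...). */
static Vertex g_vertices[QUADS_COUNT * VERTS_PER_QUAD];

/* True when v starts on a 16-byte boundary. */
static int vertex_is_aligned(const Vertex *v)
{
    return ((uintptr_t)v % 16) == 0;
}
```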
Update method code:
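```c
// Assumed to be declared elsewhere in the demo (not shown in the article):
// proj (GLKMatrix4 projection), squareVertices (float32x4_t[4], the quad's corners),
// data (vertex array with a float32x4_t pos field), vertexBuffer (GLuint VBO),
// u32 (uint32_t). Needs <stdlib.h> for random(), <arm_neon.h>, GLKit math and OpenGL ES headers.
void Update()
{
    GLKMatrix4 modelviewMat = {
        1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0,
        0, 0, 0, 1,
    };
    const u32 QUADS_COUNT = 10000;
    const u32 VERTS_PER_QUAD = 4;
    const float Y_DELTA = 420.0f / QUADS_COUNT; // vertical step between sprites
    float vertDelta = Y_DELTA;

    for (u32 i = 0; i < QUADS_COUNT * VERTS_PER_QUAD; i += VERTS_PER_QUAD) {
        float randX = random() % 260;   // random X position on screen
        modelviewMat.m[12] = randX;     // translation X
        modelviewMat.m[13] = vertDelta; // translation Y

        // mvp = projection * modelview
        float32x4x4_t mvp;
        Matrix4ByMatrix4((float32x4x4_t*)proj.m, (float32x4x4_t*)modelviewMat.m, &mvp);

        // Transform the quad's four corners by the MVP matrix
        for (int j = 0; j < 4; ++j) {
            Matrix4ByVec4(&mvp, &squareVertices[j], &data[i + j].pos);
        }
        vertDelta += Y_DELTA;
    }

    // Upload the whole batch into the single VBO
    glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STREAM_DRAW);
}
```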
Now for the heart of this article: vectorizing the code. Below is the code used by the three compared approaches for the two operations game development relies on most: matrix-by-vector and matrix-by-matrix multiplication.
Copy-paste from GLKMath:
```c
static __inline__ GLKVector4 GLKMatrix4MultiplyVector4(GLKMatrix4 matrixLeft, GLKVector4 vectorRight)
{
    float32x4x4_t iMatrix = *(float32x4x4_t *)&matrixLeft;
    float32x4_t v;

    iMatrix.val[0] = vmulq_n_f32(iMatrix.val[0], (float32_t)vectorRight.v[0]);
    iMatrix.val[1] = vmulq_n_f32(iMatrix.val[1], (float32_t)vectorRight.v[1]);
    iMatrix.val[2] = vmulq_n_f32(iMatrix.val[2], (float32_t)vectorRight.v[2]);
    iMatrix.val[3] = vmulq_n_f32(iMatrix.val[3], (float32_t)vectorRight.v[3]);

    iMatrix.val[0] = vaddq_f32(iMatrix.val[0], iMatrix.val[1]);
    iMatrix.val[2] = vaddq_f32(iMatrix.val[2], iMatrix.val[3]);

    v = vaddq_f32(iMatrix.val[0], iMatrix.val[2]);

    return *(GLKVector4 *)&v;
}

static __inline__ GLKMatrix4 GLKMatrix4Multiply(GLKMatrix4 matrixLeft, GLKMatrix4 matrixRight)
{
    float32x4x4_t iMatrixLeft = *(float32x4x4_t *)&matrixLeft;
    float32x4x4_t iMatrixRight = *(float32x4x4_t *)&matrixRight;
    float32x4x4_t m;

    m.val[0] = vmulq_n_f32(iMatrixLeft.val[0], vgetq_lane_f32(iMatrixRight.val[0], 0));
    m.val[1] = vmulq_n_f32(iMatrixLeft.val[0], vgetq_lane_f32(iMatrixRight.val[1], 0));
    m.val[2] = vmulq_n_f32(iMatrixLeft.val[0], vgetq_lane_f32(iMatrixRight.val[2], 0));
    m.val[3] = vmulq_n_f32(iMatrixLeft.val[0], vgetq_lane_f32(iMatrixRight.val[3], 0));

    m.val[0] = vmlaq_n_f32(m.val[0], iMatrixLeft.val[1], vgetq_lane_f32(iMatrixRight.val[0], 1));
    m.val[1] = vmlaq_n_f32(m.val[1], iMatrixLeft.val[1], vgetq_lane_f32(iMatrixRight.val[1], 1));
    m.val[2] = vmlaq_n_f32(m.val[2], iMatrixLeft.val[1], vgetq_lane_f32(iMatrixRight.val[2], 1));
    m.val[3] = vmlaq_n_f32(m.val[3], iMatrixLeft.val[1], vgetq_lane_f32(iMatrixRight.val[3], 1));

    m.val[0] = vmlaq_n_f32(m.val[0], iMatrixLeft.val[2], vgetq_lane_f32(iMatrixRight.val[0], 2));
    m.val[1] = vmlaq_n_f32(m.val[1], iMatrixLeft.val[2], vgetq_lane_f32(iMatrixRight.val[1], 2));
    m.val[2] = vmlaq_n_f32(m.val[2], iMatrixLeft.val[2], vgetq_lane_f32(iMatrixRight.val[2], 2));
    m.val[3] = vmlaq_n_f32(m.val[3], iMatrixLeft.val[2], vgetq_lane_f32(iMatrixRight.val[3], 2));

    m.val[0] = vmlaq_n_f32(m.val[0], iMatrixLeft.val[3], vgetq_lane_f32(iMatrixRight.val[0], 3));
    m.val[1] = vmlaq_n_f32(m.val[1], iMatrixLeft.val[3], vgetq_lane_f32(iMatrixRight.val[1], 3));
    m.val[2] = vmlaq_n_f32(m.val[2], iMatrixLeft.val[3], vgetq_lane_f32(iMatrixRight.val[2], 3));
    m.val[3] = vmlaq_n_f32(m.val[3], iMatrixLeft.val[3], vgetq_lane_f32(iMatrixRight.val[3], 3));

    return *(GLKMatrix4 *)&m;
}
```
Apple's implementation of these operations passes a lot of data by value and copies variables around, which is far from optimal. In the debug assembly, at least, it looks rather slow. Let's see how this code performs under profiling.
Assembler approach:
```c
#include <arm_neon.h>

// result = mat * vec; mat holds four column vectors (column-major, as in GLKMatrix4).
inline void Matrix4ByVec4(float32x4x4_t* __restrict__ mat, const float32x4_t* __restrict__ vec, float32x4_t* __restrict__ result)
{
    asm (
        "vldmia %0, { d24-d31 } \n\t"   // matrix columns -> q12-q15
        "vld1.32 {q1}, [%1]\n\t"        // vector -> q1 (= d2:d3)
        "vmul.f32 q0, q12, d2[0]\n\t"   // q0  = col0 * vec[0]
        "vmla.f32 q0, q13, d2[1]\n\t"   // q0 += col1 * vec[1]
        "vmla.f32 q0, q14, d3[0]\n\t"   // q0 += col2 * vec[2]
        "vmla.f32 q0, q15, d3[1]\n\t"   // q0 += col3 * vec[3]
        "vstmia %2, { q0 }"             // store q0 to *result
        :
        : "r" (mat), "r" (vec), "r" (result)
        : "memory", "q0", "q1", "q12", "q13", "q14", "q15" // every register the asm writes
    );
}

// r = m1 * m2; column j of r is m1 times column j of m2.
inline void Matrix4ByMatrix4(const float32x4x4_t* __restrict__ m1, const float32x4x4_t* __restrict__ m2, float32x4x4_t* __restrict__ r)
{
    asm (
        "vldmia %1, { q0-q3 } \n\t"     // m2 columns -> q0-q3 (scalars read as d0-d7 lanes)
        "vldmia %2, { q8-q11 }\n\t"     // m1 columns -> q8-q11
        "vmul.f32 q12, q8, d0[0]\n\t"   // result column j = m1.col0 * m2.colj[0]
        "vmul.f32 q13, q8, d2[0]\n\t"
        "vmul.f32 q14, q8, d4[0]\n\t"
        "vmul.f32 q15, q8, d6[0]\n\t"
        "vmla.f32 q12, q9, d0[1]\n\t"   // += m1.col1 * m2.colj[1]
        "vmla.f32 q13, q9, d2[1]\n\t"
        "vmla.f32 q14, q9, d4[1]\n\t"
        "vmla.f32 q15, q9, d6[1]\n\t"
        "vmla.f32 q12, q10, d1[0]\n\t"  // += m1.col2 * m2.colj[2]
        "vmla.f32 q13, q10, d3[0]\n\t"
        "vmla.f32 q14, q10, d5[0]\n\t"
        "vmla.f32 q15, q10, d7[0]\n\t"
        "vmla.f32 q12, q11, d1[1]\n\t"  // += m1.col3 * m2.colj[3]
        "vmla.f32 q13, q11, d3[1]\n\t"
        "vmla.f32 q14, q11, d5[1]\n\t"
        "vmla.f32 q15, q11, d7[1]\n\t"
        "vstmia %0, { q12-q15 }"        // store the four result columns to *r
        :
        : "r" (r), "r" (m2), "r" (m1)
        : "memory", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15"
    );
}
```
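To anyone unfamiliar with assembler, this all looks rather scary. I'm one of those people myself: NEON assembler is the only kind I understand. In fact, though, everything here is simple:
- q0-q15 are the 128-bit NEON registers. Each qN can also be addressed as the pair of 64-bit registers d(2N) and d(2N+1), so d2[0] is lane 0 of q1.
- vldmia/vld1.32 are load instructions.
- vstmia stores to memory.
- vmul.f32/vmla.f32 are multiply and multiply-accumulate.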
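What Matrix4ByVec4 computes can be written out as a worked equation. This is a sketch based on the register roles in the assembler above: the matrix columns $\mathbf{c}_0,\dots,\mathbf{c}_3$ sit in q12-q15, and the vector lanes $v_0,\dots,v_3$ are d2[0], d2[1], d3[0] and d3[1] (the two halves of q1).

$$
M\mathbf{v} = v_0\,\mathbf{c}_0 + v_1\,\mathbf{c}_1 + v_2\,\mathbf{c}_2 + v_3\,\mathbf{c}_3
$$

One vmul.f32 produces the first term and each vmla.f32 adds another, so four instructions cover the whole product. Matrix4ByMatrix4 applies the same formula to every column of the right-hand matrix. With $R = M_1 M_2$, column $j$ of $R$ equals $M_1$ times column $j$ of $M_2$. That is why its sixteen arithmetic instructions form four interleaved groups, one accumulator (q12-q15) per result column.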
Intrinsics approach:
```c
#include <arm_neon.h>

// result = mat * vec; mat.val[k] is column k (column-major).
inline void Matrix4ByVec4(float32x4x4_t* __restrict__ mat, const float32x4_t* __restrict__ vec, float32x4_t* __restrict__ result)
{
    (*result) = vmulq_n_f32((*mat).val[0], (*vec)[0]);
    (*result) = vmlaq_n_f32((*result), (*mat).val[1], (*vec)[1]);
    (*result) = vmlaq_n_f32((*result), (*mat).val[2], (*vec)[2]);
    (*result) = vmlaq_n_f32((*result), (*mat).val[3], (*vec)[3]);
}

// r = m1 * m2; column j of r is m1 times column j of m2.
inline void Matrix4ByMatrix4(const float32x4x4_t* __restrict__ m1, const float32x4x4_t* __restrict__ m2, float32x4x4_t* __restrict__ r)
{
    (*r).val[0] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[0], 0));
    (*r).val[1] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[1], 0));
    (*r).val[2] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[2], 0));
    (*r).val[3] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[3], 0));

    (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[1], vgetq_lane_f32((*m2).val[0], 1));
    (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[1], vgetq_lane_f32((*m2).val[1], 1));
    (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[1], vgetq_lane_f32((*m2).val[2], 1));
    (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[1], vgetq_lane_f32((*m2).val[3], 1));

    (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[2], vgetq_lane_f32((*m2).val[0], 2));
    (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[2], vgetq_lane_f32((*m2).val[1], 2));
    (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[2], vgetq_lane_f32((*m2).val[2], 2));
    (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[2], vgetq_lane_f32((*m2).val[3], 2));

    (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[3], vgetq_lane_f32((*m2).val[0], 3));
    (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[3], vgetq_lane_f32((*m2).val[1], 3));
    (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[3], vgetq_lane_f32((*m2).val[2], 3));
    (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[3], vgetq_lane_f32((*m2).val[3], 3));
}
```
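This is almost the same code as GLKMath, with minor differences; the main one is that the arguments are passed by pointer rather than by value. Explanation:
- vmulq_n_f32 multiplies a vector by a scalar.
- vgetq_lane_f32 is a macro that extracts a scalar from one lane of a vector.
- vmlaq_n_f32 multiplies a vector by a scalar and adds the product to an accumulator.

This code is simply the assembler version transcribed into intrinsics. Let's see how the two compare.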
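The article does not show the code behind the "FPU" row in the tables. A minimal plain-C sketch of what such a scalar version might look like follows. The names `Matrix4ByVec4_C` and `Matrix4ByMatrix4_C` are hypothetical. The layout is assumed to be column-major like GLKMatrix4, so the element at row r, column c is `m[c * 4 + r]`. A version like this is also handy as a reference when checking NEON results.

```c
/* result = mat * vec (column-major 4x4 matrix). */
static void Matrix4ByVec4_C(const float mat[16], const float vec[4], float result[4])
{
    for (int r = 0; r < 4; ++r)
        result[r] = mat[0 * 4 + r] * vec[0] + mat[1 * 4 + r] * vec[1]
                  + mat[2 * 4 + r] * vec[2] + mat[3 * 4 + r] * vec[3];
}

/* r = m1 * m2: column j of r is m1 times column j of m2.
   r must not overlap m1 or m2. */
static void Matrix4ByMatrix4_C(const float m1[16], const float m2[16], float r[16])
{
    for (int j = 0; j < 4; ++j)
        Matrix4ByVec4_C(m1, &m2[j * 4], &r[j * 4]);
}
```

For example, a translation by (5, 6) composed with a uniform scale by 2 maps the point (1, 1, 0, 1) to (7, 8, 0, 1).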
The tests were run on an iPod Touch 4. The table contains the profiling results for the Update method.
Approach | Execution time, ms | CPU load, % |
---|---|---|
FPU | 6058 + 5067* | 35-38 |
GLKMath | 2789 | 20-23 |
Assembler | 5304 | 23-25 |
Intrinsics | 2803 | 18-20 |
One more tip: inline performance-critical code aggressively. In such cases, prefer __attribute__((always_inline)) over plain inline.
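Here is a minimal sketch of that tip in C. The `FORCE_INLINE` macro and the `dot4` function are hypothetical names. `__attribute__((always_inline))` is the GCC/clang attribute; other compilers fall back to plain `static inline`, which is only a hint.

```c
/* Force inlining on GCC/clang; elsewhere fall back to an inlining hint. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Small, hot helper: a 4-component dot product. */
FORCE_INLINE float dot4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```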
Updated results table:
Approach | Execution time, ms | CPU load, % |
---|---|---|
FPU, forced inline | 6209 | 25-28 |
GLKMath | 2789 | 20-23 |
Assembler | 5304 | 23-25 |
Intrinsics | 2803 | 18-20 |
Final results table:
Approach | Execution time, ms | Execution time (vectorized), ms | CPU load, % | CPU load (vectorized), % |
---|---|---|---|---|
FPU, forced inline | 6209 | 5028 | 25-28 | 22-24 |
GLKMath | 2789 | 2776 | 20-23 | 20-23 |
Assembler | 5304 | 5291 | 23-25 | 22-24 |
Intrinsics | 2803 | 2789 | 18-20 | 18-20 |
The assembler and intrinsics versions produce a rather strange result: the code is effectively the same, yet the timings differ dramatically, by almost 2x! The answer lies in the assembly listings, for anyone who wants to check for themselves. For the assembler version, the listing shows exactly what we wrote. For the intrinsics version, the compiler optimized the code. The GLKMath code, which looked rather slow at first glance, was also fully optimized by the compiler and ran in the same time as the hand-written intrinsics.
Time to take stock. A few conclusions can be drawn:
- The LLVM engineers did a great job: the compiler generates well-optimized code from intrinsics. More than a year ago, when GCC 4.2 was the only compiler in Xcode, I ran a similar test and got only a 10-15% gain over the FPU code. This is great news: there is no need to learn assembler, and I'm very happy about that!
- The clang compiler can auto-vectorize code. For the programmer, that is a performance bonus for the price of just four words. What else can I say, except that it's cool?!
- The NEON code is 2.22 times faster than plain C code. After optimization, processing the vertices became faster than copying them to the GPU side! The memcpy assembly shows that memcpy already uses NEON code; what slows it down is the Cortex-A8's lack of hardware prefetching.
- Learning all this low-level stuff is still worthwhile, especially if your goal is to become a pro.
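Here is a hedged illustration of the kind of code auto-vectorization works on. It is not the demo's code; `translate_positions` and the structure-of-arrays layout are assumptions. The loop has a simple count, `restrict`-qualified arrays and no dependence between iterations, which is the shape loop vectorizers handle best. Current clang enables its loop vectorizer by default in optimized builds, and `-Rpass=loop-vectorize` reports which loops it vectorized.

```c
#include <stddef.h>

/* Hypothetical structure-of-arrays sprite positions: xs[i], ys[i].
   Each iteration is independent and the arrays do not overlap, so a
   vectorizing compiler can process several elements per instruction. */
static void translate_positions(float *restrict xs, float *restrict ys,
                                size_t n, float dx, float dy)
{
    for (size_t i = 0; i < n; ++i) {
        xs[i] += dx;
        ys[i] += dy;
    }
}
```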
References
- www.arm.com/products/processors/technologies/neon.php
- blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores
- code.google.com/p/math-neon
- llvm.org/devmtg/2012-04-12/Slides/Hal_Finkel.pdf
- Demo project