PowerPC and DSP Comparison


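I. Comparison of main performance parameters

| Parameter | TigerSHARC ADSP-TS101S | TigerSHARC ADSP-TS201S | PowerPC MPC7455 | PowerPC PPC476FP (IBM 45nm SoI) |
|---|---|---|---|---|
| Core Clock | 250 MHz | 500 MHz | 1,000 MHz | 1,600 MHz |
| Peak Floating-pt Performance | 1,500 MFLOPS | 3,000 MFLOPS | 8,000 MFLOPS | 3,000 MFLOPS |
| Memory Bus Size/Speed | 64-bit/100 MHz | 64-bit/100 MHz | 64-bit/133 MHz | 128-bit/800 MHz |
| External Link Ports | 4 @ 250 MB/sec | 4 @ 250 MB/sec | None | User defined |
| I/O Bandwidth (inc. memory) | 1,800 MB/sec | 1,800 MB/sec | 1,064 MB/sec | 6,400 MB/sec |
| Bandwidth-to-Processing Ratio | 1.20 Bytes/FLOP | 1.20 Bytes/FLOP | 0.13 Bytes/FLOP | 2.1 Bytes/FLOP |
| 1024-pt cFFT Benchmark | 39 µs | 19 µs | 13 µs (est.) | 83.2 µs (double precision) |
| Approx Cycles for 1024-pt cFFT | 9,750 cycles | 9,750 cycles | 13,000 cycles | - |
| Predicted 1024-pt cFFTs/chip | 25,641 per sec | 12,821 per sec | 64,941* per sec | - |

ADSP TigerSHARC main parameters

| Part# | Clock Speed (MHz) | MMACS (Max) | On-Chip Memory | External Memory Supported | Operating Temp Range | Package | US Price 1000-4999 |
|---|---|---|---|---|---|---|---|
| ADSP-TS201S | 600 MHz | 4800 | 24 Mbit | Async, SDRAM | - | 25 x 25 BGA | $252.25 |
| ADSP-TS202S | 500 MHz | 4000 | 12 Mbit | Async, SDRAM | - | 25 x 25 BGA | $209.51 |
| ADSP-TS203S | 500 MHz | 4000 | 4 Mbit | Async, SDRAM | - | 25 x 25 BGA | $184.49 |
| ADSP-TS101S | 300 MHz | 2400 | 6 Mbit | Async, SDRAM | -40 to +85 | 19 x 19 BGA, 27 x 27 BGA | $193.88 |

| | C6701 | C6201 | C6203 | MPC7410* | PPC476 |
|---|---|---|---|---|---|
| Clock (MHz) | 167 | 200 | 300 | 500 | 1600 |
| Instruction Cycle (ns) | 6 | 5 | 3.33 | 2 | - |
| Instructions Per Cycle | 1 - 8 | 1 - 8 | 1 - 8 | 1 - 3 | 14 [sic] |
| Million Instructions/Sec. | 1333 | 1600 | 2400 | 500 | - |
| Million Fixed-Point Ops/Sec. | 1333 | 1600 | 2400 | 8000 | - |
| Million Floating-Point Ops/Sec. | 1000 | - | - | 2000 | 3000 |

General-Purpose Algorithm Benchmarks on TI's C66x DSP Core at 1.25 GHz

| Benchmark | Speed | Clock Cycles |
|---|---|---|
| 32-bit algorithm: 1k point FFT (Radix 4) | 5.47 µs | 6840 |
| 32-bit algorithm: 64k point FFT (Radix 4) | 0.58 ms | 696588 |
| 32-bit algorithm: FIR filter (per real tap) | 0.2 ns | 0.25 |
| 32-bit algorithm: 8x8 by 8x8 matrix multiply (complex floating point) | 1.06 µs | 1327 |
| 16-bit algorithm: 256 point complex FFT (Radix 4) | 0.6 µs | 752 |

Floating-point performance of the main DSPs: Speed Scores for floating-point packaged processors, BDTImark2000 (BDTI-certified results). BDTI benchmarks primarily target DSPs, so no data are available for the MPC7410 or other PowerPC parts.

Some algorithms, such as the FFT, can take full advantage of the 7410's vector math operations. A 1024-point complex floating-point FFT completes in 27 µs, compared with 108 µs on the C6701. Other algorithms, such as the turbo decoders used in wireless applications, are handled more efficiently by a VLIW architecture.

The PowerPC G4 (74xx) with its AltiVec core clearly has the higher core clock rate and performance. The PowerPC core clock is roughly 3.3 times that of the current TigerSHARC (1 GHz versus 300 MHz), although faster TigerSHARC versions are due to be released soon. The AltiVec core executes a single instruction per cycle, and each 128-bit vector holds four independent 32-bit data elements; this is the well-known SIMD (single instruction, multiple data) architecture. Peak throughput is reached when executing a vector multiply-accumulate (MAC), which completes 8 floating-point operations per cycle. For the 1 GHz MPC7455, that gives a peak of 8,000 MFLOPS. AltiVec can also execute 8 integer or fixed-point operations per cycle, for a peak integer rate of 8,000 MOPS (million operations per second).

The TigerSHARC, in contrast, has two independent 32-bit compute units, a MIMD (multiple instruction, multiple data) architecture. Each compute unit can execute one multiply together with an add and a subtract per cycle, so the 300 MHz ADSP-TS101S completes 6 floating-point operations per cycle, a peak of 1,800 MFLOPS. When executing 16-bit integer operations, the TigerSHARC can use its superscalar architecture to split each of the two independent 32-bit compute units into two separate 16-bit SIMD units. Each operation then acts on two data elements, for a total of 12 operations per cycle. In addition, the TigerSHARC has two dedicated 16-bit integer engines that contribute a further 12 operations per cycle, for a total of 24 integer operations per cycle, or 7,200 MOPS.

II. Evaluation of IBM 476FPE FFT performance

The FFT implementation used is FFTW (www.fftw.org), a well-optimized library whose performance is widely recognized. The test program is benchFFT 3.1 (www.fftw.org). The three chips compared are the IBM PPC476FPE, the PowerPC 7447A, and a four-processor 3.06 GHz Intel Pentium 4. Transform sizes of 512 and 1024 are used as reference points.

Configurations:

1. PPC476FPE.

2. Apple iBook G4. 1.06 GHz PowerPC 7447A, Linux 2.6.15, gcc-4.0.2, g++-4.0.2, g77-4.0.2. Has Altivec (4-way single precision SIMD).
Compilers and flags (unless overridden):
C: gcc -O3 -fomit-frame-pointer -fstrict-aliasing -mcpu=7450
C++: g++ -O3 -fomit-frame-pointer -fstrict-aliasing -mcpu=7450
Fortran: gfortran -O3 -fomit-frame-pointer -fstrict-aliasing -mcpu=7450

3. Four-processor 3.06 GHz Intel Pentium 4, 512 KB L2. Linux 2.4.25, gcc-3.3.3, g++-3.3.3, g77-3.3.3, AMD Core Math Library (ACML) 3.0.0, Intel Math Kernel Library Version 8.0.1, Intel Integrated Performance Primitives v5.0. Has SSE (4-way single precision SIMD), SSE2 (2-way double precision SIMD). The benchmark uses one processor only.

How mflops is calculated

To report FFT performance, we plot the "mflops" of each FFT, which is a scaled version of the speed, defined by:

mflops = 5 N log2(N) / (time for one FFT in microseconds) for complex transforms, and
mflops = 2.5 N log2(N) / (time for one FFT in microseconds) for real transforms,

where N is the number of data points (the product of the FFT dimensions). This is not an actual flop count; it is simply a convenient scaling, based on the fact that the radix-2 Cooley-Tukey algorithm asymptotically requires 5 N log2(N) floating-point operations. It allows us to compare the performance for many different sizes on the same graph, get a sense of the cache effects, and provide a rough measure of efficiency relative to the clock speed.

Transform types

transform-type is a four-character string consisting of precision (double/single = d/s), type (complex/real = c/r), in-place/out-of-place (= i/o), and forward/backward (= f/b). For example, transform-type = dcif denotes a double-precision in-place forward transform of complex data.

Results table columns (values in mflops): transform-type, transform-size, IBM PPC476FPE, Apple iBook G4, four-processor Intel P4, 476/G4, 476/P4.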
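The mflops scaling and the peak-rate arithmetic used in this comparison are simple enough to check by hand, and a short script makes the cross-checks repeatable. The following is a minimal sketch in Python, not part of benchFFT or FFTW; the function names `fft_mflops`, `peak_mflops` and `cycles_to_us` are hypothetical and chosen only for illustration. The input figures are the ones quoted in this document.

```python
import math


def fft_mflops(n, time_us, real=False):
    """Scaled FFT speed as defined by the benchmark's mflops convention.

    Complex transforms: 5 * N * log2(N) / time_us
    Real transforms:    2.5 * N * log2(N) / time_us
    This is a scaling, not an actual operation count.
    """
    if n < 2 or time_us <= 0:
        raise ValueError("n must be >= 2 and time_us must be positive")
    scale = 2.5 if real else 5.0
    return scale * n * math.log2(n) / time_us


def peak_mflops(clock_mhz, ops_per_cycle):
    """Peak rate in millions of operations per second: clock (MHz) x ops per cycle."""
    return clock_mhz * ops_per_cycle


def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock rate in MHz."""
    return cycles / clock_mhz


if __name__ == "__main__":
    # 1024-point complex FFT times quoted in section I (microseconds).
    quoted_times_us = {
        "ADSP-TS101S": 39.0,
        "MPC7455 (est.)": 13.0,
        "PPC476FP (double precision)": 83.2,
    }
    for chip, t_us in quoted_times_us.items():
        print(f"{chip}: {fft_mflops(1024, t_us):.1f} mflops (scaled)")

    # Peak rates described in the text.
    print("MPC7455 peak:", peak_mflops(1000, 8), "MFLOPS")
    print("ADSP-TS101S peak at 300 MHz:", peak_mflops(300, 6), "MFLOPS")

    # Cycle counts from the tables converted back to time.
    print("9,750 cycles at 250 MHz:", cycles_to_us(9750, 250), "us")
```

Because the mflops value is only a scaling of the measured time, it is best used to compare runs of the same transform type and size across machines rather than being read as a measured operation rate.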
