Design and FPGA Implementation of a Scalable High-Speed FFT Processor

    Abstract: This article proposes the structural design of an FPGA-based pipelined FFT processor whose transform length can be extended flexibly, together with the algorithms used to implement its functional modules. These include the pipelined implementation structure of the high-composite-number FFT algorithm, the address rule for the interstage reordering read/write RAM, the array processing structure of the short-point FFT, and a pipelined CORDIC structure implemented in two's-complement arithmetic. The functional modules implemented on the FPGA were assembled into a 64-point FFT processor. Its estimated performance shows that, at a data input rate of 20 MHz, an FFT processor built with this structure computes a 1024-point FFT in about 52 μs.
    Key words: fast Fourier transform; processor; coordinate rotation digital computer; field-programmable gate array; design

    First, introduction
      The DFT (discrete Fourier transform) is the basic operation that transforms a signal from the time domain to the frequency domain, and it plays a key role in all kinds of digital signal processing. Its fast algorithm, the FFT (fast Fourier transform), is widely used in wireless communication, speech recognition, image processing, and spectral analysis. When the FFT algorithm is implemented on a large-scale integrated circuit such as an FPGA (field-programmable gate array), the key consideration is no longer the operation count of the algorithm but its complexity, regularity, and modularity: a simple and regular algorithm is better suited to large-scale integration and convenient for layout design, while a modular algorithm makes it easier to extend the FFT processor flexibly. Combining the composite-number FFT algorithm with the CORDIC (coordinate rotation digital computer) algorithm has great advantages when computing long, extensible FFTs [1,2]. For real-time FFT processing of high-speed, high-volume data streams, the required throughput can be reached through parallel processing or multistage pipeline processing in VLSI (very-large-scale integration) devices. In particular, with a multistage pipelined FFT structure, an FPGA-based processor can compute FFTs of different lengths very easily by adding or removing module stages.

    Second, the principle of the mixed-radix FFT for composite N=r1r2
      The N-point DFT is computed as:
      X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk},  W_N = e^{-j 2\pi / N}
      where k=0,1,…,N-1.
      If N is the composite number N=r1r2, the index n (n<N) can be expressed as
      
      Equation (2) means that the N-point DFT for composite N=r1r2 is computed by first taking r2 groups of r1-point DFTs over equally spaced (decimated) samples, rotating the phase of each result by the corresponding twiddle factor, and then computing r1 groups of r2-point DFTs. In practice the DFT is usually computed with its fast algorithm, the FFT, so the r1-point and r2-point DFTs in equation (2) are implemented as r1-point and r2-point FFTs.
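
The three steps described above can be checked numerically with a short Python sketch. Equation (2) itself did not survive in this copy of the article, so the index mapping used here (n = r2·n1 + n0, k = r1·k1 + k0, twiddle factor W_N^(n0·k0)) is an assumption based on the standard composite-length decomposition. The function names `dft` and `mixed_radix_dft` are hypothetical, and direct DFTs stand in for the short-point FFTs.

```python
import cmath

def dft(x):
    """Direct N-point DFT: X(k) = sum_n x(n) * exp(-2j*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def mixed_radix_dft(x, r1, r2):
    """N = r1*r2 point DFT built from r1-point and r2-point DFTs.

    Assumed index mapping (not taken from the source): n = r2*n1 + n0, k = r1*k1 + k0.
    """
    N = r1 * r2
    if len(x) != N:
        raise ValueError("len(x) must equal r1 * r2")
    # Step 1: r2 groups (one per n0) of r1-point DFTs over the decimated samples.
    inner = [dft([x[r2 * n1 + n0] for n1 in range(r1)]) for n0 in range(r2)]
    # Step 2: phase rotation of each result by the twiddle factor W_N^(n0*k0).
    for n0 in range(r2):
        for k0 in range(r1):
            inner[n0][k0] *= cmath.exp(-2j * cmath.pi * n0 * k0 / N)
    # Step 3: r1 groups (one per k0) of r2-point DFTs taken across n0.
    X = [0j] * N
    for k0 in range(r1):
        outer = dft([inner[n0][k0] for n0 in range(r2)])
        for k1 in range(r2):
            X[r1 * k1 + k0] = outer[k1]
    return X

if __name__ == "__main__":
    samples = [complex(n % 3, -0.5 * n) for n in range(8)]
    reference = dft(samples)
    result = mixed_radix_dft(samples, 4, 2)
    print(max(abs(a - b) for a, b in zip(reference, result)))
```

In a hardware pipeline the same three steps map onto a short-point FFT module, a phase rotation module, and a second short-point FFT module, with a reordering memory between stages.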

    Third, implementation structure of the scalable FFT processor
      Based on the FFT algorithm principle of equation (2), the scalable structure of the FFT processor is designed as shown in Figure 1.

      A pipelined, modular cascade structure is used. The FFT processor is divided into functional modules such as the short-point FFT, the interstage reordering RAM, and the phase rotation unit. Each functional module can be reused, and the computation scale of the FFT processor can be changed flexibly by adding or removing functional modules through multiplexing, without increasing the design effort. In the structure of Figure 1, when Li=1 the structure becomes a radix-2 FFT; when Li=2 it becomes a radix-4 FFT; likewise, when Li≠Lj it becomes a high-composite-number mixed-radix FFT.
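
The relation between the stage parameters Li and the transform length can be illustrated with a small Python sketch. It assumes that stage i is a 2^Li-point FFT, which is consistent with Li=1 giving radix 2 and Li=2 giving radix 4, but Figure 1 is not available here to confirm it. The function names are hypothetical.

```python
def stage_radices(stage_bits):
    """Radix of each cascaded stage, assuming stage i computes a 2**Li-point FFT."""
    return [1 << li for li in stage_bits]

def transform_length(stage_bits):
    """Total FFT length N as the product of the stage radices."""
    n = 1
    for radix in stage_radices(stage_bits):
        n *= radix
    return n

if __name__ == "__main__":
    # L = (3, 2, 1) corresponds to the 8x4x2 processor built later in the article.
    print(stage_radices([3, 2, 1]), transform_length([3, 2, 1]))
```

Under this assumption, adding or removing a stage multiplies or divides N by that stage's radix, which is what makes the cascade extensible.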

    1. Short-point FFT array architecture
     
      When the short-point FFT is implemented with the Cooley-Tukey algorithm structure, a large number of its complex multiplications in fact reduce to additions and subtractions. Implementing it as an array architecture is therefore not only fast but also uses far fewer device resources, and multiplexing the array-structured short-point FFT improves the utilization of the arithmetic unit.

    2. Phase rotation arithmetic unit
      The conventional way to implement the interstage phase rotation between short-point FFTs is to store the twiddle factors in ROM and multiply them with the data as complex numbers. This not only requires multiplications but also consumes a large amount of memory.

      Using the CORDIC algorithm for the phase rotation of the interstage data of the composite-number FFT converts the multiplications into additions and subtractions, which suits large-scale integration on an FPGA. A CORDIC processor module with a unified structure can be designed and reused for the phase rotation at the different interstages, and its control logic is simple.

      (1) CORDIC algorithm principle
      Rotating the complex number P=x+jy by the angle θ gives Q, which is expressed as:
      
      If the rotation angle θ can be decomposed into the sum of N small angles φi, namely:
      
    then the following formula is obtained:
      

      (2) CORDIC processor structural design
      This article proposes a pipelined CORDIC processor structure. The iterative computation of formula (4) is carried out with two's-complement shifts and two's-complement additions and subtractions, which removes a large number of complement-conversion operations. Its iteration structure is shown in Figure 2.
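
The shift-and-add iteration described above can be sketched in Python as follows. Formulas (3) and (4) are missing from this copy, so the sketch uses the standard CORDIC rotation recurrences as an assumption: x(i+1) = x(i) − d(i)·2^(−i)·y(i), y(i+1) = y(i) + d(i)·2^(−i)·x(i), z(i+1) = z(i) − d(i)·arctan(2^(−i)), with d(i) = ±1 chosen from the sign of z(i). The function name, word length, and iteration count are hypothetical. For brevity the angle accumulator and the final gain correction use floating point, whereas hardware would use a fixed-point angle table and a constant scale factor.

```python
import math

def cordic_rotate(x, y, theta, iterations=16, frac_bits=16):
    """Rotate (x, y) by theta radians (|theta| up to about 1.74) with shifts and adds.

    Sketch of the standard CORDIC rotation mode; not the article's exact circuit.
    """
    # Elementary angles arctan(2^-i) and the accumulated CORDIC gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Fixed-point integers; Python's >> on negative ints is an arithmetic shift,
    # which mirrors a two's-complement right shift in hardware.
    scale = 1 << frac_bits
    xi = int(round(x * scale))
    yi = int(round(y * scale))
    z = theta
    for i in range(iterations):
        d = 1 if z >= 0 else -1          # rotation direction for this stage
        xi, yi = xi - d * (yi >> i), yi + d * (xi >> i)
        z -= d * angles[i]
    # Undo the fixed-point scaling and the CORDIC gain.
    return xi / (scale * gain), yi / (scale * gain)

if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 4))
```

Each loop pass corresponds to one pipeline stage: a stage needs only two shifters and two adder/subtractors, which is why the structure maps well onto FPGA logic without multipliers.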

      
    The two differ in the number of zero bits padded by the left shift. Thus, only the amplification factor applied to n0k0 needs to be changed (that is, the number of zero bits padded at the low end by the left shift) in order to cascade the unidirectional vector computation modules of the different interstages of the FFT processor onto MSBi of the CORDIC processor in Figure 1. This greatly reduces redundant design. The iteration structure is shown in Figure 3.
      

    3. RAM structure and design of the pipelined read/write RAM address generator for interstage data reordering
      In the RAM design, each memory cell is 32 bits wide: the upper 16 bits hold the real part of a complex number and the lower 16 bits hold the imaginary part. The RAMs connected to the input and output data use a ping-pong structure, in which two identical RAMs are read and written alternately. This relaxes the speed requirement on I/O operations, so the peripheral circuit does not have to run at the FPGA system clock.
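
The 32-bit memory word layout described above can be illustrated with a small Python sketch. It assumes that both parts are signed 16-bit two's-complement integers, which the article does not state explicitly for the RAM. The function names are hypothetical.

```python
def pack_complex(re, im):
    """Pack two signed 16-bit integers into one 32-bit word.

    Real part in bits 31..16, imaginary part in bits 15..0 (two's complement assumed).
    """
    if not (-32768 <= re <= 32767 and -32768 <= im <= 32767):
        raise ValueError("real and imaginary parts must fit in signed 16 bits")
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)

def unpack_complex(word):
    """Recover the signed (real, imaginary) pair from a 32-bit word."""
    def to_signed16(value):
        return value - 0x10000 if value & 0x8000 else value
    return to_signed16((word >> 16) & 0xFFFF), to_signed16(word & 0xFFFF)

if __name__ == "__main__":
    word = pack_complex(-123, 456)
    print(hex(word), unpack_complex(word))
```

Storing both parts in one word lets a single RAM access move a whole complex sample between pipeline stages.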

      The RAM used for data reordering between stages is designed as a read/write RAM: one read and one write on the same memory cell are completed within two clock cycles, so the interstage reordering is done with a single RAM by pipelined read/write. This structure replaces the conventional ping-pong approach that uses two RAMs for data reordering. It involves no switching between memories, its control logic is simple, and it halves the memory resources consumed. It is both the key to, and the main difficulty in, realizing a high-speed FFT processor whose structure can be extended flexibly. Theoretical derivation gives the RAM read/write addresses for the odd-numbered passes of the interstage reordering between the (i-1)th-stage FFT and the ith-stage FFT as
      
      … on that basis, the address is circularly shifted left by Li-Li-1 bits; at the same time, the latter is likewise obtained from the former by a circular left shift of Li-Li-1 bits, which gives the circular-shift rule for the addresses. The rule unifies the two cases Li-1=Li and Li-1<Li: when Li-1=Li, Li-Li-1=0, so no circular shift is needed and only the upper Li-1 bits and the lower Li-1 bits of the counter have to be exchanged. With this address rule, pipelined read/write RAM address generators for interstage data reordering can be designed, based on the structure of Figure 1, for radix-2, radix-4, and any other radix-x FFT as well as for mixed-radix FFTs.
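
The two address operations named above, a circular left shift and an exchange of the upper and lower counter fields, can be sketched in Python. The address formula itself is missing from this copy, so this shows only the primitive operations, not the article's complete address generator. The function names and the assumption that the counter is exactly two fields wide in `swap_fields` are hypothetical.

```python
def rotate_left(addr, shift, width):
    """Circularly shift a width-bit address left by shift bits (width > 0)."""
    shift %= width
    mask = (1 << width) - 1
    return ((addr << shift) | (addr >> (width - shift))) & mask

def swap_fields(addr, field_bits):
    """Exchange the upper and lower field_bits of a 2*field_bits-wide counter value."""
    mask = (1 << field_bits) - 1
    low = addr & mask
    high = (addr >> field_bits) & mask
    return (low << field_bits) | high

if __name__ == "__main__":
    # Two radix-4 stages (field_bits = 2): the swapped counter walks a 4x4 block column-wise.
    print([swap_fields(a, 2) for a in range(16)])
```

Exchanging the two fields turns a row-by-row count into a column-by-column one, which is the kind of reordering needed between two stages of equal radix.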

    4. Experimental results and analysis of the 8×4×2 composite-number FFT processor
      Using the functional modules implemented on the FPGA, we assembled an 8×4×2 composite-number FFT processor according to the structure of Figure 1. The correctness of the design was confirmed by simulation and then verified in hardware on an FPGA experiment board. The experimental verification platform is shown in Figure 4.

      The hardware verification used the following method. Single-frequency sinusoidal signals of different frequencies are sampled at equal intervals with the same sampling frequency fs and the same number of points, 64, which fixes the FFT frequency resolution fr. The amplitude spectrum is computed with the designed 64-point FFT processor, and the change in the spacing between the DC component line and the harmonic component line in the amplitude spectrum is observed. The experimental result is then compared with the theoretical analysis to confirm whether the FFT processor works correctly.

      The system clock runs at 40.861 MHz, so the sampling frequency is 40.861/2=20.4305 MHz, the sampling period is 1/20.4305 MHz=48.9 ns, and acquiring 64 points takes 48.9×64=3.13 μs. Because the time interval between samples is 48.9 ns, the spacing between spectral lines in the amplitude spectrum output by the pipelined 64-point FFT processor is also 48.9 ns. When the input single-frequency sinusoid has a frequency of about 638.454 kHz, its period is 1/638.454 kHz=1.566 μs. Sampling at 20.4305 MHz, exactly 64 points are taken over 2 periods of the sinusoid within 3.13 μs. The input frequency is 2 times the frequency resolution of 319.227 kHz, so the DC component is the 1st spectral line of the amplitude spectrum and the first harmonic (fundamental) component is the 3rd spectral line. The theoretical result is shown in Figure 5; the measured result and its partially enlarged view are shown in Figures 6 and 7.
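
The timing arithmetic above can be reproduced with a few lines of Python. The function and key names are hypothetical; the input values are the ones quoted in the text, and the sampling frequency is taken as half the system clock, as stated there.

```python
def fft_timing(f_clk_hz, n_points, f_in_hz):
    """Derive sampling and spectrum figures from the system clock and FFT length."""
    fs = f_clk_hz / 2.0                 # sampling frequency is half the system clock
    ts = 1.0 / fs                       # sampling period; also the output line spacing in time
    frame_time = n_points * ts          # time to acquire (and to output) one frame
    resolution = fs / n_points          # frequency resolution fr
    bin_index = round(f_in_hz / resolution)
    return {
        "fs_hz": fs,
        "ts_ns": ts * 1e9,
        "frame_us": frame_time * 1e6,
        "resolution_khz": resolution / 1e3,
        "bin_index": bin_index,          # 0 is the DC bin
        "line_number": bin_index + 1,    # counting the DC line as line 1
        "dc_to_line_ns": bin_index * ts * 1e9,
    }

if __name__ == "__main__":
    for key, value in fft_timing(40.861e6, 64, 638.454e3).items():
        print(key, value)
```

With these inputs the fundamental falls in bin 2, two line spacings after the DC line, which corresponds to the gap of roughly 100 ns reported from the oscilloscope.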



      On the oscilloscope, the horizontal scale is 1 μs per division and one FFT transform period spans about 3 divisions, i.e., about 3 μs. Two periods of the waveform are captured, and the 64-point FFT computation time is also about 3 μs.

      The input single-frequency sinusoid has a frequency 2 times the frequency resolution of 319.227 kHz, so the DC component is the 1st spectral line of the amplitude spectrum and the first harmonic (fundamental) component is the 3rd spectral line. Because the spacing between spectral lines is 48.9 ns, the gap between the DC component and the fundamental component is two line spacings, about 100 ns. On the oscilloscope, with a horizontal scale of 100 ns per division, the gap between the DC component and the fundamental component is about 100 ns, which is consistent with the theoretical analysis.

    Fourth, conclusion
      Based on the high-composite-number mixed-radix DFT algorithm, this article designed and implemented on an FPGA a pipelined FFT processor whose transform length can be extended flexibly. When the input/output data rate is 20 MHz and the RAM is read and written on a 40 MHz clock, a 1024-point FFT takes about 52 μs to compute. The design is modular, which makes the system easy to debug and implement. Each design module can be reused, which avoids repeating the same design, shortens chip design and development time, and makes it easy to extend the structure of the FFT processor. The overall FFT design structure is novel, easy to implement, and of practical value.

    References

    [1] Cheng Peiqing. Digital Signal Processing Tutorial [M]. Beijing: Tsinghua University Press, 2001.
    [2] Hou Boheng, Gu Xin. VHDL Hardware Description Language and Digital Logic Circuit Design [M]. Xi’an: Xidian University Press, 1999.
    [3] Stephan W. Mondwurf. Benefits of the CORDIC-Algorithm in a Versatile COFDM Modulator/Demodulator Design [A]. Fourth IEEE International Caracas Conference on Devices, Circuits and Systems [C]. Aruba, April 17~19, 2002.
    [4] Zhao Zhongwu, Chen He, Han Yueqiu. Design of an FPGA-based 32-bit floating-point FFT processor [J]. Telecommunication Engineering, 2003, 43(6).
    [5] Y. Ma, L. Wanhammar. A hardware efficient control of memory addressing for high-performance FFT processors [J]. IEEE Transactions on Signal Processing, 2000, 48(3): 917~921.
    [6] J. E. Volder. The CORDIC Trigonometric Computing Technique [J]. IRE Trans. on Electronic Computers, 1959, 8(3): 330~334.
    [7] Han Ying, Wang Xu, Wu Siliang. Design of a high-speed FFT processor implemented on FPGA [J]. Telecommunication Engineering, 2003, 43(2): 74~78.
    [8] A. M. Despain. Fourier Transform Computers Using CORDIC Iterations [J]. IEEE Trans. on Computers, 1974, C-23(10): 993~1001.


    Tuesday, September 9th, 2008 at 20:02