Abstract: This paper introduces the design and implementation of an FFT module for microwave relay communication based on a field-programmable gate array (FPGA). A fully parallel pipelined structure is proposed which, on a new-generation high-capacity, high-speed Stratix series FPGA, completes an N-point FFT in N system clock cycles. The design is stable and fast, and fully meets the requirements of real-time signal processing.
Key words: FFT; Stratix; FPGA; fully parallel pipeline; design
1 Introduction
In long-distance terrestrial microwave communication, once the transmission distance exceeds a certain range, propagation of the electromagnetic wave is obstructed by the ground, and the signal weakens as the distance increases. To extend the communication distance and improve communication quality, microwave relay equipment (e.g. microwave relay stations) must be installed between the two endpoints; each relay receives and amplifies the signal and then retransmits it to the next section, section by section. Microwave relay communication usually adopts wideband (spread-spectrum) techniques to improve the system's anti-jamming capability. However, the very wide bandwidth of such a system makes it susceptible to interference from other equipment (selective interference) in places crowded with communication equipment. During operation of the relay station, selective interference in the received signal must therefore be identified quickly and suppressed with techniques such as adaptive notch filtering. Selective interference appears as a peak in the power spectrum, whereas the wideband signal has an approximately flat spectrum, so the two are easy to distinguish. An FFT module is therefore needed in the microwave relay station to compute the power spectrum of the signal.
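To make the identification criterion concrete, the following Python sketch computes a power spectrum with a direct DFT and flags bins that rise far above the median level. The 20 dB threshold, the median reference, the random-noise stand-in for the wideband signal and the function names are illustrative assumptions, not the detection rule used in the relay station.

```python
import cmath
import math
import random


def power_spectrum(x):
    """|X(k)|^2 by direct DFT (O(N^2), adequate for a short illustration)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) ** 2
            for k in range(n)]


def find_interference(x, threshold_db=20.0):
    """Return bins whose power exceeds the median bin power by more than threshold_db.

    A roughly flat wideband spectrum yields no bins; a narrow spectral peak stands out.
    """
    p = power_spectrum(x)
    median = sorted(p)[len(p) // 2]
    limit = median * 10 ** (threshold_db / 10)
    return [k for k, v in enumerate(p) if v > limit]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 64
    # Random complex noise as a stand-in for the flat wideband signal.
    wideband = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Add a strong tone in bin 10 to mimic selective interference.
    jammed = [s + 20 * cmath.exp(2j * math.pi * 10 * m / n) for m, s in enumerate(wideband)]
    print(find_interference(wideband), find_interference(jammed))
```

Using the median rather than the mean as the reference keeps a single strong interferer from raising its own detection threshold.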
The methods commonly used to implement an FFT today are DSPs, dedicated FFT processor chips, and FPGAs. An FFT on a DSP is slow and cannot meet the real-time requirements of some high-speed signals. Dedicated FFT processors are fast, but relatively expensive, and their peripheral circuitry is relatively complex. A new-generation FPGA combines the advantages of both. FPGAs are rich in resources, and their parallel, pipelined nature makes the FFT easy to implement: the result is stable and efficient, and the computation time can be greatly reduced. Altera's Stratix series, for example, offers up to 79,040 logic elements, about 7 Mbit of embedded memory, optimized DSP blocks and high-performance I/O, which makes FFT processing with a fully parallel pipeline very convenient.
The author implemented the FFT on a Stratix EP1S25 FPGA. With the system clock running stably above 52 MHz, one 256-point FFT takes less than 5 µs and one 1024-point FFT less than 20 µs, which fully meets the real-time processing requirement.
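These figures follow directly from the one-point-per-clock property of the pipeline: in steady state an N-point FFT takes N clock cycles. A small Python check of that arithmetic (ignoring the initial pipeline fill latency; the function name is illustrative):

```python
def fft_time_us(n_points, clock_mhz):
    """Time for one N-point FFT when the pipeline completes N points in N clocks.

    clock_mhz clock cycles elapse per microsecond, so the time is N / f.
    """
    return n_points / clock_mhz


for n in (256, 1024):
    print(f"{n}-point FFT at 52 MHz: {fft_time_us(n, 52.0):.2f} us")
# 256-point FFT at 52 MHz: 4.92 us
# 1024-point FFT at 52 MHz: 19.69 us
```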
2 Module design and implementation
2.1 FFT algorithm selection
Since J. W. Cooley and J. W. Tukey published their famous paper "An Algorithm for the Machine Calculation of Complex Fourier Series" in Mathematics of Computation in 1965, decades of continual improvement have produced many efficient FFT algorithms. They fall into two broad categories: decimation-in-time FFT (DIT-FFT) and decimation-in-frequency FFT (DIF-FFT). In decimation in time, the time-domain input data are reordered according to the bit-reversal rule, and after the transform the frequency-domain output comes out in natural order. In decimation in frequency, the time-domain input enters in natural order, and after the transform the frequency-domain output emerges in bit-reversed order. According to the radix, the algorithms can also be classified as radix-2, radix-4, radix-8, mixed-radix and so on.
In this module, the bit-reversal reordering can be done conveniently in the preceding windowing unit, and in the fully parallel pipeline, decimation in time allows the original addresses to be fully reused, which saves memory. The design therefore uses the simple and practical radix-2 DIT FFT algorithm.
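For reference, a floating-point Python sketch of the radix-2 DIT algorithm chosen here: the input is loaded in bit-reversed order, each of the log2 N stages applies butterflies in place, and the output emerges in natural order. It illustrates the algorithm only; it is not a model of the VHDL design, and the function names are hypothetical.

```python
import cmath
import math


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dit_radix2(x):
    """Iterative radix-2 decimation-in-time FFT.

    Input is loaded in bit-reversed order, butterflies run in place,
    and the output comes out in natural order.
    """
    n = len(x)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    a = [x[bit_reverse(i, bits)] for i in range(n)]  # bit-reversed load
    span = 1
    while span < n:  # one pass per stage, log2(n) stages in total
        step = 2 * span
        for j in range(span):
            w = cmath.exp(-2j * math.pi * j / step)  # twiddle factor W_step^j
            for k in range(j, n, step):
                t = a[k + span] * w
                a[k + span] = a[k] - t
                a[k] = a[k] + t
        span = step
    return a
```

Because every butterfly writes its results back to the addresses it read from, one buffer per stage is enough, which is the memory saving mentioned above.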
2.2 Interface between the FFT module and external circuitry
The connection between the FFT module and the external circuitry is shown in Figure 1. The input signal Xin is a complex zero-IF signal, 18 bits wide, in two's-complement binary. Xout is the complex transform output, also 18 bits wide and in two's complement. CLK is the system master clock and HCLK is a clock at twice its frequency; HCLK is used mainly for data input and output. When CLK is '1', the real part of the input is taken from Xin on the rising edge of HCLK and the real part of the transform result is driven on Xout; when CLK is '0', the imaginary part of the input is taken from Xin on the rising edge of HCLK and the imaginary part of the result is driven on Xout. iFSync is the input frame synchronization signal and oFSync the output frame synchronization signal; a '1' on either signal marks that the first sample of an input (respectively output) transform frame is starting.
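The interface packs real and imaginary parts onto one 18-bit bus, so each complex sample occupies two HCLK edges. The following Python sketch shows the two's-complement encoding and the interleaving; representing samples as (re, im) integer tuples and emitting the real part first within each CLK period are assumptions made for illustration.

```python
WIDTH = 18  # Xin / Xout data width given in the text


def to_twos(value, width=WIDTH):
    """Encode a signed integer as an unsigned width-bit two's-complement word."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width}-bit two's complement")
    return value & ((1 << width) - 1)


def from_twos(word, width=WIDTH):
    """Decode an unsigned width-bit two's-complement word back to a signed integer."""
    return word - (1 << width) if word & (1 << (width - 1)) else word


def interleave(samples):
    """Serialize (re, im) integer samples as (CLK level, bus word) pairs, one per HCLK edge.

    Real part while CLK = '1', imaginary part while CLK = '0'.
    """
    stream = []
    for re, im in samples:
        stream.append((1, to_twos(re)))
        stream.append((0, to_twos(im)))
    return stream
```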

2.3 Fully parallel pipeline implementation
The design makes full use of the abundant embedded multipliers and memory in the FPGA and adopts a fully parallel pipelined architecture, as shown in Figure 2, where N is the number of FFT points and $M = \log_2 N$. Taking N = 256, M = 8 as an example, once the system reaches steady state, 10 groups of data are being processed simultaneously in different operations during each 256-clock period: while the 1st group is being fed in, the 10th group is being output, and the 8 groups in between are in the successive butterfly stages. Thus, once the module has entered steady state, one 256-point FFT is completed every 256 clocks and its result is read out of the output RAM.
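The occupancy of the pipeline can be tabulated with a few lines of Python. The slot names and the rule that group g enters the input RAM in period g are assumptions of this sketch; with M = 8 it reproduces the 10 groups in flight described above.

```python
def pipeline_snapshot(period, n_stages=8):
    """Map each pipeline slot to the data group it holds during one N-clock period.

    Slot order: windowing/bit-reversal RAM, butterfly stages 1..n_stages, output RAM.
    Group g is assumed to enter the first slot in period g and advance one slot
    per period, so in period p slot i holds group p - i (if that group exists).
    """
    slots = ["input"] + [f"stage {s}" for s in range(1, n_stages + 1)] + ["output"]
    return {name: period - i for i, name in enumerate(slots) if period - i >= 0}


snap = pipeline_snapshot(9)  # first period in which every slot is busy
print(len(snap), snap["input"], snap["output"])  # 10 9 0
```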

2.4 Internal design of the FFT module
The internal structure of the FFT module is shown in Figure 3. Each part is described below using N = 256 as an example.

2.4.1 Windowing and bit-reversed storage unit
To reduce the spectral leakage caused by truncation in the time domain, the input data are windowed before the FFT. This module is mainly used to analyze a wideband signal with superimposed selective interference, and it must give the center frequency and relative strength of each interferer accurately; a Chebyshev window with 80 dB out-of-band (sidelobe) attenuation is therefore used to obtain good spectral performance. The windowed data are stored in RAM in bit-reversed order, ready to enter the butterfly units.
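The 80 dB Chebyshev coefficients held in the window ROM can be precomputed as sketched below. This Python version follows the frequency-sampling construction of SciPy's `scipy.signal.windows.chebwin` but uses a direct DFT so that it needs only the standard library; the paper computed its coefficients in MATLAB, so treat this as an equivalent illustration rather than the author's script.

```python
import cmath
import math


def chebwin(m, at_db=80.0):
    """Symmetric Dolph-Chebyshev window of length m (m >= 2) with at_db sidelobe attenuation."""
    order = m - 1.0
    beta = math.cosh(math.acosh(10 ** (abs(at_db) / 20.0)) / order)
    # Sample the Chebyshev polynomial T_order(beta * cos(pi * k / m)) in frequency.
    p = []
    for k in range(m):
        x = beta * math.cos(math.pi * k / m)
        if x > 1:
            p.append(math.cosh(order * math.acosh(x)))
        elif x < -1:
            sign = 1 if m % 2 else -1  # T_order(-y) = (-1)^order * T_order(y)
            p.append(sign * math.cosh(order * math.acosh(-x)))
        else:
            p.append(math.cos(order * math.acos(x)))

    def dft_real(seq, idx):
        """Real part of the DFT of seq at bin idx."""
        return sum(v * cmath.exp(-2j * math.pi * idx * t / m) for t, v in enumerate(seq)).real

    if m % 2:
        n = (m + 1) // 2
        half = [dft_real(p, i) for i in range(n)]
        w = half[n - 1:0:-1] + half
    else:
        # Even length: apply a half-sample delay before transforming.
        shifted = [v * cmath.exp(1j * math.pi * t / m) for t, v in enumerate(p)]
        n = m // 2 + 1
        half = [dft_real(shifted, i) for i in range(n)]
        w = half[n - 1:0:-1] + half[1:n]
    peak = max(w)
    return [v / peak for v in w]  # normalize the largest coefficient to 1
```

For the ROM, these values would still have to be quantized to the 18-bit word format.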
2.4.2 Control unit
The control unit is the core of the whole FFT module. It is mainly responsible for the following two tasks.
(1) Generating the enable signals for each unit
When the iFSync signal at the input port is detected high, the control unit immediately starts the "windowing and bit-reversed storage" unit and the "window coefficient ROM" unit, which load the data, apply the window and store the samples in bit-reversed order. After 256 clocks it starts the butterfly stages and directs the address generation unit to produce the addresses currently required. The enable of each intermediate butterfly stage is generated by the preceding stage. When the 8th stage has finished, the control unit asserts the output frame flag oFSync and makes the output RAM deliver its data synchronously.
(2) Generating the addresses needed at each stage
Bit-reversed address: this is produced by a modulo-N synchronous counter. Mirroring the bits of the counter output (exchanging the most significant bit with the least significant bit, the second bit with the second-to-last, and so on) gives the bit-reversed address of the current sample.
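A software model of this address generator, a modulo-N counter whose bits are mirrored, is sketched below; the function name is hypothetical, and in hardware the mirroring is simply wiring.

```python
def bit_reversed_addresses(n):
    """Addresses from a modulo-n counter with its bits mirrored (n a power of two)."""
    bits = n.bit_length() - 1
    addresses = []
    for count in range(n):             # successive counter values 0 .. n-1
        rev = 0
        for b in range(bits):          # counter bit b goes to address bit bits-1-b
            if (count >> b) & 1:
                rev |= 1 << (bits - 1 - b)
        addresses.append(rev)
    return addresses


print(bit_reversed_addresses(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```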
Stage operation addresses: these are the RAM read addresses and the ROM read addresses. The rule is to finish all butterflies that use one twiddle factor before moving on to those that use the next one. In this way all RAM read addresses can be generated alongside the ROM address, and linking the two address generators guarantees that the RAM data and the ROM data correspond exactly.
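The ordering rule can be expressed as a generator of (RAM address, RAM address, ROM address) triples for each DIT stage. The ROM layout assumed here, holding $W_N^r$ for $r = 0, \ldots, N/2-1$, and the function name are assumptions of this sketch.

```python
def stage_addresses(n, stage):
    """(RAM address a, RAM address b, twiddle ROM address) triples for one DIT stage.

    stage runs from 1 to log2(n). Every butterfly that shares one twiddle factor
    is issued before moving on, so the ROM address only ever steps forward.
    """
    span = 1 << (stage - 1)          # distance between the two butterfly inputs
    step = span << 1                 # size of one butterfly group
    triples = []
    for j in range(span):            # one twiddle factor W_N^(j * n / step) at a time
        rom = j * (n // step)
        for base in range(0, n, step):
            a = base + j
            triples.append((a, a + span, rom))
    return triples
```

For N = 256 the 1st stage only ever reads ROM address 0 and the 2nd stage only addresses 0 and 64, which is why those two stages can be simplified in Section 2.4.5.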
2.4.3 RAM modules
A 256-point FFT requires 8 butterfly stages, so the fully parallel architecture needs 8 separate RAMs to store the intermediate results of each stage; the RAM of the 8th stage also serves as the output RAM. Together with the RAM for the windowing and bit-reversal stage, the system needs 9 RAMs in total. Storing 256 complex samples with separate real and imaginary parts requires 512 memory words. Because of signal and computation latency, a butterfly stage cannot finish its operation within 256 clocks; yet the next group of data reaches this stage after 256 clocks and its results must be stored in the same RAM, so data that have not yet been read could be overwritten by newer data. To guarantee correct storage in the fully pipelined mode, each of the 9 RAMs is organized as 1024 × 18 bits and divided into two halves: addresses 0 to 511 form the first half and addresses 512 to 1023 the second half, selected by an MSB signal used as the highest address bit. When the 1st group of data is processed in a stage, its results are stored in the first half of the RAM; after 256 clocks the MSB is inverted, so the results of the next group in this stage are written to the second half while the 1st group's data are read from the first half, without conflict. Altera FPGAs provide abundant RAM resources, and dual-port RAM makes this scheme easy to implement.
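A sketch of the ping-pong addressing of one stage RAM follows. Placing the real parts before the imaginary parts inside each 512-word half is a hypothetical layout; the text only fixes that the MSB selects the half and toggles every 256 clocks.

```python
N = 256  # FFT length from the text


def ram_address(bank, is_imag, index, n=N):
    """Physical address in one 1024 x 18-bit stage RAM (hypothetical layout).

    bank is the MSB: 0 selects words 0..511, 1 selects words 512..1023.
    Within a half, real parts occupy the first n words and imaginary parts the next n.
    """
    if bank not in (0, 1) or not 0 <= index < n:
        raise ValueError("bank must be 0 or 1 and index in [0, n)")
    return bank * 2 * n + (n if is_imag else 0) + index


def banks(group):
    """(half written by data group `group`, half read for the previous group).

    The MSB toggles every n clocks, so consecutive groups alternate halves.
    """
    return group % 2, (group - 1) % 2
```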
2.4.4 ROM modules
The module needs 3 ROMs in total: one stores the Chebyshev window coefficients, and the other two store the real and imaginary parts of the twiddle factors, respectively. These coefficients are computed in advance in MATLAB and exported in *.mif format. In the Quartus II software the 3 ROMs are instantiated, and the *.mif files produced by MATLAB are assigned as their initialization files, which completes the ROM initialization.
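The paper generated these files in MATLAB; an equivalent Python sketch that quantizes the twiddle factors and writes Quartus *.mif text is shown below. The Q1.17 scaling (17 fractional bits in an 18-bit word), the saturation of +1.0, and the choice of unsigned addresses with hexadecimal data are assumptions, not details given in the paper.

```python
import math

WIDTH = 18
FRAC_BITS = WIDTH - 1  # assumed Q1.17 fixed-point scaling of the coefficients


def quantize(value):
    """Round to Q1.17 and saturate (+1.0 is not representable, so it clips)."""
    q = int(math.floor(value * (1 << FRAC_BITS) + 0.5))
    return max(-(1 << FRAC_BITS), min((1 << FRAC_BITS) - 1, q))


def to_mif(words, width=WIDTH):
    """Render signed integers as Memory Initialization File (.mif) text."""
    digits = (width + 3) // 4
    lines = [f"WIDTH={width};", f"DEPTH={len(words)};",
             "ADDRESS_RADIX=UNS;", "DATA_RADIX=HEX;", "CONTENT BEGIN"]
    for addr, w in enumerate(words):
        lines.append("    %d : %0*X;" % (addr, digits, w & ((1 << width) - 1)))
    lines.append("END;")
    return "\n".join(lines) + "\n"


def twiddle_roms(n):
    """MIF text for the real and imaginary parts of W_N^r = exp(-j*2*pi*r/N), r < N/2."""
    re = [quantize(math.cos(2 * math.pi * r / n)) for r in range(n // 2)]
    im = [quantize(-math.sin(2 * math.pi * r / n)) for r in range(n // 2)]
    return to_mif(re), to_mif(im)
```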
2.4.5 Butterfly units
(1) Basic butterfly unit. After the complex operations are decomposed into real arithmetic, each basic butterfly unit consists of 4 multipliers, 1 adder and 1 subtractor. The multipliers are the key factor that determines the operating speed of the system. For a 256-point FFT in the fully parallel mode, up to 33 18 × 18-bit multiplications must be performed in parallel in the same clock cycle. The EP1S25 FPGA is rich in multiplier resources: its dedicated DSP blocks alone can perform 40 18 × 18-bit multiplications in parallel, which fully meets the system's requirement.
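An integer Python model of the basic butterfly shows where the 4 real multiplications, the subtractor and the adder of the complex product go. Q1.17 operands, rounding of each product and the final halving (the scheme described in Section 2.5) are assumptions carried into this sketch; the function names are hypothetical.

```python
FRAC = 17  # assumed Q1.17: 18-bit two's-complement fractions


def round_shift(value, shift):
    """Arithmetic right shift by `shift` bits with round-half-up on the dropped bits."""
    return (value + (1 << (shift - 1))) >> shift


def butterfly(a, b, w):
    """Radix-2 DIT butterfly on integer (re, im) pairs: returns (A + B*W) / 2, (A - B*W) / 2.

    The complex product B*W takes 4 real multiplications (each rounded back to Q1.17),
    one subtraction and one addition; the final halving keeps the results in 18 bits.
    """
    (ar, ai), (br, bi), (wr, wi) = a, b, w
    tr = round_shift(br * wr, FRAC) - round_shift(bi * wi, FRAC)  # real part of B*W
    ti = round_shift(br * wi, FRAC) + round_shift(bi * wr, FRAC)  # imaginary part of B*W
    return ((round_shift(ar + tr, 1), round_shift(ai + ti, 1)),
            (round_shift(ar - tr, 1), round_shift(ai - ti, 1)))
```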
(2) Simplified butterfly units. Examination of each butterfly stage shows that the 1st and 2nd stages can be simplified so that no multiplication is needed at all.
The 1st stage has only one twiddle factor, $W_N^0$, whose real part is 1 and imaginary part is 0. Substituting it into the basic butterfly gives:
$$mx_1 = x_1 + x_2, \qquad my_1 = y_1 + y_2$$
$$mx_2 = x_1 - x_2, \qquad my_2 = y_1 - y_2$$
where $x_1, x_2$ are the real parts of the input data, $y_1, y_2$ are their imaginary parts, $mx_1, mx_2$ are the real parts of the transformed data, and $my_1, my_2$ are their imaginary parts.
The 2nd stage has two twiddle factors, $W_N^0$ and $W_N^{64}$ (with $N = 256$); for $W_N^0$ the same simplification as in the 1st stage applies.
For $W_N^{64}$ the real part is 0 and the imaginary part is $-1$, i.e. $W_N^{64} = -j$, so the product of the second input with the twiddle factor is $(x_2 + j y_2)(-j) = y_2 - j x_2$. Substituting it into the basic butterfly gives:
$$mx_1 = x_1 + y_2, \qquad my_1 = y_1 - x_2, \qquad mx_2 = x_1 - y_2, \qquad my_2 = y_1 + x_2$$
In this way, 2 of the 8 butterfly stages need neither multipliers nor a twiddle factor ROM, saving 25% of the multiplier and ROM resources.
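Both simplified forms can be checked against the general butterfly by substituting $W = 1$ and $W = -j$. A small Python sketch doing exactly that with integer arithmetic and no scaling (function names are illustrative):

```python
def butterfly_general(x1, y1, x2, y2, wr, wi):
    """Reference butterfly (A + B*W, A - B*W) with A = x1 + j*y1, B = x2 + j*y2, W = wr + j*wi."""
    tr, ti = x2 * wr - y2 * wi, x2 * wi + y2 * wr
    return (x1 + tr, y1 + ti), (x1 - tr, y1 - ti)


def butterfly_w0(x1, y1, x2, y2):
    """W_N^0 = 1: adders and subtractors only (stage 1 and half of stage 2)."""
    return (x1 + x2, y1 + y2), (x1 - x2, y1 - y2)


def butterfly_minus_j(x1, y1, x2, y2):
    """W_N^(N/4) = -j: B*W = y2 - j*x2, so the multiplication reduces to a swap and a sign."""
    return (x1 + y2, y1 - x2), (x1 - y2, y1 + x2)
```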
2.5 Error analysis and control
For an FPGA, the hardware cost of floating-point arithmetic is too high. With the block-floating-point overflow-prevention scheme proposed in reference [3], after each butterfly stage the maximum value of that stage's results must be found to judge the overflow condition, which then determines how many bits each datum must be shifted in the next stage. In the fully parallel pipeline this means a considerable extra delay at every stage, which slows down the whole computation. Fixed-point arithmetic is affected by finite word-length effects, but overflow can be prevented by suitably shifting the data, and rounding (round half up) when bits are discarded effectively reduces the error. After an overall evaluation, the system uses fixed-point arithmetic. In fixed-point arithmetic the error arises mainly in the following two places:
(1) Multiplication truncation error. Multiplying two 18-bit numbers gives a 36-bit product, which must be rounded back to 18 bits and therefore introduces error. Because the 18-bit zero-IF data are in fact two's-complement fractions whose magnitude does not exceed 1, the multiplication cannot overflow. The redundant sign bit just below the most significant bit is removed and the lowest 17 bits are truncated. The error is largest when all the truncated bits are '1' and zero when they are all '0'. Rounding the truncated part, i.e. carrying 1 into the retained bits when the 20th bit (counting from the MSB, the highest truncated bit) is '1' and simply discarding it when it is '0', effectively reduces the error.
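In integer terms, this rounding amounts to adding 1 at the highest discarded bit before the shift. A Python sketch assuming Q1.17 operands; the saturation guard is an addition of this sketch, since $(-1) \times (-1) = +1$ is the one product that does not fit in 18 bits.

```python
FRAC = 17                               # assumed Q1.17 operands (18-bit fractions)
LO, HI = -(1 << FRAC), (1 << FRAC) - 1


def mul_round(a, b):
    """Multiply two 18-bit Q1.17 values and round the 36-bit product back to 18 bits.

    The exact product is Q2.34. Adding 1 at bit 16 (the highest discarded bit,
    the 20th bit counting from the MSB) and shifting right by 17 rounds half up;
    keeping the result in [LO, HI] drops the redundant sign bit.
    """
    r = (a * b + (1 << (FRAC - 1))) >> FRAC
    return max(LO, min(HI, r))          # guard: only (-1) x (-1) = +1.0 falls outside


def mul_trunc(a, b):
    """Same product with plain truncation (arithmetic shift, no rounding), for comparison."""
    return max(LO, min(HI, (a * b) >> FRAC))
```

Truncation always errs downward by up to 1 LSB, while rounding keeps the error within ±0.5 LSB and centered on zero.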
(2) Addition and subtraction overflow error. Adding or subtracting two 18-bit numbers gives a 19-bit result. Before entering the next stage, one bit must be dropped, and the same round-half-up operation is applied to the dropped bit. For the addition and subtraction of two fractions, shifting the result right by 1 bit prevents overflow completely.
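The halving step with round half up can be sketched the same way. Rounding up can push the single corner case $(2^{17}-1) - (-2^{17})$ to $+2^{17}$, so this sketch clamps the result; whether the hardware needs that clamp depends on its actual input range, which the paper does not spell out.

```python
WIDTH = 18
LO, HI = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1


def halve_round(v):
    """Drop 1 LSB of a 19-bit sum or difference with round half up, back to 18 bits."""
    return max(LO, min(HI, (v + 1) >> 1))  # clamp only matters for HI - LO


def add_scaled(a, b):
    return halve_round(a + b)


def sub_scaled(a, b):
    return halve_round(a - b)
```

Applied once per stage, the shift scales an $N = 2^M$ point result by $2^{-M}$, i.e. by 1/256 for 256 points.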
3 Timing simulation and performance analysis
The input signal used for the timing simulation is
$$x(n) = X \exp\left[\,j\,\frac{2 \times 127 \times n \times \pi}{256}\right]$$
In the formula, X is set, as required by the test, to the maximum value of an 18-bit signal and to 13, the minimum value needed to reach an 80 dB signal-to-noise ratio, respectively; n ranges over 0 to 255. The design is written in VHDL-93, and logic synthesis and timing analysis are carried out on the Quartus II 4.1 platform; the simulation results are saved in *.tbl format. The *.tbl files are read into MATLAB and compared with the results computed by MATLAB. Because each of the 8 stages shifts its result right by 1 bit, the actual result is 256 times smaller than the MATLAB result. The MATLAB result divided by 256 is compared with the Quartus II 4.1 result in Figure 4. In the figure, the upper left panel shows the original sequence, the upper right panel the result computed by MATLAB, and the lower right panel the actual result computed by the FPGA. The lower left panel overlays the two results: the MATLAB result is drawn as a solid line and the Quartus II 4.1 simulation result with a marker symbol. The two sets of results agree well, which confirms the correctness of the design. The simulation uses a 60 MHz system clock; once the system reaches steady state (after 38.34 µs), each 256-point FFT takes 4.26 µs. Resource usage on the EP1S25 device is 15% of the logic elements, 18% of the internal memory and 62.5% of the dedicated DSP blocks. Although the dedicated DSP blocks are heavily used, very few logic elements are used, so logic elements could be used to build additional 18 × 18 multipliers that, together with the dedicated DSP blocks, perform more multiplications in parallel.
This shows that the system also extends well: to compute an FFT with more points, only the corresponding number of butterfly stages needs to be added.
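The comparison with MATLAB can be reproduced in software with a bit-level style Python model: a radix-2 DIT FFT using Q1.17 twiddles, rounded products and a rounded right shift by 1 at each stage. After 8 stages its output should match the floating-point DFT divided by $2^8 = 256$ to within a few LSBs. This is a hypothetical model written for illustration (windowing omitted, test tone in bin 10 chosen arbitrarily), not the author's VHDL or MATLAB code.

```python
import cmath
import math

FRAC = 17                                   # assumed Q1.17 data and twiddles
LO, HI = -(1 << FRAC), (1 << FRAC) - 1


def sat(v):
    return max(LO, min(HI, v))


def mul_round(a, b):
    """Rounded Q1.17 product (round half up at the highest discarded bit)."""
    return sat((a * b + (1 << (FRAC - 1))) >> FRAC)


def halve_round(v):
    """Per-stage right shift by 1 with round half up."""
    return sat((v + 1) >> 1)


def fixed_fft(re, im):
    """Fixed-point radix-2 DIT FFT model; the output approximates DFT(x) / N."""
    n = len(re)
    bits = n.bit_length() - 1
    order = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]  # bit reversal
    xr = [re[i] for i in order]
    xi = [im[i] for i in order]
    span = 1
    while span < n:                         # log2(n) stages
        step = 2 * span
        for j in range(span):
            wr = sat(round(math.cos(2 * math.pi * j / step) * (1 << FRAC)))
            wi = sat(round(-math.sin(2 * math.pi * j / step) * (1 << FRAC)))
            for a in range(j, n, step):
                b = a + span
                tr = mul_round(xr[b], wr) - mul_round(xi[b], wi)
                ti = mul_round(xr[b], wi) + mul_round(xi[b], wr)
                xr[a], xr[b] = halve_round(xr[a] + tr), halve_round(xr[a] - tr)
                xi[a], xi[b] = halve_round(xi[a] + ti), halve_round(xi[a] - ti)
        span = step
    return xr, xi


if __name__ == "__main__":
    n, amp, k0 = 256, 131071, 10            # hypothetical full-scale test tone in bin 10
    re = [round(amp * math.cos(2 * math.pi * k0 * t / n)) for t in range(n)]
    im = [round(amp * math.sin(2 * math.pi * k0 * t / n)) for t in range(n)]
    fr, fi = fixed_fft(re, im)
    ref = [sum(complex(re[t], im[t]) * cmath.exp(-2j * math.pi * k * t / n)
               for t in range(n)) / n for k in range(n)]
    worst = max(max(abs(fr[k] - ref[k].real), abs(fi[k] - ref[k].imag)) for k in range(n))
    print("bin", k0, "=", fr[k0], fi[k0], "; worst deviation from DFT/256:", worst, "LSB")
```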

The results show that, thanks to the measures taken against error and overflow, the operation on the largest data does not overflow, and the final error is less than $10^{-9}$. When the operation is performed on the smallest data needed to achieve an 80 dB signal-to-noise ratio, the resolution is still very good.
4 Conclusion
This paper has discussed the design and implementation of an FFT module for a microwave relay station. The complete circuit design passed functional simulation, logic synthesis and timing analysis, was successfully downloaded into an FPGA, and has been put into practical use. Practical use shows that an FFT implemented on a Stratix series FPGA is fast, highly stable and easy to extend. It offers great advantages for the fast recognition of selective interference in microwave relay communication, especially in relay stations.