1 Principle of the parallel pipelined FIR structure
When a digital signal processing algorithm is implemented on an FPGA or an ASIC, computation speed and chip area constrain each other. In a practical FIR filter, obtaining a good filtering response can raise the filter order considerably, sometimes to several hundred. It is therefore necessary to trade performance against implementation complexity, that is, to choose among different filter implementation structures. This article uses a parallel pipelined structure to trade speed against hardware area.
Pipelining, that is, inserting registers into the critical path, is a powerful technique for raising system throughput without adding much redundant hardware. Pipelines fall into two main types: arithmetic pipelines and instruction pipelines. In an FPGA design the logic serves a specific application, so the arithmetic pipeline, which needs little extra control logic, is used. A pipelined structure splits a digital processing algorithm into several processing segments that follow one another in time, with registers inserted between adjacent segments as buffers. The segments and the buffers between them form the pipeline. The original operation of the system is divided into k parts, each handled by one of the k pipeline stages. As soon as the preceding task has passed the first stage, a new task can enter the pipeline. If the delay of the system without pipelining is D, then with pipelining a new task can start every D/k time units. Three conditions must hold for pipelining to improve performance:
① The operation can be divided evenly into k parts of equal delay;
② The input involves a large number of repeated operations;
③ Successive repeated operations do not depend on one another.
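The timing argument above can be checked with a small calculation. The sketch below is illustrative only: the function name `pipeline_timing` and the example numbers are assumptions, and it presumes an ideal pipeline whose k stages have exactly equal delay and whose registers add no overhead.

```python
from fractions import Fraction

def pipeline_timing(total_delay, stages, tasks):
    """Ideal k-stage pipeline: equal stage delays, no register overhead (assumption)."""
    stage_delay = Fraction(total_delay, stages)           # a new task may start every D/k
    unpipelined = total_delay * tasks                     # tasks run strictly one after another
    pipelined = total_delay + (tasks - 1) * stage_delay   # fill once, then one result per D/k
    return stage_delay, unpipelined, pipelined

# Hypothetical numbers: D = 12 time units, k = 3 stages, 100 independent tasks.
stage_delay, t_plain, t_pipe = pipeline_timing(total_delay=12, stages=3, tasks=100)
print(stage_delay, t_plain, t_pipe)
```

The latency of a single task stays at D; only the interval between task starts shrinks, which is why the three conditions listed above matter.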
A parallel structure replicates the same hardware so that a parallel algorithm that satisfies the conditions for concurrent operation can be mapped onto it. Parallel structures pose two main difficulties. First, they occupy more area. Second, when the parallel computing units exchange data with one another, extra control and interconnection structures are needed. Nevertheless, as chip process geometries keep shrinking, the parallel structure has become the first choice for designing high-speed, low-latency data processing systems. The complexity of the control and interconnection structure depends on the algorithm and on how the algorithm is partitioned. The FIR filter itself is well suited to parallel processing, but because multipliers are costly in both time and chip area, a fully parallel FIR filter is uneconomical.
The FIR filter is often the first choice, and sometimes the only choice, because of its simple design, good stability, ease of implementation, and linear phase. The difference equation of the FIR filter is:
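In the usual textbook notation (the symbols are assumed here: x(n) is the input, y(n) the output, h(k) the coefficients, and N the number of coefficients), the standard FIR difference equation reads:

```latex
y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)
```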

The direct-form structure of the FIR filter is shown in Figure 1.
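As a behavioral reference for the direct form, the following minimal sketch models the tapped delay line in software. The function name `fir_direct` is hypothetical, and samples before the first input are assumed to be zero.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    delay_line = [0] * len(coeffs)            # delay_line[k] holds x[n-k]
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]    # shift the whole delay line by one sample
        outputs.append(sum(h * d for h, d in zip(coeffs, delay_line)))
    return outputs

# An impulse at the input reproduces the coefficients at the output.
print(fir_direct([1, 2, 3], [1, 0, 0, 0]))
```

Every new sample shifts the entire delay line, which is cheap in software but costly to do literally in hardware for a high-order filter.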

2 Implementation method
A field-programmable gate array (FPGA) offers a flexible architecture and logic-cell configuration, high integration, and a short design and development cycle, so an FPGA is chosen to verify and implement this filter structure. VHDL is a hardware description language used mainly to describe the structure, behavior, function, and interfaces of digital systems; combined with an FPGA, it provides a more powerful and flexible digital system design capability. Describing the function of a digital system in VHDL and implementing it on an FPGA is a practical and convenient way to combine software and hardware. The path from the hardware description to the FPGA configuration data file is completed by the synthesis tool and the place-and-route tools. Whether the function of a digital system is realized, and what performance it finally achieves, depends on the algorithm structure of the system, on the synthesis and place-and-route tools, and on the performance of the device. If the algorithm structure is poorly designed, more design iterations will be needed. The FIR filter structure proposed here allows processing time and chip area to be traded against each other, so its processing capability can be estimated at the initial design stage, which reduces design iterations.
For FIR filters, Xilinx provides two soft cores: one based on distributed arithmetic, the other based on a single-channel multiply-accumulate (MAC) operation. Neither structure is well suited to filters with a high order and a high sampling rate.
To raise the throughput of the FIR filter, it can be implemented with a combined parallel and pipelined structure, as shown in Figure 2. The pipeline raises the throughput, and the parallel structure reduces the processing delay. Adjusting the pipelining and the degree of parallelism tunes the filter performance to the requirements of the application. Here a three-stage pipelined, two-way parallel FIR filter is implemented. The three pipeline stages correspond to operand fetch, multiplication, and accumulation. The filter consists mainly of dual-port RAMs, multipliers, an accumulator, control logic, and the registers between pipeline stages, plus a data input module (not shown in the figure).

The FIR coefficients and the most recent N data samples are each stored in two RAMs. These RAMs must be dual-port, with one port for writing data and one for reading it. The data input module writes the data to be filtered alternately into the two dual-port data RAMs. The FIR coefficients are likewise split by even and odd index into two coefficient RAMs, which are preloaded at implementation time. In a real implementation it is not practical to shift the data as in Figure 1. The first pipeline stage, operand fetch, is therefore built from a crossbar interconnection network together with the control module, and its job is to deliver the correct operands to the pipeline. The second pipeline stage consists of two parallel multipliers that perform the multiplications. The third pipeline stage is an accumulator which, under the control logic, accumulates the multiplier outputs correctly.
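The data flow described above can be sketched as a behavioral model. This is a sketch under stated assumptions, not the article's VHDL: the function name `fir_parallel2`, the circular addressing of the data RAMs, and the choice to steer the data (rather than the coefficients) through the crossbar are all illustrative, and the model is not cycle-accurate.

```python
def fir_parallel2(coeffs, samples):
    """Behavioral model of a 2-way parallel FIR; assumes an even number of taps."""
    n_taps = len(coeffs)
    assert n_taps % 2 == 0, "pad the coefficients with a zero to get an even length"
    depth = n_taps // 2
    coef_ram = [coeffs[0::2], coeffs[1::2]]     # even-index and odd-index coefficient banks
    data_ram = [[0] * depth, [0] * depth]       # two circular data buffers (assumption)
    outputs = []
    for n, x in enumerate(samples):
        data_ram[n % 2][(n // 2) % depth] = x   # samples go alternately to the two RAMs
        swap = n % 2                            # crossbar control: which RAM holds the newest sample
        acc = 0
        for j in range(depth):                  # N/2 fast-clock steps per output
            # Stage 1 (fetch): the two read addresses are equal or differ by 1
            addr_even = ((n - 2 * j) // 2) % depth
            addr_odd = ((n - 2 * j - 1) // 2) % depth
            x_even = data_ram[swap][addr_even]      # x[n-2j], pairs with coeffs[2j]
            x_odd = data_ram[1 - swap][addr_odd]    # x[n-2j-1], pairs with coeffs[2j+1]
            # Stage 2 (multiply): two multipliers work side by side
            p0 = coef_ram[0][j] * x_even
            p1 = coef_ram[1][j] * x_odd
            # Stage 3 (accumulate)
            acc += p0 + p1
        outputs.append(acc)
    return outputs

# Five coefficients padded with one zero to make the tap count even.
print(fir_parallel2([1, 2, 3, 4, 5, 0], [-1, -2, 3, 4]))
```

No sample is ever shifted: only the read addresses and the crossbar setting change from one output to the next, which is the point of replacing the shift register of Figure 1 with RAMs.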
Once the structure is designed, the timing must be designed. The clock of the data input module is set by the rate at which the data source produces data. The pipeline clock rate must exceed the data clock rate by a factor of N/2, where N is the filter order and 2 is the degree of parallelism. In other words, the pipeline must complete the computation of one FIR output within one data clock period. The control logic is the key to the correct operation of the pipeline. It generates all the timing signals the data pipeline needs, including the data read addresses, the coefficient read addresses, the crossbar control, and the pipeline output enable. Its VHDL port description is as follows:

The coefficient address is generated by a counter whose period is the filter order divided by the degree of parallelism; an edge on bit 0 of first_data_address triggers the counter to restart from 0. The data RAM address is obtained by adding the counter value. Because the addresses of the two RAMs are taken relative to where the current input sample is stored, they may be equal or differ by 1. The crossbar control signal is the least significant bit of the counter. The accumulator output-enable signal is generated when the count has covered the filter order, and is passed to the accumulator after a delay. Generating the accumulator reset signal here as well is much more convenient than generating it by other means.
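The counter-driven addressing can be illustrated with a short sketch. Several details are assumptions rather than facts from the design: `first_data_address` is treated as the base address of the current input sample, data addresses wrap modulo the RAM depth, and the enable delay of 2 cycles (one each for the multiply and accumulate stages) is a guess. The crossbar control is left out, and the function name `control_sequence` is hypothetical.

```python
def control_sequence(first_data_address, n_taps, parallelism=2, enable_delay=2):
    """List the addresses issued for one filter output (illustrative model only)."""
    period = n_taps // parallelism                # counter period = order / parallelism
    steps = []
    for count in range(period):                   # the counter restarts from 0 for each output
        steps.append({
            "coef_addr": count,                   # coefficient address is the counter value
            "data_addr": (first_data_address + count) % period,  # base plus counter, wrapped (assumption)
            "terminal": count == period - 1,      # last count for this output
        })
    enable_cycle = (period - 1) + enable_delay    # delayed accumulator output enable (assumed delay)
    return steps, enable_cycle

steps, enable_cycle = control_sequence(first_data_address=2, n_taps=8)
print([s["data_addr"] for s in steps], enable_cycle)
```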
The crossbar interconnection network is another key point of the design. In a parallel processing structure, data sharing and communication among the units are the main factors that limit the degree of parallelism. With a degree of parallelism of 2, it is enough to swap the coefficients alternately. For higher degrees of parallelism, however, the delay of this interconnection network becomes considerable, which is the main reason it is given a pipeline stage of its own.
Note that signed numbers are usually represented in two's complement. When a signed number is widened, its most significant (sign) bit must be replicated. The multiplier output generally has to be extended in this way so that the accumulator does not overflow.
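A minimal sketch of the two points above, sign extension and accumulator sizing, follows. The helper names are hypothetical, and the width formula is a sufficient worst-case bound, not necessarily the width used in the design.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend a two's-complement bit pattern by replicating its sign bit."""
    assert to_bits >= from_bits
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):        # sign bit set: fill the new high bits with 1s
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

def accumulator_width(data_bits, coef_bits, n_taps):
    """Sufficient width: full product plus ceil(log2(n_taps)) guard bits."""
    guard_bits = (n_taps - 1).bit_length()    # equals ceil(log2(n_taps)) for n_taps >= 1
    return data_bits + coef_bits + guard_bits

print(bin(sign_extend(0b1110, 4, 8)), accumulator_width(8, 8, 16))
```

Here the 4-bit pattern 1110 (that is, -2) becomes 11111110 in 8 bits, and 16 taps of 8-bit data and coefficients need at most 20 accumulator bits.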
For multiply-accumulate operations there is also the distributed arithmetic method, which decomposes each multiplication further into partial sums (the results of ANDing each bit of the binary coefficient with the input data). When one operand of the multiply-accumulate is a known constant, distributed arithmetic saves considerable resources. Because the coefficients are fixed, the results of the AND operations are known in advance, so zero bits and their AND results need not take part in the additions, which yields a multiplier-free filter. This method is not used here, for two reasons. First, distributed arithmetic makes the filter hard to reconfigure. Second, a multiplier synthesized onto the FPGA's hardware multipliers performs better by comparison.
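For comparison, the distributed arithmetic idea can be sketched in its common lookup-table form, which may differ in detail from the Xilinx core. The function names are hypothetical, and the samples are assumed to fit in `bits`-wide two's complement.

```python
def build_lut(coeffs):
    """LUT entry a holds the sum of the coefficients whose address bit in a is 1."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]

def da_dot(coeffs, window, bits):
    """Multiplier-free dot product; window[k] is x[n-k] in two's complement."""
    lut = build_lut(coeffs)                   # precomputed once, since coefficients are fixed
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(window):
            addr |= (((x & mask) >> b) & 1) << k   # gather bit b of every sample
        term = lut[addr] << b                      # shift and add instead of multiplying
        acc += -term if b == bits - 1 else term    # the sign bit carries negative weight
    return acc

print(da_dot([3, -2], [5, -7], bits=8))
```

The table grows as 2 to the power of the number of taps and must be rebuilt whenever a coefficient changes, which illustrates why this approach makes the filter hard to reconfigure.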
3 Simulation and test
After the complete circuit module was described in VHDL, it was tested with the input coefficients 1, 2, 3, 4, 5 and the data -1, -2, 3, 4, and so on. The simulation was run in ModelSim; the result is shown in Figure 3.
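For reference, the first four outputs expected from these test vectors can be worked out by hand. The values below assume that the coefficients are h(0) to h(4) = 1, 2, 3, 4, 5 in that order, that the data are x(0) to x(3) = -1, -2, 3, 4, and that all earlier samples are zero; they are computed from the FIR definition, not read from Figure 3.

```latex
\begin{aligned}
y(0) &= 1\cdot(-1) = -1 \\
y(1) &= 1\cdot(-2) + 2\cdot(-1) = -4 \\
y(2) &= 1\cdot 3 + 2\cdot(-2) + 3\cdot(-1) = -4 \\
y(3) &= 1\cdot 4 + 2\cdot 3 + 3\cdot(-2) + 4\cdot(-1) = 0
\end{aligned}
```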
As can be seen, the module computes correctly, with a delay of about 2 data clocks from data input to data output. This delay comes mainly from the data input module at the front. The multiply-accumulate part runs at N/2 times the data clock; its delay is proportional to the filter order but never exceeds one data clock period.

The VHDL description was then synthesized and tested on a Xilinx Spartan-3. The test results are shown in Table 1: the first row is the filter designed with the parallel pipelined structure, and the second row is the filter designed with the soft core provided by Xilinx.

As can be seen, besides one additional multiplier, the numbers of logic blocks and flip-flops more than doubled. The filter designed with this structure doubles the area, and its speed doubles as well. Both filters can be used in speech signal processing, where a low-pass filter extracts the low-frequency components of the speech signal. By comparison, the parallel pipelined structure can implement a higher-order filter than the Xilinx soft core. To the ear, speech signals passed through the two filters (of the same order) differ little.
4 Conclusion
This article implements a direct-form FIR filter at the operation level using pipelined and parallel structures. If the cascade and direct-form structures are combined when designing the filter, the same parallel and pipelining effects can be achieved. In fact, parallel and pipelined structures can also be applied at a lower level, to the partial products of the multiplication. The choice among these structures depends on the performance requirements and on the implementation complexity.
As methods for building modern high-performance processors, parallelism and pipelining each have their own characteristics. Parallelism trades area for speed; pipelining trades latency for speed. Using both structures allows flexible trade-offs among area, speed, and latency.