Abstract: This article describes the composition and design concept of a measuring instrument built around the TMS320F2812 DSP chip, explains how its fast fixed-point algorithms are implemented, and compares the results of the fixed-point and floating-point algorithms.
Keywords: fixed-point chip; floating-point computation; fast algorithm; system configuration; TMS320F2812
1 TMS320F2812 overview
The TMS320F2812 is a high-performance, multi-purpose, cost-effective 32-bit fixed-point DSP chip from TI intended for control applications. The chip is compatible with the TMS320LF2407 instruction set, runs at clock frequencies of up to 150 MHz, and provides 18K×16 bits of zero-wait-state on-chip SRAM and 128K×16 bits of on-chip flash (access time 36 ns). Its on-chip peripherals include a 2×8-channel 12-bit ADC (fastest conversion time 80 ns), two SCI modules, one SPI, one McBSP, and one eCAN, together with two event manager modules (EVA and EVB), each of which contains 6 PWM/CMP outputs, 2 QEP inputs, 3 CAP inputs, and 2 16-bit timers (or TxPWM/TxCMP). The device also has 3 independent 32-bit CPU timers and up to 56 individually programmable GPIO pins, and it can be expanded externally with up to 1M×16 bits of program and data memory. The TMS320F2812 uses a Harvard bus architecture, has a password protection mechanism, and can perform dual 16×16 and single 32×32 multiply-accumulate operations, so it serves both control and fast computation.
With a sound system configuration and careful programming, the fixed-point TMS320F2812 can compute quickly; this article concentrates on how to achieve that.
2 Basic system configuration of the TMS320F2812
2.1 TMS320F2812 clocks
The on-chip peripherals of the TMS320F2812 fall into the following 4 groups according to their input clock:
(1) SYSCLKOUT group: the CPU timers and the eCAN bus; the frequency can be changed dynamically through the PLLCR register;
(2) OSCCLK group: mainly the watchdog circuit; the prescale factor is set through the WDCR register;
(3) Low-speed group: SCI, SPI, and McBSP; the prescale factor is set through the LOSPCP register;
(4) High-speed group: EVA/B and the ADC; the prescale factor is set through the HISPCP register.
To let the system run as fast as possible, all peripherals are clocked at 150 MHz except the few, such as the timers and the SCI, that need a low-speed clock.
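The relation between the clock registers and the resulting frequencies can be sketched in C. This is only an illustration under stated assumptions: the PLL rule (a divider field of 0 bypasses the PLL, values 1 to 10 give OSCCLK × DIV / 2) and the prescaler rule (a field of 0 passes SYSCLKOUT through, a field of n divides by 2n) are taken from the general behavior of the F281x clock module, the 30 MHz oscillator is an assumed value, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPU clock (SYSCLKOUT) in Hz for a given PLLCR divider field.
 * Assumed rule: div = 0 bypasses the PLL (OSCCLK / 2),
 * div = 1..10 gives OSCCLK * div / 2. */
static inline uint32_t sysclkout_hz(uint32_t oscclk_hz, unsigned pllcr_div)
{
    assert(pllcr_div <= 10u);
    if (pllcr_div == 0u) {
        return oscclk_hz / 2u;
    }
    return (uint32_t)(((uint64_t)oscclk_hz * pllcr_div) / 2u);
}

/* Peripheral clock in Hz behind HISPCP or LOSPCP.
 * Assumed rule: field 0 passes SYSCLKOUT through unchanged,
 * field n (1..7) divides it by 2 * n. */
static inline uint32_t prescaled_clock_hz(uint32_t sysclkout, unsigned field)
{
    assert(field <= 7u);
    return (field == 0u) ? sysclkout : sysclkout / (2u * field);
}
```

With the assumed 30 MHz oscillator, `sysclkout_hz(30000000u, 10u)` gives 150 MHz, and a prescaler field of 0 keeps a peripheral group at that same frequency, while a field of 2 would give 37.5 MHz for a low-speed peripheral.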
Figure 1
2.2 Memory space
Figure 1 shows the internal memory map of the TMS320F2812. The TMS320F2812 is a Harvard-architecture DSP, which means it can fetch an instruction, read data, and write data within the same clock cycle. Logically there are 4M×16 bits of program space and 4M×16 bits of data space, but physically the two are unified into a single 4M×16-bit memory space. The bus priorities, from highest to lowest, are: data write, program write, data read, program read. In this design an external CY7C1041 SRAM adds 256K×16 bits of memory in Zone 6 (0x100000 to 0x13FFFF) with an access time of at least 12 ns; the 128K×16-bit flash (0x3D8000 to 0x3F7FFF) has a fetch time of at least 36 ns. To make the device run as fast as possible, the flash registers are programmed for high-speed operation and, in addition, time-critical routines (for example the delay calculation subroutine and the FIR filter subroutine) and variables (for example the FIR filter coefficients and the weight vector of the adaptive algorithm) are moved from their storage locations into the H0, L0, L1, M0, and M1 blocks and run from there.
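As a rough aid for deciding where a routine or variable lives, the address ranges can be put into a small lookup table. The Zone 6 and flash ranges come from the text above; the M0, M1, L0, L1, and H0 ranges are assumptions taken from the usual F2812 memory map and should be checked against Figure 1. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    REGION_M0, REGION_M1, REGION_L0, REGION_L1,
    REGION_ZONE6, REGION_FLASH, REGION_H0,
    REGION_OTHER                 /* any address not listed in the table */
} region_id_t;

typedef struct {
    uint32_t first;              /* first 16-bit word address */
    uint32_t last;               /* last 16-bit word address, inclusive */
} mem_range_t;

static const mem_range_t k_ranges[REGION_OTHER] = {
    [REGION_M0]    = { 0x000000u, 0x0003FFu },  /* assumed: 1K words  */
    [REGION_M1]    = { 0x000400u, 0x0007FFu },  /* assumed: 1K words  */
    [REGION_L0]    = { 0x008000u, 0x008FFFu },  /* assumed: 4K words  */
    [REGION_L1]    = { 0x009000u, 0x009FFFu },  /* assumed: 4K words  */
    [REGION_ZONE6] = { 0x100000u, 0x13FFFFu },  /* external SRAM      */
    [REGION_FLASH] = { 0x3D8000u, 0x3F7FFFu },  /* on-chip flash      */
    [REGION_H0]    = { 0x3F8000u, 0x3F9FFFu },  /* assumed: 8K words  */
};

/* Return the region that contains a word address. */
static inline region_id_t region_of(uint32_t addr)
{
    int i;
    for (i = 0; i < (int)REGION_OTHER; i++) {
        if (addr >= k_ranges[i].first && addr <= k_ranges[i].last) {
            return (region_id_t)i;
        }
    }
    return REGION_OTHER;
}

/* Size of a region in 16-bit words (0 for REGION_OTHER). */
static inline uint32_t region_words(region_id_t id)
{
    return (id < REGION_OTHER) ? k_ranges[id].last - k_ranges[id].first + 1u : 0u;
}
```

Under these assumed ranges the five internal RAM blocks add up to 18K words, which agrees with the 18K×16 bits of on-chip SRAM quoted in the overview.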
2.3 Interrupts
The TMS320F28x series DSPs have a rich set of on-chip peripherals, and each peripheral can raise one or more interrupt requests. Interrupts are organized in two levels: the PIE level and the CPU level. The CPU supports 32 interrupt sources: RESET, NMI, EMUINT, ILLEGAL, 12 user-defined software interrupts (USER1 to USER12), and 16 maskable interrupts (INT1 to INT14, RTOSINT, and DLOGINT). All software interrupts are non-maskable. Because the CPU does not have enough interrupt sources to handle every on-chip peripheral request, the TMS320F28x series DSPs include a peripheral interrupt expansion (PIE) controller that manages the interrupt requests raised by the on-chip peripherals and the external pins.
There are 96 PIE interrupts in total, divided into 12 groups of 8 peripheral interrupt requests each. The 96 request signals are written INTx.y (x = 1, 2, …, 12; y = 1, 2, …, 8). Each group sends a single interrupt request to the CPU, that is, PIE output INTx (x = 1, 2, …, 12) drives CPU interrupt input INT1 to INT12. Of the 96 possible PIE interrupt sources in the TMS320F28x series, the TMS320F2812 uses 45; the rest are reserved for later devices.
Programming the ADC, the timers, the SCI, and similar peripherals in interrupt mode improves CPU utilization.
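The INTx.y numbering described above maps naturally onto a flat index from 0 to 95. The sketch below shows that mapping; the vector address calculation additionally assumes that the INT1.1 vector sits at 0x000D40 and that each vector occupies two 16-bit words, which is how the F281x PIE vector table is usually documented. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PIE_GROUPS          12u
#define PIE_INTS_PER_GROUP   8u
#define PIE_FIRST_VECTOR 0x000D40u  /* assumed address of the INT1.1 vector */

/* Flat index (0..95) of PIE interrupt INTx.y. */
static inline unsigned pie_index(unsigned x, unsigned y)
{
    assert(x >= 1u && x <= PIE_GROUPS);
    assert(y >= 1u && y <= PIE_INTS_PER_GROUP);
    return (x - 1u) * PIE_INTS_PER_GROUP + (y - 1u);
}

/* Word address of the 32-bit vector for INTx.y (two words per vector). */
static inline uint32_t pie_vector_address(unsigned x, unsigned y)
{
    return PIE_FIRST_VECTOR + 2u * pie_index(x, y);
}
```

For example, if the ADC interrupt is INT1.6 (an assumption based on the standard peripheral assignment), `pie_vector_address(1, 6)` evaluates to 0x000D4A, and the CPU sees the request on INT1.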
2.4 Reset and boot
Figure 2 shows the memory map of the on-chip boot ROM of the TMS320F2812. The boot program occupies 0x3FFC00 to 0x3FFFBF in Figure 2. According to Figure 1, with VMAP = 1, MP/MC = 0, and ENPIE = 0, the reset vector is fetched from on-chip address 0x3FFFC0, whose content is 0x3FFC00, that is, it points to the boot program in Figure 2. With GPIOF4 (SCITXDA) = 1 as listed in Table 2, the boot program then jumps to address 0x3F7FF6 in flash. A branch instruction placed at 0x3F7FF6 points to the entry of the user program, which then starts to run. Because PIE interrupts are used in the actual application, the user program should first initialize the PIE vector table and then enable the PIE.
3 Software design
The software is the key link in making the system work correctly and compute quickly. Given a sound system configuration, the key to fast computation on a fixed-point chip is to replace floating-point numbers with integers. To get the most efficient code out of the C compiler, the following principles should be observed:
(1) Convert divisions into multiplications and steer the compiler toward MAC instructions wherever possible, so that the DSP's hardware multiplier is fully used; declare MAC operands as local variables so that they are allocated to registers (or to the accumulator).
(2) Use static inline functions wherever possible to save function-call overhead.
(3) Use a constant, or a const-qualified variable, as the upper bound of a for loop so that the compiler can generate the repeat instruction RPT.
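The three principles can be illustrated with a short C sketch. It is not taken from the original program: the divide-by-3 constant, the tap count, and the function names are chosen only for illustration, and whether a particular compiler actually emits MAC and RPT instructions for this code depends on its version and optimization settings.

```c
#include <assert.h>
#include <stdint.h>

#define N_TAPS        8      /* constant loop bound (principle 3)  */
#define ONE_THIRD_Q15 10923  /* round(2^15 / 3), i.e. 1/3 in Q15   */

/* Principle 1: replace x / 3 by a multiplication with the Q15
 * reciprocal followed by a shift. Valid for non-negative x. */
static inline int32_t div3_q15(int16_t x)
{
    assert(x >= 0);
    return ((int32_t)x * ONE_THIRD_Q15) >> 15;
}

/* Principles 1 to 3 together: a static inline multiply-accumulate
 * loop over local operands with a compile-time constant bound. */
static inline int32_t dot_product(const int16_t *a, const int16_t *b)
{
    int32_t acc = 0;          /* local accumulator */
    int i;
    for (i = 0; i < N_TAPS; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Because 10923/32768 exceeds 1/3 by only 1/98304, `div3_q15` returns the same value as integer division by 3 for every non-negative 16-bit input; for other divisors the rounding error has to be checked case by case.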
3.1 ADC programming
The TMS320F2812 has a 12-bit ADC with two 8-to-1 input multiplexers and two sample-and-hold circuits. The analog input range is 0 to 3 V and the fastest conversion time is 80 ns. This design uses a sampling rate of 10 kSPS, with conversions triggered automatically by an EVA timer (period 0.1 ms). Four channels can be sampled simultaneously, and an interrupt raised at the end of each conversion is used to record the result (shifted right by 4 bits).
$$\text{Conversion result} = (2^{12} - 1) \times \frac{V_{\text{in}} - \text{ADCLO}}{3}$$
where $V_{\text{in}}$ is the input analog voltage in volts.
For an ADC conversion, the DSP system is initialized first and the PIE interrupt vector table is set up. The ADC module is then initialized, the entry address of the ADC interrupt service routine is loaded into the vector table, and interrupts are enabled. Next, the 0.1 ms timer is started and the program waits for the ADC interrupt. The interrupt service routine reads the conversion result and re-arms the next interrupt in software.
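The right shift by 4 bits and the conversion formula given above can be written out in integer arithmetic as follows. This is a minimal sketch working in millivolts; the function names are hypothetical, and no actual ADC register is accessed, so the raw register value is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define ADC_FULL_SCALE_CODE 4095   /* 2^12 - 1 */
#define ADC_SPAN_MV         3000   /* 0 to 3 V input range, in millivolts */

/* The 12-bit result is left-justified in a 16-bit result register,
 * so the sample is obtained by shifting right by 4 bits. */
static inline uint16_t adc_code_from_register(uint16_t result_reg)
{
    return (uint16_t)(result_reg >> 4);
}

/* Ideal conversion result for a given input, per the formula
 * (2^12 - 1) * (input - ADCLO) / 3, truncated to an integer. */
static inline uint16_t adc_ideal_code(int32_t input_mv, int32_t adclo_mv)
{
    int32_t diff = input_mv - adclo_mv;
    assert(diff >= 0 && diff <= ADC_SPAN_MV);
    return (uint16_t)((ADC_FULL_SCALE_CODE * diff) / ADC_SPAN_MV);
}

/* Inverse mapping: 12-bit code back to millivolts (truncated). */
static inline int32_t adc_code_to_mv(uint16_t code, int32_t adclo_mv)
{
    assert(code <= ADC_FULL_SCALE_CODE);
    return adclo_mv + ((int32_t)code * ADC_SPAN_MV) / ADC_FULL_SCALE_CODE;
}
```

In an interrupt service routine the value read from the result register would be passed through `adc_code_from_register` before being stored; the other two helpers are mainly useful for checking measured values against expected input voltages.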
3.2 FIR filter programming
The target signal is sensitive to certain low-frequency interference, which directly affects the localization result and the validity of the data. So that filtering does not disturb the subsequent time-delay computation, a linear-phase FIR filter is used. The filter coefficients h(i) are generated in MATLAB, converted to integers in turn, and then hard-coded into the program. The purpose of doing this, rather than computing the coefficients at run time, is to filter quickly without noticeably increasing the localization computation time of the whole measurement system.
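A fixed-point FIR filter with integer coefficients might look like the sketch below. The Q15 scaling of the coefficients is an assumption, since the article does not state which scaling was used when the MATLAB coefficients were converted to integers, and the function names are hypothetical. `to_q15` is meant to be run offline when preparing the coefficient table, not on the DSP.

```c
#include <assert.h>
#include <stdint.h>

/* Offline helper: quantize a real coefficient in [-1, 1) to Q15
 * with rounding and saturation. */
static inline int16_t to_q15(double c)
{
    double scaled = c * 32768.0;
    int32_t r = (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
    if (r > 32767) {
        r = 32767;
    }
    if (r < -32768) {
        r = -32768;
    }
    return (int16_t)r;
}

/* One FIR output sample: y = sum over i of h[i] * x[i], where x[i] is
 * the input delayed by i samples. The products are accumulated in
 * 32 bits and scaled back from Q15 at the end. Assumes the sum of
 * |h[i]| does not exceed 1.0, so the accumulator cannot overflow, and
 * that >> on a negative value is an arithmetic shift, as it is on
 * common compilers. */
static inline int16_t fir_q15(const int16_t *h, const int16_t *x, int taps)
{
    int32_t acc = 0;
    int i;
    assert(taps > 0);
    for (i = 0; i < taps; i++) {
        acc += (int32_t)h[i] * x[i];
    }
    return (int16_t)(acc >> 15);
}
```

For a linear-phase filter the coefficients are symmetric, h(i) = h(N-1-i), so only half of the table needs to be stored if memory is tight; the sketch keeps the full table for clarity.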
3.3 Porting the localization algorithm
Because the localization algorithm uses adaptive time-delay estimation, its computational load is very large and it places high demands on the DSP chip. The TMS320F2812 has a 32-bit hardware multiplier and accumulator, its RPT instruction is well suited to loop computations, and its throughput can reach 150 MIPS, so its performance is high. It is, however, a fixed-point processor, and a fixed-point algorithm is needed to cope with the heavy processing load. The input data and the weight vector therefore use 16-bit integer variables (Q = 12, determined by the ADC conversion precision), while the intermediate results produced inside the loops use 32-bit integer variables (Q = 20, which gives as much precision as possible without overflowing). Trigonometric functions and similar operations can be evaluated quickly by table lookup, using the tables shown in Figure 2.
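The Q12 and Q20 formats mentioned above can be handled with a few helper functions. The sketch below is an illustration, not the original code: the type and function names are hypothetical. Multiplying two Q12 values yields a Q24 product, which is shifted right by 4 bits to obtain Q20 before accumulation.

```c
#include <stdint.h>

typedef int16_t q12_t;   /* 16-bit value, 12 fractional bits, range about -8 to +8       */
typedef int32_t q20_t;   /* 32-bit value, 20 fractional bits, range about -2048 to +2048 */

/* Q12 * Q12 = Q24; drop 4 fractional bits to get Q20.
 * Assumes >> on a negative value is an arithmetic shift, which is
 * implementation-defined in C but true of common compilers. */
static inline q20_t q12_mul_to_q20(q12_t a, q12_t b)
{
    return ((int32_t)a * b) >> 4;
}

/* Convert an intermediate Q20 result back to Q12 (no saturation). */
static inline q12_t q20_to_q12(q20_t v)
{
    return (q12_t)(v >> 8);
}

/* Dot product of a Q12 weight vector and Q12 data, accumulated in Q20. */
static inline q20_t q12_dot_q20(const q12_t *w, const q12_t *x, int n)
{
    q20_t acc = 0;
    int i;
    for (i = 0; i < n; i++) {
        acc += q12_mul_to_q20(w[i], x[i]);
    }
    return acc;
}
```

As a worked case, 1.5 in Q12 is 6144 and 2.0 is 8192; their product is 50331648 in Q24, which becomes 3145728 in Q20, that is, 3.0 × 2^20. Converting back to Q12 gives 12288, or 3.0 × 2^12.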
The C compiler comes with a floating-point library, so the results of the floating-point and fixed-point algorithms can be compared directly. For 4 groups of 1024 samples each, the floating-point implementation takes about 3.6 seconds, whereas the fixed-point implementation takes only about 1.3 seconds, roughly 2.8 times faster.
The algorithm can also be optimized further. First, frequently used intermediate variables are placed in zero-wait-state memory. Second, flash acceleration is used (setting the ENPIPE bit of the FOPT register enables the prefetch mechanism, that is, the flash pipeline mode); this gives a throughput of 100 to 120 MIPS, far above what the raw 36 ns access time would allow. Note that, because of the protection mechanism of the TMS320F2812, the code that writes the flash registers must be moved to L0 or L1 and executed from there. Even with flash acceleration, it is still worthwhile to move the algorithms with the strictest timing requirements into the H0 memory, where they can reach the maximum processing speed of 150 MIPS; the function memcpy() can be used to move the code.
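The memcpy()-based relocation can be sketched as follows. On the target, the three addresses would normally be symbols defined by the linker command file for a section that is loaded in flash and run from RAM; those symbols are not shown in this article, so the sketch takes them as parameters, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a code or data section from its load address (flash) to its
 * run address (RAM, for example H0) and return the number of units
 * copied. load_end points one past the last unit of the section. */
static inline size_t relocate_section(void *run_start,
                                      const void *load_start,
                                      const void *load_end)
{
    size_t n = (size_t)((const char *)load_end - (const char *)load_start);
    assert(run_start != NULL && load_start != NULL);
    memcpy(run_start, load_start, n);
    return n;
}
```

The call has to be made early in startup, before any relocated function is invoked. On the C28x a `char` is 16 bits wide, so the count handled by memcpy() there is a number of 16-bit words rather than 8-bit bytes.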
4 Conclusion
When the computational load is large, a floating-point DSP chip is usually chosen. In practice, however, a fixed-point DSP chip can also reach a high computation speed if its on-chip resources are fully exploited with the methods introduced in this article. This saves hardware design cost and development time and reduces power consumption.