Introduction
Image compression technology is becoming increasingly important in modern life. As DSP processing speeds have risen, a single DSP has delivered good results for conventional image compression. However, with the growth in information content and the arrival of concepts such as high definition, systems need considerably more data processing capability. In particular, when real-time image compression coding is required, a single DSP is not up to the task, and even dedicated chips cannot meet the requirements. Over the past ten years DSP technology has developed rapidly: alongside important breakthroughs in DSP clock frequency, parallel processing and external communication technology have also improved greatly. The major DSP manufacturers now implement various forms of data-level and instruction-level parallelism in their devices, for example TI's TMS320C64xx series and ADI's TigerSHARC series. This article introduces a multi-DSP system built with ADI's ADSP-TS201S chip.
ADSP-TS201S parallel technology
The ADSP-TS201S is a chip introduced by ADI in 2004, with a 600 MHz clock rate and a 1.67 ns instruction cycle. Through two external interface technologies, the external port (External Port) and the link ports (Link Ports), the ADSP-TS201S provides powerful support for multiprocessing. This multiprocessing capability has the following characteristics:
· a single shared bus supports up to eight DSPs working together;
· distributed bus arbitration logic is provided, giving seamless connection between processors;
· the link ports provide high-speed point-to-point communication between processors.
The external port provides a unified address space that lets each processor directly access the internal memory and registers of every ADSP-TS201S chip. The DSP's distributed bus arbitration logic gives seamless multiprocessor connection and supports up to eight ADSP-TS201S chips plus one host processor working together. The arbitration logic also prevents any one processor from holding the external bus for too long.
The four link ports of the ADSP-TS201S are the other way of building a multiprocessor system. The link ports support inter-processor data transfer rates of up to 4 GB per second, with each link providing 1 GB per second; the four links together are stated to provide 4.87 GB per second of inter-processor communication bandwidth.
Shared-memory parallel DSP system
According to their structure, multiprocessor parallel systems can be divided into distributed parallel DSP systems and shared-memory parallel DSP systems; the ADSP-TS201S supports both structures. A typical shared-memory parallel DSP system structure is shown in Figure 1.
The shared-memory parallel DSP system has several advantages. First, because it uses a shared memory structure, it is economical with memory resources. Second, sharing the bus saves bus resources and raises the system's resource utilization. Finally, and most importantly, it uses master-slave cooperative operation, so the division of labor among the processors is clear, which makes the system easy to implement and debug.
However, in image coding and decoding, fully sharing the memory and the bus often leaves system resources scarce, and when large volumes of data must be processed in real time (for example, coding and decoding high-definition images and video), a fully shared-memory system is not up to the task. In addition, using a DSP as the master control unit makes later system upgrades and maintenance inconvenient. Finally, a pure shared-memory structure is clearly inferior to a distributed one for inter-DSP communication. The following sections introduce an FPGA-based improved shared-memory parallel DSP system, which makes better use of the advantages of the shared-memory approach while remedying the shortcomings above.
Implementation of the real-time image coding system
This system uses an improved shared-memory structure. Compared with an ordinary shared-memory parallel DSP system, it has the following characteristics:
· the DSPs are directly coupled to one another, which makes data exchange between DSPs easier;
· the concept of a DSP cluster is introduced, and memory is shared within each cluster, which relieves the memory resource bottleneck;
· an FPGA serves as the master control unit, which simplifies the hardware implementation and makes maintenance easy;
· the system is highly expandable and can be cascaded to meet higher requirements;
· an independent power supply is used, which reduces the influence of the power lines on the system's signal lines.
The system consists of two parts: the processing part (Processboard) and the control and preprocessing part (Mainboard).
Processing board structure
The Processboard consists of four ADSP-TS201S chips. The DSPs are interconnected by a combination of loose and tight coupling, forming a flexible and efficient multiprocessor parallel structure. Loose coupling means that the four DSPs are bidirectionally interconnected through their link ports. Tight coupling means that two DSPs form one cluster: the external buses of the DSPs are connected to a cluster bus, and the external memory is also attached to the cluster bus. The external memory and the internal memory of each DSP are shared resources that can be accessed by any DSP on the bus. This approach takes full advantage of the on-chip seamless connection capability of ADI's DSPs. The Processboard structure is shown in Figure 2.
Each of the four DSPs in this system devotes three of its link ports to a bidirectional cross connection with the other DSPs, and uses its remaining link port to connect to the Mainboard for data communication between the two parts of the system. This almost fully symmetric arrangement of the main chips is favorable for sensible PCB routing. All DSP data, address and control signals are connected to the Mainboard through a 150-pin connector, forming a complete system platform.
In this structure, the signal to be processed can be delivered to the DSPs through the Mainboard FPGA over the link ports, or input through the faster LVDS interface and a serial-to-parallel conversion chip. Because both cluster buses are connected to the FPGA, the input data can be switched between the data buses inside the FPGA, so that the two DSP clusters read and process the continuously arriving input in "ping-pong" fashion. Processed data is returned to the Mainboard over the link ports. Within one DSP cluster, which is tightly coupled, an 8M×32-bit SDRAM stores the block data. Using DMA, data can be transferred at high speed while the DSP core is performing signal processing, which improves real-time performance and relieves the bus bottleneck as far as possible. The DSPs and peripheral devices in a cluster are interconnected through a 32-bit address bus and mapped into a unified memory space, so accessing an external interface device is equivalent to accessing external memory space. The external bus runs at 100 MHz, and the bus throughput of a single DSP reaches 1 GB per second.
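The ping-pong scheme described above can be illustrated with a small timing model. This is only a sketch under stated assumptions, not the system's actual scheduling logic: the function name, the unit time values, and the rule that a cluster must be idle before it receives its next block are all hypothetical, chosen to show why alternating between two clusters shortens the total time compared with a single cluster.

```python
def ping_pong_finish_time(num_blocks, load_time, process_time, num_clusters=2):
    """Return the time at which the last block finishes processing.

    Assumptions (hypothetical, for illustration only):
    - one input stream, so only one block is loaded at a time;
    - block k is sent to cluster k % num_clusters;
    - a cluster accepts a new block only when it is idle.
    """
    cluster_free = [0] * num_clusters  # time at which each cluster becomes idle
    input_free = 0                     # time at which the input path becomes idle
    for block in range(num_blocks):
        cluster = block % num_clusters
        start_load = max(input_free, cluster_free[cluster])
        end_load = start_load + load_time
        input_free = end_load
        cluster_free[cluster] = end_load + process_time
    return max(cluster_free)


if __name__ == "__main__":
    # Four blocks, load = 1 time unit, processing = 2 time units.
    print("two clusters:", ping_pong_finish_time(4, 1, 2))
    print("one cluster: ", ping_pong_finish_time(4, 1, 2, num_clusters=1))
```

In this model the two-cluster arrangement overlaps the loading of one block with the processing of the previous one, which is the benefit the ping-pong switching in the FPGA is meant to provide.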
Control and preprocessing board structure
The Mainboard consists of two FPGAs and one ADSP-TS201S. The system uses a modular design and can be divided into three modules. The control module, built from the two FPGAs, performs the system's control functions. The post-processing module comprises the DSP and its peripheral circuits. The expansion module consists of eight 150-pin connectors and handles communication with the Processboards. To expand the storage space, the board carries four SRAM chips and four 16-bit SDRAM chips (divided equally into two groups, each expanded to a 32-bit width). The two FPGAs can also be used for part of the preprocessing (for example, the wavelet preprocessing in JPEG2000 image compression coding). The Mainboard structure is shown in Figure 3.
System power supply scheme
Because this system contains many major devices, most of which draw considerable power, designing a separate power supply on each board is not appropriate. At the same time, because data exchange rates in the system reach several hundred megabits per second, integrating the power supply onto the boards would interfere with the normal operation of the on-board circuits and could even cause various routing problems.
The system therefore uses an independent power supply: power for the whole system comes from a separately designed power system, similar to the power supply of a personal computer. The power system structure is shown in Figure 4.
The power system uses TI's PTH series power modules, which are stable, easy to use and able to deliver high power. The 5 V input is converted by five PTH modules into the required voltages (1.0 V, 1.5 V, 1.8 V, 2.5 V and 3.3 V), which are fed to the Mainboard and the Processboards through power connectors.
System workflow
When high-speed image data arrives at the large FPGA on the Mainboard, this FPGA groups and preprocesses the data; if the data volume exceeds the capacity of the FPGA's internal memory, the data is buffered in external memory. Once grouping is complete, the large FPGA sends the data to the Processboards. After parallel processing, the Processboards send their results to the small FPGA on the Mainboard (if the data was grouped appropriately, the results reach the small FPGA synchronously). The small FPGA reorganizes and merges the data and delivers it to the Mainboard DSP for post-processing, after which the result is sent out through the output port. This completes the processing of the input data.
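The reorganizing and merging step performed by the small FPGA can be pictured as reassembling results that may arrive out of order. The sketch below is illustrative only: tagging each group with a sequence number is an assumption made here, not something the article specifies, and `reassemble` is a hypothetical name.

```python
def reassemble(tagged_results, expected_count):
    """Merge processed groups back into their original order.

    tagged_results: iterable of (sequence_number, payload_bytes) pairs in
    arbitrary arrival order. The sequence-number tag is an assumption made
    for this sketch.
    """
    slots = [None] * expected_count
    seen = set()
    for seq, payload in tagged_results:
        if not 0 <= seq < expected_count:
            raise ValueError(f"sequence number {seq} out of range")
        if seq in seen:
            raise ValueError(f"duplicate group {seq}")
        seen.add(seq)
        slots[seq] = payload
    if len(seen) != expected_count:
        raise ValueError("some groups are missing")
    return b"".join(slots)


if __name__ == "__main__":
    # Groups arrive in the order 2, 0, 1 but are merged as 0, 1, 2.
    print(reassemble([(2, b"C"), (0, b"A"), (1, b"B")], 3))
```

If the groups were sized so that all Processboards finish together, the results arrive synchronously and the reordering becomes trivial, which matches the remark above about appropriate grouping.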
The main problem in system operation is parallel task allocation and scheduling. The quality of the task allocation and of the algorithm directly affects the performance of the parallel system and the efficiency with which it executes tasks, and therefore its real-time behavior. In a multi-DSP system, task allocation should give each processor an equal share of the subtasks, so that the idle time of every processing unit is reduced and high execution efficiency is obtained. In image coding, for example, dividing an image evenly so that every processor receives a sub-image of the same size effectively raises the coding efficiency and satisfies the real-time requirement. Given the complexity of this system, using the FPGA for task allocation and scheduling is feasible.
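The even division of an image among processors can be sketched as follows. This is a minimal illustration that assumes the image is cut into horizontal strips of rows; the article does not state how its sub-images are shaped, and `split_rows` is a hypothetical helper.

```python
def split_rows(height, num_processors):
    """Divide `height` image rows into contiguous half-open ranges.

    Strip sizes differ by at most one row, so every processor gets
    (almost) the same amount of work. Row-wise strips are an assumption
    made for this sketch.
    """
    if num_processors <= 0:
        raise ValueError("num_processors must be positive")
    base, extra = divmod(height, num_processors)
    ranges = []
    start = 0
    for i in range(num_processors):
        size = base + (1 if i < extra else 0)  # first `extra` strips get one more row
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # A 1280-row image shared among the four DSPs of one Processboard.
    for dsp, (first, last) in enumerate(split_rows(1280, 4)):
        print(f"DSP {dsp}: rows {first}..{last - 1}")
```

When the height divides evenly, as with 1280 rows and four DSPs, every processor receives exactly the same number of rows; otherwise the remainder is spread one row at a time over the first strips.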
Expansion of the real-time image coding system
Because control and processing are separated in the design, the system is highly expandable. As shown in Figure 5, the system supports expansion of both the Processboard and the Mainboard, in the following respects:
(1) Processboard expansion - The Mainboard provides four groups of Processboard expansion connectors, i.e. it can be expanded to at most four Processboards. The figure shows eight connectors, grouped in pairs; the purpose is to make it easier to debug the communication between two boards, and it also helps the heat dissipation of the whole system.
(2) Mainboard board-level expansion - Through the connectors reserved on the board, the Mainboard can realize 32 cascade expansions. When other Mainboards are added, the system works as follows: after cascading, each stage handles its own task. All the FPGAs on the first board are used for preprocessing and basic control, while the other cascaded boards can perform tasks similar to those of the Processboard or carry out multistage processing tasks. This mode of operation is suited to multistage processing.
Processboard expansion runs into a shortage of clock resources. Each FPGA provides only 16 global clock resources in total, and each additional Processboard requires 5 of them, so a single FPGA can support at most three Processboards. To solve this problem, a dual-FPGA design is used: the two FPGAs each manage the input and output clocks of their own Processboards. This design not only makes the most of the clock resources but also simplifies FPGA programming and system debugging.
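The clock budget behind this limit is simple integer arithmetic. The sketch below uses the figures given above (16 global clock resources per FPGA, 5 per Processboard) and assumes, for illustration, that every global clock is available for Processboards and that a board's clocks cannot be split across FPGAs; the names are hypothetical.

```python
GLOBAL_CLOCKS_PER_FPGA = 16   # figure from the text
CLOCKS_PER_PROCESSBOARD = 5   # figure from the text


def max_processboards(num_fpgas,
                      clocks_per_fpga=GLOBAL_CLOCKS_PER_FPGA,
                      clocks_per_board=CLOCKS_PER_PROCESSBOARD):
    """Upper bound on the number of Processboards the clock budget allows.

    Assumes all global clocks are free for Processboard use and that one
    board's clocks all come from the same FPGA.
    """
    return num_fpgas * (clocks_per_fpga // clocks_per_board)


if __name__ == "__main__":
    print("one FPGA: ", max_processboards(1))   # 16 // 5 boards
    print("two FPGAs:", max_processboards(2))   # twice that
```

With one FPGA the bound is three boards, one short of the four Processboard connector groups; with two FPGAs the bound rises to six, which covers all four.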
System performance
Six images with resolutions of 1600×1280, 1280×1024 and 1024×960 were selected and coded with the real-time image coding system using the JPEG2000 image compression standard. Table 1 compares the compression results of this system with the results of software compression using the KDU algorithm. The comparison shows that the PSNR of this system differs very little from that of KDU. Moreover, at a resolution of 1440×1280 the system reaches a compression speed of 45 frames per second, which fully meets the requirements of real-time compression of image sequences and high-definition video.
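For reference, PSNR is conventionally computed from the mean squared error between the original and the reconstructed image. The sketch below uses the standard definition for 8-bit samples; it is not the measurement code used for Table 1, and the flat-list image representation is an assumption made to keep the example self-contained.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of sample values (an assumption of
    this sketch). Returns math.inf when the images are identical.
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    decoded = [11, 19, 31, 39]   # every sample off by one
    print(f"PSNR = {psnr(original, decoded):.2f} dB")
```

A higher PSNR means the decoded image is closer to the original, so comparing two coders at the same compression ratio by PSNR, as the comparison with KDU does, indicates how close their reconstruction quality is.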

Conclusion
By studying and improving the shared-memory parallel DSP system, this article has designed an improved FPGA-based multiprocessor parallel system built on the high-performance ADSP-TS201S chip. Practice has shown that this system can perform real-time compression coding of high-definition images and video.