• A Video Communication System Based on the XScale Processor

    Abstract: This article describes how a video communication prototype system was built on an embedded development platform based on the XScale processor (PXA255). The system is divided into six functional modules: capture, display, encoding, decoding, transmission, and reception. The emphasis is on how the video encoding and decoding stages were optimized for an embedded system. The experimental results indicate that, on an embedded platform based on the XScale (PXA255) processor, the prototype basically satisfies the requirements of real-time video communication and is readily extensible.
    Keywords: XScale; H.263; video coding; optimization; PXA255

    Introduction
        With the arrival of the PC era and the rapid development of the Internet, people are no longer satisfied with video communication confined to the PC platform. There is strong social demand for mobile devices that can carry out video communication over a wireless network anytime and anywhere. Video is well known to be a stream-type service with a very large data volume, and real-time video communication requires efficient, high-ratio compression of the video images, which is computationally complex. If algorithms written for existing PCs are used directly, the limited battery energy and computing power of an embedded device can hardly meet the demands of real-time video communication. The algorithms therefore have to be improved and optimized according to the characteristics of the embedded device in order to reduce their computational complexity. The video communication prototype system based on the XScale processor is a first step toward meeting the requirements of mobile video communication. This article describes the implementation of the system, the optimization methods, and the experimental results.

    1 System configuration
        On the hardware side, the system uses Intel's Sitsang board (based on the XScale PXA255 processor) as the hardware platform, a USB camera with an OV511 interface chip as the image capture device, and Symbol's Spectrum24 WiFi CF Card as the wireless network transmission device. The system architecture is shown in Figure 1.

        On the software side, the operating system is an embedded Linux kernel, version Linux-2.4.19-rmk7, the graphical environment is MiniGUI 1.3.3, and the network transport uses 802.11b.

    2 System software design
    2.1 Functional module design
        The terminal must offer the following functions, selected according to the user's needs: ① display only the local image; ② display only the remote image; ③ display the local image and the remote image simultaneously. To allow the function to be chosen freely, the system software has a modular design; the software module diagram is shown in Figure 2.

        ① Image capture module. Calls the API functions of the Video4Linux module to capture the local real-time image data for the system in YUV format.
        ② Image display module. Written on top of the MiniGUI 1.3.3 graphics library; it uses the YUVOverlay technique in MiniGUI to display YUV image data directly.
        ③ Image encoding module. Uses the H.263 coding standard to compress the local image data.
        ④ Image decoding module. Uses the H.263 decoding standard to decode the remote image data. Together with the image encoding module, it forms the core of the system.
        ⑤ Wireless network service module. Uses the 802.11b protocol and introduces the RTP packetization mechanism; its transmission module and reception module are implemented on top of UDP.
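
        The source does not show how packets are built. As a minimal sketch, the fixed 12-byte RTP header defined in RFC 3550 could be written in front of each H.263 payload as below. The function name and the use of static payload type 34 for H.263 (RFC 3551) are assumptions for illustration, not details of the original system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTP_HEADER_LEN 12
#define RTP_PT_H263    34   /* static payload type for H.263 (RFC 3551) */

/* Hypothetical helper: write a fixed RTP header (no CSRC list, no
 * extension) into buf in network byte order; returns its length. */
size_t rtp_write_header(uint8_t *buf, int marker, uint16_t seq,
                        uint32_t timestamp, uint32_t ssrc)
{
    assert(buf != NULL);
    buf[0]  = 0x80;                                   /* V=2, P=0, X=0, CC=0 */
    buf[1]  = (uint8_t)((marker ? 0x80 : 0x00) | RTP_PT_H263);
    buf[2]  = (uint8_t)(seq >> 8);                    /* sequence number */
    buf[3]  = (uint8_t)(seq & 0xFF);
    buf[4]  = (uint8_t)(timestamp >> 24);             /* media timestamp */
    buf[5]  = (uint8_t)(timestamp >> 16);
    buf[6]  = (uint8_t)(timestamp >> 8);
    buf[7]  = (uint8_t)(timestamp);
    buf[8]  = (uint8_t)(ssrc >> 24);                  /* stream identifier */
    buf[9]  = (uint8_t)(ssrc >> 16);
    buf[10] = (uint8_t)(ssrc >> 8);
    buf[11] = (uint8_t)(ssrc);
    return RTP_HEADER_LEN;
}
```

        Because UDP neither orders nor acknowledges datagrams, the sequence number lets the receiver detect lost or reordered packets, and the marker bit is conventionally set on the last packet of a video frame.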

    2.2 Software design flow
        Local display, remote display, transmission, and reception have to run concurrently, so the system uses multi-threaded programming. The system creates six threads in total: capture, display, encoding, decoding, transmission, and reception, as shown in Figure 3. Sound and effective inter-thread communication and mutual exclusion mechanisms are the key to ensuring that the program runs smoothly and efficiently.

    3 System performance optimization
        The limited computing power and the power consumption constraints of embedded devices make real-time video communication on them challenging. The system therefore has to be optimized according to the characteristics of the embedded device: making full use of the computing resources, designing a more reasonable software structure, and using algorithms of lower computational complexity. The optimization strategies used in this system are described below.

    3.1 Software framework level optimization
        In a multi-threaded program, the threads time-share the CPU through the time slice mechanism. Without optimization, there is no guarantee that the thread holding the time slice is doing useful work, nor that a thread that needs the CPU obtains a time slice quickly.

        The six threads in this system clearly depend on one another. If the encoding thread has not finished, the transmission thread has no data to send; if the thread-switching time slice is 200 ms, the CPU then idles for the whole 200 ms slice of the transmission thread. For the system as a whole, without any optimization the CPU does useful work only about 30% of the time. The optimization strategy in this system is to call the usleep() function so that a thread with nothing to do releases the CPU as soon as possible. It is implemented as follows:
        while (1) {
            if (this thread's flag has been triggered) {
                ……
            }
            usleep(1000);   /* give up the CPU for 1000 microseconds */
        }

        By inserting usleep() calls at the appropriate places in the code, CPU utilization rises from 30% to over 96%, which greatly increases the effective use of computing resources and improves the performance of the whole system.
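
        The polling pattern can be written as a small reusable routine. The sketch below is an illustration only: the function name, the atomic flag, and the poll limit are assumptions, not part of the original system, in which each of the six threads would run such a loop around its own work.

```c
#define _DEFAULT_SOURCE      /* exposes usleep() on glibc */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: poll `flag` until another thread sets it,
 * sleeping poll_us microseconds between checks so that the scheduler
 * can hand the CPU to a thread that has work to do.
 * Returns the number of sleeps performed before the flag was seen,
 * or -1 if max_polls sleeps passed without the flag being set. */
int wait_for_flag(atomic_int *flag, unsigned poll_us, int max_polls)
{
    assert(flag != NULL);
    for (int sleeps = 0; ; sleeps++) {
        if (atomic_load(flag)) {
            atomic_store(flag, 0);   /* consume the trigger */
            return sleeps;
        }
        if (sleeps == max_polls)
            return -1;               /* caller decides how to handle a timeout */
        usleep(poll_us);             /* release the rest of the time slice */
    }
}
```

        A consumer such as the transmission thread would call wait_for_flag(&encoded_ready, 1000, limit) before reading the encoded buffer, where encoded_ready is a hypothetical flag set by the encoding thread. A condition variable would avoid polling altogether; the sleep-based loop is shown because it is the approach the text describes.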

    3.2 Algorithm level optimization
        The core of the system consists of the encoding module and the decoding module. The encoding module is far more complex than the decoding module and is the bottleneck of the whole system, so this article concentrates on the optimization strategy for the encoding module.

        The system uses tmn-1.7 as the main source for the encoding module. tmn-1.7 follows the standard H.263 codec specification and therefore takes no account of the computational characteristics of embedded devices. The parts that affect this system most are the discrete cosine transform (DCT) algorithm and the motion compensation algorithm (ME). Optimization methods for these two algorithms are proposed below.

    3.2.1 Discrete cosine transform algorithm
        The DCT transforms the image from the pixel domain to the frequency domain, after which most of the image energy is concentrated in the DC coefficient and the low-frequency AC coefficients. This makes it easier to remove spatially redundant information.

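        The principle of the DCT is as follows: a linear transform X = Hx maps the N-dimensional vector x to the transform coefficient vector X. In its standard orthonormal form, the transform kernel H is:
        $$H(k, n) = c_k \sqrt{\frac{2}{N}} \cos\frac{(2n + 1)k\pi}{2N}, \qquad c_0 = \frac{1}{\sqrt{2}},\ c_k = 1\ (k > 0), \qquad k, n = 0, 1, \ldots, N - 1 \qquad (1)$$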
        The elements H(k, n) of the transform kernel are irrational numbers. For the majority of embedded devices, which have no floating-point coprocessor, this is a major bottleneck for real-time video communication. The floating-point DCT algorithm is therefore rewritten as an integer DCT algorithm.
     
        The key to this scheme is producing a transform matrix that satisfies the orthogonality requirement of the transform kernel and contains only integer coefficients. The basic idea is to scale up the irrational numbers by a factor a and then round them to integers, namely:
        $$Q(k, n) = \mathrm{round}\bigl(a\,H(k, n)\bigr) \qquad (2)$$
        The integer DCT algorithm is described below, following the method used in this system.
        First, the data representation used by the algorithm in this system:
        int *dataptr: pointer to the memory that temporarily holds the DCT coefficients;
        int *blkptr: pointer to the memory that holds the original block data;
        int *coeffptr: pointer to the memory that holds the final DCT coefficients.

        Then the relevant coefficients and constants are scaled. When an 8×8 block is transformed, the row transform is performed first, followed by the column transform. The computation of one DCT coefficient is used as an example below.
        #define CONST_BITS 13
        #define PASS_BITS 2
        #define FIX_0_541196100 ((int) 4433) /* 0.541196100 << CONST_BITS */
       

        This completes the row transform. From formula (1), the result after the row transform is 2√2 times that of the original DCT. Likewise, the column transform enlarges the coefficients by another factor of 2√2, so after both passes the total enlargement is a factor of 8. At the end of the algorithm the result is scaled back in proportion:
        block[i] = (short int) (data[i]>>3);
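
        The scaling of a constant can be sketched as follows. The FIX and DESCALE macros and the function name are assumptions for illustration, modeled on common fixed-point DCT code rather than taken from the original system; only CONST_BITS and the value 4433 come from the text.

```c
#include <assert.h>

#define CONST_BITS 13

/* Scale a positive real constant by 2^CONST_BITS and round to nearest. */
#define FIX(x) ((int)((x) * (1 << CONST_BITS) + 0.5))

/* Divide by 2^n with rounding, using a shift instead of a division.
 * Right-shifting a negative value is implementation-defined in C,
 * so this sketch is only exercised with non-negative inputs. */
#define DESCALE(x, n) (((x) + (1 << ((n) - 1))) >> (n))

/* Integer-only approximation of v * 0.541196100. */
int mul_0_541196100(int v)
{
    assert(v >= 0 && v < (1 << 17));   /* keeps v * 4433 within a 32-bit int */
    return DESCALE(v * FIX(0.541196100), CONST_BITS);
}
```

        FIX(0.541196100) evaluates to 4433, the value of the constant defined in the text, and mul_0_541196100(1000) returns 541 without any floating-point operation at run time.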

        The coefficients are then written into the coeff matrix in zigzag scan order:
        *(coeff + zigzag[i][j]) = *(block + i*8 + j);
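
        The zigzag table itself is not listed in the text. As a sketch, the conventional 8×8 zigzag order can be generated by walking the anti-diagonals of the block in alternating directions. The function names below are illustrative, and the assumption is that zigzag[i][j] holds the scan position of coefficient (i, j), as the assignment above implies.

```c
#include <assert.h>
#include <stddef.h>

/* Fill zigzag[i][j] with the scan position of coefficient (i, j). */
void build_zigzag(int zigzag[8][8])
{
    int pos = 0;
    for (int s = 0; s <= 14; s++) {        /* s = i + j selects one anti-diagonal */
        int lo = s < 8 ? 0 : s - 7;        /* smallest valid row on this diagonal */
        int hi = s < 8 ? s : 7;            /* largest valid row on this diagonal */
        if (s % 2) {                       /* odd: walk from top-right to bottom-left */
            for (int i = lo; i <= hi; i++)
                zigzag[i][s - i] = pos++;
        } else {                           /* even: walk from bottom-left to top-right */
            for (int i = hi; i >= lo; i--)
                zigzag[i][s - i] = pos++;
        }
    }
}

/* Reorder a row-major 8x8 block into coeff[] in zigzag order. */
void zigzag_scan(const short block[64], int zigzag[8][8], int coeff[64])
{
    assert(block != NULL && coeff != NULL);
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            *(coeff + zigzag[i][j]) = *(block + i * 8 + j);
}
```

        The scan places low-frequency coefficients first, so the many high-frequency coefficients that quantize to zero end up in one long run at the end of coeff, which suits run-length coding.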

        In the integer DCT algorithm, scaling eliminates floating-point operations, and most multiplications and divisions are handled with shifts. This suits the arithmetic characteristics of the CPU and substantially increases the computation efficiency and the compression speed.

    3.2.2 Motion compensation algorithm
        The motion compensation algorithm removes temporal redundancy between adjacent images. Within it, the search for the best matching block accounts for most of the computation. The tmn-1.7 encoder uses a spiral full search, which is accurate but very slow. To suit real-time video communication and the low computing power of embedded devices, this system uses the two-dimensional logarithmic search method.

        The two-dimensional logarithmic search tracks the point of minimum MAD (mean absolute difference) with a fast search, as shown in Figure 4. Starting from the motion vector (0, 0), each search group consists of 5 points arranged in a cross. If the minimum-MAD point lies on an arm of the cross, the next search is centered on that point and the step is unchanged. If it lies at the center of the cross, the next search keeps the same center but the step is halved. If the new cross center lies on the edge of the search window, the step is also halved. This is repeated until the step is 1, at which point the minimum-MAD point is the best match.

        The two-dimensional logarithmic search is implemented in this system as follows:
        while (step >= 1) {
            for (each point in the current search cross) {
                sad = SAD_Macroblock(pointer to the current search block); /* obtain the SAD value */
                if (sad < Min_FRAME) {
                    /* smaller than the current minimum SAD: record the current information */
                }
            }
            if (the minimum MAD point is the current search center) {
                step = step / 2; /* update the search step */
            } else {
                /* re-center the cross on the minimum MAD point */
            }
        }

        This algorithm substantially reduces the number of points visited by the motion search, which greatly reduces the computation and increases the speed of inter-frame coding.
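
        The search rules described above can be made concrete with a small runnable sketch. It is not the original implementation: the matching cost is passed in as a callback (in the real encoder it would be the SAD of the macroblock at the candidate displacement), and the names log_search, cost_fn, and demo_cost are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback: matching cost of displacement (dx, dy). */
typedef int (*cost_fn)(int dx, int dy, void *ctx);

/* Stand-in cost for demonstration: city-block distance to a target
 * vector stored in ctx as int[2]. A real encoder would compute SAD. */
int demo_cost(int dx, int dy, void *ctx)
{
    const int *target = ctx;
    return abs(dx - target[0]) + abs(dy - target[1]);
}

/* Two-dimensional logarithmic search inside [-range, range]^2.
 * Stores the best vector found and returns its cost. */
int log_search(cost_fn cost, void *ctx, int range, int step,
               int *best_dx, int *best_dy)
{
    static const int arm[4][2] = { {-1, 0}, {1, 0}, {0, -1}, {0, 1} };
    int cx = 0, cy = 0;                    /* start at motion vector (0, 0) */
    int best;

    assert(cost != NULL && range >= 1 && step >= 1);
    best = cost(cx, cy, ctx);

    while (step >= 1) {
        int bx = cx, by = cy;
        for (int k = 0; k < 4; k++) {      /* the four arms of the cross */
            int x = cx + arm[k][0] * step;
            int y = cy + arm[k][1] * step;
            if (abs(x) > range || abs(y) > range)
                continue;                  /* outside the search window */
            int c = cost(x, y, ctx);
            if (c < best) { best = c; bx = x; by = y; }
        }
        if (bx == cx && by == cy) {
            step /= 2;                     /* minimum at the center: halve the step */
        } else {
            cx = bx; cy = by;              /* minimum on an arm: re-center, same step */
            if (abs(cx) == range || abs(cy) == range)
                step /= 2;                 /* new center on the window edge: halve */
        }
    }
    *best_dx = cx;
    *best_dy = cy;
    return best;
}
```

        With a target of (3, -2), a range of 7, and an initial step of 4, the search evaluates 21 candidate points before settling on (3, -2), whereas a full search of the same window would evaluate all 225. On real images the cost surface is not always this well behaved, so the method can stop at a local minimum, which is the accuracy traded for speed.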

    4 System performance analysis
        The tests and analysis below use a QCIF image sequence captured in real time. The test environment is Intel's Sitsang hardware platform, which has a PXA255 processor with a clock frequency of 400 MHz and 64 MB of SDRAM; the operating system is embedded Linux 2.4.19-rmk7.
     
        The main performance indicators of a real-time video communication system are frame rate, image compression ratio, and signal-to-noise ratio. The modules that mainly affect these three indicators are the capture, encoding, decoding, reception, and transmission modules. The performance of each module is analyzed below.

    4.1 Module performance analysis
    (1) Capture module
        The USB interface on the Sitsang board is USB 1.1. Table 1 shows that the capture data rate essentially reaches the 12 Mbps throughput ceiling of the USB 1.1 protocol, so the image capture rate is determined entirely by the image format and the picture size. To keep the images timely, the YUV420, 176×144 configuration is used.
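
        The USB 1.1 ceiling can be turned into a frame-rate bound with a short calculation. The sketch below assumes raw YUV420 frames (12 bits per pixel) and the nominal 12 Mbps bus rate with no protocol overhead; the function names are illustrative and not from the original system.

```c
#include <assert.h>

/* Bytes in one raw YUV420 frame: a full-resolution Y plane plus U and V
 * planes subsampled by 2 in each direction, i.e. 1.5 bytes per pixel. */
long yuv420_frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0);
    return (long)width * height * 3 / 2;
}

/* Upper bound on whole frames per second over a link of bits_per_sec,
 * ignoring all protocol overhead (an idealization). */
long max_fps(long bits_per_sec, int width, int height)
{
    return bits_per_sec / (yuv420_frame_bytes(width, height) * 8);
}
```

        For QCIF, yuv420_frame_bytes(176, 144) is 38016 bytes, so max_fps(12000000L, 176, 144) gives 39. A CIF frame (352×288) is four times larger, which cuts the bound to 9 and illustrates why the smaller picture size was chosen.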

    (2) Encoding module
        After rewriting and optimization, the encoding speed of the encoding module improved considerably and basically satisfies the speed requirement of real-time video transmission; the figures are listed in Table 2. At this compression speed, a satisfactory compression ratio and signal-to-noise ratio are still obtained, which safeguards the quality of real-time video communication, as listed in Table 3.

    (3) Decoding module
        After the original program was trimmed and rewritten, the decoding speed reaches 84 fps.

    (4) Transmission and reception modules
        The system uses a WiFi CF Card as the network transmission device with the 802.11b protocol, introduces the RTP packetization mechanism, and implements the transmission module and the reception module on top of UDP. The 802.11b bandwidth is 11 Mbps; at the compression ratios in Table 3 it can carry a frame rate of more than 1000 fps, which fully satisfies the requirement of real-time video transmission.
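
        As a rough check of this figure, assuming the nominal 11 Mbps rate with no protocol overhead (an idealization) and raw YUV420 QCIF frames of 12 bits per pixel, the per-frame bit budget at 1000 fps and the compression ratio it implies are:

```latex
\[
B_{\text{frame}} \le \frac{11 \times 10^{6}\ \text{bit/s}}{1000\ \text{frame/s}}
  = 11000\ \text{bit} = 1375\ \text{bytes},
\qquad
\frac{176 \times 144 \times 12\ \text{bit}}{11000\ \text{bit}}
  = \frac{304128}{11000} \approx 27.6
\]
```

        Under these assumptions the claim requires compression ratios of roughly 28:1 or higher. Real 802.11b throughput is below the nominal rate, which would raise that threshold.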

    4.2 System performance analysis
        After module optimization and system integration, the prototype on the Sitsang board reaches 8 frames/s when displaying the local image and the remote image simultaneously, which basically meets the requirements of real-time video. Because the system framework uses a mutually constraining mechanism among the capture, local display, encoding, and transmission threads, the images at the local end and the remote end are basically synchronized.

    Conclusion
        The video communication prototype system based on an embedded device that is designed in this article is only a prototype, but it fully implements real-time video communication and provides a feasible framework and approach for implementing real-time video communication on embedded devices. The system is readily extensible and offers a useful reference for implementing video conferencing and videophone systems on embedded devices.
