• Research on a DSP-Based Audio Conference Signal Synthesis Algorithm

      With the continuing development of digital signal processing (DSP) algorithms, chip processing power, and communication network architectures, modern communications have spread rapidly. Audio conferencing is an essential function of many communication systems. When many users take part in an audio conference, the simplest approach is a token-controlled mutual exclusion mode, in which only the participant who holds the floor may speak. In this mode each participant can hear only one voice signal at a time. Such a "half-duplex" mode is neither convenient nor practical for audio conferencing.

      A true teleconference must simulate several participants talking together in one conference room. The participating terminals, however, are physically in different places, each terminal has only one set of audio output equipment (power amplifier and loudspeaker), and the audio stream sent to each terminal can use only one channel. For every terminal to receive the voices of several participants at the same time, a multi-channel audio synthesis (mixing) scheme is required. Teleconference sites typically use microphones and loudspeakers, which easily gives rise to echo and howling. Conference signal processing algorithms therefore generally concentrate on this problem, usually by means of echo cancellation. For conference signals, however, echo cancellation alone is not the most complete or effective treatment [1]. This research shows that a synthesis technique combining voice activity (sound/silence) detection, normalization, and adaptive echo cancellation can simulate a conference realistically.

    1 Implementation scheme for conference signal synthesis

    1.1 Rationale and necessity of conference signal synthesis

      A typical video stream occupies a unique position in the space/time domain, so superimposing picture elements at the same time and position has no meaning. An audio stream is different: the human ear can perceive several audio streams played in the same space at the same time. This is what makes conference signal synthesis both reasonable and necessary. Through conference signal synthesis, multiple input audio streams are processed and delivered as synthesized audio on a single output channel.

    1.2 Key aspects of conference signal synthesis

      When several audio sources play in the same space, the sound wave heard by the human ear is the linear superposition of the waves from each source. This is the basis for mixing analog voice signals, and it indicates that digitized speech can likewise be synthesized by linear superposition. Suppose n input audio streams are to be mixed, and X_i(t) is the linear sample of the i-th input stream at time t. The mixed value at time t is then:

      $$m(t) = \sum_{i=0}^{n-1} X_i(t)$$
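      The superposition formula can be illustrated with a minimal Python sketch. The function name `mix_samples` and the sample values are illustrative assumptions, not part of the original work; Python integers do not overflow, so the question of output range is left aside here.

```python
def mix_samples(streams):
    """Sum time-aligned samples of n input streams: m(t) = sum over i of X_i(t)."""
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must contain the same number of samples")
    # zip(*streams) yields, for each time t, the tuple (X_0(t), ..., X_{n-1}(t)).
    return [sum(column) for column in zip(*streams)]


mixed = mix_samples([[100, -200, 300], [50, 50, 50], [-10, 20, -30]])
print(mixed)  # [140, -130, 320]
```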

      A voice signal is a continuous media signal with strict timing requirements, and it is short-time stationary in the time domain. The basic approach to processing it is to sample the signal and process the resulting samples one buffer at a time, that is, to divide the speech into frames. Many speech processing concepts are frame based, for example voiced/silent decisions, energy, and autocorrelation. A frame length of 10~20 ms is typical. The key parameter of digital audio is the sampling rate, and all input audio streams must use the same sampling rate before they can be synthesized.
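      As a small illustration of framing, the sketch below converts a frame duration into a sample count and splits a buffer into consecutive frames. The helper names are hypothetical, and dropping a trailing partial frame is an assumption made for simplicity.

```python
def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in one frame of frame_ms milliseconds."""
    return sample_rate_hz * frame_ms // 1000


def split_frames(samples, frame_len):
    """Split samples into consecutive full frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


print(frame_length(16000, 10))           # 160
print(split_frames(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```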

      As the number of voice channels to be synthesized grows, and if no additional precautions are taken, signals that the conference does not want (for example acoustic feedback and noise) accumulate and degrade quality to an unacceptable level. In particular, the electroacoustic feedback caused by the local sound reinforcement system produces regenerative reverberation (echo), which seriously impairs speech intelligibility. Worse, severe acoustic feedback leads to self-oscillation (howling), which can prevent the whole communication system from working normally. Voice activity detection and acoustic feedback suppression must therefore be applied to the audio input from every terminal.

      Synthesis must also take account of the dynamic range of the summed samples, which raises the question of normalization. In digital audio waveform theory, normalization means inspecting a specified span of frames, finding the peak amplitude, and adjusting the overall volume of that span so that the amplitude is as large as permitted without overflowing. Speech synthesis is a form of editing applied to digital waveforms, so the normalization problem in particular has to be solved.

    2 Key technologies for conference signal synthesis

    2.1 Adaptive echo cancellation algorithm

      Digital echo cancellation is based on adaptive filtering. With the rapid development of DSPs, digital echo cancellation can now be implemented well on a DSP. The main cause of echo in a teleconference is that the far-end conference signal, after being played by the local loudspeaker system, is coupled back into the microphone through the sound field of the room and produces regenerative reverberation.

      Echo cancellation must estimate the characteristics of the echo path accurately and adapt quickly to changes in it. Given the characteristics of a teleconference, the interference cancellation model is the most suitable. This model is an adaptive filter with two inputs, as shown in Figure 1: the output of the local microphone is taken as the primary signal, and the input to the local loudspeaker is taken as the reference signal. Adaptive echo cancellation effectively suppresses the electroacoustic feedback (echo) that returns to the microphone through the sound field of the room, and thus achieves adaptive cancellation of acoustic feedback (echo).

      The core of echo cancellation is the adaptive filtering algorithm. Common algorithms include the steepest descent algorithm (SDA) and the least mean square (LMS) algorithm. Because gradient computation in the SDA involves matrix operations, it is not suited to practical applications. The LMS algorithm derived from it is simple, practical, and computationally efficient. The TMS320C54X DSP chips from TI have a dedicated LMS instruction for accelerating adaptive filtering. In practice, a modified filter coefficient update algorithm can also be obtained on the basis of the LMS algorithm:
      
      The detailed steps of the adaptive echo cancellation algorithm are as follows:
      (1) sample the input values;
      (2) adjust the coefficients according to the previous estimate and the filter coefficient update algorithm;
      (3) compute the far-end energy estimate:
      $\delta^2[k] = (1-\alpha)\,\delta^2[k-1] + \alpha X^2[k]$
      (4) perform the FIR filter computation to obtain the filter output y(n) and the error signal e(n);
      (5) output the data;
      (6) return to step (1).
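      The loop above can be sketched in Python. The source does not give its modified coefficient update, so this sketch substitutes a normalized LMS update whose step size is divided by the smoothed far-end energy from step (3); that substitution, the parameter values (`taps`, `mu`, `alpha`, `eps`), and the simulated echo path are all assumptions made for illustration.

```python
import random


def nlms_echo_canceller(reference, primary, taps=8, mu=0.2, alpha=0.05, eps=1e-6):
    """Return (errors, weights); errors holds the echo-cancelled output e(n)."""
    w = [0.0] * taps       # adaptive FIR coefficients
    x_hist = [0.0] * taps  # recent reference samples, newest first
    power = 0.0            # smoothed far-end energy estimate
    prev_e = 0.0           # error from the previous iteration
    errors = []
    for x, d in zip(reference, primary):  # (1) take one sample from each input
        # (2) adjust coefficients from the previous error and previous history
        step = mu / (taps * power + eps)
        w = [wi + step * prev_e * xi for wi, xi in zip(w, x_hist)]
        # (3) update the far-end energy estimate
        x_hist = [x] + x_hist[:-1]
        power = (1.0 - alpha) * power + alpha * x * x
        # (4) FIR filtering gives the echo estimate y and the error e
        y = sum(wi * xi for wi, xi in zip(w, x_hist))
        e = d - y
        errors.append(e)                  # (5) output
        prev_e = e
    return errors, w


def simulate_echo(reference, path):
    """Hypothetical room: the microphone picks up the reference convolved with path."""
    return [
        sum(h * reference[n - k] for k, h in enumerate(path) if n - k >= 0)
        for n in range(len(reference))
    ]


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
mic = simulate_echo(far_end, [0.5, -0.3, 0.1])
residual, weights = nlms_echo_canceller(far_end, mic)
```

      In this noise-free simulation the residual shrinks toward zero as the coefficients approach the simulated echo path. A real room has a much longer echo path and near-end speech, both of which this sketch ignores.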

    2.2 Voice activity (sound/silence) energy detection

      Sound/silence detection is called voice activity detection (VAD) in the ITU-T recommendations. In a multipoint audio conference, VAD makes the number of terminals that actually take part in synthesis during any given interval much smaller than the number of participants, which reduces the amount of synthesis computation and lightens the load on the processing chip. It is also the basis for automatic gain control (AGC) of the microphone.

      For a digital voice signal, the sound/silence decision is obtained by combining parameters such as signal energy and zero-crossing rate and comparing the result with an initialized energy threshold. The computation of short-time average energy uses a sliding window of fixed width: each time a new sample arrives, the average energy of all samples covered by the window ending at that sample is calculated and compared with an energy threshold to decide whether the new sample is silence or sound.

      As noted above, digital speech is examined frame by frame: if any sample in a frame is voiced, the frame is voiced. The window can therefore slide in units of frames rather than samples, and the sound/silence state of the last sample of each frame directly determines whether the frame is a voiced frame or a silent frame. This simplified decision greatly reduces the amount of computation and does not affect the decision result.
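      The two window strategies can be compared in a short Python sketch. It assumes the window width equals the frame length and uses a fixed threshold; the function names and values are illustrative only.

```python
from collections import deque


def sample_decisions(samples, window, threshold):
    """Per-sample flags: mean energy of the last `window` samples versus threshold."""
    buf = deque(maxlen=window)
    total = 0
    flags = []
    for s in samples:
        if len(buf) == window:
            total -= buf[0] * buf[0]  # energy of the sample about to leave the window
        buf.append(s)
        total += s * s
        flags.append(total / len(buf) > threshold)
    return flags


def frame_decisions(samples, frame_len, threshold):
    """Per-frame flags: the window is evaluated only at the last sample of each frame."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(sum(s * s for s in frame) / frame_len > threshold)
    return flags


signal = [0] * 4 + [10] * 4 + [0] * 4
print(frame_decisions(signal, 4, 25))  # [False, True, False]
```

      With the window equal to the frame, each frame decision matches the per-sample decision at that frame's last sample, but the energy is computed once per frame instead of once per sample.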

      An energy threshold that adapts to conditions allows the sound/silence decision to be made accurately. The background noise energy can be obtained by first-order linear low-pass filtering of the short-time energy of the samples. The adaptive energy threshold is then kept at a constant ratio So, the silence detection sensitivity, to the short-time background noise energy. Long periods of continuous speech raise the background noise estimate and correspondingly raise the silence detection threshold, so that low-amplitude speech that follows immediately may be treated as silence and go undetected. For this reason, while speech is being detected the noise energy can be estimated with a different low-pass filter cut-off frequency.
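      A minimal sketch of such a threshold follows. It models the first-order low-pass filter as exponential smoothing and lowers the cut-off by using a smaller smoothing coefficient while speech is detected; the class name, the coefficients, and the ratio value are assumptions, not values from the source.

```python
class AdaptiveThreshold:
    """Noise-tracking threshold: threshold = ratio * smoothed background noise energy."""

    def __init__(self, ratio=4.0, slow=0.001, fast=0.2, initial_noise=1.0):
        self.ratio = ratio  # constant threshold-to-noise ratio (assumed value)
        self.slow = slow    # smoothing coefficient while speech is detected
        self.fast = fast    # smoothing coefficient during silence
        self.noise = initial_noise

    def update(self, frame_energy):
        """Classify one frame, then update the noise estimate. Returns True if voiced."""
        voiced = frame_energy > self.ratio * self.noise
        coeff = self.slow if voiced else self.fast
        self.noise += coeff * (frame_energy - self.noise)
        return voiced


adaptive = AdaptiveThreshold()
print([adaptive.update(100.0) for _ in range(3)])  # [True, True, True]

same_cutoff = AdaptiveThreshold(slow=0.2, fast=0.2)
print([same_cutoff.update(100.0) for _ in range(3)])  # [True, True, False]
```

      The second detector keeps the same cut-off during speech, so sustained speech inflates its noise estimate and the third loud frame is already misclassified as silence.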

      When filtering out silence, care must be taken to retain weak voice signals with relatively low short-time energy, such as fricatives and consonants. These weak signals preserve the semantic integrity of the speech, so the zero-crossing rate should be used together with the short-time average energy decision in order to retain them. A hangover mechanism (the "residual sound generator") can be used to retain weak voice signals: the first few so-called silent frames that immediately follow a burst of speech are still treated as voiced, which avoids suppressing low-level speech. ITU-T G.723.1A specifies a hangover algorithm in detail, so it is not described further here.
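      The hangover idea can be sketched as a post-processing step on the raw frame decisions. This is a simplified illustration, not the G.723.1A algorithm, and the hangover length of two frames is an arbitrary assumption.

```python
def apply_hangover(raw_flags, hangover_frames):
    """Keep the first `hangover_frames` silent frames after speech marked as voiced."""
    out = []
    remaining = 0
    for voiced in raw_flags:
        if voiced:
            remaining = hangover_frames  # restart the hangover after every voiced frame
            out.append(True)
        elif remaining > 0:
            remaining -= 1
            out.append(True)             # silent frame kept as voiced
        else:
            out.append(False)
    return out


print(apply_hangover([True, False, False, False, True, False], 2))
# [True, True, True, False, True, True]
```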

    2.3 Normalization processing

      When multi-channel voice signals are synthesized by linear superposition, the problem to solve is how to prevent the sum from overflowing and causing distortion. If the samples are 16 bits wide and the summation buffer is also 16 bits wide, summing even two audio streams can easily overflow. Even if a higher-precision summation buffer is provided so that the summation itself does not overflow, there is no guarantee that the peak value of the result fits the range required by the output hardware (a D/A device usually has a 16-bit range).
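      The overflow risk is easy to reproduce. The helper below, a hypothetical name, emulates storing a sum in a 16-bit two's-complement buffer; Python integers themselves never overflow, which is why the wrap has to be modelled explicitly.

```python
def wrap_int16(value):
    """Emulate storing value in a 16-bit two's-complement register."""
    return (value + 32768) % 65536 - 32768


a, b = 20000, 20000
print(a + b)              # 40000, fits a wider accumulator
print(wrap_int16(a + b))  # -25536, the sign flips in a 16-bit buffer
```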

      The simplest method is to clamp out-of-range values. A better method is to normalize the summed result frame by frame. Specifically, all samples of a summed speech frame are examined; if a sample S exceeds the largest range that the device can represent, S and all samples after it are multiplied by an attenuation factor f, where f is the largest value that brings S within the range of the output device. Clearly the absolute value of f is less than 1. In this way the relative sizes of the speech samples remain unchanged over the period in which clamping would otherwise have occurred.
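      The difference between the two methods can be seen in a small sketch. It uses floating-point division to obtain f, which is an assumption made for clarity rather than how a fixed-point DSP would compute it, and the sample values are chosen so that f comes out exactly 0.5.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def clamp_frame(frame):
    """Simple method: clamp every out-of-range sample independently."""
    return [max(INT16_MIN, min(INT16_MAX, s)) for s in frame]


def attenuate_frame(frame):
    """Scale an overflowing sample and every sample after it by the factor f."""
    f = 1.0
    out = []
    for s in frame:
        if abs(s * f) > INT16_MAX:
            f = INT16_MAX / abs(s)  # largest f that brings this sample into range
        out.append(max(INT16_MIN, min(INT16_MAX, round(s * f))))
    return out, f


summed = [10000, 65534, 20000, -40000]
print(clamp_frame(summed))      # [10000, 32767, 20000, -32768]
print(attenuate_frame(summed))  # ([10000, 32767, 10000, -20000], 0.5)
```

      After the overflowing sample, the attenuated frame keeps the original ratios between samples, whereas clamping flattens the peaks.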

      In the experiment, the general-purpose 16-bit fixed-point DSP chip TMS320C549 was used for real-time emulation of multi-channel audio stream synthesis. While the samples of the individual channels are being added, the sum cannot overflow, because the samples are 16 bits wide and the accumulator is 32 bits wide. The sum can, however, very easily exceed the range permitted by the output hardware (16 bits).

      During normalization the attenuation factor f is initialized to 1. While a sample buffer is being processed, whenever a sample S exceeds the range, S is clamped, the ratio f of the permitted range value to S is obtained, and the samples that follow S are multiplied by f. Clamping only ever makes f smaller, so to avoid weakening the speech unnecessarily there must also be a point at which f is increased; this is done at the entry to the processing of each new sample buffer. The samples in the new buffer are very likely to need attenuation as well, so f should not restart from 1 each time but should inherit its previous value to some extent. That is, at the entry to each new sample buffer, if f is not equal to 1, it is adjusted to a slightly larger value, which becomes the new attenuation factor. If the samples really no longer need attenuation, f slowly returns to 1 after a number of buffers.

      Division is awkward on a fixed-point DSP, so all values of f can be placed in a table. The values of f are defined as 1/16, 2/16, up to 15/16, giving an attenuation precision of 1/16. When a sample S is clamped, the appropriate f (one of 15 values) is selected by comparison or by table lookup. A step of 1/16 is chosen because it already guarantees that the sum of 16 input streams cannot overflow; if greater precision is needed, 1/32 can be used (powers of 2 are convenient to implement on a fixed-point DSP).

      In summary, the core idea of normalization is that f must quickly become a suitable attenuation factor so that samples cannot overflow, and must then drift slowly back to 1. f is computed immediately when a sample S is clamped; after each summed frame has been processed, f is moved toward 1 by 1/16 of the difference between 1 and its current value, that is:

      $$f' = f + \frac{1-f}{16}$$

      The detailed normalization flow chart is shown in Figure 2.
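      The table lookup and the recovery rule can be combined in a short Python sketch. It is an interpretation under stated assumptions, not the flow chart of Figure 2: f is kept as a floating-point value rather than a fixed-point one, and the names `lookup_factor`, `recover`, and `Normalizer` are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768
F_TABLE = [k / 16 for k in range(1, 16)]  # 1/16, 2/16, ..., 15/16


def lookup_factor(sample):
    """Largest table entry f for which |sample| * f fits the 16-bit range."""
    for f in reversed(F_TABLE):
        if abs(sample) * f <= INT16_MAX:
            return f
    return F_TABLE[0]


def recover(f):
    """Move f toward 1 by 1/16 of the remaining difference."""
    return f + (1.0 - f) / 16


class Normalizer:
    def __init__(self):
        self.f = 1.0

    def process(self, frame):
        """Normalize one summed frame, carrying f over from the previous frame."""
        if self.f != 1.0:
            self.f = recover(self.f)  # relax f at the entry of each new buffer
        out = []
        for s in frame:
            if abs(s * self.f) > INT16_MAX:
                self.f = min(self.f, lookup_factor(s))
            out.append(max(INT16_MIN, min(INT16_MAX, round(s * self.f))))
        return out


norm = Normalizer()
print(norm.process([10000, 40000, 20000]))  # [10000, 32500, 16250]
print(norm.f)                               # 0.8125
print(norm.process([1000]), norm.f)         # [824] 0.82421875
```

      Here 40000 selects f = 13/16, the largest table entry that keeps the sample in range, and the next buffer starts from the slightly relaxed value instead of from 1.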

    3 Experimental analysis

      Ten audio streams were fed into the mixing module simultaneously. Each stream had a sampling rate of 16 kHz, and the frame size was 10 ms, that is, 160 samples.

      In cancelling electrical noise, for wideband random white noise with a bandwidth of 3 kHz (300~3300 Hz), the cancellation exceeded 42 dB. Outdoors, where the reverberation time is short, the cancellation of wideband acoustic noise exceeded 30 dB. In a laboratory with more severe reverberation, the acoustic cancellation still exceeded 15 dB.
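      The source does not say how these cancellation figures were measured. Assuming the usual definition, the ratio of mean signal power before and after cancellation expressed in decibels, the value can be computed as in the sketch below; the function name and the sample data are illustrative.

```python
import math


def cancellation_db(before, after):
    """Cancellation in dB: 10 * log10(mean power before / mean power after)."""
    p_before = sum(s * s for s in before) / len(before)
    p_after = sum(s * s for s in after) / len(after)
    return 10.0 * math.log10(p_before / p_after)


# A residual 100 times smaller in amplitude corresponds to about 40 dB.
value = cancellation_db([100.0, -100.0], [1.0, -1.0])
```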

      Listening tests showed that each voice could be distinguished clearly in the synthesized speech stream output after normalization and echo suppression.

      Matlab was used to compare the time-domain speech waveforms of the output under simple clamping and under normalization. Many overflows causing "clipping" could be observed in the former waveform, whereas the distortion in the latter was small.

      Digital audio stream synthesis is essential to a multipoint audio conference system. The input audio streams first pass through sound/silence energy detection and echo suppression; the valid input signals are then linearly superimposed; finally, gain normalization is applied to reduce distortion and meet the requirements of the output equipment. Implementation on a fixed-point DSP and the experiments show that the audio conference signal synthesis algorithm under this scheme gives very good conference results.

    References
    1 Zhou Lin. DSP Communication Engineering Technology Applications [M]. Beijing: National Defense Industry Press, 2004: 301~315
    2 Yang Xingjun. Digital Processing of Speech Signals [M]. Beijing: Publishing House of Electronics Industry, 1995: 154~157
    3 ITU-T G.723.1 Annex A: Silence Compression Scheme. ITU, 1996
