Design and Implementation of a Robot Voice Control System Based on DSP and FPGA

    1 Introduction
        
         A robot's hearing system recognizes human speech, decides what was said, and outputs the corresponding action commands to control the movement of the head and arms. A traditional robot hearing system uses a PC as the control platform: a computer serves as the robot's information-processing core and controls the robot through an interface circuit. Such a system has strong processing capability, a fairly complete speech library, and is easy to update and extend, but it is bulky, which hinders miniaturization of the robot and operation under complex conditions. Its power consumption and cost are also high.

         This design uses the cost-effective digital signal processing chip TMS320VC5509 as the speech recognition processor. Its high processing speed lets the robot complete complex speech signal processing and action command control independently while offline. Implementing the control logic in an FPGA reduces the PCB area occupied by the timing control and logic circuits [1], so the speech-processing part of the robot's "brain" becomes small and low in power consumption. Developing a small, low-power, high-speed robot system that can perform speech recognition and action commands within a specific range therefore has considerable practical significance.

    2 System hardware design
     
         The system hardware acquires voice commands and drives the stepper motors, and it provides the platform on which the system software is developed and debugged, as shown in Figure 1.

         The system hardware consists of several parts: speech signal acquisition and playback, DSP-based speech recognition, FPGA action command control, the stepper motors and their drivers, the external flash memory chip attached to the DSP, the JTAG port for emulation and debugging, and keyboard control. The workflow is as follows: the microphone converts the speaker's voice into an analog signal, the audio chip TLV320AIC23 quantizes it into a digital signal that is fed to the DSP, and the DSP completes the recognition and outputs the action command.

         According to the action command from the DSP, the FPGA generates the forward/reverse direction signal and an accurate pulse train for the stepper motor driver chip. The driver chip supplies the drive signals that control the rotation of the stepper motor. The off-chip flash stores the system program and the speech library and loads the system at power-up. The JTAG port is used for online emulation with a PC, and the keyboard is used for parameter adjustment and function switching.

    3 Speech recognition system design

    3.1 Characteristics of the speech signal

         The frequency components of a speech signal lie mainly between 300 and 3400 Hz, so following the sampling theorem the sampling rate is set to 8 kHz. A key property of speech is that it is "short-time": in one short interval it behaves like random noise, in another like a periodic signal, or like a mixture of both. The characteristics of speech vary with time, and only within a short interval does the signal show stable, consistent characteristics. This interval is generally taken as 5 to 50 ms, so speech processing must be built on a short-time basis [2]. The system sets the frame length to 20 ms and the frame shift to 10 ms, so each frame contains 160 × 16 bits of data.
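
         The frame arithmetic above can be checked with a short sketch. It is illustrative only: the constant and function names are hypothetical, and dropping a trailing partial frame is an assumption, not something the article specifies.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_MS = 20           # frame length stated in the text
SHIFT_MS = 10           # frame shift stated in the text

FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000    # samples per frame
FRAME_SHIFT = SAMPLE_RATE_HZ * SHIFT_MS // 1000  # samples between frame starts


def split_frames(samples, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Cut a sample list into overlapping frames.

    Assumption: a trailing partial frame is simply dropped.
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames
```

         With these values FRAME_LEN is 160 samples, which at 16 bits per sample matches the 160 × 16 bits per frame quoted above, and consecutive frames overlap by half.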

    3.2 Speech signal acquisition and playback

         Speech acquisition and playback use the TLV320AIC23B from TI. The analog-to-digital converter (ADC) and digital-to-analog converter (DAC) are highly integrated on the chip. The chip is run at an 8 kHz sampling rate with a mono analog input and a stereo output. The TLV320AIC23 is programmable: the DSP can write its control registers through the control interface, which supports both the SPI and the I2C specification. The circuit connection between the TLV320AIC23B and the DSP5509 is shown in Figure 2.

         The DSP configures the TLV320AIC23 registers through the I2C port. When MODE = 0, the control interface follows the I2C specification. The DSP operates in master transmitter mode and initializes the 11 registers located at addresses 0000000 to 0001111 through the I2C port. In I2C mode each transfer consists of three 8-bit bytes. Because the TLV320AIC23 uses a 7-bit register address and 9-bit data, the most significant data bit is placed in the last bit of the second byte.
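
         The byte layout described above can be sketched as follows. This is a minimal illustration, not the article's firmware: the function names are hypothetical, and the 7-bit device address passed in (0x1A in the usage note) is an assumption that should be checked against the datasheet and the board's address-select wiring.

```python
def aic23_control_word(reg_addr, value):
    """Pack a 7-bit register address and 9-bit data into one 16-bit word."""
    if not 0 <= reg_addr <= 0x7F:
        raise ValueError("register address must fit in 7 bits")
    if not 0 <= value <= 0x1FF:
        raise ValueError("register data must fit in 9 bits")
    return (reg_addr << 9) | value


def aic23_i2c_bytes(device_addr, reg_addr, value):
    """Return the three bytes of one I2C write transfer.

    Byte 1: 7-bit device address followed by the write bit (0).
    Byte 2: 7-bit register address followed by data bit 8.
    Byte 3: data bits 7..0.
    device_addr is an assumption supplied by the caller.
    """
    word = aic23_control_word(reg_addr, value)
    return [(device_addr << 1) & 0xFF, (word >> 8) & 0xFF, word & 0xFF]
```

         For example, aic23_i2c_bytes(0x1A, 0x04, 0x1FF) yields a second byte whose last bit carries the ninth data bit, which is the supplementing step the paragraph describes.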

         The McBSP serial port connects to the TLV320AIC23 through six pins: CLKX, CLKR, FSX, FSR, DR and DX. Data exchanged between the McBSP and the peripheral is transferred on the DR and DX pins, and the control and synchronization signals are carried by the four pins CLKX, CLKR, FSX and FSR. The serial link is configured for DSP mode, the receiver and transmitter of the serial port are synchronized, and transfers are started by the TLV320AIC23 frame synchronization signals LRCIN and LRCOUT. The transmit and receive word length is set to 32 bits (16 bits for the left channel, 16 bits for the right channel) in single-frame mode.
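
         As a rough illustration of the 32-bit word format, the sketch below splits one received word into two signed 16-bit samples. The function names are hypothetical, and placing the left channel in the upper half of the word is an assumption (it presumes the left sample is shifted in first).

```python
def to_signed16(u):
    """Interpret a 16-bit unsigned value as a two's-complement sample."""
    return u - 0x10000 if u & 0x8000 else u


def split_stereo_word(word32):
    """Split one 32-bit serial word into (left, right) 16-bit samples.

    Assumption: left channel occupies bits 31..16, right channel bits 15..0.
    """
    left = to_signed16((word32 >> 16) & 0xFFFF)
    right = to_signed16(word32 & 0xFFFF)
    return left, right
```

         Since the system uses a mono input, a recognizer built this way would presumably keep only one of the two samples per word.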

    3.3 Speech recognition program module design

         To let the robot recognize voice commands from arbitrary speakers, the system uses speaker-independent isolated-word recognition. Speaker-independent recognition means that the speech models are trained with speakers of different ages, sexes and voices, so that a new speaker's speech can be recognized without further training [2]. The system consists of pre-emphasis and windowing, endpoint detection, feature extraction, and pattern matching and training against the speech library.

    3.3.1 Pre-emphasis and windowing of the speech signal

         Pre-emphasis mainly removes the influence of glottal excitation and of lip and nasal radiation. The pre-emphasis digital filter is $H(z) = 1 - kz^{-1}$, where k is the pre-emphasis coefficient, close to 1. In this system k = 0.95. Applying pre-emphasis to the speech sequence X(n) gives the pre-emphasized speech sequence x(n):

$$x(n) = X(n) - k\,X(n-1) \qquad (1)$$

         The system slides a finite-length Hamming window along the speech sequence to cut out frames of 20 ms with a frame shift of 10 ms. Using a Hamming window effectively reduces the loss of signal characteristics.
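
         A minimal sketch of pre-emphasis and Hamming windowing is given here for illustration. The function names are hypothetical, the sample before the start of the sequence is assumed to be zero, and the window uses the common textbook coefficients 0.54 and 0.46, which the article does not state explicitly.

```python
import math


def pre_emphasis(raw, k=0.95):
    """Apply x(n) = X(n) - k * X(n - 1), taking X(-1) as 0 (assumption)."""
    out = []
    previous = 0.0
    for sample in raw:
        out.append(sample - k * previous)
        previous = sample
    return out


def hamming_window(length):
    """Textbook Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def window_frame(frame):
    """Multiply one frame sample-by-sample with a Hamming window."""
    return [s * w for s, w in zip(frame, hamming_window(len(frame)))]
```

         For a 20 ms frame at 8 kHz the window has 160 points. Its tapered ends are what limit the spectral leakage caused by cutting the signal into frames.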

    3.3.2 Endpoint detection

         Endpoint detection locates the start and end points of each word, provided there is a sufficient pause between words. It generally examines the short-time energy distribution, which in its standard form is:

$$E(n) = \sum_{m=0}^{N-1} x_n^2(m)$$

         Here x_n(m) is the speech sequence of the n-th frame cut out by the Hamming window. The sequence length is 160, so N = 160. For segments without speech E(n) is very small, whereas for segments with speech E(n) rises rapidly to a certain value, which makes it possible to distinguish the start point and end point of a word.
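
         The short-time energy test can be sketched as below. This is a simplified illustration for a single isolated word: the function names and the fixed threshold are assumptions, and the article does not say how its threshold is chosen.

```python
def short_time_energy(frame):
    """Sum of squared samples of one (windowed) frame."""
    return sum(s * s for s in frame)


def find_endpoints(frames, threshold):
    """Return (first, last) indices of frames whose energy exceeds threshold.

    Returns None when no frame is above the threshold.
    Assumption: the recording holds one isolated word, so the first and
    last active frames mark its start point and end point.
    """
    active = [i for i, frame in enumerate(frames)
              if short_time_energy(frame) > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

         In practice the threshold would likely be derived from the energy of the leading silent frames rather than fixed in advance.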

    3.3.3 Feature vector extraction

         The feature vector extracts the useful information from the speech signal for further analysis and processing. Commonly used feature parameters include linear prediction cepstrum coefficients (LPCC) and Mel cepstrum coefficients (MFCC). This system uses Mel Frequency Cepstrum Coefficients (MFCC) as the speech feature vector. MFCC parameters are based on the characteristics of human hearing and exploit the critical-band effect of the human ear [3]: Mel cepstrum analysis is applied to the speech signal to obtain a sequence of Mel cepstrum coefficient vectors, which represents the spectrum of the input speech. A bank of bandpass filters with triangular or sinusoidal characteristics is set up across the speech spectrum. The speech energy spectrum is passed through this filter bank, the output of each filter is computed, its logarithm is taken, and a discrete cosine transform (DCT) is applied to obtain the MFCC coefficients. The MFCC computation can be simplified to:


         Here i is the integer index associated with the triangular filters, P is chosen as 16 in this system, F(k) is the output of each filter, and M is the data length.
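
         The final log-and-DCT step can be sketched with the common textbook form $c_i = \sum_{k=1}^{P} \log F(k)\,\cos\!\left(\pi i (k - 0.5)/P\right)$. This form, the function name and the number of coefficients kept are assumptions for illustration and may differ from the article's own simplified equation.

```python
import math


def mfcc_from_filterbank(filter_energies, num_coeffs):
    """Turn filter-bank output energies F(1..P) into cepstral coefficients.

    Uses the textbook DCT of the log energies (an assumption, see lead-in).
    All energies must be positive because the logarithm is taken.
    """
    p = len(filter_energies)
    log_energies = [math.log(e) for e in filter_energies]
    coeffs = []
    for i in range(1, num_coeffs + 1):
        total = 0.0
        for k in range(1, p + 1):
            total += log_energies[k - 1] * math.cos(math.pi * i * (k - 0.5) / p)
        coeffs.append(total)
    return coeffs
```

         With P = 16 filters as in this system, a flat filter-bank output gives coefficients close to zero, which reflects that the DCT captures the shape of the spectrum rather than its overall level.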

    3.3.4 Pattern matching and training of the speech signal

         Model training uses the feature vectors to build templates, and pattern matching compares the current feature vector with the templates in the speech library to obtain the result. Both use the hidden Markov model (HMM), a probabilistic model that describes the statistical properties of a random process and is itself a doubly stochastic process. Because the HMM describes the non-stationarity and variability of speech signals well, it is widely used [4].

         There are three main HMM algorithms: the Viterbi algorithm, the forward-backward algorithm, and the Baum-Welch algorithm. This design uses the Viterbi algorithm for state decoding, matching the feature vectors of the captured speech against the models in the speech library. The Baum-Welch algorithm is used for training: because the observation features of the model are independent from frame to frame, it can be used to train the HMM.
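
         For readers unfamiliar with Viterbi decoding, here is a minimal sketch for a discrete-observation HMM in the log domain. It is a generic textbook version with hypothetical names, not the article's DSP code, which would work on MFCC vectors rather than discrete symbols.

```python
import math


def _log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best state path, log probability of that path).

    obs      : list of observation symbol indices
    start_p  : start_p[s] initial probability of state s
    trans_p  : trans_p[r][s] probability of moving from state r to s
    emit_p   : emit_p[s][o] probability of state s emitting symbol o
    """
    n_states = len(start_p)
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers = []
        score = []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans_p[best][s])
                         + _log(emit_p[s][o]))
        back.append(pointers)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

         In an isolated-word recognizer of this kind, each word model in the library would typically be scored this way and the word with the highest score chosen.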

    3.4 DSP development of the speech recognition program

         The DSP development environment is CCS 3.1 with DSP/BIOS. The speech recognition and training programs are each written as modules, defined as separate functions, and called from the main program. The speech recognizer function is int Recognizer(int Micin), the recognition result output function is int Result(void), the speech trainer function is int Train(int Tmode, int Audiod), and the action command input function is int Keyin(int Action[5]).

         The speech recognizer converts the current speech input into a feature vector, matches it against the templates in the speech library, and outputs the result. The result output function produces the spoken reply that corresponds to the recognition result. Training converts voice commands spoken by many people of different ages, sexes and voices into templates for the training library. To guard against faulty samples, each speaker's command is trained twice. The two inputs are compared by pattern matching with the Euclidean distance, and the sample is added to the sample set only if their similarity reaches 95%. The reply input function assigns to each template in the speech library the speech output to be played in response, so that the robot can answer verbally. In its normal working state the system runs the speech recognition subroutine. Training is triggered by an external interrupt: the training function is executed, the database template is obtained, and the program returns when training finishes. The flow chart is shown in Figure 3.
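
         The two-pass acceptance check can be sketched as follows. The article does not say how the Euclidean distance is turned into a similarity percentage, so the normalization used here (one minus the distance divided by the sum of the two vector norms) is purely an assumption, as are the function names.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map distance to a score in [0, 1]; 1.0 means identical vectors.

    Assumed normalization: 1 - d(a, b) / (|a| + |b|).
    """
    scale = (math.sqrt(sum(x * x for x in a))
             + math.sqrt(sum(y * y for y in b)))
    if scale == 0:
        return 1.0
    return 1.0 - euclidean_distance(a, b) / scale


def accept_training_pair(first, second, threshold=0.95):
    """Accept the sample only if the two utterances are similar enough."""
    return similarity(first, second) >= threshold
```

         A pair that fails the check would be discarded and the speaker asked to repeat the command.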

    4 Robot action control system design

    4.1 FPGA logic design

         The system controls the robot's head by voice. The head moves with two degrees of freedom, up-down and left-right, and therefore needs two stepper motors. After the DSP completes speech recognition it outputs the corresponding action command. When the movement has finished, the DSP issues a reset instruction and the head returns to its initial state. The FPGA provides the DSP interface logic, implements the RAM blocks that store the DSP instructions, and generates the stepper motor drive pulses that control the direction and angle of rotation.

         The FPGA serves as the action command control unit. The design uses a FLEX10KE chip which, after receiving data from the DSP, controls the two stepper motors in parallel. The internal logic structure of the FPGA is shown in Figure 4. Inside the FPGA, two blocks act as motor pulse generators and control the working pulses and the direction of each motor. A0~A7 is the DSP data input port, WR is the data write port, P1 and P2 are the pulse inputs of the two stepper motor driver chips, L1 and L2 are the motor direction control ports, and ENABLE is the enable signal.

         RAM1 and RAM2 are the instruction registers of the two stepper motors. Each motor pulse generator emits the number of square-wave pulses that corresponds to the value held in its RAM. The DSP outputs its instructions through data lines D0~D8. D8 is the RAM select: 1 selects RAM1 and 0 selects RAM2. D0~D7 carry the motor angle. The up-down and left-right rotation range is 120° with a precision of 1° and an initial value of 60°, so D0~D7 range from 00000000 to 01111000 and the initial value is 00111100. The FPGA acts as the stepper pulse generator: it controls the motor speed through the configured clock period and decides between forward and reverse rotation by comparing the coordinate with the initial value. The system's action command procedure is shown in Figure 5.
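
         The command format can be illustrated with a small decoder. It is a software sketch of what the FPGA logic does in hardware; the names are hypothetical, and rejecting angles above 120 is an assumption about how out-of-range values should be handled.

```python
ANGLE_MAX_DEG = 120   # rotation range stated in the text
ANGLE_HOME_DEG = 60   # initial value stated in the text


def decode_command(word9):
    """Split a 9-bit command (D8..D0) into (ram_select, angle_deg).

    D8 is the RAM select bit, D7..D0 carry the angle in degrees.
    Assumption: angles above ANGLE_MAX_DEG are treated as invalid.
    """
    if not 0 <= word9 <= 0x1FF:
        raise ValueError("command must fit in 9 bits")
    ram_select = (word9 >> 8) & 1
    angle_deg = word9 & 0xFF
    if angle_deg > ANGLE_MAX_DEG:
        raise ValueError("angle outside the 0..120 degree range")
    return ram_select, angle_deg
```

         For instance, the word 1 00111100 selects RAM1 and carries the initial position of 60°.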

         R1 is the DSP instruction register and R2 is the current coordinate register. The difference between the coordinate output by the DSP and the current coordinate held in the FPGA determines the direction and angle of rotation of the stepper motor. The advantage is that the system can react to a newly entered command: it ends the current movement and executes the new instruction. After the instruction has been executed, the system resets and the stepper motors return to their initial state.
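
         The difference operation and the ability to accept a new command mid-movement can be modeled in a few lines. This is a behavioral sketch only: the names are hypothetical, and it assumes one pulse moves the motor by 1° and that a positive difference means forward rotation.

```python
def plan_move(target_deg, current_deg):
    """Return (direction, pulses) from the target and current coordinates.

    direction is +1, -1 or 0; pulses is the number of 1-degree steps.
    """
    delta = target_deg - current_deg
    direction = (delta > 0) - (delta < 0)
    return direction, abs(delta)


def simulate(targets_by_tick, ticks, home_deg=60):
    """Step a model of R1 (target) and R2 (current) for a number of ticks.

    targets_by_tick maps a tick index to a new target written by the DSP.
    Returns the current coordinate after every tick.
    """
    r1 = home_deg
    r2 = home_deg
    trace = []
    for tick in range(ticks):
        if tick in targets_by_tick:
            r1 = targets_by_tick[tick]   # a new command replaces the old one
        direction, _ = plan_move(r1, r2)
        r2 += direction
        trace.append(r2)
    return trace
```

         Because the direction is recomputed from the registers on every pulse, a command that arrives during a movement simply redirects the motor from wherever it currently is.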

    4.2 FPGA logic simulation

         The FPGA was developed on the MAX+PLUS II platform. The logic functions described above were designed in VHDL and debugged through the JTAG interface. The FLEX10KE chip outputs the correct direction signals and pulse waveforms according to the commands from the DSP.

    4.3 Stepper motor drive design

         The FPGA controls the stepper motor driver chips through the outputs P1, L1, P2 and L2. The driver is the TA8435H from Toshiba, a monolithic sinusoidal microstepping driver chip dedicated to two-phase stepper motors. The circuit connection between the FPGA and the TA8435H is shown in Figure 6.

         Because the FLEX10KE and the TMS320VC5509 operate at 3.3 V while the TA8435H operates at 5 V and 25 V, the pins are connected through TLP521 optocouplers, which isolate the voltages on the two sides. CLK1 is the clock input pin, CW/CCW is the direction control pin, and A, Ā, B, B̄ are the inputs of the two-phase stepper motor.

    5 Conclusion

         The system makes full use of the DSP's high processing speed and its expandable off-chip memory space. It is fast, works in real time, achieves a high recognition rate, and supports a large speech library. Using an FPGA simplifies the system circuitry: a single FLEX10KE chip handles the timing control of both stepper motors. Although the system still lags behind a PC-based system in processing speed and in speech library storage capacity, an embedded system built around a DSP and an FPGA undoubtedly has broad prospects for robot miniaturization, low power consumption, and the realization of specific functions.


    Friday, August 22nd, 2008 at 00:49
Copyright © 2008 51 Research and Design, Electronic Engineers website - Embedded Systems, MCU, DSP, EDA, Test and Measurement, Components, Communications, Power, Microelectronics, Semiconductors