1 Introduction
Turbo codes achieve performance close to the Shannon limit, which makes them highly attractive in fields such as satellite communication, deep-space communication, and multimedia communication. They have therefore received wide and sustained attention ever since they were proposed.
The engineering application and implementation of Turbo codes has been a research focus in recent years. Turbo decoding uses a feedback iterative structure in which each component decoder applies the maximum a posteriori (MAP) decoding algorithm. Because the MAP algorithm involves a large number of exponential and logarithmic operations, it is very difficult to implement, so engineering applications usually use its simplified log-domain versions, the Log-MAP and Max-Log-MAP algorithms. Compared with Log-MAP, Max-Log-MAP loses 0.5 dB of gain, but it greatly reduces complexity and has therefore attracted much attention in practical implementations. This paper discusses the implementation and optimization of the Max-Log-MAP algorithm on TMS320C6000 series DSP chips.
2 Turbo code feedback iterative decoding structure and Max-Log-MAP decoding algorithm
A Turbo code is also called a parallel concatenated convolutional code (PCCC). Its encoder consists of two recursive systematic convolutional (RSC) component codes concatenated in parallel through an interleaver. Correspondingly, the decoder connects two component decoders in series to form a feedback iterative structure, as shown in Figure 1. DEC1 and DEC2 denote the two soft-input soft-output (SISO) component decoders. Assuming that the encoder output is BPSK modulated, xk and yk are the noise-corrupted information bit and parity bit delivered by the soft demodulator, and zk (zn) denotes the extrinsic information obtained from the other decoder after deinterleaving (interleaving). Each component decoder has two outputs: the log-likelihood ratio (LLR) of the information bit, L1(ak) or L2(an), and the extrinsic information used by the other component decoder, denoted ω1k or ω2k. After a certain number of iterations, during which the two component decoders exchange extrinsic information, a hard decision on the information-bit log-likelihood ratio completes the decoding of the Turbo code.

In the Max-Log-MAP algorithm, the log-likelihood ratio can be expressed as follows:
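A sketch of this expression in the standard Max-Log-MAP form, assuming the notation of this section and writing the branch metric as γ_k (the equation number is assumed from the later reference to Eq. (4)):

$$
L(a_k) = \max_{(m',m):\, i=+1}\bigl[\alpha_{k-1}(m') + \gamma_k(+1,m',m) + \beta_k(m)\bigr] - \max_{(m',m):\, i=-1}\bigl[\alpha_{k-1}(m') + \gamma_k(-1,m',m) + \beta_k(m)\bigr] \tag{1}
$$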
Here m' and m are the encoder states at times k-1 and k, respectively. αk(m) and βk(m) are the forward and backward state metrics, which are computed recursively along the trellis of the RSC code from the branch metrics γk(i, m', m) (i = ±1):
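In the usual Max-Log-MAP formulation these recursions take the following form, where m'' is an assumed name for the state at time k+1 and the equation numbers are likewise assumed:

$$
\alpha_k(m) = \max_{m',\, i}\bigl[\alpha_{k-1}(m') + \gamma_k(i,m',m)\bigr] \tag{2}
$$

$$
\beta_k(m) = \max_{m'',\, i}\bigl[\beta_{k+1}(m'') + \gamma_{k+1}(i,m,m'')\bigr] \tag{3}
$$

Each maximum runs only over the transitions that exist on the trellis.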

If the extrinsic information is handled as proposed by Robertson, the branch metric γk(i, m', m) of a rate-1/2 RSC code over the AWGN channel can be written as:
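A sketch of this branch metric in the form commonly attributed to Robertson, assuming that x_k and y_k are the unscaled soft inputs and that terms common to all branches have been dropped:

$$
\gamma_k(i,m',m) = \tfrac{1}{2}\, i\,\bigl(L_c x_k + z_k\bigr) + \tfrac{1}{2}\, L_c\, y_k\, j \tag{4}
$$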

In this formula, j = ±1 is the bipolar parity bit that the encoder outputs for information bit ak = i, and Lc = 4Es/N0 is defined as the channel reliability value. The extrinsic information and the log-likelihood ratio are related by:
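With the branch metric in the form assumed above, the usual relation subtracts the systematic and a priori contributions from the log-likelihood ratio (a sketch, not the paper's typeset formula):

$$
\omega_k = L(a_k) - L_c x_k - z_k \tag{5}
$$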

3 C implementation of the Max-Log-MAP decoding algorithm
The analysis above shows that, for the information received at each time instant, the Max-Log-MAP algorithm must compute several metrics: the branch metric γk(i, m', m), the forward state metric αk(m), and the backward state metric βk(m). From these three metrics it computes the log-likelihood ratio L(ak) for that instant and then the extrinsic information ωk required by the other component decoder. The algorithm can therefore be divided into several modules: the branch metric module, the forward and backward state metric modules, and the log-likelihood ratio module. Each module's computation proceeds recursively along the trellis, so it can be implemented with C for loops. The modules are analyzed one by one below, taking the eight-state (13,15) RSC code as the example.
3.1 Branch metric module (BMU)
The state metric recursions are built on the branch metrics, so the branch metric is the basic quantity of the algorithm. Eq. (4) shows that the branch metric is in effect a correlation between the received information and the output associated with a transition on the trellis. For the eight-state (13,15) RSC code there are 16 state transitions between two adjacent time instants on the trellis. The pair (i, j), however, takes only four values, and the branch metrics for (-1, -1) and (+1, +1), and likewise for (-1, +1) and (+1, -1), are negatives of each other. To reduce data storage, only two branch metrics therefore need to be computed at each time instant, say BM11 and BM10. The BMU algorithm is structured as follows:
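A minimal C sketch of such a BMU loop follows. It assumes that BM11 corresponds to (i, j) = (+1, +1) and BM10 to (+1, -1), that the soft values are stored as int, and that the factor 1/2 of Eq. (4) is applied by integer division. The function name bmu and its signature are illustrative and are not taken from the paper's listing.

```c
/* Branch metric unit (sketch).
   Lx, Ly : channel-scaled soft information and parity values
   z      : extrinsic information from the other component decoder
   BM11   : metric for (i, j) = (+1, +1); the (-1, -1) metric is -BM11
   BM10   : metric for (i, j) = (+1, -1); the (-1, +1) metric is -BM10 */
void bmu(const int *Lx, const int *Ly, const int *z,
         int *BM11, int *BM10, int N)
{
    int k;
    for (k = 0; k < N; k++) {
        int sys = Lx[k] + z[k];       /* systematic plus a priori part */
        BM11[k] = (sys + Ly[k]) / 2;  /* parity bit j = +1 */
        BM10[k] = (sys - Ly[k]) / 2;  /* parity bit j = -1 */
    }
}
```

Storing only two metrics per time instant halves the branch metric memory, at the cost of a sign change when the other two values are needed.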

Here Lx and Ly denote the soft information of the received information bit and parity bit after scaling by the channel reliability value, z denotes the extrinsic information from the other component decoder, and N is the Turbo code information frame length.
3.2 State metric module (SMU)
The forward and backward state metric recursions are similar, so the forward state metric is used as the example to explain the implementation of the state metric module (SMU). Let FSMj denote the accumulated forward state metric of state j (j = 0, 1, …, 7) on the trellis of the RSC (13,15) code. The for loop of the forward state metric recursion is structured as follows (temp1 and temp2 are temporary variables):
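The following is a hedged C sketch of a forward recursion of this kind, not the paper's listing. It assumes generator polynomials 13 (feedback, 1 + D^2 + D^3) and 15 (feedforward, 1 + D + D^3) in octal, a state numbering of 4*s1 + 2*s2 + s3 with s1 the newest register bit, and a mapping of bit 1 to +1. The names build_trellis, branch, smu_forward and NEG_INF are illustrative, and the trellis is generated from the taps rather than written out state by state.

```c
#define NSTATES 8
#define NEG_INF (-1000000)   /* stand-in for minus infinity */

/* Build next-state and parity tables for the (13,15) RSC code. */
static void build_trellis(int next[NSTATES][2], int parity[NSTATES][2])
{
    int s, u;
    for (s = 0; s < NSTATES; s++) {
        int s1 = (s >> 2) & 1, s2 = (s >> 1) & 1, s3 = s & 1;
        for (u = 0; u < 2; u++) {
            int a = u ^ s2 ^ s3;                 /* feedback taps: 13 octal */
            parity[s][u] = a ^ s1 ^ s3;          /* forward taps: 15 octal */
            next[s][u] = (a << 2) | (s1 << 1) | s2;
        }
    }
}

/* Signed branch metric for input bit u and parity bit p. */
static int branch(int u, int p, int bm11, int bm10)
{
    if (u)
        return p ? bm11 : bm10;
    return p ? -bm10 : -bm11;
}

/* Forward recursion: FSM[k][m] is the metric of state m before symbol k,
   so FSM must provide N + 1 rows. */
void smu_forward(const int *BM11, const int *BM10, int N, int FSM[][NSTATES])
{
    int next[NSTATES][2], parity[NSTATES][2];
    int k, s, u;

    build_trellis(next, parity);
    for (s = 0; s < NSTATES; s++)
        FSM[0][s] = (s == 0) ? 0 : NEG_INF;      /* encoder starts in state 0 */

    for (k = 0; k < N; k++) {
        for (s = 0; s < NSTATES; s++)
            FSM[k + 1][s] = NEG_INF;
        for (s = 0; s < NSTATES; s++) {
            for (u = 0; u < 2; u++) {
                int temp1 = FSM[k][s]
                          + branch(u, parity[s][u], BM11[k], BM10[k]); /* add */
                int m = next[s][u];
                if (temp1 > FSM[k + 1][m])       /* compare and select */
                    FSM[k + 1][m] = temp1;
            }
        }
    }
}
```

The backward recursion can be sketched the same way by running k from N down to 1 and swapping the roles of the source and destination states.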

In effect, for each trellis state the SMU accumulates the branch metrics of the transitions entering that state and then selects the larger result, which is the so-called add-compare-select (ACS) operation.
3.3 Log-likelihood ratio module (LLRU)
The log-likelihood ratio module (LLRU) computes the log-likelihood ratio and the extrinsic information from the branch metrics and the state metrics. Its basic operation is again an add-compare-select (ACS) operation similar to that of the SMU. The corresponding algorithm is structured as follows (the transitions are divided into two groups according to whether the input is 0 or 1, and the states are ordered from 0 to 7):
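A hedged C sketch of such an LLRU loop is given below; it is not the paper's listing. It assumes the same trellis conventions as a generator-based (13,15) trellis with state = 4*s1 + 2*s2 + s3, that FSM[k] holds the forward metrics before symbol k and BSM[k + 1] the backward metrics after it, and that Omega is formed as LLR - Lx - z. The function names are illustrative.

```c
#define NSTATES 8
#define NEG_INF (-1000000)   /* stand-in for minus infinity */

/* Next-state and parity tables for the (13,15) RSC code (assumed numbering). */
static void build_trellis(int next[NSTATES][2], int parity[NSTATES][2])
{
    int s, u;
    for (s = 0; s < NSTATES; s++) {
        int s1 = (s >> 2) & 1, s2 = (s >> 1) & 1, s3 = s & 1;
        for (u = 0; u < 2; u++) {
            int a = u ^ s2 ^ s3;
            parity[s][u] = a ^ s1 ^ s3;
            next[s][u] = (a << 2) | (s1 << 1) | s2;
        }
    }
}

/* Signed branch metric for input bit u and parity bit p (bit 1 -> +1). */
static int branch(int u, int p, int bm11, int bm10)
{
    if (u)
        return p ? bm11 : bm10;
    return p ? -bm10 : -bm11;
}

void llru(const int *BM11, const int *BM10, const int *Lx, const int *z,
          int FSM[][NSTATES], int BSM[][NSTATES], int N,
          int *LLR, int *Omega)
{
    int next[NSTATES][2], parity[NSTATES][2];
    int k, s, u;

    build_trellis(next, parity);
    for (k = 0; k < N; k++) {
        int best[2] = { NEG_INF, NEG_INF };      /* best path for input 0 and 1 */
        for (u = 0; u < 2; u++) {
            for (s = 0; s < NSTATES; s++) {
                int sum = FSM[k][s]
                        + branch(u, parity[s][u], BM11[k], BM10[k])
                        + BSM[k + 1][next[s][u]];        /* add */
                if (sum > best[u])                       /* compare and select */
                    best[u] = sum;
            }
        }
        LLR[k] = best[1] - best[0];
        Omega[k] = LLR[k] - Lx[k] - z[k];        /* extrinsic information */
    }
}
```

Written this way, every one of the 16 transitions costs one branch metric addition or subtraction per time instant.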

In these statements, BSM denotes the backward state metric, LLR the log-likelihood ratio, and Omega the extrinsic information passed to the other component decoder; the remaining names are temporary variables.
4 DSP-based code optimization of the Max-Log-MAP decoding algorithm
The key to DSP development in C lies in simplifying and optimizing the code. The C compiler in TI's CCS development environment provides code optimization features: developers can optimize C code by setting compiler options, unrolling loops, adding keywords, and using intrinsic functions (intrinsics). This paper discusses the optimization of the Max-Log-MAP decoding code with respect to the structure and characteristics of the TMS320C6000 series chips, including software pipelining and data access optimization, with the goal of making full use of the DSP hardware resources and obtaining high processing performance.
4.1 Structure and characteristics of the C6000 series chips
The TMS320C6000 series from TI is a family of digital signal processors based on VLIW technology with eight functional units. The CPU uses a Harvard architecture in which the program bus and the data bus are separate, so instruction fetch and instruction execution can run in parallel. VLIW technology allows instruction fetch, instruction dispatch, instruction execution, and data storage to form a multistage pipeline, so that within the same clock cycle several instructions are processed concurrently in different functional units. A C6000 series chip can execute eight instructions simultaneously in each clock cycle.
4.2 DSP-based code optimization of the algorithm modules
4.2.1 BMU module
The BMU module is a single loop. Because the loop body contains few instructions, an effective way to use more CPU resources at once is loop unrolling. Unrolling reduces the loop count while letting more operations enter the software pipeline, so the parallel processing capability of the multiple functional units is fully exploited. The optimized code is as follows:
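As an illustration of the unrolling idea, the following C sketch unrolls a BMU loop by a factor of two. The unroll factor, the use of the C99 restrict qualifier to tell the compiler that the arrays do not overlap, and the function name bmu_unrolled are all assumptions made for this sketch; the paper's own listing may use a different factor or different keywords.

```c
/* BMU loop unrolled by two (sketch).
   BM11 is the metric for (i, j) = (+1, +1), BM10 for (+1, -1). */
void bmu_unrolled(const int *restrict Lx, const int *restrict Ly,
                  const int *restrict z,
                  int *restrict BM11, int *restrict BM10, int N)
{
    int k;
    for (k = 0; k + 1 < N; k += 2) {
        /* Two independent time instants per iteration give the
           scheduler more operations to place in parallel. */
        int sys0 = Lx[k] + z[k];
        int sys1 = Lx[k + 1] + z[k + 1];
        BM11[k]     = (sys0 + Ly[k]) / 2;
        BM10[k]     = (sys0 - Ly[k]) / 2;
        BM11[k + 1] = (sys1 + Ly[k + 1]) / 2;
        BM10[k + 1] = (sys1 - Ly[k + 1]) / 2;
    }
    if (k < N) {                      /* leftover element when N is odd */
        int sys = Lx[k] + z[k];
        BM11[k] = (sys + Ly[k]) / 2;
        BM10[k] = (sys - Ly[k]) / 2;
    }
}
```

For an even frame length such as 144 the remainder branch is never taken, so the loop runs N/2 iterations.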

4.2.2 SMU module
The state metric computation is recursive: the value obtained at one step serves as the starting value of the next step. Data reads and writes in this module therefore deserve careful consideration. The analysis of the SMU code in Section 3.2 shows that reading and writing FSM causes frequent load and store operations between the CPU registers and data memory. To reduce the instruction cost of these operations, three groups of temporary variables, FSM_tempj, FSMj_old, and FSMj_new (j = 0, 1, …, 7), are introduced to hold the FSM results. At the next recursion step the CPU can then read the data from its internal registers and avoid loading it directly from data memory. The optimized code is structured as follows:
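The idea can be sketched in C as follows. This is an illustration rather than the paper's listing: it uses two groups of scalars, o0..o7 and n0..n7, in the roles of FSMj_old and FSMj_new, and the predecessor table is derived under the assumptions of a (13,15) octal RSC code with state = 4*s1 + 2*s2 + s3 (s1 the newest register bit), bit 1 mapped to +1, BM11 for (i, j) = (+1, +1) and BM10 for (+1, -1). A different state numbering would permute the table.

```c
#define NSTATES 8
#define NEG_INF (-1000000)   /* stand-in for minus infinity */

static int max2(int a, int b) { return a > b ? a : b; }

/* Forward recursion with state metrics kept in local scalars.
   FSM[k][m] is the metric of state m before symbol k (N + 1 rows). */
void smu_forward_regs(const int *BM11, const int *BM10, int N,
                      int FSM[][NSTATES])
{
    /* o0..o7: metrics of the previous step, held in registers. */
    int o0 = 0, o1 = NEG_INF, o2 = NEG_INF, o3 = NEG_INF;
    int o4 = NEG_INF, o5 = NEG_INF, o6 = NEG_INF, o7 = NEG_INF;
    int k;

    FSM[0][0] = o0; FSM[0][1] = o1; FSM[0][2] = o2; FSM[0][3] = o3;
    FSM[0][4] = o4; FSM[0][5] = o5; FSM[0][6] = o6; FSM[0][7] = o7;

    for (k = 0; k < N; k++) {
        const int b11 = BM11[k], b10 = BM10[k];
        /* n0..n7: eight mutually independent add-compare-select results. */
        const int n0 = max2(o0 - b11, o1 + b11);
        const int n4 = max2(o0 + b11, o1 - b11);
        const int n1 = max2(o2 + b10, o3 - b10);
        const int n5 = max2(o2 - b10, o3 + b10);
        const int n2 = max2(o4 - b10, o5 + b10);
        const int n6 = max2(o4 + b10, o5 - b10);
        const int n3 = max2(o6 + b11, o7 - b11);
        const int n7 = max2(o6 - b11, o7 + b11);
        int *out = FSM[k + 1];

        /* One store per state; nothing is reloaded from memory. */
        out[0] = n0; out[1] = n1; out[2] = n2; out[3] = n3;
        out[4] = n4; out[5] = n5; out[6] = n6; out[7] = n7;

        o0 = n0; o1 = n1; o2 = n2; o3 = n3;
        o4 = n4; o5 = n5; o6 = n6; o7 = n7;
    }
}
```

Because none of the eight results depends on another result of the same step, the compiler is free to schedule them across the functional units.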


Compared with the original code, which used only the two temporary variables sum1 and sum2, the optimized code uses more variables. This keeps the data independent, avoids dependencies between CPU registers, and allows the code to be software pipelined.
4.2.3 LLRU module
The optimization of the LLRU module comes mainly from reducing the number of addition and subtraction operations, which involves an improvement to the algorithm. As mentioned earlier, there are 16 transitions at each time instant, so the program structure of Section 3.3 requires 16 additions or subtractions of branch metrics. Since the branch metric takes only four values, the transitions can be divided into four groups according to their branch metric value, using the mapping of the RSC (13,15) trellis. Within each group the branch metric addition or subtraction is deferred: the maximum is selected first, and the corresponding branch metric is added or subtracted afterwards. Each loop iteration then needs only 4 branch metric additions or subtractions instead of the original 16, which greatly reduces the consumption of CPU resources. The corresponding optimized code is structured as follows:
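A hedged C sketch of this grouping follows; it is not the paper's listing. The four groups are derived under the assumptions of a (13,15) octal RSC code with state = 4*s1 + 2*s2 + s3 (s1 the newest register bit), bit 1 mapped to +1, BM11 for (i, j) = (+1, +1) and BM10 for (+1, -1); FSM[k] holds the forward metrics before symbol k and BSM[k + 1] the backward metrics after it. The names llru_grouped, max2 and max4 are illustrative.

```c
#define NSTATES 8

static int max2(int a, int b) { return a > b ? a : b; }

static int max4(int a, int b, int c, int d)
{
    return max2(max2(a, b), max2(c, d));
}

void llru_grouped(const int *BM11, const int *BM10,
                  const int *Lx, const int *z,
                  int FSM[][NSTATES], int BSM[][NSTATES], int N,
                  int *LLR, int *Omega)
{
    int k;
    for (k = 0; k < N; k++) {
        const int *f = FSM[k];       /* forward metrics, source states */
        const int *b = BSM[k + 1];   /* backward metrics, target states */

        /* Select the best path in each group before touching the
           branch metric (source state -> target state). */
        int g11 = max4(f[0] + b[4], f[1] + b[0], f[6] + b[3], f[7] + b[7]); /* +BM11 */
        int g10 = max4(f[2] + b[1], f[3] + b[5], f[4] + b[6], f[5] + b[2]); /* +BM10 */
        int g00 = max4(f[0] + b[0], f[1] + b[4], f[6] + b[7], f[7] + b[3]); /* -BM11 */
        int g01 = max4(f[2] + b[5], f[3] + b[1], f[4] + b[2], f[5] + b[6]); /* -BM10 */

        /* Only four branch metric additions or subtractions remain. */
        int max1 = max2(g11 + BM11[k], g10 + BM10[k]);   /* input bit 1 */
        int max0 = max2(g00 - BM11[k], g01 - BM10[k]);   /* input bit 0 */

        LLR[k] = max1 - max0;
        Omega[k] = LLR[k] - Lx[k] - z[k];
    }
}
```

Deferring the branch metric is valid because adding the same constant to every candidate in a group does not change which candidate is the largest.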

4.3 Comparison of instruction cycles before and after code optimization
Using Hezhongda's SEED-C6416 emulation and development board with a C6416-T series DSP chip, each algorithm module and the complete Max-Log-MAP algorithm were compiled and run in hardware emulation under the CCS 3.1 build environment. The Turbo code information frame length was set to 144 bits, the data type in the code was defined as int, and the compiler options were set to -o3 -mt -pm. The timer function provided by CCS was used to measure the instruction cycles consumed by the code before and after optimization; the results are shown in Table 1.

Clearly, the optimized code greatly reduces the CPU instruction cycle consumption and raises the working efficiency of the DSP. It is worth noting that the optimization here mainly improves the algorithm's instruction operations and data storage. In actual development, the code can be optimized further, according to the actual data width, by using intrinsic functions (intrinsics), packed data processing, and similar measures, to obtain even more efficient performance.
5 Conclusions
This paper studied the software programming and implementation of the Turbo code Max-Log-MAP decoding algorithm in standard C and, based on the structure and characteristics of the TMS320C6000 series DSP chips, discussed the code optimization design in detail. Loop unrolling, data access optimization, algorithm improvement, and similar measures were used to raise the efficiency of the code. The test results show that the optimized code greatly reduces the CPU instruction cycle consumption and thus achieves highly efficient processing performance.