
Philips Semiconductors
Custom Operations for Multimedia
File: cstm.fm5, modified 7/26/99
PRELIMINARY INFORMATION
have been computed into forward[], and the IDCT results
are assumed to have been computed into idct[].
A straightforward coding of the reconstruction algorithm
shares many of the undesirable properties of the first
example of byte-matrix transposition. The code accesses
memory a byte at a time instead of a word at a time,
which wastes 75% of the available bandwidth. Also, in
light of the many quad-byte-parallel operations introduced
earlier, it seems inefficient to spend three separate
additions and one shift to process a single eight-bit pixel.
Perhaps even more unfortunate for a VLIW processor
like TM1100 is the branch-intensive code that performs
the saturation testing; eliminating these branches could
reap a significant performance gain.
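
The bandwidth point can be illustrated with a minimal sketch in
portable C. The helper names load_quad and quad_pixel are
hypothetical and are not part of any TM1100 toolchain; the sketch
only shows four pixels arriving in one 32-bit access, where the
byte-wise loop issues four separate loads and uses one quarter of
each word fetched.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the TM1100 documentation): fetch four
   adjacent 8-bit pixels with a single 32-bit access. memcpy keeps the
   access legal in portable C regardless of pointer alignment. */
static uint32_t load_quad(const unsigned char *p)
{
    uint32_t word;
    memcpy(&word, p, sizeof word);
    return word;
}

/* Hypothetical helper: return pixel k (0..3, in memory order) from a
   loaded word. Going back through memory keeps the sketch independent
   of the machine's byte order. */
static unsigned char quad_pixel(uint32_t word, int k)
{
    unsigned char bytes[4];
    memcpy(bytes, &word, sizeof bytes);
    return bytes[k];
}
```

With helpers of this kind, the 64-pixel block would need 16 word
loads per input array instead of 64 byte loads.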
Since MPEG decoding is the kind of task for which
TM1100 was created, there are two custom operations,
quadavg and dspuquadaddui, that exactly fit this important
MPEG kernel (and other kernels). These custom operations
process four pairs of eight-bit pixel values in parallel.
In addition, dspuquadaddui performs saturation
tests in hardware, which eliminates any need to execute
explicit tests and branches.
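
As a rough aid to intuition, the behavior described above can be
modeled in portable C. This is a sketch under stated assumptions,
not the hardware definition: the function names quadavg_model and
dspuquadaddui_model are hypothetical, the rounding rule is taken
from the per-pixel expression in the reconstruction loop, and the
second operand of the saturating add is assumed to hold signed
bytes, as the IDCT values do.

```c
#include <stdint.h>

/* Model of a quad-byte average: each of the four unsigned bytes of a
   and b is averaged with rounding, (x + y + 1) >> 1.
   Assumption: this mirrors the reference loop's per-pixel expression. */
static uint32_t quadavg_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        result |= ((x + y + 1u) >> 1) << (8 * lane);
    }
    return result;
}

/* Model of a saturating quad-byte add: each unsigned byte of a is
   added to the corresponding signed byte of b, and each sum is
   clipped to [0..255], so the caller needs no explicit tests.
   Assumption: operand signedness as described in the lead-in. */
static uint32_t dspuquadaddui_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        int x = (int)((a >> (8 * lane)) & 0xFFu);
        int y = (int)((b >> (8 * lane)) & 0xFFu);
        if (y > 127)
            y -= 256;                   /* reinterpret the byte as signed */
        int sum = x + y;
        sum = sum < 0 ? 0 : (sum > 255 ? 255 : sum);
        result |= (uint32_t)sum << (8 * lane);
    }
    return result;
}
```

On the real processor each model would correspond to a single
operation; the loops here exist only to spell out what happens in
each of the four byte lanes.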
For readers familiar with the details of MPEG algorithms,
the use of eight-bit IDCT values later in this example may
be confusing. The standard MPEG implementation calls
for nine-bit IDCT values, but extensive analysis has
shown that values outside the range [–128..127] occur
so rarely that they can be considered unimportant.
Pursuant to this observation, the IDCT values are clipped
into the eight-bit range [–128..127] with saturating
arithmetic before the frame reconstruction code runs. The
assumption that this saturation occurs permits some of
TM1100’s custom operations to have clean, simple
definitions.
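
The clipping step described above might look like the following
minimal sketch. The function name clip_idct is hypothetical; only
the target range [–128..127] comes from the text, and the sketch
says nothing about how the IDCT itself is computed.

```c
/* Hypothetical helper: clip a nine-bit IDCT result into the
   eight-bit range [-128..127] with saturation. */
static signed char clip_idct(int value)
{
    if (value > 127)
        return 127;
    if (value < -128)
        return -128;
    return (signed char)value;
}
```

Values already inside the range pass through unchanged, so the
clipping only alters the rare out-of-range results.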
The first step in seeing how custom operations can be of
value in this case is to unroll the loop by a factor of four.
The unrolled loop yields code that is parallel with respect
to the four pixel computations. As is easily seen in the
code, the four groups of
void reconstruct (unsigned char *back,
                  unsigned char *forward,
                  char *idct,
                  unsigned char *destination)
{
    int i, temp;

    for (i = 0; i < 64; i += 1)
    {
        /* Rounded average of the two predictions, plus the IDCT value. */
        temp = ((back[i] + forward[i] + 1) >> 1) + idct[i];

        /* Saturate to the eight-bit pixel range [0..255]. */
        if (temp > 255)
            temp = 255;
        else if (temp < 0)
            temp = 0;

        destination[i] = temp;
    }
}
Figure 4-4. Straightforward code for MPEG frame reconstruction.