ISO/IEC 14496 (MPEG-4) Video Reference Software

 

Version: Microsoft-FDAM1-2.3-001213

 

User Manual

 

 

Primary Author:             Simon A. J. Winder (Microsoft Corporation)

Last update:                  December 13, 2000

 

 

Introduction

This manual documents the Microsoft MPEG-4 Video Reference Software. This software is for encoding and decoding video sequences using the MPEG-4 compression format. Microsoft Corporation originally developed it during the experimental phase of the MPEG-4 standardization process. Later additions and edits were made by a multitude of parties to add all the Version 1 and Version 2 coding tools. The purpose of this software is to act as a reference implementation to complement the text version of the standard.

 

Copyright Statement

The following copyright statement covers the use of all software modules:

 

This software module was originally developed by Microsoft Corporation and also edited by multiple parties in the course of development of the MPEG-4 Video (ISO/IEC 14496-2). This software module is an implementation of a part of one or more MPEG-4 Video tools as specified by MPEG-4 Video. ISO/IEC gives users of MPEG-4 Video free license to this software module or modifications thereof for use in hardware or software products claiming conformance to the MPEG-4 Video. Those intending to use this software module in hardware or software products are advised that its use may infringe existing patents. The original developer of this software module and his/her company, the subsequent editors and their companies, and ISO/IEC, have no liability for use of this software module or modifications thereof in an implementation. Copyright is not released for non MPEG-4 Video conforming products. Microsoft Corporation and the other originators of this software retain full right to use the code for their own purpose, assign or donate the code to a third party and to inhibit third parties from using the code for non ISO/IEC 14496 conforming products. This copyright notice must be included in all copies or derivative works.

 

Compilation

The source code provided here compiles into three executables: encoder.exe, decoder.exe and convertpar.exe, the last of which is a utility program used for upgrading from old-style to new-style parameter files. These three executables are available pre-compiled in the \demo directory.

 

The following source files and directories are used when compiling the encoder:

 

\app\encoder\encoder.dsp

Encoder project file

\app\encoder\encoder.cpp

Encoder main()

\sys

Common files

\sys\encoder

Encoder specific

\tools

 

\type

Common types

\vtc

Wavelet code

 

The following source files and directories are used when compiling the decoder:

 

\app\decoder\decoder.dsp

Decoder project file

\app\decoder\decoder.cpp

Decoder main()

\sys

Common files

\sys\decoder

Decoder specific

\tools

 

\type

Common types

\vtc

Wavelet code

 

The following source files and directories are used when compiling the parameter file conversion utility:

 

\app\convertpar\convertpar.dsp

Convertpar project file

\app\convertpar\convertpar.cpp

Convertpar main()

 

Compilation Under Microsoft Windows

The project files can be used when compiling with Microsoft Visual C++ 6.0. Simply double click the required “.dsp” file to open the project, then select “Build” from the menu. The project files will not work with Microsoft Visual C++ 5.0; in this case, create a new “win32 console application” and use the tables above as a guide to add the appropriate files to the new workspace. Alternatively, the list of source files can be obtained by opening the “.dsp” file in Notepad.

 

You can build in debug mode or in release mode (from the Build menu, choose “Set Active Configuration” to select the mode). In release mode the codec runs faster; however, for historical reasons, error reporting in this mode is very poor, so you may get no error message when the encoder or decoder fails due to incorrect encoder parameters, errors in the source bitstream, end of file in the source sequence, and so on. In such cases there will either be an access violation or the results will be incorrect. In debug mode an error message is much more likely, so it is useful to have both types of executable handy. However, the debug error messages often take the form of assert failures, and it will be necessary to look at the source code, or use some intelligence, to find out what has gone wrong.

 

The following compilation options can be set:

 

__TRACE_AND_STATS_

Enables code to calculate encoder statistics.

__DOUBLE_PRECISION_

Makes all floating point code double precision.

__DISABLE_4MV_FOR_PVOP_

Prevents the encoder from ever generating 4 motion vectors per macroblock when PVOPs are encoded.

__NBIT_

Enables use of up to 16-bit pixel values.

_MBQP_CHANGE_

Enables random changes in macroblock quantizer when rate control is turned off.

 

Typically, __TRACE_AND_STATS_ and __DOUBLE_PRECISION_ are the only options to be set, unless other modes are needed.

Compilation Under UNIX

Two makefiles have been contributed for compilation under UNIX. These are named makefile and unixmake. Compilation has been tested with GNU g++ 2.8.1. Bugs were found with versions of g++ prior to this one. Additionally, it is necessary to compile with the “-O” option, but not the “-O2” option.

 

Examples

The \demo directory provides a number of example bitstreams, parameter files, and pre-compiled executables. The following files are present:

 

encoder.exe

Video encoder.

decoder.exe

Video decoder.

convertpar.exe 

Helper to convert old parameter files to new type.

example.par

Example parameter file.

example.ref

Example of NEWPRED reference file.

stef_cif.par

Example parameter file (frame-based coding).

brea_cif.par

Example parameter file (binary object-based coding).

brea_qcif.par

Example parameter file (greyscale alpha object-based coding).

demo_stef_cif.par

Example parameter file for re-encoding (frame-based coding).

demo_brea_cif.par

Example parameter file for re-encoding (binary object-based coding).

demo_brea_qcif.par

Example parameter file for re-encoding (greyscale alpha object-based coding).

sprt_stef_sif.par

Example parameter file (static sprite coding).

stef_cif.cmp

Example encoded bitstream (frame-based coding).

brea_cif.cmp

Example encoded bitstream (binary object-based coding).

brea_qcif.cmp 

Example encoded bitstream (greyscale alpha object-based coding).

sprt_stef_sif.cmp

Example sprite bitstream (static sprite coding).

vtc_brea_cif.cmp

Example Visual Texture bitstream.

vtc_brea_cif.ctl

Example Visual Texture control file.

vtc_brea_cif.par

Example Visual Texture parameter file.

wavelet_tiling   

Directory of wavelet examples.

 

Frame-Based Coding Example

The frame-based example bitstream can be decoded using the following command line:

 

decoder stef_cif.cmp stef_cif 352 288

 

This will result in a ten-frame sequence being stored in file stef_cif.yuv, with dimensions 352 × 288 pixels. This short sequence can be viewed under Microsoft Windows using yuvtoavi to convert to an AVI file and by playing this using Windows Media Player.

 

It is possible to re-encode this decoded file (with subsequent quality reduction) using the encoder and appropriate parameter file:

 

encoder demo_stef_cif.par

 

This will create the bitstream file cmp\01\stef_cif.cmp and the reconstructed sequence file rec\01\stef_cif.yuv. This sequence is the encoder’s reconstruction of the sequence that would be obtained if the compressed bitstream were decoded. The new bitstream will not be the same as the original bitstream because the compression is lossy.

Binary Shape Object Coding Example

The binary shape example bitstream can be decoded using the following command line:

 

decoder brea_cif.cmp brea_cif 352 288

 

This will result in a ten-frame texture sequence being stored in file brea_cif.yuv and the corresponding ten-frame binary alpha sequence being stored in file brea_cif.seg. The regions of the frame that are outside the shape of the video object are indicated with zero pixel values in the “.seg” file. Pixels within the video object have the value 255.

 

To re-encode this decoded sequence, use the following command:

 

            encoder demo_brea_cif.par

 

This will create the bitstream file cmp\255\brea_cif.cmp and the reconstructed sequence files rec\255\brea_cif.yuv and rec\255\brea_cif.seg. Note that the “object number” is 255 because the parameter file specifies to encode only the regions of each frame where the alpha plane has a pixel value equal to 255.

Grayscale Shape Object Coding Example

The grayscale shape example bitstream can be decoded with the following command:

 

decoder brea_qcif.cmp brea_qcif 176 144

 

This will result in three files being created with a frame size of 176 × 144 pixels. The ten-frame texture information is stored in brea_qcif.yuv, the binary alpha mask is stored in brea_qcif.seg and the grayscale alpha plane is stored in brea_qcif.0.aux. The grayscale alpha plane contains full transparency information and can be used for high quality anti-aliased composition of the video object onto another background.
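The anti-aliased composition mentioned above amounts to a per-pixel alpha blend. The following Python sketch (illustrative only; the reference software itself is C++, and a normative MPEG-4 compositor may differ in its rounding) blends one luma sample of the object onto a background:

```python
def composite_pixel(fg, bg, alpha):
    """Blend one sample of a video object (fg) onto a background (bg)
    using a grayscale alpha value from 0 (transparent) to 255 (opaque).
    Standard alpha blending with rounding; illustrative sketch only.
    """
    return (alpha * fg + (255 - alpha) * bg + 127) // 255
```

With alpha = 255 the object sample is used unchanged; with alpha = 0 the background shows through; intermediate values give the soft anti-aliased edge.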

 

To re-encode this sequence, first delete the file brea_qcif.seg, and then rename the file brea_qcif.0.aux so that it is now called brea_qcif.seg. The encoder will generate its own binary alpha mask from the grayscale mask. Use the following command to encode the sequence:

 

            encoder demo_brea_qcif.par

 

The encoder will output four files: the bitstream file will be cmp\255\brea_qcif.cmp, the reconstructed texture file will be rec\255\brea_qcif.yuv, the binary alpha mask will be rec\255\brea_qcif.seg, and the grayscale alpha plane will be rec\255\brea_qcif.0.aux.

Sprite Decoding Example

The following command line can be used to decode the example sprite sequence:

 

decoder sprt_stef_sif.cmp sprt_stef_sif 352 240

 

This results in an eight-frame sequence of the crowd background from “Stefan”. The image size is 352 × 240. Two files are produced: sprt_stef_sif.yuv is the texture file and sprt_stef_sif.seg is a binary alpha mask. The binary alpha mask for this sequence is rather uninteresting as it has every pixel set to 255; however, it is possible to encode arbitrarily shaped sprites using the correct parameter file options.

Wavelet Still Texture Decoding Example

The following command line can be used to decode the single-frame example image compressed using the wavelet texture encoder:

 

decoder -vtc vtc_brea_cif.cmp out.yuv 352 288 1 1

 

This will result in a single YUV file, out.yuv, having size 352 × 288 pixels.

 

Format Conversion

Tools are provided to help with conversion to and from the YUV format used by the codec. The two tools are yuvtoavi.exe and avitoyuv.exe. These can be found in the \demo directory. With these tools you can convert to and from the popular AVI video file format. Please note that YUV and AVI formats use different color-spaces and sampling strategies.

AVITOYUV

Convert from the most common AVI formats to the YUV format. The tool converts the color-space from RGB to YUV and down-samples the chroma components to obtain YUV 4:2:0. The color-space is ITU-R 709. The down-sampling filter is {40, 88, 88, 40}/256. The format of the command is:

 

            avitoyuv source.avi destination.yuv

 

Any audio data is lost.
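The chroma down-sampling step can be sketched as follows in Python (illustrative only; the exact edge handling used by avitoyuv is an assumption). Note that the taps {40, 88, 88, 40} sum to 256, so flat chroma is preserved exactly:

```python
def downsample_chroma_row(row):
    """Halve a row of chroma samples with the 4-tap filter
    {40, 88, 88, 40}/256. Edge samples are replicated (an assumption;
    avitoyuv's actual border handling is not documented here).
    """
    taps = (40, 88, 88, 40)
    out = []
    for i in range(0, len(row), 2):
        # Four-sample window centred between row[i] and row[i+1],
        # clamped at the row boundaries.
        idx = [max(0, min(len(row) - 1, i + k)) for k in (-1, 0, 1, 2)]
        acc = sum(t * row[j] for t, j in zip(taps, idx))
        out.append((acc + 128) >> 8)   # divide by 256 with rounding
    return out
```

Applying the same filter vertically as well gives the 4:2:0 chroma planes from 4:4:4 input.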

YUVTOAVI

Convert from the 4:2:0 YUV format to uncompressed AVI. The tool up-samples the chroma components using the filter {3, -16, 67, 227, -32, 7}/256 and converts to RGB using the ITU-R 709 color-space. The format of the command is:

 

            yuvtoavi source.yuv destination.avi width height framerate

 

The width and height are given in pixels and must be multiples of two. The framerate dictates how fast the AVI file will play and is specified in frames per second. Common frame sizes are given in the section “Data File Formats”.

Parameter File Updating

The parameter file format changed in August, 1999. Files with the old format (before version 901) can be converted to the new format by running the convertpar tool that is included with the current release. Use the command line:

 

convertpar old_par_file [new_par_file]

 

where the name of the new parameter file is an optional argument. If this argument is not present, then stdout is used.

 

Data File Formats

This section covers the file formats for input to the encoder.

YUV Files

Files with the “.yuv” extension contain video texture data in a raw format. The YUV color-space is used, where Y is the luminance component and U/V are the color components. (Strictly, this color-space is correctly named Y, Cb, Cr and is not the same as the PAL TV YUV components.) For 8-bit video, one byte is used to represent each Y or U/V value. Typically, the Y range is from 16 (black) to 235 (white). A value of UV = {128, 128} indicates that there is no color (grayscale). The reference software only supports YUV 4:2:0 formatted files. In this format, the U/V components have half the resolution, both horizontally and vertically, when compared to the Y component. In the file format, the Y, U and V planes are written one after the other in that order. If there are N frames of video then the components are written consecutively: Y0, U0, V0, Y1, U1, V1, …, YN-1, UN-1, VN-1. The planes are simply raw pixel bytes written row-by-row, top left to bottom right. If the image size is width × height, then the size of the Y plane is width × height bytes and the U/V planes are each width × height / 4 bytes. The total number of bytes in the file is N × 3 × width × height / 2. There is no header information present. The frame size must be known in advance. Typical dimensions for encoded video are given in the following table:

 

Format

Width

Height

QCIF

176

144

CIF

352

288

SIF

352

240

SIF360

360

240

CCIR601

720

486

 

When the encoder is compiled to support N-bit video, the YUV files contain pixels that are 16-bit rather than 8-bit values. In this case, the file contains twice as many bytes as for the 8-bit case. The least significant byte of each pixel value is stored first.
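The size arithmetic above can be captured in a small Python sketch (illustrative only; the reference software itself is C++):

```python
def yuv420_file_bytes(width, height, frames, bits_per_pel=8):
    """Expected size of a raw 4:2:0 ".yuv" file as described above.

    The Y plane holds width*height samples and the U and V planes
    width*height/4 each, i.e. 3/2 samples per pixel overall. In
    __NBIT_ builds each sample occupies 2 little-endian bytes.
    """
    bytes_per_sample = 1 if bits_per_pel <= 8 else 2
    return frames * 3 * width * height // 2 * bytes_per_sample
```

Dividing an actual file size by the per-frame figure is a quick way to check how many frames a headerless “.yuv” file contains.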

SEG Files

These files typically contain binary segmentation maps for shape-based coding or for grayscale alpha plane coding. Each pixel is represented by a byte and these bytes are organized as one plane per frame and each plane is written row-by-row, top left to bottom right. There is no header information. If the image size is width × height, then the size of the alpha plane is width × height bytes and for an N frame sequence, the total number of bytes in the file is N × width × height. There are three ways of interpreting each pixel. In the case of a binary alpha mask, the pixel value is either 0 or 255, indicating transparent or opaque pixels respectively. In the case of a segmentation mask, the pixel value indicates to which one of multiple binary masks the pixel belongs. For example, three regions of the image can be assigned the values 0, 1, and 2. In this way, different parts of the image can be coded, depending on which mask number is chosen during the encoding process. Regions of the image that are not equal to the chosen mask number are considered transparent by the encoder. Finally, pixel values can be used to indicate transparency in the range 0 (transparent) to 255 (opaque) in the same way as with “.aux” files. The method of interpretation depends on the encoder settings.
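The segmentation-mask interpretation can be sketched as follows (Python, illustrative only): selecting one object index from a multi-object “.seg” plane yields the binary alpha mask the encoder works with.

```python
def object_mask(seg_plane, object_index):
    """Turn one ".seg" plane (a flat list of pixel bytes) into a binary
    alpha mask for the chosen object index: 255 where the pixel belongs
    to that object, 0 (transparent) everywhere else.
    """
    return [255 if p == object_index else 0 for p in seg_plane]
```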

 

AUX Files

These files always contain grayscale alpha plane data. Each pixel indicates the amount of transparency from 0 (transparent) to 255 (opaque). For each frame, one pixel plane is present. There is no U/V information. The planes are simply raw bytes written row-by-row, top left to bottom right. There is no header information. If the image size is width × height, then the size of the alpha plane is width × height bytes and for an N frame sequence, the total number of bytes in the file is N × width × height.

PNT Files

These files contain sprite point trajectories. They are text files which contain ASCII floating point numbers indicating how the sprite is to be warped on each frame. The format is as follows: The first line indicates the number of points P. This can be in the range 1 to 4. The next line contains P pairs of numbers, the first of the pair being the x coordinate and the second being the y coordinate. These numbers are no longer used by the reference software and can be zero. The rest of the file is formatted with 2×P+1 numbers on each line. The first of these is the frame number. The remaining P pairs indicate the x and y coordinates of the warping points for this frame. These points indicate the locations in the sprite image that correspond to the relevant P corners of the display window. An example of the format for the case of P=4 is given below:

 

4

 -393.00 -88.00   -393.00 361.00   362.00 -88.00   362.00 361.00

0000   135.00 0.00   331.00 0.00   135.00 237.00   331.00 237.00

0001   134.55 0.44   330.74 0.46   134.46 236.92   330.88 236.78

0002   135.11 0.88   330.49 0.92   134.93 236.84   330.78 236.56

0003   135.56 1.17   330.31 1.26   135.43 236.70   330.66 236.29

0004   135.02 1.47   330.13 1.61   134.91 236.58   330.54 236.03

 

The encoder ignores the data values on the second line.
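A reader for this trajectory format can be sketched in a few lines of Python (illustrative only; the reference software's own parser is C++):

```python
def parse_pnt(text):
    """Parse the ".pnt" trajectory format described above.

    Returns (P, [(frame_number, [(x, y), ...]), ...]). The second
    line (the unused sprite reference points) is skipped, just as
    the encoder ignores it.
    """
    lines = [ln.split() for ln in text.strip().splitlines()]
    p = int(lines[0][0])
    frames = []
    for fields in lines[2:]:            # skip the count line and the ignored line
        frame = int(fields[0])
        coords = [float(v) for v in fields[1:]]
        points = list(zip(coords[0::2], coords[1::2]))
        assert len(points) == p, "each line must carry P (x, y) pairs"
        frames.append((frame, points))
    return p, frames
```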

SPT Files

Sprite files contain sprite pixel and alpha mask information. The file format is binary with the following header:

 

struct sprite_header {

char magic[3];

long left, top, right, bottom;

long alpha_mode;

};

 

The value of magic is { ‘S’, ‘P’, ‘T’ }. The next four words indicate the sprite dimensions. After this is the alpha mode, which can be 0, 1, or 2, indicating rectangular, binary, or grayscale sprites. After this header structure come the YUV planes in 4:2:0 format, followed by the alpha plane if the alpha mode indicates a non-rectangular sprite. The YUV and alpha data is stored in the same format as described previously. There is only one frame present in a sprite file.
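Reading this header can be sketched in Python as follows. Note two assumptions: the long fields are 32-bit little-endian, and no padding bytes follow the 3-byte magic; the layout actually written by a given compiler may differ.

```python
import struct

def read_spt_header(data):
    """Unpack the sprite_header struct shown above from the first
    bytes of a ".spt" file. Assumes 32-bit little-endian "long"
    fields packed directly after the 3-byte magic (no padding).
    """
    if data[0:3] != b"SPT":
        raise ValueError("not a sprite file")
    left, top, right, bottom, alpha_mode = struct.unpack_from("<5l", data, 3)
    return {"left": left, "top": top, "right": right,
            "bottom": bottom, "alpha_mode": alpha_mode}
```

After the header, the single YUV frame (and, for alpha modes 1 and 2, the alpha plane) follows in the plane order described earlier.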

 

Decoding a Sequence

The syntax of the decoder command varies depending on what kind of sequence is being decoded.

Basic Video Sequence

The decoder takes a bitstream file and decodes it, generating a “.yuv” file and, if object-based decoding is enabled, also a “.seg” file and possibly one or more “.aux” files. The following command line should be used:

 

            decoder bit_file.cmp out_file width height

 

where bit_file.cmp is the name of the compressed bitstream file, and out_file is the root name of the output file to create. The “.yuv”, “.seg”, or “.aux” extensions will be added by the decoder as necessary. Here, width and height represent the width and height of the destination image. This can be larger than the actual encoded image size if necessary; however, it cannot be smaller. With the current implementation it is necessary to know the size of the encoded sequence in advance in order to specify it on the command line.

Video Sequences with Enhancement Layers

To decode a bitstream with an enhancement layer, use the command line:

 

            decoder bit_file.cmp bit_file_enh.cmp out_file base_width base_height

 

where bit_file_enh.cmp is the name of the enhancement layer bitstream file, and out_file is the root name of the output file to create.

 

To decode a bitstream with an enhancement layer of type Object Based Spatial Scalability, use the command line:

 

            decoder bit_file.cmp bit_file_enh.cmp out_file base_width base_height enhance_width enhance_height

 

where bit_file_enh.cmp is the name of the enhancement layer bitstream file, and out_file is the root name of the output file to create. The base layer and enhancement layer display-frame sizes are also required, given as base_width, base_height, enhance_width, and enhance_height.

 

More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode: norio@imgsl.mkhar.sharp.co.jp, shson@sait.samsung.co.kr, nagumo@av.crl.sony.co.jp

 

Visual Texture Coded Images

To decode a bitstream containing only a single image that has been compressed using the wavelet still-image codec, use one of the following command lines:

 

decoder –vtc bit_file.cmp out_file.yuv width height spatial_layer snr_layer

decoder –vtc bit_file.cmp out_file.yuv width height spatial_layer snr_layer shape_layer fullsize_out

decoder –vtc bit_file.cmp out_file.yuv width height spatial_layer snr_layer shape_layer fullsize_out start_tile_id end_tile_id

 

The meaning of these parameters is as follows:

 

bit_file.cmp

Bitstream file to decode.

out_file.yuv

File for output.

width, height

Dimensions of output image.

spatial_layer

?

snr_layer

?

shape_layer

The number of spatial layers for shape. This value must be greater than or equal to the index of the wavelet decomposition layer used for the last spatial scalability layer (determined by spatial_layer above), and is limited to wavelet_decomposition_levels.

fullsize_out

A flag enabling full-size output. Set it to 1 to generate a full-size output image.

start_tile_id

?

end_tile_id

?

 

More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode:

norio@imgsl.mkhar.sharp.co.jp, dscho@sait.samsung.co.kr, liang@ti.com, shipeng@sarnoff.com, hjlee@sarnoff.com

 

Encoding a Sequence

The syntax of the encoder command is as follows:

 

encoder –version

encoder params.par

 

The first form of the command prints the current version number of the encoder executable. The second form runs the encoder using the encode options supplied in parameter file params.par. Additional parameter files may be needed, depending on the type of encode tools that are being used. Example parameter files are found in the \demo directory of the release.

Parameter File Format

Parameter files ending in the “.par” extension contain name-value pairs which set the values for encoder parameters. Parameter files must start with the line:

 

!!!MS!!!

 

Numeric parameters are specified as:

 

            Example.Version = 905

 

String parameters are specified with quotes as:

 

Example.FilePrefix = "brea_cif"

 

Lists are specified using braces and commas:

 

Example.SomeList = {1, 2, 3, 4, 5}

 

Some name-value pairs can be specified separately for the base and enhancement layers. In this case, the base layer parameter is:

 

Texture.QuantStep.PVOP [0] = 16

 

and the enhancement layer parameter is:

 

Texture.QuantStep.PVOP [1] = 16

 

Comments can be added by using the “//” form, as in C++, in which case, the remainder of the line is considered to be a comment.
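A minimal reader for this name-value format can be sketched in Python (illustrative only; the encoder's real parser is C++ and is stricter about parameter types):

```python
import re

def parse_par(text):
    """Read the parameter format described above: a !!!MS!!! header,
    '//' comments, integer values, quoted strings, and brace lists.
    Layer suffixes such as 'Name [0]' are kept as part of the key.
    """
    if not text.startswith("!!!MS!!!"):
        raise ValueError("missing !!!MS!!! header")
    params = {}
    for line in text.splitlines()[1:]:
        line = line.split("//", 1)[0].strip()   # strip comments
        if not line:
            continue
        key, value = (s.strip() for s in line.split("=", 1))
        if value.startswith('"'):
            params[key] = value.strip('"')
        elif value.startswith("{"):
            params[key] = [int(v) for v in re.findall(r"-?\d+", value)]
        else:
            params[key] = int(value)
    return params
```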

Basic Encoding

In order to encode a sequence it is necessary to prepare a parameter file. The following table describes the essential parameters used for coding most sequences:

 

Version

Version number of the parameter file. Please see the example parameter file for the current version number.

VTC.Enable

When set to 1, this flag causes the encoder to ignore the rest of the parameter file and instead load the visual texture coding parameter file.

VTC.Filename

The filename for the visual texture coding parameter file.

VersionID

The version number of the bitstream. This must be set to 2 if any MPEG-4 Version 2 tools are used for encoding; otherwise a Version 1 VOL header will be produced.

Source.Width

Source.Height

The dimensions of the source sequence or image, in pixels. These numbers must be even because of 4:2:0 chroma sampling.

Source.FirstFrame

Source.LastFrame

The physical first and last frame numbers to encode. The very first frame in the YUV file is frame zero. For example, to encode the first ten frames of a sequence at the original frame rate, use 0 and 9 for these numbers.

Source.ObjectIndex.First

The object index selects which object in a multiple object sequence to encode. When a “.seg” file is used, it is the pixel value of the required foreground object in that file. For a binary alpha plane, this value is typically 255.

Source.FilePrefix

The name of the source file to encode, without any filename extension. The encoder adds “.yuv” or “.seg” as necessary.

Source.Directory

The directory containing the source file. If the source sequence is not present in this directory, the number of the object index is next appended to this directory name as a subdirectory. The encoder uses this new location to find the source sequence files.

Source.BitsPerPel

The number of bits per pel when using N-BIT encoding. Typically this is set to 8. N-BIT coding only works if the __NBIT_ flag was set during compile time.

Source.Format

At present, this must be set to “420”.

Source.FrameRate [0]

The number of source file frames per second.

Source.SamplingRate [0]

This allows temporal sub-sampling of the source file. The encoder will only encode every Nth frame of the source sequence. When set to 1, every frame of the source file is encoded. When set to 3, every third frame is encoded, starting from the frame number specified in Source.FirstFrame.

Output.Directory.Bitstream

The directory in which to store the encoded bitstream. The number of the object index is appended to this directory name as a subdirectory.

Output.Directory.DecodedFrames

The directory in which to store the reconstructed frames (anticipated decoder output). The number of the object index is appended to this directory name as a subdirectory.

RateControl.Type [0]

Type of rate control to use. Can be “None”, “MP4”, or “TM5”.

RateControl.BitsPerSecond [0]

Number of bits per second when encoding with rate control enabled.

Quant.Type [0]

Quantiser type. Can be “MPEG” or “H263”.

GOV.Enable

This flag enables the generation of Group of VOPs headers when set to 1.

GOV.Period

Specifies how often to send GOV headers. The figure indicates the number of VOPs to insert between two GOV headers.

Texture.IntraDCThreshold [0]

This parameter has the values 0 to 7 and controls the way that texture coefficients are coded. The value 0 means never code Intra DC values using the AC coefficient VLC table. The value 7 means always do that. The value 1 means do that when QP>=13, 2 when QP>=15, 3 when QP>=17, 4 when QP>=19, 5 when QP>=21, and 6 when QP>=23.

Texture.QuantStep.IVOP [0]

Quantiser step size for I-VOPs.

Texture.QuantStep.PVOP [0]

Quantiser step size for P-VOPs.

Texture.QuantStep.BVOP [0]

Quantiser step size for B-VOPs.

Texture.QuantMatrix.Intra

   .Enable [0]

This flag enables loading of a non-default Intra quantisation matrix when using MPEG quantisation. It should be set to 1 to load the matrix.

Texture.QuantMatrix.Intra [0]

The Intra quantisation table. This is specified as 64 comma separated integers enclosed by “{“ and “}”.

Texture.QuantMatrix.Inter

   .Enable [0]

This flag enables loading of a non-default Inter quantisation matrix when using MPEG quantisation. It should be set to 1 to load the matrix.

Texture.QuantMatrix.Inter [0]

The Inter quantisation table. This is specified as 64 comma separated integers enclosed by “{“ and “}”.

Motion.RoundingControl.Enable

This flag enables the use of rounding control for P-VOPs when set to 1.

Motion.RoundingControl.StartValue

This indicates the first value for rounding control when enabled (either 0 or 1). Rounding control is a bitstream bit that switches between 0 and 1 on successive predicted frames.

Motion.PBetweenICount

The number of predicted P-VOPs between each I-VOP. If this value is less than zero, then there will be one I-VOP at the start of the sequence and then all successive frames will be predicted.

Motion.BBetweenPCount

The number of B-VOPs between each P-VOP. This value is 2 for an IBBPBBPBBP sequence.

Motion.SearchRange [0]

The search range to use during motion estimation (in pels). If this is large, then encoding will be slow, especially when using B-VOPs, because the P-VOP search range is multiplied by 1+BBetweenPCount.

Motion.SearchRange

   .DirectMode [0]

The search range to use during direct mode B-VOP motion estimation (in half pels). This value is typically small as it represents an offset from the P-VOP motion vector.

Motion.AdvancedPrediction

   .Enable [0]

This flag enables Overlapped Block Motion Compensation.

Motion.SkippedMB.Enable

This flag allows the encoder to skip macroblocks in P-VOPs that do not change between frames. This improves efficiency but can cause artifacts when used with B-VOPs if there is a transient occluding object.

Motion.UseSourceForME

   .Enable [0]

When set to 1 this flag causes the original reference frame to be used for motion estimation instead of the encoded (noisy) reference frame.

Motion.DeblockingFilter.Enable

This flag enables the use of the MPEG de-blocking filter. However, the filter is currently not implemented and the flag should be set to 0.

Motion.ReadWriteMVs [0]

This parameter controls the reading or writing of motion vector files to help with experiments. When set to “Read”, motion vectors are read from a file rather than generating them by motion estimation. When set to “Write”, the estimated motion vectors are saved to the file for future use. The value “Off” disables this behavior.

Motion.ReadWriteMVs

   .Filename [0]

The filename to which motion vectors should be read or written.

Trace.CreateFile.Enable

Enable the generation of a trace file, with “.trc” extension. Trace files are large and slow down encoding. The compile flag __TRACE_AND_STATS_ must be set to enable this function.

Trace.DetailedDump.Enable

Enable very detailed bitstream tracing.
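The QP-dependent behaviour of Texture.IntraDCThreshold described above follows a simple pattern (values 1 through 6 correspond to QP cutoffs 13, 15, 17, 19, 21, 23), which this Python sketch makes explicit (illustrative only):

```python
def use_ac_table_for_intra_dc(threshold, qp):
    """Decide whether Intra DC values are coded with the AC-coefficient
    VLC table, per Texture.IntraDCThreshold: 0 = never, 7 = always,
    and for values 1..6 the switch happens when QP >= 11 + 2*threshold.
    """
    if threshold == 0:
        return False
    if threshold == 7:
        return True
    return qp >= 11 + 2 * threshold
```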

 

When encoding a sequence, the encoder looks to see if a source file is present at the location specified by the Source.Directory parameter. If the file is not present, then the object index is added to the directory name and the file must be found in this location. For example, if the encode directory is \myencode, the object index is 2 and the source file prefix is fore_cif, then \myencode is first checked for fore_cif.yuv and if not present, the source file \myencode\02\fore_cif.yuv is used. The object index is also used to create the name of the directory in which the reconstructed frames are stored. To encode using frame-based mode, the value of Alpha.Type should be set to “None”.
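The source-file lookup just described can be sketched as follows (Python, illustrative only; the two-digit zero-padding of the object index matches the \myencode\02\fore_cif.yuv example above):

```python
import os

def find_source_file(directory, prefix, object_index, ext=".yuv"):
    """Mimic the encoder's lookup: try the source directory itself
    first, then fall back to the object-index subdirectory
    (zero-padded to two digits).
    """
    direct = os.path.join(directory, prefix + ext)
    if os.path.exists(direct):
        return direct
    return os.path.join(directory, "%02d" % object_index, prefix + ext)
```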

 

The source file may be encoded with different frame-rates by temporal down sampling. If the first ten frames of a 30Hz sequence are to be encoded at 30Hz, use the following:

 

Source.FirstFrame = 0

Source.LastFrame = 9

Source.FrameRate [0] = 30

Source.SamplingRate [0] = 1

 

To encode ten frames from the same source file, but this time at 10Hz, use the following:

 

Source.FirstFrame = 0

Source.LastFrame = 29

Source.FrameRate [0] = 30

Source.SamplingRate [0] = 3
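The frame selection implied by these parameters can be sketched in Python (illustrative only): the encoder reads every Source.SamplingRate-th physical frame between Source.FirstFrame and Source.LastFrame inclusive.

```python
def encoded_frames(first, last, sampling_rate):
    """Physical source-frame numbers the encoder reads, given
    Source.FirstFrame, Source.LastFrame and Source.SamplingRate.
    """
    return list(range(first, last + 1, sampling_rate))
```

Both parameter sets above therefore encode ten frames: 0 through 9 in the first case, and 0, 3, 6, …, 27 in the second.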

 

Rate Control

There are two types of rate control: MPEG-4 rate control and TM5 rate control. In MPEG rate control, the quantizer is only changed on a frame basis, whereas TM5 rate control is macroblock-based and so macroblock DQUANT may be non-zero. For both rate control types, RateControl.BitsPerSecond must be set to indicate the number of bits per second. Note that in order for rate control to function correctly, it is also necessary to set Source.FrameRate to the actual frame rate of the original sequence, and Source.SamplingRate to the number of frames to skip if encoding this sequence at a lower rate than it exists in the source YUV file.

 

MPEG rate control is available, but only works correctly under limited situations. It can only be used for sequences that start with one I-VOP and the rest of the sequence is P-VOPs. It does not work with B-VOPs. Also, it is necessary to set the values of Texture.QuantStep.IVOP and Texture.QuantStep.PVOP manually to set the quantizer value for the initial I-VOP and the first P-VOP. For the remaining P-VOPs, the rate control will take over and set the quantizer. To initialize these quantizer values correctly, choose a mid-range value, e.g. 20 for the I-VOP and 15 for the P-VOP, then encode the sequence. At the end, the encoder will report the average quantizer value. Run the sequence again, using this value for the P-VOP quantizer. Adjust the I-VOP quantizer as you wish. If the MPEG rate control is functioning correctly and the bit-rate is not set too low, it will not skip frames.

 

TM5 rate control is the recommended form of rate control, as it is free from bugs and works correctly for I-VOPs, P-VOPs, and B-VOPs without user intervention. It is not necessary to set the quantizer values by hand. However, if you do not want the quantizer step-size to change on every macroblock, you must use MPEG rate control instead. If the bit-rate is set too low, TM5 rate control will eventually set the quantizer to 31; it will not skip frames. TM5 rate control cannot be used with non-rectangular VOPs.

 

Object-Based Encoding

Object-based encoding makes use of one or more alpha planes to allow for video objects that are not rectangular. The following parameters control the encoding of these arbitrarily shaped VOPs:

 

Alpha.Type

This indicates the type of object based coding. This can be “None” for frame-based, “Binary” for the case when there is only a binary mask, “Gray” for true grayscale alpha transparency coding, and “Shape Only” for the case when there is only a binary mask and no video texture.

Alpha.Binary.RoundingThreshold

Controls the amount of error allowed in lossy binary shape coding. A value as high as 128 will produce a large loss. If set to 0, coding is lossless, irrespective of the value of the size conversion parameter below.

Alpha.Binary.SizeConversion.Enable

Controls lossy binary shape coding. If set to 1, the encoder is allowed to down-sample alpha blocks if necessary.

Alpha.Binary.FromGrayThreshold

Used when the encoder creates its own binary mask from a grayscale alpha plane. Grayscale pixels with a value greater than or equal to this parameter are treated as non-transparent; pixels below it are treated as transparent in both the grayscale and binary planes.

Alpha.QuantStep.IVOP [0]

Grayscale alpha quantisation step size for I-VOPs.

Alpha.QuantStep.PVOP [0]

Grayscale alpha quantisation step size for P-VOPs.

Alpha.QuantStep.BVOP [0]

Grayscale alpha quantisation step size for B-VOPs.

Alpha.QuantDecouple.Enable [0]

Normally, when rate control is enabled, grayscale alpha quantisation will vary with the texture quantiser. When set to 1, this flag decouples the alpha quantiser and causes it to be fixed.

Alpha.QuantMatrix.Intra.Enable [0]

This flag enables loading of a non-default grayscale alpha Intra quantisation matrix when using MPEG quantisation. It should be set to 1 to load the matrix.

Alpha.QuantMatrix.Intra [0]

The grayscale alpha Intra quantisation table. This is specified as 64 comma separated integers enclosed by “{“ and “}”.

Alpha.QuantMatrix.Inter.Enable [0]

This flag enables loading of a non-default grayscale alpha Inter quantisation matrix when using MPEG quantisation. It should be set to 1 to load the matrix.

Alpha.QuantMatrix.Inter [0]

The grayscale alpha Inter quantisation table. This is specified as 64 comma separated integers enclosed by “{“ and “}”.

Alpha.MAC.Enable

This flag enables the use of Multiple Auxiliary Components. When set to 1, multiple planes can be encoded in the same way as grayscale alpha channels.

Alpha.ShapeExtension

The type number, from 0 to 15, of the MAC mode. Type 0 is the usual grayscale alpha channel mode. Other mode numbers are described in “Table V2-1 Semantic meaning of video_object_layer_shape_extension” of the document ISO/IEC 14496-2 Visual Amendment 1.

 

Binary Shape Coding

 

Binary shape coding compresses a binary mask that defines a foreground video object. There are two modes of binary shape coding. In the first, the mask shape is compressed on its own, with no other data present; this is shape-only mode. In the second, the video object texture (YUV data) is also compressed.

 

To use binary shape coding, it is necessary to supply a texture file (“.yuv” file) and a segmentation mask file (“.seg” file). This is true even in shape-only mode: the texture file is read, but its contents are ignored. The segmentation mask files are formatted as described in the section on data file formats.

 

Two ways of encoding binary shape sequences are possible. In the first, a segmentation file is created that contains a binary mask, with either 0 (background) or 255 (mask) for each pixel. A number of these files can then be used to specify the objects in the same sequence. If there are three objects, e.g. background, foreground A, and foreground B, we can give these three VOPs index numbers 0, 1, and 2. We can store the three “.seg” files and the corresponding “.yuv” files in separate directories:

 

\myencode\00\sequence_cif.yuv

Background texture.

\myencode\00\sequence_cif.seg

Background mask.

\myencode\01\sequence_cif.yuv

Foreground A texture.

\myencode\01\sequence_cif.seg

Foreground A mask.

\myencode\02\sequence_cif.yuv

Foreground B texture.

\myencode\02\sequence_cif.seg

Foreground B mask.

 

Then, in order to encode the background, the encode settings should be:

 

Alpha.Type = “Binary”

Source.ObjectIndex.First = 0

Source.FilePrefix = “sequence_cif”

Source.Directory = “\myencode”

 

Setting Source.ObjectIndex.First to 1 or 2 is sufficient to encode the other VOPs.
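For example, to encode foreground A from the directories listed above, only the object index changes:

Alpha.Type = "Binary"

Source.ObjectIndex.First = 1

Source.FilePrefix = "sequence_cif"

Source.Directory = "\myencode"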

 

The second method of encoding binary shape sequences is to have a common segmentation file that contains all the segmentation information. In this file, background pixels would be marked as 0, foreground A as 1, and foreground B as 2. The following files would be needed:

 

\myencode\sequence_cif.yuv

Complete texture.

\myencode\sequence_cif.seg

Segmentation mask.

 

To encode the background, foreground A, or foreground B, the encode settings would then be identical to the previous case. This is because the encoder first checks the path specified by Source.Directory for the files to encode, before adding on the object index.

 

Lossy binary shape coding can be achieved by setting the parameters Alpha.Binary.RoundingThreshold and Alpha.Binary.SizeConversion.Enable. The first of these dictates how much loss is acceptable for each shape block and ranges from 0 to 256. See the description in the table above for more information.

 

Grayscale Alpha Coding

 

Grayscale coding involves the compression of (1) a YUV video plane, (2) a binary shape mask, and (3) a grayscale alpha plane. Compression of continuous alpha information is important for anti-aliased compositing and for creating effects with shadows or transparency.

 

To encode a grayscale alpha sequence, use the same procedure as with binary shape coding. The only difference is that the “.seg” file contains values in the range 0 to 255 which represent gradations of transparency from transparent to opaque. The object index is not used to create the binary mask. Instead, the encoder thresholds the alpha plane at the value dictated by Alpha.Binary.FromGrayThreshold and creates its own binary mask from this.
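For example, the parameter file for a grayscale alpha sequence might contain the lines below (the threshold and quantizer step sizes are illustrative only):

Alpha.Type = "Gray"

Alpha.Binary.FromGrayThreshold = 64

Alpha.QuantStep.IVOP [0] = 16

Alpha.QuantStep.PVOP [0] = 16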

 

Both the encoder reconstructed files and the decoder output files are named slightly differently from the source files when using grayscale alpha coding. The binary mask is saved as a “.seg” file, and the grayscale output is saved as sequence_cif.0.aux.

 

Multiple Auxiliary Components

 

When multiple auxiliary components are used, the encoding procedure is similar to grayscale alpha coding. MAC allows more than one alpha channel to be encoded, provided there is a single binary shape file. It is necessary to provide a “.seg” file to serve as the binary mask and to provide one or more “.aux” files as the auxiliary components to encode. If three components are used, then the following files are needed:

 

\myencode\sequence_cif.yuv

Texture.

\myencode\sequence_cif.seg

Binary mask.

\myencode\sequence_cif.0.aux

Aux component 0

\myencode\sequence_cif.1.aux

Aux component 1

\myencode\sequence_cif.2.aux

Aux component 2

 

To enable MAC, it is necessary to set Alpha.MAC.Enable = 1 and to specify the mode using parameter Alpha.ShapeExtension.
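For example (the mode number is illustrative only; choose the Alpha.ShapeExtension type from Table V2-1 of Amendment 1 that matches the number and meaning of your auxiliary components):

Alpha.MAC.Enable = 1

Alpha.ShapeExtension = 1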

Error Resilient Encoding

MPEG-4 Visual provides error robustness and resilience to allow accessing of image or video information over a wide range of storage and transmission media. The error resilience tools can be divided into three major categories: resynchronization (video packet), data partitioning and reversible VLC.

 

The encoder and decoder support the MPEG-4 error resilient syntax. These syntax elements can be enabled and controlled by the following parameters:

 

ErrorResil.RVLC.Enable [0]

Turn on reversible VLCs. This allows error recovery by reading the bitstream backwards. To use RVLCs, the data partitioned syntax must also be enabled by setting the parameter below.

ErrorResil.DataPartition.Enable [0]

Turn on data partitioning. In this case the motion and texture data are separated from each other in the bitstream.

ErrorResil.VideoPacket.Enable [0]

Enable breaking up the bitstream into video packets to provide periodic resynchronization markers throughout the bitstream.

ErrorResil.VideoPacket.Length [0]

Length of each video packet in bits. When a packet exceeds this length a new packet is created by inserting a resync marker before the next macroblock.

ErrorResil.AlphaRefreshRate

When video packets are enabled and binary or grayscale alpha information is present, inter coded VOPs can be sent with intra coded shape information for refresh purposes. This parameter controls how often this happens. It indicates the number of P-VOPs to allow between each coding of shape in intra mode.
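As an illustration, the Version 1 error resilience tools described above can be combined as follows (the packet length is illustrative only):

ErrorResil.VideoPacket.Enable [0] = 1

ErrorResil.VideoPacket.Length [0] = 480

ErrorResil.DataPartition.Enable [0] = 1

ErrorResil.RVLC.Enable [0] = 1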

 

The encoder will create an error resilient bitstream when the correct elements are enabled. However, the decoder will not be able to decode a stream containing actual errors, because error recovery is not implemented.

NEWPRED

NEWPRED is an error resilience tool. It uses upstream messages to indicate which NEWPRED segments are correctly decoded and which are erroneously decoded. The encoder, which receives these upstream messages, uses only correctly decoded parts for prediction in inter-frame coding. This prevents temporal error propagation without the insertion of intra-coded macroblocks and improves picture quality in error-prone environments.

 

In NEWPRED, the encoder and the decoder may have multiple reference VOP memories, and the expanded choice of reference VOP prevents temporal error propagation. NEWPRED may use backward channel messages from the decoder to the encoder. In this case, the encoder selects a reference VOP taking into account the decoder error situation from these messages. This lets the decoder recover quickly from the error state. Note that the encoder controls the strategy of selecting reference pictures. The strategy may be independent of backward channel messages.

 

The unit of changing reference VOPs is called the NEWPRED segment. If the segment type is “VOP”, the encoder and the decoder change reference VOPs on every VOP; if it is “Video Packet”, they change on every Video Packet. In the latter case, the position of the first macroblock in every Video Packet is fixed between I-VOPs. NEWPRED works independently of the other error resilience tools (Resync Markers, Data Partitioning, and Reversible VLC) and also works in combination with them. In this software, the encoder sets the reference VOP to use according to an additional file, whose name is given by the parameter “Newpred.Filename”.

 

Newpred.Enable [0]

Enable the use of NEWPRED mode.

Newpred.SegmentType [0]

This indicates the NEWPRED segment type: “VOP” or “VideoPacket”. If set to “VOP”, the encoder changes reference VOPs every VOP and if set to “VideoPacket”, the encoder changes reference VOPs every Video Packet.

Newpred.Filename [0]

This indicates the name of the file from which the encoder gets a reference VOP number. An example of that file is example.ref in the \demo directory.

Newpred.SliceList [0]

This indicates the macroblock numbers of the first macroblock in each NEWPRED segment. If Newpred.SegmentType is set to “VOP”, this must be set to “0”. If Newpred.SegmentType is set to “VideoPacket”, this is set to a comma-separated list of the macroblock numbers of the first macroblock in each Video Packet. For example, when these macroblock numbers are 0, 33, and 66, this is set to “0,33,66”.
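Putting the NEWPRED parameters together, a Video Packet configuration following the descriptions above might look like:

Newpred.Enable [0] = 1

Newpred.SegmentType [0] = "VideoPacket"

Newpred.Filename [0] = "example.ref"

Newpred.SliceList [0] = "0,33,66"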

 

Interlace Encoding

Interlaced mode is used when the source video was interlaced and there is line tearing when two fields are viewed as a progressive scan frame. Alternate lines are grouped together during compression for more efficiency. The relevant parameters are:

 

Motion.Interlaced.Enable [0]

Turn on interlaced coding.

Motion.Interlaced.TopFieldFirst.Enable [0]

This parameter sets the top_field_first flag in each VOP header, indicating that the top field of the VOP is the first field to be displayed. If not set, the bottom field of the reconstructed VOP is the first field to be displayed.

Motion.Interlaced.AlternativeScan.Enable [0]

This parameter controls the alternate_vertical_scan_flag in each VOP header. When set to 1, the alternate vertical scan is used for each VOP; otherwise, the adaptive scan is used.

 

Interlaced coding can be used in frame-based mode, or with binary or grayscale shape. When coding interlaced content with binary shape, the actual shape mask is coded using the normal progressive method.
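For example, interlaced coding with the top field displayed first might be enabled as follows:

Motion.Interlaced.Enable [0] = 1

Motion.Interlaced.TopFieldFirst.Enable [0] = 1

Motion.Interlaced.AlternativeScan.Enable [0] = 0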

N-Bit Encoding

N-Bit mode allows for non-8-bit pixel widths. A typical example would be to use 12-bit words for better luminance quality. The following parameters control N-Bit mode:

 

Not8Bit.Enable

Flag to enable the use of N-Bit mode.

Not8Bit.QuantPrecision

Number of bits Q used to represent the texture quantizer. The actual quantizer must lie in the range 1 to (2^Q - 1).

 

In N-Bit mode, the representation of YUV information is based on 16-bit words. It is necessary to re-compile the encoder with the __NBIT_ flag set. The source YUV files must contain 16-bit pixels rather than 8-bit pixels. More information can be found in the section on file formats.
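The quantizer-range rule above can be sketched as follows. This is an illustration only; it assumes the default 8-bit case corresponds to Q = 5, which yields the familiar quantizer range 1 to 31:

```python
# Illustration of the N-Bit quantizer range: with Q bits of precision
# (Not8Bit.QuantPrecision), the texture quantizer lies in 1..(2^Q - 1).
def quant_range(q_bits):
    return (1, (1 << q_bits) - 1)

print(quant_range(5))   # (1, 31): the usual 8-bit range
print(quant_range(6))   # (1, 63)
```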

Temporal Scalability Encoding

 

 

Scalability

One of "None", "Temporal", or "Spatial". This parameter selects the type of scalability.

Scalability.Temporal.PredictionType

Range 0 to 4.

Scalability.Temporal.EnhancementType

One of "Full", "PartC" or "PartNC" can be used for this parameter.

 

More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode:

norio@imgsl.mkhar.sharp.co.jp

 

Spatial Scalability Encoding

The parameters used for spatial scalability encoding are described in the following table:

 

 

Scalability.Spatial.

 EnhancementType

One of "Full", "PartC", or "PartNC". Refer to the following table for the combined usage of Scalability.Spatial.UseRefShape.Enable and Scalability.Spatial.EnhancementType.

Scalability.Spatial.PredictionType

For the enhancement layer, P-VOP or B-VOP coding can be used; set this parameter to "PPP" or "PBB". Forward prediction (P-VOPs) and backward prediction (B-VOPs) are used as the spatial prediction, with the temporally coincident VOP in the base layer as the reference VOP. The forward prediction in the case of B-VOPs is used as the temporal prediction from the enhancement layer; its reference VOP is the most recently decoded VOP of the enhancement layer.

Scalability.Spatial.Width

Enhancement layer display-frame width.

Scalability.Spatial.Height

Enhancement layer display-frame height.

Scalability.Spatial.HorizFactor.N

This is a 5-bit unsigned integer that forms the numerator of the ratio used for horizontal spatial re-sampling in scalability. The value of zero is forbidden.

Scalability.Spatial.HorizFactor.M

This is a 5-bit unsigned integer that forms the denominator of the ratio used for horizontal spatial re-sampling in scalability. The value of zero is forbidden.

Scalability.Spatial.VertFactor.N

This is a 5-bit unsigned integer that forms the numerator of the ratio used for vertical spatial re-sampling in scalability. The value of zero is forbidden.

Scalability.Spatial.VertFactor.M

This is a 5-bit unsigned integer that forms the denominator of the ratio used for vertical spatial re-sampling in scalability. The value of zero is forbidden.

Scalability.Spatial.

 UseRefShape.Enable

This is a one-bit flag that indicates the procedure used to decode binary shape for spatial scalability. Refer to the following table for the combined usage of Scalability.Spatial.UseRefShape.Enable and Scalability.Spatial.EnhancementType.

Scalability.Spatial.

 UseRefTexture.Enable

When this flag is set, no update for texture is done. Instead, the available texture in the layer denoted by ref_layer_id will be used.

Scalability.Spatial.Shape.

 HorizFactor.N

This is a 5-bit unsigned integer which forms the numerator of the ratio used for horizontal spatial re-sampling in shape scalability. The value of zero is forbidden.

Scalability.Spatial.Shape.

 HorizFactor.M

This is a 5-bit unsigned integer which forms the denominator of the ratio used for horizontal spatial re-sampling in shape scalability. The value of zero is forbidden.

Scalability.Spatial.Shape.

 VertFactor.N

This is a 5-bit unsigned integer which forms the numerator of the ratio used for vertical spatial re-sampling in shape scalability. The value of zero is forbidden.

Scalability.Spatial.Shape.

 VertFactor.M

This is a 5-bit unsigned integer which forms the denominator of the ratio used for vertical spatial re-sampling in shape scalability. The value of zero is forbidden.

 

Scalability.Spatial.UseRefShape.Enable and Scalability.Spatial.EnhancementType usage is described in the following table:

 

Each entry below gives the Scalability.Spatial.EnhancementType and Scalability.Spatial.UseRefShape.Enable values, the alpha usage in the base and enhancement layers, and the resulting enhancement layer coding.

"Full", UseRefShape.Enable = 0 (base layer: Binary; enhancement layer: Binary). Scalable shape and texture coding.

"Full", UseRefShape.Enable = 1 (base layer: Binary; enhancement layer: Binary). Scalable texture coding with up-sampled base layer shape.

"PartC" or "PartNC", UseRefShape.Enable = 0 (base layer: Binary; enhancement layer: Binary). Partial region (ROI) scalable shape and texture coding with spatial composition ("PartC") or without spatial composition ("PartNC").

"PartC" or "PartNC", UseRefShape.Enable = 1 (base layer: Rectangular/Binary; enhancement layer: Binary). Scalable texture coding and non-scalable shape coding with background composition ("PartC") or without background composition ("PartNC").

 

Sprite Encoding and Global Motion Compensation

In order to encode a basic or low-latency sprite it is necessary to prepare a sprite texture file (“.spt” file) and a warping points file (“.pnt” file). The formats of these files are described in the section on file formats. The following parameters control the encoding of sprites:

 

Sprite.Type

This indicates the coding mode for sprites. This can be "None" if no sprite is used, "Static" for static/low-latency sprite coding, or "GMC" for global motion compensation encoding.

Sprite.WarpAccuracy

This indicates the quantisation accuracy of motion vectors used in the warping process. This can be set to “1/2”, “1/4”, “1/8”, or “1/16”.

Sprite.Directory

The name of the directory containing the sprite texture file for encoding.

Sprite.Points

This indicates the number of points used in sprite warping. It can range from 0 to 4 and must be equal to the number of sprite points specified in the “.pnt” file for encoding. When the value is 0, the warping is the identity (stationary); when it is 1, translational motion is used; for 2 or 3, an affine transform is used. Four points correspond to a perspective transformation (sprites only; not supported for GMC).

Sprite.Points.Directory

The name of the directory containing the warping points file for encoding.

Sprite.Mode

This indicates the type of sprite coding. It can be "Basic" for basic sprites or "LowLatency" for low-latency sprites. The values "PieceObject" and "PieceUpdate" are for debugging purposes only and should not be used.

 

To enable basic or low-latency sprites, it is necessary to set Sprite.Type = "Static" and to specify the mode using parameter Sprite.Mode.
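For example, a basic sprite using all four warping points might be configured as follows (the directory names are illustrative only):

Sprite.Type = "Static"

Sprite.Mode = "Basic"

Sprite.WarpAccuracy = "1/2"

Sprite.Points = 4

Sprite.Directory = "\myencode\sprite"

Sprite.Points.Directory = "\myencode\sprite"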

 

Sprites can be rectangular or can make use of a binary mask. To encode with a binary mask, the sprite file must contain such a mask located after the texture data, and the value of Alpha.Type must be set to “Binary”.

 

Global Motion Compensation makes possible the encoding of the global motion of an object. To encode using GMC it is necessary to set Sprite.Type = “GMC”. The encoder will then calculate the global motion as needed. It is not necessary to provide “.spt” or “.pnt” files when using GMC.
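For example (the accuracy and point count are illustrative only; recall that four points are not supported for GMC):

Sprite.Type = "GMC"

Sprite.WarpAccuracy = "1/4"

Sprite.Points = 2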

 

GMC is supported in combination with Binary Shape encoding, Grayscale Shape encoding, Interlaced encoding, Version 1 Error Resilient encoding, Quarter-pel Motion Compensation, and SA-DCT.

Still Texture Encoding

Still texture mode is used to compress single images using a wavelet-based encoder. A shape coding technology is available that can be used to code still texture objects with arbitrary shape. In addition, there are a number of scalability options.

 

In order to encode using still texture mode, it is necessary to prepare a parameter file with the following lines:

 

Version = 902

VTC.Enable = 1

VTC.Filename = "vtcparfile.ctl"

 

This causes the encoder to start reading the file vtcparfile.ctl instead of the regular parameter file. This “.ctl” file contains a set of numerical or textual parameters separated by carriage returns. The parameters must all be present, and should be provided in the order given in the following table, which describes their use:

 

 

visual_object_verid

This indicates the visual version id: 1 for version 1, 2 for version 2.

YUV image file

The name of the input image file to be encoded.

texture_object_layer_shape

This indicates the shape type of a texture object layer: 0 for rectangular, 1 for binary shape.

alpha image file

The name of the file containing the binary mask to be encoded when texture_object_layer_shape is 1. This must be present even if a binary mask is not being used.

sto_constant_alpha

This flag should be set to 1 when the opaque alpha values of the binary mask are to be replaced by sto_constant_alpha_value. Otherwise it should be set to 0.

sto_constant_alpha_value

This is an 8-bit alpha value to replace the opaque pixels in the binary alpha mask. The value ‘0’ is forbidden.

alpha_threshold

This value controls the amount of error allowed in lossy binary shape coding. A value as high as 128 will produce a large loss. For scalable shape coding this value is always set to 0, i.e., lossless coding is performed.

change_conv_ratio_disable

When set to 0, the encoder is allowed to down-sample alpha blocks in the base layer if necessary; when set to 1, size conversion is disabled.

output bit file

Output file name for the compressed bitstream.

image format

Still image format: 1-monochrome, 3-YUV(4:2:0).

image width

The width of the input image in pels.

image height

The height of the input image in pels.

tiling_disable

???

tiling_jump_table_enable

???

tile_width

???

tile_height

???

wavelet_decomposition_levels

???

filter type

???

wavelet_download

???

uniform_wavelet_filter

???

Wavelet filters

???

quantization_type

1-single_quant  2-multi_quant

scan_direction

0-tree_depth  1-band_by_band

start_code_enable

0-disable  1-enable

target spatial levels

???

target SNR levels

???

target shape levels

The number of target spatial layers for shape. This value should be greater than or equal to the identification number of the wavelet decomposition layer used for the last spatial scalability layer (determined by target_spatial_level above), and less than or equal to wavelet_decomposition_levels.

fullsize_out

When set to 1, this flag enables the generation of a full-size output image.

Start tile Id ?

???

End tile Id ?

???

texture_error_resilience_disable

1: disable, 0: enable (v2)

packet size ?

???

segment size ?

???

DC quantization size for luminance

???

DC quantization size for chroma

???

No. of Spatial scalability

???

No. of SNR level

???

Quant_size for luma signal

???

Quant_size for chroma signal

???

 

 

More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode:

norio@imgsl.mkhar.sharp.co.jp, dscho@sait.samsung.co.kr, liang@ti.com, shipeng@sarnoff.com, hjlee@sarnoff.com

 

Complexity Estimation Support

Complexity estimation allows a decoder to allocate its resources efficiently when decoding a compressed stream. There is very little support for this in the Microsoft reference software. The only functionality available is for exercising the bitstream flags that are part of the scheme. When enabled, the encoder sends some dummy data in the bitstream. The dummy data is defined in source file vopseenc.cpp and can be changed if needed.

 

Complexity.Enable [0]

Global enable flag for complexity estimation.

Complexity.EstimationMethod [0]

Method of estimation. If set to 0 then use Version 1 method. If set to 1 then use Version 2 method.

Complexity.Opaque.Enable [0]

Flag enabling transmission of the number of shape blocks coded using opaque coding mode as a percentage of the total number of blocks.

Complexity.Transparent.Enable [0]

Flag enabling transmission of transparent mode data as above.

Complexity.IntraCAE.Enable [0]

Flag enabling transmission of Intra-CAE mode data as above.

Complexity.InterCAE.Enable [0]

Flag enabling transmission of Inter-CAE mode data as above.

Complexity.NoUpdate.Enable [0]

Flag enabling transmission of No-Update mode data as above.

Complexity.UpSampling.Enable [0]

Flag enabling transmission of up-sampling mode data as above.

Complexity.IntraBlocks.Enable [0]

Flag enabling transmission of intra block statistics.

Complexity.InterBlocks.Enable [0]

Flag enabling transmission of inter block statistics.

Complexity.Inter4VBlocks.Enable [0]

Flag enabling transmission of inter 4MV block statistics.

Complexity.NotCodedBlocks.Enable [0]

Flag enabling transmission of non-coded block statistics.

Complexity.DCTCoefs.Enable [0]

Flag enabling transmission of information about the number of DCT coefficients per block.

Complexity.DCTLines.Enable [0]

Flag enabling transmission of DCT 8×1 line statistics.

Complexity.VLCSymbols.Enable [0]

Flag enabling transmission of information about the average number of VLC symbols per macroblock.

Complexity.VLCBits.Enable [0]

Flag enabling transmission of information about the average number of VLC bits per symbol.

Complexity.APM.Enable [0]

Flag enabling transmission of statistics about the use of advanced prediction mode.

Complexity.NPM.Enable [0]

Flag enabling transmission of statistics about the use of normal prediction mode.

Complexity.InterpMCQ.Enable [0]

Flag enabling transmission of statistics about the use of interpolated blocks.

Complexity.ForwBackMCQ.Enable [0]

Flag enabling transmission of statistics about the use of predicted blocks.

Complexity.HalfPel2.Enable [0]

Flag enabling transmission of statistics about the use of 1D half-pel vectors.

Complexity.HalfPel4.Enable [0]

Flag enabling transmission of statistics about the use of 2D half-pel vectors.

Complexity.SADCT.Enable [0]

Flag enabling transmission of statistics about the use of SADCT blocks.

Complexity.QuarterPel.Enable [0]

Flag enabling transmission of statistics about the use of quarter pel vectors in 1D or 2D.

Quarter-pel Encoding

In quarter-pel encoding mode, motion vectors are estimated to quarter-pel accuracy. This mode can be enabled with the associated parameter file entry:

 

Motion.QuarterSample.Enable [0]

Set this flag to 1 to enable quarter-pel motion estimation.

 

SA-DCT

Shape-adaptive DCTs can be used during binary or grayscale alpha coding instead of 8×8 DCTs in combination with padding. Setting the associated parameter file entry enables this mode:

 

Texture.SADCT.Enable [0]

Set this flag to 1 to enable the use of shape adaptive DCTs when binary shape coding is enabled.

 

Reduced resolution VOP mode

 

RRVMode.Enable [0]

???

RRVMode.Cycle [0]

???

 

More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode:

eishi@flab.fujitsu.co.jp

 

VOL Control Parameters

It is possible to enable the sending of auxiliary information in the VOL header. These VOL control bits can be specified in the encoder parameter file:

 

VOLControl.Enable [0]

Enable the sending of VOL control parameters.

VOLControl.ChromaFormat [0]

This is a two-bit field that indicates the chroma format. The only allowed value is 1, which indicates 4:2:0 chroma.

VOLControl.LowDelay [0]

This is a flag that, if set to 1, signals that the bitstream can be decoded with low delay (contains no B-VOPs).

VOLControl.VBVParams.Enable [0]

Enable the sending of Video Buffering Verifier information.

VOLControl.Bitrate [0]

VBV information. This is a 30-bit representation of the bitrate in units of 400 bits per second.

VOLControl.VBVBuffer.Size [0]

VBV information. This is an 18-bit representation of the VBV buffer size in units of 16384 bits.

VOLControl.VBVBuffer.Occupancy [0]

VBV information. This is a 26-bit representation of the VBV buffer occupancy in units of 64 bits.
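The unit conversions for the VBV fields above can be sketched as follows (the example bitrate and buffer values are illustrative only):

```python
# Illustration of VBV field units: bitrate is coded in units of 400 bit/s,
# buffer size in units of 16384 bits, and occupancy in units of 64 bits.
def vbv_fields(bitrate_bps, buffer_bits, occupancy_bits):
    return {
        "VOLControl.Bitrate": bitrate_bps // 400,
        "VOLControl.VBVBuffer.Size": buffer_bits // 16384,
        "VOLControl.VBVBuffer.Occupancy": occupancy_bits // 64,
    }

fields = vbv_fields(384000, 327680, 163840)
print(fields["VOLControl.Bitrate"])         # 960
print(fields["VOLControl.VBVBuffer.Size"])  # 20
```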

 

List of Supported Tools

 

The following table indicates which tools are supported in the software and details the degree of implementation:

 

Tool

Version

Comments

Basic

(I-VOP, P-VOP, AC/DC Prediction, 4MV, Unrestricted MV)

1

Supported

B-VOP

1

Supported. No MPEG rate control.

P-VOP with OBMC

1

Supported

Method 1, Method 2 Quantisation

1

Supported

Error Resilience

1

Syntax only. No recovery from error supported.

Short Header (H.263 emulation)

1

Decode only.

Binary Shape (progressive)

1

Supported. No automatic VOP generation.

Grayscale Shape

1

Supported

Interlace

1

Supported

N-Bit

1

Supported

Temporal Scalability (Base)

1

???

Temporal Scalability (Enhancement)

1

???

Spatial Scalability (Base)

1

???

Spatial Scalability (Enhancement)

1

???

Sprite

1

Supported. No warping parameter estimation.

Still Texture

1

Supported

Dynamic Resolution Conversion

2

 

NEWPRED

2

Upstream signaling is simulated, not implemented.

Global Motion Compensation

2

Supported

Quarter-pel Motion Compensation

2

Supported

SA-DCT

2

Supported

Error Resilience for Still Texture Coding

2

Supported

Wavelet Tiling

2

Supported

Scalable Arbitrary Shape for Still Texture Coding

2

???

Object Based Spatial Scalability (Base)

2

Supported

Object Based Spatial Scalability (Enhancement)

2

Supported

Multiple Auxiliary Components

2

Supported

Complexity Estimation Support

2

Bitstream syntax supported only.

 

Known Issues and Limitations

There are many known issues and limitations with this software release. Not all the functionality of MPEG-4 is present – only natural video is covered. Systems layer functionality and 3D/SNHC parts are not included. For some tools, only the syntax parsing is present, but not the functionality that would be needed for a commercial implementation. An approximate list of concerns is presented here:

 

 

 

Bug Reporting

Bugs and specific questions about the software can be reported to the following people:

 

Simon Winder

Primary Contact

swinder@microsoft.com

Ming-Chieh Lee

General Info

mingcl@microsoft.com

Takefumi Nagumo

Spatial Scalability

nagumo@av.crl.sony.co.jp

Norio Ito

Temporal Scalability

Still Texture

norio@imgsl.mkhar.sharp.co.jp

Yoshinori Suzuki

GMC

yosinori@crl.hitachi.co.jp

Karsten Suehring

SA-DCT

suehring@hhi.de

Eishi Morimatsu

Dynamic Rate Control

RRV

eishi@flab.fujitsu.co.jp

Mathias Wien

Quarter Pel

wien@ient.rwth-aachen.de

D.-S. Cho

Still Texture

dscho@sait.samsung.co.kr

Hung-Ju Lee

Still Texture

hjlee@sarnoff.com

David Ruhoff

TM5 Rate Control

druhoff@mediaone.net

Hideaki Kimata

NEWPRED

kimata@nttvdt.hil.ntt.co.jp

Krit Panusopone

Interlaced

kpanusop@procy.gi.com

David Shu

Low Latency Sprite

dbshu@hrl.com

Dick van Smirren

Short Header

D.vanSmirren@research.kpn.com

Cor Quist

Short Header

C.P.Quist@research.kpn.com

Marco Mattavelli

Complexity Estimation

Marco.Mattavelli@epfl.ch

Jie Liang

Still Texture

liang@ti.com

 

Microsoft can be contacted at the following address:

 

Microsoft Corporation

One Microsoft Way

Redmond, WA 98052

 

References

  1. ISO/IEC 14496 Version 1 Part 2 Visual
  2. ISO/IEC 14496 Version 2 Part 2 Visual. (See document N3056 from the Maui MPEG meeting, December 1999).
  3. MPEG Official Web site http://drogo.cselt.stet.it/mpeg/