ISO/IEC 14496 (MPEG-4) Video Reference Software
Primary Author: Simon A. J. Winder (Microsoft Corporation)
Last update: December 13, 2000
This manual documents the Microsoft MPEG-4 Video Reference Software, which encodes and decodes video sequences in the MPEG-4 compression format. Microsoft Corporation originally developed the software during the experimental phase of the MPEG-4 standardization process; many other parties later contributed additions and edits to cover all of the Version 1 and Version 2 coding tools. The software serves as a reference implementation that complements the text of the standard.
The following copyright statement covers the use of all software modules:
“This software module was originally developed by Microsoft Corporation and also edited by multiple parties in the course of development of the MPEG-4 Video (ISO/IEC 14496-2). This software module is an implementation of a part of one or more MPEG-4 Video tools as specified by MPEG-4 Video. ISO/IEC gives users of MPEG-4 Video free license to this software module or modifications thereof for use in hardware or software products claiming conformance to the MPEG-4 Video. Those intending to use this software module in hardware or software products are advised that its use may infringe existing patents. The original developer of this software module and his/her company, the subsequent editors and their companies, and ISO/IEC, have no liability for use of this software module or modifications thereof in an implementation. Copyright is not released for non MPEG-4 Video conforming products. Microsoft Corporation and the other originators of this software retain full right to use the code for their own purpose, assign or donate the code to a third party and to inhibit third parties from using the code for non ISO/IEC 14496 conforming products. This copyright notice must be included in all copies or derivative works.”
The source code provided here compiles into three executables: encoder.exe, decoder.exe, and convertpar.exe; the latter is a utility for upgrading old-style parameter files to the new style. All three executables are available pre-compiled in the \demo directory.
The following source files and directories are used when compiling the encoder:
\app\encoder\encoder.dsp | Encoder project file
\app\encoder\encoder.cpp | Encoder main()
\sys | Common files
\sys\encoder | Encoder specific
\tools |
\type | Common types
\vtc | Wavelet code
The following source files and directories are used when compiling the decoder:
\app\decoder\decoder.dsp | Decoder project file
\app\decoder\decoder.cpp | Decoder main()
\sys | Common files
\sys\decoder | Decoder specific
\tools |
\type | Common types
\vtc | Wavelet code
The following source files and directories are used when compiling the parameter file conversion utility:
\app\convertpar\convertpar.dsp | Convertpar project file
\app\convertpar\convertpar.cpp | Convertpar main()
The project files are for Microsoft Visual C++ 6.0: double-click the required “.dsp” file to open the project, then select “Build” from the menu. The project files do not work with Microsoft Visual C++ 5.0; in that case, create a new “win32 console application” and use the tables above as a guide for adding the appropriate files to the new workspace. Alternatively, the list of source files can be obtained by opening the “.dsp” file in Notepad.
You can build in debug mode or release mode (from the Build menu, choose “Set Active Configuration” to select the mode). In release mode the codec runs faster but, for historical reasons, error reporting is very poor: if the encoder or decoder fails because of incorrect encoder parameters, errors in the source bitstream, end of file in the source sequence, and so on, you may get an access violation or incorrect results rather than an error message. In debug mode there will most likely be an error message, so it is useful to keep both types of executable handy. Note, however, that the debug error messages often take the form of assert failures, and it may be necessary to look at the source code, or use some intelligence, to find out what has gone wrong.
The following compilation options can be set:
__TRACE_AND_STATS_ | Enables code to calculate encoder statistics.
__DOUBLE_PRECISION_ | Makes all floating-point code double precision.
__DISABLE_4MV_FOR_PVOP_ | Prevents the encoder from ever generating four motion vectors per macroblock when P-VOPs are encoded.
__NBIT_ | Enables use of up to 16-bit pixel values.
| Enables random changes in the macroblock quantizer when rate control is turned off.
Typically, __TRACE_AND_STATS_ and __DOUBLE_PRECISION_ are the only options to be set, unless other modes are needed.
Two makefiles have been contributed for compilation under UNIX, named makefile and unixmake. Compilation has been tested with GNU g++ 2.8.1; bugs were found with earlier versions of g++. Additionally, it is necessary to compile with the “-O” option, but not the “-O2” option.
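For example, from the root of the source tree one of the following standard make invocations should build the software (the exact behavior depends on the makefile contents):

make
make -f unixmake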
The \demo directory provides a number of example bitstreams, parameter files, and pre-compiled executables. The following files are present:
encoder.exe | Video encoder.
decoder.exe | Video decoder.
convertpar.exe | Helper to convert old parameter files to the new type.
example.par | Example parameter file.
example.ref | Example NEWPRED reference file.
stef_cif.par | Example parameter file (frame-based coding).
brea_cif.par | Example parameter file (binary object-based coding).
brea_qcif.par | Example parameter file (greyscale alpha object-based coding).
demo_stef_cif.par | Example parameter file for re-encoding (frame-based coding).
demo_brea_cif.par | Example parameter file for re-encoding (binary object-based coding).
demo_brea_qcif.par | Example parameter file for re-encoding (greyscale alpha object-based coding).
sprt_stef_sif.par | Example parameter file (static sprite coding).
stef_cif.cmp | Example encoded bitstream (frame-based coding).
brea_cif.cmp | Example encoded bitstream (binary object-based coding).
brea_qcif.cmp | Example encoded bitstream (greyscale alpha object-based coding).
sprt_stef_sif.cmp | Example sprite bitstream (static sprite coding).
vtc_brea_cif.cmp | Example Visual Texture bitstream.
vtc_brea_cif.ctl | Example Visual Texture control file.
vtc_brea_cif.par | Example Visual Texture parameter file.
wavelet_tiling | Directory of wavelet examples.
The frame-based example bitstream can be decoded using the following command line:
decoder stef_cif.cmp stef_cif 352 288
This will result in a ten-frame sequence being stored in file stef_cif.yuv, with dimensions 352 × 288 pixels. This short sequence can be viewed under Microsoft Windows by using yuvtoavi to convert it to an AVI file and playing that file in Windows Media Player.
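For example, assuming a 30Hz display rate (the frame rate argument here is illustrative), the conversion command described later in this manual would be:

yuvtoavi stef_cif.yuv stef_cif.avi 352 288 30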
It is possible to re-encode this decoded file (with subsequent quality reduction) using the encoder and appropriate parameter file:
encoder demo_stef_cif.par
This will create the bitstream file cmp\01\stef_cif.cmp and the reconstructed sequence file rec\01\stef_cif.yuv. This sequence is the encoder’s reconstruction of the sequence that would be obtained if the compressed bitstream were decoded. The new bitstream will not be the same as the original bitstream because the compression is lossy.
The binary shape example bitstream can be decoded using the following command line:
decoder brea_cif.cmp brea_cif 352 288
This will result in a ten-frame texture sequence being stored in file brea_cif.yuv and the corresponding ten-frame binary alpha sequence being stored in file brea_cif.seg. The regions of the frame that are outside the shape of the video object are indicated with zero pixel values in the “.seg” file. Pixels within the video object have the value 255.
To re-encode this decoded sequence, use the following command:
encoder demo_brea_cif.par
This will create the bitstream file cmp\255\brea_cif.cmp and the reconstructed sequence files rec\255\brea_cif.yuv and rec\255\brea_cif.seg. Note that the “object number” is 255 because the parameter file specifies to encode only the regions of each frame where the alpha plane has a pixel value equal to 255.
The grayscale shape example bitstream can be decoded with the following command:
decoder brea_qcif.cmp brea_qcif 176 144
This will result in three files being created with a frame size of 176 × 144 pixels. The ten-frame texture information is stored in brea_qcif.yuv, the binary alpha mask is stored in brea_qcif.seg and the grayscale alpha plane is stored in brea_qcif.0.aux. The grayscale alpha plane contains full transparency information and can be used for high quality anti-aliased composition of the video object onto another background.
To re-encode this sequence, first delete the file brea_qcif.seg, and then rename the file brea_qcif.0.aux so that it is now called brea_qcif.seg. The encoder will generate its own binary alpha mask from the grayscale mask. Use the following command to encode the sequence:
encoder demo_brea_qcif.par
The encoder will output four files: the bitstream file will be cmp\255\brea_qcif.cmp, the reconstructed texture file will be rec\255\brea_qcif.yuv, the binary alpha mask will be rec\255\brea_qcif.seg, and the grayscale alpha plane will be rec\255\brea_qcif.0.aux.
The following command line can be used to decode the example sprite sequence:
decoder sprt_stef_sif.cmp sprt_stef_sif 352 240
This results in an eight-frame sequence of the crowd background from “Stefan”. The image size is 352 × 240. Two files are produced: sprt_stef_sif.yuv is the texture file and sprt_stef_sif.seg is a binary alpha mask. The binary alpha mask for this sequence is uninteresting, as it has every pixel set to 255; however, it is possible to encode arbitrarily shaped sprites using the correct parameter file options.
The following command line can be used to decode the single-frame example image compressed using the wavelet texture encoder:
decoder -vtc vtc_brea_cif.cmp out.yuv 352 288 1 1
This will result in a single YUV file, out.yuv, having size 352 × 288 pixels.
Tools are provided to help with conversion to and from the YUV format used by the codec. The two tools are yuvtoavi.exe and avitoyuv.exe. These can be found in the \demo directory. With these tools you can convert to and from the popular AVI video file format. Please note that YUV and AVI formats use different color-spaces and sampling strategies.
avitoyuv converts from the most common AVI formats to the YUV format. The tool converts the color-space from RGB to YUV and down-samples the chroma components to obtain YUV 4:2:0. The color-space is ITU-R 709. The down-sampling filter is {40, 88, 88, 40}/256. The format of the command is:
avitoyuv source.avi destination.yuv
Any audio data is lost.
yuvtoavi converts from the 4:2:0 YUV format to uncompressed AVI. The tool up-samples the chroma components using the filter {3, -16, 67, 227, -32, 7}/256 and converts to RGB using the ITU-R 709 color-space. The format of the command is:
yuvtoavi source.yuv destination.avi width height framerate
The width and height are given in pixels and must each be a multiple of two. The framerate dictates how fast the AVI file will play and is specified in frames per second. Common frame sizes are given in the section “Data File Formats”.
The parameter file format changed in August, 1999. Files with the old format (before version 901) can be converted to the new format by running the convertpar tool that is included with the current release. Use the command line:
convertpar old_par_file [new_par_file]
where the name of the new parameter file is an optional argument. If this argument is not present, then stdout is used.
This section covers the file formats for input to the encoder.
Files with the “.yuv” extension contain video texture data in a raw format. The YUV color-space is used, where Y is the luminance component and U/V are the color components. (Strictly, this color-space is named Y, Cb, Cr, and it is not the same as the YUV components used for PAL television.) For 8-bit video, one byte represents each Y or U/V value. Typically, the Y range is from 16 (black) to 235 (white), and UV = {128, 128} indicates that there is no color (grayscale). The reference software only supports YUV 4:2:0 formatted files, in which the U/V components have half the resolution of the Y component both horizontally and vertically. In the file, the Y, U and V planes are written one after the other in that order, and frames are written consecutively: for N frames, Y0, U0, V0, Y1, U1, V1, …, YN−1, UN−1, VN−1. The planes are simply raw pixel bytes written row-by-row, top left to bottom right. If the image size is width × height, then the size of the Y plane is width × height bytes and the U/V planes are each width × height / 4 bytes, so the total number of bytes in the file is N × 3 × width × height / 2. There is no header information present, so the frame size must be known in advance. Typical dimensions for encoded video are given in the following table:
Format | Width | Height
QCIF | 176 | 144
CIF | 352 | 288
SIF | 352 | 240
SIF360 | 360 | 240
CCIR601 | 720 | 486
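As an illustration of this layout, the following sketch (not part of the reference software; the helper name is hypothetical) reads one 8-bit 4:2:0 frame from a “.yuv” file:

#include <cstdio>
#include <vector>

// Read frame number n of an 8-bit 4:2:0 sequence into separate planes.
// Returns false on a short read (for example, past the end of the file).
bool readYUVFrame(std::FILE* f, int width, int height, int n,
                  std::vector<unsigned char>& y,
                  std::vector<unsigned char>& u,
                  std::vector<unsigned char>& v)
{
    const long ySize = (long)width * height;     // luminance plane bytes
    const long uvSize = ySize / 4;               // each chroma plane
    const long frameSize = ySize + 2 * uvSize;   // 3 * width * height / 2

    y.resize(ySize); u.resize(uvSize); v.resize(uvSize);
    if (std::fseek(f, (long)n * frameSize, SEEK_SET) != 0) return false;
    return std::fread(&y[0], 1, ySize, f) == (size_t)ySize &&
           std::fread(&u[0], 1, uvSize, f) == (size_t)uvSize &&
           std::fread(&v[0], 1, uvSize, f) == (size_t)uvSize;
}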
When the encoder is compiled to support N-bit video, the YUV files contain pixels that are 16-bit rather than 8-bit values. In this case, the file contains twice as many bytes as for the 8-bit case. The least significant byte of each pixel value is stored first.
Files with the “.seg” extension typically contain binary segmentation maps for shape-based coding or for grayscale alpha plane coding. Each pixel is represented by a byte; the bytes are organized as one plane per frame, and each plane is written row-by-row, top left to bottom right. There is no header information. If the image size is width × height, then the size of the alpha plane is width × height bytes, and for an N-frame sequence the total number of bytes in the file is N × width × height. There are three ways of interpreting each pixel. In the case of a binary alpha mask, the pixel value is either 0 or 255, indicating transparent or opaque pixels respectively. In the case of a segmentation mask, the pixel value indicates to which one of multiple binary masks the pixel belongs; for example, three regions of the image can be assigned the values 0, 1, and 2. In this way, different parts of the image can be coded, depending on which mask number is chosen during the encoding process; regions of the image not equal to the chosen mask number are considered transparent by the encoder. Finally, pixel values can indicate transparency in the range 0 (transparent) to 255 (opaque), in the same way as with “.aux” files. The method of interpretation depends on the encoder settings.
Files with the “.aux” extension always contain grayscale alpha plane data. Each pixel indicates the amount of transparency, from 0 (transparent) to 255 (opaque). One pixel plane is present for each frame; there is no U/V information. The planes are simply raw bytes written row-by-row, top left to bottom right, with no header information. If the image size is width × height, then the size of the alpha plane is width × height bytes, and for an N-frame sequence the total number of bytes in the file is N × width × height.
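To illustrate the intended use of this data, the following sketch (a hypothetical helper, not part of the reference software) composites one plane of a decoded video object onto a background using the grayscale alpha values:

// Blend one object plane onto a background plane, weighting each pixel
// by its grayscale alpha value (0 = fully transparent, 255 = fully opaque).
// The chroma planes are handled the same way at quarter resolution.
void compositePlane(const unsigned char* object, const unsigned char* alpha,
                    unsigned char* background, int nPixels)
{
    for (int i = 0; i < nPixels; i++) {
        int a = alpha[i];
        background[i] = (unsigned char)
            ((a * object[i] + (255 - a) * background[i] + 127) / 255);
    }
}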
Files with the “.pnt” extension contain sprite point trajectories. They are text files of ASCII floating-point numbers indicating how the sprite is to be warped on each frame. The format is as follows: the first line gives the number of points P, which can be in the range 1 to 4. The next line contains P pairs of numbers, the first of each pair being the x coordinate and the second the y coordinate; these numbers are no longer used by the reference software and can be zero. Each remaining line carries 2×P+1 numbers: the frame number, followed by P pairs giving the x and y coordinates of the warping points for that frame. These points indicate the locations in the sprite image that correspond to the relevant P corners of the display window. An example of the format for the case P=4 is given below:
4
-393.00 -88.00 -393.00 361.00 362.00 -88.00 362.00 361.00
0000 135.00 0.00 331.00 0.00 135.00 237.00 331.00 237.00
0001 134.55 0.44 330.74 0.46 134.46 236.92 330.88 236.78
0002 135.11 0.88 330.49 0.92 134.93 236.84 330.78 236.56
0003 135.56 1.17 330.31 1.26 135.43 236.70 330.66 236.29
0004 135.02 1.47 330.13 1.61 134.91 236.58 330.54 236.03
The encoder ignores the data values on the second line.
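A minimal parser for this format might look as follows (a sketch; storing the points is left as a comment):

#include <cstdio>

// Read a ".pnt" trajectory file: the point count P, the obsolete second
// line of P coordinate pairs, then one line per frame holding the frame
// number and P (x, y) warping points.
bool readTrajectoryFile(const char* name)
{
    std::FILE* f = std::fopen(name, "r");
    if (f == 0) return false;
    int points;
    if (std::fscanf(f, "%d", &points) != 1 || points < 1 || points > 4) {
        std::fclose(f);
        return false;
    }
    double dummy;
    for (int i = 0; i < 2 * points; i++)   // skip the unused second line
        std::fscanf(f, "%lf", &dummy);
    int frame;
    while (std::fscanf(f, "%d", &frame) == 1) {
        for (int p = 0; p < points; p++) {
            double x, y;
            std::fscanf(f, "%lf %lf", &x, &y);
            // ... store (x, y) as warping point p for this frame ...
        }
    }
    std::fclose(f);
    return true;
}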
Sprite files contain sprite pixel and alpha mask information. The file format is binary with the following header:
struct sprite_header {
    char magic[3];                  // 'S', 'P', 'T'
    long left, top, right, bottom;  // sprite dimensions
    long alpha_mode;                // 0 rectangular, 1 binary, 2 grayscale
};
The value of magic is { ‘S’, ‘P’, ‘T’ }. The next four words give the sprite dimensions. After these comes the alpha mode, which can be 0, 1, or 2, indicating rectangular, binary, or grayscale sprites respectively. After this header structure come the YUV planes in 4:2:0 format, followed by the alpha plane if the alpha mode indicates a non-rectangular sprite. The YUV and alpha data are stored in the same formats described previously. Only one frame is present in a sprite file.
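Because the header is binary, a reader is best written field by field rather than with a single fread of the struct. The sketch below assumes the fields appear in the file without padding and that long values are 32-bit little-endian, as written by the Win32 tools; if the header was instead written as a whole struct, there may be a compiler padding byte after magic:

#include <cstdio>

// Read one 32-bit little-endian value (the Win32 "long" used by the tools).
static long readLong(std::FILE* f)
{
    unsigned char b[4];
    if (std::fread(b, 1, 4, f) != 4) return -1;
    return (long)b[0] | ((long)b[1] << 8) | ((long)b[2] << 16) | ((long)b[3] << 24);
}

// Read the ".spt" header described above.
bool readSpriteHeader(std::FILE* f, long dim[4], long& alphaMode)
{
    char magic[3];
    if (std::fread(magic, 1, 3, f) != 3 ||
        magic[0] != 'S' || magic[1] != 'P' || magic[2] != 'T')
        return false;
    for (int i = 0; i < 4; i++)
        dim[i] = readLong(f);        // left, top, right, bottom
    alphaMode = readLong(f);         // 0 rectangular, 1 binary, 2 grayscale
    return true;
}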
The syntax of the decoder command varies depending on what kind of sequence is being decoded.
The decoder takes a bitstream file and decodes it, generating a “.yuv” file and, if object-based decoding is enabled, also a “.seg” file and possibly one or more “.aux” files. The following command line should be used:
decoder bit_file.cmp out_file width height
where bit_file.cmp is the name of the compressed bitstream file, and out_file is the root name of the output file to create. The “.yuv”, “.seg”, or “.aux” extensions will be added by the decoder as necessary. Here, width and height represent the width and height of the destination image; these can be larger than the actual encoded image size if necessary, but they cannot be smaller. With the current implementation it is necessary to know the size of the encoded sequence in advance in order to specify it on the command line.
To decode a bitstream with an enhancement layer, use the command line:
decoder bit_file.cmp bit_file_enh.cmp out_file base_width base_height
where bit_file_enh.cmp is the name of the enhancement layer bitstream file, and out_file is the root name of the output file to create.
To decode a bitstream with an enhancement layer of type Object Based Spatial Scalability, use the command line:
decoder bit_file.cmp bit_file_enh.cmp out_file base_width base_height enhance_width enhance_height
To decode a bitstream containing only a single image that has been compressed using the wavelet still-image codec, use one of the following command lines:
decoder -vtc bit_file.cmp out_file.yuv width height spatial_layer snr_layer
decoder -vtc bit_file.cmp out_file.yuv width height spatial_layer snr_layer shape_layer fullsize_out
decoder -vtc bit_file.cmp out_file.yuv width height spatial_layer snr_layer shape_layer fullsize_out start_tile_id end_tile_id
The meaning of these parameters is as follows:
bit_file.cmp | Bitstream file to decode.
out_file.yuv | File for output.
width, height | Dimensions of the output image.
spatial_layer | ?
snr_layer | ?
shape_layer | The number of spatial layers for shape. This value must be greater than or equal to the identification number of the wavelet decomposition layer used for the last spatial scalability layer (determined by spatial_layer above), and is limited to at most wavelet_decomposition_levels.
fullsize_out | A flag enabling generation of a full-size output image; set to 1 for full-size output.
start_tile_id | ?
end_tile_id | ?
More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode:
norio@imgsl.mkhar.sharp.co.jp, dscho@sait.samsung.co.kr, liang@ti.com, shipeng@sarnoff.com, hjlee@sarnoff.com
The syntax of the encoder command is as follows:
encoder -version
encoder params.par
The first form of the command prints the current version number of the encoder executable. The second form runs the encoder using the options supplied in the parameter file params.par. Additional parameter files may be needed, depending on the encoding tools being used. Example parameter files can be found in the \demo directory of the release.
Parameter files ending in the “.par” extension contain name-value pairs which set the values for encoder parameters. Parameter files must start with the line:
!!!MS!!!
Numeric parameters are specified as:
Example.Version = 905
String parameters are specified with quotes as:
Example.FilePrefix = "brea_cif"
Lists are specified using braces and commas:
Example.SomeList = {1, 2, 3, 4, 5}
Some name-value pairs can be specified separately for the base and enhancement layers. In this case, the base layer parameter is:
Texture.QuantStep.PVOP [0] = 16
and the enhancement layer parameter is:
Texture.QuantStep.PVOP [1] = 16
Comments can be added by using the “//” form, as in C++, in which case, the remainder of the line is considered to be a comment.
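Putting these rules together, a parameter file fragment might look as follows (the parameter names are drawn from the tables in this manual, but the values are purely illustrative; see the files in \demo for complete, current examples):

!!!MS!!!
// illustrative fragment only
Version = 905
Source.Width = 352
Source.Height = 288
Source.FilePrefix = "stef_cif"
Source.FrameRate [0] = 30
RateControl.Type [0] = "None"
Texture.QuantStep.IVOP [0] = 16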
In order to encode a sequence it is necessary to prepare a parameter file. The following table describes the essential parameters used for coding most sequences:
Version | Version number of the parameter file. Please see the example parameter file for the current version number.
VTC.Enable | When set to 1, this flag causes the encoder to ignore the rest of the parameter file and load the visual texture coding parameter file instead.
VTC.Filename | The filename of the visual texture coding parameter file.
VersionID | The version number of the bitstream. This must be set to 2 if any MPEG-4 Version 2 tools are used for encoding; otherwise a Version 1 VOL header will be produced.
Source.Width, Source.Height | The dimensions of the source sequence or image, in pixels. These numbers must be even because of 4:2:0 chroma sampling.
Source.FirstFrame, Source.LastFrame | The physical first and last frame numbers to encode. The very first frame in the YUV file is frame zero. For example, to encode the first ten frames of a sequence at the original frame rate, use 0 and 9.
Source.ObjectIndex.First | The object index selects which object in a multiple-object sequence to encode. When a “.seg” file is used, it is the pixel value of the required foreground object in that file. For a binary alpha plane, this value is typically 255.
Source.FilePrefix | The name of the source file to encode, without any filename extension. The encoder adds “.yuv” or “.seg” as necessary.
Source.Directory | The directory containing the source file. If the source sequence is not present in this directory, the object index is appended to the directory name as a subdirectory, and the encoder looks for the source sequence files in that new location.
Source.BitsPerPel | The number of bits per pel when using N-bit encoding. Typically this is set to 8. N-bit coding only works if the __NBIT_ flag was set at compile time.
Source.Format | At present, this must be set to “420”.
Source.FrameRate [0] | The number of source file frames per second.
Source.SamplingRate [0] | Allows temporal sub-sampling of the source file: the encoder encodes only every Nth frame of the source sequence. When set to 1, every frame of the source file is encoded; when set to 3, every third frame is encoded, starting from the frame number specified in Source.FirstFrame.
Output.Directory.Bitstream | The directory in which to store the encoded bitstream. The object index is appended to this directory name as a subdirectory.
Output.Directory.DecodedFrames | The directory in which to store the reconstructed frames (anticipated decoder output). The object index is appended to this directory name as a subdirectory.
RateControl.Type [0] | Type of rate control to use. Can be “None”, “MP4”, or “TM5”.
RateControl.BitsPerSecond [0] | Number of bits per second when encoding with rate control enabled.
Quant.Type [0] | Quantiser type. Can be “MPEG” or “H263”.
GOV.Enable | When set to 1, enables the generation of Group of VOPs headers.
GOV.Period | Specifies how often to send GOV headers, as the number of VOPs between two GOV headers.
Texture.IntraDCThreshold [0] | Takes values 0 to 7 and controls the way texture coefficients are coded. The value 0 means never code Intra DC values using the AC coefficient VLC table; 7 means always do so. A value k from 1 to 6 means do so when QP >= 11 + 2k (i.e. 1 when QP >= 13, 2 when QP >= 15, and so on up to 6 when QP >= 23).
Texture.QuantStep.IVOP [0] | Quantiser step size for I-VOPs.
Texture.QuantStep.PVOP [0] | Quantiser step size for P-VOPs.
Texture.QuantStep.BVOP [0] | Quantiser step size for B-VOPs.
Texture.QuantMatrix.Intra.Enable [0] | Enables loading of a non-default Intra quantisation matrix when using MPEG quantisation. Set to 1 to load the matrix.
Texture.QuantMatrix.Intra [0] | The Intra quantisation table, specified as 64 comma-separated integers enclosed by “{” and “}”.
Texture.QuantMatrix.Inter.Enable [0] | Enables loading of a non-default Inter quantisation matrix when using MPEG quantisation. Set to 1 to load the matrix.
Texture.QuantMatrix.Inter [0] | The Inter quantisation table, specified as 64 comma-separated integers enclosed by “{” and “}”.
Motion.RoundingControl.Enable | When set to 1, enables the use of rounding control for P-VOPs.
Motion.RoundingControl.StartValue | The first value for rounding control when enabled (either 0 or 1). Rounding control is a bitstream bit that switches between 0 and 1 on successive predicted frames.
Motion.PBetweenICount | The number of predicted P-VOPs between each I-VOP. If this value is less than zero, there will be one I-VOP at the start of the sequence and all successive frames will be predicted.
Motion.BBetweenPCount | The number of B-VOPs between each P-VOP. This value is 2 for an IBBPBBPBBP sequence.
Motion.SearchRange [0] | The search range to use during motion estimation (in pels). If this is large, encoding will be slow, especially when using B-VOPs, because the P-VOP search range is multiplied by 1+BBetweenPCount.
Motion.SearchRange.DirectMode [0] | The search range to use during direct-mode B-VOP motion estimation (in half pels). This value is typically small, as it represents an offset from the P-VOP motion vector.
Motion.AdvancedPrediction.Enable [0] | Enables Overlapped Block Motion Compensation.
Motion.SkippedMB.Enable | Allows the encoder to skip macroblocks in P-VOPs that do not change between frames. This improves efficiency but can cause artifacts when used with B-VOPs if there is a transient occluding object.
Motion.UseSourceForME.Enable [0] | When set to 1, causes the original reference frame to be used for motion estimation instead of the encoded (noisy) reference frame.
Motion.DeblockingFilter.Enable | Enables the use of the MPEG de-blocking filter. The filter is currently not implemented, so this flag should be set to 0.
Motion.ReadWriteMVs [0] | Controls the reading or writing of motion vector files to help with experiments. When set to “Read”, motion vectors are read from a file rather than generated by motion estimation. When set to “Write”, the estimated motion vectors are saved to a file for future use. The value “Off” disables this behavior.
Motion.ReadWriteMVs.Filename [0] | The filename from which motion vectors are read or to which they are written.
Trace.CreateFile.Enable | Enables the generation of a trace file, with “.trc” extension. Trace files are large and slow down encoding. The compile flag __TRACE_AND_STATS_ must be set to enable this function.
Trace.DetailedDump.Enable | Enables very detailed bitstream tracing.
When encoding a sequence, the encoder looks to see if a source file is present at the location specified by the Source.Directory parameter. If the file is not present, then the object index is added to the directory name and the file must be found in this location. For example, if the encode directory is \myencode, the object index is 2 and the source file prefix is fore_cif, then \myencode is first checked for fore_cif.yuv and if not present, the source file \myencode\02\fore_cif.yuv is used. The object index is also used to create the name of the directory in which the reconstructed frames are stored. To encode using frame-based mode, the value of Alpha.Type should be set to “None”.
The source file may be encoded at different frame rates by temporal down-sampling. If the first ten frames of a 30Hz sequence are to be encoded at 30Hz, use the following:
Source.FirstFrame = 0
Source.LastFrame = 9
Source.FrameRate [0] = 30
Source.SamplingRate [0] = 1
To encode ten frames from the same source file, but this time at 10Hz, use the following:
Source.FirstFrame = 0
Source.LastFrame = 29
Source.FrameRate [0] = 30
Source.SamplingRate [0] = 3
There are two types of rate control: MPEG-4 rate control and TM5 rate control. In MPEG rate control the quantizer is only changed on a frame basis, whereas TM5 rate control is macroblock-based, so macroblock DQUANT may be non-zero. For both types, RateControl.BitsPerSecond must be set to the target number of bits per second. Note that for rate control to function correctly it is also necessary to set Source.FrameRate to the actual frame rate of the original sequence, and Source.SamplingRate to the appropriate sub-sampling factor if the sequence is encoded at a lower rate than it exists in the source YUV file.
MPEG rate control is available but only works correctly in limited situations. It can only be used for sequences that start with one I-VOP followed entirely by P-VOPs; it does not work with B-VOPs. It is also necessary to set Texture.QuantStep.IVOP and Texture.QuantStep.PVOP manually to fix the quantizer for the initial I-VOP and the first P-VOP; for the remaining P-VOPs the rate control takes over and sets the quantizer. To initialize these quantizer values, choose mid-range values, e.g. 20 for the I-VOP and 15 for the P-VOP, and encode the sequence. At the end, the encoder reports the average quantizer value; run the encode again using this value for the P-VOP quantizer, and adjust the I-VOP quantizer as you wish. If MPEG rate control is functioning correctly and the bit-rate is not set too low, it will not skip frames.
TM5 rate control is the recommended form of rate control: it is free from bugs and works correctly for I-VOPs, P-VOPs, and B-VOPs without user intervention. It is not necessary to set the quantizer values by hand; however, if you do not want the quantizer step size to change on every macroblock, you must use MPEG rate control instead. If the bit-rate is set too low, TM5 rate control will eventually set the quantizer to 31; it will not skip frames. TM5 rate control cannot be used with non-rectangular VOPs.
Object based encoding makes use of one or more alpha planes in order to allow for video objects that are not rectangular. The following parameters control the encoding of these arbitrary shaped VOPs:
Alpha.Type | The type of object-based coding: “None” for frame-based coding, “Binary” when there is only a binary mask, “Gray” for true grayscale alpha transparency coding, and “Shape Only” when there is a binary mask but no video texture.
Alpha.Binary.RoundingThreshold | Controls the amount of error allowed in lossy binary shape coding. A value as high as 128 will produce a large loss. If set to 0, coding is lossless, irrespective of the value of the size conversion parameter below.
Alpha.Binary.SizeConversion.Enable | Controls lossy binary shape coding. If set to 1, the encoder is allowed to down-sample alpha blocks if necessary.
Alpha.Binary.FromGrayThreshold | Used when the encoder creates its own binary mask from a grayscale alpha plane. Grayscale pixels with a value greater than or equal to this parameter are treated as non-transparent; below it, the grayscale plane and the binary plane both contain transparent pixels.
Alpha.QuantStep.IVOP [0] | Grayscale alpha quantisation step size for I-VOPs.
Alpha.QuantStep.PVOP [0] | Grayscale alpha quantisation step size for P-VOPs.
Alpha.QuantStep.BVOP [0] | Grayscale alpha quantisation step size for B-VOPs.
Alpha.QuantDecouple.Enable [0] | Normally, when rate control is enabled, grayscale alpha quantisation varies with the texture quantiser. When set to 1, this flag decouples the alpha quantiser and causes it to be fixed.
Alpha.QuantMatrix.Intra.Enable [0] | Enables loading of a non-default grayscale alpha Intra quantisation matrix when using MPEG quantisation. Set to 1 to load the matrix.
Alpha.QuantMatrix.Intra [0] | The grayscale alpha Intra quantisation table, specified as 64 comma-separated integers enclosed by “{” and “}”.
Alpha.QuantMatrix.Inter.Enable [0] | Enables loading of a non-default grayscale alpha Inter quantisation matrix when using MPEG quantisation. Set to 1 to load the matrix.
Alpha.QuantMatrix.Inter [0] | The grayscale alpha Inter quantisation table, specified as 64 comma-separated integers enclosed by “{” and “}”.
Alpha.MAC.Enable | Enables the use of Multiple Auxiliary Components. When set to 1, multiple planes can be encoded in the same way as grayscale alpha channels.
Alpha.ShapeExtension | The type number, from 0 to 15, of the MAC mode. Type 0 is the usual grayscale alpha channel mode. The other mode numbers are described in Table V2-1, “Semantic meaning of video_object_layer_shape_extension”, of ISO/IEC 14496-2 Visual Amendment 1.
Binary shape coding compresses a binary mask that defines a foreground video object. There are two modes of binary shape coding. In the first, the mask shape is compressed on its own, with no other data present; this is shape-only mode. In the second, the video object texture (YUV data) is compressed as well.
To use binary shape coding, it is necessary to supply a texture file (“.yuv” file) and a segmentation mask file (“.seg” file). This is true even if shape-only mode is used; the texture file is read, but the information is ignored. The segmentation mask files are formatted as described in the section on data file formats.
Two ways of encoding binary shape sequences are possible. In the first, a segmentation file is created which contains a binary mask, with either 0 (background) or 255 (mask) for each pixel. A number of these files can then be used to specify each object in the same sequence. If there are three objects, e.g. background, foreground A, and foreground B, we can give these three VOPs index numbers 0, 1 and 2. We can store the three “.seg” files and the corresponding “.yuv” files in a number of separate directories:
\myencode\00\sequence_cif.yuv | Background texture.
\myencode\00\sequence_cif.seg | Background mask.
\myencode\01\sequence_cif.yuv | Foreground A texture.
\myencode\01\sequence_cif.seg | Foreground A mask.
\myencode\02\sequence_cif.yuv | Foreground B texture.
\myencode\02\sequence_cif.seg | Foreground B mask.
Then, in order to encode the background, the encode settings should be:
Alpha.Type = "Binary"
Source.ObjectIndex.First = 0
Source.FilePrefix = "sequence_cif"
Source.Directory = "\myencode"
Setting Source.ObjectIndex.First to 1 or 2 is sufficient to encode the other VOPs.
The second method of encoding binary shape sequences is to have a common segmentation file that contains all the segmentation information. In this file, background pixels would be marked as 0, foreground A as 1, and foreground B as 2. The following files would be needed:
\myencode\sequence_cif.yuv | Complete texture.
\myencode\sequence_cif.seg | Segmentation mask.
To encode the background, foreground A, or foreground B, the encode settings would then be identical to the previous case. This is because the encoder first checks the path specified by Source.Directory for the files to encode, before adding on the object index.
Lossy binary shape coding can be achieved by setting the parameters Alpha.Binary.RoundingThreshold and Alpha.Binary.SizeConversion.Enable. The first of these dictates how much loss is acceptable for each shape block and ranges from 0 to 256. See the description in the table above for more information.
Grayscale coding involves the compression of (1) a YUV video plane, (2) a binary shape mask, and (3) a grayscale alpha plane. Compression of continuous alpha information is important for anti-aliased compositing and for creating effects with shadows or transparency.
To encode a grayscale alpha sequence, use the same procedure as with binary shape coding. The only difference is that the “.seg” file contains values in the range 0 to 255 which represent gradations of transparency from transparent to opaque. The object index is not used to create the binary mask. Instead, the encoder thresholds the alpha plane at the value dictated by Alpha.Binary.FromGrayThreshold and creates its own binary mask from this.
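The thresholding step is equivalent to the following sketch (a hypothetical helper; the real implementation is inside the encoder):

// Derive a binary mask from a grayscale alpha plane: pixels at or above
// the Alpha.Binary.FromGrayThreshold value are treated as non-transparent.
void grayToBinaryMask(const unsigned char* gray, unsigned char* mask,
                      int nPixels, int threshold)
{
    for (int i = 0; i < nPixels; i++)
        mask[i] = (gray[i] >= threshold) ? 255 : 0;
}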
Both the encoder reconstructed files and the decoder output files are named slightly differently from the source files when using grayscale alpha coding. The binary mask is saved as a “.seg” file, and the grayscale output is saved as sequence_cif.0.aux.
When multiple auxiliary components are used, the encoding procedure is similar to grayscale alpha coding. MAC allows more than one alpha channel to be encoded, provided there is a single binary shape file. It is necessary to provide a “.seg” file to serve as the binary mask and to provide one or more “.aux” files as the auxiliary components to encode. If three components are used, then the following files are needed:
\myencode\sequence_cif.yuv | Texture.
\myencode\sequence_cif.seg | Binary mask.
\myencode\sequence_cif.0.aux | Aux component 0.
\myencode\sequence_cif.1.aux | Aux component 1.
\myencode\sequence_cif.2.aux | Aux component 2.
To enable MAC, it is necessary to set Alpha.MAC.Enable = 1 and to specify the mode using parameter Alpha.ShapeExtension.
MPEG-4 Visual provides error robustness and resilience to allow access to image and video information over a wide range of storage and transmission media. The error resilience tools can be divided into three major categories: resynchronization (video packets), data partitioning, and reversible VLC.
The encoder and decoder support the MPEG-4 error resilient syntax. These syntax elements can be enabled and controlled by the following parameters:
ErrorResil.RVLC.Enable [0] | Turns on reversible VLCs, which allow error recovery by reading the bitstream backwards. To use RVLCs, the data-partitioned syntax must also be enabled by setting the parameter below.
ErrorResil.DataPartition.Enable [0] | Turns on data partitioning, in which the motion and texture data are separated from each other in the bitstream.
ErrorResil.VideoPacket.Enable [0] | Enables breaking the bitstream into video packets to provide periodic resynchronization markers throughout the bitstream.
ErrorResil.VideoPacket.Length [0] | Length of each video packet in bits. When a packet exceeds this length, a new packet is created by inserting a resync marker before the next macroblock.
ErrorResil.AlphaRefreshRate | When video packets are enabled and binary or grayscale alpha information is present, inter-coded VOPs can be sent with intra-coded shape information for refresh purposes. This parameter controls how often this happens: it gives the number of P-VOPs allowed between each coding of shape in intra mode.
The encoder will create an error resilient bitstream when the correct elements are enabled. However, the decoder will not be able to decode a stream containing actual errors, because error recovery is not implemented.
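For example, the following parameter file lines (the packet length value is illustrative) enable video packets, data partitioning, and RVLCs together:

ErrorResil.VideoPacket.Enable [0] = 1
ErrorResil.VideoPacket.Length [0] = 480
ErrorResil.DataPartition.Enable [0] = 1
ErrorResil.RVLC.Enable [0] = 1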
NEWPRED is an error resilience tool. It uses upstream messages to indicate which NEWPRED segments were decoded correctly and which were decoded erroneously. The encoder, which receives the upstream messages, uses only correctly decoded parts for prediction in inter-frame coding. This prevents temporal error propagation without the insertion of intra-coded macroblocks and improves picture quality in error-prone environments.
In NEWPRED, the encoder and the decoder may have multiple reference VOP memories, and the expanded choice of reference VOP prevents temporal error propagation. NEWPRED may use backward channel messages from the decoder to the encoder. In this case, the encoder selects a reference VOP taking into account the decoder error situation reported in these messages, which lets the decoder recover quickly from the error state. Note that the encoder controls the strategy of selecting reference pictures; the strategy may be independent of backward channel messages.
The unit at which reference VOPs change is called the NEWPRED segment. If the segment type is “VOP”, the encoder and the decoder change reference VOPs on every VOP; if it is “Video Packet”, they change on each video packet, and in this case the position of the first macroblock in every video packet is fixed between I-VOPs. NEWPRED works independently of the other error resilience tools (resync markers, data partitioning, and reversible VLC) and also works in combination with them. In this software, the encoder selects the reference VOP according to an additional file whose name is given by the parameter Newpred.Filename.
Newpred.Enable [0] | Set to 1 to enable NEWPRED.
Newpred.SegmentType [0] | The NEWPRED segment type: “VOP” or “Video Packet”.
Newpred.Filename [0] | The name of the additional file that indicates which reference VOP to use (see example.ref in the \demo directory).
Newpred.SliceList [0] |
Interlaced mode is used when the source video was captured interlaced, so that line tearing is visible when the two fields are viewed as a progressive-scan frame. Alternate lines are grouped together during compression for greater efficiency. The relevant parameters are:
Motion.Interlaced.Enable [0] | Turns on interlaced coding.
Motion.Interlaced.TopFieldFirst.Enable [0] | Sets the top_field_first flag in each VOP header, indicating that the top field of the VOP is the first field to be displayed. If not set, the bottom field of the reconstructed VOP is displayed first.
Motion.Interlaced.AlternativeScan.Enable [0] | Controls the alternate_vertical_scan_flag in each VOP header. When set to 1, the alternate vertical scan is used for each VOP; otherwise the adaptive scan is used.
Interlaced coding can be used in frame-based mode, or with binary or grayscale shape. When coding interlaced content with binary shape, the actual shape mask is coded using the normal progressive method.
N-Bit mode allows for non-8-bit pixel widths. A typical example would be to use 12-bit words for better luminance quality. The following parameters control N-Bit mode:
Not8Bit.Enable | Flag to enable the use of N-Bit mode.
Not8Bit.QuantPrecision | The number of bits Q used to represent the texture quantizer. The actual quantizer must lie in the range 1 to 2^Q − 1.
In N-Bit mode, the representation of YUV information is based on 16-bit words. It is necessary to re-compile the encoder with the __NBIT_ flag set. The source YUV files must contain 16-bit pixels rather than 8-bit pixels. More information can be found in the section on file formats.
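For source material that is still only 8 bits deep, a file can be widened to this 16-bit layout with a sketch such as the following (a hypothetical helper, not part of the release):

#include <cstdio>

// Rewrite an 8-bit YUV file as 16-bit pixels, least significant byte first.
void widenTo16Bit(std::FILE* in, std::FILE* out)
{
    int c;
    while ((c = std::fgetc(in)) != EOF) {
        std::fputc(c, out);    // low byte: the original 8-bit value
        std::fputc(0, out);    // high byte: zero for 8-bit content
    }
}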
Scalability | One of "None", "Temporal", or "Spatial". This parameter selects the type of scalability.
Scalability.Temporal.PredictionType | Range 0 to 4.
Scalability.Temporal.EnhancementType | One of "Full", "PartC", or "PartNC".
More information was requested from the implementers but they did not provide it. Please email the relevant people listed in the contact table at the end of this manual if you have questions on this mode.
The parameters used for spatial scalability encoding are described in the following table:
Scalability.Spatial.EnhancementType | One of "Full", "PartC", or "PartNC". Refer to the table below for the combined usage of Scalability.Spatial.UseRefShape.Enable and Scalability.Spatial.EnhancementType.
Scalability.Spatial.PredictionType | For the enhancement layer, P-VOP or B-VOP coding can be used; set this parameter to "PPP" or "PBB". Both forward prediction (P-VOPs) and backward prediction (B-VOPs) are used as the spatial prediction, with the temporally coincident VOP in the base layer as the reference. The forward prediction in the case of B-VOPs is used as the temporal prediction from the enhancement layer, with the most recently decoded VOP of the enhancement layer as the reference.
Scalability.Spatial.Width | Enhancement layer display-frame width.
Scalability.Spatial.Height | Enhancement layer display-frame height.
Scalability.Spatial.HorizFactor.N | A 5-bit unsigned integer forming the numerator of the ratio used for horizontal spatial re-sampling in scalability. The value zero is forbidden.
Scalability.Spatial.HorizFactor.M | A 5-bit unsigned integer forming the denominator of the ratio used for horizontal spatial re-sampling in scalability. The value zero is forbidden.
Scalability.Spatial.VertFactor.N | A 5-bit unsigned integer forming the numerator of the ratio used for vertical spatial re-sampling in scalability. The value zero is forbidden.
Scalability.Spatial.VertFactor.M | A 5-bit unsigned integer forming the denominator of the ratio used for vertical spatial re-sampling in scalability. The value zero is forbidden.
Scalability.Spatial.UseRefShape.Enable | A one-bit flag indicating the procedure used to decode binary shape for spatial scalability. Refer to the table below for the combined usage of Scalability.Spatial.UseRefShape.Enable and Scalability.Spatial.EnhancementType.
Scalability.Spatial.UseRefTexture.Enable | When this flag is set, no update for texture is done; instead, the available texture in the layer denoted by ref_layer_id is used.
Scalability.Spatial.Shape.HorizFactor.N | A 5-bit unsigned integer forming the numerator of the ratio used for horizontal spatial re-sampling in shape scalability. The value zero is forbidden.
Scalability.Spatial.Shape.HorizFactor.M | A 5-bit unsigned integer forming the denominator of the ratio used for horizontal spatial re-sampling in shape scalability. The value zero is forbidden.
Scalability.Spatial.Shape.VertFactor.N | A 5-bit unsigned integer forming the numerator of the ratio used for vertical spatial re-sampling in shape scalability. The value zero is forbidden.
Scalability.Spatial.Shape.VertFactor.M | A 5-bit unsigned integer forming the denominator of the ratio used for vertical spatial re-sampling in shape scalability. The value zero is forbidden.
Scalability.Spatial.UseRefShape.Enable and Scalability.Spatial.EnhancementType usage is described in the following table:
Scalability.Spatial.EnhancementType | Scalability.Spatial.UseRefShape.Enable | Base layer alpha | Enhancement layer alpha | Enhancement layer coding
"Full" | 0 | Binary | Binary | Scalable shape and texture coding.
"Full" | 1 | Binary | Binary | Scalable texture coding with up-sampled base layer shape.
"PartC" or "PartNC" | 0 | Binary | Binary | Partial region (ROI) scalable shape and texture coding with spatial composition ("PartC") or without spatial composition ("PartNC").
"PartC" or "PartNC" | 1 | Rectangular/Binary | Binary | Scalable texture coding and non-scalable shape coding with background composition ("PartC") or without background composition ("PartNC").
In order to encode a basic or low-latency sprite it is necessary to prepare a sprite texture file (“.spt” file) and a warping points file (“.pnt” file). The formats of these files are described in the section on file formats. The following parameters control the encoding of sprites:
Sprite.Type | The coding mode for sprites: “None” if sprites are not used, “Static” for static/low-latency sprite coding, or “GMC” for global motion compensation encoding.
Sprite.WarpAccuracy | The quantisation accuracy of the motion vectors used in the warping process. Can be set to “1/2”, “1/4”, “1/8”, or “1/16”.
Sprite.Directory | The name of the directory containing the sprite texture file for encoding.
Sprite.Points | The number of points used in sprite warping. This parameter can range from 0 to 4 and must equal the number of sprite points specified in the “.pnt” file. When the value is 0, the warp is the identity (stationary); when it is 1, translational motion is used; for 2 or 3, an affine transform is used; and four points correspond to a perspective transformation (sprites only; not supported for GMC).
Sprite.Points.Directory | The name of the directory containing the warping points file for encoding.
Sprite.Mode | The type of sprite coding: “Basic” for basic sprites or “LowLatency” for low-latency sprites. The values “PieceObject” and “PieceUpdate” are for debugging purposes only and should not be used.
To enable basic or low-latency sprites, it is necessary to set Sprite.Type = "Static" and to specify the mode using parameter Sprite.Mode.
Sprites can be rectangular or can make use of a binary mask. To encode with a binary mask, the sprite file must contain such a mask located after the texture data, and the value of Alpha.Type must be set to “Binary”.
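A parameter file fragment for basic sprite encoding might therefore contain lines such as the following (the directory values and warp accuracy are illustrative):

Sprite.Type = "Static"
Sprite.Mode = "Basic"
Sprite.WarpAccuracy = "1/4"
Sprite.Points = 4
Sprite.Directory = "\myencode"
Sprite.Points.Directory = "\myencode"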
Global Motion Compensation makes possible the encoding of the global motion of an object. To encode using GMC it is necessary to set Sprite.Type = “GMC”. The encoder will then calculate the global motion as needed. It is not necessary to provide “.spt” or “.pnt” files when using GMC.
GMC can be combined with binary shape encoding, grayscale shape encoding, interlaced encoding, Version 1 error resilient encoding, quarter-pel motion compensation, and SA-DCT.
Still texture mode is used to compress single images using a wavelet-based encoder. A shape coding technology is available that can be used to code still texture objects with arbitrary shape. In addition, there are a number of scalability options.
In order to encode using still texture mode, it is necessary to prepare a parameter file with the following lines:
Version = 902
VTC.Enable = 1
VTC.Filename = "vtcparfile.ctl"
This causes the encoder to read the file vtcparfile.ctl instead of the regular parameter file. This “.ctl” file contains a set of numerical or textual parameters, one per line. The parameters must all be present and must be given in the order shown in the following table, which describes their use:
visual_object_verid | The visual version id: 1 for Version 1, 2 for Version 2.
YUV image file | The name of the input image file to be encoded.
texture_object_layer_shape | The shape type of the texture object layer: 0 for rectangular, 1 for binary shape.
alpha image file | The name of the file containing the binary mask to be encoded when texture_object_layer_shape is 1. This must be present even if a binary mask is not being used.
sto_constant_alpha | Set to 1 when the opaque alpha values of the binary mask are to be replaced by sto_constant_alpha_value; otherwise set to 0.
sto_constant_alpha_value | An 8-bit alpha value that replaces the opaque pixels in the binary alpha mask. The value 0 is forbidden.
alpha_threshold | Controls the amount of error allowed in lossy binary shape coding. A value as high as 128 will produce a large loss. For scalable shape coding this value is always set to 0, i.e., lossless coding is performed.
change_conv_ratio_disable | When set to 1, allows the encoder to down-sample alpha blocks in the base layer if necessary.
output bit file | Output file name for the compressed bitstream.
image format | Still image format: 1 for monochrome, 3 for YUV 4:2:0.
image width | The width of the input image in pels.
image height | The height of the input image in pels.
tiling_disable | ???
tiling_jump_table_enable | ???
tile_width | ???
tile_height | ???
wavelet_decomposition_levels | ???
filter type | ???
wavelet_download | ???
uniform_wavelet_filter | ???
wavelet filters | ???
quantization_type | 1 for single_quant, 2 for multi_quant.
scan_direction | 0 for tree_depth, 1 for band_by_band.
start_code_enable | 0 to disable, 1 to enable.
target spatial levels | ???
target SNR levels | ???
target shape levels | The number of target spatial layers for shape. This value should be greater than or equal to the identification number of the wavelet decomposition layer used for the last spatial scalability layer (determined by target_spatial_level above), and is limited to at most wavelet_decomposition_levels.
fullsize_out | When set to 1, enables the generation of a full-size output image.
start tile id | ???
end tile id | ???
texture_error_resilience_disable | 1 to disable, 0 to enable (Version 2).
packet size | ???
segment size | ???
DC quantization size for luminance | ???
DC quantization size for chroma | ???
No. of spatial scalability levels | ???
No. of SNR levels | ???
quant_size for luma signal | ???
quant_size for chroma signal | ???
More information was requested from the implementers but they did not provide it. Please email the following people if you have questions on this mode:
norio@imgsl.mkhar.sharp.co.jp, dscho@sait.samsung.co.kr, liang@ti.com, shipeng@sarnoff.com, hjlee@sarnoff.com
Complexity estimation allows a decoder to allocate its resources efficiently when decoding a compressed stream. There is very little support for this in the Microsoft reference software. The only functionality available is for exercising the bitstream flags that are part of the scheme. When enabled, the encoder sends some dummy data in the bitstream. The dummy data is defined in source file vopseenc.cpp and can be changed if needed.
Complexity.Enable [0] | Set to 1 to enable the complexity estimation syntax.
Complexity.EstimationMethod [0] | Method of estimation: 0 for the Version 1 method, 1 for the Version 2 method.
Complexity.Opaque.Enable [0] | Enables transmission of the number of shape blocks coded in opaque mode, as a percentage of the total number of blocks.
Complexity.Transparent.Enable [0] | Enables transmission of transparent-mode data, as above.
Complexity.IntraCAE.Enable [0] | Enables transmission of Intra-CAE mode data, as above.
Complexity.InterCAE.Enable [0] | Enables transmission of Inter-CAE mode data, as above.
Complexity.NoUpdate.Enable [0] | Enables transmission of no-update mode data, as above.
Complexity.UpSampling.Enable [0] | Enables transmission of up-sampling mode data, as above.
Complexity.IntraBlocks.Enable [0] | Enables transmission of intra block statistics.
Complexity.InterBlocks.Enable [0] | Enables transmission of inter block statistics.
Complexity.Inter4VBlocks.Enable [0] | Enables transmission of inter 4MV block statistics.
Complexity.NotCodedBlocks.Enable [0] | Enables transmission of non-coded block statistics.
Complexity.DCTCoefs.Enable [0] | Enables transmission of information about the number of DCT coefficients per block.
Complexity.DCTLines.Enable [0] | Enables transmission of DCT 8×1 line statistics.
Complexity.VLCSymbols.Enable [0] | Enables transmission of information about the average number of VLC symbols per macroblock.
Complexity.VLCBits.Enable [0] | Enables transmission of information about the average number of VLC bits per symbol.
Complexity.APM.Enable [0] | Enables transmission of statistics about the use of advanced prediction mode.
Complexity.NPM.Enable [0] | Enables transmission of statistics about the use of normal prediction mode.
Complexity.InterpMCQ.Enable [0] | Enables transmission of statistics about the use of interpolated blocks.
Complexity.ForwBackMCQ.Enable [0] | Enables transmission of statistics about the use of predicted blocks.
Complexity.HalfPel2.Enable [0] | Enables transmission of statistics about the use of 1D half-pel vectors.
Complexity.HalfPel4.Enable [0] | Enables transmission of statistics about the use of 2D half-pel vectors.
Complexity.SADCT.Enable [0] | Enables transmission of statistics about the use of SA-DCT blocks.
Complexity.QuarterPel.Enable [0] | Enables transmission of statistics about the use of quarter-pel vectors in 1D or 2D.
In quarter-pel encoding mode, motion vectors are estimated to quarter-pel accuracy. This mode can be enabled with the associated parameter file entry:
Motion.QuarterSample.Enable [0] | Set to 1 to enable quarter-pel motion compensation.
Shape-adaptive DCTs can be used during binary or grayscale alpha coding instead of 8×8 DCTs in combination with padding. Setting the associated parameter file entry enables this mode:
Texture.SADCT.Enable [0] | Set to 1 to enable the shape-adaptive DCT.
RRVMode.Enable [0] |
RRVMode.Cycle [0] |
More information was requested from the implementers but they did not provide it. Please email the relevant people listed in the contact table at the end of this manual if you have questions on this mode.
It is possible to enable the sending of auxiliary information in the VOL header. These VOL control bits can be specified in the encoder parameter file:
VOLControl.Enable [0] | Set to 1 to enable the sending of VOL control information.
VOLControl.ChromaFormat [0] | A two-bit field indicating the chroma format. The only allowed value is 1, which indicates 4:2:0 chroma.
VOLControl.LowDelay [0] | A flag that, when set to 1, signals that the bitstream can be decoded with low delay (contains no B-VOPs).
VOLControl.VBVParams.Enable [0] | Enables the sending of Video Buffering Verifier information.
VOLControl.Bitrate [0] | VBV information: a 30-bit representation of the bitrate in units of 400 bits per second.
VOLControl.VBVBuffer.Size [0] | VBV information: the size of the VBV buffer.
VOLControl.VBVBuffer.Occupancy [0] | VBV information: the initial occupancy of the VBV buffer.
The following table indicates which tools are supported in the software and details the degree of implementation:
Tool | Version | Comments
Basic (I-VOP, P-VOP, AC/DC Prediction, 4MV, Unrestricted MV) | 1 | Supported
B-VOP | 1 | Supported. No MPEG rate control.
P-VOP with OBMC | 1 | Supported
Method 1, Method 2 Quantisation | 1 | Supported
Error Resilience | 1 | Syntax only. No recovery from errors supported.
Short Header | 1 | Decode only.
Binary Shape (progressive) | 1 | Supported. No automatic VOP generation.
Grayscale Shape | 1 | Supported
Interlace | 1 | Supported
N-Bit | 1 | Supported
Temporal Scalability (Base) | 1 | ???
Temporal Scalability (Enhancement) | 1 | ???
Spatial Scalability (Base) | 1 | ???
Spatial Scalability (Enhancement) | 1 | ???
Sprite | 1 | Supported. No warping parameter estimation.
Still Texture | 1 | Supported
Dynamic Resolution Conversion | 2 |
NEWPRED | 2 | Upstream signaling is simulated, not implemented.
Global Motion Compensation | 2 | Supported
Quarter-pel Motion Compensation | 2 | Supported
SA-DCT | 2 | Supported
Error Resilience for Still Texture Coding | 2 | Supported
Wavelet Tiling | 2 | Supported
Scalable Arbitrary Shape for Still Texture Coding | 2 | ???
Object Based Spatial Scalability (Base) | 2 | Supported
Object Based Spatial Scalability (Enhancement) | 2 | Supported
Multiple Auxiliary Components | 2 | Supported
Complexity Estimation Support | 2 | Bitstream syntax supported only.
There are many known issues and limitations with this software release. Not all of the functionality of MPEG-4 is present; only natural video is covered. Systems-layer functionality and the 3D/SNHC parts are not included. For some tools, only the syntax parsing is present, not the functionality that would be needed for a commercial implementation. An approximate list of concerns is presented here:
Bugs and specific questions about the software can be reported to the following people:
Simon Winder | Primary Contact
Ming-Chieh Lee | General Info
Takefumi Nagumo | Spatial Scalability
Norio Ito | Temporal Scalability, Still Texture
Yoshinori Suzuki | GMC
Karsten Suehring | SA-DCT
Eishi Morimatsu | Dynamic Rate Control, RRV
Mathias Wien | Quarter Pel
D.-S. Cho | Still Texture
Hung-Ju Lee | Still Texture
David Ruhoff | TM5 Rate Control
Hideaki Kimata | NEWPRED
Krit Panusopone | Interlaced
David Shu | Low Latency Sprite
Dick van Smirren | Short Header
Cor Quist | Short Header
Marco Mattavelli | Complexity Estimation
Jie Liang | Still Texture
Microsoft can be contacted at the following address:
One Microsoft Way
Redmond, WA 98052