| 1 | IJG JPEG LIBRARY:  SYSTEM ARCHITECTURE | 
|---|
| 2 |  | 
|---|
| 3 | Copyright (C) 1991-2009, Thomas G. Lane, Guido Vollbeding. | 
|---|
| 4 | This file is part of the Independent JPEG Group's software. | 
|---|
| 5 | For conditions of distribution and use, see the accompanying README file. | 
|---|
| 6 |  | 
|---|
| 7 |  | 
|---|
| 8 | This file provides an overview of the architecture of the IJG JPEG software; | 
|---|
| 9 | that is, the functions of the various modules in the system and the interfaces | 
|---|
| 10 | between modules.  For more precise details about any data structure or calling | 
|---|
| 11 | convention, see the include files and comments in the source code. | 
|---|
| 12 |  | 
|---|
| 13 | We assume that the reader is already somewhat familiar with the JPEG standard. | 
|---|
| 14 | The README file includes references for learning about JPEG.  The file | 
|---|
| 15 | libjpeg.txt describes the library from the viewpoint of an application | 
|---|
| 16 | programmer using the library; it's best to read that file before this one. | 
|---|
| 17 | Also, the file coderules.txt describes the coding style conventions we use. | 
|---|
| 18 |  | 
|---|
| 19 | In this document, JPEG-specific terminology follows the JPEG standard: | 
|---|
| 20 | A "component" means a color channel, e.g., Red or Luminance. | 
|---|
| 21 | A "sample" is a single component value (i.e., one number in the image data). | 
|---|
| 22 | A "coefficient" is a frequency coefficient (a DCT transform output number). | 
|---|
| 23 | A "block" is an 8x8 group of samples or coefficients. | 
|---|
| 24 | An "MCU" (minimum coded unit) is an interleaved set of blocks of size | 
|---|
| 25 | determined by the sampling factors, or a single block in a | 
|---|
| 26 | noninterleaved scan. | 
|---|
| 27 | We do not use the terms "pixel" and "sample" interchangeably.  When we say | 
|---|
| 28 | pixel, we mean an element of the full-size image, while a sample is an element | 
|---|
| 29 | of the downsampled image.  Thus the number of samples may vary across | 
|---|
| 30 | components while the number of pixels does not.  (This terminology is not used | 
|---|
| 31 | rigorously throughout the code, but it is used in places where confusion would | 
|---|
| 32 | otherwise result.) | 
|---|
| 33 |  | 
|---|
| 34 |  | 
|---|
| 35 | *** System features *** | 
|---|
| 36 |  | 
|---|
| 37 | The IJG distribution contains two parts: | 
|---|
| 38 | * A subroutine library for JPEG compression and decompression. | 
|---|
| 39 | * cjpeg/djpeg, two sample applications that use the library to transform | 
|---|
| 40 | JFIF JPEG files to and from several other image formats. | 
|---|
| 41 | cjpeg/djpeg are of no great intellectual complexity: they merely add a simple | 
|---|
| 42 | command-line user interface and I/O routines for several uncompressed image | 
|---|
| 43 | formats.  This document concentrates on the library itself. | 
|---|
| 44 |  | 
|---|
| 45 | We desire the library to be capable of supporting all JPEG baseline, extended | 
|---|
| 46 | sequential, and progressive DCT processes.  Hierarchical processes are not | 
|---|
| 47 | supported. | 
|---|
| 48 |  | 
|---|
| 49 | The library does not support the lossless (spatial) JPEG process.  Lossless | 
|---|
| 50 | JPEG shares little or no code with lossy JPEG, and would normally be used | 
|---|
| 51 | without the extensive pre- and post-processing provided by this library. | 
|---|
| 52 | We feel that lossless JPEG is better handled by a separate library. | 
|---|
| 53 |  | 
|---|
| 54 | Within these limits, any set of compression parameters allowed by the JPEG | 
|---|
| 55 | spec should be readable for decompression.  (We can be more restrictive about | 
|---|
| 56 | what formats we can generate.)  Although the system design allows for all | 
|---|
| 57 | parameter values, some uncommon settings are not yet implemented and may | 
|---|
| 58 | never be; nonintegral sampling ratios are the prime example.  Furthermore, | 
|---|
| 59 | we treat 8-bit vs. 12-bit data precision as a compile-time switch, not a | 
|---|
| 60 | run-time option, because most machines can store 8-bit pixels much more | 
|---|
| 61 | compactly than 12-bit. | 
|---|
| 62 |  | 
|---|
| 63 | By itself, the library handles only interchange JPEG datastreams --- in | 
|---|
| 64 | particular the widely used JFIF file format.  The library can be used by | 
|---|
| 65 | surrounding code to process interchange or abbreviated JPEG datastreams that | 
|---|
| 66 | are embedded in more complex file formats.  (For example, libtiff uses this | 
|---|
| 67 | library to implement JPEG compression within the TIFF file format.) | 
|---|
| 68 |  | 
|---|
| 69 | The library includes a substantial amount of code that is not covered by the | 
|---|
| 70 | JPEG standard but is necessary for typical applications of JPEG.  These | 
|---|
| 71 | functions preprocess the image before JPEG compression or postprocess it after | 
|---|
| 72 | decompression.  They include colorspace conversion, downsampling/upsampling, | 
|---|
| 73 | and color quantization.  This code can be omitted if not needed. | 
|---|
| 74 |  | 
|---|
| 75 | A wide range of quality vs. speed tradeoffs are possible in JPEG processing, | 
|---|
| 76 | and even more so in decompression postprocessing.  The decompression library | 
|---|
| 77 | provides multiple implementations that cover most of the useful tradeoffs, | 
|---|
| 78 | ranging from very-high-quality down to fast-preview operation.  On the | 
|---|
| 79 | compression side we have generally not provided low-quality choices, since | 
|---|
| 80 | compression is normally less time-critical.  It should be understood that the | 
|---|
| 81 | low-quality modes may not meet the JPEG standard's accuracy requirements; | 
|---|
| 82 | nonetheless, they are useful for viewers. | 
|---|
| 83 |  | 
|---|
| 84 |  | 
|---|
| 85 | *** Portability issues *** | 
|---|
| 86 |  | 
|---|
| 87 | Portability is an essential requirement for the library.  The key portability | 
|---|
| 88 | issues that show up at the level of system architecture are: | 
|---|
| 89 |  | 
|---|
| 90 | 1.  Memory usage.  We want the code to be able to run on PC-class machines | 
|---|
| 91 | with limited memory.  Images should therefore be processed sequentially (in | 
|---|
| 92 | strips), to avoid holding the whole image in memory at once.  Where a | 
|---|
| 93 | full-image buffer is necessary, we should be able to use either virtual memory | 
|---|
| 94 | or temporary files. | 
|---|
| 95 |  | 
|---|
| 96 | 2.  Near/far pointer distinction.  To run efficiently on 80x86 machines, the | 
|---|
| 97 | code should distinguish "small" objects (kept in near data space) from | 
|---|
| 98 | "large" ones (kept in far data space).  This is an annoying restriction, but | 
|---|
| 99 | fortunately it does not impact code quality for less brain-damaged machines, | 
|---|
| 100 | and the source code clutter turns out to be minimal with sufficient use of | 
|---|
| 101 | pointer typedefs. | 
|---|
| 102 |  | 
|---|
| 103 | 3. Data precision.  We assume that "char" is at least 8 bits, "short" and | 
|---|
| 104 | "int" at least 16, "long" at least 32.  The code will work fine with larger | 
|---|
| 105 | data sizes, although memory may be used inefficiently in some cases.  However, | 
|---|
| 106 | the JPEG compressed datastream must ultimately appear on external storage as a | 
|---|
| 107 | sequence of 8-bit bytes if it is to conform to the standard.  This may pose a | 
|---|
| 108 | problem on machines where char is wider than 8 bits.  The library represents | 
|---|
| 109 | compressed data as an array of values of typedef JOCTET.  If no data type | 
|---|
| 110 | exactly 8 bits wide is available, custom data source and data destination | 
|---|
| 111 | modules must be written to unpack and pack the chosen JOCTET datatype into | 
|---|
| 112 | 8-bit external representation. | 
|---|
| 113 |  | 
|---|
| 114 |  | 
|---|
| 115 | *** System overview *** | 
|---|
| 116 |  | 
|---|
| 117 | The compressor and decompressor are each divided into two main sections: | 
|---|
| 118 | the JPEG compressor or decompressor proper, and the preprocessing or | 
|---|
| 119 | postprocessing functions.  The interface between these two sections is the | 
|---|
| 120 | image data that the official JPEG spec regards as its input or output: this | 
|---|
| 121 | data is in the colorspace to be used for compression, and it is downsampled | 
|---|
| 122 | to the sampling factors to be used.  The preprocessing and postprocessing | 
|---|
| 123 | steps are responsible for converting a normal image representation to or from | 
|---|
| 124 | this form.  (Those few applications that want to deal with YCbCr downsampled | 
|---|
| 125 | data can skip the preprocessing or postprocessing step.) | 
|---|
| 126 |  | 
|---|
| 127 | Looking more closely, the compressor library contains the following main | 
|---|
| 128 | elements: | 
|---|
| 129 |  | 
|---|
| 130 | Preprocessing: | 
|---|
| 131 | * Color space conversion (e.g., RGB to YCbCr). | 
|---|
| 132 | * Edge expansion and downsampling.  Optionally, this step can do simple | 
|---|
| 133 | smoothing --- this is often helpful for low-quality source data. | 
|---|
| 134 | JPEG proper: | 
|---|
| 135 | * MCU assembly, DCT, quantization. | 
|---|
| 136 | * Entropy coding (sequential or progressive, Huffman or arithmetic). | 
|---|
| 137 |  | 
|---|
| 138 | In addition to these modules we need overall control, marker generation, | 
|---|
| 139 | and support code (memory management & error handling).  There is also a | 
|---|
| 140 | module responsible for physically writing the output data --- typically | 
|---|
| 141 | this is just an interface to fwrite(), but some applications may need to | 
|---|
| 142 | do something else with the data. | 
|---|
| 143 |  | 
|---|
| 144 | The decompressor library contains the following main elements: | 
|---|
| 145 |  | 
|---|
| 146 | JPEG proper: | 
|---|
| 147 | * Entropy decoding (sequential or progressive, Huffman or arithmetic). | 
|---|
| 148 | * Dequantization, inverse DCT, MCU disassembly. | 
|---|
| 149 | Postprocessing: | 
|---|
| 150 | * Upsampling.  Optionally, this step may be able to do more general | 
|---|
| 151 | rescaling of the image. | 
|---|
| 152 | * Color space conversion (e.g., YCbCr to RGB).  This step may also | 
|---|
| 153 | provide gamma adjustment [ currently it does not ]. | 
|---|
| 154 | * Optional color quantization (e.g., reduction to 256 colors). | 
|---|
| 155 | * Optional color precision reduction (e.g., 24-bit to 15-bit color). | 
|---|
| 156 | [This feature is not currently implemented.] | 
|---|
| 157 |  | 
|---|
| 158 | We also need overall control, marker parsing, and a data source module. | 
|---|
| 159 | The support code (memory management & error handling) can be shared with | 
|---|
| 160 | the compression half of the library. | 
|---|
| 161 |  | 
|---|
| 162 | There may be several implementations of each of these elements, particularly | 
|---|
| 163 | in the decompressor, where a wide range of speed/quality tradeoffs is very | 
|---|
| 164 | useful.  It must be understood that some of the best speedups involve | 
|---|
| 165 | merging adjacent steps in the pipeline.  For example, upsampling, color space | 
|---|
| 166 | conversion, and color quantization might all be done at once when using a | 
|---|
| 167 | low-quality ordered-dither technique.  The system architecture is designed to | 
|---|
| 168 | allow such merging where appropriate. | 
|---|
| 169 |  | 
|---|
| 170 |  | 
|---|
| 171 | Note: it is convenient to regard edge expansion (padding to block boundaries) | 
|---|
| 172 | as a preprocessing/postprocessing function, even though the JPEG spec includes | 
|---|
| 173 | it in compression/decompression.  We do this because downsampling/upsampling | 
|---|
| 174 | can be simplified a little if they work on padded data: it's not necessary to | 
|---|
| 175 | have special cases at the right and bottom edges.  Therefore the interface | 
|---|
| 176 | buffer is always an integral number of blocks wide and high, and we expect | 
|---|
| 177 | compression preprocessing to pad the source data properly.  Padding will occur | 
|---|
| 178 | only to the next block (8-sample) boundary.  In an interleaved-scan situation, | 
|---|
| 179 | additional dummy blocks may be used to fill out MCUs, but the MCU assembly and | 
|---|
| 180 | disassembly logic will create or discard these blocks internally.  (This is | 
|---|
| 181 | advantageous for speed reasons, since we avoid DCTing the dummy blocks. | 
|---|
| 182 | It also permits a small reduction in file size, because the compressor can | 
|---|
| 183 | choose dummy block contents so as to minimize their size in compressed form. | 
|---|
| 184 | Finally, it makes the interface buffer specification independent of whether | 
|---|
| 185 | the file is actually interleaved or not.)  Applications that wish to deal | 
|---|
| 186 | directly with the downsampled data must provide similar buffering and padding | 
|---|
| 187 | for odd-sized images. | 
|---|
| 188 |  | 
|---|
| 189 |  | 
|---|
| 190 | *** Poor man's object-oriented programming *** | 
|---|
| 191 |  | 
|---|
| 192 | It should be clear by now that we have a lot of quasi-independent processing | 
|---|
| 193 | steps, many of which have several possible behaviors.  To avoid cluttering the | 
|---|
| 194 | code with lots of switch statements, we use a simple form of object-style | 
|---|
| 195 | programming to separate out the different possibilities. | 
|---|
| 196 |  | 
|---|
| 197 | For example, two different color quantization algorithms could be implemented | 
|---|
| 198 | as two separate modules that present the same external interface; at runtime, | 
|---|
| 199 | the calling code will access the proper module indirectly through an "object". | 
|---|
| 200 |  | 
|---|
| 201 | We can get the limited features we need while staying within portable C. | 
|---|
| 202 | The basic tool is a function pointer.  An "object" is just a struct | 
|---|
| 203 | containing one or more function pointer fields, each of which corresponds to | 
|---|
| 204 | a method name in real object-oriented languages.  During initialization we | 
|---|
| 205 | fill in the function pointers with references to whichever module we have | 
|---|
| 206 | determined we need to use in this run.  Then invocation of the module is done | 
|---|
| 207 | by indirecting through a function pointer; on most machines this is no more | 
|---|
| 208 | expensive than a switch statement, which would be the only other way of | 
|---|
| 209 | making the required run-time choice.  The really significant benefit, of | 
|---|
| 210 | course, is keeping the source code clean and well structured. | 
|---|
| 211 |  | 
|---|
| 212 | We can also arrange to have private storage that varies between different | 
|---|
| 213 | implementations of the same kind of object.  We do this by making all the | 
|---|
| 214 | module-specific object structs be separately allocated entities, which will | 
|---|
| 215 | be accessed via pointers in the master compression or decompression struct. | 
|---|
| 216 | The "public" fields or methods for a given kind of object are specified by | 
|---|
| 217 | a commonly known struct.  But a module's initialization code can allocate | 
|---|
| 218 | a larger struct that contains the common struct as its first member, plus | 
|---|
| 219 | additional private fields.  With appropriate pointer casting, the module's | 
|---|
| 220 | internal functions can access these private fields.  (For a simple example, | 
|---|
| 221 | see jdatadst.c, which implements the external interface specified by struct | 
|---|
| 222 | jpeg_destination_mgr, but adds extra fields.) | 
|---|
| 223 |  | 
|---|
| 224 | (Of course this would all be a lot easier if we were using C++, but we are | 
|---|
| 225 | not yet prepared to assume that everyone has a C++ compiler.) | 
|---|
| 226 |  | 
|---|
| 227 | An important benefit of this scheme is that it is easy to provide multiple | 
|---|
| 228 | versions of any method, each tuned to a particular case.  While a lot of | 
|---|
| 229 | precalculation might be done to select an optimal implementation of a method, | 
|---|
| 230 | the cost per invocation is constant.  For example, the upsampling step might | 
|---|
| 231 | have a "generic" method, plus one or more "hardwired" methods for the most | 
|---|
| 232 | popular sampling factors; the hardwired methods would be faster because they'd | 
|---|
| 233 | use straight-line code instead of for-loops.  The cost to determine which | 
|---|
| 234 | method to use is paid only once, at startup, and the selection criteria are | 
|---|
| 235 | hidden from the callers of the method. | 
|---|
| 236 |  | 
|---|
| 237 | This plan differs a little bit from usual object-oriented structures, in that | 
|---|
| 238 | only one instance of each object class will exist during execution.  The | 
|---|
| 239 | reason for having the class structure is that on different runs we may create | 
|---|
| 240 | different instances (choose to execute different modules).  You can think of | 
|---|
| 241 | the term "method" as denoting the common interface presented by a particular | 
|---|
| 242 | set of interchangeable functions, and "object" as denoting a group of related | 
|---|
| 243 | methods, or the total shared interface behavior of a group of modules. | 
|---|
| 244 |  | 
|---|
| 245 |  | 
|---|
| 246 | *** Overall control structure *** | 
|---|
| 247 |  | 
|---|
| 248 | We previously mentioned the need for overall control logic in the compression | 
|---|
| 249 | and decompression libraries.  In IJG implementations prior to v5, overall | 
|---|
| 250 | control was mostly provided by "pipeline control" modules, which proved to be | 
|---|
| 251 | large, unwieldy, and hard to understand.  To improve the situation, the | 
|---|
| 252 | control logic has been subdivided into multiple modules.  The control modules | 
|---|
| 253 | consist of: | 
|---|
| 254 |  | 
|---|
| 255 | 1. Master control for module selection and initialization.  This has two | 
|---|
| 256 | responsibilities: | 
|---|
| 257 |  | 
|---|
| 258 | 1A.  Startup initialization at the beginning of image processing. | 
|---|
| 259 | The individual processing modules to be used in this run are selected | 
|---|
| 260 | and given initialization calls. | 
|---|
| 261 |  | 
|---|
| 262 | 1B.  Per-pass control.  This determines how many passes will be performed | 
|---|
| 263 | and calls each active processing module to configure itself | 
|---|
| 264 | appropriately at the beginning of each pass.  End-of-pass processing, | 
|---|
| 265 | where necessary, is also invoked from the master control module. | 
|---|
| 266 |  | 
|---|
| 267 | Method selection is partially distributed, in that a particular processing | 
|---|
| 268 | module may contain several possible implementations of a particular method, | 
|---|
| 269 | which it will select among when given its initialization call.  The master | 
|---|
| 270 | control code need only be concerned with decisions that affect more than | 
|---|
| 271 | one module. | 
|---|
| 272 |  | 
|---|
| 273 | 2. Data buffering control.  A separate control module exists for each | 
|---|
| 274 | inter-processing-step data buffer.  This module is responsible for | 
|---|
| 275 | invoking the processing steps that write or read that data buffer. | 
|---|
| 276 |  | 
|---|
| 277 | Each buffer controller sees the world as follows: | 
|---|
| 278 |  | 
|---|
| 279 | input data => processing step A => buffer => processing step B => output data | 
|---|
| 280 | |              |               | | 
|---|
| 281 | ------------------ controller ------------------ | 
|---|
| 282 |  | 
|---|
| 283 | The controller knows the dataflow requirements of steps A and B: how much data | 
|---|
| 284 | they want to accept in one chunk and how much they output in one chunk.  Its | 
|---|
| 285 | function is to manage its buffer and call A and B at the proper times. | 
|---|
| 286 |  | 
|---|
| 287 | A data buffer control module may itself be viewed as a processing step by a | 
|---|
| 288 | higher-level control module; thus the control modules form a binary tree with | 
|---|
| 289 | elementary processing steps at the leaves of the tree. | 
|---|
| 290 |  | 
|---|
| 291 | The control modules are objects.  A considerable amount of flexibility can | 
|---|
| 292 | be had by replacing implementations of a control module.  For example: | 
|---|
| 293 | * Merging of adjacent steps in the pipeline is done by replacing a control | 
|---|
| 294 | module and its pair of processing-step modules with a single processing- | 
|---|
| 295 | step module.  (Hence the possible merges are determined by the tree of | 
|---|
| 296 | control modules.) | 
|---|
| 297 | * In some processing modes, a given interstep buffer need only be a "strip" | 
|---|
| 298 | buffer large enough to accommodate the desired data chunk sizes.  In other | 
|---|
| 299 | modes, a full-image buffer is needed and several passes are required. | 
|---|
| 300 | The control module determines which kind of buffer is used and manipulates | 
|---|
| 301 | virtual array buffers as needed.  One or both processing steps may be | 
|---|
| 302 | unaware of the multi-pass behavior. | 
|---|
| 303 |  | 
|---|
| 304 | In theory, we might be able to make all of the data buffer controllers | 
|---|
| 305 | interchangeable and provide just one set of implementations for all.  In | 
|---|
| 306 | practice, each one contains considerable special-case processing for its | 
|---|
| 307 | particular job.  The buffer controller concept should be regarded as an | 
|---|
| 308 | overall system structuring principle, not as a complete description of the | 
|---|
| 309 | task performed by any one controller. | 
|---|
| 310 |  | 
|---|
| 311 |  | 
|---|
| 312 | *** Compression object structure *** | 
|---|
| 313 |  | 
|---|
| 314 | Here is a sketch of the logical structure of the JPEG compression library: | 
|---|
| 315 |  | 
|---|
| 316 | |-- Colorspace conversion | 
|---|
| 317 | |-- Preprocessing controller --| | 
|---|
| 318 | |                              |-- Downsampling | 
|---|
| 319 | Main controller --| | 
|---|
| 320 | |                            |-- Forward DCT, quantize | 
|---|
| 321 | |-- Coefficient controller --| | 
|---|
| 322 | |-- Entropy encoding | 
|---|
| 323 |  | 
|---|
| 324 | This sketch also describes the flow of control (subroutine calls) during | 
|---|
| 325 | typical image data processing.  Each of the components shown in the diagram is | 
|---|
| 326 | an "object" which may have several different implementations available.  One | 
|---|
| 327 | or more source code files contain the actual implementation(s) of each object. | 
|---|
| 328 |  | 
|---|
| 329 | The objects shown above are: | 
|---|
| 330 |  | 
|---|
| 331 | * Main controller: buffer controller for the subsampled-data buffer, which | 
|---|
| 332 | holds the preprocessed input data.  This controller invokes preprocessing to | 
|---|
| 333 | fill the subsampled-data buffer, and JPEG compression to empty it.  There is | 
|---|
| 334 | usually no need for a full-image buffer here; a strip buffer is adequate. | 
|---|
| 335 |  | 
|---|
| 336 | * Preprocessing controller: buffer controller for the downsampling input data | 
|---|
| 337 | buffer, which lies between colorspace conversion and downsampling.  Note | 
|---|
| 338 | that a unified conversion/downsampling module would probably replace this | 
|---|
| 339 | controller entirely. | 
|---|
| 340 |  | 
|---|
| 341 | * Colorspace conversion: converts application image data into the desired | 
|---|
| 342 | JPEG color space; also changes the data from pixel-interleaved layout to | 
|---|
| 343 | separate component planes.  Processes one pixel row at a time. | 
|---|
| 344 |  | 
|---|
| 345 | * Downsampling: performs reduction of chroma components as required. | 
|---|
| 346 | Optionally may perform pixel-level smoothing as well.  Processes a "row | 
|---|
| 347 | group" at a time, where a row group is defined as Vmax pixel rows of each | 
|---|
| 348 | component before downsampling, and Vk sample rows afterwards (remember Vk | 
|---|
| 349 | differs across components).  Some downsampling or smoothing algorithms may | 
|---|
| 350 | require context rows above and below the current row group; the | 
|---|
| 351 | preprocessing controller is responsible for supplying these rows via proper | 
|---|
| 352 | buffering.  The downsampler is responsible for edge expansion at the right | 
|---|
| 353 | edge (i.e., extending each sample row to a multiple of 8 samples); but the | 
|---|
| 354 | preprocessing controller is responsible for vertical edge expansion (i.e., | 
|---|
| 355 | duplicating the bottom sample row as needed to make a multiple of 8 rows). | 
|---|
| 356 |  | 
|---|
| 357 | * Coefficient controller: buffer controller for the DCT-coefficient data. | 
|---|
| 358 | This controller handles MCU assembly, including insertion of dummy DCT | 
|---|
| 359 | blocks when needed at the right or bottom edge.  When performing | 
|---|
| 360 | Huffman-code optimization or emitting a multiscan JPEG file, this | 
|---|
| 361 | controller is responsible for buffering the full image.  The equivalent of | 
|---|
| 362 | one fully interleaved MCU row of subsampled data is processed per call, | 
|---|
| 363 | even when the JPEG file is noninterleaved. | 
|---|
| 364 |  | 
|---|
| 365 | * Forward DCT and quantization: Perform DCT, quantize, and emit coefficients. | 
|---|
| 366 | Works on one or more DCT blocks at a time.  (Note: the coefficients are now | 
|---|
| 367 | emitted in normal array order, which the entropy encoder is expected to | 
|---|
| 368 | convert to zigzag order as necessary.  Prior versions of the IJG code did | 
|---|
| 369 | the conversion to zigzag order within the quantization step.) | 
|---|
| 370 |  | 
|---|
| 371 | * Entropy encoding: Perform Huffman or arithmetic entropy coding and emit the | 
|---|
| 372 | coded data to the data destination module.  Works on one MCU per call. | 
|---|
| 373 | For progressive JPEG, the same DCT blocks are fed to the entropy coder | 
|---|
| 374 | during each pass, and the coder must emit the appropriate subset of | 
|---|
| 375 | coefficients. | 
|---|
| 376 |  | 
|---|
| 377 | In addition to the above objects, the compression library includes these | 
|---|
| 378 | objects: | 
|---|
| 379 |  | 
|---|
| 380 | * Master control: determines the number of passes required, controls overall | 
|---|
| 381 | and per-pass initialization of the other modules. | 
|---|
| 382 |  | 
|---|
| 383 | * Marker writing: generates JPEG markers (except for RSTn, which is emitted | 
|---|
| 384 | by the entropy encoder when needed). | 
|---|
| 385 |  | 
|---|
| 386 | * Data destination manager: writes the output JPEG datastream to its final | 
|---|
| 387 | destination (e.g., a file).  The destination manager supplied with the | 
|---|
| 388 | library knows how to write to a stdio stream; for other behaviors, the | 
|---|
| 389 | surrounding application may provide its own destination manager. | 
|---|
| 390 |  | 
|---|
| 391 | * Memory manager: allocates and releases memory, controls virtual arrays | 
|---|
| 392 | (with backing store management, where required). | 
|---|
| 393 |  | 
|---|
| 394 | * Error handler: performs formatting and output of error and trace messages; | 
|---|
| 395 | determines handling of nonfatal errors.  The surrounding application may | 
|---|
| 396 | override some or all of this object's methods to change error handling. | 
|---|
| 397 |  | 
|---|
| 398 | * Progress monitor: supports output of "percent-done" progress reports. | 
|---|
| 399 | This object represents an optional callback to the surrounding application: | 
|---|
| 400 | if wanted, it must be supplied by the application. | 
|---|
| 401 |  | 
|---|
| 402 | The error handler, destination manager, and progress monitor objects are | 
|---|
| 403 | defined as separate objects in order to simplify application-specific | 
|---|
| 404 | customization of the JPEG library.  A surrounding application may override | 
|---|
| 405 | individual methods or supply its own all-new implementation of one of these | 
|---|
| 406 | objects.  The object interfaces for these objects are therefore treated as | 
|---|
| 407 | part of the application interface of the library, whereas the other objects | 
|---|
| 408 | are internal to the library. | 
|---|
| 409 |  | 
|---|
| 410 | The error handler and memory manager are shared by JPEG compression and | 
|---|
| 411 | decompression; the progress monitor, if used, may be shared as well. | 
|---|
| 412 |  | 
|---|
| 413 |  | 
|---|
| 414 | *** Decompression object structure *** | 
|---|
| 415 |  | 
|---|
| 416 | Here is a sketch of the logical structure of the JPEG decompression library: | 
|---|
| 417 |  | 
|---|
| 418 | |-- Entropy decoding | 
|---|
| 419 | |-- Coefficient controller --| | 
|---|
| 420 | |                            |-- Dequantize, Inverse DCT | 
|---|
| 421 | Main controller --| | 
|---|
| 422 | |                               |-- Upsampling | 
|---|
| 423 | |-- Postprocessing controller --|   |-- Colorspace conversion | 
|---|
| 424 | |-- Color quantization | 
|---|
| 425 | |-- Color precision reduction | 
|---|
| 426 |  | 
|---|
| 427 | As before, this diagram also represents typical control flow.  The objects | 
|---|
| 428 | shown are: | 
|---|
| 429 |  | 
|---|
| 430 | * Main controller: buffer controller for the subsampled-data buffer, which | 
|---|
| 431 | holds the output of JPEG decompression proper.  This controller's primary | 
|---|
| 432 | task is to feed the postprocessing procedure.  Some upsampling algorithms | 
|---|
| 433 | may require context rows above and below the current row group; when this | 
|---|
| 434 | is true, the main controller is responsible for managing its buffer so as | 
|---|
| 435 | to make context rows available.  In the current design, the main buffer is | 
|---|
| 436 | always a strip buffer; a full-image buffer is never required. | 
|---|
| 437 |  | 
|---|
| 438 | * Coefficient controller: buffer controller for the DCT-coefficient data. | 
|---|
| 439 | This controller handles MCU disassembly, including deletion of any dummy | 
|---|
| 440 | DCT blocks at the right or bottom edge.  When reading a multiscan JPEG | 
|---|
| 441 | file, this controller is responsible for buffering the full image. | 
|---|
| 442 | (Buffering DCT coefficients, rather than samples, is necessary to support | 
|---|
| 443 | progressive JPEG.)  The equivalent of one fully interleaved MCU row of | 
|---|
| 444 | subsampled data is processed per call, even when the source JPEG file is | 
|---|
| 445 | noninterleaved. | 
|---|
| 446 |  | 
|---|
| 447 | * Entropy decoding: Read coded data from the data source module and perform | 
|---|
| 448 | Huffman or arithmetic entropy decoding.  Works on one MCU per call. | 
|---|
| 449 | For progressive JPEG decoding, the coefficient controller supplies the prior | 
|---|
| 450 | coefficients of each MCU (initially all zeroes), which the entropy decoder | 
|---|
| 451 | modifies in each scan. | 
|---|
| 452 |  | 
|---|
| 453 | * Dequantization and inverse DCT: like it says.  Note that the coefficients | 
|---|
| 454 | buffered by the coefficient controller have NOT been dequantized; we | 
|---|
| 455 | merge dequantization and inverse DCT into a single step for speed reasons. | 
|---|
| 456 | When scaled-down output is asked for, simplified DCT algorithms may be used | 
|---|
| 457 | that need fewer coefficients and emit fewer samples per DCT block, not the | 
|---|
| 458 | full 8x8.  Works on one DCT block at a time. | 
|---|
| 459 |  | 
|---|
| 460 | * Postprocessing controller: buffer controller for the color quantization | 
|---|
| 461 | input buffer, when quantization is in use.  (Without quantization, this | 
|---|
| 462 | controller just calls the upsampler.)  For two-pass quantization, this | 
|---|
| 463 | controller is responsible for buffering the full-image data. | 
|---|
| 464 |  | 
|---|
| 465 | * Upsampling: restores chroma components to full size.  (May support more | 
|---|
| 466 | general output rescaling, too.  Note that if undersized DCT outputs have | 
|---|
| 467 | been emitted by the DCT module, this module must adjust so that properly | 
|---|
| 468 | sized outputs are created.)  Works on one row group at a time.  This module | 
|---|
| 469 | also calls the color conversion module, so its top level is effectively a | 
|---|
| 470 | buffer controller for the upsampling->color conversion buffer.  However, in | 
|---|
| 471 | all but the highest-quality operating modes, upsampling and color | 
|---|
| 472 | conversion are likely to be merged into a single step. | 
|---|
| 473 |  | 
|---|
| 474 | * Colorspace conversion: convert from JPEG color space to output color space, | 
|---|
| 475 | and change data layout from separate component planes to pixel-interleaved. | 
|---|
| 476 | Works on one pixel row at a time. | 
|---|
| 477 |  | 
|---|
| 478 | * Color quantization: reduce the data to colormapped form, using either an | 
|---|
| 479 | externally specified colormap or an internally generated one.  This module | 
|---|
| 480 | is not used for full-color output.  Works on one pixel row at a time; may | 
|---|
| 481 | require two passes to generate a color map.  Note that the output will | 
|---|
| 482 | always be a single component representing colormap indexes.  In the current | 
|---|
| 483 | design, the output values are JSAMPLEs, so an 8-bit compilation cannot | 
|---|
| 484 | quantize to more than 256 colors.  This is unlikely to be a problem in | 
|---|
| 485 | practice. | 
|---|
| 486 |  | 
|---|
| 487 | * Color reduction: this module handles color precision reduction, e.g., | 
|---|
| 488 | generating 15-bit color (5 bits/primary) from JPEG's 24-bit output. | 
|---|
| 489 | Not quite clear yet how this should be handled... should we merge it with | 
|---|
| 490 | colorspace conversion??? | 
|---|
| 491 |  | 
|---|
| 492 | Note that some high-speed operating modes might condense the entire | 
|---|
| 493 | postprocessing sequence to a single module (upsample, color convert, and | 
|---|
| 494 | quantize in one step). | 
|---|
| 495 |  | 
|---|
| 496 | In addition to the above objects, the decompression library includes these | 
|---|
| 497 | objects: | 
|---|
| 498 |  | 
|---|
| 499 | * Master control: determines the number of passes required, controls overall | 
|---|
| 500 | and per-pass initialization of the other modules.  This is subdivided into | 
|---|
| 501 | input and output control: jdinput.c controls only input-side processing, | 
|---|
| 502 | while jdmaster.c handles overall initialization and output-side control. | 
|---|
| 503 |  | 
|---|
| 504 | * Marker reading: decodes JPEG markers (except for RSTn). | 
|---|
| 505 |  | 
|---|
| 506 | * Data source manager: supplies the input JPEG datastream.  The source | 
|---|
| 507 | manager supplied with the library knows how to read from a stdio stream; | 
|---|
| 508 | for other behaviors, the surrounding application may provide its own source | 
|---|
| 509 | manager. | 
|---|
| 510 |  | 
|---|
| 511 | * Memory manager: same as for compression library. | 
|---|
| 512 |  | 
|---|
| 513 | * Error handler: same as for compression library. | 
|---|
| 514 |  | 
|---|
| 515 | * Progress monitor: same as for compression library. | 
|---|
| 516 |  | 
|---|
| 517 | As with compression, the data source manager, error handler, and progress | 
|---|
| 518 | monitor are candidates for replacement by a surrounding application. | 
|---|
| 519 |  | 
|---|
| 520 |  | 
|---|
| 521 | *** Decompression input and output separation *** | 
|---|
| 522 |  | 
|---|
| 523 | To support efficient incremental display of progressive JPEG files, the | 
|---|
| 524 | decompressor is divided into two sections that can run independently: | 
|---|
| 525 |  | 
|---|
| 526 | 1. Data input includes marker parsing, entropy decoding, and input into the | 
|---|
| 527 | coefficient controller's DCT coefficient buffer.  Note that this | 
|---|
| 528 | processing is relatively cheap and fast. | 
|---|
| 529 |  | 
|---|
| 530 | 2. Data output reads from the DCT coefficient buffer and performs the IDCT | 
|---|
| 531 | and all postprocessing steps. | 
|---|
| 532 |  | 
|---|
| 533 | For a progressive JPEG file, the data input processing is allowed to get | 
|---|
| 534 | arbitrarily far ahead of the data output processing.  (This occurs only | 
|---|
| 535 | if the application calls jpeg_consume_input(); otherwise input and output | 
|---|
| 536 | run in lockstep, since the input section is called only when the output | 
|---|
| 537 | section needs more data.)  In this way the application can avoid making | 
|---|
| 538 | extra display passes when data is arriving faster than the display pass | 
|---|
| 539 | can run.  Furthermore, it is possible to abort an output pass without | 
|---|
| 540 | losing anything, since the coefficient buffer is read-only as far as the | 
|---|
| 541 | output section is concerned.  See libjpeg.txt for more detail. | 
|---|
| 542 |  | 
|---|
| 543 | A full-image coefficient array is only created if the JPEG file has multiple | 
|---|
| 544 | scans (or if the application specifies buffered-image mode anyway).  When | 
|---|
| 545 | reading a single-scan file, the coefficient controller normally creates only | 
|---|
| 546 | a one-MCU buffer, so input and output processing must run in lockstep in this | 
|---|
| 547 | case.  jpeg_consume_input() is effectively a no-op in this situation. | 
|---|
| 548 |  | 
|---|
| 549 | The main impact of dividing the decompressor in this fashion is that we must | 
|---|
| 550 | be very careful with shared variables in the cinfo data structure.  Each | 
|---|
| 551 | variable that can change during the course of decompression must be | 
|---|
| 552 | classified as belonging to data input or data output, and each section must | 
|---|
| 553 | look only at its own variables.  For example, the data output section may not | 
|---|
| 554 | depend on any of the variables that describe the current scan in the JPEG | 
|---|
| 555 | file, because these may change as the data input section advances into a new | 
|---|
| 556 | scan. | 
|---|
| 557 |  | 
|---|
| 558 | The progress monitor is (somewhat arbitrarily) defined to treat input of the | 
|---|
| 559 | file as one pass when buffered-image mode is not used, and to ignore data | 
|---|
| 560 | input work completely when buffered-image mode is used.  Note that the | 
|---|
| 561 | library has no reliable way to predict the number of passes when dealing | 
|---|
| 562 | with a progressive JPEG file, nor can it predict the number of output passes | 
|---|
| 563 | in buffered-image mode.  So the work estimate is inherently bogus anyway. | 
|---|
| 564 |  | 
|---|
| 565 | No comparable division is currently made in the compression library, because | 
|---|
| 566 | there isn't any real need for it. | 
|---|
| 567 |  | 
|---|
| 568 |  | 
|---|
| 569 | *** Data formats *** | 
|---|
| 570 |  | 
|---|
| 571 | Arrays of pixel sample values use the following data structure: | 
|---|
| 572 |  | 
|---|
| 573 | typedef something JSAMPLE;          a pixel component value, 0..MAXJSAMPLE | 
|---|
| 574 | typedef JSAMPLE *JSAMPROW;          ptr to a row of samples | 
|---|
| 575 | typedef JSAMPROW *JSAMPARRAY;       ptr to a list of rows | 
|---|
| 576 | typedef JSAMPARRAY *JSAMPIMAGE;     ptr to a list of color-component arrays | 
|---|
| 577 |  | 
|---|
| 578 | The basic element type JSAMPLE will typically be one of unsigned char, | 
|---|
| 579 | (signed) char, or short.  Short will be used if samples wider than 8 bits are | 
|---|
| 580 | to be supported (this is a compile-time option).  Otherwise, unsigned char is | 
|---|
| 581 | used if possible.  If the compiler only supports signed chars, then it is | 
|---|
| 582 | necessary to mask off the value when reading.  Thus, all reads of JSAMPLE | 
|---|
| 583 | values must be coded as "GETJSAMPLE(value)", where the macro will be defined | 
|---|
| 584 | as "((value) & 0xFF)" on signed-char machines and "((int) (value))" elsewhere. | 
|---|
| 585 |  | 
|---|
| 586 | With these conventions, JSAMPLE values can be assumed to be >= 0.  This helps | 
|---|
| 587 | simplify correct rounding during downsampling, etc.  The JPEG standard's | 
|---|
| 588 | specification that sample values run from -128..127 is accommodated by | 
|---|
| 589 | subtracting 128 from the sample value in the DCT step.  Similarly, during | 
|---|
| 590 | decompression the output of the IDCT step will be immediately shifted back to | 
|---|
| 591 | 0..255.  (NB: different values are required when 12-bit samples are in use. | 
|---|
| 592 | The code is written in terms of MAXJSAMPLE and CENTERJSAMPLE, which will be | 
|---|
| 593 | defined as 255 and 128 respectively in an 8-bit implementation, and as 4095 | 
|---|
| 594 | and 2048 in a 12-bit implementation.) | 
|---|
| 595 |  | 
|---|
| 596 | We use a pointer per row, rather than a two-dimensional JSAMPLE array.  This | 
|---|
| 597 | choice costs only a small amount of memory and has several benefits: | 
|---|
| 598 | * Code using the data structure doesn't need to know the allocated width of | 
|---|
| 599 | the rows.  This simplifies edge expansion/compression, since we can work | 
|---|
| 600 | in an array that's wider than the logical picture width. | 
|---|
| 601 | * Indexing doesn't require multiplication; this is a performance win on many | 
|---|
| 602 | machines. | 
|---|
| 603 | * Arrays with more than 64K total elements can be supported even on machines | 
|---|
| 604 | where malloc() cannot allocate chunks larger than 64K. | 
|---|
| 605 | * The rows forming a component array may be allocated at different times | 
|---|
| 606 | without extra copying.  This trick allows some speedups in smoothing steps | 
|---|
| 607 | that need access to the previous and next rows. | 
|---|
| 608 |  | 
|---|
| 609 | Note that each color component is stored in a separate array; we don't use the | 
|---|
| 610 | traditional layout in which the components of a pixel are stored together. | 
|---|
| 611 | This simplifies coding of modules that work on each component independently, | 
|---|
| 612 | because they don't need to know how many components there are.  Furthermore, | 
|---|
| 613 | we can read or write each component to a temporary file independently, which | 
|---|
| 614 | is helpful when dealing with noninterleaved JPEG files. | 
|---|
| 615 |  | 
|---|
| 616 | In general, a specific sample value is accessed by code such as | 
|---|
| 617 | GETJSAMPLE(image[colorcomponent][row][col]) | 
|---|
| 618 | where col is measured from the image left edge, but row is measured from the | 
|---|
| 619 | first sample row currently in memory.  Either of the first two indexings can | 
|---|
| 620 | be precomputed by copying the relevant pointer. | 
|---|
| 621 |  | 
|---|
| 622 |  | 
|---|
| 623 | Since most image-processing applications prefer to work on images in which | 
|---|
| 624 | the components of a pixel are stored together, the data passed to or from the | 
|---|
| 625 | surrounding application uses the traditional convention: a single pixel is | 
|---|
| 626 | represented by N consecutive JSAMPLE values, and an image row is an array of | 
|---|
| 627 | (# of color components)*(image width) JSAMPLEs.  One or more rows of data can | 
|---|
| 628 | be represented by a pointer of type JSAMPARRAY in this scheme.  This scheme is | 
|---|
| 629 | converted to component-wise storage inside the JPEG library.  (Applications | 
|---|
| 630 | that want to skip JPEG preprocessing or postprocessing will have to contend | 
|---|
| 631 | with component-wise storage.) | 
|---|
| 632 |  | 
|---|
| 633 |  | 
|---|
| 634 | Arrays of DCT-coefficient values use the following data structure: | 
|---|
| 635 |  | 
|---|
| 636 | typedef short JCOEF;                a 16-bit signed integer | 
|---|
| 637 | typedef JCOEF JBLOCK[DCTSIZE2];     an 8x8 block of coefficients | 
|---|
| 638 | typedef JBLOCK *JBLOCKROW;          ptr to one horizontal row of 8x8 blocks | 
|---|
| 639 | typedef JBLOCKROW *JBLOCKARRAY;     ptr to a list of such rows | 
|---|
| 640 | typedef JBLOCKARRAY *JBLOCKIMAGE;   ptr to a list of color component arrays | 
|---|
| 641 |  | 
|---|
| 642 | The underlying type is at least a 16-bit signed integer; while "short" is big | 
|---|
| 643 | enough on all machines of interest, on some machines it is preferable to use | 
|---|
| 644 | "int" for speed reasons, despite the storage cost.  Coefficients are grouped | 
|---|
| 645 | into 8x8 blocks (but we always use #defines DCTSIZE and DCTSIZE2 rather than | 
|---|
| 646 | "8" and "64"). | 
|---|
| 647 |  | 
|---|
| 648 | The contents of a coefficient block may be in either "natural" or zigzagged | 
|---|
| 649 | order, and may be true values or divided by the quantization coefficients, | 
|---|
| 650 | depending on where the block is in the processing pipeline.  In the current | 
|---|
| 651 | library, coefficient blocks are kept in natural order everywhere; the entropy | 
|---|
| 652 | codecs zigzag or dezigzag the data as it is written or read.  The blocks | 
|---|
| 653 | contain quantized coefficients everywhere outside the DCT/IDCT subsystems. | 
|---|
| 654 | (This latter decision may need to be revisited to support variable | 
|---|
| 655 | quantization a la JPEG Part 3.) | 
|---|
| 656 |  | 
|---|
| 657 | Notice that the allocation unit is now a row of 8x8 blocks, corresponding to | 
|---|
| 658 | eight rows of samples.  Otherwise the structure is much the same as for | 
|---|
| 659 | samples, and for the same reasons. | 
|---|
| 660 |  | 
|---|
| 661 | On machines where malloc() can't handle a request bigger than 64Kb, this data | 
|---|
| 662 | structure limits us to rows of less than 512 JBLOCKs, or a picture width of | 
|---|
| 663 | 4000+ pixels.  This seems an acceptable restriction. | 
|---|
| 664 |  | 
|---|
| 665 |  | 
|---|
| 666 | On 80x86 machines, the bottom-level pointer types (JSAMPROW and JBLOCKROW) | 
|---|
| 667 | must be declared as "far" pointers, but the upper levels can be "near" | 
|---|
| 668 | (implying that the pointer lists are allocated in the DS segment). | 
|---|
| 669 | We use a #define symbol FAR, which expands to the "far" keyword when | 
|---|
| 670 | compiling on 80x86 machines and to nothing elsewhere. | 
|---|
| 671 |  | 
|---|
| 672 |  | 
|---|
| 673 | *** Suspendable processing *** | 
|---|
| 674 |  | 
|---|
| 675 | In some applications it is desirable to use the JPEG library as an | 
|---|
| 676 | incremental, memory-to-memory filter.  In this situation the data source or | 
|---|
| 677 | destination may be a limited-size buffer, and we can't rely on being able to | 
|---|
| 678 | empty or refill the buffer at arbitrary times.  Instead the application would | 
|---|
| 679 | like to have control return from the library at buffer overflow/underrun, and | 
|---|
| 680 | then resume compression or decompression at a later time. | 
|---|
| 681 |  | 
|---|
| 682 | This scenario is supported for simple cases.  (For anything more complex, we | 
|---|
| 683 | recommend that the application "bite the bullet" and develop real multitasking | 
|---|
| 684 | capability.)  The libjpeg.txt file goes into more detail about the usage and | 
|---|
| 685 | limitations of this capability; here we address the implications for library | 
|---|
| 686 | structure. | 
|---|
| 687 |  | 
|---|
| 688 | The essence of the problem is that the entropy codec (coder or decoder) must | 
|---|
| 689 | be prepared to stop at arbitrary times.  In turn, the controllers that call | 
|---|
| 690 | the entropy codec must be able to stop before having produced or consumed all | 
|---|
| 691 | the data that they normally would handle in one call.  That part is reasonably | 
|---|
| 692 | straightforward: we make the controller call interfaces include "progress | 
|---|
| 693 | counters" which indicate the number of data chunks successfully processed, and | 
|---|
| 694 | we require callers to test the counter rather than just assume all of the data | 
|---|
| 695 | was processed. | 
|---|
| 696 |  | 
|---|
| 697 | Rather than trying to restart at an arbitrary point, the current Huffman | 
|---|
| 698 | codecs are designed to restart at the beginning of the current MCU after a | 
|---|
| 699 | suspension due to buffer overflow/underrun.  At the start of each call, the | 
|---|
| 700 | codec's internal state is loaded from permanent storage (in the JPEG object | 
|---|
| 701 | structures) into local variables.  On successful completion of the MCU, the | 
|---|
| 702 | permanent state is updated.  (This copying is not very expensive, and may even | 
|---|
| 703 | lead to *improved* performance if the local variables can be registerized.) | 
|---|
| 704 | If a suspension occurs, the codec simply returns without updating the state, | 
|---|
| 705 | thus effectively reverting to the start of the MCU.  Note that this implies | 
|---|
| 706 | leaving some data unprocessed in the source/destination buffer (ie, the | 
|---|
| 707 | compressed partial MCU).  The data source/destination module interfaces are | 
|---|
| 708 | specified so as to make this possible.  This also implies that the data buffer | 
|---|
| 709 | must be large enough to hold a worst-case compressed MCU; a couple thousand | 
|---|
| 710 | bytes should be enough. | 
|---|
| 711 |  | 
|---|
| 712 | In a successive-approximation AC refinement scan, the progressive Huffman | 
|---|
| 713 | decoder has to be able to undo assignments of newly nonzero coefficients if it | 
|---|
| 714 | suspends before the MCU is complete, since decoding requires distinguishing | 
|---|
| 715 | previously-zero and previously-nonzero coefficients.  This is a bit tedious | 
|---|
| 716 | but probably won't have much effect on performance.  Other variants of Huffman | 
|---|
| 717 | decoding need not worry about this, since they will just store the same values | 
|---|
| 718 | again if forced to repeat the MCU. | 
|---|
| 719 |  | 
|---|
| 720 | This approach would probably not work for an arithmetic codec, since its | 
|---|
| 721 | modifiable state is quite large and couldn't be copied cheaply.  Instead it | 
|---|
| 722 | would have to suspend and resume exactly at the point of the buffer end. | 
|---|
| 723 |  | 
|---|
| 724 | The JPEG marker reader is designed to cope with suspension at an arbitrary | 
|---|
| 725 | point.  It does so by backing up to the start of the marker parameter segment, | 
|---|
| 726 | so the data buffer must be big enough to hold the largest marker of interest. | 
|---|
| 727 | Again, a couple KB should be adequate.  (A special "skip" convention is used | 
|---|
| 728 | to bypass COM and APPn markers, so these can be larger than the buffer size | 
|---|
| 729 | without causing problems; otherwise a 64K buffer would be needed in the worst | 
|---|
| 730 | case.) | 
|---|
| 731 |  | 
|---|
| 732 | The JPEG marker writer currently does *not* cope with suspension. | 
|---|
| 733 | We feel that this is not necessary; it is much easier simply to require | 
|---|
| 734 | the application to ensure there is enough buffer space before starting.  (An | 
|---|
| 735 | empty 2K buffer is more than sufficient for the header markers; and ensuring | 
|---|
| 736 | there are a dozen or two bytes available before calling jpeg_finish_compress() | 
|---|
| 737 | will suffice for the trailer.)  This would not work for writing multi-scan | 
|---|
| 738 | JPEG files, but we simply do not intend to support that capability with | 
|---|
| 739 | suspension. | 
|---|
| 740 |  | 
|---|
| 741 |  | 
|---|
| 742 | *** Memory manager services *** | 
|---|
| 743 |  | 
|---|
| 744 | The JPEG library's memory manager controls allocation and deallocation of | 
|---|
| 745 | memory, and it manages large "virtual" data arrays on machines where the | 
|---|
| 746 | operating system does not provide virtual memory.  Note that the same | 
|---|
| 747 | memory manager serves both compression and decompression operations. | 
|---|
| 748 |  | 
|---|
| 749 | In all cases, allocated objects are tied to a particular compression or | 
|---|
| 750 | decompression master record, and they will be released when that master | 
|---|
| 751 | record is destroyed. | 
|---|
| 752 |  | 
|---|
| 753 | The memory manager does not provide explicit deallocation of objects. | 
|---|
| 754 | Instead, objects are created in "pools" of free storage, and a whole pool | 
|---|
| 755 | can be freed at once.  This approach helps prevent storage-leak bugs, and | 
|---|
| 756 | it speeds up operations whenever malloc/free are slow (as they often are). | 
|---|
| 757 | The pools can be regarded as lifetime identifiers for objects.  Two | 
|---|
| 758 | pools/lifetimes are defined: | 
|---|
| 759 | * JPOOL_PERMANENT     lasts until master record is destroyed | 
|---|
| 760 | * JPOOL_IMAGE         lasts until done with image (JPEG datastream) | 
|---|
| 761 | Permanent lifetime is used for parameters and tables that should be carried | 
|---|
| 762 | across from one datastream to another; this includes all application-visible | 
|---|
| 763 | parameters.  Image lifetime is used for everything else.  (A third lifetime, | 
|---|
| 764 | JPOOL_PASS = one processing pass, was originally planned.  However it was | 
|---|
| 765 | dropped as not being worthwhile.  The actual usage patterns are such that the | 
|---|
| 766 | peak memory usage would be about the same anyway; and having per-pass storage | 
|---|
| 767 | substantially complicates the virtual memory allocation rules --- see below.) | 
|---|
| 768 |  | 
|---|
| 769 | The memory manager deals with three kinds of object: | 
|---|
| 770 | 1. "Small" objects.  Typically these require no more than 10K-20K total. | 
|---|
| 771 | 2. "Large" objects.  These may require tens to hundreds of K depending on | 
|---|
| 772 | image size.  Semantically they behave the same as small objects, but we | 
|---|
| 773 | distinguish them for two reasons: | 
|---|
| 774 | * On MS-DOS machines, large objects are referenced by FAR pointers, | 
|---|
| 775 | small objects by NEAR pointers. | 
|---|
| 776 | * Pool allocation heuristics may differ for large and small objects. | 
|---|
| 777 | Note that individual "large" objects cannot exceed the size allowed by | 
|---|
| 778 | type size_t, which may be 64K or less on some machines. | 
|---|
| 779 | 3. "Virtual" objects.  These are large 2-D arrays of JSAMPLEs or JBLOCKs | 
|---|
| 780 | (typically large enough for the entire image being processed).  The | 
|---|
| 781 | memory manager provides stripwise access to these arrays.  On machines | 
|---|
| 782 | without virtual memory, the rest of the array may be swapped out to a | 
|---|
| 783 | temporary file. | 
|---|
| 784 |  | 
|---|
| 785 | (Note: JSAMPARRAY and JBLOCKARRAY data structures are a combination of large | 
|---|
| 786 | objects for the data proper and small objects for the row pointers.  For | 
|---|
| 787 | convenience and speed, the memory manager provides single routines to create | 
|---|
| 788 | these structures.  Similarly, virtual arrays include a small control block | 
|---|
| 789 | and a JSAMPARRAY or JBLOCKARRAY working buffer, all created with one call.) | 
|---|
| 790 |  | 
|---|
| 791 | In the present implementation, virtual arrays are only permitted to have image | 
|---|
| 792 | lifespan.  (Permanent lifespan would not be reasonable, and pass lifespan is | 
|---|
| 793 | not very useful since a virtual array's raison d'etre is to store data for | 
|---|
| 794 | multiple passes through the image.)  We also expect that only "small" objects | 
|---|
| 795 | will be given permanent lifespan, though this restriction is not required by | 
|---|
| 796 | the memory manager. | 
|---|
| 797 |  | 
|---|
| 798 | In a non-virtual-memory machine, some performance benefit can be gained by | 
|---|
| 799 | making the in-memory buffers for virtual arrays be as large as possible. | 
|---|
| 800 | (For small images, the buffers might fit entirely in memory, so blind | 
|---|
| 801 | swapping would be very wasteful.)  The memory manager will adjust the height | 
|---|
| 802 | of the buffers to fit within a prespecified maximum memory usage.  In order | 
|---|
| 803 | to do this in a reasonably optimal fashion, the manager needs to allocate all | 
|---|
| 804 | of the virtual arrays at once.  Therefore, there isn't a one-step allocation | 
|---|
| 805 | routine for virtual arrays; instead, there is a "request" routine that simply | 
|---|
| 806 | allocates the control block, and a "realize" routine (called just once) that | 
|---|
| 807 | determines space allocation and creates all of the actual buffers.  The | 
|---|
| 808 | realize routine must allow for space occupied by non-virtual large objects. | 
|---|
| 809 | (We don't bother to factor in the space needed for small objects, on the | 
|---|
| 810 | grounds that it isn't worth the trouble.) | 
|---|
| 811 |  | 
|---|
| 812 | To support all this, we establish the following protocol for doing business | 
|---|
| 813 | with the memory manager: | 
|---|
| 814 | 1. Modules must request virtual arrays (which may have only image lifespan) | 
|---|
| 815 | during the initial setup phase, i.e., in their jinit_xxx routines. | 
|---|
| 816 | 2. All "large" objects (including JSAMPARRAYs and JBLOCKARRAYs) must also be | 
|---|
| 817 | allocated during initial setup. | 
|---|
| 818 | 3. realize_virt_arrays will be called at the completion of initial setup. | 
|---|
| 819 | The above conventions ensure that sufficient information is available | 
|---|
| 820 | for it to choose a good size for virtual array buffers. | 
|---|
| 821 | Small objects of any lifespan may be allocated at any time.  We expect that | 
|---|
| 822 | the total space used for small objects will be small enough to be negligible | 
|---|
| 823 | in the realize_virt_arrays computation. | 
|---|
| 824 |  | 
|---|
| 825 | In a virtual-memory machine, we simply pretend that the available space is | 
|---|
| 826 | infinite, thus causing realize_virt_arrays to decide that it can allocate all | 
|---|
| 827 | the virtual arrays as full-size in-memory buffers.  The overhead of the | 
|---|
| 828 | virtual-array access protocol is very small when no swapping occurs. | 
|---|
| 829 |  | 
|---|
| 830 | A virtual array can be specified to be "pre-zeroed"; when this flag is set, | 
|---|
| 831 | never-yet-written sections of the array are set to zero before being made | 
|---|
| 832 | available to the caller.  If this flag is not set, never-written sections | 
|---|
| 833 | of the array contain garbage.  (This feature exists primarily because the | 
|---|
| 834 | equivalent logic would otherwise be needed in jdcoefct.c for progressive | 
|---|
| 835 | JPEG mode; we may as well make it available for possible other uses.) | 
|---|
| 836 |  | 
|---|
| 837 | The first write pass on a virtual array is required to occur in top-to-bottom | 
|---|
| 838 | order; read passes, as well as any write passes after the first one, may | 
|---|
| 839 | access the array in any order.  This restriction exists partly to simplify | 
|---|
| 840 | the virtual array control logic, and partly because some file systems may not | 
|---|
| 841 | support seeking beyond the current end-of-file in a temporary file.  The main | 
|---|
| 842 | implication of this restriction is that rearrangement of rows (such as | 
|---|
| 843 | converting top-to-bottom data order to bottom-to-top) must be handled while | 
|---|
| 844 | reading data out of the virtual array, not while putting it in. | 
|---|
| 845 |  | 
|---|
| 846 |  | 
|---|
| 847 | *** Memory manager internal structure *** | 
|---|
| 848 |  | 
|---|
| 849 | To isolate system dependencies as much as possible, we have broken the | 
|---|
| 850 | memory manager into two parts.  There is a reasonably system-independent | 
|---|
| 851 | "front end" (jmemmgr.c) and a "back end" that contains only the code | 
|---|
| 852 | likely to change across systems.  All of the memory management methods | 
|---|
| 853 | outlined above are implemented by the front end.  The back end provides | 
|---|
| 854 | the following routines for use by the front end (none of these routines | 
|---|
| 855 | are known to the rest of the JPEG code): | 
|---|
| 856 |  | 
|---|
| 857 | jpeg_mem_init, jpeg_mem_term    system-dependent initialization/shutdown | 
|---|
| 858 |  | 
|---|
| 859 | jpeg_get_small, jpeg_free_small interface to malloc and free library routines | 
|---|
| 860 | (or their equivalents) | 
|---|
| 861 |  | 
|---|
| 862 | jpeg_get_large, jpeg_free_large interface to FAR malloc/free in MSDOS machines; | 
|---|
| 863 | else usually the same as | 
|---|
| 864 | jpeg_get_small/jpeg_free_small | 
|---|
| 865 |  | 
|---|
| 866 | jpeg_mem_available              estimate available memory | 
|---|
| 867 |  | 
|---|
| 868 | jpeg_open_backing_store         create a backing-store object | 
|---|
| 869 |  | 
|---|
| 870 | read_backing_store,             manipulate a backing-store object | 
|---|
| 871 | write_backing_store, | 
|---|
| 872 | close_backing_store | 
|---|
| 873 |  | 
|---|
| 874 | On some systems there will be more than one type of backing-store object | 
|---|
| 875 | (specifically, in MS-DOS a backing store file might be an area of extended | 
|---|
| 876 | memory as well as a disk file).  jpeg_open_backing_store is responsible for | 
|---|
| 877 | choosing how to implement a given object.  The read/write/close routines | 
|---|
| 878 | are method pointers in the structure that describes a given object; this | 
|---|
| 879 | lets them be different for different object types. | 
|---|
| 880 |  | 
|---|
| 881 | It may be necessary to ensure that backing store objects are explicitly | 
|---|
| 882 | released upon abnormal program termination.  For example, MS-DOS won't free | 
|---|
| 883 | extended memory by itself.  To support this, we will expect the main program | 
|---|
| 884 | or surrounding application to arrange to call self_destruct (typically via | 
|---|
| 885 | jpeg_destroy) upon abnormal termination.  This may require a SIGINT signal | 
|---|
| 886 | handler or equivalent.  We don't want to have the back end module install its | 
|---|
| 887 | own signal handler, because that would pre-empt the surrounding application's | 
|---|
| 888 | ability to control signal handling. | 
|---|
| 889 |  | 
|---|
| 890 | The IJG distribution includes several memory manager back end implementations. | 
|---|
| 891 | Usually the same back end should be suitable for all applications on a given | 
|---|
| 892 | system, but it is possible for an application to supply its own back end at | 
|---|
| 893 | need. | 
|---|
| 894 |  | 
|---|
| 895 |  | 
|---|
| 896 | *** Implications of DNL marker *** | 
|---|
| 897 |  | 
|---|
| 898 | Some JPEG files may use a DNL marker to postpone definition of the image | 
|---|
| 899 | height (this would be useful for a fax-like scanner's output, for instance). | 
|---|
| 900 | In these files the SOF marker claims the image height is 0, and you only | 
|---|
| 901 | find out the true image height at the end of the first scan. | 
|---|
| 902 |  | 
|---|
| 903 | We could read these files as follows: | 
|---|
| 904 | 1. Upon seeing zero image height, replace it by 65535 (the maximum allowed). | 
|---|
| 905 | 2. When the DNL is found, update the image height in the global image | 
|---|
| 906 | descriptor. | 
|---|
| 907 | This implies that control modules must avoid making copies of the image | 
|---|
| 908 | height, and must re-test for termination after each MCU row.  This would | 
|---|
| 909 | be easy enough to do. | 
|---|
| 910 |  | 
|---|
| 911 | In cases where image-size data structures are allocated, this approach will | 
|---|
| 912 | result in very inefficient use of virtual memory or much-larger-than-necessary | 
|---|
| 913 | temporary files.  This seems acceptable for something that probably won't be a | 
|---|
| 914 | mainstream usage.  People might have to forgo use of memory-hogging options | 
|---|
| 915 | (such as two-pass color quantization or noninterleaved JPEG files) if they | 
|---|
| 916 | want efficient conversion of such files.  (One could improve efficiency by | 
|---|
| 917 | demanding a user-supplied upper bound for the height, less than 65536; in most | 
|---|
| 918 | cases it could be much less.) | 
|---|
| 919 |  | 
|---|
| 920 | The standard also permits the SOF marker to overestimate the image height, | 
|---|
| 921 | with a DNL to give the true, smaller height at the end of the first scan. | 
|---|
| 922 | This would solve the space problems if the overestimate wasn't too great. | 
|---|
| 923 | However, it implies that you don't even know whether DNL will be used. | 
|---|
| 924 |  | 
|---|
| 925 | This leads to a couple of very serious objections: | 
|---|
| 926 | 1. Testing for a DNL marker must occur in the inner loop of the decompressor's | 
|---|
| 927 | Huffman decoder; this implies a speed penalty whether the feature is used | 
|---|
| 928 | or not. | 
|---|
| 929 | 2. There is no way to hide the last-minute change in image height from an | 
|---|
| 930 | application using the decoder.  Thus *every* application using the IJG | 
|---|
| 931 | library would suffer a complexity penalty whether it cared about DNL or | 
|---|
| 932 | not. | 
|---|
| 933 | We currently do not support DNL because of these problems. | 
|---|
| 934 |  | 
|---|
| 935 | A different approach is to insist that DNL-using files be preprocessed by a | 
|---|
| 936 | separate program that reads ahead to the DNL, then goes back and fixes the SOF | 
|---|
| 937 | marker.  This is a much simpler solution and is probably far more efficient. | 
|---|
| 938 | Even if one wants piped input, buffering the first scan of the JPEG file needs | 
|---|
| 939 | a lot smaller temp file than is implied by the maximum-height method.  For | 
|---|
| 940 | this approach we'd simply treat DNL as a no-op in the decompressor (at most, | 
|---|
| 941 | check that it matches the SOF image height). | 
|---|
| 942 |  | 
|---|
| 943 | We will not worry about making the compressor capable of outputting DNL. | 
|---|
| 944 | Something similar to the first scheme above could be applied if anyone ever | 
|---|
| 945 | wants to make that work. | 
|---|