1 | IJG JPEG LIBRARY: SYSTEM ARCHITECTURE
|
---|
2 |
|
---|
3 | Copyright (C) 1991-2009, Thomas G. Lane, Guido Vollbeding.
|
---|
4 | This file is part of the Independent JPEG Group's software.
|
---|
5 | For conditions of distribution and use, see the accompanying README file.
|
---|
6 |
|
---|
7 |
|
---|
8 | This file provides an overview of the architecture of the IJG JPEG software;
|
---|
9 | that is, the functions of the various modules in the system and the interfaces
|
---|
10 | between modules. For more precise details about any data structure or calling
|
---|
11 | convention, see the include files and comments in the source code.
|
---|
12 |
|
---|
13 | We assume that the reader is already somewhat familiar with the JPEG standard.
|
---|
14 | The README file includes references for learning about JPEG. The file
|
---|
15 | libjpeg.txt describes the library from the viewpoint of an application
|
---|
16 | programmer using the library; it's best to read that file before this one.
|
---|
17 | Also, the file coderules.txt describes the coding style conventions we use.
|
---|
18 |
|
---|
19 | In this document, JPEG-specific terminology follows the JPEG standard:
|
---|
20 | A "component" means a color channel, e.g., Red or Luminance.
|
---|
21 | A "sample" is a single component value (i.e., one number in the image data).
|
---|
22 | A "coefficient" is a frequency coefficient (a DCT transform output number).
|
---|
23 | A "block" is an 8x8 group of samples or coefficients.
|
---|
24 | An "MCU" (minimum coded unit) is an interleaved set of blocks of size
|
---|
25 | determined by the sampling factors, or a single block in a
|
---|
26 | noninterleaved scan.
|
---|
27 | We do not use the terms "pixel" and "sample" interchangeably. When we say
|
---|
28 | pixel, we mean an element of the full-size image, while a sample is an element
|
---|
29 | of the downsampled image. Thus the number of samples may vary across
|
---|
30 | components while the number of pixels does not. (This terminology is not used
|
---|
31 | rigorously throughout the code, but it is used in places where confusion would
|
---|
32 | otherwise result.)
|
---|
33 |
|
---|
34 |
|
---|
35 | *** System features ***
|
---|
36 |
|
---|
37 | The IJG distribution contains two parts:
|
---|
38 | * A subroutine library for JPEG compression and decompression.
|
---|
39 | * cjpeg/djpeg, two sample applications that use the library to transform
|
---|
40 | JFIF JPEG files to and from several other image formats.
|
---|
41 | cjpeg/djpeg are of no great intellectual complexity: they merely add a simple
|
---|
42 | command-line user interface and I/O routines for several uncompressed image
|
---|
43 | formats. This document concentrates on the library itself.
|
---|
44 |
|
---|
45 | We desire the library to be capable of supporting all JPEG baseline, extended
|
---|
46 | sequential, and progressive DCT processes. Hierarchical processes are not
|
---|
47 | supported.
|
---|
48 |
|
---|
49 | The library does not support the lossless (spatial) JPEG process. Lossless
|
---|
50 | JPEG shares little or no code with lossy JPEG, and would normally be used
|
---|
51 | without the extensive pre- and post-processing provided by this library.
|
---|
52 | We feel that lossless JPEG is better handled by a separate library.
|
---|
53 |
|
---|
54 | Within these limits, any set of compression parameters allowed by the JPEG
|
---|
55 | spec should be readable for decompression. (We can be more restrictive about
|
---|
56 | what formats we can generate.) Although the system design allows for all
|
---|
57 | parameter values, some uncommon settings are not yet implemented and may
|
---|
58 | never be; nonintegral sampling ratios are the prime example. Furthermore,
|
---|
59 | we treat 8-bit vs. 12-bit data precision as a compile-time switch, not a
|
---|
60 | run-time option, because most machines can store 8-bit pixels much more
|
---|
61 | compactly than 12-bit.
|
---|
62 |
|
---|
63 | By itself, the library handles only interchange JPEG datastreams --- in
|
---|
64 | particular the widely used JFIF file format. The library can be used by
|
---|
65 | surrounding code to process interchange or abbreviated JPEG datastreams that
|
---|
66 | are embedded in more complex file formats. (For example, libtiff uses this
|
---|
67 | library to implement JPEG compression within the TIFF file format.)
|
---|
68 |
|
---|
69 | The library includes a substantial amount of code that is not covered by the
|
---|
70 | JPEG standard but is necessary for typical applications of JPEG. These
|
---|
71 | functions preprocess the image before JPEG compression or postprocess it after
|
---|
72 | decompression. They include colorspace conversion, downsampling/upsampling,
|
---|
73 | and color quantization. This code can be omitted if not needed.
|
---|
74 |
|
---|
75 | A wide range of quality vs. speed tradeoffs are possible in JPEG processing,
|
---|
76 | and even more so in decompression postprocessing. The decompression library
|
---|
77 | provides multiple implementations that cover most of the useful tradeoffs,
|
---|
78 | ranging from very-high-quality down to fast-preview operation. On the
|
---|
79 | compression side we have generally not provided low-quality choices, since
|
---|
80 | compression is normally less time-critical. It should be understood that the
|
---|
81 | low-quality modes may not meet the JPEG standard's accuracy requirements;
|
---|
82 | nonetheless, they are useful for viewers.
|
---|
83 |
|
---|
84 |
|
---|
85 | *** Portability issues ***
|
---|
86 |
|
---|
87 | Portability is an essential requirement for the library. The key portability
|
---|
88 | issues that show up at the level of system architecture are:
|
---|
89 |
|
---|
90 | 1. Memory usage. We want the code to be able to run on PC-class machines
|
---|
91 | with limited memory. Images should therefore be processed sequentially (in
|
---|
92 | strips), to avoid holding the whole image in memory at once. Where a
|
---|
93 | full-image buffer is necessary, we should be able to use either virtual memory
|
---|
94 | or temporary files.
|
---|
95 |
|
---|
96 | 2. Near/far pointer distinction. To run efficiently on 80x86 machines, the
|
---|
97 | code should distinguish "small" objects (kept in near data space) from
|
---|
98 | "large" ones (kept in far data space). This is an annoying restriction, but
|
---|
99 | fortunately it does not impact code quality for less brain-damaged machines,
|
---|
100 | and the source code clutter turns out to be minimal with sufficient use of
|
---|
101 | pointer typedefs.
|
---|
102 |
|
---|
103 | 3. Data precision. We assume that "char" is at least 8 bits, "short" and
|
---|
104 | "int" at least 16, "long" at least 32. The code will work fine with larger
|
---|
105 | data sizes, although memory may be used inefficiently in some cases. However,
|
---|
106 | the JPEG compressed datastream must ultimately appear on external storage as a
|
---|
107 | sequence of 8-bit bytes if it is to conform to the standard. This may pose a
|
---|
108 | problem on machines where char is wider than 8 bits. The library represents
|
---|
109 | compressed data as an array of values of typedef JOCTET. If no data type
|
---|
110 | exactly 8 bits wide is available, custom data source and data destination
|
---|
111 | modules must be written to unpack and pack the chosen JOCTET datatype into
|
---|
112 | 8-bit external representation.
|
---|
113 |
|
---|
114 |
|
---|
115 | *** System overview ***
|
---|
116 |
|
---|
117 | The compressor and decompressor are each divided into two main sections:
|
---|
118 | the JPEG compressor or decompressor proper, and the preprocessing or
|
---|
119 | postprocessing functions. The interface between these two sections is the
|
---|
120 | image data that the official JPEG spec regards as its input or output: this
|
---|
121 | data is in the colorspace to be used for compression, and it is downsampled
|
---|
122 | to the sampling factors to be used. The preprocessing and postprocessing
|
---|
123 | steps are responsible for converting a normal image representation to or from
|
---|
124 | this form. (Those few applications that want to deal with YCbCr downsampled
|
---|
125 | data can skip the preprocessing or postprocessing step.)
|
---|
126 |
|
---|
127 | Looking more closely, the compressor library contains the following main
|
---|
128 | elements:
|
---|
129 |
|
---|
130 | Preprocessing:
|
---|
131 | * Color space conversion (e.g., RGB to YCbCr).
|
---|
132 | * Edge expansion and downsampling. Optionally, this step can do simple
|
---|
133 | smoothing --- this is often helpful for low-quality source data.
|
---|
134 | JPEG proper:
|
---|
135 | * MCU assembly, DCT, quantization.
|
---|
136 | * Entropy coding (sequential or progressive, Huffman or arithmetic).
|
---|
137 |
|
---|
138 | In addition to these modules we need overall control, marker generation,
|
---|
139 | and support code (memory management & error handling). There is also a
|
---|
140 | module responsible for physically writing the output data --- typically
|
---|
141 | this is just an interface to fwrite(), but some applications may need to
|
---|
142 | do something else with the data.
|
---|
143 |
|
---|
144 | The decompressor library contains the following main elements:
|
---|
145 |
|
---|
146 | JPEG proper:
|
---|
147 | * Entropy decoding (sequential or progressive, Huffman or arithmetic).
|
---|
148 | * Dequantization, inverse DCT, MCU disassembly.
|
---|
149 | Postprocessing:
|
---|
150 | * Upsampling. Optionally, this step may be able to do more general
|
---|
151 | rescaling of the image.
|
---|
152 | * Color space conversion (e.g., YCbCr to RGB). This step may also
|
---|
153 | provide gamma adjustment [ currently it does not ].
|
---|
154 | * Optional color quantization (e.g., reduction to 256 colors).
|
---|
155 | * Optional color precision reduction (e.g., 24-bit to 15-bit color).
|
---|
156 | [This feature is not currently implemented.]
|
---|
157 |
|
---|
158 | We also need overall control, marker parsing, and a data source module.
|
---|
159 | The support code (memory management & error handling) can be shared with
|
---|
160 | the compression half of the library.
|
---|
161 |
|
---|
162 | There may be several implementations of each of these elements, particularly
|
---|
163 | in the decompressor, where a wide range of speed/quality tradeoffs is very
|
---|
164 | useful. It must be understood that some of the best speedups involve
|
---|
165 | merging adjacent steps in the pipeline. For example, upsampling, color space
|
---|
166 | conversion, and color quantization might all be done at once when using a
|
---|
167 | low-quality ordered-dither technique. The system architecture is designed to
|
---|
168 | allow such merging where appropriate.
|
---|
169 |
|
---|
170 |
|
---|
171 | Note: it is convenient to regard edge expansion (padding to block boundaries)
|
---|
172 | as a preprocessing/postprocessing function, even though the JPEG spec includes
|
---|
173 | it in compression/decompression. We do this because downsampling/upsampling
|
---|
174 | can be simplified a little if they work on padded data: it's not necessary to
|
---|
175 | have special cases at the right and bottom edges. Therefore the interface
|
---|
176 | buffer is always an integral number of blocks wide and high, and we expect
|
---|
177 | compression preprocessing to pad the source data properly. Padding will occur
|
---|
178 | only to the next block (8-sample) boundary. In an interleaved-scan situation,
|
---|
179 | additional dummy blocks may be used to fill out MCUs, but the MCU assembly and
|
---|
180 | disassembly logic will create or discard these blocks internally. (This is
|
---|
181 | advantageous for speed reasons, since we avoid DCTing the dummy blocks.
|
---|
182 | It also permits a small reduction in file size, because the compressor can
|
---|
183 | choose dummy block contents so as to minimize their size in compressed form.
|
---|
184 | Finally, it makes the interface buffer specification independent of whether
|
---|
185 | the file is actually interleaved or not.) Applications that wish to deal
|
---|
186 | directly with the downsampled data must provide similar buffering and padding
|
---|
187 | for odd-sized images.
|
---|
188 |
|
---|
189 |
|
---|
190 | *** Poor man's object-oriented programming ***
|
---|
191 |
|
---|
192 | It should be clear by now that we have a lot of quasi-independent processing
|
---|
193 | steps, many of which have several possible behaviors. To avoid cluttering the
|
---|
194 | code with lots of switch statements, we use a simple form of object-style
|
---|
195 | programming to separate out the different possibilities.
|
---|
196 |
|
---|
197 | For example, two different color quantization algorithms could be implemented
|
---|
198 | as two separate modules that present the same external interface; at runtime,
|
---|
199 | the calling code will access the proper module indirectly through an "object".
|
---|
200 |
|
---|
201 | We can get the limited features we need while staying within portable C.
|
---|
202 | The basic tool is a function pointer. An "object" is just a struct
|
---|
203 | containing one or more function pointer fields, each of which corresponds to
|
---|
204 | a method name in real object-oriented languages. During initialization we
|
---|
205 | fill in the function pointers with references to whichever module we have
|
---|
206 | determined we need to use in this run. Then invocation of the module is done
|
---|
207 | by indirecting through a function pointer; on most machines this is no more
|
---|
208 | expensive than a switch statement, which would be the only other way of
|
---|
209 | making the required run-time choice. The really significant benefit, of
|
---|
210 | course, is keeping the source code clean and well structured.
|
---|
211 |
|
---|
212 | We can also arrange to have private storage that varies between different
|
---|
213 | implementations of the same kind of object. We do this by making all the
|
---|
214 | module-specific object structs be separately allocated entities, which will
|
---|
215 | be accessed via pointers in the master compression or decompression struct.
|
---|
216 | The "public" fields or methods for a given kind of object are specified by
|
---|
217 | a commonly known struct. But a module's initialization code can allocate
|
---|
218 | a larger struct that contains the common struct as its first member, plus
|
---|
219 | additional private fields. With appropriate pointer casting, the module's
|
---|
220 | internal functions can access these private fields. (For a simple example,
|
---|
221 | see jdatadst.c, which implements the external interface specified by struct
|
---|
222 | jpeg_destination_mgr, but adds extra fields.)
|
---|
223 |
|
---|
224 | (Of course this would all be a lot easier if we were using C++, but we are
|
---|
225 | not yet prepared to assume that everyone has a C++ compiler.)
|
---|
226 |
|
---|
227 | An important benefit of this scheme is that it is easy to provide multiple
|
---|
228 | versions of any method, each tuned to a particular case. While a lot of
|
---|
229 | precalculation might be done to select an optimal implementation of a method,
|
---|
230 | the cost per invocation is constant. For example, the upsampling step might
|
---|
231 | have a "generic" method, plus one or more "hardwired" methods for the most
|
---|
232 | popular sampling factors; the hardwired methods would be faster because they'd
|
---|
233 | use straight-line code instead of for-loops. The cost to determine which
|
---|
234 | method to use is paid only once, at startup, and the selection criteria are
|
---|
235 | hidden from the callers of the method.
|
---|
236 |
|
---|
237 | This plan differs a little bit from usual object-oriented structures, in that
|
---|
238 | only one instance of each object class will exist during execution. The
|
---|
239 | reason for having the class structure is that on different runs we may create
|
---|
240 | different instances (choose to execute different modules). You can think of
|
---|
241 | the term "method" as denoting the common interface presented by a particular
|
---|
242 | set of interchangeable functions, and "object" as denoting a group of related
|
---|
243 | methods, or the total shared interface behavior of a group of modules.
|
---|
244 |
|
---|
245 |
|
---|
246 | *** Overall control structure ***
|
---|
247 |
|
---|
248 | We previously mentioned the need for overall control logic in the compression
|
---|
249 | and decompression libraries. In IJG implementations prior to v5, overall
|
---|
250 | control was mostly provided by "pipeline control" modules, which proved to be
|
---|
251 | large, unwieldy, and hard to understand. To improve the situation, the
|
---|
252 | control logic has been subdivided into multiple modules. The control modules
|
---|
253 | consist of:
|
---|
254 |
|
---|
255 | 1. Master control for module selection and initialization. This has two
|
---|
256 | responsibilities:
|
---|
257 |
|
---|
258 | 1A. Startup initialization at the beginning of image processing.
|
---|
259 | The individual processing modules to be used in this run are selected
|
---|
260 | and given initialization calls.
|
---|
261 |
|
---|
262 | 1B. Per-pass control. This determines how many passes will be performed
|
---|
263 | and calls each active processing module to configure itself
|
---|
264 | appropriately at the beginning of each pass. End-of-pass processing,
|
---|
265 | where necessary, is also invoked from the master control module.
|
---|
266 |
|
---|
267 | Method selection is partially distributed, in that a particular processing
|
---|
268 | module may contain several possible implementations of a particular method,
|
---|
269 | which it will select among when given its initialization call. The master
|
---|
270 | control code need only be concerned with decisions that affect more than
|
---|
271 | one module.
|
---|
272 |
|
---|
273 | 2. Data buffering control. A separate control module exists for each
|
---|
274 | inter-processing-step data buffer. This module is responsible for
|
---|
275 | invoking the processing steps that write or read that data buffer.
|
---|
276 |
|
---|
277 | Each buffer controller sees the world as follows:
|
---|
278 |
|
---|
279 | input data => processing step A => buffer => processing step B => output data
|
---|
280 | | | |
|
---|
281 | ------------------ controller ------------------
|
---|
282 |
|
---|
283 | The controller knows the dataflow requirements of steps A and B: how much data
|
---|
284 | they want to accept in one chunk and how much they output in one chunk. Its
|
---|
285 | function is to manage its buffer and call A and B at the proper times.
|
---|
286 |
|
---|
287 | A data buffer control module may itself be viewed as a processing step by a
|
---|
288 | higher-level control module; thus the control modules form a binary tree with
|
---|
289 | elementary processing steps at the leaves of the tree.
|
---|
290 |
|
---|
291 | The control modules are objects. A considerable amount of flexibility can
|
---|
292 | be had by replacing implementations of a control module. For example:
|
---|
293 | * Merging of adjacent steps in the pipeline is done by replacing a control
|
---|
294 | module and its pair of processing-step modules with a single processing-
|
---|
295 | step module. (Hence the possible merges are determined by the tree of
|
---|
296 | control modules.)
|
---|
297 | * In some processing modes, a given interstep buffer need only be a "strip"
|
---|
298 | buffer large enough to accommodate the desired data chunk sizes. In other
|
---|
299 | modes, a full-image buffer is needed and several passes are required.
|
---|
300 | The control module determines which kind of buffer is used and manipulates
|
---|
301 | virtual array buffers as needed. One or both processing steps may be
|
---|
302 | unaware of the multi-pass behavior.
|
---|
303 |
|
---|
304 | In theory, we might be able to make all of the data buffer controllers
|
---|
305 | interchangeable and provide just one set of implementations for all. In
|
---|
306 | practice, each one contains considerable special-case processing for its
|
---|
307 | particular job. The buffer controller concept should be regarded as an
|
---|
308 | overall system structuring principle, not as a complete description of the
|
---|
309 | task performed by any one controller.
|
---|
310 |
|
---|
311 |
|
---|
312 | *** Compression object structure ***
|
---|
313 |
|
---|
314 | Here is a sketch of the logical structure of the JPEG compression library:
|
---|
315 |
|
---|
316 | |-- Colorspace conversion
|
---|
317 | |-- Preprocessing controller --|
|
---|
318 | | |-- Downsampling
|
---|
319 | Main controller --|
|
---|
320 | | |-- Forward DCT, quantize
|
---|
321 | |-- Coefficient controller --|
|
---|
322 | |-- Entropy encoding
|
---|
323 |
|
---|
324 | This sketch also describes the flow of control (subroutine calls) during
|
---|
325 | typical image data processing. Each of the components shown in the diagram is
|
---|
326 | an "object" which may have several different implementations available. One
|
---|
327 | or more source code files contain the actual implementation(s) of each object.
|
---|
328 |
|
---|
329 | The objects shown above are:
|
---|
330 |
|
---|
331 | * Main controller: buffer controller for the subsampled-data buffer, which
|
---|
332 | holds the preprocessed input data. This controller invokes preprocessing to
|
---|
333 | fill the subsampled-data buffer, and JPEG compression to empty it. There is
|
---|
334 | usually no need for a full-image buffer here; a strip buffer is adequate.
|
---|
335 |
|
---|
336 | * Preprocessing controller: buffer controller for the downsampling input data
|
---|
337 | buffer, which lies between colorspace conversion and downsampling. Note
|
---|
338 | that a unified conversion/downsampling module would probably replace this
|
---|
339 | controller entirely.
|
---|
340 |
|
---|
341 | * Colorspace conversion: converts application image data into the desired
|
---|
342 | JPEG color space; also changes the data from pixel-interleaved layout to
|
---|
343 | separate component planes. Processes one pixel row at a time.
|
---|
344 |
|
---|
345 | * Downsampling: performs reduction of chroma components as required.
|
---|
346 | Optionally may perform pixel-level smoothing as well. Processes a "row
|
---|
347 | group" at a time, where a row group is defined as Vmax pixel rows of each
|
---|
348 | component before downsampling, and Vk sample rows afterwards (remember Vk
|
---|
349 | differs across components). Some downsampling or smoothing algorithms may
|
---|
350 | require context rows above and below the current row group; the
|
---|
351 | preprocessing controller is responsible for supplying these rows via proper
|
---|
352 | buffering. The downsampler is responsible for edge expansion at the right
|
---|
353 | edge (i.e., extending each sample row to a multiple of 8 samples); but the
|
---|
354 | preprocessing controller is responsible for vertical edge expansion (i.e.,
|
---|
355 | duplicating the bottom sample row as needed to make a multiple of 8 rows).
|
---|
356 |
|
---|
357 | * Coefficient controller: buffer controller for the DCT-coefficient data.
|
---|
358 | This controller handles MCU assembly, including insertion of dummy DCT
|
---|
359 | blocks when needed at the right or bottom edge. When performing
|
---|
360 | Huffman-code optimization or emitting a multiscan JPEG file, this
|
---|
361 | controller is responsible for buffering the full image. The equivalent of
|
---|
362 | one fully interleaved MCU row of subsampled data is processed per call,
|
---|
363 | even when the JPEG file is noninterleaved.
|
---|
364 |
|
---|
365 | * Forward DCT and quantization: Perform DCT, quantize, and emit coefficients.
|
---|
366 | Works on one or more DCT blocks at a time. (Note: the coefficients are now
|
---|
367 | emitted in normal array order, which the entropy encoder is expected to
|
---|
368 | convert to zigzag order as necessary. Prior versions of the IJG code did
|
---|
369 | the conversion to zigzag order within the quantization step.)
|
---|
370 |
|
---|
371 | * Entropy encoding: Perform Huffman or arithmetic entropy coding and emit the
|
---|
372 | coded data to the data destination module. Works on one MCU per call.
|
---|
373 | For progressive JPEG, the same DCT blocks are fed to the entropy coder
|
---|
374 | during each pass, and the coder must emit the appropriate subset of
|
---|
375 | coefficients.
|
---|
376 |
|
---|
377 | In addition to the above objects, the compression library includes these
|
---|
378 | objects:
|
---|
379 |
|
---|
380 | * Master control: determines the number of passes required, controls overall
|
---|
381 | and per-pass initialization of the other modules.
|
---|
382 |
|
---|
383 | * Marker writing: generates JPEG markers (except for RSTn, which is emitted
|
---|
384 | by the entropy encoder when needed).
|
---|
385 |
|
---|
386 | * Data destination manager: writes the output JPEG datastream to its final
|
---|
387 | destination (e.g., a file). The destination manager supplied with the
|
---|
388 | library knows how to write to a stdio stream; for other behaviors, the
|
---|
389 | surrounding application may provide its own destination manager.
|
---|
390 |
|
---|
391 | * Memory manager: allocates and releases memory, controls virtual arrays
|
---|
392 | (with backing store management, where required).
|
---|
393 |
|
---|
394 | * Error handler: performs formatting and output of error and trace messages;
|
---|
395 | determines handling of nonfatal errors. The surrounding application may
|
---|
396 | override some or all of this object's methods to change error handling.
|
---|
397 |
|
---|
398 | * Progress monitor: supports output of "percent-done" progress reports.
|
---|
399 | This object represents an optional callback to the surrounding application:
|
---|
400 | if wanted, it must be supplied by the application.
|
---|
401 |
|
---|
402 | The error handler, destination manager, and progress monitor objects are
|
---|
403 | defined as separate objects in order to simplify application-specific
|
---|
404 | customization of the JPEG library. A surrounding application may override
|
---|
405 | individual methods or supply its own all-new implementation of one of these
|
---|
406 | objects. The object interfaces for these objects are therefore treated as
|
---|
407 | part of the application interface of the library, whereas the other objects
|
---|
408 | are internal to the library.
|
---|
409 |
|
---|
410 | The error handler and memory manager are shared by JPEG compression and
|
---|
411 | decompression; the progress monitor, if used, may be shared as well.
|
---|
412 |
|
---|
413 |
|
---|
414 | *** Decompression object structure ***
|
---|
415 |
|
---|
416 | Here is a sketch of the logical structure of the JPEG decompression library:
|
---|
417 |
|
---|
418 | |-- Entropy decoding
|
---|
419 | |-- Coefficient controller --|
|
---|
420 | | |-- Dequantize, Inverse DCT
|
---|
421 | Main controller --|
|
---|
422 | | |-- Upsampling
|
---|
423 | |-- Postprocessing controller --| |-- Colorspace conversion
|
---|
424 | |-- Color quantization
|
---|
425 | |-- Color precision reduction
|
---|
426 |
|
---|
427 | As before, this diagram also represents typical control flow. The objects
|
---|
428 | shown are:
|
---|
429 |
|
---|
430 | * Main controller: buffer controller for the subsampled-data buffer, which
|
---|
431 | holds the output of JPEG decompression proper. This controller's primary
|
---|
432 | task is to feed the postprocessing procedure. Some upsampling algorithms
|
---|
433 | may require context rows above and below the current row group; when this
|
---|
434 | is true, the main controller is responsible for managing its buffer so as
|
---|
435 | to make context rows available. In the current design, the main buffer is
|
---|
436 | always a strip buffer; a full-image buffer is never required.
|
---|
437 |
|
---|
438 | * Coefficient controller: buffer controller for the DCT-coefficient data.
|
---|
439 | This controller handles MCU disassembly, including deletion of any dummy
|
---|
440 | DCT blocks at the right or bottom edge. When reading a multiscan JPEG
|
---|
441 | file, this controller is responsible for buffering the full image.
|
---|
442 | (Buffering DCT coefficients, rather than samples, is necessary to support
|
---|
443 | progressive JPEG.) The equivalent of one fully interleaved MCU row of
|
---|
444 | subsampled data is processed per call, even when the source JPEG file is
|
---|
445 | noninterleaved.
|
---|
446 |
|
---|
447 | * Entropy decoding: Read coded data from the data source module and perform
|
---|
448 | Huffman or arithmetic entropy decoding. Works on one MCU per call.
|
---|
449 | For progressive JPEG decoding, the coefficient controller supplies the prior
|
---|
450 | coefficients of each MCU (initially all zeroes), which the entropy decoder
|
---|
451 | modifies in each scan.
|
---|
452 |
|
---|
453 | * Dequantization and inverse DCT: like it says. Note that the coefficients
|
---|
454 | buffered by the coefficient controller have NOT been dequantized; we
|
---|
455 | merge dequantization and inverse DCT into a single step for speed reasons.
|
---|
456 | When scaled-down output is asked for, simplified DCT algorithms may be used
|
---|
457 | that need fewer coefficients and emit fewer samples per DCT block, not the
|
---|
458 | full 8x8. Works on one DCT block at a time.
|
---|
459 |
|
---|
460 | * Postprocessing controller: buffer controller for the color quantization
|
---|
461 | input buffer, when quantization is in use. (Without quantization, this
|
---|
462 | controller just calls the upsampler.) For two-pass quantization, this
|
---|
463 | controller is responsible for buffering the full-image data.
|
---|
464 |
|
---|
465 | * Upsampling: restores chroma components to full size. (May support more
|
---|
466 | general output rescaling, too. Note that if undersized DCT outputs have
|
---|
467 | been emitted by the DCT module, this module must adjust so that properly
|
---|
468 | sized outputs are created.) Works on one row group at a time. This module
|
---|
469 | also calls the color conversion module, so its top level is effectively a
|
---|
470 | buffer controller for the upsampling->color conversion buffer. However, in
|
---|
471 | all but the highest-quality operating modes, upsampling and color
|
---|
472 | conversion are likely to be merged into a single step.
|
---|
473 |
|
---|
474 | * Colorspace conversion: convert from JPEG color space to output color space,
|
---|
475 | and change data layout from separate component planes to pixel-interleaved.
|
---|
476 | Works on one pixel row at a time.
|
---|
477 |
|
---|
478 | * Color quantization: reduce the data to colormapped form, using either an
|
---|
479 | externally specified colormap or an internally generated one. This module
|
---|
480 | is not used for full-color output. Works on one pixel row at a time; may
|
---|
481 | require two passes to generate a color map. Note that the output will
|
---|
482 | always be a single component representing colormap indexes. In the current
|
---|
483 | design, the output values are JSAMPLEs, so an 8-bit compilation cannot
|
---|
484 | quantize to more than 256 colors. This is unlikely to be a problem in
|
---|
485 | practice.
|
---|
486 |
|
---|
487 | * Color reduction: this module handles color precision reduction, e.g.,
|
---|
488 | generating 15-bit color (5 bits/primary) from JPEG's 24-bit output.
|
---|
489 | Not quite clear yet how this should be handled... should we merge it with
|
---|
490 | colorspace conversion???
|
---|
491 |
|
---|
492 | Note that some high-speed operating modes might condense the entire
|
---|
493 | postprocessing sequence to a single module (upsample, color convert, and
|
---|
494 | quantize in one step).
|
---|
495 |
|
---|
496 | In addition to the above objects, the decompression library includes these
|
---|
497 | objects:
|
---|
498 |
|
---|
499 | * Master control: determines the number of passes required, controls overall
|
---|
500 | and per-pass initialization of the other modules. This is subdivided into
|
---|
501 | input and output control: jdinput.c controls only input-side processing,
|
---|
502 | while jdmaster.c handles overall initialization and output-side control.
|
---|
503 |
|
---|
504 | * Marker reading: decodes JPEG markers (except for RSTn).
|
---|
505 |
|
---|
506 | * Data source manager: supplies the input JPEG datastream. The source
|
---|
507 | manager supplied with the library knows how to read from a stdio stream;
|
---|
508 | for other behaviors, the surrounding application may provide its own source
|
---|
509 | manager.
|
---|
510 |
|
---|
511 | * Memory manager: same as for compression library.
|
---|
512 |
|
---|
513 | * Error handler: same as for compression library.
|
---|
514 |
|
---|
515 | * Progress monitor: same as for compression library.
|
---|
516 |
|
---|
517 | As with compression, the data source manager, error handler, and progress
|
---|
518 | monitor are candidates for replacement by a surrounding application.
|
---|
519 |
|
---|
520 |
|
---|
521 | *** Decompression input and output separation ***
|
---|
522 |
|
---|
523 | To support efficient incremental display of progressive JPEG files, the
|
---|
524 | decompressor is divided into two sections that can run independently:
|
---|
525 |
|
---|
526 | 1. Data input includes marker parsing, entropy decoding, and input into the
|
---|
527 | coefficient controller's DCT coefficient buffer. Note that this
|
---|
528 | processing is relatively cheap and fast.
|
---|
529 |
|
---|
530 | 2. Data output reads from the DCT coefficient buffer and performs the IDCT
|
---|
531 | and all postprocessing steps.
|
---|
532 |
|
---|
533 | For a progressive JPEG file, the data input processing is allowed to get
|
---|
534 | arbitrarily far ahead of the data output processing. (This occurs only
|
---|
535 | if the application calls jpeg_consume_input(); otherwise input and output
|
---|
536 | run in lockstep, since the input section is called only when the output
|
---|
537 | section needs more data.) In this way the application can avoid making
|
---|
538 | extra display passes when data is arriving faster than the display pass
|
---|
539 | can run. Furthermore, it is possible to abort an output pass without
|
---|
540 | losing anything, since the coefficient buffer is read-only as far as the
|
---|
541 | output section is concerned. See libjpeg.txt for more detail.
|
---|
542 |
|
---|
543 | A full-image coefficient array is only created if the JPEG file has multiple
|
---|
544 | scans (or if the application specifies buffered-image mode anyway). When
|
---|
545 | reading a single-scan file, the coefficient controller normally creates only
|
---|
546 | a one-MCU buffer, so input and output processing must run in lockstep in this
|
---|
547 | case. jpeg_consume_input() is effectively a no-op in this situation.
|
---|
548 |
|
---|
549 | The main impact of dividing the decompressor in this fashion is that we must
|
---|
550 | be very careful with shared variables in the cinfo data structure. Each
|
---|
551 | variable that can change during the course of decompression must be
|
---|
552 | classified as belonging to data input or data output, and each section must
|
---|
553 | look only at its own variables. For example, the data output section may not
|
---|
554 | depend on any of the variables that describe the current scan in the JPEG
|
---|
555 | file, because these may change as the data input section advances into a new
|
---|
556 | scan.
|
---|
557 |
|
---|
558 | The progress monitor is (somewhat arbitrarily) defined to treat input of the
|
---|
559 | file as one pass when buffered-image mode is not used, and to ignore data
|
---|
560 | input work completely when buffered-image mode is used. Note that the
|
---|
561 | library has no reliable way to predict the number of passes when dealing
|
---|
562 | with a progressive JPEG file, nor can it predict the number of output passes
|
---|
563 | in buffered-image mode. So the work estimate is inherently bogus anyway.
|
---|
564 |
|
---|
565 | No comparable division is currently made in the compression library, because
|
---|
566 | there isn't any real need for it.
|
---|
567 |
|
---|
568 |
|
---|
569 | *** Data formats ***
|
---|
570 |
|
---|
571 | Arrays of pixel sample values use the following data structure:
|
---|
572 |
|
---|
573 | typedef something JSAMPLE; a pixel component value, 0..MAXJSAMPLE
|
---|
574 | typedef JSAMPLE *JSAMPROW; ptr to a row of samples
|
---|
575 | typedef JSAMPROW *JSAMPARRAY; ptr to a list of rows
|
---|
576 | typedef JSAMPARRAY *JSAMPIMAGE; ptr to a list of color-component arrays
|
---|
577 |
|
---|
578 | The basic element type JSAMPLE will typically be one of unsigned char,
|
---|
579 | (signed) char, or short. Short will be used if samples wider than 8 bits are
|
---|
580 | to be supported (this is a compile-time option). Otherwise, unsigned char is
|
---|
581 | used if possible. If the compiler only supports signed chars, then it is
|
---|
582 | necessary to mask off the value when reading. Thus, all reads of JSAMPLE
|
---|
583 | values must be coded as "GETJSAMPLE(value)", where the macro will be defined
|
---|
584 | as "((value) & 0xFF)" on signed-char machines and "((int) (value))" elsewhere.
|
---|
585 |
|
---|
586 | With these conventions, JSAMPLE values can be assumed to be >= 0. This helps
|
---|
587 | simplify correct rounding during downsampling, etc. The JPEG standard's
|
---|
588 | specification that sample values run from -128..127 is accommodated by
|
---|
589 | subtracting 128 from the sample value in the DCT step. Similarly, during
|
---|
590 | decompression the output of the IDCT step will be immediately shifted back to
|
---|
591 | 0..255. (NB: different values are required when 12-bit samples are in use.
|
---|
592 | The code is written in terms of MAXJSAMPLE and CENTERJSAMPLE, which will be
|
---|
593 | defined as 255 and 128 respectively in an 8-bit implementation, and as 4095
|
---|
594 | and 2048 in a 12-bit implementation.)
|
---|
595 |
|
---|
596 | We use a pointer per row, rather than a two-dimensional JSAMPLE array. This
|
---|
597 | choice costs only a small amount of memory and has several benefits:
|
---|
598 | * Code using the data structure doesn't need to know the allocated width of
|
---|
599 | the rows. This simplifies edge expansion/compression, since we can work
|
---|
600 | in an array that's wider than the logical picture width.
|
---|
601 | * Indexing doesn't require multiplication; this is a performance win on many
|
---|
602 | machines.
|
---|
603 | * Arrays with more than 64K total elements can be supported even on machines
|
---|
604 | where malloc() cannot allocate chunks larger than 64K.
|
---|
605 | * The rows forming a component array may be allocated at different times
|
---|
606 | without extra copying. This trick allows some speedups in smoothing steps
|
---|
607 | that need access to the previous and next rows.
|
---|
608 |
|
---|
609 | Note that each color component is stored in a separate array; we don't use the
|
---|
610 | traditional layout in which the components of a pixel are stored together.
|
---|
611 | This simplifies coding of modules that work on each component independently,
|
---|
612 | because they don't need to know how many components there are. Furthermore,
|
---|
613 | we can read or write each component to a temporary file independently, which
|
---|
614 | is helpful when dealing with noninterleaved JPEG files.
|
---|
615 |
|
---|
616 | In general, a specific sample value is accessed by code such as
|
---|
617 | GETJSAMPLE(image[colorcomponent][row][col])
|
---|
618 | where col is measured from the image left edge, but row is measured from the
|
---|
619 | first sample row currently in memory. Either of the first two indexings can
|
---|
620 | be precomputed by copying the relevant pointer.
|
---|
621 |
|
---|
622 |
|
---|
623 | Since most image-processing applications prefer to work on images in which
|
---|
624 | the components of a pixel are stored together, the data passed to or from the
|
---|
625 | surrounding application uses the traditional convention: a single pixel is
|
---|
626 | represented by N consecutive JSAMPLE values, and an image row is an array of
|
---|
627 | (# of color components)*(image width) JSAMPLEs. One or more rows of data can
|
---|
628 | be represented by a pointer of type JSAMPARRAY in this scheme. This scheme is
|
---|
629 | converted to component-wise storage inside the JPEG library. (Applications
|
---|
630 | that want to skip JPEG preprocessing or postprocessing will have to contend
|
---|
631 | with component-wise storage.)
|
---|
632 |
|
---|
633 |
|
---|
634 | Arrays of DCT-coefficient values use the following data structure:
|
---|
635 |
|
---|
636 | typedef short JCOEF; a 16-bit signed integer
|
---|
637 | typedef JCOEF JBLOCK[DCTSIZE2]; an 8x8 block of coefficients
|
---|
638 | typedef JBLOCK *JBLOCKROW; ptr to one horizontal row of 8x8 blocks
|
---|
639 | typedef JBLOCKROW *JBLOCKARRAY; ptr to a list of such rows
|
---|
640 | typedef JBLOCKARRAY *JBLOCKIMAGE; ptr to a list of color component arrays
|
---|
641 |
|
---|
642 | The underlying type is at least a 16-bit signed integer; while "short" is big
|
---|
643 | enough on all machines of interest, on some machines it is preferable to use
|
---|
644 | "int" for speed reasons, despite the storage cost. Coefficients are grouped
|
---|
645 | into 8x8 blocks (but we always use #defines DCTSIZE and DCTSIZE2 rather than
|
---|
646 | "8" and "64").
|
---|
647 |
|
---|
648 | The contents of a coefficient block may be in either "natural" or zigzagged
|
---|
649 | order, and may be true values or divided by the quantization coefficients,
|
---|
650 | depending on where the block is in the processing pipeline. In the current
|
---|
651 | library, coefficient blocks are kept in natural order everywhere; the entropy
|
---|
652 | codecs zigzag or dezigzag the data as it is written or read. The blocks
|
---|
653 | contain quantized coefficients everywhere outside the DCT/IDCT subsystems.
|
---|
654 | (This latter decision may need to be revisited to support variable
|
---|
655 | quantization a la JPEG Part 3.)
|
---|
656 |
|
---|
657 | Notice that the allocation unit is now a row of 8x8 blocks, corresponding to
|
---|
658 | eight rows of samples. Otherwise the structure is much the same as for
|
---|
659 | samples, and for the same reasons.
|
---|
660 |
|
---|
661 | On machines where malloc() can't handle a request bigger than 64Kb, this data
|
---|
662 | structure limits us to rows of less than 512 JBLOCKs, or a picture width of
|
---|
663 | 4000+ pixels. This seems an acceptable restriction.
|
---|
664 |
|
---|
665 |
|
---|
666 | On 80x86 machines, the bottom-level pointer types (JSAMPROW and JBLOCKROW)
|
---|
667 | must be declared as "far" pointers, but the upper levels can be "near"
|
---|
668 | (implying that the pointer lists are allocated in the DS segment).
|
---|
669 | We use a #define symbol FAR, which expands to the "far" keyword when
|
---|
670 | compiling on 80x86 machines and to nothing elsewhere.
|
---|
671 |
|
---|
672 |
|
---|
673 | *** Suspendable processing ***
|
---|
674 |
|
---|
675 | In some applications it is desirable to use the JPEG library as an
|
---|
676 | incremental, memory-to-memory filter. In this situation the data source or
|
---|
677 | destination may be a limited-size buffer, and we can't rely on being able to
|
---|
678 | empty or refill the buffer at arbitrary times. Instead the application would
|
---|
679 | like to have control return from the library at buffer overflow/underrun, and
|
---|
680 | then resume compression or decompression at a later time.
|
---|
681 |
|
---|
682 | This scenario is supported for simple cases. (For anything more complex, we
|
---|
683 | recommend that the application "bite the bullet" and develop real multitasking
|
---|
684 | capability.) The libjpeg.txt file goes into more detail about the usage and
|
---|
685 | limitations of this capability; here we address the implications for library
|
---|
686 | structure.
|
---|
687 |
|
---|
688 | The essence of the problem is that the entropy codec (coder or decoder) must
|
---|
689 | be prepared to stop at arbitrary times. In turn, the controllers that call
|
---|
690 | the entropy codec must be able to stop before having produced or consumed all
|
---|
691 | the data that they normally would handle in one call. That part is reasonably
|
---|
692 | straightforward: we make the controller call interfaces include "progress
|
---|
693 | counters" which indicate the number of data chunks successfully processed, and
|
---|
694 | we require callers to test the counter rather than just assume all of the data
|
---|
695 | was processed.
|
---|
696 |
|
---|
697 | Rather than trying to restart at an arbitrary point, the current Huffman
|
---|
698 | codecs are designed to restart at the beginning of the current MCU after a
|
---|
699 | suspension due to buffer overflow/underrun. At the start of each call, the
|
---|
700 | codec's internal state is loaded from permanent storage (in the JPEG object
|
---|
701 | structures) into local variables. On successful completion of the MCU, the
|
---|
702 | permanent state is updated. (This copying is not very expensive, and may even
|
---|
703 | lead to *improved* performance if the local variables can be registerized.)
|
---|
704 | If a suspension occurs, the codec simply returns without updating the state,
|
---|
705 | thus effectively reverting to the start of the MCU. Note that this implies
|
---|
706 | leaving some data unprocessed in the source/destination buffer (ie, the
|
---|
707 | compressed partial MCU). The data source/destination module interfaces are
|
---|
708 | specified so as to make this possible. This also implies that the data buffer
|
---|
709 | must be large enough to hold a worst-case compressed MCU; a couple thousand
|
---|
710 | bytes should be enough.
|
---|
711 |
|
---|
712 | In a successive-approximation AC refinement scan, the progressive Huffman
|
---|
713 | decoder has to be able to undo assignments of newly nonzero coefficients if it
|
---|
714 | suspends before the MCU is complete, since decoding requires distinguishing
|
---|
715 | previously-zero and previously-nonzero coefficients. This is a bit tedious
|
---|
716 | but probably won't have much effect on performance. Other variants of Huffman
|
---|
717 | decoding need not worry about this, since they will just store the same values
|
---|
718 | again if forced to repeat the MCU.
|
---|
719 |
|
---|
720 | This approach would probably not work for an arithmetic codec, since its
|
---|
721 | modifiable state is quite large and couldn't be copied cheaply. Instead it
|
---|
722 | would have to suspend and resume exactly at the point of the buffer end.
|
---|
723 |
|
---|
724 | The JPEG marker reader is designed to cope with suspension at an arbitrary
|
---|
725 | point. It does so by backing up to the start of the marker parameter segment,
|
---|
726 | so the data buffer must be big enough to hold the largest marker of interest.
|
---|
727 | Again, a couple KB should be adequate. (A special "skip" convention is used
|
---|
728 | to bypass COM and APPn markers, so these can be larger than the buffer size
|
---|
729 | without causing problems; otherwise a 64K buffer would be needed in the worst
|
---|
730 | case.)
|
---|
731 |
|
---|
732 | The JPEG marker writer currently does *not* cope with suspension.
|
---|
733 | We feel that this is not necessary; it is much easier simply to require
|
---|
734 | the application to ensure there is enough buffer space before starting. (An
|
---|
735 | empty 2K buffer is more than sufficient for the header markers; and ensuring
|
---|
736 | there are a dozen or two bytes available before calling jpeg_finish_compress()
|
---|
737 | will suffice for the trailer.) This would not work for writing multi-scan
|
---|
738 | JPEG files, but we simply do not intend to support that capability with
|
---|
739 | suspension.
|
---|
740 |
|
---|
741 |
|
---|
742 | *** Memory manager services ***
|
---|
743 |
|
---|
744 | The JPEG library's memory manager controls allocation and deallocation of
|
---|
745 | memory, and it manages large "virtual" data arrays on machines where the
|
---|
746 | operating system does not provide virtual memory. Note that the same
|
---|
747 | memory manager serves both compression and decompression operations.
|
---|
748 |
|
---|
749 | In all cases, allocated objects are tied to a particular compression or
|
---|
750 | decompression master record, and they will be released when that master
|
---|
751 | record is destroyed.
|
---|
752 |
|
---|
753 | The memory manager does not provide explicit deallocation of objects.
|
---|
754 | Instead, objects are created in "pools" of free storage, and a whole pool
|
---|
755 | can be freed at once. This approach helps prevent storage-leak bugs, and
|
---|
756 | it speeds up operations whenever malloc/free are slow (as they often are).
|
---|
757 | The pools can be regarded as lifetime identifiers for objects. Two
|
---|
758 | pools/lifetimes are defined:
|
---|
759 | * JPOOL_PERMANENT lasts until master record is destroyed
|
---|
760 | * JPOOL_IMAGE lasts until done with image (JPEG datastream)
|
---|
761 | Permanent lifetime is used for parameters and tables that should be carried
|
---|
762 | across from one datastream to another; this includes all application-visible
|
---|
763 | parameters. Image lifetime is used for everything else. (A third lifetime,
|
---|
764 | JPOOL_PASS = one processing pass, was originally planned. However it was
|
---|
765 | dropped as not being worthwhile. The actual usage patterns are such that the
|
---|
766 | peak memory usage would be about the same anyway; and having per-pass storage
|
---|
767 | substantially complicates the virtual memory allocation rules --- see below.)
|
---|
768 |
|
---|
769 | The memory manager deals with three kinds of object:
|
---|
770 | 1. "Small" objects. Typically these require no more than 10K-20K total.
|
---|
771 | 2. "Large" objects. These may require tens to hundreds of K depending on
|
---|
772 | image size. Semantically they behave the same as small objects, but we
|
---|
773 | distinguish them for two reasons:
|
---|
774 | * On MS-DOS machines, large objects are referenced by FAR pointers,
|
---|
775 | small objects by NEAR pointers.
|
---|
776 | * Pool allocation heuristics may differ for large and small objects.
|
---|
777 | Note that individual "large" objects cannot exceed the size allowed by
|
---|
778 | type size_t, which may be 64K or less on some machines.
|
---|
779 | 3. "Virtual" objects. These are large 2-D arrays of JSAMPLEs or JBLOCKs
|
---|
780 | (typically large enough for the entire image being processed). The
|
---|
781 | memory manager provides stripwise access to these arrays. On machines
|
---|
782 | without virtual memory, the rest of the array may be swapped out to a
|
---|
783 | temporary file.
|
---|
784 |
|
---|
785 | (Note: JSAMPARRAY and JBLOCKARRAY data structures are a combination of large
|
---|
786 | objects for the data proper and small objects for the row pointers. For
|
---|
787 | convenience and speed, the memory manager provides single routines to create
|
---|
788 | these structures. Similarly, virtual arrays include a small control block
|
---|
789 | and a JSAMPARRAY or JBLOCKARRAY working buffer, all created with one call.)
|
---|
790 |
|
---|
791 | In the present implementation, virtual arrays are only permitted to have image
|
---|
792 | lifespan. (Permanent lifespan would not be reasonable, and pass lifespan is
|
---|
793 | not very useful since a virtual array's raison d'etre is to store data for
|
---|
794 | multiple passes through the image.) We also expect that only "small" objects
|
---|
795 | will be given permanent lifespan, though this restriction is not required by
|
---|
796 | the memory manager.
|
---|
797 |
|
---|
798 | In a non-virtual-memory machine, some performance benefit can be gained by
|
---|
799 | making the in-memory buffers for virtual arrays be as large as possible.
|
---|
800 | (For small images, the buffers might fit entirely in memory, so blind
|
---|
801 | swapping would be very wasteful.) The memory manager will adjust the height
|
---|
802 | of the buffers to fit within a prespecified maximum memory usage. In order
|
---|
803 | to do this in a reasonably optimal fashion, the manager needs to allocate all
|
---|
804 | of the virtual arrays at once. Therefore, there isn't a one-step allocation
|
---|
805 | routine for virtual arrays; instead, there is a "request" routine that simply
|
---|
806 | allocates the control block, and a "realize" routine (called just once) that
|
---|
807 | determines space allocation and creates all of the actual buffers. The
|
---|
808 | realize routine must allow for space occupied by non-virtual large objects.
|
---|
809 | (We don't bother to factor in the space needed for small objects, on the
|
---|
810 | grounds that it isn't worth the trouble.)
|
---|
811 |
|
---|
812 | To support all this, we establish the following protocol for doing business
|
---|
813 | with the memory manager:
|
---|
814 | 1. Modules must request virtual arrays (which may have only image lifespan)
|
---|
815 | during the initial setup phase, i.e., in their jinit_xxx routines.
|
---|
816 | 2. All "large" objects (including JSAMPARRAYs and JBLOCKARRAYs) must also be
|
---|
817 | allocated during initial setup.
|
---|
818 | 3. realize_virt_arrays will be called at the completion of initial setup.
|
---|
819 | The above conventions ensure that sufficient information is available
|
---|
820 | for it to choose a good size for virtual array buffers.
|
---|
821 | Small objects of any lifespan may be allocated at any time. We expect that
|
---|
822 | the total space used for small objects will be small enough to be negligible
|
---|
823 | in the realize_virt_arrays computation.
|
---|
824 |
|
---|
825 | In a virtual-memory machine, we simply pretend that the available space is
|
---|
826 | infinite, thus causing realize_virt_arrays to decide that it can allocate all
|
---|
827 | the virtual arrays as full-size in-memory buffers. The overhead of the
|
---|
828 | virtual-array access protocol is very small when no swapping occurs.
|
---|
829 |
|
---|
830 | A virtual array can be specified to be "pre-zeroed"; when this flag is set,
|
---|
831 | never-yet-written sections of the array are set to zero before being made
|
---|
832 | available to the caller. If this flag is not set, never-written sections
|
---|
833 | of the array contain garbage. (This feature exists primarily because the
|
---|
834 | equivalent logic would otherwise be needed in jdcoefct.c for progressive
|
---|
835 | JPEG mode; we may as well make it available for possible other uses.)
|
---|
836 |
|
---|
837 | The first write pass on a virtual array is required to occur in top-to-bottom
|
---|
838 | order; read passes, as well as any write passes after the first one, may
|
---|
839 | access the array in any order. This restriction exists partly to simplify
|
---|
840 | the virtual array control logic, and partly because some file systems may not
|
---|
841 | support seeking beyond the current end-of-file in a temporary file. The main
|
---|
842 | implication of this restriction is that rearrangement of rows (such as
|
---|
843 | converting top-to-bottom data order to bottom-to-top) must be handled while
|
---|
844 | reading data out of the virtual array, not while putting it in.
|
---|
845 |
|
---|
846 |
|
---|
847 | *** Memory manager internal structure ***
|
---|
848 |
|
---|
849 | To isolate system dependencies as much as possible, we have broken the
|
---|
850 | memory manager into two parts. There is a reasonably system-independent
|
---|
851 | "front end" (jmemmgr.c) and a "back end" that contains only the code
|
---|
852 | likely to change across systems. All of the memory management methods
|
---|
853 | outlined above are implemented by the front end. The back end provides
|
---|
854 | the following routines for use by the front end (none of these routines
|
---|
855 | are known to the rest of the JPEG code):
|
---|
856 |
|
---|
857 | jpeg_mem_init, jpeg_mem_term system-dependent initialization/shutdown
|
---|
858 |
|
---|
859 | jpeg_get_small, jpeg_free_small interface to malloc and free library routines
|
---|
860 | (or their equivalents)
|
---|
861 |
|
---|
862 | jpeg_get_large, jpeg_free_large interface to FAR malloc/free in MSDOS machines;
|
---|
863 | else usually the same as
|
---|
864 | jpeg_get_small/jpeg_free_small
|
---|
865 |
|
---|
866 | jpeg_mem_available estimate available memory
|
---|
867 |
|
---|
868 | jpeg_open_backing_store create a backing-store object
|
---|
869 |
|
---|
870 | read_backing_store, manipulate a backing-store object
|
---|
871 | write_backing_store,
|
---|
872 | close_backing_store
|
---|
873 |
|
---|
874 | On some systems there will be more than one type of backing-store object
|
---|
875 | (specifically, in MS-DOS a backing store file might be an area of extended
|
---|
876 | memory as well as a disk file). jpeg_open_backing_store is responsible for
|
---|
877 | choosing how to implement a given object. The read/write/close routines
|
---|
878 | are method pointers in the structure that describes a given object; this
|
---|
879 | lets them be different for different object types.
|
---|
880 |
|
---|
881 | It may be necessary to ensure that backing store objects are explicitly
|
---|
882 | released upon abnormal program termination. For example, MS-DOS won't free
|
---|
883 | extended memory by itself. To support this, we will expect the main program
|
---|
884 | or surrounding application to arrange to call self_destruct (typically via
|
---|
885 | jpeg_destroy) upon abnormal termination. This may require a SIGINT signal
|
---|
886 | handler or equivalent. We don't want to have the back end module install its
|
---|
887 | own signal handler, because that would pre-empt the surrounding application's
|
---|
888 | ability to control signal handling.
|
---|
889 |
|
---|
890 | The IJG distribution includes several memory manager back end implementations.
|
---|
891 | Usually the same back end should be suitable for all applications on a given
|
---|
892 | system, but it is possible for an application to supply its own back end at
|
---|
893 | need.
|
---|
894 |
|
---|
895 |
|
---|
896 | *** Implications of DNL marker ***
|
---|
897 |
|
---|
898 | Some JPEG files may use a DNL marker to postpone definition of the image
|
---|
899 | height (this would be useful for a fax-like scanner's output, for instance).
|
---|
900 | In these files the SOF marker claims the image height is 0, and you only
|
---|
901 | find out the true image height at the end of the first scan.
|
---|
902 |
|
---|
903 | We could read these files as follows:
|
---|
904 | 1. Upon seeing zero image height, replace it by 65535 (the maximum allowed).
|
---|
905 | 2. When the DNL is found, update the image height in the global image
|
---|
906 | descriptor.
|
---|
907 | This implies that control modules must avoid making copies of the image
|
---|
908 | height, and must re-test for termination after each MCU row. This would
|
---|
909 | be easy enough to do.
|
---|
910 |
|
---|
911 | In cases where image-size data structures are allocated, this approach will
|
---|
912 | result in very inefficient use of virtual memory or much-larger-than-necessary
|
---|
913 | temporary files. This seems acceptable for something that probably won't be a
|
---|
914 | mainstream usage. People might have to forgo use of memory-hogging options
|
---|
915 | (such as two-pass color quantization or noninterleaved JPEG files) if they
|
---|
916 | want efficient conversion of such files. (One could improve efficiency by
|
---|
917 | demanding a user-supplied upper bound for the height, less than 65536; in most
|
---|
918 | cases it could be much less.)
|
---|
919 |
|
---|
920 | The standard also permits the SOF marker to overestimate the image height,
|
---|
921 | with a DNL to give the true, smaller height at the end of the first scan.
|
---|
922 | This would solve the space problems if the overestimate wasn't too great.
|
---|
923 | However, it implies that you don't even know whether DNL will be used.
|
---|
924 |
|
---|
925 | This leads to a couple of very serious objections:
|
---|
926 | 1. Testing for a DNL marker must occur in the inner loop of the decompressor's
|
---|
927 | Huffman decoder; this implies a speed penalty whether the feature is used
|
---|
928 | or not.
|
---|
929 | 2. There is no way to hide the last-minute change in image height from an
|
---|
930 | application using the decoder. Thus *every* application using the IJG
|
---|
931 | library would suffer a complexity penalty whether it cared about DNL or
|
---|
932 | not.
|
---|
933 | We currently do not support DNL because of these problems.
|
---|
934 |
|
---|
935 | A different approach is to insist that DNL-using files be preprocessed by a
|
---|
936 | separate program that reads ahead to the DNL, then goes back and fixes the SOF
|
---|
937 | marker. This is a much simpler solution and is probably far more efficient.
|
---|
938 | Even if one wants piped input, buffering the first scan of the JPEG file needs
|
---|
939 | a lot smaller temp file than is implied by the maximum-height method. For
|
---|
940 | this approach we'd simply treat DNL as a no-op in the decompressor (at most,
|
---|
941 | check that it matches the SOF image height).
|
---|
942 |
|
---|
943 | We will not worry about making the compressor capable of outputting DNL.
|
---|
944 | Something similar to the first scheme above could be applied if anyone ever
|
---|
945 | wants to make that work.
|
---|