| 1 | <HTML> | 
|---|
| 2 | <HEAD> | 
|---|
| 3 | <TITLE> Conservative GC Algorithmic Overview </TITLE> | 
|---|
| 4 | <AUTHOR> Hans-J. Boehm, HP Labs (Much of this was written at SGI)</author> | 
|---|
| 5 | </HEAD> | 
|---|
| 6 | <BODY> | 
|---|
| 7 | <H1> <I>This is under construction, and may always be.</i> </h1> | 
|---|
| 8 | <H1> Conservative GC Algorithmic Overview </h1> | 
|---|
| 9 | <P> | 
|---|
| 10 | This is a description of the algorithms and data structures used in our | 
|---|
| 11 | conservative garbage collector.  I expect the level of detail to increase | 
|---|
| 12 | with time.  For a survey of GC algorithms, see for example | 
|---|
| 13 | <A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's | 
|---|
| 14 | excellent paper</a>.  For an overview of the collector interface, | 
|---|
| 15 | see <A HREF="gcinterface.html">here</a>. | 
|---|
| 16 | <P> | 
|---|
| 17 | This description is targeted primarily at someone trying to understand the | 
|---|
| 18 | source code.  It specifically refers to variable and function names. | 
|---|
| 19 | It may also be useful for understanding the algorithms at a higher level. | 
|---|
| 20 | <P> | 
|---|
| 21 | The description here assumes that the collector is used in default mode. | 
|---|
| 22 | In particular, we assume that it used as a garbage collector, and not just | 
|---|
| 23 | a leak detector.  We initially assume that it is used in stop-the-world, | 
|---|
| 24 | non-incremental mode, though the presence of the incremental collector | 
|---|
| 25 | will be apparent in the design. | 
|---|
| 26 | We assume the default finalization model, but the code affected by that | 
|---|
| 27 | is very localized. | 
|---|
| 28 | <H2> Introduction </h2> | 
|---|
| 29 | The garbage collector uses a modified mark-sweep algorithm.  Conceptually | 
|---|
| 30 | it operates roughly in four phases, which are performed occasionally | 
|---|
| 31 | as part of a memory allocation: | 
|---|
| 32 |  | 
|---|
| 33 | <OL> | 
|---|
| 34 |  | 
|---|
| 35 | <LI> | 
|---|
| 36 | <I>Preparation</i> Each object has an associated mark bit. | 
|---|
| 37 | Clear all mark bits, indicating that all objects | 
|---|
| 38 | are potentially unreachable. | 
|---|
| 39 |  | 
|---|
| 40 | <LI> | 
|---|
| 41 | <I>Mark phase</i> Marks all objects that can be reachable via chains of | 
|---|
| 42 | pointers from variables.  Often the collector has no real information | 
|---|
| 43 | about the location of pointer variables in the heap, so it | 
|---|
| 44 | views all static data areas, stacks and registers as potentially containing | 
|---|
| 45 | pointers.  Any bit patterns that represent addresses inside | 
|---|
| 46 | heap objects managed by the collector are viewed as pointers. | 
|---|
| 47 | Unless the client program has made heap object layout information | 
|---|
| 48 | available to the collector, any heap objects found to be reachable from | 
|---|
| 49 | variables are again scanned similarly. | 
|---|
| 50 |  | 
|---|
| 51 | <LI> | 
|---|
| 52 | <I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked, | 
|---|
| 53 | objects, and returns them to an appropriate free list for reuse.  This is | 
|---|
| 54 | not really a separate phase; even in non incremental mode this is operation | 
|---|
| 55 | is usually performed on demand during an allocation that discovers an empty | 
|---|
| 56 | free list.  Thus the sweep phase is very unlikely to touch a page that | 
|---|
| 57 | would not have been touched shortly thereafter anyway. | 
|---|
| 58 |  | 
|---|
| 59 | <LI> | 
|---|
| 60 | <I>Finalization phase</i> Unreachable objects which had been registered | 
|---|
| 61 | for finalization are enqueued for finalization outside the collector. | 
|---|
| 62 |  | 
|---|
| 63 | </ol> | 
|---|
| 64 |  | 
|---|
| 65 | <P> | 
|---|
| 66 | The remaining sections describe the memory allocation data structures, | 
|---|
| 67 | and then the last 3 collection phases in more detail. We conclude by | 
|---|
| 68 | outlining some of the additional features implemented in the collector. | 
|---|
| 69 |  | 
|---|
| 70 | <H2>Allocation</h2> | 
|---|
| 71 | The collector includes its own memory allocator.  The allocator obtains | 
|---|
| 72 | memory from the system in a platform-dependent way.  Under UNIX, it | 
|---|
| 73 | uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>. | 
|---|
| 74 | <P> | 
|---|
| 75 | Most static data used by the allocator, as well as that needed by the | 
|---|
| 76 | rest of the garbage collector is stored inside the | 
|---|
| 77 | <TT>_GC_arrays</tt> structure. | 
|---|
| 78 | This allows the garbage collector to easily ignore the collectors own | 
|---|
| 79 | data structures when it searches for root pointers.  Other allocator | 
|---|
| 80 | and collector internal data structures are allocated dynamically | 
|---|
| 81 | with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not | 
|---|
| 82 | allow for deallocation, and is therefore used only for permanent data | 
|---|
| 83 | structures. | 
|---|
| 84 | <P> | 
|---|
| 85 | The allocator allocates objects of different <I>kinds</i>. | 
|---|
| 86 | Different kinds are handled somewhat differently by certain parts | 
|---|
| 87 | of the garbage collector.  Certain kinds are scanned for pointers, | 
|---|
| 88 | others are not.  Some may have per-object type descriptors that | 
|---|
| 89 | determine pointer locations.  Or a specific kind may correspond | 
|---|
| 90 | to one specific object layout.  Two built-in kinds are uncollectable. | 
|---|
| 91 | One (<TT>STUBBORN</tt>) is immutable without special precautions. | 
|---|
| 92 | In spite of that, it is very likely that most C clients of the | 
|---|
| 93 | collector currently | 
|---|
| 94 | use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects. | 
|---|
| 95 | The <A HREF="http://gcc.gnu.org/java">gcj</a> runtime also makes | 
|---|
| 96 | heavy use of a kind (allocated with GC_gcj_malloc) that stores | 
|---|
| 97 | type information at a known offset in method tables. | 
|---|
| 98 | <P> | 
|---|
| 99 | The collector uses a two level allocator.  A large block is defined to | 
|---|
| 100 | be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2, | 
|---|
| 101 | typically on the order of the page size. | 
|---|
| 102 | <P> | 
|---|
| 103 | Large block sizes are rounded up to | 
|---|
| 104 | the next multiple of <TT>HBLKSIZE</tt> and then allocated by | 
|---|
| 105 | <TT>GC_allochblk</tt>.  Recent versions of the collector | 
|---|
| 106 | use an approximate best fit algorithm by keeping free lists for | 
|---|
| 107 | several large block sizes. | 
|---|
| 108 | The actual | 
|---|
| 109 | implementation of <TT>GC_allochblk</tt> | 
|---|
| 110 | is significantly complicated by black-listing issues | 
|---|
| 111 | (see below). | 
|---|
| 112 | <P> | 
|---|
| 113 | Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>. | 
|---|
| 114 | Each chunk is | 
|---|
| 115 | dedicated to only one object size and kind.  The allocator maintains | 
|---|
| 116 | separate free lists for each size and kind of object. | 
|---|
| 117 | <P> | 
|---|
| 118 | Once a large block is split for use in smaller objects, it can only | 
|---|
| 119 | be used for objects of that size, unless the collector discovers a completely | 
|---|
| 120 | empty chunk.  Completely empty chunks are restored to the appropriate | 
|---|
| 121 | large block free list. | 
|---|
| 122 | <P> | 
|---|
| 123 | In order to avoid allocating blocks for too many distinct object sizes, | 
|---|
| 124 | the collector normally does not directly allocate objects of every possible | 
|---|
| 125 | request size.  Instead request are rounded up to one of a smaller number | 
|---|
| 126 | of allocated sizes, for which free lists are maintained.  The exact | 
|---|
| 127 | allocated sizes are computed on demand, but subject to the constraint | 
|---|
| 128 | that they increase roughly in geometric progression.  Thus objects | 
|---|
| 129 | requested early in the execution are likely to be allocated with exactly | 
|---|
| 130 | the requested size, subject to alignment constraints. | 
|---|
| 131 | See <TT>GC_init_size_map</tt> for details. | 
|---|
| 132 | <P> | 
|---|
| 133 | The actual size rounding operation during small object allocation is | 
|---|
| 134 | implemented as a table lookup in <TT>GC_size_map</tt>. | 
|---|
| 135 | <P> | 
|---|
| 136 | Both collector initialization and computation of allocated sizes are | 
|---|
| 137 | handled carefully so that they do not slow down the small object fast | 
|---|
| 138 | allocation path.  An attempt to allocate before the collector is initialized, | 
|---|
| 139 | or before the appropriate <TT>GC_size_map</tt> entry is computed, | 
|---|
| 140 | will take the same path as an allocation attempt with an empty free list. | 
|---|
| 141 | This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>) | 
|---|
| 142 | which performs the appropriate initialization checks. | 
|---|
| 143 | <P> | 
|---|
| 144 | In non-incremental mode, we make a decision about whether to garbage collect | 
|---|
| 145 | whenever an allocation would otherwise have failed with the current heap size. | 
|---|
| 146 | If the total amount of allocation since the last collection is less than | 
|---|
| 147 | the heap size divided by <TT>GC_free_space_divisor</tt>, we try to | 
|---|
| 148 | expand the heap.  Otherwise, we initiate a garbage collection.  This ensures | 
|---|
| 149 | that the amount of garbage collection work per allocated byte remains | 
|---|
| 150 | constant. | 
|---|
| 151 | <P> | 
|---|
| 152 | The above is in fact an oversimplification of the real heap expansion | 
|---|
| 153 | and GC triggering heuristic, which adjusts slightly for root size | 
|---|
| 154 | and certain kinds of | 
|---|
| 155 | fragmentation.  In particular: | 
|---|
| 156 | <UL> | 
|---|
| 157 | <LI> Programs with a large root set size and | 
|---|
| 158 | little live heap memory will expand the heap to amortize the cost of | 
|---|
| 159 | scanning the roots. | 
|---|
| 160 | <LI> Versions 5.x of the collector actually collect more frequently in | 
|---|
| 161 | nonincremental mode.  The large block allocator usually refuses to split | 
|---|
| 162 | large heap blocks once the garbage collection threshold is | 
|---|
| 163 | reached.  This often has the effect of collecting well before the | 
|---|
| 164 | heap fills up, thus reducing fragmentation and working set size at the | 
|---|
| 165 | expense of GC time.  Versions 6.x choose an intermediate strategy depending | 
|---|
| 166 | on how much large object allocation has taken place in the past. | 
|---|
| 167 | (If the collector is configured to unmap unused pages, versions 6.x | 
|---|
| 168 | use the 5.x strategy.) | 
|---|
| 169 | <LI> In calculating the amount of allocation since the last collection we | 
|---|
| 170 | give partial credit for objects we expect to be explicitly deallocated. | 
|---|
| 171 | Even if all objects are explicitly managed, it is often desirable to collect | 
|---|
| 172 | on rare occasion, since that is our only mechanism for coalescing completely | 
|---|
| 173 | empty chunks. | 
|---|
| 174 | </ul> | 
|---|
| 175 | <P> | 
|---|
| 176 | It has been suggested that this should be adjusted so that we favor | 
|---|
| 177 | expansion if the resulting heap still fits into physical memory. | 
|---|
| 178 | In many cases, that would no doubt help.  But it is tricky to do this | 
|---|
| 179 | in a way that remains robust if multiple application are contending | 
|---|
| 180 | for a single pool of physical memory. | 
|---|
| 181 |  | 
|---|
| 182 | <H2>Mark phase</h2> | 
|---|
| 183 |  | 
|---|
| 184 | At each collection, the collector marks all objects that are | 
|---|
| 185 | possibly reachable from pointer variables.  Since it cannot generally | 
|---|
| 186 | tell where pointer variables are located, it scans the following | 
|---|
| 187 | <I>root segments</i> for pointers: | 
|---|
| 188 | <UL> | 
|---|
| 189 | <LI>The registers.  Depending on the architecture, this may be done using | 
|---|
| 190 | assembly code, or by calling a <TT>setjmp</tt>-like function which saves | 
|---|
| 191 | register contents on the stack. | 
|---|
| 192 | <LI>The stack(s).  In the case of a single-threaded application, | 
|---|
| 193 | on most platforms this | 
|---|
| 194 | is done by scanning the memory between (an approximation of) the current | 
|---|
| 195 | stack pointer and <TT>GC_stackbottom</tt>.  (For Itanium, the register stack | 
|---|
| 196 | scanned separately.)  The <TT>GC_stackbottom</tt> variable is set in | 
|---|
| 197 | a highly platform-specific way depending on the appropriate configuration | 
|---|
| 198 | information in <TT>gcconfig.h</tt>.  Note that the currently active | 
|---|
| 199 | stack needs to be scanned carefully, since callee-save registers of | 
|---|
| 200 | client code may appear inside collector stack frames, which may | 
|---|
| 201 | change during the mark process.  This is addressed by scanning | 
|---|
| 202 | some sections of the stack "eagerly", effectively capturing a snapshot | 
|---|
| 203 | at one point in time. | 
|---|
| 204 | <LI>Static data region(s).  In the simplest case, this is the region | 
|---|
| 205 | between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in | 
|---|
| 206 | <TT>gcconfig.h</tt>.  However, in most cases, this will also involve | 
|---|
| 207 | static data regions associated with dynamic libraries.  These are | 
|---|
| 208 | identified by the mostly platform-specific code in <TT>dyn_load.c</tt>. | 
|---|
| 209 | </ul> | 
|---|
| 210 | The marker maintains an explicit stack of memory regions that are known | 
|---|
| 211 | to be accessible, but that have not yet been searched for contained pointers. | 
|---|
| 212 | Each stack entry contains the starting address of the block to be scanned, | 
|---|
| 213 | as well as a descriptor of the block.  If no layout information is | 
|---|
| 214 | available for the block, then the descriptor is simply a length. | 
|---|
| 215 | (For other possibilities, see <TT>gc_mark.h</tt>.) | 
|---|
| 216 | <P> | 
|---|
| 217 | At the beginning of the mark phase, all root segments | 
|---|
| 218 | (as described above) are pushed on the | 
|---|
| 219 | stack by <TT>GC_push_roots</tt>.  (Registers and eagerly processed | 
|---|
| 220 | stack sections are processed by pushing the referenced objects instead | 
|---|
| 221 | of the stack section itself.)  If <TT>ALL_INTERIOR_PTRS</tt> is not | 
|---|
| 222 | defined, then stack roots require special treatment.  In this case, the | 
|---|
| 223 | normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt> | 
|---|
| 224 | explicitly checks for interior pointers and pushes descriptors for target | 
|---|
| 225 | objects. | 
|---|
| 226 | <P> | 
|---|
| 227 | The marker is structured to allow incremental marking. | 
|---|
| 228 | Each call to <TT>GC_mark_some</tt> performs a small amount of | 
|---|
| 229 | work towards marking the heap. | 
|---|
| 230 | It maintains | 
|---|
| 231 | explicit state in the form of <TT>GC_mark_state</tt>, which | 
|---|
| 232 | identifies a particular sub-phase.  Some other pieces of state, most | 
|---|
| 233 | notably the mark stack, identify how much work remains to be done | 
|---|
| 234 | in each sub-phase.  The normal progression of mark states for | 
|---|
| 235 | a stop-the-world collection is: | 
|---|
| 236 | <OL> | 
|---|
| 237 | <LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked | 
|---|
| 238 | objects.  In this case <TT>GC_objects_are_marked</tt> will simultaneously | 
|---|
| 239 | be false, so the mark state is advanced to | 
|---|
| 240 | <LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push | 
|---|
| 241 | uncollectable objects, roots, and then mark everything reachable from them. | 
|---|
| 242 | <TT>Scan_ptr</tt> is advanced through the heap until all uncollectable | 
|---|
| 243 | objects are pushed, and objects reachable from them are marked. | 
|---|
| 244 | At that point, the next call to <TT>GC_mark_some</tt> calls | 
|---|
| 245 | <TT>GC_push_roots</tt> to push the roots.  It the advances the | 
|---|
| 246 | mark state to | 
|---|
| 247 | <LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is | 
|---|
| 248 | empty, all reachable objects are marked.  Once in this state, we work | 
|---|
| 249 | only on emptying the mark stack.  Once this is completed, the state | 
|---|
| 250 | changes to | 
|---|
| 251 | <LI> <TT>MS_NONE</tt> indicating that reachable objects are marked. | 
|---|
| 252 | </ol> | 
|---|
| 253 |  | 
|---|
| 254 | The core mark routine <TT>GC_mark_from</tt>, is called | 
|---|
| 255 | repeatedly by several of the sub-phases when the mark stack starts to fill | 
|---|
| 256 | up.  It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state | 
|---|
| 257 | to empty the mark stack. | 
|---|
| 258 | The routine is designed to only perform a limited amount of marking at | 
|---|
| 259 | each call, so that it can also be used by the incremental collector. | 
|---|
| 260 | It is fairly carefully tuned, since it usually consumes a large majority | 
|---|
| 261 | of the garbage collection time. | 
|---|
| 262 | <P> | 
|---|
| 263 | The fact that it perform a only a small amount of work per call also | 
|---|
| 264 | allows it to be used as the core routine of the parallel marker.  In that | 
|---|
| 265 | case it is normally invoked on thread-private mark stacks instead of the | 
|---|
| 266 | global mark stack.  More details can be found in | 
|---|
| 267 | <A HREF="scale.html">scale.html</a> | 
|---|
| 268 | <P> | 
|---|
| 269 | The marker correctly handles mark stack overflows.  Whenever the mark stack | 
|---|
| 270 | overflows, the mark state is reset to <TT>MS_INVALID</tt>. | 
|---|
| 271 | Since there are already marked objects in the heap, | 
|---|
| 272 | this eventually forces a complete | 
|---|
| 273 | scan of the heap, searching for pointers, during which any unmarked objects | 
|---|
| 274 | referenced by marked objects are again pushed on the mark stack.  This | 
|---|
| 275 | process is repeated until the mark phase completes without a stack overflow. | 
|---|
| 276 | Each time the stack overflows, an attempt is made to grow the mark stack. | 
|---|
| 277 | All pieces of the collector that push regions onto the mark stack have to be | 
|---|
| 278 | careful to ensure forward progress, even in case of repeated mark stack | 
|---|
| 279 | overflows.  Every mark attempt results in additional marked objects. | 
|---|
| 280 | <P> | 
|---|
| 281 | Each mark stack entry is processed by examining all candidate pointers | 
|---|
| 282 | in the range described by the entry.  If the region has no associated | 
|---|
| 283 | type information, then this typically requires that each 4-byte aligned | 
|---|
| 284 | quantity (8-byte aligned with 64-bit pointers) be considered a candidate | 
|---|
| 285 | pointer. | 
|---|
| 286 | <P> | 
|---|
| 287 | We determine whether a candidate pointer is actually the address of | 
|---|
| 288 | a heap block.  This is done in the following steps: | 
|---|
| 289 | <NL> | 
|---|
| 290 | <LI> The candidate pointer is checked against rough heap bounds. | 
|---|
| 291 | These heap bounds are maintained such that all actual heap objects | 
|---|
| 292 | fall between them.  In order to facilitate black-listing (see below) | 
|---|
| 293 | we also include address regions that the heap is likely to expand into. | 
|---|
| 294 | Most non-pointers fail this initial test. | 
|---|
| 295 | <LI> The candidate pointer is divided into two pieces; the most significant | 
|---|
| 296 | bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and | 
|---|
| 297 | the least significant bits specify an offset within that page. | 
|---|
| 298 | (A hardware page may actually consist of multiple such pages. | 
|---|
| 299 | HBLKSIZE is usually the page size divided by a small power of two.) | 
|---|
| 300 | <LI> | 
|---|
| 301 | The page address part of the candidate pointer is looked up in a | 
|---|
| 302 | <A HREF="tree.html">table</a>. | 
|---|
| 303 | Each table entry contains either 0, indicating that the page is not part | 
|---|
| 304 | of the garbage collected heap, a small integer <I>n</i>, indicating | 
|---|
| 305 | that the page is part of large object, starting at least <I>n</i> pages | 
|---|
| 306 | back, or a pointer to a descriptor for the page.  In the first case, | 
|---|
| 307 | the candidate pointer i not a true pointer and can be safely ignored. | 
|---|
| 308 | In the last two cases, we can obtain a descriptor for the page containing | 
|---|
| 309 | the beginning of the object. | 
|---|
| 310 | <LI> | 
|---|
| 311 | The starting address of the referenced object is computed. | 
|---|
| 312 | The page descriptor contains the size of the object(s) | 
|---|
| 313 | in that page, the object kind, and the necessary mark bits for those | 
|---|
| 314 | objects.  The size information can be used to map the candidate pointer | 
|---|
| 315 | to the object starting address.  To accelerate this process, the page header | 
|---|
| 316 | also contains a pointer to a precomputed map of page offsets to displacements | 
|---|
| 317 | from the beginning of an object.  The use of this map avoids a | 
|---|
| 318 | potentially slow integer remainder operation in computing the object | 
|---|
| 319 | start address. | 
|---|
| 320 | <LI> | 
|---|
| 321 | The mark bit for the target object is checked and set.  If the object | 
|---|
| 322 | was previously unmarked, the object is pushed on the mark stack. | 
|---|
| 323 | The descriptor is read from the page descriptor.  (This is computed | 
|---|
| 324 | from information <TT>GC_obj_kinds</tt> when the page is first allocated.) | 
|---|
| 325 | </nl> | 
|---|
| 326 | <P> | 
|---|
| 327 | At the end of the mark phase, mark bits for left-over free lists are cleared, | 
|---|
| 328 | in case a free list was accidentally marked due to a stray pointer. | 
|---|
| 329 |  | 
|---|
| 330 | <H2>Sweep phase</h2> | 
|---|
| 331 |  | 
|---|
| 332 | At the end of the mark phase, all blocks in the heap are examined. | 
|---|
| 333 | Unmarked large objects are immediately returned to the large object free list. | 
|---|
| 334 | Each small object page is checked to see if all mark bits are clear. | 
|---|
| 335 | If so, the entire page is returned to the large object free list. | 
|---|
| 336 | Small object pages containing some reachable object are queued for later | 
|---|
| 337 | sweeping, unless we determine that the page contains very little free | 
|---|
| 338 | space, in which case it is not examined further. | 
|---|
| 339 | <P> | 
|---|
| 340 | This initial sweep pass touches only block headers, not | 
|---|
| 341 | the blocks themselves.  Thus it does not require significant paging, even | 
|---|
| 342 | if large sections of the heap are not in physical memory. | 
|---|
| 343 | <P> | 
|---|
| 344 | Nonempty small object pages are swept when an allocation attempt | 
|---|
| 345 | encounters an empty free list for that object size and kind. | 
|---|
| 346 | Pages for the correct size and kind are repeatedly swept until at | 
|---|
| 347 | least one empty block is found.  Sweeping such a page involves | 
|---|
| 348 | scanning the mark bit array in the page header, and building a free | 
|---|
| 349 | list linked through the first words in the objects themselves. | 
|---|
| 350 | This does involve touching the appropriate data page, but in most cases | 
|---|
| 351 | it will be touched only just before it is used for allocation. | 
|---|
| 352 | Hence any paging is essentially unavoidable. | 
|---|
| 353 | <P> | 
|---|
| 354 | Except in the case of pointer-free objects, we maintain the invariant | 
|---|
| 355 | that any object in a small object free list is cleared (except possibly | 
|---|
| 356 | for the link field).  Thus it becomes the burden of the small object | 
|---|
| 357 | sweep routine to clear objects.  This has the advantage that we can | 
|---|
| 358 | easily recover from accidentally marking a free list, though that could | 
|---|
| 359 | also be handled by other means.  The collector currently spends a fair | 
|---|
| 360 | amount of time clearing objects, and this approach should probably be | 
|---|
| 361 | revisited. | 
|---|
| 362 | <P> | 
|---|
| 363 | In most configurations, we use specialized sweep routines to handle common | 
|---|
| 364 | small object sizes.  Since we allocate one mark bit per word, it becomes | 
|---|
| 365 | easier to examine the relevant mark bits if the object size divides | 
|---|
| 366 | the word length evenly.  We also suitably unroll the inner sweep loop | 
|---|
| 367 | in each case.  (It is conceivable that profile-based procedure cloning | 
|---|
| 368 | in the compiler could make this unnecessary and counterproductive.  I | 
|---|
| 369 | know of no existing compiler to which this applies.) | 
|---|
| 370 | <P> | 
|---|
| 371 | The sweeping of small object pages could be avoided completely at the expense | 
|---|
| 372 | of examining mark bits directly in the allocator.  This would probably | 
|---|
| 373 | be more expensive, since each allocation call would have to reload | 
|---|
| 374 | a large amount of state (e.g. next object address to be swept, position | 
|---|
| 375 | in mark bit table) before it could do its work.  The current scheme | 
|---|
| 376 | keeps the allocator simple and allows useful optimizations in the sweeper. | 
|---|
| 377 |  | 
|---|
| 378 | <H2>Finalization</h2> | 
|---|
| 379 | Both <TT>GC_register_disappearing_link</tt> and | 
|---|
| 380 | <TT>GC_register_finalizer</tt> add the request to a corresponding hash | 
|---|
| 381 | table.  The hash table is allocated out of collected memory, but | 
|---|
| 382 | the reference to the finalizable object is hidden from the collector. | 
|---|
| 383 | Currently finalization requests are processed non-incrementally at the | 
|---|
| 384 | end of a mark cycle. | 
|---|
| 385 | <P> | 
|---|
| 386 | The collector makes an initial pass over the table of finalizable objects, | 
|---|
| 387 | pushing the contents of unmarked objects onto the mark stack. | 
|---|
| 388 | After pushing each object, the marker is invoked to mark all objects | 
|---|
| 389 | reachable from it.  The object itself is not explicitly marked. | 
|---|
| 390 | This assures that objects on which a finalizer depends are neither | 
|---|
| 391 | collected nor finalized. | 
|---|
| 392 | <P> | 
|---|
| 393 | If in the process of marking from an object the | 
|---|
| 394 | object itself becomes marked, we have uncovered | 
|---|
| 395 | a cycle involving the object.  This usually results in a warning from the | 
|---|
| 396 | collector.  Such objects are not finalized, since it may be | 
|---|
| 397 | unsafe to do so.  See the more detailed | 
|---|
| 398 | <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>. | 
|---|
| 399 | <P> | 
|---|
| 400 | Any objects remaining unmarked at the end of this process are added to | 
|---|
| 401 | a queue of objects whose finalizers can be run.  Depending on collector | 
|---|
| 402 | configuration, finalizers are dequeued and run either implicitly during | 
|---|
| 403 | allocation calls, or explicitly in response to a user request. | 
|---|
| 404 | (Note that the former is unfortunately both the default and not generally safe. | 
|---|
| 405 | If finalizers perform synchronization, it may result in deadlocks. | 
|---|
| 406 | Nontrivial finalizers generally need to perform synchronization, and | 
|---|
| 407 | thus require a different collector configuration.) | 
|---|
| 408 | <P> | 
|---|
| 409 | The collector provides a mechanism for replacing the procedure that is | 
|---|
| 410 | used to mark through objects.  This is used both to provide support for | 
|---|
| 411 | Java-style unordered finalization, and to ignore certain kinds of cycles, | 
|---|
| 412 | <I>e.g.</i> those arising from C++ implementations of virtual inheritance. | 
|---|
| 413 |  | 
|---|
| 414 | <H2>Generational Collection and Dirty Bits</h2> | 
|---|
| 415 | We basically use the concurrent and generational GC algorithm described in | 
|---|
| 416 | <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>, | 
|---|
| 417 | by Boehm, Demers, and Shenker. | 
|---|
| 418 | <P> | 
|---|
| 419 | The most significant modification is that | 
|---|
| 420 | the collector always starts running in the allocating thread. | 
|---|
| 421 | There is no separate garbage collector thread.  (If parallel GC is | 
|---|
| 422 | enabled, helper threads may also be woken up.) | 
|---|
| 423 | If an allocation attempt either requests a large object, or encounters | 
|---|
| 424 | an empty small object free list, and notices that there is a collection | 
|---|
| 425 | in progress, it immediately performs a small amount of marking work | 
|---|
| 426 | as described above. | 
|---|
| 427 | <P> | 
|---|
| 428 | This change was made both because we wanted to easily accommodate | 
|---|
| 429 | single-threaded environments, and because a separate GC thread requires | 
|---|
| 430 | very careful control over the scheduler to prevent the mutator from | 
|---|
| 431 | out-running the collector, and hence provoking unneeded heap growth. | 
|---|
| 432 | <P> | 
|---|
| 433 | In incremental mode, the heap is always expanded when we encounter | 
|---|
| 434 | insufficient space for an allocation.  Garbage collection is triggered | 
|---|
| 435 | whenever we notice that more than | 
|---|
| 436 | <TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt> | 
|---|
| 437 | bytes of allocation have taken place. | 
|---|
| 438 | After <TT>GC_full_freq</tt> minor collections a major collection | 
|---|
| 439 | is started. | 
|---|
| 440 | <P> | 
|---|
| 441 | All collections initially run interrupted until a predetermined | 
|---|
| 442 | amount of time (50 msecs by default) has expired.  If this allows | 
|---|
| 443 | the collection to complete entirely, we can avoid correcting | 
|---|
| 444 | for data structure modifications during the collection.  If it does | 
|---|
| 445 | not complete, we return control to the mutator, and perform small | 
|---|
| 446 | amounts of additional GC work during those later allocations that | 
|---|
| 447 | cannot be satisfied from small object free lists. When marking completes, | 
|---|
| 448 | the set of modified pages is retrieved, and we mark once again from | 
|---|
| 449 | marked objects on those pages, this time with the mutator stopped. | 
|---|
| 450 | <P> | 
|---|
| 451 | We keep track of modified pages using one of several distinct mechanisms: | 
|---|
| 452 | <OL> | 
|---|
| 453 | <LI> | 
|---|
| 454 | Through explicit mutator cooperation.  Currently this requires | 
|---|
| 455 | the use of <TT>GC_malloc_stubborn</tt>, and is rarely used. | 
|---|
| 456 | <LI> | 
|---|
| 457 | (<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and | 
|---|
| 458 | catching write faults.  This is | 
|---|
| 459 | implemented for many Unix-like systems and for win32.  It is not possible | 
|---|
| 460 | in a few environments. | 
|---|
| 461 | <LI> | 
|---|
| 462 | (<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc. | 
|---|
| 463 | (Currently only Sun's | 
|---|
| 464 | Solaris supports this.  Though this is considerably cleaner, performance | 
|---|
| 465 | may actually be better with mprotect and signals.) | 
|---|
| 466 | <LI> | 
|---|
| 467 | (<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this | 
|---|
| 468 | case the one in Xerox PCR. | 
|---|
| 469 | <LI> | 
|---|
| 470 | (<TT>DEFAULT_VDB</tt>) By treating all pages as dirty.  This is the default if | 
|---|
| 471 | none of the other techniques is known to be usable, and | 
|---|
| 472 | <TT>GC_malloc_stubborn</tt> is not used.  Practical only for testing, or if | 
|---|
| 473 | the vast majority of objects use <TT>GC_malloc_stubborn</tt>. | 
|---|
| 474 | </ol> | 
|---|
| 475 |  | 
|---|
| 476 | <H2>Black-listing</h2> | 
|---|
| 477 |  | 
|---|
| 478 | The collector implements <I>black-listing</i> of pages, as described | 
|---|
| 479 | in | 
|---|
| 480 | <A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/"> | 
|---|
| 481 | Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available | 
|---|
| 482 | <A HREF="papers/pldi93.ps.Z">here</a>. | 
|---|
| 483 | <P> | 
|---|
| 484 | During the mark phase, the collector tracks ``near misses'', i.e. attempts | 
|---|
| 485 | to follow a ``pointer'' to just outside the garbage-collected heap, or | 
|---|
| 486 | to a currently unallocated page inside the heap.  Pages that have been | 
|---|
| 487 | the targets of such near misses are likely to be the targets of | 
|---|
| 488 | misidentified ``pointers'' in the future.  To minimize the future | 
|---|
| 489 | damage caused by such misidentifications they will be allocated only to | 
|---|
| 490 | small pointerfree objects. | 
|---|
| 491 | <P> | 
|---|
| 492 | The collector understands two different kinds of black-listing.  A | 
|---|
| 493 | page may be black listed for interior pointer references | 
|---|
| 494 | (<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near | 
|---|
| 495 | miss from a location that requires interior pointer recognition, | 
|---|
| 496 | <I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt> | 
|---|
| 497 | is set.  In this case, we also avoid allocating large blocks that include | 
|---|
| 498 | this page. | 
|---|
| 499 | <P> | 
|---|
| 500 | If the near miss came from a source that did not require interior | 
|---|
| 501 | pointer recognition, it is black-listed with | 
|---|
| 502 | <TT>GC_add_to_black_list_normal</tt>. | 
|---|
| 503 | A page black-listed in this way may appear inside a large object, | 
|---|
| 504 | so long as it is not the first page of a large object. | 
|---|
| 505 | <P> | 
|---|
| 506 | The <TT>GC_allochblk</tt> routine respects black-listing when assigning | 
|---|
| 507 | a block to a particular object kind and size.  It occasionally | 
|---|
| 508 | drops (i.e. allocates and forgets) blocks that are completely black-listed | 
|---|
| 509 | in order to avoid excessively long large block free lists containing | 
|---|
| 510 | only unusable blocks.  This would otherwise become an issue | 
|---|
| 511 | if there is low demand for small pointerfree objects. | 
|---|
| 512 |  | 
|---|
| 513 | <H2>Thread support</h2> | 
|---|
| 514 | We support several different threading models.  Unfortunately Pthreads, | 
|---|
| 515 | the only reasonably well standardized thread model, supports too narrow | 
|---|
| 516 | an interface for conservative garbage collection.  There appears to be | 
|---|
| 517 | no completely portable way to allow the collector | 
|---|
| 518 | to coexist with various Pthreads | 
|---|
| 519 | implementations.  Hence we currently support only the more | 
|---|
| 520 | common Pthreads implementations. | 
|---|
| 521 | <P> | 
|---|
| 522 | In particular, it is very difficult for the collector to stop all other | 
|---|
| 523 | threads in the system and examine the register contents.  This is currently | 
|---|
| 524 | accomplished with very different mechanisms for some Pthreads | 
|---|
| 525 | implementations.  The Solaris implementation temporarily disables much | 
|---|
| 526 | of the user-level threads implementation by stopping kernel-level threads | 
|---|
| 527 | ("lwp"s).  The Linux/HPUX/OSF1 and Irix implementations sends signals to | 
|---|
| 528 | individual Pthreads and has them wait in the signal handler. | 
|---|
| 529 | <P> | 
|---|
| 530 | The Linux and Irix implementations use | 
|---|
| 531 | only documented Pthreads calls, but rely on extensions to their semantics. | 
|---|
| 532 | The Linux implementation <TT>linux_threads.c</tt> relies on only very | 
|---|
| 533 | mild extensions to the pthreads semantics, and already supports a large number | 
|---|
| 534 | of other Unix-like pthreads implementations.  Our goal is to make this the | 
|---|
| 535 | only pthread support in the collector. | 
|---|
| 536 | <P> | 
|---|
| 537 | (The Irix implementation is separate only for historical reasons and should | 
|---|
| 538 | clearly be merged.  The current Solaris implementation probably performs | 
|---|
| 539 | better in the uniprocessor case, but does not support thread operations in the | 
|---|
| 540 | collector.  Hence it cannot support the parallel marker.) | 
|---|
| 541 | <P> | 
|---|
| 542 | All implementations must | 
|---|
| 543 | intercept thread creation and a few other thread-specific calls to allow | 
|---|
| 544 | enumeration of threads and location of thread stacks.  This is current | 
|---|
| 545 | accomplished with <TT># define</tt>'s in <TT>gc.h</tt> | 
|---|
| 546 | (really <TT>gc_pthread_redirects.h</tt>), or optionally | 
|---|
| 547 | by using ld's function call wrapping mechanism under Linux. | 
|---|
| 548 | <P> | 
|---|
| 549 | Recent versions of the collector support several facilites to enhance | 
|---|
| 550 | the processor-scalability and thread performance of the collector. | 
|---|
| 551 | These are discussed in more detail <A HREF="scale.html">here</a>. | 
|---|
| 552 | <P> | 
|---|
| 553 | Comments are appreciated.  Please send mail to | 
|---|
| 554 | <A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or | 
|---|
| 555 | <A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a> | 
|---|
| 556 | <P> | 
|---|
| 557 | This is a modified copy of a page written while the author was at SGI. | 
|---|
| 558 | The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>. | 
|---|
| 559 | </body> | 
|---|
| 560 | </html> | 
|---|