1 | <HTML>
|
---|
2 | <HEAD>
|
---|
3 | <TITLE> Conservative GC Algorithmic Overview </TITLE>
|
---|
4 | <AUTHOR> Hans-J. Boehm, Silicon Graphics</author>
|
---|
5 | </HEAD>
|
---|
6 | <BODY>
|
---|
7 | <H1> <I>This is under construction</i> </h1>
|
---|
8 | <H1> Conservative GC Algorithmic Overview </h1>
|
---|
9 | <P>
|
---|
10 | This is a description of the algorithms and data structures used in our
|
---|
11 | conservative garbage collector. I expect the level of detail to increase
|
---|
12 | with time. For a survey of GC algorithms, see for example
|
---|
13 | <A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
|
---|
14 | excellent paper</a>. For an overview of the collector interface,
|
---|
15 | see <A HREF="gcinterface.html">here</a>.
|
---|
16 | <P>
|
---|
17 | This description is targeted primarily at someone trying to understand the
|
---|
18 | source code. It specifically refers to variable and function names.
|
---|
19 | It may also be useful for understanding the algorithms at a higher level.
|
---|
20 | <P>
|
---|
21 | The description here assumes that the collector is used in default mode.
|
---|
22 | In particular, we assume that it used as a garbage collector, and not just
|
---|
23 | a leak detector. We initially assume that it is used in stop-the-world,
|
---|
24 | non-incremental mode, though the presence of the incremental collector
|
---|
25 | will be apparent in the design.
|
---|
26 | We assume the default finalization model, but the code affected by that
|
---|
27 | is very localized.
|
---|
28 | <H2> Introduction </h2>
|
---|
29 | The garbage collector uses a modified mark-sweep algorithm. Conceptually
|
---|
30 | it operates roughly in four phases:
|
---|
31 |
|
---|
32 | <OL>
|
---|
33 |
|
---|
34 | <LI>
|
---|
35 | <I>Preparation</i> Clear all mark bits, indicating that all objects
|
---|
36 | are potentially unreachable.
|
---|
37 |
|
---|
38 | <LI>
|
---|
39 | <I>Mark phase</i> Marks all objects that can be reachable via chains of
|
---|
40 | pointers from variables. Normally the collector has no real information
|
---|
41 | about the location of pointer variables in the heap, so it
|
---|
42 | views all static data areas, stacks and registers as potentially containing
|
---|
43 | containing pointers. Any bit patterns that represent addresses inside
|
---|
44 | heap objects managed by the collector are viewed as pointers.
|
---|
45 | Unless the client program has made heap object layout information
|
---|
46 | available to the collector, any heap objects found to be reachable from
|
---|
47 | variables are again scanned similarly.
|
---|
48 |
|
---|
49 | <LI>
|
---|
50 | <I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
|
---|
51 | objects, and returns them to an appropriate free list for reuse. This is
|
---|
52 | not really a separate phase; even in non incremental mode this is operation
|
---|
53 | is usually performed on demand during an allocation that discovers an empty
|
---|
54 | free list. Thus the sweep phase is very unlikely to touch a page that
|
---|
55 | would not have been touched shortly thereafter anyway.
|
---|
56 |
|
---|
57 | <LI>
|
---|
58 | <I>Finalization phase</i> Unreachable objects which had been registered
|
---|
59 | for finalization are enqueued for finalization outside the collector.
|
---|
60 |
|
---|
61 | </ol>
|
---|
62 |
|
---|
63 | <P>
|
---|
64 | The remaining sections describe the memory allocation data structures,
|
---|
65 | and then the last 3 collection phases in more detail. We conclude by
|
---|
66 | outlining some of the additional features implemented in the collector.
|
---|
67 |
|
---|
68 | <H2>Allocation</h2>
|
---|
69 | The collector includes its own memory allocator. The allocator obtains
|
---|
70 | memory from the system in a platform-dependent way. Under UNIX, it
|
---|
71 | uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
|
---|
72 | <P>
|
---|
73 | Most static data used by the allocator, as well as that needed by the
|
---|
74 | rest of the garbage collector is stored inside the
|
---|
75 | <TT>_GC_arrays</tt> structure.
|
---|
76 | This allows the garbage collector to easily ignore the collectors own
|
---|
77 | data structures when it searches for root pointers. Other allocator
|
---|
78 | and collector internal data structures are allocated dynamically
|
---|
79 | with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
|
---|
80 | allow for deallocation, and is therefore used only for permanent data
|
---|
81 | structures.
|
---|
82 | <P>
|
---|
83 | The allocator allocates objects of different <I>kinds</i>.
|
---|
84 | Different kinds are handled somewhat differently by certain parts
|
---|
85 | of the garbage collector. Certain kinds are scanned for pointers,
|
---|
86 | others are not. Some may have per-object type descriptors that
|
---|
87 | determine pointer locations. Or a specific kind may correspond
|
---|
88 | to one specific object layout. Two built-in kinds are uncollectable.
|
---|
89 | One (<TT>STUBBORN</tt>) is immutable without special precautions.
|
---|
90 | In spite of that, it is very likely that most applications currently
|
---|
91 | use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
|
---|
92 | <P>
|
---|
93 | The collector uses a two level allocator. A large block is defined to
|
---|
94 | be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
|
---|
95 | typically on the order of the page size.
|
---|
96 | <P>
|
---|
97 | Large block sizes are rounded up to
|
---|
98 | the next multiple of <TT>HBLKSIZE</tt> and then allocated by
|
---|
99 | <TT>GC_allochblk</tt>. This uses roughly what Paul Wilson has termed
|
---|
100 | a "next fit" algorithm, i.e. first-fit with a rotating pointer.
|
---|
101 | The implementation does check for a better fitting immediately
|
---|
102 | adjacent block, which gives it somewhat better fragmentation characteristics.
|
---|
103 | I'm now convinced it should use a best fit algorithm. The actual
|
---|
104 | implementation of <TT>GC_allochblk</tt>
|
---|
105 | is significantly complicated by black-listing issues
|
---|
106 | (see below).
|
---|
107 | <P>
|
---|
108 | Small blocks are allocated in blocks of size <TT>HBLKSIZE</tt>.
|
---|
109 | Each block is
|
---|
110 | dedicated to only one object size and kind. The allocator maintains
|
---|
111 | separate free lists for each size and kind of object.
|
---|
112 | <P>
|
---|
113 | In order to avoid allocating blocks for too many distinct object sizes,
|
---|
114 | the collector normally does not directly allocate objects of every possible
|
---|
115 | request size. Instead request are rounded up to one of a smaller number
|
---|
116 | of allocated sizes, for which free lists are maintained. The exact
|
---|
117 | allocated sizes are computed on demand, but subject to the constraint
|
---|
118 | that they increase roughly in geometric progression. Thus objects
|
---|
119 | requested early in the execution are likely to be allocated with exactly
|
---|
120 | the requested size, subject to alignment constraints.
|
---|
121 | See <TT>GC_init_size_map</tt> for details.
|
---|
122 | <P>
|
---|
123 | The actual size rounding operation during small object allocation is
|
---|
124 | implemented as a table lookup in <TT>GC_size_map</tt>.
|
---|
125 | <P>
|
---|
126 | Both collector initialization and computation of allocated sizes are
|
---|
127 | handled carefully so that they do not slow down the small object fast
|
---|
128 | allocation path. An attempt to allocate before the collector is initialized,
|
---|
129 | or before the appropriate <TT>GC_size_map</tt> entry is computed,
|
---|
130 | will take the same path as an allocation attempt with an empty free list.
|
---|
131 | This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
|
---|
132 | which performs the appropriate initialization checks.
|
---|
133 | <P>
|
---|
134 | In non-incremental mode, we make a decision about whether to garbage collect
|
---|
135 | whenever an allocation would otherwise have failed with the current heap size.
|
---|
136 | If the total amount of allocation since the last collection is less than
|
---|
137 | the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
|
---|
138 | expand the heap. Otherwise, we initiate a garbage collection. This ensures
|
---|
139 | that the amount of garbage collection work per allocated byte remains
|
---|
140 | constant.
|
---|
141 | <P>
|
---|
142 | The above is in fat an oversimplification of the real heap expansion
|
---|
143 | heuristic, which adjusts slightly for root size and certain kinds of
|
---|
144 | fragmentation. In particular, programs with a large root set size and
|
---|
145 | little live heap memory will expand the heap to amortize the cost of
|
---|
146 | scanning the roots.
|
---|
147 | <P>
|
---|
148 | Versions 5.x of the collector actually collect more frequently in
|
---|
149 | nonincremental mode. The large block allocator usually refuses to split
|
---|
150 | large heap blocks once the garbage collection threshold is
|
---|
151 | reached. This often has the effect of collecting well before the
|
---|
152 | heap fills up, thus reducing fragmentation and working set size at the
|
---|
153 | expense of GC time. 6.x will chose an intermediate strategy depending
|
---|
154 | on how much large object allocation has taken place in the past.
|
---|
155 | (If the collector is configured to unmap unused pages, versions 6.x
|
---|
156 | will use the 5.x strategy.)
|
---|
157 | <P>
|
---|
158 | (It has been suggested that this should be adjusted so that we favor
|
---|
159 | expansion if the resulting heap still fits into physical memory.
|
---|
160 | In many cases, that would no doubt help. But it is tricky to do this
|
---|
161 | in a way that remains robust if multiple application are contending
|
---|
162 | for a single pool of physical memory.)
|
---|
163 |
|
---|
164 | <H2>Mark phase</h2>
|
---|
165 |
|
---|
166 | The marker maintains an explicit stack of memory regions that are known
|
---|
167 | to be accessible, but that have not yet been searched for contained pointers.
|
---|
168 | Each stack entry contains the starting address of the block to be scanned,
|
---|
169 | as well as a descriptor of the block. If no layout information is
|
---|
170 | available for the block, then the descriptor is simply a length.
|
---|
171 | (For other possibilities, see <TT>gc_mark.h</tt>.)
|
---|
172 | <P>
|
---|
173 | At the beginning of the mark phase, all root segments are pushed on the
|
---|
174 | stack by <TT>GC_push_roots</tt>. If <TT>ALL_INTERIOR_PTRS</tt> is not
|
---|
175 | defined, then stack roots require special treatment. In this case, the
|
---|
176 | normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
|
---|
177 | explicitly checks for interior pointers and pushes descriptors for target
|
---|
178 | objects.
|
---|
179 | <P>
|
---|
180 | The marker is structured to allow incremental marking.
|
---|
181 | Each call to <TT>GC_mark_some</tt> performs a small amount of
|
---|
182 | work towards marking the heap.
|
---|
183 | It maintains
|
---|
184 | explicit state in the form of <TT>GC_mark_state</tt>, which
|
---|
185 | identifies a particular sub-phase. Some other pieces of state, most
|
---|
186 | notably the mark stack, identify how much work remains to be done
|
---|
187 | in each sub-phase. The normal progression of mark states for
|
---|
188 | a stop-the-world collection is:
|
---|
189 | <OL>
|
---|
190 | <LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
|
---|
191 | objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
|
---|
192 | be false, so the mark state is advanced to
|
---|
193 | <LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
|
---|
194 | uncollectable objects, roots, and then mark everything reachable from them.
|
---|
195 | <TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
|
---|
196 | objects are pushed, and objects reachable from them are marked.
|
---|
197 | At that point, the next call to <TT>GC_mark_some</tt> calls
|
---|
198 | <TT>GC_push_roots</tt> to push the roots. It the advances the
|
---|
199 | mark state to
|
---|
200 | <LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
|
---|
201 | empty, all reachable objects are marked. Once in this state, we work
|
---|
202 | only on emptying the mark stack. Once this is completed, the state
|
---|
203 | changes to
|
---|
204 | <LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
|
---|
205 | </ol>
|
---|
206 |
|
---|
207 | The core mark routine <TT>GC_mark_from_mark_stack</tt>, is called
|
---|
208 | repeatedly by several of the sub-phases when the mark stack starts to fill
|
---|
209 | up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
|
---|
210 | to empty the mark stack.
|
---|
211 | The routine is designed to only perform a limited amount of marking at
|
---|
212 | each call, so that it can also be used by the incremental collector.
|
---|
213 | It is fairly carefully tuned, since it usually consumes a large majority
|
---|
214 | of the garbage collection time.
|
---|
215 | <P>
|
---|
216 | The marker correctly handles mark stack overflows. Whenever the mark stack
|
---|
217 | overflows, the mark state is reset to <TT>MS_INVALID</tt>.
|
---|
218 | Since there are already marked objects in the heap,
|
---|
219 | this eventually forces a complete
|
---|
220 | scan of the heap, searching for pointers, during which any unmarked objects
|
---|
221 | referenced by marked objects are again pushed on the mark stack. This
|
---|
222 | process is repeated until the mark phase completes without a stack overflow.
|
---|
223 | Each time the stack overflows, an attempt is made to grow the mark stack.
|
---|
224 | All pieces of the collector that push regions onto the mark stack have to be
|
---|
225 | careful to ensure forward progress, even in case of repeated mark stack
|
---|
226 | overflows. Every mark attempt results in additional marked objects.
|
---|
227 | <P>
|
---|
228 | Each mark stack entry is processed by examining all candidate pointers
|
---|
229 | in the range described by the entry. If the region has no associated
|
---|
230 | type information, then this typically requires that each 4-byte aligned
|
---|
231 | quantity (8-byte aligned with 64-bit pointers) be considered a candidate
|
---|
232 | pointer.
|
---|
233 | <P>
|
---|
234 | We determine whether a candidate pointer is actually the address of
|
---|
235 | a heap block. This is done in the following steps:
|
---|
236 | <NL>
|
---|
237 | <LI> The candidate pointer is checked against rough heap bounds.
|
---|
238 | These heap bounds are maintained such that all actual heap objects
|
---|
239 | fall between them. In order to facilitate black-listing (see below)
|
---|
240 | we also include address regions that the heap is likely to expand into.
|
---|
241 | Most non-pointers fail this initial test.
|
---|
242 | <LI> The candidate pointer is divided into two pieces; the most significant
|
---|
243 | bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
|
---|
244 | the least significant bits specify an offset within that page.
|
---|
245 | (A hardware page may actually consist of multiple such pages.
|
---|
246 | HBLKSIZE is usually the page size divided by a small power of two.)
|
---|
247 | <LI>
|
---|
248 | The page address part of the candidate pointer is looked up in a
|
---|
249 | <A HREF="tree.html">table</a>.
|
---|
250 | Each table entry contains either 0, indicating that the page is not part
|
---|
251 | of the garbage collected heap, a small integer <I>n</i>, indicating
|
---|
252 | that the page is part of large object, starting at least <I>n</i> pages
|
---|
253 | back, or a pointer to a descriptor for the page. In the first case,
|
---|
254 | the candidate pointer i not a true pointer and can be safely ignored.
|
---|
255 | In the last two cases, we can obtain a descriptor for the page containing
|
---|
256 | the beginning of the object.
|
---|
257 | <LI>
|
---|
258 | The starting address of the referenced object is computed.
|
---|
259 | The page descriptor contains the size of the object(s)
|
---|
260 | in that page, the object kind, and the necessary mark bits for those
|
---|
261 | objects. The size information can be used to map the candidate pointer
|
---|
262 | to the object starting address. To accelerate this process, the page header
|
---|
263 | also contains a pointer to a precomputed map of page offsets to displacements
|
---|
264 | from the beginning of an object. The use of this map avoids a
|
---|
265 | potentially slow integer remainder operation in computing the object
|
---|
266 | start address.
|
---|
267 | <LI>
|
---|
268 | The mark bit for the target object is checked and set. If the object
|
---|
269 | was previously unmarked, the object is pushed on the mark stack.
|
---|
270 | The descriptor is read from the page descriptor. (This is computed
|
---|
271 | from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
|
---|
272 | </nl>
|
---|
273 | <P>
|
---|
274 | At the end of the mark phase, mark bits for left-over free lists are cleared,
|
---|
275 | in case a free list was accidentally marked due to a stray pointer.
|
---|
276 |
|
---|
277 | <H2>Sweep phase</h2>
|
---|
278 |
|
---|
279 | At the end of the mark phase, all blocks in the heap are examined.
|
---|
280 | Unmarked large objects are immediately returned to the large object free list.
|
---|
281 | Each small object page is checked to see if all mark bits are clear.
|
---|
282 | If so, the entire page is returned to the large object free list.
|
---|
283 | Small object pages containing some reachable object are queued for later
|
---|
284 | sweeping.
|
---|
285 | <P>
|
---|
286 | This initial sweep pass touches only block headers, not
|
---|
287 | the blocks themselves. Thus it does not require significant paging, even
|
---|
288 | if large sections of the heap are not in physical memory.
|
---|
289 | <P>
|
---|
290 | Nonempty small object pages are swept when an allocation attempt
|
---|
291 | encounters an empty free list for that object size and kind.
|
---|
292 | Pages for the correct size and kind are repeatedly swept until at
|
---|
293 | least one empty block is found. Sweeping such a page involves
|
---|
294 | scanning the mark bit array in the page header, and building a free
|
---|
295 | list linked through the first words in the objects themselves.
|
---|
296 | This does involve touching the appropriate data page, but in most cases
|
---|
297 | it will be touched only just before it is used for allocation.
|
---|
298 | Hence any paging is essentially unavoidable.
|
---|
299 | <P>
|
---|
300 | Except in the case of pointer-free objects, we maintain the invariant
|
---|
301 | that any object in a small object free list is cleared (except possibly
|
---|
302 | for the link field). Thus it becomes the burden of the small object
|
---|
303 | sweep routine to clear objects. This has the advantage that we can
|
---|
304 | easily recover from accidentally marking a free list, though that could
|
---|
305 | also be handled by other means. The collector currently spends a fair
|
---|
306 | amount of time clearing objects, and this approach should probably be
|
---|
307 | revisited.
|
---|
308 | <P>
|
---|
309 | In most configurations, we use specialized sweep routines to handle common
|
---|
310 | small object sizes. Since we allocate one mark bit per word, it becomes
|
---|
311 | easier to examine the relevant mark bits if the object size divides
|
---|
312 | the word length evenly. We also suitably unroll the inner sweep loop
|
---|
313 | in each case. (It is conceivable that profile-based procedure cloning
|
---|
314 | in the compiler could make this unnecessary and counterproductive. I
|
---|
315 | know of no existing compiler to which this applies.)
|
---|
316 | <P>
|
---|
317 | The sweeping of small object pages could be avoided completely at the expense
|
---|
318 | of examining mark bits directly in the allocator. This would probably
|
---|
319 | be more expensive, since each allocation call would have to reload
|
---|
320 | a large amount of state (e.g. next object address to be swept, position
|
---|
321 | in mark bit table) before it could do its work. The current scheme
|
---|
322 | keeps the allocator simple and allows useful optimizations in the sweeper.
|
---|
323 |
|
---|
324 | <H2>Finalization</h2>
|
---|
325 | Both <TT>GC_register_disappearing_link</tt> and
|
---|
326 | <TT>GC_register_finalizer</tt> add the request to a corresponding hash
|
---|
327 | table. The hash table is allocated out of collected memory, but
|
---|
328 | the reference to the finalizable object is hidden from the collector.
|
---|
329 | Currently finalization requests are processed non-incrementally at the
|
---|
330 | end of a mark cycle.
|
---|
331 | <P>
|
---|
332 | The collector makes an initial pass over the table of finalizable objects,
|
---|
333 | pushing the contents of unmarked objects onto the mark stack.
|
---|
334 | After pushing each object, the marker is invoked to mark all objects
|
---|
335 | reachable from it. The object itself is not explicitly marked.
|
---|
336 | This assures that objects on which a finalizer depends are neither
|
---|
337 | collected nor finalized.
|
---|
338 | <P>
|
---|
339 | If in the process of marking from an object the
|
---|
340 | object itself becomes marked, we have uncovered
|
---|
341 | a cycle involving the object. This usually results in a warning from the
|
---|
342 | collector. Such objects are not finalized, since it may be
|
---|
343 | unsafe to do so. See the more detailed
|
---|
344 | <A HREF="finalization.html"> discussion of finalization semantics</a>.
|
---|
345 | <P>
|
---|
346 | Any objects remaining unmarked at the end of this process are added to
|
---|
347 | a queue of objects whose finalizers can be run. Depending on collector
|
---|
348 | configuration, finalizers are dequeued and run either implicitly during
|
---|
349 | allocation calls, or explicitly in response to a user request.
|
---|
350 | <P>
|
---|
351 | The collector provides a mechanism for replacing the procedure that is
|
---|
352 | used to mark through objects. This is used both to provide support for
|
---|
353 | Java-style unordered finalization, and to ignore certain kinds of cycles,
|
---|
354 | <I>e.g.</i> those arising from C++ implementations of virtual inheritance.
|
---|
355 |
|
---|
356 | <H2>Generational Collection and Dirty Bits</h2>
|
---|
357 | We basically use the parallel and generational GC algorithm described in
|
---|
358 | <A HREF="papers/pldi91.ps.gz">"Mostly Parallel Garbage Collection"</a>,
|
---|
359 | by Boehm, Demers, and Shenker.
|
---|
360 | <P>
|
---|
361 | The most significant modification is that
|
---|
362 | the collector always runs in the allocating thread.
|
---|
363 | There is no separate garbage collector thread.
|
---|
364 | If an allocation attempt either requests a large object, or encounters
|
---|
365 | an empty small object free list, and notices that there is a collection
|
---|
366 | in progress, it immediately performs a small amount of marking work
|
---|
367 | as described above.
|
---|
368 | <P>
|
---|
369 | This change was made both because we wanted to easily accommodate
|
---|
370 | single-threaded environments, and because a separate GC thread requires
|
---|
371 | very careful control over the scheduler to prevent the mutator from
|
---|
372 | out-running the collector, and hence provoking unneeded heap growth.
|
---|
373 | <P>
|
---|
374 | In incremental mode, the heap is always expanded when we encounter
|
---|
375 | insufficient space for an allocation. Garbage collection is triggered
|
---|
376 | whenever we notice that more than
|
---|
377 | <TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
|
---|
378 | bytes of allocation have taken place.
|
---|
379 | After <TT>GC_full_freq</tt> minor collections a major collection
|
---|
380 | is started.
|
---|
381 | <P>
|
---|
382 | All collections initially run interrupted until a predetermined
|
---|
383 | amount of time (50 msecs by default) has expired. If this allows
|
---|
384 | the collection to complete entirely, we can avoid correcting
|
---|
385 | for data structure modifications during the collection. If it does
|
---|
386 | not complete, we return control to the mutator, and perform small
|
---|
387 | amounts of additional GC work during those later allocations that
|
---|
388 | cannot be satisfied from small object free lists. When marking completes,
|
---|
389 | the set of modified pages is retrieved, and we mark once again from
|
---|
390 | marked objects on those pages, this time with the mutator stopped.
|
---|
391 | <P>
|
---|
392 | We keep track of modified pages using one of three distinct mechanisms:
|
---|
393 | <OL>
|
---|
394 | <LI>
|
---|
395 | Through explicit mutator cooperation. Currently this requires
|
---|
396 | the use of <TT>GC_malloc_stubborn</tt>.
|
---|
397 | <LI>
|
---|
398 | By write-protecting physical pages and catching write faults. This is
|
---|
399 | implemented for many Unix-like systems and for win32. It is not possible
|
---|
400 | in a few environments.
|
---|
401 | <LI>
|
---|
402 | By retrieving dirty bit information from /proc. (Currently only Sun's
|
---|
403 | Solaris supports this. Though this is considerably cleaner, performance
|
---|
404 | may actually be better with mprotect and signals.)
|
---|
405 | </ol>
|
---|
406 |
|
---|
407 | <H2>Thread support</h2>
|
---|
408 | We support several different threading models. Unfortunately Pthreads,
|
---|
409 | the only reasonably well standardized thread model, supports too narrow
|
---|
410 | an interface for conservative garbage collection. There appears to be
|
---|
411 | no portable way to allow the collector to coexist with various Pthreads
|
---|
412 | implementations. Hence we currently support only a few of the more
|
---|
413 | common Pthreads implementations.
|
---|
414 | <P>
|
---|
415 | In particular, it is very difficult for the collector to stop all other
|
---|
416 | threads in the system and examine the register contents. This is currently
|
---|
417 | accomplished with very different mechanisms for different Pthreads
|
---|
418 | implementations. The Solaris implementation temporarily disables much
|
---|
419 | of the user-level threads implementation by stopping kernel-level threads
|
---|
420 | ("lwp"s). The Irix implementation sends signals to individual Pthreads
|
---|
421 | and has them wait in the signal handler. The Linux implementation
|
---|
422 | is similar in spirit to the Irix one.
|
---|
423 | <P>
|
---|
424 | The Irix implementation uses
|
---|
425 | only documented Pthreads calls, but relies on extensions to their semantics,
|
---|
426 | notably the use of mutexes and condition variables from signal
|
---|
427 | handlers. The Linux implementation should be far closer to
|
---|
428 | portable, though impirically it is not completely portable.
|
---|
429 | <P>
|
---|
430 | All implementations must
|
---|
431 | intercept thread creation and a few other thread-specific calls to allow
|
---|
432 | enumeration of threads and location of thread stacks. This is current
|
---|
433 | accomplished with <TT># define</tt>'s in <TT>gc.h</tt>, or optionally
|
---|
434 | by using ld's function call wrapping mechanism under Linux.
|
---|
435 | <P>
|
---|
436 | Comments are appreciated. Please send mail to
|
---|
437 | <A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a>
|
---|
438 | </body>
|
---|