Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

gcdescr.html@ 3003

Visit:

Last change on this file since 3003 was 2, checked in by bird, 22 years ago
Initial revision
Property cvs2svn:cvs-rev set to `1.1` Property svn:eol-style set to `native` Property svn:executable set to ``*
File size: 21.8 KB

Line
1	<HTML>
2	<HEAD>
3	<TITLE> Conservative GC Algorithmic Overview </TITLE>
4	<AUTHOR> Hans-J. Boehm, Silicon Graphics</author>
5	</HEAD>
6	<BODY>
7	<H1> <I>This is under construction</i> </h1>
8	<H1> Conservative GC Algorithmic Overview </h1>
9	<P>
10	This is a description of the algorithms and data structures used in our
11	conservative garbage collector. I expect the level of detail to increase
12	with time. For a survey of GC algorithms, see for example
13	<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
14	excellent paper</a>. For an overview of the collector interface,
15	see <A HREF="gcinterface.html">here</a>.
16	<P>
17	This description is targeted primarily at someone trying to understand the
18	source code. It specifically refers to variable and function names.
19	It may also be useful for understanding the algorithms at a higher level.
20	<P>
21	The description here assumes that the collector is used in default mode.
22	In particular, we assume that it used as a garbage collector, and not just
23	a leak detector. We initially assume that it is used in stop-the-world,
24	non-incremental mode, though the presence of the incremental collector
25	will be apparent in the design.
26	We assume the default finalization model, but the code affected by that
27	is very localized.
28	<H2> Introduction </h2>
29	The garbage collector uses a modified mark-sweep algorithm. Conceptually
30	it operates roughly in four phases:
31
32	<OL>
33
34	<LI>
35	<I>Preparation</i> Clear all mark bits, indicating that all objects
36	are potentially unreachable.
37
38	<LI>
39	<I>Mark phase</i> Marks all objects that can be reachable via chains of
40	pointers from variables. Normally the collector has no real information
41	about the location of pointer variables in the heap, so it
42	views all static data areas, stacks and registers as potentially containing
43	containing pointers. Any bit patterns that represent addresses inside
44	heap objects managed by the collector are viewed as pointers.
45	Unless the client program has made heap object layout information
46	available to the collector, any heap objects found to be reachable from
47	variables are again scanned similarly.
48
49	<LI>
50	<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
51	objects, and returns them to an appropriate free list for reuse. This is
52	not really a separate phase; even in non incremental mode this is operation
53	is usually performed on demand during an allocation that discovers an empty
54	free list. Thus the sweep phase is very unlikely to touch a page that
55	would not have been touched shortly thereafter anyway.
56
57	<LI>
58	<I>Finalization phase</i> Unreachable objects which had been registered
59	for finalization are enqueued for finalization outside the collector.
60
61	</ol>
62
63	<P>
64	The remaining sections describe the memory allocation data structures,
65	and then the last 3 collection phases in more detail. We conclude by
66	outlining some of the additional features implemented in the collector.
67
68	<H2>Allocation</h2>
69	The collector includes its own memory allocator. The allocator obtains
70	memory from the system in a platform-dependent way. Under UNIX, it
71	uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
72	<P>
73	Most static data used by the allocator, as well as that needed by the
74	rest of the garbage collector is stored inside the
75	<TT>_GC_arrays</tt> structure.
76	This allows the garbage collector to easily ignore the collectors own
77	data structures when it searches for root pointers. Other allocator
78	and collector internal data structures are allocated dynamically
79	with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
80	allow for deallocation, and is therefore used only for permanent data
81	structures.
82	<P>
83	The allocator allocates objects of different <I>kinds</i>.
84	Different kinds are handled somewhat differently by certain parts
85	of the garbage collector. Certain kinds are scanned for pointers,
86	others are not. Some may have per-object type descriptors that
87	determine pointer locations. Or a specific kind may correspond
88	to one specific object layout. Two built-in kinds are uncollectable.
89	One (<TT>STUBBORN</tt>) is immutable without special precautions.
90	In spite of that, it is very likely that most applications currently
91	use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
92	<P>
93	The collector uses a two level allocator. A large block is defined to
94	be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
95	typically on the order of the page size.
96	<P>
97	Large block sizes are rounded up to
98	the next multiple of <TT>HBLKSIZE</tt> and then allocated by
99	<TT>GC_allochblk</tt>. This uses roughly what Paul Wilson has termed
100	a "next fit" algorithm, i.e. first-fit with a rotating pointer.
101	The implementation does check for a better fitting immediately
102	adjacent block, which gives it somewhat better fragmentation characteristics.
103	I'm now convinced it should use a best fit algorithm. The actual
104	implementation of <TT>GC_allochblk</tt>
105	is significantly complicated by black-listing issues
106	(see below).
107	<P>
108	Small blocks are allocated in blocks of size <TT>HBLKSIZE</tt>.
109	Each block is
110	dedicated to only one object size and kind. The allocator maintains
111	separate free lists for each size and kind of object.
112	<P>
113	In order to avoid allocating blocks for too many distinct object sizes,
114	the collector normally does not directly allocate objects of every possible
115	request size. Instead request are rounded up to one of a smaller number
116	of allocated sizes, for which free lists are maintained. The exact
117	allocated sizes are computed on demand, but subject to the constraint
118	that they increase roughly in geometric progression. Thus objects
119	requested early in the execution are likely to be allocated with exactly
120	the requested size, subject to alignment constraints.
121	See <TT>GC_init_size_map</tt> for details.
122	<P>
123	The actual size rounding operation during small object allocation is
124	implemented as a table lookup in <TT>GC_size_map</tt>.
125	<P>
126	Both collector initialization and computation of allocated sizes are
127	handled carefully so that they do not slow down the small object fast
128	allocation path. An attempt to allocate before the collector is initialized,
129	or before the appropriate <TT>GC_size_map</tt> entry is computed,
130	will take the same path as an allocation attempt with an empty free list.
131	This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
132	which performs the appropriate initialization checks.
133	<P>
134	In non-incremental mode, we make a decision about whether to garbage collect
135	whenever an allocation would otherwise have failed with the current heap size.
136	If the total amount of allocation since the last collection is less than
137	the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
138	expand the heap. Otherwise, we initiate a garbage collection. This ensures
139	that the amount of garbage collection work per allocated byte remains
140	constant.
141	<P>
142	The above is in fat an oversimplification of the real heap expansion
143	heuristic, which adjusts slightly for root size and certain kinds of
144	fragmentation. In particular, programs with a large root set size and
145	little live heap memory will expand the heap to amortize the cost of
146	scanning the roots.
147	<P>
148	Versions 5.x of the collector actually collect more frequently in
149	nonincremental mode. The large block allocator usually refuses to split
150	large heap blocks once the garbage collection threshold is
151	reached. This often has the effect of collecting well before the
152	heap fills up, thus reducing fragmentation and working set size at the
153	expense of GC time. 6.x will chose an intermediate strategy depending
154	on how much large object allocation has taken place in the past.
155	(If the collector is configured to unmap unused pages, versions 6.x
156	will use the 5.x strategy.)
157	<P>
158	(It has been suggested that this should be adjusted so that we favor
159	expansion if the resulting heap still fits into physical memory.
160	In many cases, that would no doubt help. But it is tricky to do this
161	in a way that remains robust if multiple application are contending
162	for a single pool of physical memory.)
163
164	<H2>Mark phase</h2>
165
166	The marker maintains an explicit stack of memory regions that are known
167	to be accessible, but that have not yet been searched for contained pointers.
168	Each stack entry contains the starting address of the block to be scanned,
169	as well as a descriptor of the block. If no layout information is
170	available for the block, then the descriptor is simply a length.
171	(For other possibilities, see <TT>gc_mark.h</tt>.)
172	<P>
173	At the beginning of the mark phase, all root segments are pushed on the
174	stack by <TT>GC_push_roots</tt>. If <TT>ALL_INTERIOR_PTRS</tt> is not
175	defined, then stack roots require special treatment. In this case, the
176	normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
177	explicitly checks for interior pointers and pushes descriptors for target
178	objects.
179	<P>
180	The marker is structured to allow incremental marking.
181	Each call to <TT>GC_mark_some</tt> performs a small amount of
182	work towards marking the heap.
183	It maintains
184	explicit state in the form of <TT>GC_mark_state</tt>, which
185	identifies a particular sub-phase. Some other pieces of state, most
186	notably the mark stack, identify how much work remains to be done
187	in each sub-phase. The normal progression of mark states for
188	a stop-the-world collection is:
189	<OL>
190	<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
191	objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
192	be false, so the mark state is advanced to
193	<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
194	uncollectable objects, roots, and then mark everything reachable from them.
195	<TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
196	objects are pushed, and objects reachable from them are marked.
197	At that point, the next call to <TT>GC_mark_some</tt> calls
198	<TT>GC_push_roots</tt> to push the roots. It the advances the
199	mark state to
200	<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
201	empty, all reachable objects are marked. Once in this state, we work
202	only on emptying the mark stack. Once this is completed, the state
203	changes to
204	<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
205	</ol>
206
207	The core mark routine <TT>GC_mark_from_mark_stack</tt>, is called
208	repeatedly by several of the sub-phases when the mark stack starts to fill
209	up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
210	to empty the mark stack.
211	The routine is designed to only perform a limited amount of marking at
212	each call, so that it can also be used by the incremental collector.
213	It is fairly carefully tuned, since it usually consumes a large majority
214	of the garbage collection time.
215	<P>
216	The marker correctly handles mark stack overflows. Whenever the mark stack
217	overflows, the mark state is reset to <TT>MS_INVALID</tt>.
218	Since there are already marked objects in the heap,
219	this eventually forces a complete
220	scan of the heap, searching for pointers, during which any unmarked objects
221	referenced by marked objects are again pushed on the mark stack. This
222	process is repeated until the mark phase completes without a stack overflow.
223	Each time the stack overflows, an attempt is made to grow the mark stack.
224	All pieces of the collector that push regions onto the mark stack have to be
225	careful to ensure forward progress, even in case of repeated mark stack
226	overflows. Every mark attempt results in additional marked objects.
227	<P>
228	Each mark stack entry is processed by examining all candidate pointers
229	in the range described by the entry. If the region has no associated
230	type information, then this typically requires that each 4-byte aligned
231	quantity (8-byte aligned with 64-bit pointers) be considered a candidate
232	pointer.
233	<P>
234	We determine whether a candidate pointer is actually the address of
235	a heap block. This is done in the following steps:
236	<NL>
237	<LI> The candidate pointer is checked against rough heap bounds.
238	These heap bounds are maintained such that all actual heap objects
239	fall between them. In order to facilitate black-listing (see below)
240	we also include address regions that the heap is likely to expand into.
241	Most non-pointers fail this initial test.
242	<LI> The candidate pointer is divided into two pieces; the most significant
243	bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
244	the least significant bits specify an offset within that page.
245	(A hardware page may actually consist of multiple such pages.
246	HBLKSIZE is usually the page size divided by a small power of two.)
247	<LI>
248	The page address part of the candidate pointer is looked up in a
249	<A HREF="tree.html">table</a>.
250	Each table entry contains either 0, indicating that the page is not part
251	of the garbage collected heap, a small integer <I>n</i>, indicating
252	that the page is part of large object, starting at least <I>n</i> pages
253	back, or a pointer to a descriptor for the page. In the first case,
254	the candidate pointer i not a true pointer and can be safely ignored.
255	In the last two cases, we can obtain a descriptor for the page containing
256	the beginning of the object.
257	<LI>
258	The starting address of the referenced object is computed.
259	The page descriptor contains the size of the object(s)
260	in that page, the object kind, and the necessary mark bits for those
261	objects. The size information can be used to map the candidate pointer
262	to the object starting address. To accelerate this process, the page header
263	also contains a pointer to a precomputed map of page offsets to displacements
264	from the beginning of an object. The use of this map avoids a
265	potentially slow integer remainder operation in computing the object
266	start address.
267	<LI>
268	The mark bit for the target object is checked and set. If the object
269	was previously unmarked, the object is pushed on the mark stack.
270	The descriptor is read from the page descriptor. (This is computed
271	from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
272	</nl>
273	<P>
274	At the end of the mark phase, mark bits for left-over free lists are cleared,
275	in case a free list was accidentally marked due to a stray pointer.
276
277	<H2>Sweep phase</h2>
278
279	At the end of the mark phase, all blocks in the heap are examined.
280	Unmarked large objects are immediately returned to the large object free list.
281	Each small object page is checked to see if all mark bits are clear.
282	If so, the entire page is returned to the large object free list.
283	Small object pages containing some reachable object are queued for later
284	sweeping.
285	<P>
286	This initial sweep pass touches only block headers, not
287	the blocks themselves. Thus it does not require significant paging, even
288	if large sections of the heap are not in physical memory.
289	<P>
290	Nonempty small object pages are swept when an allocation attempt
291	encounters an empty free list for that object size and kind.
292	Pages for the correct size and kind are repeatedly swept until at
293	least one empty block is found. Sweeping such a page involves
294	scanning the mark bit array in the page header, and building a free
295	list linked through the first words in the objects themselves.
296	This does involve touching the appropriate data page, but in most cases
297	it will be touched only just before it is used for allocation.
298	Hence any paging is essentially unavoidable.
299	<P>
300	Except in the case of pointer-free objects, we maintain the invariant
301	that any object in a small object free list is cleared (except possibly
302	for the link field). Thus it becomes the burden of the small object
303	sweep routine to clear objects. This has the advantage that we can
304	easily recover from accidentally marking a free list, though that could
305	also be handled by other means. The collector currently spends a fair
306	amount of time clearing objects, and this approach should probably be
307	revisited.
308	<P>
309	In most configurations, we use specialized sweep routines to handle common
310	small object sizes. Since we allocate one mark bit per word, it becomes
311	easier to examine the relevant mark bits if the object size divides
312	the word length evenly. We also suitably unroll the inner sweep loop
313	in each case. (It is conceivable that profile-based procedure cloning
314	in the compiler could make this unnecessary and counterproductive. I
315	know of no existing compiler to which this applies.)
316	<P>
317	The sweeping of small object pages could be avoided completely at the expense
318	of examining mark bits directly in the allocator. This would probably
319	be more expensive, since each allocation call would have to reload
320	a large amount of state (e.g. next object address to be swept, position
321	in mark bit table) before it could do its work. The current scheme
322	keeps the allocator simple and allows useful optimizations in the sweeper.
323
324	<H2>Finalization</h2>
325	Both <TT>GC_register_disappearing_link</tt> and
326	<TT>GC_register_finalizer</tt> add the request to a corresponding hash
327	table. The hash table is allocated out of collected memory, but
328	the reference to the finalizable object is hidden from the collector.
329	Currently finalization requests are processed non-incrementally at the
330	end of a mark cycle.
331	<P>
332	The collector makes an initial pass over the table of finalizable objects,
333	pushing the contents of unmarked objects onto the mark stack.
334	After pushing each object, the marker is invoked to mark all objects
335	reachable from it. The object itself is not explicitly marked.
336	This assures that objects on which a finalizer depends are neither
337	collected nor finalized.
338	<P>
339	If in the process of marking from an object the
340	object itself becomes marked, we have uncovered
341	a cycle involving the object. This usually results in a warning from the
342	collector. Such objects are not finalized, since it may be
343	unsafe to do so. See the more detailed
344	<A HREF="finalization.html"> discussion of finalization semantics</a>.
345	<P>
346	Any objects remaining unmarked at the end of this process are added to
347	a queue of objects whose finalizers can be run. Depending on collector
348	configuration, finalizers are dequeued and run either implicitly during
349	allocation calls, or explicitly in response to a user request.
350	<P>
351	The collector provides a mechanism for replacing the procedure that is
352	used to mark through objects. This is used both to provide support for
353	Java-style unordered finalization, and to ignore certain kinds of cycles,
354	<I>e.g.</i> those arising from C++ implementations of virtual inheritance.
355
356	<H2>Generational Collection and Dirty Bits</h2>
357	We basically use the parallel and generational GC algorithm described in
358	<A HREF="papers/pldi91.ps.gz">"Mostly Parallel Garbage Collection"</a>,
359	by Boehm, Demers, and Shenker.
360	<P>
361	The most significant modification is that
362	the collector always runs in the allocating thread.
363	There is no separate garbage collector thread.
364	If an allocation attempt either requests a large object, or encounters
365	an empty small object free list, and notices that there is a collection
366	in progress, it immediately performs a small amount of marking work
367	as described above.
368	<P>
369	This change was made both because we wanted to easily accommodate
370	single-threaded environments, and because a separate GC thread requires
371	very careful control over the scheduler to prevent the mutator from
372	out-running the collector, and hence provoking unneeded heap growth.
373	<P>
374	In incremental mode, the heap is always expanded when we encounter
375	insufficient space for an allocation. Garbage collection is triggered
376	whenever we notice that more than
377	<TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
378	bytes of allocation have taken place.
379	After <TT>GC_full_freq</tt> minor collections a major collection
380	is started.
381	<P>
382	All collections initially run interrupted until a predetermined
383	amount of time (50 msecs by default) has expired. If this allows
384	the collection to complete entirely, we can avoid correcting
385	for data structure modifications during the collection. If it does
386	not complete, we return control to the mutator, and perform small
387	amounts of additional GC work during those later allocations that
388	cannot be satisfied from small object free lists. When marking completes,
389	the set of modified pages is retrieved, and we mark once again from
390	marked objects on those pages, this time with the mutator stopped.
391	<P>
392	We keep track of modified pages using one of three distinct mechanisms:
393	<OL>
394	<LI>
395	Through explicit mutator cooperation. Currently this requires
396	the use of <TT>GC_malloc_stubborn</tt>.
397	<LI>
398	By write-protecting physical pages and catching write faults. This is
399	implemented for many Unix-like systems and for win32. It is not possible
400	in a few environments.
401	<LI>
402	By retrieving dirty bit information from /proc. (Currently only Sun's
403	Solaris supports this. Though this is considerably cleaner, performance
404	may actually be better with mprotect and signals.)
405	</ol>
406
407	<H2>Thread support</h2>
408	We support several different threading models. Unfortunately Pthreads,
409	the only reasonably well standardized thread model, supports too narrow
410	an interface for conservative garbage collection. There appears to be
411	no portable way to allow the collector to coexist with various Pthreads
412	implementations. Hence we currently support only a few of the more
413	common Pthreads implementations.
414	<P>
415	In particular, it is very difficult for the collector to stop all other
416	threads in the system and examine the register contents. This is currently
417	accomplished with very different mechanisms for different Pthreads
418	implementations. The Solaris implementation temporarily disables much
419	of the user-level threads implementation by stopping kernel-level threads
420	("lwp"s). The Irix implementation sends signals to individual Pthreads
421	and has them wait in the signal handler. The Linux implementation
422	is similar in spirit to the Irix one.
423	<P>
424	The Irix implementation uses
425	only documented Pthreads calls, but relies on extensions to their semantics,
426	notably the use of mutexes and condition variables from signal
427	handlers. The Linux implementation should be far closer to
428	portable, though impirically it is not completely portable.
429	<P>
430	All implementations must
431	intercept thread creation and a few other thread-specific calls to allow
432	enumeration of threads and location of thread stacks. This is current
433	accomplished with <TT># define</tt>'s in <TT>gc.h</tt>, or optionally
434	by using ld's function call wrapping mechanism under Linux.
435	<P>
436	Comments are appreciated. Please send mail to
437	<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a>
438	</body>

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: trunk/gcc/boehm-gc/doc/gcdescr.html@ 3003

Download in other formats: