Context Navigation

gprof.info-2@ 1036

Visit:

Last change on this file since 1036 was 610, checked in by bird, 22 years ago
This commit was generated by cvs2svn to compensate for changes in r609, which included commits to RCS files with non-trunk default branches.
Property cvs2svn:cvs-rev set to `1.1.1.2` Property svn:eol-style set to `native` Property svn:executable set to ``*
File size: 34.0 KB

Line
1	This is gprof.info, produced by makeinfo version 4.3 from gprof.texi.
2
3	START-INFO-DIR-ENTRY
4	* gprof: (gprof). Profiling your program's execution
5	END-INFO-DIR-ENTRY
6
7	This file documents the gprof profiler of the GNU system.
8
9	Copyright (C) 1988, 92, 97, 98, 99, 2000, 2001 Free Software
10	Foundation, Inc.
11
12	Permission is granted to copy, distribute and/or modify this document
13	under the terms of the GNU Free Documentation License, Version 1.1 or
14	any later version published by the Free Software Foundation; with no
15	Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
16	Texts. A copy of the license is included in the section entitled "GNU
17	Free Documentation License".
18
19
20	File: gprof.info, Node: Line-by-line, Next: Annotated Source, Prev: Call Graph, Up: Output
21
22	Line-by-line Profiling
23	======================
24
25	`gprof''s `-l' option causes the program to perform "line-by-line"
26	profiling. In this mode, histogram samples are assigned not to
27	functions, but to individual lines of source code. The program usually
28	must be compiled with a `-g' option, in addition to `-pg', in order to
29	generate debugging symbols for tracking source code lines.
30
31	The flat profile is the most useful output table in line-by-line
32	mode. The call graph isn't as useful as normal, since the current
33	version of `gprof' does not propagate call graph arcs from source code
34	lines to the enclosing function. The call graph does, however, show
35	each line of code that called each function, along with a count.
36
37	Here is a section of `gprof''s output, without line-by-line
38	profiling. Note that `ct_init' accounted for four histogram hits, and
39	13327 calls to `init_block'.
40
41	Flat profile:
42
43	Each sample counts as 0.01 seconds.
44	% cumulative self self total
45	time seconds seconds calls us/call us/call name
46	30.77 0.13 0.04 6335 6.31 6.31 ct_init
47
48
49	Call graph (explanation follows)
50
51
52	granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
53
54	index % time self children called name
55
56	0.00 0.00 1/13496 name_too_long
57	0.00 0.00 40/13496 deflate
58	0.00 0.00 128/13496 deflate_fast
59	0.00 0.00 13327/13496 ct_init
60	[7] 0.0 0.00 0.00 13496 init_block
61
62	Now let's look at some of `gprof''s output from the same program run,
63	this time with line-by-line profiling enabled. Note that `ct_init''s
64	four histogram hits are broken down into four lines of source code -
65	one hit occurred on each of lines 349, 351, 382 and 385. In the call
66	graph, note how `ct_init''s 13327 calls to `init_block' are broken down
67	into one call from line 396, 3071 calls from line 384, 3730 calls from
68	line 385, and 6525 calls from 387.
69
70	Flat profile:
71
72	Each sample counts as 0.01 seconds.
73	% cumulative self
74	time seconds seconds calls name
75	7.69 0.10 0.01 ct_init (trees.c:349)
76	7.69 0.11 0.01 ct_init (trees.c:351)
77	7.69 0.12 0.01 ct_init (trees.c:382)
78	7.69 0.13 0.01 ct_init (trees.c:385)
79
80
81	Call graph (explanation follows)
82
83
84	granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
85
86	% time self children called name
87
88	0.00 0.00 1/13496 name_too_long (gzip.c:1440)
89	0.00 0.00 1/13496 deflate (deflate.c:763)
90	0.00 0.00 1/13496 ct_init (trees.c:396)
91	0.00 0.00 2/13496 deflate (deflate.c:727)
92	0.00 0.00 4/13496 deflate (deflate.c:686)
93	0.00 0.00 5/13496 deflate (deflate.c:675)
94	0.00 0.00 12/13496 deflate (deflate.c:679)
95	0.00 0.00 16/13496 deflate (deflate.c:730)
96	0.00 0.00 128/13496 deflate_fast (deflate.c:654)
97	0.00 0.00 3071/13496 ct_init (trees.c:384)
98	0.00 0.00 3730/13496 ct_init (trees.c:385)
99	0.00 0.00 6525/13496 ct_init (trees.c:387)
100	[6] 0.0 0.00 0.00 13496 init_block (trees.c:408)
101
102
103	File: gprof.info, Node: Annotated Source, Prev: Line-by-line, Up: Output
104
105	The Annotated Source Listing
106	============================
107
108	`gprof''s `-A' option triggers an annotated source listing, which
109	lists the program's source code, each function labeled with the number
110	of times it was called. You may also need to specify the `-I' option,
111	if `gprof' can't find the source code files.
112
113	Compiling with `gcc ... -g -pg -a' augments your program with
114	basic-block counting code, in addition to function counting code. This
115	enables `gprof' to determine how many times each line of code was
116	executed. For example, consider the following function, taken from
117	gzip, with line numbers added:
118
119	1 ulg updcrc(s, n)
120	2 uch *s;
121	3 unsigned n;
122	4 {
123	5 register ulg c;
124	6
125	7 static ulg crc = (ulg)0xffffffffL;
126	8
127	9 if (s == NULL) {
128	10 c = 0xffffffffL;
129	11 } else {
130	12 c = crc;
131	13 if (n) do {
132	14 c = crc_32_tab[...];
133	15 } while (--n);
134	16 }
135	17 crc = c;
136	18 return c ^ 0xffffffffL;
137	19 }
138
139	`updcrc' has at least five basic-blocks. One is the function
140	itself. The `if' statement on line 9 generates two more basic-blocks,
141	one for each branch of the `if'. A fourth basic-block results from the
142	`if' on line 13, and the contents of the `do' loop form the fifth
143	basic-block. The compiler may also generate additional basic-blocks to
144	handle various special cases.
145
146	A program augmented for basic-block counting can be analyzed with
147	`gprof -l -A'. I also suggest use of the `-x' option, which ensures
148	that each line of code is labeled at least once. Here is `updcrc''s
149	annotated source listing for a sample `gzip' run:
150
151	ulg updcrc(s, n)
152	uch *s;
153	unsigned n;
154	2 ->{
155	register ulg c;
156
157	static ulg crc = (ulg)0xffffffffL;
158
159	2 -> if (s == NULL) {
160	1 -> c = 0xffffffffL;
161	1 -> } else {
162	1 -> c = crc;
163	1 -> if (n) do {
164	26312 -> c = crc_32_tab[...];
165	26312,1,26311 -> } while (--n);
166	}
167	2 -> crc = c;
168	2 -> return c ^ 0xffffffffL;
169	2 ->}
170
171	In this example, the function was called twice, passing once through
172	each branch of the `if' statement. The body of the `do' loop was
173	executed a total of 26312 times. Note how the `while' statement is
174	annotated. It began execution 26312 times, once for each iteration
175	through the loop. One of those times (the last time) it exited, while
176	it branched back to the beginning of the loop 26311 times.
177
178
179	File: gprof.info, Node: Inaccuracy, Next: How do I?, Prev: Output, Up: Top
180
181	Inaccuracy of `gprof' Output
182	****************************
183
184	* Menu:
185
186	* Sampling Error:: Statistical margins of error
187	* Assumptions:: Estimating children times
188
189
190	File: gprof.info, Node: Sampling Error, Next: Assumptions, Up: Inaccuracy
191
192	Statistical Sampling Error
193	==========================
194
195	The run-time figures that `gprof' gives you are based on a sampling
196	process, so they are subject to statistical inaccuracy. If a function
197	runs only a small amount of time, so that on the average the sampling
198	process ought to catch that function in the act only once, there is a
199	pretty good chance it will actually find that function zero times, or
200	twice.
201
202	By contrast, the number-of-calls and basic-block figures are derived
203	by counting, not sampling. They are completely accurate and will not
204	vary from run to run if your program is deterministic.
205
206	The "sampling period" that is printed at the beginning of the flat
207	profile says how often samples are taken. The rule of thumb is that a
208	run-time figure is accurate if it is considerably bigger than the
209	sampling period.
210
211	The actual amount of error can be predicted. For N samples, the
212	_expected_ error is the square-root of N. For example, if the sampling
213	period is 0.01 seconds and `foo''s run-time is 1 second, N is 100
214	samples (1 second/0.01 seconds), sqrt(N) is 10 samples, so the expected
215	error in `foo''s run-time is 0.1 seconds (10*0.01 seconds), or ten
216	percent of the observed value. Again, if the sampling period is 0.01
217	seconds and `bar''s run-time is 100 seconds, N is 10000 samples,
218	sqrt(N) is 100 samples, so the expected error in `bar''s run-time is 1
219	second, or one percent of the observed value. It is likely to vary
220	this much _on the average_ from one profiling run to the next.
221	(_Sometimes_ it will vary more.)
222
223	This does not mean that a small run-time figure is devoid of
224	information. If the program's _total_ run-time is large, a small
225	run-time for one function does tell you that that function used an
226	insignificant fraction of the whole program's time. Usually this means
227	it is not worth optimizing.
228
229	One way to get more accuracy is to give your program more (but
230	similar) input data so it will take longer. Another way is to combine
231	the data from several runs, using the `-s' option of `gprof'. Here is
232	how:
233
234	1. Run your program once.
235
236	2. Issue the command `mv gmon.out gmon.sum'.
237
238	3. Run your program again, the same as before.
239
240	4. Merge the new data in `gmon.out' into `gmon.sum' with this command:
241
242	gprof -s EXECUTABLE-FILE gmon.out gmon.sum
243
244	5. Repeat the last two steps as often as you wish.
245
246	6. Analyze the cumulative data using this command:
247
248	gprof EXECUTABLE-FILE gmon.sum > OUTPUT-FILE
249
250
251	File: gprof.info, Node: Assumptions, Prev: Sampling Error, Up: Inaccuracy
252
253	Estimating `children' Times
254	===========================
255
256	Some of the figures in the call graph are estimates--for example, the
257	`children' time values and all the the time figures in caller and
258	subroutine lines.
259
260	There is no direct information about these measurements in the
261	profile data itself. Instead, `gprof' estimates them by making an
262	assumption about your program that might or might not be true.
263
264	The assumption made is that the average time spent in each call to
265	any function `foo' is not correlated with who called `foo'. If `foo'
266	used 5 seconds in all, and 2/5 of the calls to `foo' came from `a',
267	then `foo' contributes 2 seconds to `a''s `children' time, by
268	assumption.
269
270	This assumption is usually true enough, but for some programs it is
271	far from true. Suppose that `foo' returns very quickly when its
272	argument is zero; suppose that `a' always passes zero as an argument,
273	while other callers of `foo' pass other arguments. In this program,
274	all the time spent in `foo' is in the calls from callers other than `a'.
275	But `gprof' has no way of knowing this; it will blindly and incorrectly
276	charge 2 seconds of time in `foo' to the children of `a'.
277
278	We hope some day to put more complete data into `gmon.out', so that
279	this assumption is no longer needed, if we can figure out how. For the
280	nonce, the estimated figures are usually more useful than misleading.
281
282
283	File: gprof.info, Node: How do I?, Next: Incompatibilities, Prev: Inaccuracy, Up: Top
284
285	Answers to Common Questions
286	***************************
287
288	How do I find which lines in my program were executed the most times?
289	Compile your program with basic-block counting enabled, run it,
290	then use the following pipeline:
291
292	gprof -l -C OBJFILE \| sort -k 3 -n -r
293
294	This listing will show you the lines in your code executed most
295	often, but not necessarily those that consumed the most time.
296
297	How do I find which lines in my program called a particular function?
298	Use `gprof -l' and lookup the function in the call graph. The
299	callers will be broken down by function and line number.
300
301	How do I analyze a program that runs for less than a second?
302	Try using a shell script like this one:
303
304	for i in `seq 1 100`; do
305	fastprog
306	mv gmon.out gmon.out.$i
307	done
308
309	gprof -s fastprog gmon.out.*
310
311	gprof fastprog gmon.sum
312
313	If your program is completely deterministic, all the call counts
314	will be simple multiples of 100 (i.e. a function called once in
315	each run will appear with a call count of 100).
316
317
318	File: gprof.info, Node: Incompatibilities, Next: Details, Prev: How do I?, Up: Top
319
320	Incompatibilities with Unix `gprof'
321	***********************************
322
323	GNU `gprof' and Berkeley Unix `gprof' use the same data file
324	`gmon.out', and provide essentially the same information. But there
325	are a few differences.
326
327	* GNU `gprof' uses a new, generalized file format with support for
328	basic-block execution counts and non-realtime histograms. A magic
329	cookie and version number allows `gprof' to easily identify new
330	style files. Old BSD-style files can still be read. *Note File
331	Format::.
332
333	* For a recursive function, Unix `gprof' lists the function as a
334	parent and as a child, with a `calls' field that lists the number
335	of recursive calls. GNU `gprof' omits these lines and puts the
336	number of recursive calls in the primary line.
337
338	* When a function is suppressed from the call graph with `-e', GNU
339	`gprof' still lists it as a subroutine of functions that call it.
340
341	* GNU `gprof' accepts the `-k' with its argument in the form
342	`from/to', instead of `from to'.
343
344	* In the annotated source listing, if there are multiple basic
345	blocks on the same line, GNU `gprof' prints all of their counts,
346	separated by commas.
347
348	* The blurbs, field widths, and output formats are different. GNU
349	`gprof' prints blurbs after the tables, so that you can see the
350	tables without skipping the blurbs.
351
352
353	File: gprof.info, Node: Details, Next: GNU Free Documentation License, Prev: Incompatibilities, Up: Top
354
355	Details of Profiling
356	********************
357
358	* Menu:
359
360	* Implementation:: How a program collects profiling information
361	* File Format:: Format of `gmon.out' files
362	* Internals:: `gprof''s internal operation
363	* Debugging:: Using `gprof''s `-d' option
364
365
366	File: gprof.info, Node: Implementation, Next: File Format, Up: Details
367
368	Implementation of Profiling
369	===========================
370
371	Profiling works by changing how every function in your program is
372	compiled so that when it is called, it will stash away some information
373	about where it was called from. From this, the profiler can figure out
374	what function called it, and can count how many times it was called.
375	This change is made by the compiler when your program is compiled with
376	the `-pg' option, which causes every function to call `mcount' (or
377	`_mcount', or `__mcount', depending on the OS and compiler) as one of
378	its first operations.
379
380	The `mcount' routine, included in the profiling library, is
381	responsible for recording in an in-memory call graph table both its
382	parent routine (the child) and its parent's parent. This is typically
383	done by examining the stack frame to find both the address of the
384	child, and the return address in the original parent. Since this is a
385	very machine-dependent operation, `mcount' itself is typically a short
386	assembly-language stub routine that extracts the required information,
387	and then calls `__mcount_internal' (a normal C function) with two
388	arguments - `frompc' and `selfpc'. `__mcount_internal' is responsible
389	for maintaining the in-memory call graph, which records `frompc',
390	`selfpc', and the number of times each of these call arcs was traversed.
391
392	GCC Version 2 provides a magical function
393	(`__builtin_return_address'), which allows a generic `mcount' function
394	to extract the required information from the stack frame. However, on
395	some architectures, most notably the SPARC, using this builtin can be
396	very computationally expensive, and an assembly language version of
397	`mcount' is used for performance reasons.
398
399	Number-of-calls information for library routines is collected by
400	using a special version of the C library. The programs in it are the
401	same as in the usual C library, but they were compiled with `-pg'. If
402	you link your program with `gcc ... -pg', it automatically uses the
403	profiling version of the library.
404
405	Profiling also involves watching your program as it runs, and
406	keeping a histogram of where the program counter happens to be every
407	now and then. Typically the program counter is looked at around 100
408	times per second of run time, but the exact frequency may vary from
409	system to system.
410
411	This is done is one of two ways. Most UNIX-like operating systems
412	provide a `profil()' system call, which registers a memory array with
413	the kernel, along with a scale factor that determines how the program's
414	address space maps into the array. Typical scaling values cause every
415	2 to 8 bytes of address space to map into a single array slot. On
416	every tick of the system clock (assuming the profiled program is
417	running), the value of the program counter is examined and the
418	corresponding slot in the memory array is incremented. Since this is
419	done in the kernel, which had to interrupt the process anyway to handle
420	the clock interrupt, very little additional system overhead is required.
421
422	However, some operating systems, most notably Linux 2.0 (and
423	earlier), do not provide a `profil()' system call. On such a system,
424	arrangements are made for the kernel to periodically deliver a signal
425	to the process (typically via `setitimer()'), which then performs the
426	same operation of examining the program counter and incrementing a slot
427	in the memory array. Since this method requires a signal to be
428	delivered to user space every time a sample is taken, it uses
429	considerably more overhead than kernel-based profiling. Also, due to
430	the added delay required to deliver the signal, this method is less
431	accurate as well.
432
433	A special startup routine allocates memory for the histogram and
434	either calls `profil()' or sets up a clock signal handler. This
435	routine (`monstartup') can be invoked in several ways. On Linux
436	systems, a special profiling startup file `gcrt0.o', which invokes
437	`monstartup' before `main', is used instead of the default `crt0.o'.
438	Use of this special startup file is one of the effects of using `gcc
439	... -pg' to link. On SPARC systems, no special startup files are used.
440	Rather, the `mcount' routine, when it is invoked for the first time
441	(typically when `main' is called), calls `monstartup'.
442
443	If the compiler's `-a' option was used, basic-block counting is also
444	enabled. Each object file is then compiled with a static array of
445	counts, initially zero. In the executable code, every time a new
446	basic-block begins (i.e. when an `if' statement appears), an extra
447	instruction is inserted to increment the corresponding count in the
448	array. At compile time, a paired array was constructed that recorded
449	the starting address of each basic-block. Taken together, the two
450	arrays record the starting address of every basic-block, along with the
451	number of times it was executed.
452
453	The profiling library also includes a function (`mcleanup') which is
454	typically registered using `atexit()' to be called as the program
455	exits, and is responsible for writing the file `gmon.out'. Profiling
456	is turned off, various headers are output, and the histogram is
457	written, followed by the call-graph arcs and the basic-block counts.
458
459	The output from `gprof' gives no indication of parts of your program
460	that are limited by I/O or swapping bandwidth. This is because samples
461	of the program counter are taken at fixed intervals of the program's
462	run time. Therefore, the time measurements in `gprof' output say
463	nothing about time that your program was not running. For example, a
464	part of the program that creates so much data that it cannot all fit in
465	physical memory at once may run very slowly due to thrashing, but
466	`gprof' will say it uses little time. On the other hand, sampling by
467	run time has the advantage that the amount of load due to other users
468	won't directly affect the output you get.
469
470
471	File: gprof.info, Node: File Format, Next: Internals, Prev: Implementation, Up: Details
472
473	Profiling Data File Format
474	==========================
475
476	The old BSD-derived file format used for profile data does not
477	contain a magic cookie that allows to check whether a data file really
478	is a `gprof' file. Furthermore, it does not provide a version number,
479	thus rendering changes to the file format almost impossible. GNU
480	`gprof' uses a new file format that provides these features. For
481	backward compatibility, GNU `gprof' continues to support the old
482	BSD-derived format, but not all features are supported with it. For
483	example, basic-block execution counts cannot be accommodated by the old
484	file format.
485
486	The new file format is defined in header file `gmon_out.h'. It
487	consists of a header containing the magic cookie and a version number,
488	as well as some spare bytes available for future extensions. All data
489	in a profile data file is in the native format of the target for which
490	the profile was collected. GNU `gprof' adapts automatically to the
491	byte-order in use.
492
493	In the new file format, the header is followed by a sequence of
494	records. Currently, there are three different record types: histogram
495	records, call-graph arc records, and basic-block execution count
496	records. Each file can contain any number of each record type. When
497	reading a file, GNU `gprof' will ensure records of the same type are
498	compatible with each other and compute the union of all records. For
499	example, for basic-block execution counts, the union is simply the sum
500	of all execution counts for each basic-block.
501
502	Histogram Records
503	-----------------
504
505	Histogram records consist of a header that is followed by an array of
506	bins. The header contains the text-segment range that the histogram
507	spans, the size of the histogram in bytes (unlike in the old BSD
508	format, this does not include the size of the header), the rate of the
509	profiling clock, and the physical dimension that the bin counts
510	represent after being scaled by the profiling clock rate. The physical
511	dimension is specified in two parts: a long name of up to 15 characters
512	and a single character abbreviation. For example, a histogram
513	representing real-time would specify the long name as "seconds" and the
514	abbreviation as "s". This feature is useful for architectures that
515	support performance monitor hardware (which, fortunately, is becoming
516	increasingly common). For example, under DEC OSF/1, the "uprofile"
517	command can be used to produce a histogram of, say, instruction cache
518	misses. In this case, the dimension in the histogram header could be
519	set to "i-cache misses" and the abbreviation could be set to "1"
520	(because it is simply a count, not a physical dimension). Also, the
521	profiling rate would have to be set to 1 in this case.
522
523	Histogram bins are 16-bit numbers and each bin represent an equal
524	amount of text-space. For example, if the text-segment is one thousand
525	bytes long and if there are ten bins in the histogram, each bin
526	represents one hundred bytes.
527
528	Call-Graph Records
529	------------------
530
531	Call-graph records have a format that is identical to the one used in
532	the BSD-derived file format. It consists of an arc in the call graph
533	and a count indicating the number of times the arc was traversed during
534	program execution. Arcs are specified by a pair of addresses: the
535	first must be within caller's function and the second must be within
536	the callee's function. When performing profiling at the function
537	level, these addresses can point anywhere within the respective
538	function. However, when profiling at the line-level, it is better if
539	the addresses are as close to the call-site/entry-point as possible.
540	This will ensure that the line-level call-graph is able to identify
541	exactly which line of source code performed calls to a function.
542
543	Basic-Block Execution Count Records
544	-----------------------------------
545
546	Basic-block execution count records consist of a header followed by a
547	sequence of address/count pairs. The header simply specifies the
548	length of the sequence. In an address/count pair, the address
549	identifies a basic-block and the count specifies the number of times
550	that basic-block was executed. Any address within the basic-address can
551	be used.
552
553
554	File: gprof.info, Node: Internals, Next: Debugging, Prev: File Format, Up: Details
555
556	`gprof''s Internal Operation
557	============================
558
559	Like most programs, `gprof' begins by processing its options.
560	During this stage, it may building its symspec list
561	(`sym_ids.c:sym_id_add'), if options are specified which use symspecs.
562	`gprof' maintains a single linked list of symspecs, which will
563	eventually get turned into 12 symbol tables, organized into six
564	include/exclude pairs - one pair each for the flat profile
565	(INCL_FLAT/EXCL_FLAT), the call graph arcs (INCL_ARCS/EXCL_ARCS),
566	printing in the call graph (INCL_GRAPH/EXCL_GRAPH), timing propagation
567	in the call graph (INCL_TIME/EXCL_TIME), the annotated source listing
568	(INCL_ANNO/EXCL_ANNO), and the execution count listing
569	(INCL_EXEC/EXCL_EXEC).
570
571	After option processing, `gprof' finishes building the symspec list
572	by adding all the symspecs in `default_excluded_list' to the exclude
573	lists EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is
574	specified, EXCL_FLAT as well. These default excludes are not added to
575	EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
576
577	Next, the BFD library is called to open the object file, verify that
578	it is an object file, and read its symbol table (`core.c:core_init'),
579	using `bfd_canonicalize_symtab' after mallocing an appropriately sized
580	array of symbols. At this point, function mappings are read (if the
581	`--file-ordering' option has been specified), and the core text space
582	is read into memory (if the `-c' option was given).
583
584	`gprof''s own symbol table, an array of Sym structures, is now built.
585	This is done in one of two ways, by one of two routines, depending on
586	whether line-by-line profiling (`-l' option) has been enabled. For
587	normal profiling, the BFD canonical symbol table is scanned. For
588	line-by-line profiling, every text space address is examined, and a new
589	symbol table entry gets created every time the line number changes. In
590	either case, two passes are made through the symbol table - one to
591	count the size of the symbol table required, and the other to actually
592	read the symbols. In between the two passes, a single array of type
593	`Sym' is created of the appropriate length. Finally,
594	`symtab.c:symtab_finalize' is called to sort the symbol table and
595	remove duplicate entries (entries with the same memory address).
596
597	The symbol table must be a contiguous array for two reasons. First,
598	the `qsort' library function (which sorts an array) will be used to
599	sort the symbol table. Also, the symbol lookup routine
600	(`symtab.c:sym_lookup'), which finds symbols based on memory address,
601	uses a binary search algorithm which requires the symbol table to be a
602	sorted array. Function symbols are indicated with an `is_func' flag.
603	Line number symbols have no special flags set. Additionally, a symbol
604	can have an `is_static' flag to indicate that it is a local symbol.
605
606	With the symbol table read, the symspecs can now be translated into
607	Syms (`sym_ids.c:sym_id_parse'). Remember that a single symspec can
608	match multiple symbols. An array of symbol tables (`syms') is created,
609	each entry of which is a symbol table of Syms to be included or
610	excluded from a particular listing. The master symbol table and the
611	symspecs are examined by nested loops, and every symbol that matches a
612	symspec is inserted into the appropriate syms table. This is done
613	twice, once to count the size of each required symbol table, and again
614	to build the tables, which have been malloced between passes. From now
615	on, to determine whether a symbol is on an include or exclude symspec
616	list, `gprof' simply uses its standard symbol lookup routine on the
617	appropriate table in the `syms' array.
618
619	Now the profile data file(s) themselves are read
620	(`gmon_io.c:gmon_out_read'), first by checking for a new-style
621	`gmon.out' header, then assuming this is an old-style BSD `gmon.out' if
622	the magic number test failed.
623
624	New-style histogram records are read by `hist.c:hist_read_rec'. For
625	the first histogram record, allocate a memory array to hold all the
626	bins, and read them in. When multiple profile data files (or files
627	with multiple histogram records) are read, the starting address, ending
628	address, number of bins and sampling rate must match between the
629	various histograms, or a fatal error will result. If everything
630	matches, just sum the additional histograms into the existing in-memory
631	array.
632
633	As each call graph record is read (`call_graph.c:cg_read_rec'), the
634	parent and child addresses are matched to symbol table entries, and a
635	call graph arc is created by `cg_arcs.c:arc_add', unless the arc fails
636	a symspec check against INCL_ARCS/EXCL_ARCS. As each arc is added, a
637	linked list is maintained of the parent's child arcs, and of the child's
638	parent arcs. Both the child's call count and the arc's call count are
639	incremented by the record's call count.
640
641	Basic-block records are read (`basic_blocks.c:bb_read_rec'), but
642	only if line-by-line profiling has been selected. Each basic-block
643	address is matched to a corresponding line symbol in the symbol table,
644	and an entry made in the symbol's bb_addr and bb_calls arrays. Again,
645	if multiple basic-block records are present for the same address, the
646	call counts are cumulative.
647
648	A gmon.sum file is dumped, if requested (`gmon_io.c:gmon_out_write').
649
650	If histograms were present in the data files, assign them to symbols
651	(`hist.c:hist_assign_samples') by iterating over all the sample bins
652	and assigning them to symbols. Since the symbol table is sorted in
653	order of ascending memory addresses, we can simple follow along in the
654	symbol table as we make our pass over the sample bins. This step
655	includes a symspec check against INCL_FLAT/EXCL_FLAT. Depending on the
656	histogram scale factor, a sample bin may span multiple symbols, in
657	which case a fraction of the sample count is allocated to each symbol,
658	proportional to the degree of overlap. This effect is rare for normal
659	profiling, but overlaps are more common during line-by-line profiling,
660	and can cause each of two adjacent lines to be credited with half a
661	hit, for example.
662
663	If call graph data is present, `cg_arcs.c:cg_assemble' is called.
664	First, if `-c' was specified, a machine-dependent routine (`find_call')
665	scans through each symbol's machine code, looking for subroutine call
666	instructions, and adding them to the call graph with a zero call count.
667	A topological sort is performed by depth-first numbering all the
668	symbols (`cg_dfn.c:cg_dfn'), so that children are always numbered less
669	than their parents, then making a array of pointers into the symbol
670	table and sorting it into numerical order, which is reverse topological
671	order (children appear before parents). Cycles are also detected at
672	this point, all members of which are assigned the same topological
673	number. Two passes are now made through this sorted array of symbol
674	pointers. The first pass, from end to beginning (parents to children),
675	computes the fraction of child time to propagate to each parent and a
676	print flag. The print flag reflects symspec handling of
677	INCL_GRAPH/EXCL_GRAPH, with a parent's include or exclude (print or no
678	print) property being propagated to its children, unless they
679	themselves explicitly appear in INCL_GRAPH or EXCL_GRAPH. A second
680	pass, from beginning to end (children to parents) actually propagates
681	the timings along the call graph, subject to a check against
682	INCL_TIME/EXCL_TIME. With the print flag, fractions, and timings now
683	stored in the symbol structures, the topological sort array is now
684	discarded, and a new array of pointers is assembled, this time sorted
685	by propagated time.
686
687	Finally, print the various outputs the user requested, which is now
688	fairly straightforward. The call graph (`cg_print.c:cg_print') and
689	flat profile (`hist.c:hist_print') are regurgitations of values already
690	computed. The annotated source listing
691	(`basic_blocks.c:print_annotated_source') uses basic-block information,
692	if present, to label each line of code with call counts, otherwise only
693	the function call counts are presented.
694
695	The function ordering code is marginally well documented in the
696	source code itself (`cg_print.c'). Basically, the functions with the
697	most use and the most parents are placed first, followed by other
698	functions with the most use, followed by lower use functions, followed
699	by unused functions at the end.
700
701
702	File: gprof.info, Node: Debugging, Prev: Internals, Up: Details
703
704	Debugging `gprof'
705	-----------------
706
707	If `gprof' was compiled with debugging enabled, the `-d' option
708	triggers debugging output (to stdout) which can be helpful in
709	understanding its operation. The debugging number specified is
710	interpreted as a sum of the following options:
711
712	2 - Topological sort
713	Monitor depth-first numbering of symbols during call graph analysis
714
715	4 - Cycles
716	Shows symbols as they are identified as cycle heads
717
718	16 - Tallying
719	As the call graph arcs are read, show each arc and how the total
720	calls to each function are tallied
721
722	32 - Call graph arc sorting
723	Details sorting individual parents/children within each call graph
724	entry
725
726	64 - Reading histogram and call graph records
727	Shows address ranges of histograms as they are read, and each call
728	graph arc
729
730	128 - Symbol table
731	Reading, classifying, and sorting the symbol table from the object
732	file. For line-by-line profiling (`-l' option), also shows line
733	numbers being assigned to memory addresses.
734
735	256 - Static call graph
736	Trace operation of `-c' option
737
738	512 - Symbol table and arc table lookups
739	Detail operation of lookup routines
740
741	1024 - Call graph propagation
742	Shows how function times are propagated along the call graph
743
744	2048 - Basic-blocks
745	Shows basic-block records as they are read from profile data (only
746	meaningful with `-l' option)
747
748	4096 - Symspecs
749	Shows symspec-to-symbol pattern matching operation
750
751	8192 - Annotate source
752	Tracks operation of `-A' option
753

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format