source: trunk/src/binutils/gprof/gprof.info-2@ 2039

Last change on this file since 2039 was 610, checked in by bird, 22 years ago

This commit was generated by cvs2svn to compensate for changes in r609,
which included commits to RCS files with non-trunk default branches.

  • Property cvs2svn:cvs-rev set to 1.1.1.2
  • Property svn:eol-style set to native
  • Property svn:executable set to *
File size: 34.0 KB
Line 
1This is gprof.info, produced by makeinfo version 4.3 from gprof.texi.
2
3START-INFO-DIR-ENTRY
4* gprof: (gprof). Profiling your program's execution
5END-INFO-DIR-ENTRY
6
7 This file documents the gprof profiler of the GNU system.
8
9 Copyright (C) 1988, 92, 97, 98, 99, 2000, 2001 Free Software
10Foundation, Inc.
11
12 Permission is granted to copy, distribute and/or modify this document
13under the terms of the GNU Free Documentation License, Version 1.1 or
14any later version published by the Free Software Foundation; with no
15Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
16Texts. A copy of the license is included in the section entitled "GNU
17Free Documentation License".
18
19
20File: gprof.info, Node: Line-by-line, Next: Annotated Source, Prev: Call Graph, Up: Output
21
22Line-by-line Profiling
23======================
24
25 `gprof''s `-l' option causes the program to perform "line-by-line"
26profiling. In this mode, histogram samples are assigned not to
27functions, but to individual lines of source code. The program usually
28must be compiled with a `-g' option, in addition to `-pg', in order to
29generate debugging symbols for tracking source code lines.
30
31 The flat profile is the most useful output table in line-by-line
32mode. The call graph isn't as useful as normal, since the current
33version of `gprof' does not propagate call graph arcs from source code
34lines to the enclosing function. The call graph does, however, show
35each line of code that called each function, along with a count.
36
37 Here is a section of `gprof''s output, without line-by-line
38profiling. Note that `ct_init' accounted for four histogram hits, and
3913327 calls to `init_block'.
40
41 Flat profile:
42
43 Each sample counts as 0.01 seconds.
44 % cumulative self self total
45 time seconds seconds calls us/call us/call name
46 30.77 0.13 0.04 6335 6.31 6.31 ct_init
47
48
49 Call graph (explanation follows)
50
51
52 granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
53
54 index % time self children called name
55
56 0.00 0.00 1/13496 name_too_long
57 0.00 0.00 40/13496 deflate
58 0.00 0.00 128/13496 deflate_fast
59 0.00 0.00 13327/13496 ct_init
60 [7] 0.0 0.00 0.00 13496 init_block
61
62 Now let's look at some of `gprof''s output from the same program run,
63this time with line-by-line profiling enabled. Note that `ct_init''s
64four histogram hits are broken down into four lines of source code -
65one hit occurred on each of lines 349, 351, 382 and 385. In the call
66graph, note how `ct_init''s 13327 calls to `init_block' are broken down
67into one call from line 396, 3071 calls from line 384, 3730 calls from
68line 385, and 6525 calls from 387.
69
70 Flat profile:
71
72 Each sample counts as 0.01 seconds.
73 % cumulative self
74 time seconds seconds calls name
75 7.69 0.10 0.01 ct_init (trees.c:349)
76 7.69 0.11 0.01 ct_init (trees.c:351)
77 7.69 0.12 0.01 ct_init (trees.c:382)
78 7.69 0.13 0.01 ct_init (trees.c:385)
79
80
81 Call graph (explanation follows)
82
83
84 granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
85
86 % time self children called name
87
88 0.00 0.00 1/13496 name_too_long (gzip.c:1440)
89 0.00 0.00 1/13496 deflate (deflate.c:763)
90 0.00 0.00 1/13496 ct_init (trees.c:396)
91 0.00 0.00 2/13496 deflate (deflate.c:727)
92 0.00 0.00 4/13496 deflate (deflate.c:686)
93 0.00 0.00 5/13496 deflate (deflate.c:675)
94 0.00 0.00 12/13496 deflate (deflate.c:679)
95 0.00 0.00 16/13496 deflate (deflate.c:730)
96 0.00 0.00 128/13496 deflate_fast (deflate.c:654)
97 0.00 0.00 3071/13496 ct_init (trees.c:384)
98 0.00 0.00 3730/13496 ct_init (trees.c:385)
99 0.00 0.00 6525/13496 ct_init (trees.c:387)
100 [6] 0.0 0.00 0.00 13496 init_block (trees.c:408)
101
102
103File: gprof.info, Node: Annotated Source, Prev: Line-by-line, Up: Output
104
105The Annotated Source Listing
106============================
107
108 `gprof''s `-A' option triggers an annotated source listing, which
109lists the program's source code, each function labeled with the number
110of times it was called. You may also need to specify the `-I' option,
111if `gprof' can't find the source code files.
112
113 Compiling with `gcc ... -g -pg -a' augments your program with
114basic-block counting code, in addition to function counting code. This
115enables `gprof' to determine how many times each line of code was
116executed. For example, consider the following function, taken from
117gzip, with line numbers added:
118
119 1 ulg updcrc(s, n)
120 2 uch *s;
121 3 unsigned n;
122 4 {
123 5 register ulg c;
124 6
125 7 static ulg crc = (ulg)0xffffffffL;
126 8
127 9 if (s == NULL) {
128 10 c = 0xffffffffL;
129 11 } else {
130 12 c = crc;
131 13 if (n) do {
132 14 c = crc_32_tab[...];
133 15 } while (--n);
134 16 }
135 17 crc = c;
136 18 return c ^ 0xffffffffL;
137 19 }
138
139 `updcrc' has at least five basic-blocks. One is the function
140itself. The `if' statement on line 9 generates two more basic-blocks,
141one for each branch of the `if'. A fourth basic-block results from the
142`if' on line 13, and the contents of the `do' loop form the fifth
143basic-block. The compiler may also generate additional basic-blocks to
144handle various special cases.
145
146 A program augmented for basic-block counting can be analyzed with
147`gprof -l -A'. I also suggest use of the `-x' option, which ensures
148that each line of code is labeled at least once. Here is `updcrc''s
149annotated source listing for a sample `gzip' run:
150
151 ulg updcrc(s, n)
152 uch *s;
153 unsigned n;
154 2 ->{
155 register ulg c;
156
157 static ulg crc = (ulg)0xffffffffL;
158
159 2 -> if (s == NULL) {
160 1 -> c = 0xffffffffL;
161 1 -> } else {
162 1 -> c = crc;
163 1 -> if (n) do {
164 26312 -> c = crc_32_tab[...];
165 26312,1,26311 -> } while (--n);
166 }
167 2 -> crc = c;
168 2 -> return c ^ 0xffffffffL;
169 2 ->}
170
171 In this example, the function was called twice, passing once through
172each branch of the `if' statement. The body of the `do' loop was
173executed a total of 26312 times. Note how the `while' statement is
174annotated. It began execution 26312 times, once for each iteration
175through the loop. One of those times (the last time) it exited, while
176it branched back to the beginning of the loop 26311 times.
177
178
179File: gprof.info, Node: Inaccuracy, Next: How do I?, Prev: Output, Up: Top
180
181Inaccuracy of `gprof' Output
182****************************
183
184* Menu:
185
186* Sampling Error:: Statistical margins of error
187* Assumptions:: Estimating children times
188
189
190File: gprof.info, Node: Sampling Error, Next: Assumptions, Up: Inaccuracy
191
192Statistical Sampling Error
193==========================
194
195 The run-time figures that `gprof' gives you are based on a sampling
196process, so they are subject to statistical inaccuracy. If a function
197runs only a small amount of time, so that on the average the sampling
198process ought to catch that function in the act only once, there is a
199pretty good chance it will actually find that function zero times, or
200twice.
201
202 By contrast, the number-of-calls and basic-block figures are derived
203by counting, not sampling. They are completely accurate and will not
204vary from run to run if your program is deterministic.
205
206 The "sampling period" that is printed at the beginning of the flat
207profile says how often samples are taken. The rule of thumb is that a
208run-time figure is accurate if it is considerably bigger than the
209sampling period.
210
211 The actual amount of error can be predicted. For N samples, the
212_expected_ error is the square-root of N. For example, if the sampling
213period is 0.01 seconds and `foo''s run-time is 1 second, N is 100
214samples (1 second/0.01 seconds), sqrt(N) is 10 samples, so the expected
215error in `foo''s run-time is 0.1 seconds (10*0.01 seconds), or ten
216percent of the observed value. Again, if the sampling period is 0.01
217seconds and `bar''s run-time is 100 seconds, N is 10000 samples,
218sqrt(N) is 100 samples, so the expected error in `bar''s run-time is 1
219second, or one percent of the observed value. It is likely to vary
220this much _on the average_ from one profiling run to the next.
221(_Sometimes_ it will vary more.)
222
223 This does not mean that a small run-time figure is devoid of
224information. If the program's _total_ run-time is large, a small
225run-time for one function does tell you that that function used an
226insignificant fraction of the whole program's time. Usually this means
227it is not worth optimizing.
228
229 One way to get more accuracy is to give your program more (but
230similar) input data so it will take longer. Another way is to combine
231the data from several runs, using the `-s' option of `gprof'. Here is
232how:
233
234 1. Run your program once.
235
236 2. Issue the command `mv gmon.out gmon.sum'.
237
238 3. Run your program again, the same as before.
239
240 4. Merge the new data in `gmon.out' into `gmon.sum' with this command:
241
242 gprof -s EXECUTABLE-FILE gmon.out gmon.sum
243
244 5. Repeat the last two steps as often as you wish.
245
246 6. Analyze the cumulative data using this command:
247
248 gprof EXECUTABLE-FILE gmon.sum > OUTPUT-FILE
249
250
251File: gprof.info, Node: Assumptions, Prev: Sampling Error, Up: Inaccuracy
252
253Estimating `children' Times
254===========================
255
256 Some of the figures in the call graph are estimates--for example, the
257`children' time values and all the the time figures in caller and
258subroutine lines.
259
260 There is no direct information about these measurements in the
261profile data itself. Instead, `gprof' estimates them by making an
262assumption about your program that might or might not be true.
263
264 The assumption made is that the average time spent in each call to
265any function `foo' is not correlated with who called `foo'. If `foo'
266used 5 seconds in all, and 2/5 of the calls to `foo' came from `a',
267then `foo' contributes 2 seconds to `a''s `children' time, by
268assumption.
269
270 This assumption is usually true enough, but for some programs it is
271far from true. Suppose that `foo' returns very quickly when its
272argument is zero; suppose that `a' always passes zero as an argument,
273while other callers of `foo' pass other arguments. In this program,
274all the time spent in `foo' is in the calls from callers other than `a'.
275But `gprof' has no way of knowing this; it will blindly and incorrectly
276charge 2 seconds of time in `foo' to the children of `a'.
277
278 We hope some day to put more complete data into `gmon.out', so that
279this assumption is no longer needed, if we can figure out how. For the
280nonce, the estimated figures are usually more useful than misleading.
281
282
283File: gprof.info, Node: How do I?, Next: Incompatibilities, Prev: Inaccuracy, Up: Top
284
285Answers to Common Questions
286***************************
287
288How do I find which lines in my program were executed the most times?
289 Compile your program with basic-block counting enabled, run it,
290 then use the following pipeline:
291
292 gprof -l -C OBJFILE | sort -k 3 -n -r
293
294 This listing will show you the lines in your code executed most
295 often, but not necessarily those that consumed the most time.
296
297How do I find which lines in my program called a particular function?
298 Use `gprof -l' and lookup the function in the call graph. The
299 callers will be broken down by function and line number.
300
301How do I analyze a program that runs for less than a second?
302 Try using a shell script like this one:
303
304 for i in `seq 1 100`; do
305 fastprog
306 mv gmon.out gmon.out.$i
307 done
308
309 gprof -s fastprog gmon.out.*
310
311 gprof fastprog gmon.sum
312
313 If your program is completely deterministic, all the call counts
314 will be simple multiples of 100 (i.e. a function called once in
315 each run will appear with a call count of 100).
316
317
318File: gprof.info, Node: Incompatibilities, Next: Details, Prev: How do I?, Up: Top
319
320Incompatibilities with Unix `gprof'
321***********************************
322
323 GNU `gprof' and Berkeley Unix `gprof' use the same data file
324`gmon.out', and provide essentially the same information. But there
325are a few differences.
326
327 * GNU `gprof' uses a new, generalized file format with support for
328 basic-block execution counts and non-realtime histograms. A magic
329 cookie and version number allows `gprof' to easily identify new
330 style files. Old BSD-style files can still be read. *Note File
331 Format::.
332
333 * For a recursive function, Unix `gprof' lists the function as a
334 parent and as a child, with a `calls' field that lists the number
335 of recursive calls. GNU `gprof' omits these lines and puts the
336 number of recursive calls in the primary line.
337
338 * When a function is suppressed from the call graph with `-e', GNU
339 `gprof' still lists it as a subroutine of functions that call it.
340
341 * GNU `gprof' accepts the `-k' with its argument in the form
342 `from/to', instead of `from to'.
343
344 * In the annotated source listing, if there are multiple basic
345 blocks on the same line, GNU `gprof' prints all of their counts,
346 separated by commas.
347
348 * The blurbs, field widths, and output formats are different. GNU
349 `gprof' prints blurbs after the tables, so that you can see the
350 tables without skipping the blurbs.
351
352
353File: gprof.info, Node: Details, Next: GNU Free Documentation License, Prev: Incompatibilities, Up: Top
354
355Details of Profiling
356********************
357
358* Menu:
359
360* Implementation:: How a program collects profiling information
361* File Format:: Format of `gmon.out' files
362* Internals:: `gprof''s internal operation
363* Debugging:: Using `gprof''s `-d' option
364
365
366File: gprof.info, Node: Implementation, Next: File Format, Up: Details
367
368Implementation of Profiling
369===========================
370
371 Profiling works by changing how every function in your program is
372compiled so that when it is called, it will stash away some information
373about where it was called from. From this, the profiler can figure out
374what function called it, and can count how many times it was called.
375This change is made by the compiler when your program is compiled with
376the `-pg' option, which causes every function to call `mcount' (or
377`_mcount', or `__mcount', depending on the OS and compiler) as one of
378its first operations.
379
380 The `mcount' routine, included in the profiling library, is
381responsible for recording in an in-memory call graph table both its
382parent routine (the child) and its parent's parent. This is typically
383done by examining the stack frame to find both the address of the
384child, and the return address in the original parent. Since this is a
385very machine-dependent operation, `mcount' itself is typically a short
386assembly-language stub routine that extracts the required information,
387and then calls `__mcount_internal' (a normal C function) with two
388arguments - `frompc' and `selfpc'. `__mcount_internal' is responsible
389for maintaining the in-memory call graph, which records `frompc',
390`selfpc', and the number of times each of these call arcs was traversed.
391
392 GCC Version 2 provides a magical function
393(`__builtin_return_address'), which allows a generic `mcount' function
394to extract the required information from the stack frame. However, on
395some architectures, most notably the SPARC, using this builtin can be
396very computationally expensive, and an assembly language version of
397`mcount' is used for performance reasons.
398
399 Number-of-calls information for library routines is collected by
400using a special version of the C library. The programs in it are the
401same as in the usual C library, but they were compiled with `-pg'. If
402you link your program with `gcc ... -pg', it automatically uses the
403profiling version of the library.
404
405 Profiling also involves watching your program as it runs, and
406keeping a histogram of where the program counter happens to be every
407now and then. Typically the program counter is looked at around 100
408times per second of run time, but the exact frequency may vary from
409system to system.
410
411 This is done is one of two ways. Most UNIX-like operating systems
412provide a `profil()' system call, which registers a memory array with
413the kernel, along with a scale factor that determines how the program's
414address space maps into the array. Typical scaling values cause every
4152 to 8 bytes of address space to map into a single array slot. On
416every tick of the system clock (assuming the profiled program is
417running), the value of the program counter is examined and the
418corresponding slot in the memory array is incremented. Since this is
419done in the kernel, which had to interrupt the process anyway to handle
420the clock interrupt, very little additional system overhead is required.
421
422 However, some operating systems, most notably Linux 2.0 (and
423earlier), do not provide a `profil()' system call. On such a system,
424arrangements are made for the kernel to periodically deliver a signal
425to the process (typically via `setitimer()'), which then performs the
426same operation of examining the program counter and incrementing a slot
427in the memory array. Since this method requires a signal to be
428delivered to user space every time a sample is taken, it uses
429considerably more overhead than kernel-based profiling. Also, due to
430the added delay required to deliver the signal, this method is less
431accurate as well.
432
433 A special startup routine allocates memory for the histogram and
434either calls `profil()' or sets up a clock signal handler. This
435routine (`monstartup') can be invoked in several ways. On Linux
436systems, a special profiling startup file `gcrt0.o', which invokes
437`monstartup' before `main', is used instead of the default `crt0.o'.
438Use of this special startup file is one of the effects of using `gcc
439... -pg' to link. On SPARC systems, no special startup files are used.
440Rather, the `mcount' routine, when it is invoked for the first time
441(typically when `main' is called), calls `monstartup'.
442
443 If the compiler's `-a' option was used, basic-block counting is also
444enabled. Each object file is then compiled with a static array of
445counts, initially zero. In the executable code, every time a new
446basic-block begins (i.e. when an `if' statement appears), an extra
447instruction is inserted to increment the corresponding count in the
448array. At compile time, a paired array was constructed that recorded
449the starting address of each basic-block. Taken together, the two
450arrays record the starting address of every basic-block, along with the
451number of times it was executed.
452
453 The profiling library also includes a function (`mcleanup') which is
454typically registered using `atexit()' to be called as the program
455exits, and is responsible for writing the file `gmon.out'. Profiling
456is turned off, various headers are output, and the histogram is
457written, followed by the call-graph arcs and the basic-block counts.
458
459 The output from `gprof' gives no indication of parts of your program
460that are limited by I/O or swapping bandwidth. This is because samples
461of the program counter are taken at fixed intervals of the program's
462run time. Therefore, the time measurements in `gprof' output say
463nothing about time that your program was not running. For example, a
464part of the program that creates so much data that it cannot all fit in
465physical memory at once may run very slowly due to thrashing, but
466`gprof' will say it uses little time. On the other hand, sampling by
467run time has the advantage that the amount of load due to other users
468won't directly affect the output you get.
469
470
471File: gprof.info, Node: File Format, Next: Internals, Prev: Implementation, Up: Details
472
473Profiling Data File Format
474==========================
475
476 The old BSD-derived file format used for profile data does not
477contain a magic cookie that allows to check whether a data file really
478is a `gprof' file. Furthermore, it does not provide a version number,
479thus rendering changes to the file format almost impossible. GNU
480`gprof' uses a new file format that provides these features. For
481backward compatibility, GNU `gprof' continues to support the old
482BSD-derived format, but not all features are supported with it. For
483example, basic-block execution counts cannot be accommodated by the old
484file format.
485
486 The new file format is defined in header file `gmon_out.h'. It
487consists of a header containing the magic cookie and a version number,
488as well as some spare bytes available for future extensions. All data
489in a profile data file is in the native format of the target for which
490the profile was collected. GNU `gprof' adapts automatically to the
491byte-order in use.
492
493 In the new file format, the header is followed by a sequence of
494records. Currently, there are three different record types: histogram
495records, call-graph arc records, and basic-block execution count
496records. Each file can contain any number of each record type. When
497reading a file, GNU `gprof' will ensure records of the same type are
498compatible with each other and compute the union of all records. For
499example, for basic-block execution counts, the union is simply the sum
500of all execution counts for each basic-block.
501
502Histogram Records
503-----------------
504
505 Histogram records consist of a header that is followed by an array of
506bins. The header contains the text-segment range that the histogram
507spans, the size of the histogram in bytes (unlike in the old BSD
508format, this does not include the size of the header), the rate of the
509profiling clock, and the physical dimension that the bin counts
510represent after being scaled by the profiling clock rate. The physical
511dimension is specified in two parts: a long name of up to 15 characters
512and a single character abbreviation. For example, a histogram
513representing real-time would specify the long name as "seconds" and the
514abbreviation as "s". This feature is useful for architectures that
515support performance monitor hardware (which, fortunately, is becoming
516increasingly common). For example, under DEC OSF/1, the "uprofile"
517command can be used to produce a histogram of, say, instruction cache
518misses. In this case, the dimension in the histogram header could be
519set to "i-cache misses" and the abbreviation could be set to "1"
520(because it is simply a count, not a physical dimension). Also, the
521profiling rate would have to be set to 1 in this case.
522
523 Histogram bins are 16-bit numbers and each bin represent an equal
524amount of text-space. For example, if the text-segment is one thousand
525bytes long and if there are ten bins in the histogram, each bin
526represents one hundred bytes.
527
528Call-Graph Records
529------------------
530
531 Call-graph records have a format that is identical to the one used in
532the BSD-derived file format. It consists of an arc in the call graph
533and a count indicating the number of times the arc was traversed during
534program execution. Arcs are specified by a pair of addresses: the
535first must be within caller's function and the second must be within
536the callee's function. When performing profiling at the function
537level, these addresses can point anywhere within the respective
538function. However, when profiling at the line-level, it is better if
539the addresses are as close to the call-site/entry-point as possible.
540This will ensure that the line-level call-graph is able to identify
541exactly which line of source code performed calls to a function.
542
543Basic-Block Execution Count Records
544-----------------------------------
545
546 Basic-block execution count records consist of a header followed by a
547sequence of address/count pairs. The header simply specifies the
548length of the sequence. In an address/count pair, the address
549identifies a basic-block and the count specifies the number of times
550that basic-block was executed. Any address within the basic-address can
551be used.
552
553
554File: gprof.info, Node: Internals, Next: Debugging, Prev: File Format, Up: Details
555
556`gprof''s Internal Operation
557============================
558
559 Like most programs, `gprof' begins by processing its options.
560During this stage, it may building its symspec list
561(`sym_ids.c:sym_id_add'), if options are specified which use symspecs.
562`gprof' maintains a single linked list of symspecs, which will
563eventually get turned into 12 symbol tables, organized into six
564include/exclude pairs - one pair each for the flat profile
565(INCL_FLAT/EXCL_FLAT), the call graph arcs (INCL_ARCS/EXCL_ARCS),
566printing in the call graph (INCL_GRAPH/EXCL_GRAPH), timing propagation
567in the call graph (INCL_TIME/EXCL_TIME), the annotated source listing
568(INCL_ANNO/EXCL_ANNO), and the execution count listing
569(INCL_EXEC/EXCL_EXEC).
570
571 After option processing, `gprof' finishes building the symspec list
572by adding all the symspecs in `default_excluded_list' to the exclude
573lists EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is
574specified, EXCL_FLAT as well. These default excludes are not added to
575EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
576
577 Next, the BFD library is called to open the object file, verify that
578it is an object file, and read its symbol table (`core.c:core_init'),
579using `bfd_canonicalize_symtab' after mallocing an appropriately sized
580array of symbols. At this point, function mappings are read (if the
581`--file-ordering' option has been specified), and the core text space
582is read into memory (if the `-c' option was given).
583
584 `gprof''s own symbol table, an array of Sym structures, is now built.
585This is done in one of two ways, by one of two routines, depending on
586whether line-by-line profiling (`-l' option) has been enabled. For
587normal profiling, the BFD canonical symbol table is scanned. For
588line-by-line profiling, every text space address is examined, and a new
589symbol table entry gets created every time the line number changes. In
590either case, two passes are made through the symbol table - one to
591count the size of the symbol table required, and the other to actually
592read the symbols. In between the two passes, a single array of type
593`Sym' is created of the appropriate length. Finally,
594`symtab.c:symtab_finalize' is called to sort the symbol table and
595remove duplicate entries (entries with the same memory address).
596
597 The symbol table must be a contiguous array for two reasons. First,
598the `qsort' library function (which sorts an array) will be used to
599sort the symbol table. Also, the symbol lookup routine
600(`symtab.c:sym_lookup'), which finds symbols based on memory address,
601uses a binary search algorithm which requires the symbol table to be a
602sorted array. Function symbols are indicated with an `is_func' flag.
603Line number symbols have no special flags set. Additionally, a symbol
604can have an `is_static' flag to indicate that it is a local symbol.
605
606 With the symbol table read, the symspecs can now be translated into
607Syms (`sym_ids.c:sym_id_parse'). Remember that a single symspec can
608match multiple symbols. An array of symbol tables (`syms') is created,
609each entry of which is a symbol table of Syms to be included or
610excluded from a particular listing. The master symbol table and the
611symspecs are examined by nested loops, and every symbol that matches a
612symspec is inserted into the appropriate syms table. This is done
613twice, once to count the size of each required symbol table, and again
614to build the tables, which have been malloced between passes. From now
615on, to determine whether a symbol is on an include or exclude symspec
616list, `gprof' simply uses its standard symbol lookup routine on the
617appropriate table in the `syms' array.
618
619 Now the profile data file(s) themselves are read
620(`gmon_io.c:gmon_out_read'), first by checking for a new-style
621`gmon.out' header, then assuming this is an old-style BSD `gmon.out' if
622the magic number test failed.
623
624 New-style histogram records are read by `hist.c:hist_read_rec'. For
625the first histogram record, allocate a memory array to hold all the
626bins, and read them in. When multiple profile data files (or files
627with multiple histogram records) are read, the starting address, ending
628address, number of bins and sampling rate must match between the
629various histograms, or a fatal error will result. If everything
630matches, just sum the additional histograms into the existing in-memory
631array.
632
633 As each call graph record is read (`call_graph.c:cg_read_rec'), the
634parent and child addresses are matched to symbol table entries, and a
635call graph arc is created by `cg_arcs.c:arc_add', unless the arc fails
636a symspec check against INCL_ARCS/EXCL_ARCS. As each arc is added, a
637linked list is maintained of the parent's child arcs, and of the child's
638parent arcs. Both the child's call count and the arc's call count are
639incremented by the record's call count.
640
641 Basic-block records are read (`basic_blocks.c:bb_read_rec'), but
642only if line-by-line profiling has been selected. Each basic-block
643address is matched to a corresponding line symbol in the symbol table,
644and an entry made in the symbol's bb_addr and bb_calls arrays. Again,
645if multiple basic-block records are present for the same address, the
646call counts are cumulative.
647
648 A gmon.sum file is dumped, if requested (`gmon_io.c:gmon_out_write').
649
650 If histograms were present in the data files, assign them to symbols
651(`hist.c:hist_assign_samples') by iterating over all the sample bins
652and assigning them to symbols. Since the symbol table is sorted in
653order of ascending memory addresses, we can simple follow along in the
654symbol table as we make our pass over the sample bins. This step
655includes a symspec check against INCL_FLAT/EXCL_FLAT. Depending on the
656histogram scale factor, a sample bin may span multiple symbols, in
657which case a fraction of the sample count is allocated to each symbol,
658proportional to the degree of overlap. This effect is rare for normal
659profiling, but overlaps are more common during line-by-line profiling,
660and can cause each of two adjacent lines to be credited with half a
661hit, for example.
662
663 If call graph data is present, `cg_arcs.c:cg_assemble' is called.
664First, if `-c' was specified, a machine-dependent routine (`find_call')
665scans through each symbol's machine code, looking for subroutine call
666instructions, and adding them to the call graph with a zero call count.
667A topological sort is performed by depth-first numbering all the
668symbols (`cg_dfn.c:cg_dfn'), so that children are always numbered less
669than their parents, then making a array of pointers into the symbol
670table and sorting it into numerical order, which is reverse topological
671order (children appear before parents). Cycles are also detected at
672this point, all members of which are assigned the same topological
673number. Two passes are now made through this sorted array of symbol
674pointers. The first pass, from end to beginning (parents to children),
675computes the fraction of child time to propagate to each parent and a
676print flag. The print flag reflects symspec handling of
677INCL_GRAPH/EXCL_GRAPH, with a parent's include or exclude (print or no
678print) property being propagated to its children, unless they
679themselves explicitly appear in INCL_GRAPH or EXCL_GRAPH. A second
680pass, from beginning to end (children to parents) actually propagates
681the timings along the call graph, subject to a check against
682INCL_TIME/EXCL_TIME. With the print flag, fractions, and timings now
683stored in the symbol structures, the topological sort array is now
684discarded, and a new array of pointers is assembled, this time sorted
685by propagated time.
686
687 Finally, print the various outputs the user requested, which is now
688fairly straightforward. The call graph (`cg_print.c:cg_print') and
689flat profile (`hist.c:hist_print') are regurgitations of values already
690computed. The annotated source listing
691(`basic_blocks.c:print_annotated_source') uses basic-block information,
692if present, to label each line of code with call counts, otherwise only
693the function call counts are presented.
694
695 The function ordering code is marginally well documented in the
696source code itself (`cg_print.c'). Basically, the functions with the
697most use and the most parents are placed first, followed by other
698functions with the most use, followed by lower use functions, followed
699by unused functions at the end.
700
701
702File: gprof.info, Node: Debugging, Prev: Internals, Up: Details
703
704Debugging `gprof'
705-----------------
706
707 If `gprof' was compiled with debugging enabled, the `-d' option
708triggers debugging output (to stdout) which can be helpful in
709understanding its operation. The debugging number specified is
710interpreted as a sum of the following options:
711
7122 - Topological sort
713 Monitor depth-first numbering of symbols during call graph analysis
714
7154 - Cycles
716 Shows symbols as they are identified as cycle heads
717
71816 - Tallying
719 As the call graph arcs are read, show each arc and how the total
720 calls to each function are tallied
721
72232 - Call graph arc sorting
723 Details sorting individual parents/children within each call graph
724 entry
725
72664 - Reading histogram and call graph records
727 Shows address ranges of histograms as they are read, and each call
728 graph arc
729
730128 - Symbol table
731 Reading, classifying, and sorting the symbol table from the object
732 file. For line-by-line profiling (`-l' option), also shows line
733 numbers being assigned to memory addresses.
734
735256 - Static call graph
736 Trace operation of `-c' option
737
738512 - Symbol table and arc table lookups
739 Detail operation of lookup routines
740
7411024 - Call graph propagation
742 Shows how function times are propagated along the call graph
743
7442048 - Basic-blocks
745 Shows basic-block records as they are read from profile data (only
746 meaningful with `-l' option)
747
7484096 - Symspecs
749 Shows symspec-to-symbol pattern matching operation
750
7518192 - Annotate source
752 Tracks operation of `-A' option
753
Note: See TracBrowser for help on using the repository browser.