.TH LOCATEDB 5 \" -*- nroff -*-
.SH NAME
locatedb \- front-compressed file name database
.SH DESCRIPTION
This manual page documents the format of file name databases for the
GNU version of
.BR locate .
The file name databases contain lists of files that were in
particular directory trees when the databases were last updated.
.P
There can be multiple databases. Users can select which databases
\fBlocate\fP searches using an environment variable or command line
option; see \fBlocate\fP(1). The system administrator can choose the
file name of the default database, the frequency with which the
databases are updated, and the directories for which they contain
entries. Normally, file name databases are updated by running the
\fBupdatedb\fP program periodically, typically nightly; see
\fBupdatedb\fP(1).
.P
\fBupdatedb\fP runs a program called \fBfrcode\fP to compress the list
of file names using front-compression, which reduces
the database size by a factor of 4 to 5. Front-compression (also
known as incremental encoding) works as follows.
.P
The database entries are a sorted list (sorted case-insensitively, for
users' convenience). Since the list is sorted, each entry is likely to
share a prefix (initial string) with the previous entry. Each database
entry begins with an offset-differential count byte, which is the
additional number of characters of prefix of the preceding entry to
use beyond the number that the preceding entry is using of its
predecessor. (The counts can be negative.) Following the count is a
null-terminated ASCII remainder \(em the part of the name that follows
the shared prefix.
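.P
The count calculation described above can be sketched in Python.
This is an illustrative sketch only, not the code of \fBfrcode\fP;
the function names common_prefix_len and front_compress are
hypothetical, and the input is assumed to be already sorted.
.nf

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    # Count the leading bytes that a and b have in common.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(names):
    # Return (offset-differential count, remainder) pairs for sorted names.
    entries = []
    prev = b""
    prev_shared = 0  # prefix length the previous entry shared with its predecessor
    for name in names:
        shared = common_prefix_len(prev, name)
        # Store the change in shared-prefix length, not the length itself.
        entries.append((shared - prev_shared, name[shared:]))
        prev, prev_shared = name, shared
    return entries


entries = front_compress([
    b"/usr/src",
    b"/usr/src/cmd/aardvark.c",
    b"/usr/src/cmd/armadillo.c",
    b"/usr/tmp/zoo",
])
for count, remainder in entries:
    print(count, remainder.decode("ascii"))
```

.fi
The dummy first entry of a real database is left out of this sketch.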
.P
If the offset-differential count is larger than can be stored in a
byte (+/\-127), the byte has the value 0x80 and the count follows in a
2-byte word, with the high byte first (network byte order).
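.P
A minimal sketch of how a count might be serialized under this rule.
The names encode_count and encode_entry are hypothetical, and the
2-byte word is assumed to be a signed two's-complement value, since
counts can be negative.
.nf

```python
import struct


def encode_count(count: int) -> bytes:
    # Counts within +/-127 fit in one signed byte.
    if -127 <= count <= 127:
        return struct.pack("b", count)
    # Otherwise emit the escape byte 0x80, then a 2-byte big-endian word
    # (assumed signed here).
    return bytes([0x80]) + struct.pack(">h", count)


def encode_entry(count: int, remainder: bytes) -> bytes:
    # An entry is the count followed by the null-terminated remainder.
    return encode_count(count) + remainder + bytes([0])


print(encode_entry(6, b"rmadillo.c").hex())
print(encode_count(200).hex())
```

.fi
The byte value 0x80 would be \-128 as a signed byte, which is why it
is free to serve as the escape value.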
.P
Every database begins with a dummy entry for a file called `LOCATE02',
which \fBlocate\fP checks for to ensure that the database file has the
correct format; it ignores the entry in doing the search.
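.P
Putting the pieces together, a reader for this format could look
roughly like the following Python sketch. It is not the code used by
\fBlocate\fP; decode_database is a hypothetical name, the 2-byte count
is assumed to be signed, and the sample bytes are built by hand from
the entries shown in the EXAMPLE section.
.nf

```python
import struct

MAGIC = b"LOCATE02"
NUL = bytes([0])


def decode_database(data: bytes) -> list:
    # The dummy entry is a zero count byte, the magic string, and a null.
    if data[:len(MAGIC) + 2] != NUL + MAGIC + NUL:
        raise ValueError("missing LOCATE02 dummy entry")
    names = []
    prev = b""
    shared = 0  # running length of the prefix shared with the previous entry
    pos = 0
    while pos < len(data):
        (count,) = struct.unpack_from("b", data, pos)  # signed count byte
        pos += 1
        if count == -128:  # byte value 0x80: a 2-byte count follows
            (count,) = struct.unpack_from(">h", data, pos)
            pos += 2
        shared += count
        end = data.index(0, pos)  # remainder runs up to the terminating null
        prev = prev[:shared] + data[pos:end]
        names.append(prev)
        pos = end + 1
    return names[1:]  # drop the dummy entry, as the search does


sample = b"".join([
    NUL, MAGIC, NUL,
    bytes([0]), b"/usr/src", NUL,
    bytes([8]), b"/cmd/aardvark.c", NUL,
    bytes([6]), b"rmadillo.c", NUL,
    bytes([0xF7]), b"tmp/zoo", NUL,  # 0xF7 is -9 as a signed byte
])
decoded = decode_database(sample)
for name in decoded:
    print(name.decode("ascii"))
```

.fi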
.P
Databases cannot be concatenated, even if the first
(dummy) entry is trimmed from all but the first database. This
is because the offset-differential count in the first entry of the
second and following databases will be wrong: each count is relative
to the running shared-prefix length, which at that point is left over
from the last entry of the preceding database instead of being zero.
.P
There is also an old database format, used by Unix
.B locate
and
.B find
programs and earlier releases of the GNU ones. \fBupdatedb\fP runs
programs called \fBbigram\fP and \fBcode\fP to produce old-format
databases. The old format differs from the above description in the
following ways. Instead of each entry starting with an
offset-differential count byte and ending with a null, byte values
from 0 through 28 indicate offset-differential counts from \-14 through
14. The byte value indicating that a long offset-differential count
follows is 0x1e (30), not 0x80. The long counts are stored in host
byte order, which is not necessarily network byte order, and host
integer word size, which is usually 4 bytes. They also represent a
count 14 less than their value. The database lines have no
termination byte; the start of the next line is indicated by its first
byte having a value <= 30.
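.P
The old-format count rules can be illustrated with a short Python
sketch. It is not taken from any \fBlocate\fP implementation;
read_old_count is a hypothetical name, the host integer is assumed to
be 4 bytes, and byte value 29 is rejected because the text above
assigns it no meaning.
.nf

```python
import struct

OFFSET = 14               # stored values are biased by 14
LONG_COUNT_MARKER = 0x1E  # 30: a long count follows


def read_old_count(data: bytes, pos: int):
    # Return (offset-differential count, position of the next byte).
    b = data[pos]
    if b == LONG_COUNT_MARKER:
        # "=i" reads a 4-byte integer in host byte order (size assumed).
        (raw,) = struct.unpack_from("=i", data, pos + 1)
        return raw - OFFSET, pos + 1 + 4
    if b <= 28:
        return b - OFFSET, pos + 1
    raise ValueError("byte does not start an old-format entry")


print(read_old_count(bytes([20]), 0))
print(read_old_count(bytes([LONG_COUNT_MARKER]) + struct.pack("=i", 214), 0))
```

.fi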
.P
In addition, instead of starting with a dummy entry, the old database
format starts with a 256-byte table containing the 128 most common
bigrams in the file list. A bigram is a pair of adjacent bytes.
Bytes in the database that have the high bit set are indexes (with the
high bit cleared) into the bigram table. The bigram and
offset-differential count coding makes these databases 20-25% smaller
than the new format, but makes them not 8-bit clean. Any byte in a
file name that is in the ranges used for the special codes is replaced
in the database by a question mark, which not coincidentally is the
shell wildcard to match a single character.
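.P
As a rough illustration of the bigram coding, the following Python
sketch expands the name portion of an old-format entry. The name
expand_bigrams is hypothetical, and the sketch assumes that the table
stores the 128 bigrams as consecutive byte pairs, so that index i
occupies table bytes 2i and 2i+1; the text above does not spell out
that layout.
.nf

```python
def expand_bigrams(table: bytes, encoded: bytes) -> bytes:
    # Expand high-bit bytes into the bigrams they index.
    if len(table) != 256:
        raise ValueError("bigram table must be 256 bytes")
    out = bytearray()
    for b in encoded:
        if b & 0x80:
            i = (b & 0x7F) * 2   # clear the high bit, then index byte pairs
            out += table[i:i + 2]
        else:
            out.append(b)        # ordinary byte, copied through unchanged
    return bytes(out)


# A toy table whose first two bigrams are "/u" and "sr".
table = bytearray(256)
table[0:2] = b"/u"
table[2:4] = b"sr"
expanded = expand_bigrams(bytes(table), bytes([0x80, 0x81]) + b"/bin")
print(expanded.decode("ascii"))
```

.fi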
.SH EXAMPLE
.nf

Input to \fBfrcode\fP:
.\" with nulls changed to newlines:
/usr/src
/usr/src/cmd/aardvark.c
/usr/src/cmd/armadillo.c
/usr/tmp/zoo

Length of the longest prefix of the preceding entry to share:
0 /usr/src
8 /cmd/aardvark.c
14 rmadillo.c
5 tmp/zoo

.fi
Output from \fBfrcode\fP, with trailing nulls changed to newlines
and count bytes made printable:
.nf
0 LOCATE02
0 /usr/src
8 /cmd/aardvark.c
6 rmadillo.c
\-9 tmp/zoo

(6 = 14 \- 8, and \-9 = 5 \- 14)
.fi
.SH "SEE ALSO"
\fBfind\fP(1), \fBlocate\fP(1), \fBxargs\fP(1),
\fBFinding Files\fP (on-line in Info, or printed)
.SH "BUGS"
.P
The best way to report a bug is to use the form at
http://savannah.gnu.org/bugs/?group=findutils.
The reason for this is that you will then be able to track progress in
fixing the problem. Other comments about \fBlocate\fP(1) and about
the findutils package in general can be sent to the
.I bug-findutils
mailing list. To join the list, send email to
.IR bug-findutils-request@gnu.org .