1 | @c This is part of the paxutils manual.
|
---|
2 | @c Copyright (C) 2006 Free Software Foundation, Inc.
|
---|
3 | @c This file is distributed under GFDL 1.1 or any later version
|
---|
4 | @c published by the Free Software Foundation.
|
---|
5 |
|
---|
6 | @menu
|
---|
7 | * Standard:: Basic Tar Format
|
---|
8 | * Extensions:: @acronym{GNU} Extensions to the Archive Format
|
---|
9 | * Sparse Formats:: Storing Sparse Files
|
---|
10 | * Snapshot Files::
|
---|
11 | * Dumpdir::
|
---|
12 | @end menu
|
---|
13 |
|
---|
14 | @node Standard
|
---|
15 | @unnumberedsec Basic Tar Format
|
---|
16 | @UNREVISED
|
---|
17 |
|
---|
18 | While an archive may contain many files, the archive itself is a
|
---|
19 | single ordinary file. Like any other file, an archive file can be
|
---|
20 | written to a storage device such as a tape or disk, sent through a
|
---|
21 | pipe or over a network, saved on the active file system, or even
|
---|
22 | stored in another archive. An archive file is not easy to read or
|
---|
23 | manipulate without using the @command{tar} utility or Tar mode in
|
---|
24 | @acronym{GNU} Emacs.
|
---|
25 |
|
---|
26 | Physically, an archive consists of a series of file entries terminated
|
---|
27 | by an end-of-archive entry, which consists of two 512-byte blocks of zero
|
---|
28 | bytes. A file
|
---|
29 | entry usually describes one of the files in the archive (an
|
---|
30 | @dfn{archive member}), and consists of a file header and the contents
|
---|
31 | of the file. File headers contain file names and statistics, checksum
|
---|
32 | information which @command{tar} uses to detect file corruption, and
|
---|
33 | information about file types.
|
---|
34 |
|
---|
35 | Archives are permitted to have more than one member with the same
|
---|
36 | member name. One way this situation can occur is if more than one
|
---|
37 | version of a file has been stored in the archive. For information
|
---|
38 | about adding new versions of a file to an archive, see @ref{update}.
|
---|
39 | @FIXME-xref{To learn more about having more than one archive member with the
|
---|
40 | same name, see -backup node, when it's written.}
|
---|
41 |
|
---|
42 | In addition to entries describing archive members, an archive may
|
---|
43 | contain entries which @command{tar} itself uses to store information.
|
---|
44 | @xref{label}, for an example of such an archive entry.
|
---|
45 |
|
---|
46 | A @command{tar} archive file contains a series of blocks. Each block
|
---|
47 | contains @code{BLOCKSIZE} bytes. Although this format may be thought
|
---|
48 | of as being on magnetic tape, other media are often used.
|
---|
49 |
|
---|
50 | Each file archived is represented by a header block which describes
|
---|
51 | the file, followed by zero or more blocks which give the contents
|
---|
52 | of the file. At the end of the archive file there are two 512-byte blocks
|
---|
53 | filled with binary zeros as an end-of-file marker. A reasonable system
|
---|
54 | should write such an end-of-file marker at the end of an archive, but
|
---|
55 | must not assume that such a block exists when reading an archive. In
|
---|
56 | particular @GNUTAR{} always issues a warning if it does not encounter it.
|
---|
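The layout described above can be walked with a short loop. The following is a minimal Python sketch, not the @GNUTAR{} implementation. It assumes the POSIX ustar offsets for @code{name} (bytes 0-99) and @code{size} (bytes 124-135), which come from the header definition rather than from this paragraph, and the name @code{iter_members} is hypothetical. It stops at the first zero block but does not require the marker to be present.

```python
BLOCKSIZE = 512  # size of one tar block


def iter_members(data: bytes):
    """Yield (name, size) for each header until a zero block or end of data."""
    zero_block = bytes(BLOCKSIZE)
    pos = 0
    while pos + BLOCKSIZE <= len(data):
        header = data[pos:pos + BLOCKSIZE]
        if header == zero_block:
            return  # end-of-archive marker reached
        # Assumed ustar offsets: name at 0..99, size at 124..135 (octal ASCII).
        name = header[0:100].split(b"\0", 1)[0].decode("ascii", "replace")
        size_text = header[124:136].split(b"\0", 1)[0].strip(b" ")
        size = int(size_text.decode("ascii"), 8) if size_text else 0
        yield name, size
        # Skip the header plus the contents, rounded up to whole blocks.
        data_blocks = (size + BLOCKSIZE - 1) // BLOCKSIZE
        pos += BLOCKSIZE * (1 + data_blocks)
```

Because the loop also ends when the data runs out, an archive that lacks the trailing zero blocks is still listed in full.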
57 |
|
---|
58 | The blocks may be @dfn{blocked} for physical I/O operations.
|
---|
59 | Each record of @var{n} blocks (where @var{n} is set by the
|
---|
60 | @option{--blocking-factor=@var{512-size}} (@option{-b @var{512-size}}) option to @command{tar}) is written with a single
|
---|
61 | @w{@samp{write ()}} operation. On magnetic tapes, the result of
|
---|
62 | such a write is a single record. When writing an archive,
|
---|
63 | the last record of blocks should be written at the full size, with
|
---|
64 | blocks after the zero block containing all zeros. When reading
|
---|
65 | an archive, a reasonable system should properly handle an archive
|
---|
66 | whose last record is shorter than the rest, or which contains garbage
|
---|
67 | records after a zero block.
|
---|
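The padding rule for the last record can be sketched as follows. This is an illustration in Python rather than code from @command{tar}; the function name @code{pad_to_record} is hypothetical, and the default of 20 blocks per record is the usual @GNUTAR{} default, stated here as an assumption.

```python
BLOCKSIZE = 512  # size of one tar block


def pad_to_record(archive: bytes, blocking_factor: int = 20) -> bytes:
    """Pad an archive with zero bytes so its length is a whole number of records."""
    record_size = blocking_factor * BLOCKSIZE
    remainder = len(archive) % record_size
    if remainder:
        # Fill the rest of the final record with zero blocks.
        archive += bytes(record_size - remainder)
    return archive
```

A reader, by contrast, should accept input whose length is not a multiple of the record size.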
68 |
|
---|
69 | The header block is defined in C as follows. In the @GNUTAR{}
|
---|
70 | distribution, this is part of file @file{src/tar.h}:
|
---|
71 |
|
---|
72 | @smallexample
|
---|
73 | @include header.texi
|
---|
74 | @end smallexample
|
---|
75 |
|
---|
76 | All characters in header blocks are represented by using 8-bit
|
---|
77 | characters in the local variant of ASCII. Each field within the
|
---|
78 | structure is contiguous; that is, there is no padding used within
|
---|
79 | the structure. Each character on the archive medium is stored
|
---|
80 | contiguously.
|
---|
81 |
|
---|
82 | Bytes representing the contents of files (after the header block
|
---|
83 | of each file) are not translated in any way and are not constrained
|
---|
84 | to represent characters in any character set. The @command{tar} format
|
---|
85 | does not distinguish text files from binary files, and no translation
|
---|
86 | of file contents is performed.
|
---|
87 |
|
---|
88 | The @code{name}, @code{linkname}, @code{magic}, @code{uname}, and
|
---|
89 | @code{gname} fields are null-terminated character strings. All other fields
|
---|
90 | are zero-filled octal numbers in ASCII. Each numeric field of width
|
---|
91 | @var{w} contains @var{w} minus 1 digits, and a null.
|
---|
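As an illustration of these two encodings, here is a small Python sketch. The helper names are hypothetical; the parser is deliberately lenient about trailing spaces, which some writers use in place of the null.

```python
def parse_string(field: bytes) -> str:
    """Return the text of a null-terminated header field."""
    return field.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_octal(field: bytes) -> int:
    """Decode a zero-filled octal ASCII field; an empty field counts as 0."""
    text = field.split(b"\0", 1)[0].strip(b" ")
    return int(text.decode("ascii"), 8) if text else 0


def format_octal(value: int, width: int) -> bytes:
    """Encode value as width - 1 zero-filled octal digits followed by a null."""
    digits = f"{value:0{width - 1}o}"
    if len(digits) > width - 1:
        raise ValueError("value does not fit in the field")
    return digits.encode("ascii") + b"\0"
```

For instance, a mode of octal 644 in an 8-byte field is stored as the seven digits @samp{0000644} followed by a null.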
92 |
|
---|
93 | The @code{name} field is the file name of the file, with directory names
|
---|
94 | (if any) preceding the file name, separated by slashes.
|
---|
95 |
|
---|
96 | @FIXME{how big a name before field overflows?}
|
---|
97 |
|
---|
98 | The @code{mode} field provides nine bits specifying file permissions
|
---|
99 | and three bits to specify the Set UID, Set GID, and Save Text
|
---|
100 | (@dfn{sticky}) modes. Values for these bits are defined above.
|
---|
101 | When special permissions are required to create a file with a given
|
---|
102 | mode, and the user restoring files from the archive does not hold such
|
---|
103 | permissions, the mode bit(s) specifying those special permissions
|
---|
104 | are ignored. Modes which are not supported by the operating system
|
---|
105 | restoring files from the archive will be ignored. Unsupported modes
|
---|
106 | should be faked up when creating or updating an archive; e.g., the
|
---|
107 | group permission could be copied from the @emph{other} permission.
|
---|
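The twelve mode bits can be decoded as in the following Python sketch. It assumes the conventional octal values (04000 for Set UID, 02000 for Set GID, 01000 for Save Text, and 0400 down to 0001 for the permission bits); the function name @code{describe_mode} is hypothetical.

```python
SPECIAL_BITS = ((0o4000, "setuid"), (0o2000, "setgid"), (0o1000, "sticky"))


def describe_mode(mode: int):
    """Return (permission string, list of special modes) for a mode value."""
    perms = ""
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        perms += "r" if bits & 4 else "-"
        perms += "w" if bits & 2 else "-"
        perms += "x" if bits & 1 else "-"
    special = [name for bit, name in SPECIAL_BITS if mode & bit]
    return perms, special
```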
108 |
|
---|
109 | The @code{uid} and @code{gid} fields are the numeric user and group
|
---|
110 | ID of the file owners, respectively. If the operating system does
|
---|
111 | not support numeric user or group IDs, these fields should be ignored.
|
---|
112 |
|
---|
113 | The @code{size} field is the size of the file in bytes; linked files
|
---|
114 | are archived with this field specified as zero. @FIXME-xref{Modifiers, in
|
---|
115 | particular the @option{--incremental} (@option{-G}) option.}
|
---|
116 |
|
---|
117 | The @code{mtime} field is the data modification time of the file at
|
---|
118 | the time it was archived. It is the ASCII representation of the octal
|
---|
119 | value of the last time the file's contents were modified, represented
|
---|
120 | as an integer number of
|
---|
121 | seconds since January 1, 1970, 00:00 Coordinated Universal Time.
|
---|
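Decoding the field is a matter of reading the octal digits and interpreting the result as seconds since the epoch. A minimal Python sketch, with the hypothetical name @code{parse_mtime}:

```python
from datetime import datetime, timezone


def parse_mtime(field: bytes) -> datetime:
    """Convert an octal ASCII mtime field to an aware UTC datetime."""
    text = field.split(b"\0", 1)[0].strip(b" ")
    seconds = int(text.decode("ascii"), 8) if text else 0
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

The field @samp{00000250600} (octal for 86400) therefore denotes midnight UTC on January 2, 1970.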
122 |
|
---|
123 | The @code{chksum} field is the ASCII representation of the octal value
|
---|
124 | of the simple sum of all bytes in the header block. Each 8-bit
|
---|
125 | byte in the header is added to an unsigned integer, initialized to
|
---|
126 | zero, the precision of which shall be no less than seventeen bits.
|
---|
127 | When calculating the checksum, the @code{chksum} field is treated as
|
---|
128 | if it were all blanks.
|
---|
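The checksum rule can be expressed in a few lines. This Python sketch assumes the ustar position of @code{chksum} (bytes 148-155 of the header block), which is taken from the header definition and not from this paragraph; the function names are hypothetical.

```python
CHKSUM_START, CHKSUM_END = 148, 156  # assumed position of the chksum field


def header_checksum(block: bytes) -> int:
    """Sum all header bytes, counting the chksum field as eight blanks."""
    if len(block) != 512:
        raise ValueError("a header block is 512 bytes")
    blanks = (CHKSUM_END - CHKSUM_START) * ord(" ")
    return sum(block[:CHKSUM_START]) + blanks + sum(block[CHKSUM_END:])


def verify_checksum(block: bytes) -> bool:
    """Compare the stored octal checksum with the computed one."""
    stored = block[CHKSUM_START:CHKSUM_END].split(b"\0", 1)[0].strip(b" ")
    return bool(stored) and int(stored.decode("ascii"), 8) == header_checksum(block)
```

The largest possible sum is 512 times 255, or 130560, which needs seventeen bits; this is where the precision requirement comes from.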
129 |
|
---|
130 | The @code{typeflag} field specifies the type of file archived. If a
|
---|
131 | particular implementation does not recognize or permit the specified
|
---|
132 | type, the file will be extracted as if it were a regular file. As this
|
---|
133 | action occurs, @command{tar} issues a warning to the standard error.
|
---|
134 |
|
---|
135 | The @code{atime} and @code{ctime} fields are used in making incremental
|
---|
136 | backups; they store, respectively, the particular file's access and
|
---|
137 | status change times.
|
---|
138 |
|
---|
139 | The @code{offset} is used by the @option{--multi-volume} (@option{-M}) option, when
|
---|
140 | making a multi-volume archive. The offset is the number of bytes into
|
---|
141 | the file at which to resume in order to continue the file on the next
|
---|
142 | tape; that is, it records the position at which a continued file
|
---|
143 | picks up.
|
---|
144 |
|
---|
145 | The following fields were added to deal with sparse files. A file
|
---|
146 | is @dfn{sparse} if it contains unallocated blocks which end up being
|
---|
147 | represented as zeros, i.e., no useful data. A test to see if a file
|
---|
148 | is sparse is to look at the number of blocks allocated for it versus the
|
---|
149 | number of characters in the file; if there are fewer blocks allocated
|
---|
150 | for the file than would normally be allocated for a file of that
|
---|
151 | size, then the file is sparse. This is the method @command{tar} uses to
|
---|
152 | detect a sparse file, and once such a file is detected, it is treated
|
---|
153 | differently from non-sparse files.
|
---|
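The test described above can be sketched in Python as follows. This is an illustration of the comparison, not the code @command{tar} uses; it assumes a POSIX system on which @code{st_blocks} counts 512-byte units, and the function names are hypothetical.

```python
import os


def looks_sparse(size: int, blocks: int, unit: int = 512) -> bool:
    """True if fewer bytes are allocated than the file's length implies."""
    return blocks * unit < size


def file_looks_sparse(path: str) -> bool:
    """Apply the test to a file on disk (st_blocks is not available everywhere)."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```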
154 |
|
---|
155 | Sparse files are often @code{dbm} files, or other database-type files
|
---|
156 | which have data at some points and emptiness in the greater part of
|
---|
157 | the file. Such files can appear to be very large when an @samp{ls
|
---|
158 | -l} is done on them, when in truth, there may be a very small amount
|
---|
159 | of important data contained in the file. It is thus undesirable
|
---|
160 | to have @command{tar} think that it must back up this entire file, as
|
---|
161 | great quantities of room are wasted on empty blocks, which can lead
|
---|
162 | to running out of room on a tape far earlier than is necessary.
|
---|
163 | Thus, sparse files are dealt with so that these empty blocks are
|
---|
164 | not written to the tape. Instead, what is written to the tape is a
|
---|
165 | description, of sorts, of the sparse file: where the holes are, how
|
---|
166 | big the holes are, and how much data is found at the end of the hole.
|
---|
167 | This way, the file takes up potentially far less room on the tape,
|
---|
168 | and when the file is extracted later on, it will look exactly the way
|
---|
169 | it looked beforehand. The following is a description of the fields
|
---|
170 | used to handle a sparse file:
|
---|
171 |
|
---|
172 | The @code{sp} is an array of @code{struct sparse}. Each @code{struct
|
---|
173 | sparse} contains two 12-character strings which represent an offset
|
---|
174 | into the file and a number of bytes to be written at that offset.
|
---|
175 | The offset is absolute, and not relative to the offset in the preceding
|
---|
176 | array element.
|
---|
177 |
|
---|
178 | The header can hold four of these @code{struct sparse} at the moment;
|
---|
179 | if more are needed, they are not stored in the header.
|
---|
180 |
|
---|
181 | The @code{isextended} flag is set when an @code{extended_header}
|
---|
182 | is needed to deal with a file. Note that this means that this flag
|
---|
183 | can only be set when dealing with a sparse file, and it is only set
|
---|
184 | in the event that the description of the file will not fit in the
|
---|
185 | allotted room for sparse structures in the header. In other words,
|
---|
186 | an @code{extended_header} is needed.
|
---|
187 |
|
---|
188 | The @code{extended_header} structure is used for sparse files which
|
---|
189 | need more sparse structures than can fit in the header. The header can
|
---|
190 | fit 4 such structures; if more are needed, the flag @code{isextended}
|
---|
191 | gets set and the next block is an @code{extended_header}.
|
---|
192 |
|
---|
193 | Each @code{extended_header} structure contains an array of 21
|
---|
194 | sparse structures, along with an @code{isextended} flag like the one
|
---|
195 | in the header. There can be an indeterminate number of such
|
---|
196 | @code{extended_header}s to describe a sparse file.
|
---|
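An array of @code{struct sparse}, whether the four entries in the header or the 21 in an @code{extended_header}, can be decoded as in the Python sketch below. It takes the raw bytes of the array, so it does not depend on where the array sits in the block. Two assumptions are made: both strings are octal ASCII like the other numeric fields, and an entry whose byte-count field is empty marks the end of the used entries. The names are hypothetical.

```python
SPARSE_ENTRY_SIZE = 24  # two 12-character strings: offset, then byte count


def _octal(field: bytes) -> int:
    text = field.split(b"\0", 1)[0].strip(b" ")
    return int(text.decode("ascii"), 8) if text else 0


def parse_sparse_array(raw: bytes):
    """Return a list of (offset, numbytes) pairs from an array of struct sparse."""
    entries = []
    last_start = len(raw) - SPARSE_ENTRY_SIZE
    for start in range(0, last_start + 1, SPARSE_ENTRY_SIZE):
        entry = raw[start:start + SPARSE_ENTRY_SIZE]
        if entry[12:13] == b"\0":
            break  # assumed convention: an empty byte count ends the list
        entries.append((_octal(entry[0:12]), _octal(entry[12:24])))
    return entries
```

Since the offsets are absolute, each pair can be applied on its own: seek to the offset in the output file and write that many bytes of data.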
197 |
|
---|
198 | @table @asis
|
---|
199 |
|
---|
200 | @item @code{REGTYPE}
|
---|
201 | @itemx @code{AREGTYPE}
|
---|
202 | These flags represent a regular file. In order to be compatible
|
---|
203 | with older versions of @command{tar}, a @code{typeflag} value of
|
---|
204 | @code{AREGTYPE} should be silently recognized as a regular file.
|
---|
205 | New archives should be created using @code{REGTYPE}. Also, for
|
---|
206 | backward compatibility, @command{tar} treats a regular file whose name
|
---|
207 | ends with a slash as a directory.
|
---|
208 |
|
---|
209 | @item @code{LNKTYPE}
|
---|
210 | This flag represents a file linked to another file, of any type,
|
---|
211 | previously archived. Such files are identified in Unix by each
|
---|
212 | file having the same device and inode number. The linked-to name is
|
---|
213 | specified in the @code{linkname} field with a trailing null.
|
---|
214 |
|
---|
215 | @item @code{SYMTYPE}
|
---|
216 | This represents a symbolic link to another file. The linked-to name
|
---|
217 | is specified in the @code{linkname} field with a trailing null.
|
---|
218 |
|
---|
219 | @item @code{CHRTYPE}
|
---|
220 | @itemx @code{BLKTYPE}
|
---|
221 | These represent character special files and block special files
|
---|
222 | respectively. In this case the @code{devmajor} and @code{devminor}
|
---|
223 | fields will contain the major and minor device numbers respectively.
|
---|
224 | Operating systems may map the device specifications to their own
|
---|
225 | local specification, or may ignore the entry.
|
---|
226 |
|
---|
227 | @item @code{DIRTYPE}
|
---|
228 | This flag specifies a directory or sub-directory. The directory
|
---|
229 | name in the @code{name} field should end with a slash. On systems where
|
---|
230 | disk allocation is performed on a directory basis, the @code{size} field
|
---|
231 | will contain the maximum number of bytes (which may be rounded to
|
---|
232 | the nearest disk block allocation unit) which the directory may
|
---|
233 | hold. A @code{size} field of zero indicates no such limiting. Systems
|
---|
234 | which do not support limiting in this manner should ignore the
|
---|
235 | @code{size} field.
|
---|
236 |
|
---|
237 | @item @code{FIFOTYPE}
|
---|
238 | This specifies a FIFO special file. Note that the archiving of a
|
---|
239 | FIFO file archives the existence of this file and not its contents.
|
---|
240 |
|
---|
241 | @item @code{CONTTYPE}
|
---|
242 | This specifies a contiguous file, which is the same as a normal
|
---|
243 | file except that, in operating systems which support it, all its
|
---|
244 | space is allocated contiguously on the disk. Operating systems
|
---|
245 | which do not allow contiguous allocation should silently treat this
|
---|
246 | type as a normal file.
|
---|
247 |
|
---|
248 | @item @code{A} @dots{} @code{Z}
|
---|
249 | These are reserved for custom implementations. Some of these are
|
---|
250 | used in the @acronym{GNU} modified format, as described below.
|
---|
251 |
|
---|
252 | @end table
|
---|
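The table above can be condensed into a lookup. The Python sketch below uses the character values that POSIX assigns to these names (@samp{0} through @samp{7}, plus a null for @code{AREGTYPE}); those values come from the header definition, not from the table, and the function name @code{classify} is hypothetical.

```python
TYPEFLAGS = {
    b"0": "regular file",       # REGTYPE
    b"\0": "regular file",      # AREGTYPE
    b"1": "hard link",          # LNKTYPE
    b"2": "symbolic link",      # SYMTYPE
    b"3": "character special",  # CHRTYPE
    b"4": "block special",      # BLKTYPE
    b"5": "directory",          # DIRTYPE
    b"6": "FIFO",               # FIFOTYPE
    b"7": "contiguous file",    # CONTTYPE
}


def classify(typeflag: bytes, name: str) -> str:
    """Map a one-byte typeflag and member name to a description."""
    kind = TYPEFLAGS.get(typeflag)
    if kind is None:
        if len(typeflag) == 1 and b"A" <= typeflag <= b"Z":
            return "implementation-specific"
        return "regular file"  # unrecognized types are extracted as regular files
    if kind == "regular file" and name.endswith("/"):
        return "directory"  # backward-compatibility rule for old archives
    return kind
```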
253 |
|
---|
254 | Other values are reserved for specification in future revisions of
|
---|
255 | the P1003 standard, and should not be used by any @command{tar} program.
|
---|
256 |
|
---|
257 | The @code{magic} field indicates that this archive was output in
|
---|
258 | the P1003 archive format. If this field contains @code{TMAGIC},
|
---|
259 | the @code{uname} and @code{gname} fields will contain the ASCII
|
---|
260 | representation of the owner and group of the file respectively.
|
---|
261 | If these names are known locally, their user and group IDs are used rather than the values in
|
---|
262 | the @code{uid} and @code{gid} fields.
|
---|
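The preference for names over numeric IDs can be sketched as follows. In this Python illustration the @code{known_users} mapping is a hypothetical stand-in for the user database of the system doing the extraction, and the magic test simply checks for the @samp{ustar} prefix.

```python
def resolve_owner(magic: bytes, uname: str, uid: int, known_users: dict) -> int:
    """Pick the numeric owner ID to use when restoring a member."""
    if magic.startswith(b"ustar") and uname in known_users:
        return known_users[uname]  # the name is known locally: prefer it
    return uid  # otherwise fall back to the numeric field
```

The same logic applies to @code{gname} and @code{gid}.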
263 |
|
---|
264 | For references, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990, pages
|
---|
265 | 169-173 (section 10.1) for @cite{Archive/Interchange File Format}; and
|
---|
266 | IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
|
---|
267 | (section E.4.48) for @cite{pax - Portable archive interchange}.
|
---|
268 |
|
---|
269 | @node Extensions
|
---|
270 | @unnumberedsec @acronym{GNU} Extensions to the Archive Format
|
---|
271 | @UNREVISED
|
---|
272 |
|
---|
273 | The @acronym{GNU} format uses additional file types to describe new types of
|
---|
274 | files in an archive. These are listed below.
|
---|
275 |
|
---|
276 | @table @code
|
---|
277 | @item GNUTYPE_DUMPDIR
|
---|
278 | @itemx 'D'
|
---|
279 | This represents a directory and a list of files created by the
|
---|
280 | @option{--incremental} (@option{-G}) option. The @code{size} field gives the total
|
---|
281 | size of the associated list of files. Each file name is preceded by
|
---|
282 | either a @samp{Y} (the file should be in this archive) or an @samp{N}
|
---|
283 | (the file is a directory, or is not stored in the archive). Each file
|
---|
284 | name is terminated by a null. There is an additional null after the
|
---|
285 | last file name.
|
---|
286 |
|
---|
287 | @item GNUTYPE_MULTIVOL
|
---|
288 | @itemx 'M'
|
---|
289 | This represents a file continued from another volume of a multi-volume
|
---|
290 | archive created with the @option{--multi-volume} (@option{-M}) option. The original
|
---|
291 | type of the file is not given here. The @code{size} field gives the
|
---|
292 | maximum size of this piece of the file (assuming the volume does
|
---|
293 | not end before the file is written out). The @code{offset} field
|
---|
294 | gives the offset from the beginning of the file where this part of
|
---|
295 | the file begins. Thus @code{size} plus @code{offset} should equal
|
---|
296 | the original size of the file.
|
---|
297 |
|
---|
298 | @item GNUTYPE_SPARSE
|
---|
299 | @itemx 'S'
|
---|
300 | This flag indicates that we are dealing with a sparse file. Note
|
---|
301 | that archiving a sparse file requires special operations to find
|
---|
302 | holes in the file and to record the positions of these holes, along
|
---|
303 | with the number of bytes of data to be found after each hole.
|
---|
304 |
|
---|
305 | @item GNUTYPE_VOLHDR
|
---|
306 | @itemx 'V'
|
---|
307 | This file type is used to mark the volume header that was given with
|
---|
308 | the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option when the archive was created. The @code{name}
|
---|
309 | field contains the label given after the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option.
|
---|
310 | The @code{size} field is zero. Only the first file in each volume
|
---|
311 | of an archive should have this type.
|
---|
312 |
|
---|
313 | @end table
|
---|
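As an example of reading one of these extensions, the file list carried by a @code{GNUTYPE_DUMPDIR} member can be split as in the Python sketch below. It handles only the @samp{Y} and @samp{N} prefixes described in this section, and the function name @code{parse_dumpdir} is hypothetical.

```python
def parse_dumpdir(payload: bytes):
    """Return (flag, name) pairs from a dumpdir file list."""
    entries = []
    for item in payload.split(b"\0"):
        if not item:
            break  # empty item: the additional null after the last name
        flag = item[:1].decode("ascii", "replace")
        name = item[1:].decode("utf-8", "replace")
        entries.append((flag, name))
    return entries
```

The length of @code{payload} is the value of the @code{size} field of the member.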
314 |
|
---|
315 | You may have trouble reading a @acronym{GNU} format archive on a
|
---|
316 | non-@acronym{GNU} system if the options @option{--incremental} (@option{-G}),
|
---|
317 | @option{--multi-volume} (@option{-M}), @option{--sparse} (@option{-S}), or @option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) were
|
---|
318 | used when writing the archive. In general, if @command{tar} does not
|
---|
319 | use the @acronym{GNU}-added fields of the header, other versions of
|
---|
320 | @command{tar} should be able to read the archive. Otherwise, the
|
---|
321 | @command{tar} program will give an error, the most likely one being a
|
---|
322 | checksum error.
|
---|
323 |
|
---|
324 | @node Sparse Formats
|
---|
325 | @unnumberedsec Storing Sparse Files
|
---|
326 | @include sparse.texi
|
---|
327 |
|
---|
328 | @node Snapshot Files
|
---|
329 | @unnumberedsec Format of the Incremental Snapshot Files
|
---|
330 | @include snapshot.texi
|
---|
331 |
|
---|
332 | @node Dumpdir
|
---|
333 | @unnumberedsec Dumpdir
|
---|
334 | @include dumpdir.texi
|
---|
335 |
|
---|