\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename wget.info
@include version.texi
@set UPDATED Apr 2005
@settitle GNU Wget @value{VERSION} Manual
@c Disable the monstrous rectangles beside overfull hbox-es.
@finalout
@c Use `odd' to print double-sided.
@setchapternewpage on
@c %**end of header

@iftex
@c Remove this if you don't use A4 paper.
@afourpaper
@end iftex

@c Title for man page. The weird way texi2pod.pl is written requires
@c the preceding @set.
@set Wget Wget
@c man title Wget The non-interactive network downloader.

@dircategory Network Applications
@direntry
* Wget: (wget). The non-interactive network downloader.
@end direntry

@ifnottex
This file documents the GNU Wget utility for downloading network
data.

@c man begin COPYRIGHT
Copyright @copyright{} 1996--2005 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``GNU Free
Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''.
@c man end
@end ifnottex

@titlepage
@title GNU Wget @value{VERSION}
@subtitle The non-interactive download utility
@subtitle Updated for Wget @value{VERSION}, @value{UPDATED}
@author by Hrvoje Nik@v{s}i@'{c} and others

@ignore
@c man begin AUTHOR
Originally written by Hrvoje Niksic <hniksic@@xemacs.org>.
@c man end
@c man begin SEEALSO
GNU Info entry for @file{wget}.
@c man end
@end ignore

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1996--2005, Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``GNU Free
Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''.
@end titlepage

@ifnottex
@node Top
@top Wget @value{VERSION}

This manual documents version @value{VERSION} of GNU Wget, the freely
available utility for network downloads.

Copyright @copyright{} 1996--2005 Free Software Foundation, Inc.

@menu
* Overview:: Features of Wget.
* Invoking:: Wget command-line arguments.
* Recursive Download:: Downloading interlinked pages.
* Following Links:: The available methods of chasing links.
* Time-Stamping:: Mirroring according to time-stamps.
* Startup File:: Wget's initialization file.
* Examples:: Examples of usage.
* Various:: The stuff that doesn't fit anywhere else.
* Appendices:: Some useful references.
* Copying:: You may give out copies of Wget and of this manual.
* Concept Index:: Topics covered by this manual.
@end menu
@end ifnottex

@node Overview
@chapter Overview
@cindex overview
@cindex features

@c man begin DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from
the Web. It supports @sc{http}, @sc{https}, and @sc{ftp} protocols, as
well as retrieval through @sc{http} proxies.

@c man end
This chapter is a partial overview of Wget's features.

@itemize @bullet
@item
@c man begin DESCRIPTION
Wget is non-interactive, meaning that it can work in the background,
while the user is not logged on. This allows you to start a retrieval
and disconnect from the system, letting Wget finish the work. By
contrast, most Web browsers require the user's constant presence,
which can be a great hindrance when transferring a lot of data.
@c man end

@item
@ignore
@c man begin DESCRIPTION

@c man end
@end ignore
@c man begin DESCRIPTION
Wget can follow links in @sc{html} and @sc{xhtml} pages and create local
versions of remote web sites, fully recreating the directory structure of
the original site. This is sometimes referred to as ``recursive
downloading.'' While doing that, Wget respects the Robot Exclusion
Standard (@file{/robots.txt}). Wget can be instructed to convert the
links in downloaded @sc{html} files to the local files for offline
viewing.
@c man end

@item
File name wildcard matching and recursive mirroring of directories are
available when retrieving via @sc{ftp}. Wget can read the time-stamp
information given by both @sc{http} and @sc{ftp} servers, and store it
locally. Thus Wget can see if the remote file has changed since last
retrieval, and automatically retrieve the new version if it has. This
makes Wget suitable for mirroring of @sc{ftp} sites, as well as home
pages.

@item
@ignore
@c man begin DESCRIPTION

@c man end
@end ignore
@c man begin DESCRIPTION
Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved. If the server
supports regetting, it will instruct the server to continue the
download from where it left off.
@c man end

@item
Wget supports proxy servers, which can lighten the network load, speed
up retrieval and provide access behind firewalls. However, if you are
behind a firewall that requires that you use a socks style gateway,
you can get the socks library and build Wget with support for socks.
Wget uses passive @sc{ftp} downloading by default, active @sc{ftp}
being an option.

@item
Wget supports IP version 6, the next generation of IP. IPv6 is
autodetected at compile-time, and can be disabled at either build or
run time. Binaries built with IPv6 support work well in both
IPv4-only and dual family environments.

@item
Built-in features offer mechanisms to tune which links you wish to follow
(@pxref{Following Links}).

@item
The progress of individual downloads is traced using a progress gauge.
Interactive downloads are tracked using a ``thermometer''-style gauge,
whereas non-interactive ones are traced with dots, each dot
representing a fixed amount of data received (1KB by default). Either
gauge can be customized to your preferences.

@item
Most of the features are fully configurable, either through command line
options, or via the initialization file @file{.wgetrc} (@pxref{Startup
File}). Wget allows you to define @dfn{global} startup files
(@file{/usr/local/etc/wgetrc} by default) for site settings.

@ignore
@c man begin FILES
@table @samp
@item /usr/local/etc/wgetrc
Default location of the @dfn{global} startup file.

@item .wgetrc
User startup file.
@end table
@c man end
@end ignore

@item
Finally, GNU Wget is free software. This means that everyone may use
it, redistribute it and/or modify it under the terms of the GNU General
Public License, as published by the Free Software Foundation
(@pxref{Copying}).
@end itemize

@node Invoking
@chapter Invoking
@cindex invoking
@cindex command line
@cindex arguments
@cindex nohup

By default, Wget is very simple to invoke. The basic syntax is:

@example
@c man begin SYNOPSIS
wget [@var{option}]@dots{} [@var{URL}]@dots{}
@c man end
@end example

Wget will simply download all the @sc{url}s specified on the command
line. @var{URL} is a @dfn{Uniform Resource Locator}, as defined below.

However, you may wish to change some of the default parameters of
Wget. You can do it two ways: permanently, adding the appropriate
command to @file{.wgetrc} (@pxref{Startup File}), or specifying it on
the command line.

@menu
* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* HTTPS (SSL/TLS) Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::
@end menu

@node URL Format
@section URL Format
@cindex URL
@cindex URL syntax

@dfn{URL} is an acronym for Uniform Resource Locator. A uniform
resource locator is a compact string representation for a resource
available via the Internet. Wget recognizes the @sc{url} syntax as per
@sc{rfc1738}. This is the most widely used form (square brackets denote
optional parts):

@example
http://host[:port]/directory/file
ftp://host[:port]/directory/file
@end example

You can also encode your username and password within a @sc{url}:

@example
ftp://user:password@@host/path
http://user:password@@host/path
@end example

Either @var{user} or @var{password}, or both, may be left out. If you
leave out either the @sc{http} username or password, no authentication
will be sent. If you leave out the @sc{ftp} username, @samp{anonymous}
will be used. If you leave out the @sc{ftp} password, your email
address will be supplied as a default password.@footnote{If you have a
@file{.netrc} file in your home directory, password will also be
searched for there.}

@strong{Important Note}: if you specify a password-containing @sc{url}
on the command line, the username and password will be plainly visible
to all users on the system, by way of @code{ps}. On multi-user systems,
this is a big security risk. To work around it, use @code{wget -i -}
and feed the @sc{url}s to Wget's standard input, each on a separate
line, terminated by @kbd{C-d}.
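
For example, the following reads the password-containing @sc{url} from
standard input instead of the command line (the host name and
credentials are only illustrative):

@example
echo 'ftp://user:password@@host/path' | wget -i -
@end example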

You can encode unsafe characters in a @sc{url} as @samp{%xy}, @code{xy}
being the hexadecimal representation of the character's @sc{ascii}
value. Some common unsafe characters include @samp{%} (quoted as
@samp{%25}), @samp{:} (quoted as @samp{%3A}), and @samp{@@} (quoted as
@samp{%40}). Refer to @sc{rfc1738} for a comprehensive list of unsafe
characters.
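
For instance, a password containing a literal @samp{@@} character could
be quoted as @samp{%40} so that the rest of the @sc{url} is parsed
correctly (the credentials below are purely illustrative):

@example
wget 'ftp://user:p%40ss@@host/path'
@end example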

Wget also supports the @code{type} feature for @sc{ftp} @sc{url}s. By
default, @sc{ftp} documents are retrieved in the binary mode (type
@samp{i}), which means that they are downloaded unchanged. Another
useful mode is the @samp{a} (@dfn{ASCII}) mode, which converts the line
delimiters between the different operating systems, and is thus useful
for text files. Here is an example:

@example
ftp://host/directory/file;type=a
@end example

Two alternative variants of @sc{url} specification are also supported,
because of historical (hysterical?) reasons and their widespread use.

@sc{ftp}-only syntax (supported by @code{NcFTP}):
@example
host:/dir/file
@end example

@sc{http}-only syntax (introduced by @code{Netscape}):
@example
host[:port]/dir/file
@end example

These two alternative forms are deprecated, and may cease being
supported in the future.

If you do not understand the difference between these notations, or do
not know which one to use, just use the plain ordinary format you use
with your favorite browser, like @code{Lynx} or @code{Netscape}.

@c man begin OPTIONS

@node Option Syntax
@section Option Syntax
@cindex option syntax
@cindex syntax of options

Since Wget uses GNU getopt to process command-line arguments, every
option has a long form along with the short one. Long options are
more convenient to remember, but take time to type. You may freely
mix different option styles, or specify options after the command-line
arguments. Thus you may write:

@example
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
@end example

The space between the option accepting an argument and the argument may
be omitted. Instead of @samp{-o log} you can write @samp{-olog}.

You may put several options that do not require arguments together,
like:

@example
wget -drc @var{URL}
@end example

This is a complete equivalent of:

@example
wget -d -r -c @var{URL}
@end example

Since the options can be specified after the arguments, you may
terminate them with @samp{--}. So the following will try to download
@sc{url} @samp{-x}, reporting failure to @file{log}:

@example
wget -o log -- -x
@end example

The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the @file{.wgetrc} settings. For instance, if your @file{.wgetrc}
sets @code{exclude_directories} to @file{/cgi-bin}, the following
example will first reset it, and then set it to exclude @file{/~nobody}
and @file{/~somebody}. You can also clear the lists in @file{.wgetrc}
(@pxref{Wgetrc Syntax}).

@example
wget -X '' -X /~nobody,/~somebody
@end example

Most options that do not accept arguments are @dfn{boolean} options,
so named because their state can be captured with a yes-or-no
(``boolean'') variable. For example, @samp{--follow-ftp} tells Wget
to follow FTP links from HTML files and, on the other hand,
@samp{--no-glob} tells it not to perform file globbing on FTP URLs. A
boolean option is either @dfn{affirmative} or @dfn{negative}
(beginning with @samp{--no}). All such options share several
properties.

Unless stated otherwise, it is assumed that the default behavior is
the opposite of what the option accomplishes. For example, the
documented existence of @samp{--follow-ftp} assumes that the default
is to @emph{not} follow FTP links from HTML pages.

Affirmative options can be negated by prepending the @samp{--no-} to
the option name; negative options can be negated by omitting the
@samp{--no-} prefix. This might seem superfluous---if the default for
an affirmative option is to not do something, then why provide a way
to explicitly turn it off? But the startup file may in fact change
the default. For instance, using @code{follow_ftp = on} in
@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and
using @samp{--no-follow-ftp} is the only way to restore the factory
default from the command line.
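
For example, with a @file{.wgetrc} containing the line @code{follow_ftp
= on}, the factory default can be restored for a single run like this
(the @sc{url} is illustrative):

@example
wget --no-follow-ftp -r http://www.example.com/
@end example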

@node Basic Startup Options
@section Basic Startup Options

@table @samp
@item -V
@itemx --version
Display the version of Wget.

@item -h
@itemx --help
Print a help message describing all of Wget's command-line options.

@item -b
@itemx --background
Go to background immediately after startup. If no output file is
specified via @samp{-o}, output is redirected to @file{wget-log}.

@cindex execute wgetrc command
@item -e @var{command}
@itemx --execute @var{command}
Execute @var{command} as if it were a part of @file{.wgetrc}
(@pxref{Startup File}). A command thus invoked will be executed
@emph{after} the commands in @file{.wgetrc}, thus taking precedence over
them. If you need to specify more than one wgetrc command, use multiple
instances of @samp{-e}.
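
For instance, the following (illustrative) invocation sets two wgetrc
commands from the command line, pausing two seconds between retrievals
and limiting retries to five:

@example
wget -e wait=2 -e tries=5 -r http://www.example.com/
@end example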

@end table

@node Logging and Input File Options
@section Logging and Input File Options

@table @samp
@cindex output file
@cindex log file
@item -o @var{logfile}
@itemx --output-file=@var{logfile}
Log all messages to @var{logfile}. The messages are normally reported
to standard error.

@cindex append to log
@item -a @var{logfile}
@itemx --append-output=@var{logfile}
Append to @var{logfile}. This is the same as @samp{-o}, only it appends
to @var{logfile} instead of overwriting the old log file. If
@var{logfile} does not exist, a new file is created.

@cindex debug
@item -d
@itemx --debug
Turn on debug output, meaning various information important to the
developers of Wget if it does not work properly. Your system
administrator may have chosen to compile Wget without debug support, in
which case @samp{-d} will not work. Please note that compiling with
debug support is always safe---Wget compiled with the debug support will
@emph{not} print any debug info unless requested with @samp{-d}.
@xref{Reporting Bugs}, for more information on how to use @samp{-d} for
sending bug reports.

@cindex quiet
@item -q
@itemx --quiet
Turn off Wget's output.

@cindex verbose
@item -v
@itemx --verbose
Turn on verbose output, with all the available data. The default output
is verbose.

@item -nv
@itemx --no-verbose
Turn off verbose without being completely quiet (use @samp{-q} for
that), which means that error messages and basic information still get
printed.

@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
Read @sc{url}s from @var{file}. If @samp{-} is specified as
@var{file}, @sc{url}s are read from the standard input. (Use
@samp{./-} to read from a file literally named @samp{-}.)

If this function is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
file, those on the command line will be the first ones to be
retrieved. The @var{file} need not be an @sc{html} document (but no
harm if it is)---it is enough if the @sc{url}s are just listed
sequentially.

However, if you specify @samp{--force-html}, the document will be
regarded as @samp{html}. In that case you may have problems with
relative links, which you can solve either by adding @code{<base
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.

@cindex force html
@item -F
@itemx --force-html
When input is read from a file, force it to be treated as an @sc{html}
file. This enables you to retrieve relative links from existing
@sc{html} files on your local disk, by adding @code{<base
href="@var{url}">} to @sc{html}, or using the @samp{--base} command-line
option.

@cindex base for relative links in input file
@item -B @var{URL}
@itemx --base=@var{URL}
Prepends @var{URL} to relative links read from the file specified with
the @samp{-i} option.
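
For example, to resolve relative links found in a locally saved
listing, you could combine the two options like this (the file name and
base @sc{url} are illustrative):

@example
wget -F -B http://www.example.com/ -i saved-listing.html
@end example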
@end table

@node Download Options
@section Download Options

@table @samp
@cindex bind address
@cindex client IP address
@cindex IP address, client
@item --bind-address=@var{ADDRESS}
When making client TCP/IP connections, bind to @var{ADDRESS} on
the local machine. @var{ADDRESS} may be specified as a hostname or IP
address. This option can be useful if your machine is bound to multiple
IPs.

@cindex retries
@cindex tries
@cindex number of retries
@item -t @var{number}
@itemx --tries=@var{number}
Set number of retries to @var{number}. Specify 0 or @samp{inf} for
infinite retrying. The default is to retry 20 times, with the exception
of fatal errors like ``connection refused'' or ``not found'' (404),
which are not retried.

@item -O @var{file}
@itemx --output-document=@var{file}
The documents will not be written to the appropriate files, but all
will be concatenated together and written to @var{file}. If @samp{-}
is used as @var{file}, documents will be printed to standard output,
disabling link conversion. (Use @samp{./-} to print to a file
literally named @samp{-}.)

Note that a combination with @samp{-k} is only well-defined for
downloading a single document.

@cindex clobbering, file
@cindex downloading multiple times
@cindex no-clobber
@item -nc
@itemx --no-clobber
If a file is downloaded more than once in the same directory, Wget's
behavior depends on a few options, including @samp{-nc}. In certain
cases, the local file will be @dfn{clobbered}, or overwritten, upon
repeated download. In other cases it will be preserved.

When running Wget without @samp{-N}, @samp{-nc}, or @samp{-r},
downloading the same file in the same directory will result in the
original copy of @var{file} being preserved and the second copy being
named @samp{@var{file}.1}. If that file is downloaded yet again, the
third copy will be named @samp{@var{file}.2}, and so on. When
@samp{-nc} is specified, this behavior is suppressed, and Wget will
refuse to download newer copies of @samp{@var{file}}. Therefore,
``@code{no-clobber}'' is actually a misnomer in this mode---it's not
clobbering that's prevented (as the numeric suffixes were already
preventing clobbering), but rather the multiple version saving that's
prevented.

When running Wget with @samp{-r}, but without @samp{-N} or @samp{-nc},
re-downloading a file will result in the new copy simply overwriting the
old. Adding @samp{-nc} will prevent this behavior, instead causing the
original version to be preserved and any newer copies on the server to
be ignored.

When running Wget with @samp{-N}, with or without @samp{-r}, the
decision as to whether or not to download a newer copy of a file depends
on the local and remote timestamp and size of the file
(@pxref{Time-Stamping}). @samp{-nc} may not be specified at the same
time as @samp{-N}.

Note that when @samp{-nc} is specified, files with the suffixes
@samp{.html} or @samp{.htm} will be loaded from the local disk and
parsed as if they had been retrieved from the Web.

@cindex continue retrieval
@cindex incomplete downloads
@cindex resume download
@item -c
@itemx --continue
Continue getting a partially-downloaded file. This is useful when you
want to finish up a download started by a previous instance of Wget, or
by another program. For instance:

@example
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
@end example

If there is a file named @file{ls-lR.Z} in the current directory, Wget
will assume that it is the first portion of the remote file, and will
ask the server to continue the retrieval from an offset equal to the
length of the local file.

Note that you don't need to specify this option if you just want the
current invocation of Wget to retry downloading a file should the
connection be lost midway through. This is the default behavior.
@samp{-c} only affects resumption of downloads started @emph{prior} to
this invocation of Wget, and whose local files are still sitting around.

Without @samp{-c}, the previous example would just download the remote
file to @file{ls-lR.Z.1}, leaving the truncated @file{ls-lR.Z} file
alone.

Beginning with Wget 1.7, if you use @samp{-c} on a non-empty file, and
it turns out that the server does not support continued downloading,
Wget will refuse to start the download from scratch, which would
effectively ruin existing contents. If you really want the download to
start from scratch, remove the file.

Also beginning with Wget 1.7, if you use @samp{-c} on a file which is of
equal size as the one on the server, Wget will refuse to download the
file and print an explanatory message. The same happens when the file
is smaller on the server than locally (presumably because it was changed
on the server since your last download attempt)---because ``continuing''
is not meaningful, no download occurs.

On the other side of the coin, while using @samp{-c}, any file that's
bigger on the server than locally will be considered an incomplete
download and only @code{(length(remote) - length(local))} bytes will be
downloaded and tacked onto the end of the local file. This behavior can
be desirable in certain cases---for instance, you can use @samp{wget -c}
to download just the new portion that's been appended to a data
collection or log file.

However, if the file is bigger on the server because it's been
@emph{changed}, as opposed to just @emph{appended} to, you'll end up
with a garbled file. Wget has no way of verifying that the local file
is really a valid prefix of the remote file. You need to be especially
careful of this when using @samp{-c} in conjunction with @samp{-r},
since every file will be considered an ``incomplete download'' candidate.

Another instance where you'll get a garbled file if you try to use
@samp{-c} is if you have a lame @sc{http} proxy that inserts a
``transfer interrupted'' string into the local file. In the future a
``rollback'' option may be added to deal with this case.

Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
servers that support the @code{Range} header.

@cindex progress indicator
@cindex dot style
@item --progress=@var{type}
Select the type of the progress indicator you wish to use. Legal
indicators are ``dot'' and ``bar''.

The ``bar'' indicator is used by default. It draws an @sc{ascii} progress
bar graphic (a.k.a.@: ``thermometer'' display) indicating the status of
retrieval. If the output is not a TTY, the ``dot'' indicator will be
used by default.

Use @samp{--progress=dot} to switch to the ``dot'' display. It traces
the retrieval by printing dots on the screen, each dot representing a
fixed amount of downloaded data.

When using the dotted retrieval, you may also set the @dfn{style} by
specifying the type as @samp{dot:@var{style}}. Different styles assign
different meaning to one dot. With the @code{default} style each dot
represents 1K, there are ten dots in a cluster and 50 dots in a line.
The @code{binary} style has a more ``computer''-like orientation---8K
dots, 16-dots clusters and 48 dots per line (which makes for 384K per
line). The @code{mega} style is suitable for downloading very large
files---each dot represents 64K retrieved, there are eight dots in a
cluster, and 48 dots on each line (so each line contains 3M).

Note that you can set the default style using the @code{progress}
command in @file{.wgetrc}. That setting may be overridden from the
command line. The exception is that, when the output is not a TTY, the
``dot'' progress will be favored over ``bar''. To force the bar output,
use @samp{--progress=bar:force}.
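
For example, to keep the ``bar'' indicator even when the output is
redirected to a file, or to use the @code{mega} dot style for a large
download, one might type (the @sc{url}s are illustrative):

@example
wget --progress=bar:force http://www.example.com/big.iso 2>progress.log
wget --progress=dot:mega http://www.example.com/big.iso
@end example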

@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.

@cindex server response, print
@item -S
@itemx --server-response
Print the headers sent by @sc{http} servers and responses sent by
@sc{ftp} servers.

@cindex Wget as spider
@cindex spider
@item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they
are there. For example, you can use Wget to check your bookmarks:

@example
wget --spider --force-html -i bookmarks.html
@end example

This feature needs much more work for Wget to get close to the
functionality of real web spiders.

@cindex timeout
@item -T @var{seconds}
@itemx --timeout=@var{seconds}
Set the network timeout to @var{seconds} seconds. This is equivalent
to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and
@samp{--read-timeout}, all at the same time.

When interacting with the network, Wget can check for timeout and
abort the operation if it takes too long. This prevents anomalies
like hanging reads and infinite connects. The only timeout enabled by
default is a 900-second read timeout. Setting a timeout to 0 disables
it altogether. Unless you know what you are doing, it is best not to
change the default timeout settings.

All timeout-related options accept decimal values, as well as
subsecond values. For example, @samp{0.1} seconds is a legal (though
unwise) choice of timeout. Subsecond timeouts are useful for checking
server response times or for testing network latency.
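
For example, the following two (illustrative) invocations are
equivalent, the second simply spelling out the three individual
timeouts set by the first:

@example
wget --timeout=60 http://www.example.com/file
wget --dns-timeout=60 --connect-timeout=60 --read-timeout=60 \
     http://www.example.com/file
@end example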

@cindex DNS timeout
@cindex timeout, DNS
@item --dns-timeout=@var{seconds}
Set the DNS lookup timeout to @var{seconds} seconds. DNS lookups that
don't complete within the specified time will fail. By default, there
is no timeout on DNS lookups, other than that implemented by system
libraries.

@cindex connect timeout
@cindex timeout, connect
@item --connect-timeout=@var{seconds}
Set the connect timeout to @var{seconds} seconds. TCP connections that
take longer to establish will be aborted. By default, there is no
connect timeout, other than that implemented by system libraries.

@cindex read timeout
@cindex timeout, read
@item --read-timeout=@var{seconds}
Set the read (and write) timeout to @var{seconds} seconds. The
``time'' of this timeout refers to @dfn{idle time}: if, at any point in
the download, no data is received for more than the specified number
of seconds, reading fails and the download is restarted. This option
does not directly affect the duration of the entire download.

Of course, the remote server may choose to terminate the connection
sooner than this option requires. The default read timeout is 900
seconds.

@cindex bandwidth, limit
@cindex rate, limit
@cindex limit bandwidth
@item --limit-rate=@var{amount}
Limit the download speed to @var{amount} bytes per second. Amount may
be expressed in bytes, kilobytes with the @samp{k} suffix, or megabytes
with the @samp{m} suffix. For example, @samp{--limit-rate=20k} will
limit the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction
with power suffixes; for example, @samp{--limit-rate=2.5k} is a legal
value.

Note that Wget implements the limiting by sleeping the appropriate
amount of time after a network read that took less time than specified
by the rate. Eventually this strategy causes the TCP transfer to slow
down to approximately the specified rate. However, it may take some
time for this balance to be achieved, so don't be surprised if limiting
the rate doesn't work well with very small files.

@cindex pause
@cindex wait
@item -w @var{seconds}
@itemx --wait=@var{seconds}
Wait the specified number of seconds between the retrievals. Use of
this option is recommended, as it lightens the server load by making the
requests less frequent. Instead of in seconds, the time can be
specified in minutes using the @code{m} suffix, in hours using @code{h}
suffix, or in days using @code{d} suffix.

Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry.
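
For instance, the following (illustrative) command sleeps two minutes
between successive retrievals of the listed @sc{url}s:

@example
wget -w 2m -i urls.txt
@end example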

@cindex retries, waiting between
@cindex waiting between retries
@item --waitretry=@var{seconds}
If you don't want Wget to wait between @emph{every} retrieval, but only
between retries of failed downloads, you can use this option. Wget will
use @dfn{linear backoff}, waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
seconds per file.

Note that this option is turned on by default in the global
@file{wgetrc} file.

@cindex wait, random
@cindex random wait
@item --random-wait
Some web sites may perform log analysis to identify retrieval programs
such as Wget by looking for statistically significant similarities in
the time between requests. This option causes the time between requests
to vary between 0 and 2 * @var{wait} seconds, where @var{wait} was
specified using the @samp{--wait} option, in order to mask Wget's
presence from such analysis.

A recent article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly.
Its author suggested blocking at the class C address level to ensure
automated retrieval programs were blocked despite changing DHCP-supplied
addresses.

The @samp{--random-wait} option was inspired by this ill-advised
recommendation to block many unrelated users from a web site due to the
actions of one.

@cindex proxy
@itemx --no-proxy
Don't use proxies, even if the appropriate @code{*_proxy} environment
variable is defined.

For more information about the use of proxies with Wget, @xref{Proxies}.

@cindex quota
@item -Q @var{quota}
@itemx --quota=@var{quota}
Specify download quota for automatic retrievals. The value can be
specified in bytes (default), kilobytes (with @samp{k} suffix), or
megabytes (with @samp{m} suffix).

Note that quota will never affect downloading a single file. So if you
specify @samp{wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz}, all of the
@file{ls-lR.gz} will be downloaded. The same goes even when several
@sc{url}s are specified on the command-line. However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type @samp{wget -Q2m -i sites}---download will be
aborted when the quota is exceeded.

Setting quota to 0 or to @samp{inf} unlimits the download quota.

@cindex DNS cache
@cindex caching of DNS lookups
@item --no-dns-cache
Turn off caching of DNS lookups. Normally, Wget remembers the IP
addresses it looked up from DNS so it doesn't have to repeatedly
contact the DNS server for the same (typically small) set of hosts it
retrieves from. This cache exists in memory only; a new Wget run will
contact DNS again.

However, it has been reported that in some situations it is not
desirable to cache host names, even for the duration of a
short-running application like Wget. With this option Wget issues a
new DNS lookup (more precisely, a new call to @code{gethostbyname} or
@code{getaddrinfo}) each time it makes a new connection. Please note
that this option will @emph{not} affect caching that might be
performed by the resolving library or by an external caching layer,
such as NSCD.

If you don't understand exactly what this option does, you probably
won't need it.

@cindex file names, restrict
@cindex Windows file names
@item --restrict-file-names=@var{mode}
Change which characters found in remote URLs may show up in local file
names generated from those URLs. Characters that are @dfn{restricted}
by this option are escaped, i.e. replaced with @samp{%HH}, where
@samp{HH} is the hexadecimal number that corresponds to the restricted
character.

By default, Wget escapes the characters that are not valid as part of
file names on your operating system, as well as control characters that
are typically unprintable. This option is useful for changing these
defaults, either because you are downloading to a non-native partition,
or because you want to disable escaping of the control characters.

When mode is set to ``unix'', Wget escapes the character @samp{/} and
the control characters in the ranges 0--31 and 128--159. This is the
default on Unix-like OS'es.

When mode is set to ``windows'', Wget escapes the characters @samp{\},
@samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<},
@samp{>}, and the control characters in the ranges 0--31 and 128--159.
In addition to this, Wget in Windows mode uses @samp{+} instead of
@samp{:} to separate host and port in local file names, and uses
@samp{@@} instead of @samp{?} to separate the query portion of the file
name from the rest. Therefore, a URL that would be saved as
@samp{www.xemacs.org:4300/search.pl?input=blah} in Unix mode would be
saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows
mode. This mode is the default on Windows.

If you append @samp{,nocontrol} to the mode, as in
@samp{unix,nocontrol}, escaping of the control characters is also
switched off. You can use @samp{--restrict-file-names=nocontrol} to
turn off escaping of control characters without affecting the choice of
the OS to use as file name restriction mode.
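
For example, to download into a Windows-compatible file system from a
Unix-like system while leaving control characters unescaped, one might
use (the @sc{url} is illustrative):

@example
wget --restrict-file-names=windows,nocontrol -r http://www.example.com/
@end example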

@cindex IPv6
@itemx -4
@itemx --inet4-only
@itemx -6
@itemx --inet6-only
Force connecting to IPv4 or IPv6 addresses. With @samp{--inet4-only}
or @samp{-4}, Wget will only connect to IPv4 hosts, ignoring AAAA
records in DNS, and refusing to connect to IPv6 addresses specified in
URLs. Conversely, with @samp{--inet6-only} or @samp{-6}, Wget will
only connect to IPv6 hosts and ignore A records and IPv4 addresses.

Neither option should be needed normally. By default, an IPv6-aware
Wget will use the address family specified by the host's DNS record.
If the DNS responds with both IPv4 and IPv6 addresses, Wget will try them
in sequence until it finds one it can connect to. (Also see
@code{--prefer-family} option described below.)

These options can be used to deliberately force the use of IPv4 or
IPv6 address families on dual family systems, usually to aid debugging
or to deal with broken network configuration. Only one of
@samp{--inet6-only} and @samp{--inet4-only} may be specified at the
same time. Neither option is available in Wget compiled without IPv6
support.

@item --prefer-family=IPv4/IPv6/none
When given a choice of several addresses, connect to the addresses
with specified address family first. IPv4 addresses are preferred by
default.

This avoids spurious errors and connect attempts when accessing hosts
that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For
example, @samp{www.kame.net} resolves to
@samp{2001:200:0:8002:203:47ff:fea5:3085} and to
@samp{203.178.141.194}. When the preferred family is @code{IPv4}, the
IPv4 address is used first; when the preferred family is @code{IPv6},
the IPv6 address is used first; if the specified value is @code{none},
the address order returned by DNS is used without change.

Unlike @samp{-4} and @samp{-6}, this option doesn't inhibit access to
any address family, it only changes the @emph{order} in which the
addresses are accessed. Also note that the reordering performed by
this option is @dfn{stable}---it doesn't affect order of addresses of
the same family. That is, the relative order of all IPv4 addresses
and of all IPv6 addresses remains intact in all cases.

@item --retry-connrefused
Consider ``connection refused'' a transient error and try again.
Normally Wget gives up on a URL when it is unable to connect to the
site because failure to connect is taken as a sign that the server is
not running at all and that retries would not help. This option is
for mirroring unreliable sites whose servers tend to disappear for
short periods of time.

@cindex user
@cindex password
@cindex authentication
@item --user=@var{user}
@itemx --password=@var{password}
Specify the username @var{user} and password @var{password} for both
@sc{ftp} and @sc{http} file retrieval. These parameters can be overridden
using the @samp{--ftp-user} and @samp{--ftp-password} options for
@sc{ftp} connections and the @samp{--http-user} and @samp{--http-password}
options for @sc{http} connections.
@end table

@node Directory Options
@section Directory Options

@table @samp
@item -nd
@itemx --no-directories
Do not create a hierarchy of directories when retrieving recursively.
With this option turned on, all files will get saved to the current
directory, without clobbering (if a name shows up more than once, the
filenames will get extensions @samp{.n}).

@item -x
@itemx --force-directories
The opposite of @samp{-nd}---create a hierarchy of directories, even if
one would not have been created otherwise. E.g. @samp{wget -x
http://fly.srk.fer.hr/robots.txt} will save the downloaded file to
@file{fly.srk.fer.hr/robots.txt}.

@item -nH
@itemx --no-host-directories
Disable generation of host-prefixed directories. By default, invoking
Wget with @samp{-r http://fly.srk.fer.hr/} will create a structure of
directories beginning with @file{fly.srk.fer.hr/}. This option disables
such behavior.

@item --protocol-directories
Use the protocol name as a directory component of local file names. For
example, with this option, @samp{wget -r http://@var{host}} will save to
@samp{http/@var{host}/...} rather than just to @samp{@var{host}/...}.

@cindex cut directories
@item --cut-dirs=@var{number}
Ignore @var{number} directory components. This is useful for getting a
fine-grained control over the directory where recursive retrieval will
be saved.

Take, for example, the directory at
@samp{ftp://ftp.xemacs.org/pub/xemacs/}. If you retrieve it with
@samp{-r}, it will be saved locally under
@file{ftp.xemacs.org/pub/xemacs/}. While the @samp{-nH} option can
remove the @file{ftp.xemacs.org/} part, you are still stuck with
@file{pub/xemacs}. This is where @samp{--cut-dirs} comes in handy; it
makes Wget not ``see'' @var{number} remote directory components. Here
are several examples of how @samp{--cut-dirs} option works.

@example
@group
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
@end group
@end example

If you just want to get rid of the directory structure, this option is
similar to a combination of @samp{-nd} and @samp{-P}. However, unlike
@samp{-nd}, @samp{--cut-dirs} does not lose with subdirectories---for
instance, with @samp{-nH --cut-dirs=1}, a @file{beta/} subdirectory will
be placed to @file{xemacs/beta}, as one would expect.

@cindex directory prefix
@item -P @var{prefix}
@itemx --directory-prefix=@var{prefix}
Set directory prefix to @var{prefix}. The @dfn{directory prefix} is the
directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree. The default is @samp{.} (the
current directory).
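
For example, the following (illustrative) command saves the retrieval
tree under @file{/tmp/mirror} instead of the current directory:

@example
wget -P /tmp/mirror -r http://www.example.com/
@end example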
@end table

@node HTTP Options
@section HTTP Options

@table @samp
@cindex .html extension
@item -E
@itemx --html-extension
If a file of type @samp{application/xhtml+xml} or @samp{text/html} is
downloaded and the URL does not end with the regexp
@samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix @samp{.html}
to be appended to the local filename. This is useful, for instance, when
you're mirroring a remote site that uses @samp{.asp} pages, but you want
the mirrored pages to be viewable on your stock Apache server. Another
good use for this is when you're downloading CGI-generated materials. A URL
like @samp{http://site.com/article.cgi?25} will be saved as
@file{article.cgi?25.html}.

Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
@file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since
it doesn't yet know that the URL produces output of type
@samp{text/html} or @samp{application/xhtml+xml}). To prevent this
re-downloading, you must use @samp{-k} and @samp{-K} so that the original
version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive
Retrieval Options}).
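
A typical mirroring invocation combining these options might therefore
look like this (the @sc{url} is illustrative):

@example
wget -r -E -k -K http://www.example.com/
@end example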

@cindex http user
@cindex http password
@cindex authentication
@item --http-user=@var{user}
@itemx --http-password=@var{password}
Specify the username @var{user} and password @var{password} on an
@sc{http} server. According to the type of the challenge, Wget will
encode them using either the @code{basic} (insecure) or the
@code{digest} authentication scheme.

Another way to specify username and password is in the @sc{url} itself
(@pxref{URL Format}). Either method reveals your password to anyone who
bothers to run @code{ps}. To prevent the passwords from being seen,
store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect
those files from other users with @code{chmod}. If the passwords are
really important, do not leave them lying in those files either---edit
the files and delete them after Wget has started the download.
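
As a sketch of that approach, a @file{~/.netrc} entry uses the usual
@samp{machine}/@samp{login}/@samp{password} form (the values below are
illustrative), and the file should be protected with @samp{chmod 600
~/.netrc}:

@example
machine www.example.com login myname password mysecret
@end example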
1089 |
|
---|
1090 | @iftex
|
---|
1091 | For more information about security issues with Wget, @xref{Security
|
---|
1092 | Considerations}.
|
---|
1093 | @end iftex
|
---|
1094 |
|
---|
1095 | @cindex proxy
|
---|
1096 | @cindex cache
|
---|
1097 | @item --no-cache
|
---|
1098 | Disable server-side cache. In this case, Wget will send the remote
|
---|
1099 | server an appropriate directive (@samp{Pragma: no-cache}) to get the
|
---|
1100 | file from the remote service, rather than returning the cached version.
|
---|
1101 | This is especially useful for retrieving and flushing out-of-date
|
---|
1102 | documents on proxy servers.
|
---|
1103 |
|
---|
1104 | Caching is allowed by default.
|
---|
1105 |
|
---|
1106 | @cindex cookies
|
---|
1107 | @item --no-cookies
|
---|
1108 | Disable the use of cookies. Cookies are a mechanism for maintaining
|
---|
1109 | server-side state. The server sends the client a cookie using the
|
---|
1110 | @code{Set-Cookie} header, and the client responds with the same cookie
|
---|
1111 | upon further requests. Since cookies allow the server owners to keep
|
---|
1112 | track of visitors and for sites to exchange this information, some
|
---|
1113 | consider them a breach of privacy. The default is to use cookies;
|
---|
1114 | however, @emph{storing} cookies is not on by default.
|
---|
1115 |
|
---|
1116 | @cindex loading cookies
|
---|
1117 | @cindex cookies, loading
|
---|
1118 | @item --load-cookies @var{file}
|
---|
1119 | Load cookies from @var{file} before the first HTTP retrieval.
|
---|
1120 | @var{file} is a textual file in the format originally used by Netscape's
|
---|
1121 | @file{cookies.txt} file.
|
---|
1122 |
|
---|
1123 | You will typically use this option when mirroring sites that require
|
---|
1124 | that you be logged in to access some or all of their content. The login
|
---|
1125 | process typically works by the web server issuing an @sc{http} cookie
|
---|
1126 | upon receiving and verifying your credentials. The cookie is then
|
---|
1127 | resent by the browser when accessing that part of the site, and so
|
---|
1128 | proves your identity.
|
---|
1129 |
|
---|
1130 | Mirroring such a site requires Wget to send the same cookies your
|
---|
1131 | browser sends when communicating with the site. This is achieved by
|
---|
1132 | @samp{--load-cookies}---simply point Wget to the location of the
|
---|
1133 | @file{cookies.txt} file, and it will send the same cookies your browser
|
---|
1134 | would send in the same situation. Different browsers keep textual
|
---|
1135 | cookie files in different locations:
|
---|
1136 |
|
---|
1137 | @table @asis
|
---|
1138 | @item Netscape 4.x.
|
---|
1139 | The cookies are in @file{~/.netscape/cookies.txt}.
|
---|
1140 |
|
---|
1141 | @item Mozilla and Netscape 6.x.
|
---|
1142 | Mozilla's cookie file is also named @file{cookies.txt}, located
|
---|
1143 | somewhere under @file{~/.mozilla}, in the directory of your profile.
|
---|
1144 | The full path usually ends up looking somewhat like
|
---|
1145 | @file{~/.mozilla/default/@var{some-weird-string}/cookies.txt}.
|
---|
1146 |
|
---|
1147 | @item Internet Explorer.
|
---|
1148 | You can produce a cookie file Wget can use by using the File menu,
|
---|
1149 | Import and Export, Export Cookies. This has been tested with Internet
|
---|
1150 | Explorer 5; it is not guaranteed to work with earlier versions.
|
---|
1151 |
|
---|
1152 | @item Other browsers.
|
---|
1153 | If you are using a different browser to create your cookies,
|
---|
1154 | @samp{--load-cookies} will only work if you can locate or produce a
|
---|
1155 | cookie file in the Netscape format that Wget expects.
|
---|
1156 | @end table
|
---|
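
For instance, to mirror a members-only area using the cookies exported
from a Mozilla profile (the profile path and @sc{url} are only
illustrations):

@example
wget --load-cookies ~/.mozilla/default/@var{some-weird-string}/cookies.txt \
     -m http://www.example.com/members/
@end example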
1157 |
|
---|
1158 | If you cannot use @samp{--load-cookies}, there might still be an
|
---|
1159 | alternative. If your browser supports a ``cookie manager'', you can use
|
---|
1160 | it to view the cookies used when accessing the site you're mirroring.
|
---|
1161 | Write down the name and value of the cookie, and manually instruct Wget
|
---|
1162 | to send those cookies, bypassing the ``official'' cookie support:
|
---|
1163 |
|
---|
1164 | @example
|
---|
1165 | wget --no-cookies --header "Cookie: @var{name}=@var{value}"
|
---|
1166 | @end example
|
---|
1167 |
|
---|
1168 | @cindex saving cookies
|
---|
1169 | @cindex cookies, saving
|
---|
1170 | @item --save-cookies @var{file}
|
---|
1171 | Save cookies to @var{file} before exiting. This will not save cookies
|
---|
1172 | that have expired or that have no expiry time (so-called ``session
|
---|
1173 | cookies''), but also see @samp{--keep-session-cookies}.
|
---|
1174 |
|
---|
1175 | @cindex cookies, session
|
---|
1176 | @cindex session cookies
|
---|
1177 | @item --keep-session-cookies
|
---|
1178 | When specified, causes @samp{--save-cookies} to also save session
|
---|
1179 | cookies. Session cookies are normally not saved because they are
|
---|
1180 | meant to be kept in memory and forgotten when you exit the browser.
|
---|
1181 | Saving them is useful on sites that require you to log in or to visit
|
---|
1182 | the home page before you can access some pages. With this option,
|
---|
1183 | multiple Wget runs are considered a single browser session as far as
|
---|
1184 | the site is concerned.
|
---|
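
For instance, to log in once and keep the resulting session cookie for
later runs (the login @sc{url} is a placeholder; a fuller login example
appears under @samp{--post-data}):

@example
wget --save-cookies cookies.txt --keep-session-cookies \
     http://www.example.com/login.php
@end example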
1185 |
|
---|
1186 | Since the cookie file format does not normally carry session cookies,
|
---|
1187 | Wget marks them with an expiry timestamp of 0. Wget's
|
---|
1188 | @samp{--load-cookies} recognizes those as session cookies, but it might
|
---|
1189 | confuse other browsers. Also note that cookies so loaded will be
|
---|
1190 | treated like other session cookies, which means that if you want
|
---|
1191 | @samp{--save-cookies} to preserve them again, you must use
|
---|
1192 | @samp{--keep-session-cookies} again.
|
---|
1193 |
|
---|
1194 | @cindex Content-Length, ignore
|
---|
1195 | @cindex ignore length
|
---|
1196 | @item --ignore-length
|
---|
1197 | Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more
|
---|
1198 | precise) send out bogus @code{Content-Length} headers, which makes Wget
|
---|
1199 | go wild, as it thinks that not all of the document was retrieved. You can spot
|
---|
1200 | this syndrome if Wget retries getting the same document again and again,
|
---|
1201 | each time claiming that the (otherwise normal) connection has closed on
|
---|
1202 | the very same byte.
|
---|
1203 |
|
---|
1204 | With this option, Wget will ignore the @code{Content-Length} header---as
|
---|
1205 | if it never existed.
|
---|
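
For example (the @sc{url} stands in for a @sc{cgi} script known to send
a bogus @code{Content-Length}):

@example
wget --ignore-length http://www.example.com/cgi-bin/broken-script.cgi
@end example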
1206 |
|
---|
1207 | @cindex header, add
|
---|
1208 | @item --header=@var{header-line}
|
---|
1209 | Send @var{header-line} along with the rest of the headers in each
|
---|
1210 | @sc{http} request. The supplied header is sent as-is, which means it
|
---|
1211 | must contain the header name and value separated by a colon, and must not contain
|
---|
1212 | newlines.
|
---|
1213 |
|
---|
1214 | You may define more than one additional header by specifying
|
---|
1215 | @samp{--header} more than once.
|
---|
1216 |
|
---|
1217 | @example
|
---|
1218 | @group
|
---|
1219 | wget --header='Accept-Charset: iso-8859-2' \
|
---|
1220 | --header='Accept-Language: hr' \
|
---|
1221 | http://fly.srk.fer.hr/
|
---|
1222 | @end group
|
---|
1223 | @end example
|
---|
1224 |
|
---|
1225 | Specification of an empty string as the header value will clear all
|
---|
1226 | previous user-defined headers.
|
---|
1227 |
|
---|
1228 | As of Wget 1.10, this option can be used to override headers otherwise
|
---|
1229 | generated automatically. This example instructs Wget to connect to
|
---|
1230 | localhost, but to specify @samp{foo.bar} in the @code{Host} header:
|
---|
1231 |
|
---|
1232 | @example
|
---|
1233 | wget --header="Host: foo.bar" http://localhost/
|
---|
1234 | @end example
|
---|
1235 |
|
---|
1236 | In versions of Wget prior to 1.10 such use of @samp{--header} caused
|
---|
1237 | sending of duplicate headers.
|
---|
1238 |
|
---|
1239 | @cindex proxy user
|
---|
1240 | @cindex proxy password
|
---|
1241 | @cindex proxy authentication
|
---|
1242 | @item --proxy-user=@var{user}
|
---|
1243 | @itemx --proxy-password=@var{password}
|
---|
1244 | Specify the username @var{user} and password @var{password} for
|
---|
1245 | authentication on a proxy server. Wget will encode them using the
|
---|
1246 | @code{basic} authentication scheme.
|
---|
1247 |
|
---|
1248 | Security considerations similar to those with @samp{--http-password}
|
---|
1249 | pertain here as well.
|
---|
1250 |
|
---|
1251 | @cindex http referer
|
---|
1252 | @cindex referer, http
|
---|
1253 | @item --referer=@var{url}
|
---|
1254 | Include the `Referer: @var{url}' header in the HTTP request. Useful for
|
---|
1255 | retrieving documents with server-side processing that assume they are
|
---|
1256 | always being retrieved by interactive web browsers and only come out
|
---|
1257 | properly when Referer is set to one of the pages that point to them.
|
---|
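
For example (both @sc{url}s are illustrative):

@example
wget --referer=http://www.example.com/gallery/ \
     http://www.example.com/gallery/photo.jpg
@end example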
1258 |
|
---|
1259 | @cindex server response, save
|
---|
1260 | @item --save-headers
|
---|
1261 | Save the headers sent by the @sc{http} server to the file, preceding the
|
---|
1262 | actual contents, with an empty line as the separator.
|
---|
1263 |
|
---|
1264 | @cindex user-agent
|
---|
1265 | @item -U @var{agent-string}
|
---|
1266 | @itemx --user-agent=@var{agent-string}
|
---|
1267 | Identify as @var{agent-string} to the @sc{http} server.
|
---|
1268 |
|
---|
1269 | The @sc{http} protocol allows the clients to identify themselves using a
|
---|
1270 | @code{User-Agent} header field. This enables distinguishing the
|
---|
1271 | @sc{www} software, usually for statistical purposes or for tracing of
|
---|
1272 | protocol violations. Wget normally identifies as
|
---|
1273 | @samp{Wget/@var{version}}, @var{version} being the current version
|
---|
1274 | number of Wget.
|
---|
1275 |
|
---|
1276 | However, some sites have been known to impose the policy of tailoring
|
---|
1277 | the output according to the @code{User-Agent}-supplied information.
|
---|
1278 | While this is not such a bad idea in theory, it has been abused by
|
---|
1279 | servers denying information to clients other than (historically)
|
---|
1280 | Netscape or, more frequently, Microsoft Internet Explorer. This
|
---|
1281 | option allows you to change the @code{User-Agent} line issued by Wget.
|
---|
1282 | Use of this option is discouraged, unless you really know what you are
|
---|
1283 | doing.
|
---|
1284 |
|
---|
1285 | Specifying an empty user agent with @samp{--user-agent=""} instructs Wget
|
---|
1286 | not to send the @code{User-Agent} header in @sc{http} requests.
|
---|
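
For example, to masquerade as a browser, or to omit the header
entirely (the agent string and @sc{url} are illustrative):

@example
wget --user-agent="Mozilla/4.0 (compatible)" http://www.example.com/
wget --user-agent="" http://www.example.com/
@end example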
1287 |
|
---|
1288 | @cindex POST
|
---|
1289 | @item --post-data=@var{string}
|
---|
1290 | @itemx --post-file=@var{file}
|
---|
1291 | Use POST as the method for all HTTP requests and send the specified data
|
---|
1292 | in the request body. @code{--post-data} sends @var{string} as data,
|
---|
1293 | whereas @code{--post-file} sends the contents of @var{file}. Other than
|
---|
1294 | that, they work in exactly the same way.
|
---|
1295 |
|
---|
1296 | Please be aware that Wget needs to know the size of the POST data in
|
---|
1297 | advance. Therefore the argument to @code{--post-file} must be a regular
|
---|
1298 | file; specifying a FIFO or something like @file{/dev/stdin} won't work.
|
---|
1299 | It's not quite clear how to work around this limitation inherent in
|
---|
1300 | HTTP/1.0. Although HTTP/1.1 introduces @dfn{chunked} transfer that
|
---|
1301 | doesn't require knowing the request length in advance, a client can't
|
---|
1302 | use chunked unless it knows it's talking to an HTTP/1.1 server. And it
|
---|
1303 | can't know that until it receives a response, which in turn requires the
|
---|
1304 | request to have been completed -- a chicken-and-egg problem.
|
---|
1305 |
|
---|
1306 | Note: if Wget is redirected after the POST request is completed, it
|
---|
1307 | will not send the POST data to the redirected URL. This is because
|
---|
1308 | URLs that process POST often respond with a redirection to a regular
|
---|
1309 | page, which does not desire or accept POST. It is not completely
|
---|
1310 | clear that this behavior is optimal; if it doesn't work out, it might
|
---|
1311 | be changed in the future.
|
---|
1312 |
|
---|
1313 | This example shows how to log in to a server using POST and then proceed to
|
---|
1314 | download the desired pages, presumably only accessible to authorized
|
---|
1315 | users:
|
---|
1316 |
|
---|
1317 | @example
|
---|
1318 | @group
|
---|
1319 | # @r{Log in to the server. This can be done only once.}
|
---|
1320 | wget --save-cookies cookies.txt \
|
---|
1321 | --post-data 'user=foo&password=bar' \
|
---|
1322 | http://server.com/auth.php
|
---|
1323 |
|
---|
1324 | # @r{Now grab the page or pages we care about.}
|
---|
1325 | wget --load-cookies cookies.txt \
|
---|
1326 | -p http://server.com/interesting/article.php
|
---|
1327 | @end group
|
---|
1328 | @end example
|
---|
1329 |
|
---|
1330 | If the server is using session cookies to track user authentication,
|
---|
1331 | the above will not work because @samp{--save-cookies} will not save
|
---|
1332 | them (and neither will browsers) and the @file{cookies.txt} file will
|
---|
1333 | be empty. In that case use @samp{--keep-session-cookies} along with
|
---|
1334 | @samp{--save-cookies} to force saving of session cookies.
|
---|
1335 | @end table
|
---|
1336 |
|
---|
1337 | @node HTTPS (SSL/TLS) Options
|
---|
1338 | @section HTTPS (SSL/TLS) Options
|
---|
1339 |
|
---|
1340 | @cindex SSL
|
---|
1341 | To support encrypted HTTP (HTTPS) downloads, Wget must be compiled
|
---|
1342 | with an external SSL library, currently OpenSSL. If Wget is compiled
|
---|
1343 | without SSL support, none of these options are available.
|
---|
1344 |
|
---|
1345 | @table @samp
|
---|
1346 | @cindex SSL protocol, choose
|
---|
1347 | @item --secure-protocol=@var{protocol}
|
---|
1348 | Choose the secure protocol to be used. Legal values are @samp{auto},
|
---|
1349 | @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. If @samp{auto} is used,
|
---|
1350 | the SSL library is given the liberty of choosing the appropriate
|
---|
1351 | protocol automatically, which is achieved by sending an SSLv2 greeting
|
---|
1352 | and announcing support for SSLv3 and TLSv1. This is the default.
|
---|
1353 |
|
---|
1354 | Specifying @samp{SSLv2}, @samp{SSLv3}, or @samp{TLSv1} forces the use
|
---|
1355 | of the corresponding protocol. This is useful when talking to old and
|
---|
1356 | buggy SSL server implementations that make it hard for OpenSSL to
|
---|
1357 | choose the correct protocol version. Fortunately, such servers are
|
---|
1358 | quite rare.
|
---|
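
For example, to force TLSv1 when talking to a server whose protocol
negotiation is broken (the host name is illustrative):

@example
wget --secure-protocol=TLSv1 https://www.example.com/
@end example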
1359 |
|
---|
1360 | @cindex SSL certificate, check
|
---|
1361 | @item --no-check-certificate
|
---|
1362 | Don't check the server certificate against the available certificate
|
---|
1363 | authorities. Also don't require the URL host name to match the common
|
---|
1364 | name presented by the certificate.
|
---|
1365 |
|
---|
1366 | As of Wget 1.10, the default is to verify the server's certificate
|
---|
1367 | against the recognized certificate authorities, breaking the SSL
|
---|
1368 | handshake and aborting the download if the verification fails.
|
---|
1369 | Although this provides more secure downloads, it does break
|
---|
1370 | interoperability with some sites that worked with previous Wget
|
---|
1371 | versions, particularly those using self-signed, expired, or otherwise
|
---|
1372 | invalid certificates. This option forces an ``insecure'' mode of
|
---|
1373 | operation that turns the certificate verification errors into warnings
|
---|
1374 | and allows you to proceed.
|
---|
1375 |
|
---|
1376 | If you encounter ``certificate verification'' errors or ones saying
|
---|
1377 | that ``common name doesn't match requested host name'', you can use
|
---|
1378 | this option to bypass the verification and proceed with the download.
|
---|
1379 | @emph{Only use this option if you are otherwise convinced of the
|
---|
1380 | site's authenticity, or if you really don't care about the validity of
|
---|
1381 | its certificate.} It is almost always a bad idea not to check the
|
---|
1382 | certificates when transmitting confidential or important data.
|
---|
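
For example, to fetch a file from a host you already trust despite its
self-signed certificate (the host name is illustrative):

@example
wget --no-check-certificate https://intranet.example.com/report.pdf
@end example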
1383 |
|
---|
1384 | @cindex SSL certificate
|
---|
1385 | @item --certificate=@var{file}
|
---|
1386 | Use the client certificate stored in @var{file}. This is needed for
|
---|
1387 | servers that are configured to require certificates from the clients
|
---|
1388 | that connect to them. Normally a certificate is not required and this
|
---|
1389 | switch is optional.
|
---|
1390 |
|
---|
1391 | @cindex SSL certificate type, specify
|
---|
1392 | @item --certificate-type=@var{type}
|
---|
1393 | Specify the type of the client certificate. Legal values are
|
---|
1394 | @samp{PEM} (assumed by default) and @samp{DER}, also known as
|
---|
1395 | @samp{ASN1}.
|
---|
1396 |
|
---|
1397 | @item --private-key=@var{file}
|
---|
1398 | Read the private key from @var{file}. This allows you to provide the
|
---|
1399 | private key in a file separate from the certificate.
|
---|
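
A minimal example, assuming a PEM certificate and key issued for client
authentication (the file names and @sc{url} are placeholders):

@example
# @r{Placeholder certificate, key, and URL.}
wget --certificate=client.pem --private-key=client.key \
     https://secure.example.com/protected/data.xml
@end example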
1400 |
|
---|
1401 | @item --private-key-type=@var{type}
|
---|
1402 | Specify the type of the private key. Accepted values are @samp{PEM}
|
---|
1403 | (the default) and @samp{DER}.
|
---|
1404 |
|
---|
1405 | @item --ca-certificate=@var{file}
|
---|
1406 | Use @var{file} as the file with the bundle of certificate authorities
|
---|
1407 | (``CA'') to verify the peers. The certificates must be in PEM format.
|
---|
1408 |
|
---|
1409 | Without this option Wget looks for CA certificates at the
|
---|
1410 | system-specified locations, chosen at OpenSSL installation time.
|
---|
1411 |
|
---|
1412 | @cindex SSL certificate authority
|
---|
1413 | @item --ca-directory=@var{directory}
|
---|
1414 | Specifies a directory containing CA certificates in PEM format. Each
|
---|
1415 | file contains one CA certificate, and the file name is based on a hash
|
---|
1416 | value derived from the certificate. This is achieved by processing a
|
---|
1417 | certificate directory with the @code{c_rehash} utility supplied with
|
---|
1418 | OpenSSL. Using @samp{--ca-directory} is more efficient than
|
---|
1419 | @samp{--ca-certificate} when many certificates are installed because
|
---|
1420 | it allows Wget to fetch certificates on demand.
|
---|
1421 |
|
---|
1422 | Without this option Wget looks for CA certificates at the
|
---|
1423 | system-specified locations, chosen at OpenSSL installation time.
|
---|
1424 |
|
---|
1425 | @cindex entropy, specifying source of
|
---|
1426 | @cindex randomness, specifying source of
|
---|
1427 | @item --random-file=@var{file}
|
---|
1428 | Use @var{file} as the source of random data for seeding the
|
---|
1429 | pseudo-random number generator on systems without @file{/dev/random}.
|
---|
1430 |
|
---|
1431 | On such systems the SSL library needs an external source of randomness
|
---|
1432 | to initialize. Randomness may be provided by EGD (see
|
---|
1433 | @samp{--egd-file} below) or read from an external source specified by
|
---|
1434 | the user. If this option is not specified, Wget looks for random data
|
---|
1435 | in @code{$RANDFILE} or, if that is unset, in @file{$HOME/.rnd}. If
|
---|
1436 | none of those are available, it is likely that SSL encryption will not
|
---|
1437 | be usable.
|
---|
1438 |
|
---|
1439 | If you're getting the ``Could not seed OpenSSL PRNG; disabling SSL.''
|
---|
1440 | error, you should provide random data using some of the methods
|
---|
1441 | described above.
|
---|
1442 |
|
---|
1443 | @cindex EGD
|
---|
1444 | @item --egd-file=@var{file}
|
---|
1445 | Use @var{file} as the EGD socket. EGD stands for @dfn{Entropy
|
---|
1446 | Gathering Daemon}, a user-space program that collects data from
|
---|
1447 | various unpredictable system sources and makes it available to other
|
---|
1448 | programs that might need it. Encryption software, such as the SSL
|
---|
1449 | library, needs sources of non-repeating randomness to seed the random
|
---|
1450 | number generator used to produce cryptographically strong keys.
|
---|
1451 |
|
---|
1452 | OpenSSL allows the user to specify his own source of entropy using the
|
---|
1453 | @code{RANDFILE} environment variable. If this variable is unset, or
|
---|
1454 | if the specified file does not produce enough randomness, OpenSSL will
|
---|
1455 | read random data from the EGD socket specified using this option.
|
---|
1456 |
|
---|
1457 | If this option is not specified (and the equivalent startup command is
|
---|
1458 | not used), EGD is never contacted. EGD is not needed on modern Unix
|
---|
1459 | systems that support @file{/dev/random}.
|
---|
1460 | @end table
|
---|
1461 |
|
---|
1462 | @node FTP Options
|
---|
1463 | @section FTP Options
|
---|
1464 |
|
---|
1465 | @table @samp
|
---|
1466 | @cindex ftp user
|
---|
1467 | @cindex ftp password
|
---|
1468 | @cindex ftp authentication
|
---|
1469 | @item --ftp-user=@var{user}
|
---|
1470 | @itemx --ftp-password=@var{password}
|
---|
1471 | Specify the username @var{user} and password @var{password} on an
|
---|
1472 | @sc{ftp} server. Without this, or the corresponding startup option,
|
---|
1473 | the password defaults to @samp{-wget@@}, normally used for anonymous
|
---|
1474 | FTP.
|
---|
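
For example (the account details and host are placeholders; the same
caveat about @code{ps} applies as with @samp{--http-password}):

@example
# @r{Placeholder account and host.}
wget --ftp-user=alice --ftp-password=secret \
     ftp://ftp.example.com/pub/archive.tar.gz
@end example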
1475 |
|
---|
1476 | Another way to specify username and password is in the @sc{url} itself
|
---|
1477 | (@pxref{URL Format}). Either method reveals your password to anyone who
|
---|
1478 | bothers to run @code{ps}. To prevent the passwords from being seen,
|
---|
1479 | store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect
|
---|
1480 | those files from other users with @code{chmod}. If the passwords are
|
---|
1481 | really important, do not leave them lying in those files either---edit
|
---|
1482 | the files and delete them after Wget has started the download.
|
---|
1483 |
|
---|
1484 | @iftex
|
---|
1485 | For more information about security issues with Wget, @xref{Security
|
---|
1486 | Considerations}.
|
---|
1487 | @end iftex
|
---|
1488 |
|
---|
1489 | @cindex .listing files, removing
|
---|
1490 | @item --no-remove-listing
|
---|
1491 | Don't remove the temporary @file{.listing} files generated by @sc{ftp}
|
---|
1492 | retrievals. Normally, these files contain the raw directory listings
|
---|
1493 | received from @sc{ftp} servers. Not removing them can be useful for
|
---|
1494 | debugging purposes, or when you want to be able to easily check on the
|
---|
1495 | contents of remote server directories (e.g. to verify that a mirror
|
---|
1496 | you're running is complete).
|
---|
1497 |
|
---|
1498 | Note that even though Wget writes to a known filename for this file,
|
---|
1499 | this is not a security hole in the scenario of a user making
|
---|
1500 | @file{.listing} a symbolic link to @file{/etc/passwd} or something and
|
---|
1501 | asking @code{root} to run Wget in his or her directory. Depending on
|
---|
1502 | the options used, either Wget will refuse to write to @file{.listing},
|
---|
1503 | making the globbing/recursion/time-stamping operation fail, or the
|
---|
1504 | symbolic link will be deleted and replaced with the actual
|
---|
1505 | @file{.listing} file, or the listing will be written to a
|
---|
1506 | @file{.listing.@var{number}} file.
|
---|
1507 |
|
---|
1508 | Even though this situation isn't a problem, @code{root} should
|
---|
1509 | never run Wget in a non-trusted user's directory. A user could do
|
---|
1510 | something as simple as linking @file{index.html} to @file{/etc/passwd}
|
---|
1511 | and asking @code{root} to run Wget with @samp{-N} or @samp{-r} so the file
|
---|
1512 | will be overwritten.
|
---|
1513 |
|
---|
1514 | @cindex globbing, toggle
|
---|
1515 | @item --no-glob
|
---|
1516 | Turn off @sc{ftp} globbing. Globbing refers to the use of shell-like
|
---|
1517 | special characters (@dfn{wildcards}), like @samp{*}, @samp{?}, @samp{[}
|
---|
1518 | and @samp{]} to retrieve more than one file from the same directory at
|
---|
1519 | once, like:
|
---|
1520 |
|
---|
1521 | @example
|
---|
1522 | wget ftp://gnjilux.srk.fer.hr/*.msg
|
---|
1523 | @end example
|
---|
1524 |
|
---|
1525 | By default, globbing will be turned on if the @sc{url} contains a
|
---|
1526 | globbing character. This option may be used to turn globbing on or off
|
---|
1527 | permanently.
|
---|
1528 |
|
---|
1529 | You may have to quote the @sc{url} to protect it from being expanded by
|
---|
1530 | your shell. Globbing makes Wget look for a directory listing, which is
|
---|
1531 | system-specific. This is why it currently works only with Unix @sc{ftp}
|
---|
1532 | servers (and the ones emulating Unix @code{ls} output).
|
---|
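
For instance, quoting keeps the shell from expanding the wildcard
before Wget sees it:

@example
wget "ftp://gnjilux.srk.fer.hr/*.msg"
@end example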
1533 |
|
---|
1534 | @cindex passive ftp
|
---|
1535 | @item --no-passive-ftp
|
---|
1536 | Disable the use of the @dfn{passive} FTP transfer mode. Passive FTP
|
---|
1537 | mandates that the client connect to the server to establish the data
|
---|
1538 | connection rather than the other way around.
|
---|
1539 |
|
---|
1540 | If the machine is connected to the Internet directly, both passive and
|
---|
1541 | active FTP should work equally well. Behind most firewall and NAT
|
---|
1542 | configurations passive FTP has a better chance of working. However,
|
---|
1543 | in some rare firewall configurations, active FTP actually works when
|
---|
1544 | passive FTP doesn't. If you suspect this to be the case, use this
|
---|
1545 | option, or set @code{passive_ftp=off} in your init file.
|
---|
1546 |
|
---|
1547 | @cindex symbolic links, retrieving
|
---|
1548 | @item --retr-symlinks
|
---|
1549 | Usually, when retrieving @sc{ftp} directories recursively and a symbolic
|
---|
1550 | link is encountered, the linked-to file is not downloaded. Instead, a
|
---|
1551 | matching symbolic link is created on the local filesystem. The
|
---|
1552 | pointed-to file will not be downloaded unless this recursive retrieval
|
---|
1553 | would have encountered it separately and downloaded it anyway.
|
---|
1554 |
|
---|
1555 | When @samp{--retr-symlinks} is specified, however, symbolic links are
|
---|
1556 | traversed and the pointed-to files are retrieved. At this time, this
|
---|
1557 | option does not cause Wget to traverse symlinks to directories and
|
---|
1558 | recurse through them, but in the future it should be enhanced to do
|
---|
1559 | this.
|
---|
1560 |
|
---|
1561 | Note that when retrieving a file (not a directory) because it was
|
---|
1562 | specified on the command-line, rather than because it was recursed to,
|
---|
1563 | this option has no effect. Symbolic links are always traversed in this
|
---|
1564 | case.
|
---|
1565 |
|
---|
1566 | @cindex Keep-Alive, turning off
|
---|
1567 | @cindex Persistent Connections, disabling
|
---|
1568 | @item --no-http-keep-alive
|
---|
1569 | Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget
|
---|
1570 | asks the server to keep the connection open so that, when you download
|
---|
1571 | more than one document from the same server, they get transferred over
|
---|
1572 | the same TCP connection. This saves time and at the same time reduces
|
---|
1573 | the load on the server.
|
---|
1574 |
|
---|
1575 | This option is useful when, for some reason, persistent (keep-alive)
|
---|
1576 | connections don't work for you, for example due to a server bug or due
|
---|
1577 | to the inability of server-side scripts to cope with the connections.
|
---|
1578 | @end table
|
---|
1579 |
|
---|
1580 | @node Recursive Retrieval Options
|
---|
1581 | @section Recursive Retrieval Options
|
---|
1582 |
|
---|
1583 | @table @samp
|
---|
1584 | @item -r
|
---|
1585 | @itemx --recursive
|
---|
1586 | Turn on recursive retrieving. @xref{Recursive Download}, for more
|
---|
1587 | details.
|
---|
1588 |
|
---|
1589 | @item -l @var{depth}
|
---|
1590 | @itemx --level=@var{depth}
|
---|
1591 | Specify recursion maximum depth level @var{depth} (@pxref{Recursive
|
---|
1592 | Download}). The default maximum depth is 5.
|
---|
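
For example, to recurse no more than two links deep from the start page
(the @sc{url} is illustrative):

@example
wget -r -l 2 http://www.example.com/
@end example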
1593 |
|
---|
1594 | @cindex proxy filling
|
---|
1595 | @cindex delete after retrieval
|
---|
1596 | @cindex filling proxy cache
|
---|
1597 | @item --delete-after
|
---|
1598 | This option tells Wget to delete every single file it downloads,
|
---|
1599 | @emph{after} having done so. It is useful for pre-fetching popular
|
---|
1600 | pages through a proxy, e.g.:
|
---|
1601 |
|
---|
1602 | @example
|
---|
1603 | wget -r -nd --delete-after http://whatever.com/~popular/page/
|
---|
1604 | @end example
|
---|
1605 |
|
---|
1606 | The @samp{-r} option is to retrieve recursively, and @samp{-nd} to not
|
---|
1607 | create directories.
|
---|
1608 |
|
---|
1609 | Note that @samp{--delete-after} deletes files on the local machine. It
|
---|
1610 | does not issue the @samp{DELE} command to remote FTP sites, for
|
---|
1611 | instance. Also note that when @samp{--delete-after} is specified,
|
---|
1612 | @samp{--convert-links} is ignored, so @samp{.orig} files are simply not
|
---|
1613 | created in the first place.
|
---|
1614 |
|
---|
1615 | @cindex conversion of links
|
---|
1616 | @cindex link conversion
|
---|
1617 | @item -k
|
---|
1618 | @itemx --convert-links
|
---|
1619 | After the download is complete, convert the links in the document to
|
---|
1620 | make them suitable for local viewing. This affects not only the visible
|
---|
1621 | hyperlinks, but any part of the document that links to external content,
|
---|
1622 | such as embedded images, links to style sheets, hyperlinks to non-@sc{html}
|
---|
1623 | content, etc.
|
---|
1624 |
|
---|
1625 | Each link will be changed in one of two ways:
|
---|
1626 |
|
---|
1627 | @itemize @bullet
|
---|
1628 | @item
|
---|
1629 | The links to files that have been downloaded by Wget will be changed to
|
---|
1630 | refer to the file they point to as a relative link.
|
---|
1631 |
|
---|
1632 | Example: if the downloaded file @file{/foo/doc.html} links to
|
---|
1633 | @file{/bar/img.gif}, also downloaded, then the link in @file{doc.html}
|
---|
1634 | will be modified to point to @samp{../bar/img.gif}. This kind of
|
---|
1635 | transformation works reliably for arbitrary combinations of directories.
|
---|
1636 |
|
---|
1637 | @item
|
---|
1638 | The links to files that have not been downloaded by Wget will be changed
|
---|
1639 | to include host name and absolute path of the location they point to.
|
---|
1640 |
|
---|
1641 | Example: if the downloaded file @file{/foo/doc.html} links to
|
---|
1642 | @file{/bar/img.gif} (or to @file{../bar/img.gif}), then the link in
|
---|
1643 | @file{doc.html} will be modified to point to
|
---|
1644 | @file{http://@var{hostname}/bar/img.gif}.
|
---|
1645 | @end itemize
|
---|
1646 |
|
---|
1647 | Because of this, local browsing works reliably: if a linked file was
|
---|
1648 | downloaded, the link will refer to its local name; if it was not
|
---|
1649 | downloaded, the link will refer to its full Internet address rather than
|
---|
1650 | presenting a broken link. The fact that the former links are converted
|
---|
1651 | to relative links ensures that you can move the downloaded hierarchy to
|
---|
1652 | another directory.
|
---|
1653 |
|
---|
1654 | Note that only at the end of the download can Wget know which links have
|
---|
1655 | been downloaded. Because of that, the work done by @samp{-k} will be
|
---|
1656 | performed at the end of all the downloads.
|
---|
1657 |
|
---|
1658 | @cindex backing up converted files
|
---|
1659 | @item -K
|
---|
1660 | @itemx --backup-converted
|
---|
1661 | When converting a file, back up the original version with a @samp{.orig}
|
---|
1662 | suffix. Affects the behavior of @samp{-N} (@pxref{HTTP Time-Stamping
|
---|
1663 | Internals}).
|
---|
1664 |
|
---|
1665 | @item -m
|
---|
1666 | @itemx --mirror
|
---|
1667 | Turn on options suitable for mirroring. This option turns on recursion
|
---|
1668 | and time-stamping, sets infinite recursion depth and keeps @sc{ftp}
|
---|
1669 | directory listings. It is currently equivalent to
|
---|
1670 | @samp{-r -N -l inf --no-remove-listing}.
|
---|
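
For example, a typical mirroring invocation, which can simply be re-run
later to pick up changes (the host name is illustrative):

@example
wget -m http://www.example.com/
@end example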
1671 |
|
---|
1672 | @cindex page requisites
|
---|
1673 | @cindex required images, downloading
|
---|
1674 | @item -p
|
---|
1675 | @itemx --page-requisites
|
---|
1676 | This option causes Wget to download all the files that are necessary to
|
---|
1677 | properly display a given @sc{html} page. This includes such things as
|
---|
1678 | inlined images, sounds, and referenced stylesheets.
|
---|
1679 |
|
---|
1680 | Ordinarily, when downloading a single @sc{html} page, any requisite documents
|
---|
1681 | that may be needed to display it properly are not downloaded. Using
|
---|
1682 | @samp{-r} together with @samp{-l} can help, but since Wget does not
|
---|
1683 | ordinarily distinguish between external and inlined documents, one is
|
---|
1684 | generally left with ``leaf documents'' that are missing their
|
---|
1685 | requisites.
|
---|
1686 |
|
---|
1687 | For instance, say document @file{1.html} contains an @code{<IMG>} tag
|
---|
1688 | referencing @file{1.gif} and an @code{<A>} tag pointing to external
|
---|
1689 | document @file{2.html}. Say that @file{2.html} is similar but that its
|
---|
1690 | image is @file{2.gif} and it links to @file{3.html}. Say this
|
---|
1691 | continues up to some arbitrarily high number.
|
---|
1692 |
|
---|
1693 | If one executes the command:
|
---|
1694 |
|
---|
1695 | @example
|
---|
1696 | wget -r -l 2 http://@var{site}/1.html
|
---|
1697 | @end example
|
---|
1698 |
|
---|
1699 | then @file{1.html}, @file{1.gif}, @file{2.html}, @file{2.gif}, and
|
---|
1700 | @file{3.html} will be downloaded. As you can see, @file{3.html} is
|
---|
1701 | without its requisite @file{3.gif} because Wget is simply counting the
|
---|
1702 | number of hops (up to 2) away from @file{1.html} in order to determine
|
---|
1703 | where to stop the recursion. However, with this command:
|
---|
1704 |
|
---|
1705 | @example
|
---|
1706 | wget -r -l 2 -p http://@var{site}/1.html
|
---|
1707 | @end example
|
---|
1708 |
|
---|
1709 | all the above files @emph{and} @file{3.html}'s requisite @file{3.gif}
|
---|
1710 | will be downloaded. Similarly,
|
---|
1711 |
|
---|
1712 | @example
|
---|
1713 | wget -r -l 1 -p http://@var{site}/1.html
|
---|
1714 | @end example
|
---|
1715 |
|
---|
1716 | will cause @file{1.html}, @file{1.gif}, @file{2.html}, and @file{2.gif}
|
---|
1717 | to be downloaded. One might think that:
|
---|
1718 |
|
---|
1719 | @example
|
---|
1720 | wget -r -l 0 -p http://@var{site}/1.html
|
---|
1721 | @end example
|
---|
1722 |
|
---|
1723 | would download just @file{1.html} and @file{1.gif}, but unfortunately
|
---|
1724 | this is not the case, because @samp{-l 0} is equivalent to
|
---|
1725 | @samp{-l inf}---that is, infinite recursion. To download a single @sc{html}
|
---|
1726 | page (or a handful of them, all specified on the command-line or in a
|
---|
1727 | @samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off
|
---|
1728 | @samp{-r} and @samp{-l}:
|
---|
1729 |
|
---|
1730 | @example
|
---|
1731 | wget -p http://@var{site}/1.html
|
---|
1732 | @end example
|
---|
1733 |
|
---|
1734 | Note that Wget will behave as if @samp{-r} had been specified, but only
|
---|
1735 | that single page and its requisites will be downloaded. Links from that
|
---|
1736 | page to external documents will not be followed. Actually, to download
|
---|
1737 | a single page and all its requisites (even if they exist on separate
|
---|
1738 | websites), and make sure the lot displays properly locally, this author
|
---|
1739 | likes to use a few options in addition to @samp{-p}:
|
---|
1740 |
|
---|
1741 | @example
|
---|
1742 | wget -E -H -k -K -p http://@var{site}/@var{document}
|
---|
1743 | @end example
|
---|
1744 |
|
---|
1745 | To finish off this topic, it's worth knowing that Wget's idea of an
|
---|
1746 | external document link is any URL specified in an @code{<A>} tag, an
|
---|
1747 | @code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK
|
---|
1748 | REL="stylesheet">}.
|
---|
1749 |
|
---|
1750 | @cindex @sc{html} comments
|
---|
1751 | @cindex comments, @sc{html}
|
---|
1752 | @item --strict-comments
|
---|
1753 | Turn on strict parsing of @sc{html} comments. The default is to terminate
|
---|
1754 | comments at the first occurrence of @samp{-->}.
|
---|
1755 |
|
---|
1756 | According to specifications, @sc{html} comments are expressed as @sc{sgml}
|
---|
1757 | @dfn{declarations}. Declaration is special markup that begins with
|
---|
1758 | @samp{<!} and ends with @samp{>}, such as @samp{<!DOCTYPE ...>}, that
|
---|
1759 | may contain comments between a pair of @samp{--} delimiters. @sc{html}
|
---|
1760 | comments are ``empty declarations'', @sc{sgml} declarations without any
|
---|
1761 | non-comment text. Therefore, @samp{<!--foo-->} is a valid comment, and
|
---|
1762 | so is @samp{<!--one-- --two-->}, but @samp{<!--1--2-->} is not.
|
---|
1763 |
|
---|
1764 | On the other hand, most @sc{html} writers don't perceive comments as anything
|
---|
1765 | other than text delimited with @samp{<!--} and @samp{-->}, which is not
|
---|
1766 | quite the same. For example, something like @samp{<!------------>}
|
---|
1767 | works as a valid comment as long as the number of dashes is a multiple
|
---|
1768 | of four (!). If not, the comment technically lasts until the next
|
---|
1769 | @samp{--}, which may be at the other end of the document. Because of
|
---|
1770 | this, many popular browsers completely ignore the specification and
|
---|
1771 | implement what users have come to expect: comments delimited with
|
---|
1772 | @samp{<!--} and @samp{-->}.
|
---|
1773 |
|
---|
1774 | Until version 1.9, Wget interpreted comments strictly, which resulted in
|
---|
1775 | missing links in many web pages that displayed fine in browsers, but had
|
---|
1776 | the misfortune of containing non-compliant comments. Beginning with
|
---|
1777 | version 1.9, Wget has joined the ranks of clients that implement
|
---|
1778 | ``naive'' comments, terminating each comment at the first occurrence of
|
---|
1779 | @samp{-->}.
|
---|
1780 |
|
---|
1781 | If, for whatever reason, you want strict comment parsing, use this
|
---|
1782 | option to turn it on.
|
---|
1783 | @end table
|
---|
1784 |
|
---|
1785 | @node Recursive Accept/Reject Options
|
---|
1786 | @section Recursive Accept/Reject Options
|
---|
1787 |
|
---|
1788 | @table @samp
|
---|
1789 | @item -A @var{acclist} --accept @var{acclist}
|
---|
1790 | @itemx -R @var{rejlist} --reject @var{rejlist}
|
---|
1791 | Specify comma-separated lists of file name suffixes or patterns to
|
---|
1792 | accept or reject (@pxref{Types of Files} for more details).
|
---|
1793 |
|
---|
1794 | @item -D @var{domain-list}
|
---|
1795 | @itemx --domains=@var{domain-list}
|
---|
1796 | Set domains to be followed. @var{domain-list} is a comma-separated list
|
---|
1797 | of domains. Note that it does @emph{not} turn on @samp{-H}.
|
---|
1798 |
|
---|
1799 | @item --exclude-domains @var{domain-list}
|
---|
1800 | Specify the domains that are @emph{not} to be followed.
|
---|
1801 | (@pxref{Spanning Hosts}).
|
---|
1802 |
|
---|
1803 | @cindex follow FTP links
|
---|
1804 | @item --follow-ftp
|
---|
1805 | Follow @sc{ftp} links from @sc{html} documents. Without this option,
|
---|
1806 | Wget will ignore all the @sc{ftp} links.
|
---|
1807 |
|
---|
1808 | @cindex tag-based recursive pruning
|
---|
1809 | @item --follow-tags=@var{list}
|
---|
1810 | Wget has an internal table of @sc{html} tag / attribute pairs that it
|
---|
1811 | considers when looking for linked documents during a recursive
|
---|
1812 | retrieval. If a user wants only a subset of those tags to be
|
---|
1813 | considered, however, he or she should specify such tags in a
|
---|
1814 | comma-separated @var{list} with this option.
|
---|
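
For example, to follow only ordinary hyperlinks and inlined images
during recursion (the @sc{url} is illustrative):

@example
wget -r --follow-tags=a,img http://www.example.com/
@end example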
1815 |
|
---|
1816 | @item --ignore-tags=@var{list}
|
---|
1817 | This is the opposite of the @samp{--follow-tags} option. To skip
|
---|
1818 | certain @sc{html} tags when recursively looking for documents to download,
|
---|
1819 | specify them in a comma-separated @var{list}.
|
---|
1820 |
|
---|
1821 | In the past, this option was the best bet for downloading a single page
|
---|
1822 | and its requisites, using a command-line like:
|
---|
1823 |
|
---|
1824 | @example
|
---|
1825 | wget --ignore-tags=a,area -H -k -K -r http://@var{site}/@var{document}
|
---|
1826 | @end example
|
---|
1827 |
|
---|
1828 | However, the author of this option came across a page with tags like
|
---|
1829 | @code{<LINK REL="home" HREF="/">} and came to the realization that
|
---|
1830 | specifying tags to ignore was not enough. One can't just tell Wget to
|
---|
1831 | ignore @code{<LINK>}, because then stylesheets will not be downloaded.
|
---|
1832 | Now the best bet for downloading a single page and its requisites is the
|
---|
1833 | dedicated @samp{--page-requisites} option.
|
---|
1834 |
|
---|
1835 | @item -H
|
---|
1836 | @itemx --span-hosts
|
---|
1837 | Enable spanning across hosts when doing recursive retrieving
|
---|
1838 | (@pxref{Spanning Hosts}).
|
---|
1839 |
|
---|
1840 | @item -L
|
---|
1841 | @itemx --relative
|
---|
1842 | Follow relative links only. Useful for retrieving a specific home page
|
---|
1843 | without any distractions, not even those from the same hosts
|
---|
1844 | (@pxref{Relative Links}).
|
---|
1845 |
|
---|
1846 | @item -I @var{list}
|
---|
1847 | @itemx --include-directories=@var{list}
|
---|
1848 | Specify a comma-separated list of directories you wish to follow when
|
---|
1849 | downloading (@pxref{Directory-Based Limits} for more details.) Elements
|
---|
1850 | of @var{list} may contain wildcards.
|
---|
1851 |
|
---|
1852 | @item -X @var{list}
|
---|
1853 | @itemx --exclude-directories=@var{list}
|
---|
1854 | Specify a comma-separated list of directories you wish to exclude from
|
---|
1855 | download (@pxref{Directory-Based Limits} for more details.) Elements of
|
---|
1856 | @var{list} may contain wildcards.
|
---|
1857 |
|
---|
1858 | @item -np
|
---|
1859 | @itemx --no-parent
|
---|
1860 | Do not ever ascend to the parent directory when retrieving recursively.
|
---|
1861 | This is a useful option, since it guarantees that only the files
|
---|
1862 | @emph{below} a certain hierarchy will be downloaded.
|
---|
1863 | @xref{Directory-Based Limits}, for more details.
|
---|
1864 | @end table
|
---|
1865 |
|
---|
1866 | @c man end
|
---|
1867 |
|
---|
1868 | @node Recursive Download
|
---|
1869 | @chapter Recursive Download
|
---|
1870 | @cindex recursion
|
---|
1871 | @cindex retrieving
|
---|
1872 | @cindex recursive download
|
---|
1873 |
|
---|
1874 | GNU Wget is capable of traversing parts of the Web (or a single
|
---|
1875 | @sc{http} or @sc{ftp} server), following links and directory structure.
|
---|
1876 | We refer to this as @dfn{recursive retrieval}, or @dfn{recursion}.
|
---|
1877 |
|
---|
1878 | With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html}
|
---|
1879 | document at the given @sc{url}, then retrieves the files that
|
---|
1880 | document refers to through markup like @code{href} or
|
---|
1881 | @code{src}. If the freshly downloaded file is also of type
|
---|
1882 | @code{text/html} or @code{application/xhtml+xml}, it will be parsed and
|
---|
1883 | followed further.
|
---|
1884 |
|
---|
1885 | Recursive retrieval of @sc{http} and @sc{html} content is
|
---|
1886 | @dfn{breadth-first}. This means that Wget first downloads the requested
|
---|
1887 | @sc{html} document, then the documents linked from that document, then the
|
---|
1888 | documents linked by them, and so on. In other words, Wget first
|
---|
1889 | downloads the documents at depth 1, then those at depth 2, and so on
|
---|
1890 | until the specified maximum depth.
|
---|
1891 |
|
---|
1892 | The maximum @dfn{depth} to which the retrieval may descend is specified
|
---|
1893 | with the @samp{-l} option. The default maximum depth is five layers.
|
---|
1894 |
|
---|
1895 | When retrieving an @sc{ftp} @sc{url} recursively, Wget will retrieve all
|
---|
1896 | the data from the given directory tree (including the subdirectories up
|
---|
1897 | to the specified depth) on the remote server, creating its mirror image
|
---|
1898 | locally. @sc{ftp} retrieval is also limited by the @code{depth}
|
---|
1899 | parameter. Unlike @sc{http} recursion, @sc{ftp} recursion is performed
|
---|
1900 | depth-first.
|
---|
1901 |
|
---|
1902 | By default, Wget will create a local directory tree, corresponding to
|
---|
1903 | the one found on the remote server.
|
---|
1904 |
|
---|
1905 | Recursive retrieval has a number of applications, the most
|
---|
1906 | important of which is mirroring. It is also useful for @sc{www}
|
---|
1907 | presentations, and for any other situation where a slow network
|
---|
1908 | connection should be bypassed by storing the files locally.
|
---|
1909 |
|
---|
1910 | You should be warned that recursive downloads can overload the remote
|
---|
1911 | servers. Because of that, many administrators frown upon them and may
|
---|
1912 | ban access from your site if they detect very fast downloads of big
|
---|
1913 | amounts of content. When downloading from Internet servers, consider
|
---|
1914 | using the @samp{-w} option to introduce a delay between accesses to the
|
---|
1915 | server. The download will take a while longer, but the server
|
---|
1916 | administrator will not be alarmed by your rudeness.
|
---|
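
For instance, a two-second pause between requests (the host name is
illustrative):

@example
wget -r -w 2 http://www.example.com/
@end example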
1917 |
|
---|
1918 | Of course, recursive download may cause problems on your machine. If
|
---|
1919 | left to run unchecked, it can easily fill up the disk. If downloading
|
---|
1920 | from a local network, it can also take up bandwidth on the system, as well as
|
---|
1921 | consume memory and CPU.
|
---|
1922 |
|
---|
1923 | Try to specify the criteria that match the kind of download you are
|
---|
1924 | trying to achieve. If you want to download only one page, use
|
---|
1925 | @samp{--page-requisites} without any additional recursion. If you want
|
---|
1926 | to download things under one directory, use @samp{-np} to avoid
|
---|
1927 | downloading things from other directories. If you want to download all
|
---|
1928 | the files from one directory, use @samp{-l 1} to make sure the recursion
|
---|
1929 | depth never exceeds one. @xref{Following Links}, for more information
|
---|
1930 | about this.
|
---|
1931 |
|
---|
1932 | Recursive retrieval should be used with care. Don't say you were not
|
---|
1933 | warned.
|
---|
1934 |
|
---|
1935 | @node Following Links
|
---|
1936 | @chapter Following Links
|
---|
1937 | @cindex links
|
---|
1938 | @cindex following links
|
---|
1939 |
|
---|
1940 | When retrieving recursively, one does not wish to retrieve loads of
|
---|
1941 | unnecessary data. Most of the time the users bear in mind exactly what
|
---|
1942 | they want to download, and want Wget to follow only specific links.
|
---|
1943 |
|
---|
1944 | For example, if you wish to download the music archive from
|
---|
1945 | @samp{fly.srk.fer.hr}, you will not want to download all the home pages
|
---|
1946 | that happen to be referenced by an obscure part of the archive.
|
---|
1947 |
|
---|
1948 | Wget possesses several mechanisms that allow you to fine-tune which
|
---|
1949 | links it will follow.
|
---|
1950 |
|
---|
1951 | @menu
|
---|
1952 | * Spanning Hosts:: (Un)limiting retrieval based on host name.
|
---|
1953 | * Types of Files:: Getting only certain files.
|
---|
1954 | * Directory-Based Limits:: Getting only certain directories.
|
---|
1955 | * Relative Links:: Follow relative links only.
|
---|
1956 | * FTP Links:: Following FTP links.
|
---|
1957 | @end menu
|
---|
1958 |
|
---|
1959 | @node Spanning Hosts
|
---|
1960 | @section Spanning Hosts
|
---|
1961 | @cindex spanning hosts
|
---|
1962 | @cindex hosts, spanning
|
---|
1963 |
|
---|
1964 | Wget's recursive retrieval normally refuses to visit hosts different
|
---|
1965 | than the one you specified on the command line. This is a reasonable
|
---|
1966 | default; without it, every retrieval would have the potential to turn
|
---|
1967 | your Wget into a small version of Google.
|
---|
1968 |
|
---|
1969 | However, visiting different hosts, or @dfn{host spanning}, is sometimes
|
---|
1970 | a useful option. Maybe the images are served from a different server.
|
---|
1971 | Maybe you're mirroring a site that consists of pages interlinked between
|
---|
1972 | three servers. Maybe the server has two equivalent names, and the @sc{html}
|
---|
1973 | pages refer to both interchangeably.
|
---|
1974 |
|
---|
1975 | @table @asis
|
---|
1976 | @item Span to any host---@samp{-H}
|
---|
1977 |
|
---|
1978 | The @samp{-H} option turns on host spanning, thus allowing Wget's
|
---|
1979 | recursive run to visit any host referenced by a link. Unless sufficient
|
---|
1980 | recursion-limiting criteria are applied, these foreign hosts will
|
---|
1981 | typically link to yet more hosts, and so on until Wget ends up sucking
|
---|
1982 | up much more data than you have intended.
|
---|
1983 |
|
---|
1984 | @item Limit spanning to certain domains---@samp{-D}
|
---|
1985 |
|
---|
1986 | The @samp{-D} option allows you to specify the domains that will be
|
---|
1987 | followed, thus limiting the recursion only to the hosts that belong to
|
---|
1988 | these domains. Obviously, this makes sense only in conjunction with
|
---|
1989 | @samp{-H}. A typical example would be downloading the contents of
|
---|
1990 | @samp{www.server.com}, but allowing downloads from
|
---|
1991 | @samp{images.server.com}, etc.:
|
---|
1992 |
|
---|
1993 | @example
|
---|
1994 | wget -rH -Dserver.com http://www.server.com/
|
---|
1995 | @end example
|
---|
1996 |
|
---|
1997 | You can specify more than one address by separating them with a comma,
|
---|
1998 | e.g. @samp{-Ddomain1.com,domain2.com}.
|
---|
1999 |
|
---|
2000 | @item Keep download off certain domains---@samp{--exclude-domains}
|
---|
2001 |
|
---|
2002 | If there are domains you want to exclude specifically, you can do it
|
---|
2003 | with @samp{--exclude-domains}, which accepts the same type of arguments
|
---|
2004 | as @samp{-D}, but will @emph{exclude} all the listed domains. For
|
---|
2005 | example, if you want to download all the hosts from the @samp{foo.edu}
|
---|
2006 | domain, with the exception of @samp{sunsite.foo.edu}, you can do it like
|
---|
2007 | this:
|
---|
2008 |
|
---|
2009 | @example
|
---|
2010 | wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
|
---|
2011 | http://www.foo.edu/
|
---|
2012 | @end example
|
---|
2013 |
|
---|
2014 | @end table
|
---|
2015 |
|
---|
2016 | @node Types of Files
|
---|
2017 | @section Types of Files
|
---|
2018 | @cindex types of files
|
---|
2019 |
|
---|
2020 | When downloading material from the web, you will often want to restrict
|
---|
2021 | the retrieval to only certain file types. For example, if you are
|
---|
2022 | interested in downloading @sc{gif}s, you will not be overjoyed to get
|
---|
2023 | loads of PostScript documents, and vice versa.
|
---|
2024 |
|
---|
2025 | Wget offers two options to deal with this problem. Each option
|
---|
2026 | description lists a short name, a long name, and the equivalent command
|
---|
2027 | in @file{.wgetrc}.
|
---|
2028 |
|
---|
2029 | @cindex accept wildcards
|
---|
2030 | @cindex accept suffixes
|
---|
2031 | @cindex wildcards, accept
|
---|
2032 | @cindex suffixes, accept
|
---|
2033 | @table @samp
|
---|
2034 | @item -A @var{acclist}
|
---|
2035 | @itemx --accept @var{acclist}
|
---|
2036 | @itemx accept = @var{acclist}
|
---|
2037 | The argument to @samp{--accept} option is a list of file suffixes or
|
---|
2038 | patterns that Wget will download during recursive retrieval. A suffix
|
---|
2039 | is the ending part of a file name, and consists of ``normal'' letters,
|
---|
2040 | e.g. @samp{gif} or @samp{.jpg}. A matching pattern contains shell-like
|
---|
2041 | wildcards, e.g. @samp{books*} or @samp{zelazny*196[0-9]*}.
|
---|
2042 |
|
---|
2043 | So, specifying @samp{wget -A gif,jpg} will make Wget download only the
|
---|
2044 | files ending with @samp{gif} or @samp{jpg}, i.e. @sc{gif}s and
|
---|
2045 | @sc{jpeg}s. On the other hand, @samp{wget -A "zelazny*196[0-9]*"} will
|
---|
2046 | download only files beginning with @samp{zelazny} and containing numbers
|
---|
2047 | from 1960 to 1969 anywhere within. Look up the manual of your shell for
|
---|
2048 | a description of how pattern matching works.
|
---|
2049 |
|
---|
2050 | Of course, any number of suffixes and patterns can be combined into a
|
---|
2051 | comma-separated list, and given as an argument to @samp{-A}.
|
---|
2052 |
|
---|
2053 | @cindex reject wildcards
|
---|
2054 | @cindex reject suffixes
|
---|
2055 | @cindex wildcards, reject
|
---|
2056 | @cindex suffixes, reject
|
---|
2057 | @item -R @var{rejlist}
|
---|
2058 | @itemx --reject @var{rejlist}
|
---|
2059 | @itemx reject = @var{rejlist}
|
---|
2060 | The @samp{--reject} option works the same way as @samp{--accept}, only
|
---|
2061 | its logic is the reverse; Wget will download all files @emph{except} the
|
---|
2062 | ones matching the suffixes (or patterns) in the list.
|
---|
2063 |
|
---|
2064 | So, if you want to download a whole page except for the cumbersome
|
---|
2065 | @sc{mpeg}s and @sc{.au} files, you can use @samp{wget -R mpg,mpeg,au}.
|
---|
2066 | Analogously, to download all files except the ones beginning with
|
---|
2067 | @samp{bjork}, use @samp{wget -R "bjork*"}. The quotes are to prevent
|
---|
2068 | expansion by the shell.
|
---|
2069 | @end table
|
---|
2070 |
|
---|
2071 | The @samp{-A} and @samp{-R} options may be combined to achieve even
|
---|
2072 | better fine-tuning of which files to retrieve. E.g. @samp{wget -A
|
---|
2073 | "*zelazny*" -R .ps} will download all the files having @samp{zelazny} as
|
---|
2074 | a part of their name, but @emph{not} the PostScript files.
|
---|
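
As a complete command line (the @sc{url} is illustrative):

@example
wget -r -A "*zelazny*" -R .ps http://www.example.com/books/
@end example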
2075 |
|
---|
2076 | Note that these two options do not affect the downloading of @sc{html}
|
---|
2077 | files; Wget must load all the @sc{html}s to know where to go at
|
---|
2078 | all---recursive retrieval would make no sense otherwise.
|
---|
2079 |
|
---|
2080 | @node Directory-Based Limits
|
---|
2081 | @section Directory-Based Limits
|
---|
2082 | @cindex directories
|
---|
2083 | @cindex directory limits
|
---|
2084 |
|
---|
2085 | Regardless of other link-following facilities, it is often useful to
|
---|
2086 | place the restriction of what files to retrieve based on the directories
|
---|
2087 | those files are placed in. There can be many reasons for this---the
|
---|
2088 | home pages may be organized in a reasonable directory structure; or some
|
---|
2089 | directories may contain useless information, e.g. @file{/cgi-bin} or
|
---|
2090 | @file{/dev} directories.
|
---|
2091 |
|
---|
2092 | Wget offers three different options to deal with this requirement. Each
|
---|
2093 | option description lists a short name, a long name, and the equivalent
|
---|
2094 | command in @file{.wgetrc}.
|
---|
2095 |
|
---|
2096 | @cindex directories, include
|
---|
2097 | @cindex include directories
|
---|
2098 | @cindex accept directories
|
---|
2099 | @table @samp
|
---|
2100 | @item -I @var{list}
|
---|
2101 | @itemx --include @var{list}
|
---|
2102 | @itemx include_directories = @var{list}
|
---|
2103 | The @samp{-I} option accepts a comma-separated list of directories included
|
---|
2104 | in the retrieval. Any other directories will simply be ignored. The
|
---|
2105 | directories are absolute paths.
|
---|
2106 |
|
---|
2107 | So, if you wish to download from @samp{http://host/people/bozo/}
|
---|
2108 | following only links to bozo's colleagues in the @file{/people}
|
---|
2109 | directory and the bogus scripts in @file{/cgi-bin}, you can specify:
|
---|
2110 |
|
---|
2111 | @example
|
---|
2112 | wget -I /people,/cgi-bin http://host/people/bozo/
|
---|
2113 | @end example
|
---|
2114 |
|
---|
2115 | @cindex directories, exclude
|
---|
2116 | @cindex exclude directories
|
---|
2117 | @cindex reject directories
|
---|
2118 | @item -X @var{list}
|
---|
2119 | @itemx --exclude @var{list}
|
---|
2120 | @itemx exclude_directories = @var{list}
|
---|
2121 | The @samp{-X} option is exactly the reverse of @samp{-I}---this is a list of
|
---|
2122 | directories @emph{excluded} from the download. E.g. if you do not want
|
---|
2123 | Wget to download things from the @file{/cgi-bin} directory, specify @samp{-X
|
---|
2124 | /cgi-bin} on the command line.
|
---|
2125 |
|
---|
2126 | The same as with @samp{-A}/@samp{-R}, these two options can be combined
|
---|
2127 | to get a better fine-tuning of downloading subdirectories. E.g. if you
|
---|
2128 | want to load all the files from @file{/pub} hierarchy except for
|
---|
2129 | @file{/pub/worthless}, specify @samp{-I/pub -X/pub/worthless}.
|
---|
2130 |
|
---|
2131 | @cindex no parent
|
---|
2132 | @item -np
|
---|
2133 | @itemx --no-parent
|
---|
2134 | @itemx no_parent = on
|
---|
2135 | The simplest, and often very useful way of limiting directories is
|
---|
2136 | disallowing retrieval of the links that refer to the hierarchy
|
---|
2137 | @dfn{above} the beginning directory, i.e. disallowing ascent to the
|
---|
2138 | parent directory/directories.
|
---|
2139 |
|
---|
2140 | The @samp{--no-parent} option (short @samp{-np}) is useful in this case.
|
---|
2141 | Using it guarantees that you will never leave the existing hierarchy.
|
---|
2142 | Supposing you issue Wget with:
|
---|
2143 |
|
---|
2144 | @example
|
---|
2145 | wget -r --no-parent http://somehost/~luzer/my-archive/
|
---|
2146 | @end example
|
---|
2147 |
|
---|
2148 | You may rest assured that none of the references to
|
---|
2149 | @file{/~his-girls-homepage/} or @file{/~luzer/all-my-mpegs/} will be
|
---|
2150 | followed. Only the archive you are interested in will be downloaded.
|
---|
2151 | Essentially, @samp{--no-parent} is similar to
|
---|
2152 | @samp{-I/~luzer/my-archive}, only it handles redirections in a more
|
---|
2153 | intelligent fashion.
|
---|
2154 | @end table
|
---|
2155 |
|
---|
2156 | @node Relative Links
|
---|
2157 | @section Relative Links
|
---|
2158 | @cindex relative links
|
---|
2159 |
|
---|
2160 | When @samp{-L} is turned on, only the relative links are ever followed.
|
---|
2161 | Relative links are here defined as those that do not refer to the web
|
---|
2162 | server root. For example, these links are relative:
|
---|
2163 |
|
---|
2164 | @example
|
---|
2165 | <a href="foo.gif">
|
---|
2166 | <a href="foo/bar.gif">
|
---|
2167 | <a href="../foo/bar.gif">
|
---|
2168 | @end example
|
---|
2169 |
|
---|
2170 | These links are not relative:
|
---|
2171 |
|
---|
2172 | @example
|
---|
2173 | <a href="/foo.gif">
|
---|
2174 | <a href="/foo/bar.gif">
|
---|
2175 | <a href="http://www.server.com/foo/bar.gif">
|
---|
2176 | @end example
|
---|
2177 |
|
---|
2178 | Using this option guarantees that recursive retrieval will not span
|
---|
2179 | hosts, even without @samp{-H}. In simple cases it also allows downloads
|
---|
2180 | to ``just work'' without having to convert links.
|
---|
2181 |
|
---|
2182 | This option is probably not very useful and might be removed in a future
|
---|
2183 | release.
|
---|
2184 |
|
---|
2185 | @node FTP Links
|
---|
2186 | @section Following FTP Links
|
---|
2187 | @cindex following ftp links
|
---|
2188 |
|
---|
2189 | The rules for @sc{ftp} are somewhat specific, as it is necessary for
|
---|
2190 | them to be. @sc{ftp} links in @sc{html} documents are often included
|
---|
2191 | for purposes of reference, and it is often inconvenient to download them
|
---|
2192 | by default.
|
---|
2193 |
|
---|
2194 | To have @sc{ftp} links followed from @sc{html} documents, you need to
|
---|
2195 | specify the @samp{--follow-ftp} option. Having done that, @sc{ftp}
|
---|
2196 | links will span hosts regardless of @samp{-H} setting. This is logical,
|
---|
2197 | as @sc{ftp} links rarely point to the same host where the @sc{http}
|
---|
2198 | server resides. For similar reasons, the @samp{-L} option has no
|
---|
2199 | effect on such downloads. On the other hand, domain acceptance
|
---|
2200 | (@samp{-D}) and suffix rules (@samp{-A} and @samp{-R}) apply normally.
|
---|
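
A minimal sketch (the starting URL is hypothetical):

@example
wget -r --follow-ftp http://www.example.com/links.html
@end example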
2201 |
|
---|
2202 | Also note that followed links to @sc{ftp} directories will not be
|
---|
2203 | retrieved recursively further.
|
---|
2204 |
|
---|
2205 | @node Time-Stamping
|
---|
2206 | @chapter Time-Stamping
|
---|
2207 | @cindex time-stamping
|
---|
2208 | @cindex timestamping
|
---|
2209 | @cindex updating the archives
|
---|
2210 | @cindex incremental updating
|
---|
2211 |
|
---|
2212 | One of the most important aspects of mirroring information from the
|
---|
2213 | Internet is updating your archives.
|
---|
2214 |
|
---|
2215 | Downloading the whole archive again and again, just to replace a few
|
---|
2216 | changed files is expensive, both in terms of wasted bandwidth and money,
|
---|
2217 | and the time to do the update. This is why all the mirroring tools
|
---|
2218 | offer the option of incremental updating.
|
---|
2219 |
|
---|
2220 | Such an updating mechanism means that the remote server is scanned in
|
---|
2221 | search of @dfn{new} files. Only those new files will be downloaded in
|
---|
2222 | the place of the old ones.
|
---|
2223 |
|
---|
2224 | A file is considered new if one of these two conditions is met:
|
---|
2225 |
|
---|
2226 | @enumerate
|
---|
2227 | @item
|
---|
2228 | A file of that name does not already exist locally.
|
---|
2229 |
|
---|
2230 | @item
|
---|
2231 | A file of that name does exist, but the remote file was modified more
|
---|
2232 | recently than the local file.
|
---|
2233 | @end enumerate
|
---|
2234 |
|
---|
2235 | To implement this, the program needs to be aware of the time of last
|
---|
2236 | modification of both local and remote files. We call this information the
|
---|
2237 | @dfn{time-stamp} of a file.
|
---|
2238 |
|
---|
2239 | Time-stamping in GNU Wget is turned on using the @samp{--timestamping}
|
---|
2240 | (@samp{-N}) option, or through the @code{timestamping = on} directive in
|
---|
2241 | @file{.wgetrc}. With this option, for each file it intends to download,
|
---|
2242 | Wget will check whether a local file of the same name exists. If it
|
---|
2243 | does, and the remote file is older, Wget will not download it.
|
---|
2244 |
|
---|
2245 | If the local file does not exist, or the sizes of the files do not
|
---|
2246 | match, Wget will download the remote file no matter what the time-stamps
|
---|
2247 | say.
|
---|
2248 |
|
---|
2249 | @menu
|
---|
2250 | * Time-Stamping Usage::
|
---|
2251 | * HTTP Time-Stamping Internals::
|
---|
2252 | * FTP Time-Stamping Internals::
|
---|
2253 | @end menu
|
---|
2254 |
|
---|
2255 | @node Time-Stamping Usage
|
---|
2256 | @section Time-Stamping Usage
|
---|
2257 | @cindex time-stamping usage
|
---|
2258 | @cindex usage, time-stamping
|
---|
2259 |
|
---|
2260 | The usage of time-stamping is simple. Say you would like to download a
|
---|
2261 | file so that it keeps its date of modification.
|
---|
2262 |
|
---|
2263 | @example
|
---|
2264 | wget -S http://www.gnu.ai.mit.edu/
|
---|
2265 | @end example
|
---|
2266 |
|
---|
2267 | A simple @code{ls -l} shows that the time stamp on the local file matches
|
---|
2268 | the @code{Last-Modified} header returned by the server.
|
---|
2269 | As you can see, the time-stamping info is preserved locally, even
|
---|
2270 | without @samp{-N} (at least for @sc{http}).
|
---|
2271 |
|
---|
2272 | Several days later, you would like Wget to check if the remote file has
|
---|
2273 | changed, and download it if it has.
|
---|
2274 |
|
---|
2275 | @example
|
---|
2276 | wget -N http://www.gnu.ai.mit.edu/
|
---|
2277 | @end example
|
---|
2278 |
|
---|
2279 | Wget will ask the server for the last-modified date. If the local file
|
---|
2280 | has the same timestamp as the server, or a newer one, the remote file
|
---|
2281 | will not be re-fetched. However, if the remote file is more recent,
|
---|
2282 | Wget will proceed to fetch it.
|
---|
2283 |
|
---|
2284 | The same goes for @sc{ftp}. For example:
|
---|
2285 |
|
---|
2286 | @example
|
---|
2287 | wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"
|
---|
2288 | @end example
|
---|
2289 |
|
---|
2290 | (The quotes around that URL are to prevent the shell from trying to
|
---|
2291 | interpret the @samp{*}.)
|
---|
2292 |
|
---|
2293 | After download, a local directory listing will show that the timestamps
|
---|
2294 | match those on the remote server. Reissuing the command with @samp{-N}
|
---|
2295 | will make Wget re-fetch @emph{only} the files that have been modified
|
---|
2296 | since the last download.
|
---|
2297 |
|
---|
2298 | If you wished to mirror the GNU archive every week, you would use a
|
---|
2299 | command like the following:
|
---|
2300 |
|
---|
2301 | @example
|
---|
2302 | wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/
|
---|
2303 | @end example
|
---|
2304 |
|
---|
2305 | Note that time-stamping will only work for files for which the server
|
---|
2306 | gives a timestamp. For @sc{http}, this depends on getting a
|
---|
2307 | @code{Last-Modified} header. For @sc{ftp}, this depends on getting a
|
---|
2308 | directory listing with dates in a format that Wget can parse
|
---|
2309 | (@pxref{FTP Time-Stamping Internals}).
|
---|
2310 |
|
---|
2311 | @node HTTP Time-Stamping Internals
|
---|
2312 | @section HTTP Time-Stamping Internals
|
---|
2313 | @cindex http time-stamping
|
---|
2314 |
|
---|
2315 | Time-stamping in @sc{http} is implemented by checking the
|
---|
2316 | @code{Last-Modified} header. If you wish to retrieve the file
|
---|
2317 | @file{foo.html} through @sc{http}, Wget will check whether
|
---|
2318 | @file{foo.html} exists locally. If it doesn't, @file{foo.html} will be
|
---|
2319 | retrieved unconditionally.
|
---|
2320 |
|
---|
2321 | If the file does exist locally, Wget will first check its local
|
---|
2322 | time-stamp (similar to the way @code{ls -l} checks it), and then send a
|
---|
2323 | @code{HEAD} request to the remote server, demanding the information on
|
---|
2324 | the remote file.
|
---|
2325 |
|
---|
2326 | The @code{Last-Modified} header is examined to find which file was
|
---|
2327 | modified more recently (which makes it ``newer''). If the remote file
|
---|
2328 | is newer, it will be downloaded; if it is older, Wget will give
|
---|
2329 | up.@footnote{As an additional check, Wget will look at the
|
---|
2330 | @code{Content-Length} header, and compare the sizes; if they are not the
|
---|
2331 | same, the remote file will be downloaded no matter what the time-stamp
|
---|
2332 | says.}
|
---|
2333 |
|
---|
2334 | When @samp{--backup-converted} (@samp{-K}) is specified in conjunction
|
---|
2335 | with @samp{-N}, server file @samp{@var{X}} is compared to local file
|
---|
2336 | @samp{@var{X}.orig}, if extant, rather than being compared to local file
|
---|
2337 | @samp{@var{X}}, which will always differ if it's been converted by
|
---|
2338 | @samp{--convert-links} (@samp{-k}).
|
---|
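
A possible combined invocation (the URL is hypothetical):

@example
wget -r -N -k -K http://www.example.com/
@end example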
2339 |
|
---|
2340 | Arguably, @sc{http} time-stamping should be implemented using the
|
---|
2341 | @code{If-Modified-Since} request.
|
---|
2342 |
|
---|
2343 | @node FTP Time-Stamping Internals
|
---|
2344 | @section FTP Time-Stamping Internals
|
---|
2345 | @cindex ftp time-stamping
|
---|
2346 |
|
---|
2347 | In theory, @sc{ftp} time-stamping works much the same as @sc{http}, only
|
---|
2348 | @sc{ftp} has no headers---time-stamps must be ferreted out of directory
|
---|
2349 | listings.
|
---|
2350 |
|
---|
2351 | If an @sc{ftp} download is recursive or uses globbing, Wget will use the
|
---|
2352 | @sc{ftp} @code{LIST} command to get a file listing for the directory
|
---|
2353 | containing the desired file(s). It will try to analyze the listing,
|
---|
2354 | treating it like Unix @code{ls -l} output, extracting the time-stamps.
|
---|
2355 | The rest is exactly the same as for @sc{http}. Note that when
|
---|
2356 | retrieving individual files from an @sc{ftp} server without using
|
---|
2357 | globbing or recursion, listing files will not be downloaded (and thus
|
---|
2358 | files will not be time-stamped) unless @samp{-N} is specified.
|
---|
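
For example, to time-stamp a single file retrieved over @sc{ftp}
(server and path hypothetical):

@example
wget -N ftp://ftp.example.com/pub/ls-lR.gz
@end example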
2359 |
|
---|
2360 | The assumption that every directory listing is a Unix-style listing may
|
---|
2361 | sound extremely constraining, but in practice it is not, as many
|
---|
2362 | non-Unix @sc{ftp} servers use the Unixoid listing format because most
|
---|
2363 | (all?) of the clients understand it. Bear in mind that @sc{rfc959}
|
---|
2364 | defines no standard way to get a file list, let alone the time-stamps.
|
---|
2365 | We can only hope that a future standard will define this.
|
---|
2366 |
|
---|
2367 | Another non-standard solution is the use of the @code{MDTM} command
|
---|
2368 | that is supported by some @sc{ftp} servers (including the popular
|
---|
2369 | @code{wu-ftpd}), which returns the exact time of the specified file.
|
---|
2370 | Wget may support this command in the future.
|
---|
2371 |
|
---|
2372 | @node Startup File
|
---|
2373 | @chapter Startup File
|
---|
2374 | @cindex startup file
|
---|
2375 | @cindex wgetrc
|
---|
2376 | @cindex .wgetrc
|
---|
2377 | @cindex startup
|
---|
2378 | @cindex .netrc
|
---|
2379 |
|
---|
2380 | Once you know how to change default settings of Wget through command
|
---|
2381 | line arguments, you may wish to make some of those settings permanent.
|
---|
2382 | You can do that in a convenient way by creating the Wget startup
|
---|
2383 | file---@file{.wgetrc}.
|
---|
2384 |
|
---|
2385 | While @file{.wgetrc} is the ``main'' initialization file, it is also
|
---|
2386 | convenient to have a special facility for storing passwords. Thus Wget
|
---|
2387 | reads and interprets the contents of @file{$HOME/.netrc}, if it finds
|
---|
2388 | it. You can find @file{.netrc} format in your system manuals.
|
---|
2389 |
|
---|
2390 | Wget reads @file{.wgetrc} upon startup, recognizing a limited set of
|
---|
2391 | commands.
|
---|
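
As a small illustration (the values are arbitrary), a @file{.wgetrc}
might contain:

@example
tries = 45
wait = 1
@end example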
2392 |
|
---|
2393 | @menu
|
---|
2394 | * Wgetrc Location:: Location of various wgetrc files.
|
---|
2395 | * Wgetrc Syntax:: Syntax of wgetrc.
|
---|
2396 | * Wgetrc Commands:: List of available commands.
|
---|
2397 | * Sample Wgetrc:: A wgetrc example.
|
---|
2398 | @end menu
|
---|
2399 |
|
---|
2400 | @node Wgetrc Location
|
---|
2401 | @section Wgetrc Location
|
---|
2402 | @cindex wgetrc location
|
---|
2403 | @cindex location of wgetrc
|
---|
2404 |
|
---|
2405 | When initializing, Wget will look for a @dfn{global} startup file,
|
---|
2406 | @file{/usr/local/etc/wgetrc} by default (or some prefix other than
|
---|
2407 | @file{/usr/local}, if Wget was not installed there) and read commands
|
---|
2408 | from there, if it exists.
|
---|
2409 |
|
---|
2410 | Then it will look for the user's file. If the environment variable
|
---|
2411 | @code{WGETRC} is set, Wget will try to load that file. Failing that, no
|
---|
2412 | further attempts will be made.
|
---|
2413 |
|
---|
2414 | If @code{WGETRC} is not set, Wget will try to load @file{$HOME/.wgetrc}.
|
---|
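
For example, an alternative startup file can be selected for a single
run like this (the file name is hypothetical):

@example
WGETRC=$HOME/.wgetrc-mirror wget -m http://www.gnu.org/
@end example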
2415 |
|
---|
2416 | The fact that user's settings are loaded after the system-wide ones
|
---|
2417 | means that in case of collision user's wgetrc @emph{overrides} the
|
---|
2418 | system-wide wgetrc (in @file{/usr/local/etc/wgetrc} by default).
|
---|
2419 | Fascist admins, away!
|
---|
2420 |
|
---|
2421 | @node Wgetrc Syntax
|
---|
2422 | @section Wgetrc Syntax
|
---|
2423 | @cindex wgetrc syntax
|
---|
2424 | @cindex syntax of wgetrc
|
---|
2425 |
|
---|
2426 | The syntax of a wgetrc command is simple:
|
---|
2427 |
|
---|
2428 | @example
|
---|
2429 | variable = value
|
---|
2430 | @end example
|
---|
2431 |
|
---|
2432 | The @dfn{variable} will also be called a @dfn{command}. Valid
|
---|
2433 | @dfn{values} are different for different commands.
|
---|
2434 |
|
---|
2435 | The commands are case-insensitive and underscore-insensitive. Thus
|
---|
2436 | @samp{DIr__PrefiX} is the same as @samp{dirprefix}. Empty lines, lines
|
---|
2437 | beginning with @samp{#} and lines containing white-space only are
|
---|
2438 | discarded.
|
---|
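
For example, the following lines all set the same command (the value is
arbitrary):

@example
reclevel = 5
RecLevel = 5
rec__level = 5
@end example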
2439 |
|
---|
2440 | Commands that expect a comma-separated list will clear the list on an
|
---|
2441 | empty command. So, if you wish to reset the rejection list specified in
|
---|
2442 | global @file{wgetrc}, you can do it with:
|
---|
2443 |
|
---|
2444 | @example
|
---|
2445 | reject =
|
---|
2446 | @end example
|
---|
2447 |
|
---|
2448 | @node Wgetrc Commands
|
---|
2449 | @section Wgetrc Commands
|
---|
2450 | @cindex wgetrc commands
|
---|
2451 |
|
---|
2452 | The complete set of commands is listed below. Legal values are listed
|
---|
2453 | after the @samp{=}. Simple Boolean values can be set or unset using
|
---|
2454 | @samp{on} and @samp{off} or @samp{1} and @samp{0}. A fancier kind of
|
---|
2455 | Boolean allowed in some cases is the @dfn{lockable Boolean}, which may
|
---|
2456 | be set to @samp{on}, @samp{off}, @samp{always}, or @samp{never}. If an
|
---|
2457 | option is set to @samp{always} or @samp{never}, that value will be
|
---|
2458 | locked in for the duration of the Wget invocation---command-line options
|
---|
2459 | will not override.
|
---|
2460 |
|
---|
2461 | Some commands take pseudo-arbitrary values. @var{address} values can be
|
---|
2462 | hostnames or dotted-quad IP addresses. @var{n} can be any positive
|
---|
2463 | integer, or @samp{inf} for infinity, where appropriate. @var{string}
|
---|
2464 | values can be any non-empty string.
|
---|
2465 |
|
---|
2466 | Most of these commands have direct command-line equivalents. Also, any
|
---|
2467 | wgetrc command can be specified on the command line using the
|
---|
2468 | @samp{--execute} switch (@pxref{Basic Startup Options}.)
|
---|
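
For instance, a wgetrc command given on the command line might look
like this (the value is arbitrary):

@example
wget --execute="wait=2" http://www.gnu.org/
@end example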
2469 |
|
---|
2470 | @table @asis
|
---|
2471 | @item accept/reject = @var{string}
|
---|
2472 | Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}).
|
---|
2473 |
|
---|
2474 | @item add_hostdir = on/off
|
---|
2475 | Enable/disable host-prefixed file names. @samp{-nH} disables it.
|
---|
2476 |
|
---|
2477 | @item continue = on/off
|
---|
2478 | If set to on, force continuation of preexistent partially retrieved
|
---|
2479 | files. See @samp{-c} before setting it.
|
---|
2480 |
|
---|
2481 | @item background = on/off
|
---|
2482 | Enable/disable going to background---the same as @samp{-b} (which
|
---|
2483 | enables it).
|
---|
2484 |
|
---|
2485 | @item backup_converted = on/off
|
---|
2486 | Enable/disable saving pre-converted files with the suffix
|
---|
2487 | @samp{.orig}---the same as @samp{-K} (which enables it).
|
---|
2488 |
|
---|
2489 | @c @item backups = @var{number}
|
---|
2490 | @c #### Document me!
|
---|
2491 | @c
|
---|
2492 | @item base = @var{string}
|
---|
2493 | Consider relative @sc{url}s in @sc{url} input files that are forced to be
|
---|
2494 | interpreted as @sc{html} as being relative to @var{string}---the same as
|
---|
2495 | @samp{--base=@var{string}}.
|
---|
2496 |
|
---|
2497 | @item bind_address = @var{address}
|
---|
2498 | Bind to @var{address}, like the @samp{--bind-address=@var{address}} option.
|
---|
2499 |
|
---|
2500 | @item ca_certificate = @var{file}
|
---|
2501 | Set the certificate authority bundle file to @var{file}. The same
|
---|
2502 | as @samp{--ca-certificate=@var{file}}.
|
---|
2503 |
|
---|
2504 | @item ca_directory = @var{directory}
|
---|
2505 | Set the directory used for certificate authorities. The same as
|
---|
2506 | @samp{--ca-directory=@var{directory}}.
|
---|
2507 |
|
---|
2508 | @item cache = on/off
|
---|
2509 | When set to off, disallow server-caching. See the @samp{--no-cache}
|
---|
2510 | option.
|
---|
2511 |
|
---|
2512 | @item certificate = @var{file}
|
---|
2513 | Set the client certificate file name to @var{file}. The same as
|
---|
2514 | @samp{--certificate=@var{file}}.
|
---|
2515 |
|
---|
2516 | @item certificate_type = @var{string}
|
---|
2517 | Specify the type of the client certificate, legal values being
|
---|
2518 | @samp{PEM} (the default) and @samp{DER} (aka ASN1). The same as
|
---|
2519 | @samp{--certificate-type=@var{string}}.
|
---|
2520 |
|
---|
2521 | @item check_certificate = on/off
|
---|
2522 | If this is set to off, the server certificate is not checked against
|
---|
2523 | the specified client authorities. The default is ``on''. The same as
|
---|
2524 | @samp{--check-certificate}.
|
---|
2525 |
|
---|
2526 | @item convert_links = on/off
|
---|
2527 | Convert non-relative links locally. The same as @samp{-k}.
|
---|
2528 |
|
---|
2529 | @item cookies = on/off
|
---|
2530 | When set to off, disallow cookies. See the @samp{--no-cookies} option.
|
---|
2531 |
|
---|
2532 | @item connect_timeout = @var{n}
|
---|
2533 | Set the connect timeout---the same as @samp{--connect-timeout}.
|
---|
2534 |
|
---|
2535 | @item cut_dirs = @var{n}
|
---|
2536 | Ignore @var{n} remote directory components. Equivalent to
|
---|
2537 | @samp{--cut-dirs=@var{n}}.
|
---|
2538 |
|
---|
2539 | @item debug = on/off
|
---|
2540 | Debug mode, same as @samp{-d}.
|
---|
2541 |
|
---|
2542 | @item delete_after = on/off
|
---|
2543 | Delete after download---the same as @samp{--delete-after}.
|
---|
2544 |
|
---|
2545 | @item dir_prefix = @var{string}
|
---|
2546 | Top of directory tree---the same as @samp{-P @var{string}}.
|
---|
2547 |
|
---|
2548 | @item dirstruct = on/off
|
---|
2549 | Turning dirstruct on or off---the same as @samp{-x} or @samp{-nd},
|
---|
2550 | respectively.
|
---|
2551 |
|
---|
2552 | @item dns_cache = on/off
|
---|
2553 | Turn DNS caching on/off. Since DNS caching is on by default, this
|
---|
2554 | option is normally used to turn it off and is equivalent to
|
---|
2555 | @samp{--no-dns-cache}.
|
---|
2556 |
|
---|
2557 | @item dns_timeout = @var{n}
|
---|
2558 | Set the DNS timeout---the same as @samp{--dns-timeout}.
|
---|
2559 |
|
---|
2560 | @item domains = @var{string}
|
---|
2561 | Same as @samp{-D} (@pxref{Spanning Hosts}).
|
---|
2562 |
|
---|
2563 | @item dot_bytes = @var{n}
|
---|
2564 | Specify the number of bytes ``contained'' in a dot, as seen throughout
|
---|
2565 | the retrieval (1024 by default). You can postfix the value with
|
---|
2566 | @samp{k} or @samp{m}, representing kilobytes and megabytes,
|
---|
2567 | respectively. With dot settings you can tailor the dot retrieval to
|
---|
2568 | suit your needs, or you can use the predefined @dfn{styles}
|
---|
2569 | (@pxref{Download Options}).
|
---|
2570 |
|
---|
2571 | @item dots_in_line = @var{n}
|
---|
2572 | Specify the number of dots that will be printed in each line throughout
|
---|
2573 | the retrieval (50 by default).
|
---|
2574 |
|
---|
2575 | @item dot_spacing = @var{n}
|
---|
2576 | Specify the number of dots in a single cluster (10 by default).
|
---|
2577 |
|
---|
2578 | @item egd_file = @var{file}
|
---|
2579 | Use @var{file} as the EGD socket file name. The same as
|
---|
2580 | @samp{--egd-file=@var{file}}.
|
---|
2581 |
|
---|
2582 | @item exclude_directories = @var{string}
|
---|
2583 | Specify a comma-separated list of directories you wish to exclude from
|
---|
2584 | download---the same as @samp{-X @var{string}} (@pxref{Directory-Based
|
---|
2585 | Limits}).
|
---|
2586 |
|
---|
2587 | @item exclude_domains = @var{string}
|
---|
2588 | Same as @samp{--exclude-domains=@var{string}} (@pxref{Spanning
|
---|
2589 | Hosts}).
|
---|
2590 |
|
---|
2591 | @item follow_ftp = on/off
|
---|
2592 | Follow @sc{ftp} links from @sc{html} documents---the same as
|
---|
2593 | @samp{--follow-ftp}.
|
---|
2594 |
|
---|
2595 | @item follow_tags = @var{string}
|
---|
2596 | Only follow certain @sc{html} tags when doing a recursive retrieval,
|
---|
2597 | just like @samp{--follow-tags=@var{string}}.
|
---|
2598 |
|
---|
2599 | @item force_html = on/off
|
---|
2600 | If set to on, force the input filename to be regarded as an @sc{html}
|
---|
2601 | document---the same as @samp{-F}.
|
---|
2602 |
|
---|
2603 | @item ftp_password = @var{string}
|
---|
2604 | Set your @sc{ftp} password to @var{string}. Without this setting, the
|
---|
2605 | password defaults to @samp{-wget@@}, which is a useful default for
|
---|
2606 | anonymous @sc{ftp} access.
|
---|
2607 |
|
---|
2608 | This command used to be named @code{passwd} prior to Wget 1.10.
|
---|
2609 |
|
---|
2610 | @item ftp_proxy = @var{string}
|
---|
2611 | Use @var{string} as @sc{ftp} proxy, instead of the one specified in
|
---|
2612 | environment.
|
---|
2613 |
|
---|
2614 | @item ftp_user = @var{string}
|
---|
2615 | Set @sc{ftp} user to @var{string}.
|
---|
2616 |
|
---|
2617 | This command used to be named @code{login} prior to Wget 1.10.
|
---|
2618 |
|
---|
2619 | @item glob = on/off
|
---|
2620 | Turn globbing on/off---the same as @samp{--glob} and @samp{--no-glob}.
|
---|
2621 |
|
---|
2622 | @item header = @var{string}
|
---|
2623 | Define a header for HTTP downloads, like using
|
---|
2624 | @samp{--header=@var{string}}.
|
---|
2625 |
|
---|
2626 | @item html_extension = on/off
|
---|
2627 | Add a @samp{.html} extension to @samp{text/html} or
|
---|
2628 | @samp{application/xhtml+xml} files without it, like @samp{-E}.
|
---|
2629 |
|
---|
2630 | @item http_keep_alive = on/off
|
---|
2631 | Turn the keep-alive feature on or off (defaults to on). Turning it
|
---|
2632 | off is equivalent to @samp{--no-http-keep-alive}.
|
---|
2633 |
|
---|
2634 | @item http_password = @var{string}
|
---|
2635 | Set @sc{http} password, equivalent to
|
---|
2636 | @samp{--http-password=@var{string}}.
|
---|
2637 |
|
---|
2638 | @item http_proxy = @var{string}
|
---|
2639 | Use @var{string} as @sc{http} proxy, instead of the one specified in
|
---|
2640 | environment.
|
---|
2641 |
|
---|
2642 | @item http_user = @var{string}
|
---|
2643 | Set @sc{http} user to @var{string}, equivalent to
|
---|
2644 | @samp{--http-user=@var{string}}.
|
---|
2645 |
|
---|
2646 | @item ignore_length = on/off
|
---|
2647 | When set to on, ignore @code{Content-Length} header; the same as
|
---|
2648 | @samp{--ignore-length}.
|
---|
2649 |
|
---|
2650 | @item ignore_tags = @var{string}
|
---|
2651 | Ignore certain @sc{html} tags when doing a recursive retrieval, like
|
---|
2652 | @samp{--ignore-tags=@var{string}}.
|
---|
2653 |
|
---|
2654 | @item include_directories = @var{string}
|
---|
2655 | Specify a comma-separated list of directories you wish to follow when
|
---|
2656 | downloading---the same as @samp{-I @var{string}}.
|
---|
2657 |
|
---|
2658 | @item inet4_only = on/off
|
---|
2659 | Force connecting to IPv4 addresses, off by default. You can put this
|
---|
2660 | in the global init file to disable Wget's attempts to resolve and
|
---|
2661 | connect to IPv6 hosts. Available only if Wget was compiled with IPv6
|
---|
2662 | support. The same as @samp{--inet4-only} or @samp{-4}.
|
---|
2663 |
|
---|
2664 | @item inet6_only = on/off
|
---|
2665 | Force connecting to IPv6 addresses, off by default. Available only if
|
---|
2666 | Wget was compiled with IPv6 support. The same as @samp{--inet6-only}
|
---|
2667 | or @samp{-6}.
|
---|
2668 |
|
---|
2669 | @item input = @var{file}
|
---|
2670 | Read the @sc{url}s from @var{file}, like @samp{-i @var{file}}.
|
---|
2671 |
|
---|
2672 | @item limit_rate = @var{rate}
|
---|
2673 | Limit the download speed to no more than @var{rate} bytes per second.
|
---|
2674 | The same as @samp{--limit-rate=@var{rate}}.
|
---|
2675 |
|
---|
2676 | @item load_cookies = @var{file}
|
---|
2677 | Load cookies from @var{file}. See @samp{--load-cookies @var{file}}.
|
---|
2678 |
|
---|
2679 | @item logfile = @var{file}
|
---|
2680 | Set logfile to @var{file}, the same as @samp{-o @var{file}}.
|
---|
2681 |
|
---|
2682 | @item mirror = on/off
|
---|
2683 | Turn mirroring on/off. The same as @samp{-m}.
|
---|
2684 |
|
---|
2685 | @item netrc = on/off
|
---|
2686 | Turn reading netrc on or off.
|
---|
2687 |
|
---|
2688 | @item noclobber = on/off
|
---|
2689 | Same as @samp{-nc}.
|
---|
2690 |
|
---|
2691 | @item no_parent = on/off
|
---|
2692 | Disallow retrieving outside the directory hierarchy, like
|
---|
2693 | @samp{--no-parent} (@pxref{Directory-Based Limits}).
|
---|
2694 |
|
---|
2695 | @item no_proxy = @var{string}
|
---|
2696 | Use @var{string} as the comma-separated list of domains to avoid in
|
---|
2697 | proxy loading, instead of the one specified in environment.
|
---|
2698 |
|
---|
2699 | @item output_document = @var{file}
|
---|
2700 | Set the output filename---the same as @samp{-O @var{file}}.
|
---|
2701 |
|
---|
2702 | @item page_requisites = on/off
|
---|
2703 | Download all ancillary documents necessary for a single @sc{html} page to
|
---|
2704 | display properly---the same as @samp{-p}.
|
---|
2705 |
|
---|
2706 | @item passive_ftp = on/off/always/never
|
---|
2707 | Change setting of passive @sc{ftp}, equivalent to the
|
---|
2708 | @samp{--passive-ftp} option. Some scripts and @samp{.pm} (Perl
|
---|
2709 | module) files download files using @samp{wget --passive-ftp}. If your
|
---|
2710 | firewall does not allow this, you can set @samp{passive_ftp = never}
|
---|
2711 | to override the command-line.
|
---|
2712 |
|
---|
2713 | @item password = @var{string}
|
---|
2714 | Specify password @var{string} for both @sc{ftp} and @sc{http} file retrieval.
|
---|
2715 | This command can be overridden using the @samp{ftp_password} and
|
---|
2716 | @samp{http_password} commands for @sc{ftp} and @sc{http} respectively.
|
---|
2717 |
|
---|
2718 | @item post_data = @var{string}
|
---|
2719 | Use POST as the method for all HTTP requests and send @var{string} in
|
---|
2720 | the request body. The same as @samp{--post-data=@var{string}}.
|
---|
2721 |
|
---|
2722 | @item post_file = @var{file}
|
---|
2723 | Use POST as the method for all HTTP requests and send the contents of
|
---|
2724 | @var{file} in the request body. The same as
|
---|
2725 | @samp{--post-file=@var{file}}.
|
---|
2726 |
|
---|
2727 | @item prefer_family = IPv4/IPv6/none
|
---|
2728 | When given a choice of several addresses, connect to the addresses
|
---|
2729 | with the specified address family first. IPv4 addresses are preferred by
|
---|
2730 | default. The same as @samp{--prefer-family}, which see for a detailed
|
---|
2731 | discussion of why this is useful.
|
---|
2732 |
|
---|
2733 | @item private_key = @var{file}
|
---|
2734 | Set the private key file to @var{file}. The same as
|
---|
2735 | @samp{--private-key=@var{file}}.
|
---|
2736 |
|
---|
2737 | @item private_key_type = @var{string}
|
---|
2738 | Specify the type of the private key, legal values being @samp{PEM}
|
---|
2739 | (the default) and @samp{DER} (aka ASN1). The same as
|
---|
2740 | @samp{--private-key-type=@var{string}}.
|
---|
2741 |
|
---|
2742 | @item progress = @var{string}
|
---|
2743 | Set the type of the progress indicator. Legal types are @samp{dot}
|
---|
2744 | and @samp{bar}. Equivalent to @samp{--progress=@var{string}}.
|
---|
2745 |
|
---|
2746 | @item protocol_directories = on/off
|
---|
2747 | When set, use the protocol name as a directory component of local file
|
---|
2748 | names. The same as @samp{--protocol-directories}.
|
---|
2749 |
|
---|
2750 | @item proxy_user = @var{string}
|
---|
2751 | Set proxy authentication user name to @var{string}, like
|
---|
2752 | @samp{--proxy-user=@var{string}}.
|
---|
2753 |
|
---|
2754 | @item proxy_password = @var{string}
|
---|
2755 | Set proxy authentication password to @var{string}, like
|
---|
2756 | @samp{--proxy-password=@var{string}}.
|
---|
2757 |
|
---|
2758 | @item quiet = on/off
|
---|
2759 | Quiet mode---the same as @samp{-q}.
|
---|
2760 |
|
---|
2761 | @item quota = @var{quota}
|
---|
2762 | Specify the download quota, which is useful to put in the global
|
---|
2763 | @file{wgetrc}. When download quota is specified, Wget will stop
|
---|
2764 | retrieving after the download sum has become greater than quota. The
|
---|
2765 | quota can be specified in bytes (default), kbytes (@samp{k} appended) or
|
---|
2766 | mbytes (@samp{m} appended). Thus @samp{quota = 5m} will set the quota
|
---|
2767 | to 5 megabytes. Note that the user's startup file overrides system
|
---|
2768 | settings.
|
---|
2769 |
|
---|
2770 | @item random_file = @var{file}
|
---|
2771 | Use @var{file} as a source of randomness on systems lacking
|
---|
2772 | @file{/dev/random}.
|
---|
2773 |
|
---|
2774 | @item read_timeout = @var{n}
|
---|
2775 | Set the read (and write) timeout---the same as
|
---|
2776 | @samp{--read-timeout=@var{n}}.
|
---|
2777 |
|
---|
2778 | @item reclevel = @var{n}
|
---|
2779 | Recursion level (depth)---the same as @samp{-l @var{n}}.
|
---|
2780 |
|
---|
2781 | @item recursive = on/off
|
---|
2782 | Recursive on/off---the same as @samp{-r}.
|
---|
2783 |
|
---|
2784 | @item referer = @var{string}
|
---|
2785 | Set HTTP @samp{Referer:} header just like
|
---|
2786 | @samp{--referer=@var{string}}. (Note it was the folks who wrote the
|
---|
2787 | @sc{http} spec who got the spelling of ``referrer'' wrong.)
|
---|
2788 |
|
---|
2789 | @item relative_only = on/off
|
---|
2790 | Follow only relative links---the same as @samp{-L} (@pxref{Relative
|
---|
2791 | Links}).
|
---|
2792 |
|
---|
2793 | @item remove_listing = on/off
|
---|
2794 | If set to on, remove @sc{ftp} listings downloaded by Wget. Setting it
|
---|
2795 | to off is the same as @samp{--no-remove-listing}.
|
---|
2796 |
|
---|
2797 | @item restrict_file_names = unix/windows
|
---|
2798 | Restrict the file names generated by Wget from URLs. See
|
---|
2799 | @samp{--restrict-file-names} for a more detailed description.
|
---|
2800 |
|
---|
2801 | @item retr_symlinks = on/off
|
---|
2802 | When set to on, retrieve symbolic links as if they were plain files; the
|
---|
2803 | same as @samp{--retr-symlinks}.
|
---|
2804 |
|
---|
2805 | @item retry_connrefused = on/off
|
---|
2806 | When set to on, consider ``connection refused'' a transient
|
---|
2807 | error---the same as @samp{--retry-connrefused}.
|
---|
2808 |
|
---|
2809 | @item robots = on/off
|
---|
2810 | Specify whether the norobots convention is respected by Wget, ``on'' by
|
---|
2811 | default. This switch controls both the @file{/robots.txt} and the
|
---|
2812 | @samp{nofollow} aspect of the spec. @xref{Robot Exclusion}, for more
|
---|
2813 | details about this. Be sure you know what you are doing before turning
|
---|
2814 | this off.
|
---|
2815 |
|
---|
2816 | @item save_cookies = @var{file}
|
---|
2817 | Save cookies to @var{file}. The same as @samp{--save-cookies
|
---|
2818 | @var{file}}.
|
---|
2819 |
|
---|
2820 | @item secure_protocol = @var{string}
|
---|
2821 | Choose the secure protocol to be used. Legal values are @samp{auto}
|
---|
2822 | (the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same
|
---|
2823 | as @samp{--secure-protocol=@var{string}}.
|
---|
2824 |
|
---|
2825 | @item server_response = on/off
|
---|
2826 | Choose whether or not to print the @sc{http} and @sc{ftp} server
|
---|
2827 | responses---the same as @samp{-S}.
|
---|
2828 |
|
---|
2829 | @item span_hosts = on/off
|
---|
2830 | Same as @samp{-H}.
|
---|
2831 |
|
---|
2832 | @item strict_comments = on/off
|
---|
2833 | Same as @samp{--strict-comments}.
|
---|
2834 |
|
---|
2835 | @item timeout = @var{n}
|
---|
2836 | Set all applicable timeout values to @var{n}, the same as @samp{-T
|
---|
2837 | @var{n}}.
|
---|
2838 |
|
---|
2839 | @item timestamping = on/off
|
---|
2840 | Turn timestamping on/off. The same as @samp{-N} (@pxref{Time-Stamping}).
|
---|
2841 |
|
---|
2842 | @item tries = @var{n}
|
---|
2843 | Set number of retries per @sc{url}---the same as @samp{-t @var{n}}.
|
---|
2844 |
|
---|
2845 | @item use_proxy = on/off
|
---|
2846 | When set to off, don't use proxy even when proxy-related environment
|
---|
2847 | variables are set. In that case it is the same as using
|
---|
2848 | @samp{--no-proxy}.
|
---|
2849 |
|
---|
2850 | @item user = @var{string}
|
---|
2851 | Specify username @var{string} for both @sc{ftp} and @sc{http} file retrieval.
|
---|
2852 | This command can be overridden using the @samp{ftp_user} and
|
---|
2853 | @samp{http_user} commands for @sc{ftp} and @sc{http} respectively.
|
---|
2854 |
|
---|
2855 | @item verbose = on/off
|
---|
2856 | Turn verbose on/off---the same as @samp{-v}/@samp{-nv}.
|
---|
2857 |
|
---|
2858 | @item wait = @var{n}
|
---|
2859 | Wait @var{n} seconds between retrievals---the same as @samp{-w
|
---|
2860 | @var{n}}.
|
---|
2861 |
|
---|
2862 | @item waitretry = @var{n}
|
---|
2863 | Wait up to @var{n} seconds between retries of failed retrievals
|
---|
2864 | only---the same as @samp{--waitretry=@var{n}}. Note that this is
|
---|
2865 | turned on by default in the global @file{wgetrc}.
|
---|
2866 |
|
---|
2867 | @item randomwait = on/off
|
---|
2868 | Turn random between-request wait times on or off. The same as
|
---|
2869 | @samp{--random-wait}.
|
---|
2870 | @end table
|
---|
2871 |
|
---|
2872 | @node Sample Wgetrc
|
---|
2873 | @section Sample Wgetrc
|
---|
2874 | @cindex sample wgetrc
|
---|
2875 |
|
---|
2876 | This is the sample initialization file, as given in the distribution.
|
---|
2877 | It is divided into two sections---one for global usage (suitable for the global
|
---|
2878 | startup file), and one for local usage (suitable for
|
---|
2879 | @file{$HOME/.wgetrc}). Be careful about the things you change.
|
---|
2880 |
|
---|
2881 | Note that almost all the lines are commented out. For a command to have
|
---|
2882 | any effect, you must remove the @samp{#} character at the beginning of
|
---|
2883 | its line.
|
---|
2884 |
|
---|
2885 | @example
|
---|
2886 | @include sample.wgetrc.munged_for_texi_inclusion
|
---|
2887 | @end example
|
---|
2888 |
|
---|
2889 | @node Examples
|
---|
2890 | @chapter Examples
|
---|
2891 | @cindex examples
|
---|
2892 |
|
---|
2893 | @c man begin EXAMPLES
|
---|
2894 | The examples are divided into three sections loosely based on their
|
---|
2895 | complexity.
|
---|
2896 |
|
---|
2897 | @menu
|
---|
2898 | * Simple Usage:: Simple, basic usage of the program.
|
---|
2899 | * Advanced Usage:: Advanced tips.
|
---|
2900 | * Very Advanced Usage:: The hairy stuff.
|
---|
2901 | @end menu
|
---|
2902 |
|
---|
2903 | @node Simple Usage
|
---|
2904 | @section Simple Usage
|
---|
2905 |
|
---|
2906 | @itemize @bullet
|
---|
2907 | @item
|
---|
2908 | Say you want to download a @sc{url}. Just type:
|
---|
2909 |
|
---|
2910 | @example
|
---|
2911 | wget http://fly.srk.fer.hr/
|
---|
2912 | @end example
|
---|
2913 |
|
---|
2914 | @item
|
---|
2915 | But what will happen if the connection is slow, and the file is lengthy?
|
---|
2916 | The connection will probably fail before the whole file is retrieved,
|
---|
2917 | more than once. In this case, Wget will try getting the file until it
|
---|
2918 | either gets the whole of it, or exceeds the default number of retries
|
---|
2919 | (this being 20). It is easy to change the number of tries to 45, to
|
---|
2920 | ensure that the whole file will arrive safely:
|
---|
2921 |
|
---|
2922 | @example
|
---|
2923 | wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
|
---|
2924 | @end example
|
---|
2925 |
|
---|
2926 | @item
|
---|
2927 | Now let's leave Wget to work in the background, and write its progress
|
---|
2928 | to log file @file{log}. It is tiring to type @samp{--tries}, so we
|
---|
2929 | shall use @samp{-t}.
|
---|
2930 |
|
---|
2931 | @example
|
---|
2932 | wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
|
---|
2933 | @end example
|
---|
2934 |
|
---|
2935 | The ampersand at the end of the line makes sure that Wget works in the
|
---|
2936 | background. To unlimit the number of retries, use @samp{-t inf}.
|
---|
2937 |
|
---|
2938 | @item
|
---|
2939 | The usage of @sc{ftp} is as simple. Wget will take care of login and
|
---|
2940 | password.
|
---|
2941 |
|
---|
2942 | @example
|
---|
2943 | wget ftp://gnjilux.srk.fer.hr/welcome.msg
|
---|
2944 | @end example
|
---|
2945 |
|
---|
2946 | @item
|
---|
2947 | If you specify a directory, Wget will retrieve the directory listing,
|
---|
2948 | parse it and convert it to @sc{html}. Try:
|
---|
2949 |
|
---|
2950 | @example
|
---|
2951 | wget ftp://ftp.gnu.org/pub/gnu/
|
---|
2952 | links index.html
|
---|
2953 | @end example
|
---|
2954 | @end itemize
|
---|
2955 |
|
---|
2956 | @node Advanced Usage
|
---|
2957 | @section Advanced Usage
|
---|
2958 |
|
---|
2959 | @itemize @bullet
|
---|
2960 | @item
|
---|
2961 | You have a file that contains the URLs you want to download? Use the
|
---|
2962 | @samp{-i} switch:
|
---|
2963 |
|
---|
2964 | @example
|
---|
2965 | wget -i @var{file}
|
---|
2966 | @end example
|
---|
2967 |
|
---|
2968 | If you specify @samp{-} as file name, the @sc{url}s will be read from
|
---|
2969 | standard input.
|
---|
2970 |
|
---|
2971 | @item
|
---|
2972 | Create a five levels deep mirror image of the GNU web site, with the
|
---|
2973 | same directory structure the original has, with only one try per
|
---|
2974 | document, saving the log of the activities to @file{gnulog}:
|
---|
2975 |
|
---|
2976 | @example
|
---|
2977 | wget -r http://www.gnu.org/ -o gnulog
|
---|
2978 | @end example
|
---|
2979 |
|
---|
2980 | @item
|
---|
2981 | The same as the above, but convert the links in the @sc{html} files to
|
---|
2982 | point to local files, so you can view the documents off-line:
|
---|
2983 |
|
---|
2984 | @example
|
---|
2985 | wget --convert-links -r http://www.gnu.org/ -o gnulog
|
---|
2986 | @end example
|
---|
2987 |
|
---|
2988 | @item
|
---|
2989 | Retrieve only one @sc{html} page, but make sure that all the elements needed
|
---|
2990 | for the page to be displayed, such as inline images and external style
|
---|
2991 | sheets, are also downloaded. Also make sure the downloaded page
|
---|
2992 | references the downloaded links.
|
---|
2993 |
|
---|
2994 | @example
|
---|
2995 | wget -p --convert-links http://www.server.com/dir/page.html
|
---|
2996 | @end example
|
---|
2997 |
|
---|
2998 | The @sc{html} page will be saved to @file{www.server.com/dir/page.html}, and
|
---|
2999 | the images, stylesheets, etc., somewhere under @file{www.server.com/},
|
---|
3000 | depending on where they were on the remote server.
|
---|
3001 |
|
---|
3002 | @item
|
---|
3003 | The same as the above, but without the @file{www.server.com/} directory.
|
---|
3004 | In fact, I don't want to have all those random server directories
|
---|
3005 | anyway---just save @emph{all} those files under a @file{download/}
|
---|
3006 | subdirectory of the current directory.
|
---|
3007 |
|
---|
3008 | @example
|
---|
3009 | wget -p --convert-links -nH -nd -Pdownload \
|
---|
3010 | http://www.server.com/dir/page.html
|
---|
3011 | @end example
|
---|
3012 |
|
---|
3013 | @item
|
---|
3014 | Retrieve the index.html of @samp{www.lycos.com}, showing the original
|
---|
3015 | server headers:
|
---|
3016 |
|
---|
3017 | @example
|
---|
3018 | wget -S http://www.lycos.com/
|
---|
3019 | @end example
|
---|
3020 |
|
---|
3021 | @item
|
---|
3022 | Save the server headers with the file, perhaps for post-processing.
|
---|
3023 |
|
---|
3024 | @example
|
---|
3025 | wget --save-headers http://www.lycos.com/
|
---|
3026 | more index.html
|
---|
3027 | @end example
|
---|
3028 |
|
---|
3029 | @item
|
---|
3030 | Retrieve the first two levels of @samp{wuarchive.wustl.edu}, saving them
|
---|
3031 | to @file{/tmp}.
|
---|
3032 |
|
---|
3033 | @example
|
---|
3034 | wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
|
---|
3035 | @end example
|
---|
3036 |
|
---|
3037 | @item
|
---|
3038 | You want to download all the @sc{gif}s from a directory on an @sc{http}
|
---|
3039 | server. You tried @samp{wget http://www.server.com/dir/*.gif}, but that
|
---|
3040 | didn't work because @sc{http} retrieval does not support globbing. In
|
---|
3041 | that case, use:
|
---|
3042 |
|
---|
3043 | @example
|
---|
3044 | wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
|
---|
3045 | @end example
|
---|
3046 |
|
---|
3047 | More verbose, but the effect is the same. @samp{-r -l1} means to
|
---|
3048 | retrieve recursively (@pxref{Recursive Download}), with maximum depth
|
---|
3049 | of 1. @samp{--no-parent} means that references to the parent directory
|
---|
3050 | are ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to
|
---|
3051 | download only the @sc{gif} files. @samp{-A "*.gif"} would have worked
|
---|
3052 | too.
|
---|
3053 |
|
---|
3054 | @item
|
---|
3055 | Suppose you were in the middle of downloading, when Wget was
|
---|
3056 | interrupted. Now you do not want to clobber the files already present.
|
---|
3057 | It would be:
|
---|
3058 |
|
---|
3059 | @example
|
---|
3060 | wget -nc -r http://www.gnu.org/
|
---|
3061 | @end example
|
---|
3062 |
|
---|
3063 | @item
|
---|
3064 | If you want to encode your own username and password to @sc{http} or
|
---|
3065 | @sc{ftp}, use the appropriate @sc{url} syntax (@pxref{URL Format}).
|
---|
3066 |
|
---|
3067 | @example
|
---|
3068 | wget ftp://hniksic:mypassword@@unix.server.com/.emacs
|
---|
3069 | @end example
|
---|
3070 |
|
---|
3071 | Note, however, that this usage is not advisable on multi-user systems
|
---|
3072 | because it reveals your password to anyone who looks at the output of
|
---|
3073 | @code{ps}.
|
---|
3074 |
|
---|
3075 | @cindex redirecting output
|
---|
3076 | @item
|
---|
3077 | You would like the output documents to go to standard output instead of
|
---|
3078 | to files?
|
---|
3079 |
|
---|
3080 | @example
|
---|
3081 | wget -O - http://jagor.srce.hr/ http://www.srce.hr/
|
---|
3082 | @end example
|
---|
3083 |
|
---|
3084 | You can also combine the two options and make pipelines to retrieve the
|
---|
3085 | documents from remote hotlists:
|
---|
3086 |
|
---|
3087 | @example
|
---|
3088 | wget -O - http://cool.list.com/ | wget --force-html -i -
|
---|
3089 | @end example
|
---|
3090 | @end itemize
|
---|
3091 |
|
---|
3092 | @node Very Advanced Usage
|
---|
3093 | @section Very Advanced Usage
|
---|
3094 |
|
---|
3095 | @cindex mirroring
|
---|
3096 | @itemize @bullet
|
---|
3097 | @item
|
---|
3098 | If you wish Wget to keep a mirror of a page (or @sc{ftp}
|
---|
3099 | subdirectories), use @samp{--mirror} (@samp{-m}), which is the shorthand
|
---|
3100 | for @samp{-r -l inf -N}. You can put Wget in the crontab file asking it
|
---|
3101 | to recheck a site each Sunday:
|
---|
3102 |
|
---|
3103 | @example
|
---|
3104 | crontab
|
---|
3105 | 0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
|
---|
3106 | @end example
|
---|
3107 |
|
---|
3108 | @item
|
---|
3109 | In addition to the above, you want the links to be converted for local
|
---|
3110 | viewing. But, after having read this manual, you know that link
|
---|
3111 | conversion doesn't play well with timestamping, so you also want Wget to
|
---|
3112 | back up the original @sc{html} files before the conversion. Wget invocation
|
---|
3113 | would look like this:
|
---|
3114 |
|
---|
3115 | @example
|
---|
3116 | wget --mirror --convert-links --backup-converted \
|
---|
3117 | http://www.gnu.org/ -o /home/me/weeklog
|
---|
3118 | @end example
|
---|
3119 |
|
---|
3120 | @item
|
---|
3121 | But you've also noticed that local viewing doesn't work all that well
|
---|
3122 | when @sc{html} files are saved under extensions other than @samp{.html},
|
---|
3123 | perhaps because they were served as @file{index.cgi}. So you'd like
|
---|
3124 | Wget to rename all the files served with content-type @samp{text/html}
|
---|
3125 | or @samp{application/xhtml+xml} to @file{@var{name}.html}.
|
---|
3126 |
|
---|
3127 | @example
|
---|
3128 | wget --mirror --convert-links --backup-converted \
|
---|
3129 | --html-extension -o /home/me/weeklog \
|
---|
3130 | http://www.gnu.org/
|
---|
3131 | @end example
|
---|
3132 |
|
---|
3133 | Or, with less typing:
|
---|
3134 |
|
---|
3135 | @example
|
---|
3136 | wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
|
---|
3137 | @end example
|
---|
3138 | @end itemize
|
---|
3139 | @c man end
|
---|
3140 |
|
---|
3141 | @node Various
|
---|
3142 | @chapter Various
|
---|
3143 | @cindex various
|
---|
3144 |
|
---|
3145 | This chapter contains all the stuff that could not fit anywhere else.
|
---|
3146 |
|
---|
3147 | @menu
|
---|
3148 | * Proxies:: Support for proxy servers
|
---|
3149 | * Distribution:: Getting the latest version.
|
---|
3150 | * Mailing List:: Wget mailing list for announcements and discussion.
|
---|
3151 | * Reporting Bugs:: How and where to report bugs.
|
---|
3152 | * Portability:: The systems Wget works on.
|
---|
3153 | * Signals:: Signal-handling performed by Wget.
|
---|
3154 | @end menu
|
---|
3155 |
|
---|
3156 | @node Proxies
|
---|
3157 | @section Proxies
|
---|
3158 | @cindex proxies
|
---|
3159 |
|
---|
3160 | @dfn{Proxies} are special-purpose @sc{http} servers designed to transfer
|
---|
3161 | data from remote servers to local clients. One typical use of proxies
|
---|
3162 | is lightening network load for users behind a slow connection. This is
|
---|
3163 | achieved by channeling all @sc{http} and @sc{ftp} requests through the
|
---|
3164 | proxy, which caches the transferred data. When a cached resource is
|
---|
3165 | requested again, the proxy will return the data from its cache. Another use for
|
---|
3166 | proxies is for companies that separate (for security reasons) their
|
---|
3167 | internal networks from the rest of the Internet. In order to obtain
|
---|
3168 | information from the Web, their users connect and retrieve remote data
|
---|
3169 | using an authorized proxy.
|
---|
3170 |
|
---|
3171 | Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. The
|
---|
3172 | standard way to specify proxy location, which Wget recognizes, is using
|
---|
3173 | the following environment variables:
|
---|
3174 |
|
---|
3175 | @table @code
|
---|
3176 | @item http_proxy
|
---|
3177 | This variable should contain the @sc{url} of the proxy for @sc{http}
|
---|
3178 | connections.
|
---|
3179 |
|
---|
3180 | @item ftp_proxy
|
---|
3181 | This variable should contain the @sc{url} of the proxy for @sc{ftp}
|
---|
3182 | connections. It is quite common that @sc{http_proxy} and @sc{ftp_proxy}
|
---|
3183 | are set to the same @sc{url}.
|
---|
3184 |
|
---|
3185 | @item no_proxy
|
---|
3186 | This variable should contain a comma-separated list of domain extensions
|
---|
3187 | proxy should @emph{not} be used for. For instance, if the value of
|
---|
3188 | @code{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
|
---|
3189 | documents from MIT.
|
---|
3190 | @end table
|
---|
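
For example, the variables above might be set in the shell like this
(the proxy host is hypothetical):

@example
export http_proxy=http://proxy.example.com:8001/
export ftp_proxy=$http_proxy
export no_proxy=.mit.edu
@end example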
3191 |
|
---|
3192 | In addition to the environment variables, proxy location and settings
|
---|
3193 | may be specified from within Wget itself.
|
---|
3194 |
|
---|
3195 | @table @samp
|
---|
3196 | @item --no-proxy
|
---|
3197 | @itemx proxy = on/off
|
---|
3198 | This option and the corresponding command may be used to suppress the
|
---|
3199 | use of proxy, even if the appropriate environment variables are set.
|
---|
3200 |
|
---|
3201 | @item http_proxy = @var{URL}
|
---|
3202 | @itemx ftp_proxy = @var{URL}
|
---|
3203 | @itemx no_proxy = @var{string}
|
---|
3204 | These startup file variables allow you to override the proxy settings
|
---|
3205 | specified by the environment.
|
---|
3206 | @end table
|
---|
3207 |
|
---|
3208 | Some proxy servers require authorization to enable you to use them. The
|
---|
3209 | authorization consists of @dfn{username} and @dfn{password}, which must
|
---|
3210 | be sent by Wget. As with @sc{http} authorization, several
|
---|
3211 | authentication schemes exist. For proxy authorization only the
|
---|
3212 | @code{Basic} authentication scheme is currently implemented.
|
---|
3213 |
|
---|
3214 | You may specify your username and password either through the proxy
|
---|
3215 | @sc{url} or through the command-line options. Assuming that the
|
---|
3216 | company's proxy is located at @samp{proxy.company.com} at port 8001, a
|
---|
3217 | proxy @sc{url} location containing authorization data might look like
|
---|
3218 | this:
|
---|
3219 |
|
---|
3220 | @example
|
---|
3221 | http://hniksic:mypassword@@proxy.company.com:8001/
|
---|
3222 | @end example
|
---|
3223 |
|
---|
3224 | Alternatively, you may use the @samp{proxy-user} and
|
---|
3225 | @samp{proxy-password} options, and the equivalent @file{.wgetrc}
|
---|
3226 | settings @code{proxy_user} and @code{proxy_password} to set the proxy
|
---|
3227 | username and password.
|
---|
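
For example, the equivalent @file{.wgetrc} settings might look like
this (the credentials are placeholders):

@example
proxy_user = hniksic
proxy_password = mypassword
@end example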
3228 |
|
---|
3229 | @node Distribution
|
---|
3230 | @section Distribution
|
---|
3231 | @cindex latest version
|
---|
3232 |
|
---|
3233 | Like all GNU utilities, the latest version of Wget can be found at the
|
---|
3234 | master GNU archive site ftp.gnu.org, and its mirrors. For example,
|
---|
3235 | Wget @value{VERSION} can be found at
|
---|
3236 | @url{ftp://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz}.
|
---|
3237 |
|
---|
3238 | @node Mailing List
|
---|
3239 | @section Mailing List
|
---|
3240 | @cindex mailing list
|
---|
3241 | @cindex list
|
---|
3242 |
|
---|
3243 | There are several Wget-related mailing lists, all hosted by
|
---|
3244 | SunSITE.dk. The general discussion list is at
|
---|
3245 | @email{wget@@sunsite.dk}. It is the preferred place for bug reports
|
---|
3246 | and suggestions, as well as for discussion of development. You are
|
---|
3247 | invited to subscribe.
|
---|
3248 |
|
---|
3249 | To subscribe, simply send mail to @email{wget-subscribe@@sunsite.dk}
|
---|
3250 | and follow the instructions. Unsubscribe by mailing to
|
---|
3251 | @email{wget-unsubscribe@@sunsite.dk}. The mailing list is archived at
|
---|
3252 | @url{http://www.mail-archive.com/wget%40sunsite.dk/} and at
|
---|
3253 | @url{http://news.gmane.org/gmane.comp.web.wget.general}.
|
---|
3254 |
|
---|
3255 | The second mailing list is at @email{wget-patches@@sunsite.dk}, and is
|
---|
3256 | used to submit patches for review by Wget developers. A ``patch'' is
|
---|
3257 | a textual representation of a change to source code, readable by both
|
---|
3258 | humans and programs. The file @file{PATCHES} that comes with Wget
|
---|
3259 | covers the creation and submitting of patches in detail. Please don't
|
---|
3260 | send general suggestions or bug reports to @samp{wget-patches}; use it
|
---|
3261 | only for patch submissions.
|
---|
3262 |
|
---|
3263 | To subscribe, simply send mail to @email{wget-patches-subscribe@@sunsite.dk}
|
---|
3264 | and follow the instructions. Unsubscribe by mailing to
|
---|
3265 | @email{wget-patches-unsubscribe@@sunsite.dk}. The mailing list is archived at
|
---|
3266 | @url{http://news.gmane.org/gmane.comp.web.wget.patches}.
|
---|
3267 |
|
---|
3268 | @node Reporting Bugs
|
---|
3269 | @section Reporting Bugs
|
---|
3270 | @cindex bugs
|
---|
3271 | @cindex reporting bugs
|
---|
3272 | @cindex bug reports
|
---|
3273 |
|
---|
3274 | @c man begin BUGS
|
---|
3275 | You are welcome to send bug reports about GNU Wget to
|
---|
3276 | @email{bug-wget@@gnu.org}.
|
---|
3277 |
|
---|
3278 | Before actually submitting a bug report, please try to follow a few
|
---|
3279 | simple guidelines.
|
---|
3280 |
|
---|
3281 | @enumerate
|
---|
3282 | @item
|
---|
3283 | Please try to ascertain that the behavior you see really is a bug. If
|
---|
3284 | Wget crashes, it's a bug. If Wget does not behave as documented,
|
---|
3285 | it's a bug. If things work strangely, but you are not sure about the way
|
---|
3286 | they are supposed to work, it might well be a bug.
|
---|
3287 |
|
---|
3288 | @item
|
---|
3289 | Try to repeat the bug in as simple circumstances as possible. E.g. if
|
---|
3290 | Wget crashes while downloading @samp{wget -rl0 -kKE -t5 -Y0
|
---|
3291 | http://yoyodyne.com -o /tmp/log}, you should try to see if the crash is
|
---|
3292 | repeatable, and if it will occur with a simpler set of options. You might
|
---|
3293 | even try to start the download at the page where the crash occurred to
|
---|
3294 | see if that page somehow triggered the crash.
|
---|
3295 |
|
---|
3296 | Also, while I will probably be interested to know the contents of your
|
---|
3297 | @file{.wgetrc} file, just dumping it into the debug message is probably
|
---|
3298 | a bad idea. Instead, you should first try to see if the bug repeats
|
---|
3299 | with @file{.wgetrc} moved out of the way. Only if it turns out that
|
---|
3300 | @file{.wgetrc} settings affect the bug, mail me the relevant parts of
|
---|
3301 | the file.
|
---|
3302 |
|
---|
3303 | @item
|
---|
3304 | Please start Wget with the @samp{-d} option and send us the resulting
|
---|
3305 | output (or relevant parts thereof). If Wget was compiled without
|
---|
3306 | debug support, recompile it---it is @emph{much} easier to trace bugs
|
---|
3307 | with debug support on.
|
---|
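
For example, a debug trace can be captured to a file like this (the URL
is hypothetical):

@example
wget -d -o wget-debug.log http://www.example.com/
@end example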
3308 |
|
---|
3309 | Note: please make sure to remove any potentially sensitive information
|
---|
3310 | from the debug log before sending it to the bug address. The
|
---|
3311 | @code{-d} option won't go out of its way to collect sensitive information,
|
---|
3312 | but the log @emph{will} contain a fairly complete transcript of Wget's
|
---|
3313 | communication with the server, which may include passwords and pieces
|
---|
3314 | of downloaded data. Since the bug address is publicly archived, you
|
---|
3315 | may assume that all bug reports are visible to the public.
|
---|
3316 |
|
---|
3317 | @item
|
---|
3318 | If Wget has crashed, try to run it in a debugger, e.g. @code{gdb `which
|
---|
3319 | wget` core} and type @code{where} to get the backtrace. This may not
|
---|
3320 | work if the system administrator has disabled core files, but it is
|
---|
3321 | safe to try.
|
---|
3322 | @end enumerate
@c man end

@node Portability
@section Portability
@cindex portability
@cindex operating systems

Like all GNU software, Wget works on the GNU system. However, since it
uses GNU Autoconf for building and configuring, and mostly avoids using
``special'' features of any particular Unix, it should compile (and
work) on all common Unix flavors.

Various Wget versions have been compiled and tested under many kinds
of Unix systems, including GNU/Linux, Solaris, SunOS 4.x, OSF (aka
Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some of
those systems are no longer in widespread use and may not be able to
support recent versions of Wget. If Wget fails to compile on your
system, we would like to know about it.

Thanks to kind contributors, this version of Wget compiles and works
on 32-bit Microsoft Windows platforms. It has been compiled
successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC
compilers. Naturally, it lacks some of the features available on
Unix, but it should work as a substitute for people stuck with
Windows. Note that Windows-specific portions of Wget are not
guaranteed to be supported in the future, although this has been the
case in practice for many years now. All questions and problems in
Windows usage should be reported to the Wget mailing list at
@email{wget@@sunsite.dk} where the volunteers who maintain the
Windows-related features might look at them.

@node Signals
@section Signals
@cindex signal handling
@cindex hangup

Since the purpose of Wget is background work, it catches the hangup
signal (@code{SIGHUP}) and ignores it. If the output was on standard
output, it will be redirected to a file named @file{wget-log}.
Otherwise, @code{SIGHUP} is ignored. This is convenient when you wish
to redirect the output of Wget after having started it.

@example
$ wget http://www.gnus.org/dist/gnus.tar.gz &
...
$ kill -HUP %%
SIGHUP received, redirecting output to `wget-log'.
@end example

Other than that, Wget will not try to interfere with signals in any way.
@kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike.
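
If you have detached Wget from the terminal this way, you can keep an
eye on the download through the new log file, for example:

@example
$ tail -f wget-log
@end example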

@node Appendices
@chapter Appendices

This chapter contains some references I consider useful.

@menu
* Robot Exclusion::             Wget's support for RES.
* Security Considerations::     Security with Wget.
* Contributors::                People who helped.
@end menu

@node Robot Exclusion
@section Robot Exclusion
@cindex robot exclusion
@cindex robots.txt
@cindex server maintenance

It is extremely easy to make Wget wander aimlessly around a web site,
sucking all the available data in the process. @samp{wget -r @var{site}},
and you're set. Great? Not for the server admin.

As long as Wget is only retrieving static pages, and doing it at a
reasonable rate (see the @samp{--wait} option), there's not much of a
problem. The trouble is that Wget can't tell the difference between the
smallest static page and the most demanding CGI. A site I know has a
section handled by a CGI Perl script that converts Info files to @sc{html} on
the fly. The script is slow, but works well enough for human users
viewing an occasional Info file. However, when someone's recursive Wget
download stumbles upon the index page that links to all the Info files
through the script, the system is brought to its knees without providing
anything useful to the user (this task of converting Info files could be
done locally, and access to Info documentation for all installed GNU
software on a system is available from the @code{info} command).

To avoid this kind of accident, as well as to preserve privacy for
documents that need to be protected from well-behaved robots, the
concept of @dfn{robot exclusion} was invented. The idea is that
the server administrators and document authors can specify which
portions of the site they wish to protect from robots and which
they will permit robots to access.

The most popular mechanism, and the @i{de facto} standard supported by
all the major robots, is the ``Robots Exclusion Standard'' (RES) written
by Martijn Koster et al. in 1994. It specifies the format of a text
file containing directives that instruct the robots which URL paths to
avoid. To be found by the robots, the specifications must be placed in
@file{/robots.txt} in the server root, which the robots are expected to
download and parse.
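
For reference, a small @file{/robots.txt} along these lines (the path
names are only placeholders) keeps robots out of two directories while
leaving the rest of the site open:

@example
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
@end example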

Although Wget is not a web robot in the strictest sense of the word, it
can download large parts of the site without the user's intervention to
download an individual page. Because of that, Wget honors RES when
downloading recursively. For instance, when you issue:

@example
wget -r http://www.server.com/
@end example

First the index of @samp{www.server.com} will be downloaded. If Wget
finds that it wants to download more documents from that server, it will
request @samp{http://www.server.com/robots.txt} and, if found, use it
for further downloads. @file{robots.txt} is loaded only once per
server.

Until version 1.8, Wget supported the first version of the standard,
written by Martijn Koster in 1994 and available at
@url{http://www.robotstxt.org/wc/norobots.html}. As of version 1.8,
Wget has supported the additional directives specified in the internet
draft @samp{<draft-koster-robots-00.txt>} titled ``A Method for Web
Robots Control''. The draft, which, as far as I know, never made it to
an @sc{rfc}, is available at
@url{http://www.robotstxt.org/wc/norobots-rfc.txt}.

This manual no longer includes the text of the Robot Exclusion Standard.

The second, less known mechanism enables the author of an individual
document to specify whether they want the links from the file to be
followed by a robot. This is achieved using the @code{META} tag, like
this:

@example
<meta name="robots" content="nofollow">
@end example

This is explained in some detail at
@url{http://www.robotstxt.org/wc/meta-user.html}. Wget supports this
method of robot exclusion in addition to the usual @file{/robots.txt}
exclusion.

If you know what you are doing and really really wish to turn off the
robot exclusion, set the @code{robots} variable to @samp{off} in your
@file{.wgetrc}. You can achieve the same effect from the command line
using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}.
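
In a @file{.wgetrc}, the corresponding setting is a single line, for
instance:

@example
# Disable robot exclusion (use with care).
robots = off
@end example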

@node Security Considerations
@section Security Considerations
@cindex security

When using Wget, you must be aware that it sends unencrypted passwords
through the network, which may present a security problem. Here are the
main issues, and some solutions.

@enumerate
@item
The passwords on the command line are visible using @code{ps}. The best
way around it is to use @code{wget -i -} and feed the @sc{url}s to
Wget's standard input, each on a separate line, terminated by @kbd{C-d};
see the example after this list. Another workaround is to use
@file{.netrc} to store passwords; however, storing unencrypted passwords
is also considered a security risk.

@item
Using the insecure @dfn{basic} authentication scheme, unencrypted
passwords are transmitted through the network routers and gateways.

@item
The @sc{ftp} passwords are also in no way encrypted. There is no good
solution for this at the moment.

@item
Although the ``normal'' output of Wget tries to hide the passwords,
debugging logs show them, in all forms. This problem is avoided by
being careful when you send debug logs (yes, even when you send them to
me).
@end enumerate
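
For instance, a password-carrying @sc{url} can be typed on Wget's
standard input instead of the command line; the host name below is only
a placeholder:

@example
$ wget -i -
http://user:password@@www.example.com/private/index.html
@kbd{C-d}
@end example

Because the @sc{url} never appears as a command-line argument, it does
not show up in the output of @code{ps}.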

@node Contributors
@section Contributors
@cindex contributors

@iftex
GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@xemacs.org}.
@end iftex
@ifnottex
GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org}.
@end ifnottex
However, its development could never have gone as far as it has, were it
not for the help of many people, either with bug reports, feature
proposals, patches, or letters saying ``Thanks!''.

Special thanks goes to the following people (no particular order):

@itemize @bullet
@item
Karsten Thygesen---donated system resources such as the mailing list,
web space, and @sc{ftp} space, along with a lot of time to make these
actually work.

@item
Shawn McHorse---bug reports and patches.

@item
Kaveh R. Ghazi---on-the-fly @code{ansi2knr}-ization. Lots of
portability fixes.

@item
Gordon Matzigkeit---@file{.netrc} support.

@item
@iftex
Zlatko @v{C}alu@v{s}i@'{c}, Tomislav Vujec and Dra@v{z}en
Ka@v{c}ar---feature suggestions and ``philosophical'' discussions.
@end iftex
@ifnottex
Zlatko Calusic, Tomislav Vujec and Drazen Kacar---feature suggestions
and ``philosophical'' discussions.
@end ifnottex

@item
Darko Budor---initial port to Windows.

@item
Antonio Rosella---help and suggestions, plus the Italian translation.

@item
@iftex
Tomislav Petrovi@'{c}, Mario Miko@v{c}evi@'{c}---many bug reports and
suggestions.
@end iftex
@ifnottex
Tomislav Petrovic, Mario Mikocevic---many bug reports and suggestions.
@end ifnottex

@item
@iftex
Fran@,{c}ois Pinard---many thorough bug reports and discussions.
@end iftex
@ifnottex
Francois Pinard---many thorough bug reports and discussions.
@end ifnottex

@item
Karl Eichwalder---lots of help with internationalization and other
things.

@item
Junio Hamano---donated support for Opie and @sc{http} @code{Digest}
authentication.

@item
The people who provided donations for development, including Brian
Gough.
@end itemize

The following people have provided patches, bug/build reports, useful
suggestions, beta testing services, fan mail and all the other things
that make maintenance so much fun:

Ian Abbott,
Tim Adam,
Adrian Aichner,
Martin Baehr,
Dieter Baron,
Roger Beeman,
Dan Berger,
T. Bharath,
Christian Biere,
Paul Bludov,
Daniel Bodea,
Mark Boyns,
John Burden,
Wanderlei Cavassin,
Gilles Cedoc,
Tim Charron,
Noel Cragg,
@iftex
Kristijan @v{C}onka@v{s},
@end iftex
@ifnottex
Kristijan Conkas,
@end ifnottex
John Daily,
Andreas Damm,
Ahmon Dancy,
Andrew Davison,
Bertrand Demiddelaer,
Andrew Deryabin,
Ulrich Drepper,
Marc Duponcheel,
@iftex
Damir D@v{z}eko,
@end iftex
@ifnottex
Damir Dzeko,
@end ifnottex
Alan Eldridge,
Hans-Andreas Engel,
@iftex
Aleksandar Erkalovi@'{c},
@end iftex
@ifnottex
Aleksandar Erkalovic,
@end ifnottex
Andy Eskilsson,
Christian Fraenkel,
David Fritz,
Charles C. Fu,
FUJISHIMA Satsuki,
Masashi Fujita,
Howard Gayle,
Marcel Gerrits,
Lemble Gregory,
Hans Grobler,
Mathieu Guillaume,
Dan Harkless,
Aaron Hawley,
Herold Heiko,
Jochen Hein,
Karl Heuer,
HIROSE Masaaki,
Ulf Harnhammar,
Gregor Hoffleit,
Erik Magnus Hulthen,
Richard Huveneers,
Jonas Jensen,
Larry Jones,
Simon Josefsson,
@iftex
Mario Juri@'{c},
@end iftex
@ifnottex
Mario Juric,
@end ifnottex
@iftex
Hack Kampbj@o{}rn,
@end iftex
@ifnottex
Hack Kampbjorn,
@end ifnottex
Const Kaplinsky,
@iftex
Goran Kezunovi@'{c},
@end iftex
@ifnottex
Goran Kezunovic,
@end ifnottex
Igor Khristophorov,
Robert Kleine,
KOJIMA Haime,
Fila Kolodny,
Alexander Kourakos,
Martin Kraemer,
Sami Krank,
@tex
$\Sigma\acute{\iota}\mu o\varsigma\;
\Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$
(Simos KSenitellis),
@end tex
@ifnottex
Simos KSenitellis,
@end ifnottex
Christian Lackas,
Hrvoje Lacko,
Daniel S. Lewart,
@iftex
Nicol@'{a}s Lichtmeier,
@end iftex
@ifnottex
Nicolas Lichtmeier,
@end ifnottex
Dave Love,
Alexander V. Lukyanov,
@iftex
Thomas Lu@ss{}nig,
@end iftex
@ifnottex
Thomas Lussnig,
@end ifnottex
Andre Majorel,
Aurelien Marchand,
Matthew J. Mellon,
Jordan Mendelson,
Lin Zhe Min,
Jan Minar,
Tim Mooney,
Keith Moore,
Adam D. Moss,
Simon Munton,
Charlie Negyesi,
R. K. Owen,
Leonid Petrov,
Simone Piunno,
Andrew Pollock,
Steve Pothier,
@iftex
Jan P@v{r}ikryl,
@end iftex
@ifnottex
Jan Prikryl,
@end ifnottex
Marin Purgar,
@iftex
Csaba R@'{a}duly,
@end iftex
@ifnottex
Csaba Raduly,
@end ifnottex
Keith Refson,
Bill Richardson,
Tyler Riddle,
Tobias Ringstrom,
@c Texinfo doesn't grok @'{@i}, so we have to use TeX itself.
@tex
Juan Jos\'{e} Rodr\'{\i}guez,
@end tex
@ifnottex
Juan Jose Rodriguez,
@end ifnottex
Maciej W. Rozycki,
Edward J. Sabol,
Heinz Salzmann,
Robert Schmidt,
Nicolas Schodet,
Andreas Schwab,
Chris Seawood,
Dennis Smit,
Toomas Soome,
Tage Stabell-Kulo,
Philip Stadermann,
Daniel Stenberg,
Sven Sternberger,
Markus Strasser,
John Summerfield,
Szakacsits Szabolcs,
Mike Thomas,
Philipp Thomas,
Mauro Tortonesi,
Dave Turner,
Gisle Vanem,
Russell Vincent,
@iftex
@v{Z}eljko Vrba,
@end iftex
@ifnottex
Zeljko Vrba,
@end ifnottex
Charles G Waldman,
Douglas E. Wegscheid,
YAMAZAKI Makoto,
Jasmin Zainul,
@iftex
Bojan @v{Z}drnja,
@end iftex
@ifnottex
Bojan Zdrnja,
@end ifnottex
Kristijan Zimmer.

Apologies to all whom I accidentally left out, and many thanks to all the
subscribers of the Wget mailing list.

@node Copying
@chapter Copying
@cindex copying
@cindex GPL
@cindex GFDL
@cindex free software

GNU Wget is licensed under the GNU General Public License (GNU GPL),
which makes it @dfn{free software}. Please note that ``free'' in ``free
software'' refers to liberty, not price. As some people like to point
out, it's the ``free'' of ``free speech'', not the ``free'' of ``free
beer''.

The exact and legally binding distribution terms are spelled out below.
The GPL guarantees that you have the right (freedom) to run and change
GNU Wget and distribute it to others, and even---if you want---charge
money for doing any of those things. With these rights comes the
obligation to distribute the source code along with the software and to
grant your recipients the same rights and impose the same restrictions.

This licensing model is also known as @dfn{open source} because it,
among other things, makes sure that all recipients will receive the
source code along with the program, and be able to improve it. The GNU
project prefers the term ``free software'' for reasons outlined at
@url{http://www.gnu.org/philosophy/free-software-for-freedom.html}.

The exact license terms are defined by this paragraph and the GNU
General Public License it refers to:

@quotation
GNU Wget is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

GNU Wget is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

A copy of the GNU General Public License is included as part of this
manual; if you did not receive it, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@end quotation

In addition to this, this manual is free in the same sense:

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``GNU Free
Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''.
@end quotation

@c #### Maybe we should wrap these licenses in ifinfo? Stallman says
@c that the GFDL needs to be present in the manual, and to me it would
@c suck to include the license for the manual and not the license for
@c the program.

The full texts of the GNU General Public License and of the GNU Free
Documentation License are available below.

@menu
* GNU General Public License::
* GNU Free Documentation License::
@end menu

@include gpl.texi

@include fdl.texi

@node Concept Index
@unnumbered Concept Index
@printindex cp

@contents

@bye