1 | This is wget.info, produced by makeinfo version 4.8 from ./wget.texi.
|
---|
2 |
|
---|
3 | INFO-DIR-SECTION Network Applications
|
---|
4 | START-INFO-DIR-ENTRY
|
---|
5 | * Wget: (wget). The non-interactive network downloader.
|
---|
6 | END-INFO-DIR-ENTRY
|
---|
7 |
|
---|
8 | This file documents the GNU Wget utility for downloading network
|
---|
9 | data.
|
---|
10 |
|
---|
11 | Copyright (C) 1996-2005 Free Software Foundation, Inc.
|
---|
12 |
|
---|
13 | Permission is granted to make and distribute verbatim copies of this
|
---|
14 | manual provided the copyright notice and this permission notice are
|
---|
15 | preserved on all copies.
|
---|
16 |
|
---|
17 | Permission is granted to copy, distribute and/or modify this document
|
---|
18 | under the terms of the GNU Free Documentation License, Version 1.2 or
|
---|
19 | any later version published by the Free Software Foundation; with the
|
---|
20 | Invariant Sections being "GNU General Public License" and "GNU Free
|
---|
21 | Documentation License", with no Front-Cover Texts, and with no
|
---|
22 | Back-Cover Texts. A copy of the license is included in the section
|
---|
23 | entitled "GNU Free Documentation License".
|
---|
24 |
|
---|
25 |
|
---|
26 | File: wget.info, Node: Top, Next: Overview, Up: (dir)
|
---|
27 |
|
---|
28 | Wget 1.10.2
|
---|
29 | ***********
|
---|
30 |
|
---|
31 | This manual documents version 1.10.2 of GNU Wget, the freely available
|
---|
32 | utility for network downloads.
|
---|
33 |
|
---|
34 | Copyright (C) 1996-2005 Free Software Foundation, Inc.
|
---|
35 |
|
---|
36 | * Menu:
|
---|
37 |
|
---|
38 | * Overview:: Features of Wget.
|
---|
39 | * Invoking:: Wget command-line arguments.
|
---|
40 | * Recursive Download:: Downloading interlinked pages.
|
---|
41 | * Following Links:: The available methods of chasing links.
|
---|
42 | * Time-Stamping:: Mirroring according to time-stamps.
|
---|
43 | * Startup File:: Wget's initialization file.
|
---|
44 | * Examples:: Examples of usage.
|
---|
45 | * Various:: The stuff that doesn't fit anywhere else.
|
---|
46 | * Appendices:: Some useful references.
|
---|
47 | * Copying:: You may give out copies of Wget and of this manual.
|
---|
48 | * Concept Index:: Topics covered by this manual.
|
---|
49 |
|
---|
50 |
|
---|
51 | File: wget.info, Node: Overview, Next: Invoking, Prev: Top, Up: Top
|
---|
52 |
|
---|
53 | 1 Overview
|
---|
54 | **********
|
---|
55 |
|
---|
56 | GNU Wget is a free utility for non-interactive download of files from
|
---|
57 | the Web. It supports HTTP, HTTPS, and FTP protocols, as well as
|
---|
58 | retrieval through HTTP proxies.
|
---|
59 |
|
---|
60 | This chapter is a partial overview of Wget's features.
|
---|
61 |
|
---|
62 | * Wget is non-interactive, meaning that it can work in the
|
---|
63 | background, while the user is not logged on. This allows you to
|
---|
64 | start a retrieval and disconnect from the system, letting Wget
|
---|
65 | finish the work. By contrast, most Web browsers require the
|
---|
66 | user's constant presence, which can be a great hindrance when
|
---|
67 | transferring a lot of data.
|
---|
68 |
|
---|
69 | * Wget can follow links in HTML and XHTML pages and create local
|
---|
70 | versions of remote web sites, fully recreating the directory
|
---|
71 | structure of the original site. This is sometimes referred to as
|
---|
72 | "recursive downloading." While doing that, Wget respects the
|
---|
73 | Robot Exclusion Standard (`/robots.txt'). Wget can be instructed
|
---|
74 | to convert the links in downloaded HTML files to the local files
|
---|
75 | for offline viewing.
|
---|
76 |
|
---|
77 | * File name wildcard matching and recursive mirroring of directories
|
---|
78 | are available when retrieving via FTP. Wget can read the
|
---|
79 | time-stamp information given by both HTTP and FTP servers, and
|
---|
80 | store it locally. Thus Wget can see if the remote file has
|
---|
81 | changed since last retrieval, and automatically retrieve the new
|
---|
82 | version if it has. This makes Wget suitable for mirroring of FTP
|
---|
83 | sites, as well as home pages.
|
---|
84 |
|
---|
85 | * Wget has been designed for robustness over slow or unstable network
|
---|
86 | connections; if a download fails due to a network problem, it will
|
---|
87 | keep retrying until the whole file has been retrieved. If the
|
---|
88 | server supports regetting, it will instruct the server to continue
|
---|
89 | the download from where it left off.
|
---|
90 |
|
---|
91 | * Wget supports proxy servers, which can lighten the network load,
|
---|
92 | speed up retrieval and provide access behind firewalls. However,
|
---|
93 | if you are behind a firewall that requires that you use a socks
|
---|
94 | style gateway, you can get the socks library and build Wget with
|
---|
95 | support for socks. Wget uses passive FTP downloading by
|
---|
96 | default, active FTP being an option.
|
---|
97 |
|
---|
98 | * Wget supports IP version 6, the next generation of IP. IPv6 is
|
---|
99 | autodetected at compile-time, and can be disabled at either build
|
---|
100 | or run time. Binaries built with IPv6 support work well in both
|
---|
101 | IPv4-only and dual family environments.
|
---|
102 |
|
---|
103 | * Built-in features offer mechanisms to tune which links you wish to
|
---|
104 | follow (*note Following Links::).
|
---|
105 |
|
---|
106 | * The progress of individual downloads is traced using a progress
|
---|
107 | gauge. Interactive downloads are tracked using a
|
---|
108 | "thermometer"-style gauge, whereas non-interactive ones are traced
|
---|
109 | with dots, each dot representing a fixed amount of data received
|
---|
110 | (1KB by default). Either gauge can be customized to your
|
---|
111 | preferences.
|
---|
112 |
|
---|
113 | * Most of the features are fully configurable, either through
|
---|
114 | command line options, or via the initialization file `.wgetrc'
|
---|
115 | (*note Startup File::). Wget allows you to define "global"
|
---|
116 | startup files (`/usr/local/etc/wgetrc' by default) for site
|
---|
117 | settings.
|
---|
118 |
|
---|
119 | * Finally, GNU Wget is free software. This means that everyone may
|
---|
120 | use it, redistribute it and/or modify it under the terms of the
|
---|
121 | GNU General Public License, as published by the Free Software
|
---|
122 | Foundation (*note Copying::).
|
---|
123 |
|
---|
124 |
|
---|
125 | File: wget.info, Node: Invoking, Next: Recursive Download, Prev: Overview, Up: Top
|
---|
126 |
|
---|
127 | 2 Invoking
|
---|
128 | **********
|
---|
129 |
|
---|
130 | By default, Wget is very simple to invoke. The basic syntax is:
|
---|
131 |
|
---|
132 | wget [OPTION]... [URL]...
|
---|
133 |
|
---|
134 | Wget will simply download all the URLs specified on the command
|
---|
135 | line. URL is a "Uniform Resource Locator", as defined below.
|
---|
136 |
|
---|
137 | However, you may wish to change some of the default parameters of
|
---|
138 | Wget. You can do so in two ways: permanently, by adding the appropriate
|
---|
139 | command to `.wgetrc' (*note Startup File::), or by specifying it on the
|
---|
140 | command line.
|
---|
141 |
|
---|
142 | * Menu:
|
---|
143 |
|
---|
144 | * URL Format::
|
---|
145 | * Option Syntax::
|
---|
146 | * Basic Startup Options::
|
---|
147 | * Logging and Input File Options::
|
---|
148 | * Download Options::
|
---|
149 | * Directory Options::
|
---|
150 | * HTTP Options::
|
---|
151 | * HTTPS (SSL/TLS) Options::
|
---|
152 | * FTP Options::
|
---|
153 | * Recursive Retrieval Options::
|
---|
154 | * Recursive Accept/Reject Options::
|
---|
155 |
|
---|
156 |
|
---|
157 | File: wget.info, Node: URL Format, Next: Option Syntax, Up: Invoking
|
---|
158 |
|
---|
159 | 2.1 URL Format
|
---|
160 | ==============
|
---|
161 |
|
---|
162 | "URL" is an acronym for Uniform Resource Locator. A uniform resource
|
---|
163 | locator is a compact string representation for a resource available via
|
---|
164 | the Internet. Wget recognizes the URL syntax as per RFC1738. This is
|
---|
165 | the most widely used form (square brackets denote optional parts):
|
---|
166 |
|
---|
167 | http://host[:port]/directory/file
|
---|
168 | ftp://host[:port]/directory/file
|
---|
169 |
|
---|
170 | You can also encode your username and password within a URL:
|
---|
171 |
|
---|
172 | ftp://user:password@host/path
|
---|
173 | http://user:password@host/path
|
---|
174 |
|
---|
175 | Either USER or PASSWORD, or both, may be left out. If you leave out
|
---|
176 | either the HTTP username or password, no authentication will be sent.
|
---|
177 | If you leave out the FTP username, `anonymous' will be used. If you
|
---|
178 | leave out the FTP password, your email address will be supplied as a
|
---|
179 | default password.(1)
|
---|
180 |
|
---|
181 | *Important Note*: if you specify a password-containing URL on the
|
---|
182 | command line, the username and password will be plainly visible to all
|
---|
183 | users on the system, by way of `ps'. On multi-user systems, this is a
|
---|
184 | big security risk. To work around it, use `wget -i -' and feed the
|
---|
185 | URLs to Wget's standard input, each on a separate line, terminated by
|
---|
186 | `C-d'.
|
---|
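The workaround above can be sketched as a small shell function. This is only an illustration: `fetch_from_list' and the file name `urls.txt' are hypothetical names, not part of Wget. The only Wget feature it relies on is `-i -', which reads URLs from standard input.

```shell
# Sketch: pass credential-bearing URLs to Wget on standard input so
# they never appear in Wget's argument list (and hence not in `ps`).
# fetch_from_list is a hypothetical helper; "$1" names a file that
# holds one URL per line.
fetch_from_list() {
    wget -i - < "$1"
}
```

A call such as `fetch_from_list urls.txt' would read the URLs from `urls.txt'. Restricting that file's permissions (for example with `chmod 600 urls.txt') keeps the password away from other users of the system.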
187 |
|
---|
188 | You can encode unsafe characters in a URL as `%xy', `xy' being the
|
---|
189 | hexadecimal representation of the character's ASCII value. Some common
|
---|
190 | unsafe characters include `%' (quoted as `%25'), `:' (quoted as `%3A'),
|
---|
191 | and `@' (quoted as `%40'). Refer to RFC1738 for a comprehensive list
|
---|
192 | of unsafe characters.
|
---|
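As a rough illustration of the `%xy' quoting, the following POSIX shell sketch encodes a string one character at a time. The function name `urlencode' is hypothetical, the input is assumed to be plain ASCII, and the set of characters left unencoded (letters, digits, `.', `_', `~' and `-') is a conservative assumption rather than the exact list from RFC1738.

```shell
# Sketch: percent-encode every character outside a small "safe" set.
# Assumes ASCII input; urlencode is a hypothetical helper name.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            # "'$c" makes printf yield the character's numeric value
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

urlencode 'p@ss:w%rd'    # prints p%40ss%3Aw%25rd
```

The encoded result can then be placed in the USER or PASSWORD part of a URL without being mistaken for a delimiter.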
193 |
|
---|
194 | Wget also supports the `type' feature for FTP URLs. By default, FTP
|
---|
195 | documents are retrieved in the binary mode (type `i'), which means that
|
---|
196 | they are downloaded unchanged. Another useful mode is the `a'
|
---|
197 | ("ASCII") mode, which converts the line delimiters between the
|
---|
198 | different operating systems, and is thus useful for text files. Here
|
---|
199 | is an example:
|
---|
200 |
|
---|
201 | ftp://host/directory/file;type=a
|
---|
202 |
|
---|
203 | Two alternative variants of URL specification are also supported,
|
---|
204 | because of historical (hysterical?) reasons and their widespread use.
|
---|
205 |
|
---|
206 | FTP-only syntax (supported by `NcFTP'):
|
---|
207 | host:/dir/file
|
---|
208 |
|
---|
209 | HTTP-only syntax (introduced by `Netscape'):
|
---|
210 | host[:port]/dir/file
|
---|
211 |
|
---|
212 | These two alternative forms are deprecated, and may cease being
|
---|
213 | supported in the future.
|
---|
214 |
|
---|
215 | If you do not understand the difference between these notations, or
|
---|
216 | do not know which one to use, just use the plain ordinary format you use
|
---|
217 | with your favorite browser, like `Lynx' or `Netscape'.
|
---|
218 |
|
---|
219 | ---------- Footnotes ----------
|
---|
220 |
|
---|
221 | (1) If you have a `.netrc' file in your home directory, the password
|
---|
222 | will also be searched for there.
|
---|
223 |
|
---|
224 |
|
---|
225 | File: wget.info, Node: Option Syntax, Next: Basic Startup Options, Prev: URL Format, Up: Invoking
|
---|
226 |
|
---|
227 | 2.2 Option Syntax
|
---|
228 | =================
|
---|
229 |
|
---|
230 | Since Wget uses GNU getopt to process command-line arguments, every
|
---|
231 | option has a long form along with the short one. Long options are more
|
---|
232 | convenient to remember, but take time to type. You may freely mix
|
---|
233 | different option styles, or specify options after the command-line
|
---|
234 | arguments. Thus you may write:
|
---|
235 |
|
---|
236 | wget -r --tries=10 http://fly.srk.fer.hr/ -o log
|
---|
237 |
|
---|
238 | The space between the option accepting an argument and the argument
|
---|
239 | may be omitted. Instead of `-o log' you can write `-olog'.
|
---|
240 |
|
---|
241 | You may put several options that do not require arguments together,
|
---|
242 | like:
|
---|
243 |
|
---|
244 | wget -drc URL
|
---|
245 |
|
---|
246 | This is a complete equivalent of:
|
---|
247 |
|
---|
248 | wget -d -r -c URL
|
---|
249 |
|
---|
250 | Since the options can be specified after the arguments, you may
|
---|
251 | terminate them with `--'. So the following will try to download URL
|
---|
252 | `-x', reporting failure to `log':
|
---|
253 |
|
---|
254 | wget -o log -- -x
|
---|
255 |
|
---|
256 | The options that accept comma-separated lists all respect the
|
---|
257 | convention that specifying an empty list clears its value. This can be
|
---|
258 | useful to clear the `.wgetrc' settings. For instance, if your `.wgetrc'
|
---|
259 | sets `exclude_directories' to `/cgi-bin', the following example will
|
---|
260 | first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
|
---|
261 | You can also clear the lists in `.wgetrc' (*note Wgetrc Syntax::).
|
---|
262 |
|
---|
263 | wget -X '' -X /~nobody,/~somebody
|
---|
264 |
|
---|
265 | Most options that do not accept arguments are "boolean" options, so
|
---|
266 | named because their state can be captured with a yes-or-no ("boolean")
|
---|
267 | variable. For example, `--follow-ftp' tells Wget to follow FTP links
|
---|
268 | from HTML files and, on the other hand, `--no-glob' tells it not to
|
---|
269 | perform file globbing on FTP URLs. A boolean option is either
|
---|
270 | "affirmative" or "negative" (beginning with `--no'). All such options
|
---|
271 | share several properties.
|
---|
272 |
|
---|
273 | Unless stated otherwise, it is assumed that the default behavior is
|
---|
274 | the opposite of what the option accomplishes. For example, the
|
---|
275 | documented existence of `--follow-ftp' assumes that the default is to
|
---|
276 | _not_ follow FTP links from HTML pages.
|
---|
277 |
|
---|
278 | Affirmative options can be negated by prepending the `--no-' to the
|
---|
279 | option name; negative options can be negated by omitting the `--no-'
|
---|
280 | prefix. This might seem superfluous--if the default for an affirmative
|
---|
281 | option is to not do something, then why provide a way to explicitly
|
---|
282 | turn it off? But the startup file may in fact change the default. For
|
---|
283 | instance, using `follow_ftp = on' in `.wgetrc' makes Wget follow
|
---|
284 | FTP links by default, and using `--no-follow-ftp' is the only way to
|
---|
285 | restore the factory default from the command line.
|
---|
286 |
|
---|
287 |
|
---|
288 | File: wget.info, Node: Basic Startup Options, Next: Logging and Input File Options, Prev: Option Syntax, Up: Invoking
|
---|
289 |
|
---|
290 | 2.3 Basic Startup Options
|
---|
291 | =========================
|
---|
292 |
|
---|
293 | `-V'
|
---|
294 | `--version'
|
---|
295 | Display the version of Wget.
|
---|
296 |
|
---|
297 | `-h'
|
---|
298 | `--help'
|
---|
299 | Print a help message describing all of Wget's command-line options.
|
---|
300 |
|
---|
301 | `-b'
|
---|
302 | `--background'
|
---|
303 | Go to background immediately after startup. If no output file is
|
---|
304 | specified via the `-o', output is redirected to `wget-log'.
|
---|
305 |
|
---|
306 | `-e COMMAND'
|
---|
307 | `--execute COMMAND'
|
---|
308 | Execute COMMAND as if it were a part of `.wgetrc' (*note Startup
|
---|
309 | File::). A command thus invoked will be executed _after_ the
|
---|
310 | commands in `.wgetrc', thus taking precedence over them. If you
|
---|
311 | need to specify more than one wgetrc command, use multiple
|
---|
312 | instances of `-e'.
|
---|
313 |
|
---|
314 |
|
---|
315 |
|
---|
316 | File: wget.info, Node: Logging and Input File Options, Next: Download Options, Prev: Basic Startup Options, Up: Invoking
|
---|
317 |
|
---|
318 | 2.4 Logging and Input File Options
|
---|
319 | ==================================
|
---|
320 |
|
---|
321 | `-o LOGFILE'
|
---|
322 | `--output-file=LOGFILE'
|
---|
323 | Log all messages to LOGFILE. The messages are normally reported
|
---|
324 | to standard error.
|
---|
325 |
|
---|
326 | `-a LOGFILE'
|
---|
327 | `--append-output=LOGFILE'
|
---|
328 | Append to LOGFILE. This is the same as `-o', only it appends to
|
---|
329 | LOGFILE instead of overwriting the old log file. If LOGFILE does
|
---|
330 | not exist, a new file is created.
|
---|
331 |
|
---|
332 | `-d'
|
---|
333 | `--debug'
|
---|
334 | Turn on debug output, meaning various information important to the
|
---|
335 | developers of Wget if it does not work properly. Your system
|
---|
336 | administrator may have chosen to compile Wget without debug
|
---|
337 | support, in which case `-d' will not work. Please note that
|
---|
338 | compiling with debug support is always safe--Wget compiled with
|
---|
339 | the debug support will _not_ print any debug info unless requested
|
---|
340 | with `-d'. *Note Reporting Bugs::, for more information on how to
|
---|
341 | use `-d' for sending bug reports.
|
---|
342 |
|
---|
343 | `-q'
|
---|
344 | `--quiet'
|
---|
345 | Turn off Wget's output.
|
---|
346 |
|
---|
347 | `-v'
|
---|
348 | `--verbose'
|
---|
349 | Turn on verbose output, with all the available data. The default
|
---|
350 | output is verbose.
|
---|
351 |
|
---|
352 | `-nv'
|
---|
353 | `--no-verbose'
|
---|
354 | Turn off verbose without being completely quiet (use `-q' for
|
---|
355 | that), which means that error messages and basic information still
|
---|
356 | get printed.
|
---|
357 |
|
---|
358 | `-i FILE'
|
---|
359 | `--input-file=FILE'
|
---|
360 | Read URLs from FILE. If `-' is specified as FILE, URLs are read
|
---|
361 | from the standard input. (Use `./-' to read from a file literally
|
---|
362 | named `-'.)
|
---|
363 |
|
---|
364 | If this function is used, no URLs need be present on the command
|
---|
365 | line. If there are URLs both on the command line and in an input
|
---|
366 | file, those on the command line will be the first ones to be
|
---|
367 | retrieved. The FILE need not be an HTML document (but no harm if
|
---|
368 | it is)--it is enough if the URLs are just listed sequentially.
|
---|
369 |
|
---|
370 | However, if you specify `--force-html', the document will be
|
---|
371 | regarded as `html'. In that case you may have problems with
|
---|
372 | relative links, which you can solve either by adding `<base
|
---|
373 | href="URL">' to the documents or by specifying `--base=URL' on the
|
---|
374 | command line.
|
---|
375 |
|
---|
376 | `-F'
|
---|
377 | `--force-html'
|
---|
378 | When input is read from a file, force it to be treated as an HTML
|
---|
379 | file. This enables you to retrieve relative links from existing
|
---|
380 | HTML files on your local disk, by adding `<base href="URL">' to
|
---|
381 | HTML, or using the `--base' command-line option.
|
---|
382 |
|
---|
383 | `-B URL'
|
---|
384 | `--base=URL'
|
---|
385 | Prepends URL to relative links read from the file specified with
|
---|
386 | the `-i' option.
|
---|
387 |
|
---|
388 |
|
---|
389 | File: wget.info, Node: Download Options, Next: Directory Options, Prev: Logging and Input File Options, Up: Invoking
|
---|
390 |
|
---|
391 | 2.5 Download Options
|
---|
392 | ====================
|
---|
393 |
|
---|
394 | `--bind-address=ADDRESS'
|
---|
395 | When making client TCP/IP connections, bind to ADDRESS on the
|
---|
396 | local machine. ADDRESS may be specified as a hostname or IP
|
---|
397 | address. This option can be useful if your machine is bound to
|
---|
398 | multiple IPs.
|
---|
399 |
|
---|
400 | `-t NUMBER'
|
---|
401 | `--tries=NUMBER'
|
---|
402 | Set number of retries to NUMBER. Specify 0 or `inf' for infinite
|
---|
403 | retrying. The default is to retry 20 times, with the exception of
|
---|
404 | fatal errors like "connection refused" or "not found" (404), which
|
---|
405 | are not retried.
|
---|
406 |
|
---|
407 | `-O FILE'
|
---|
408 | `--output-document=FILE'
|
---|
409 | The documents will not be written to the appropriate files, but all
|
---|
410 | will be concatenated together and written to FILE. If `-' is used
|
---|
411 | as FILE, documents will be printed to standard output, disabling
|
---|
412 | link conversion. (Use `./-' to print to a file literally named
|
---|
413 | `-'.)
|
---|
414 |
|
---|
415 | Note that a combination with `-k' is only well-defined for
|
---|
416 | downloading a single document.
|
---|
417 |
|
---|
418 | `-nc'
|
---|
419 | `--no-clobber'
|
---|
420 | If a file is downloaded more than once in the same directory,
|
---|
421 | Wget's behavior depends on a few options, including `-nc'. In
|
---|
422 | certain cases, the local file will be "clobbered", or overwritten,
|
---|
423 | upon repeated download. In other cases it will be preserved.
|
---|
424 |
|
---|
425 | When running Wget without `-N', `-nc', or `-r', downloading the
|
---|
426 | same file in the same directory will result in the original copy
|
---|
427 | of FILE being preserved and the second copy being named `FILE.1'.
|
---|
428 | If that file is downloaded yet again, the third copy will be named
|
---|
429 | `FILE.2', and so on. When `-nc' is specified, this behavior is
|
---|
430 | suppressed, and Wget will refuse to download newer copies of
|
---|
431 | `FILE'. Therefore, "`no-clobber'" is actually a misnomer in this
|
---|
432 | mode--it's not clobbering that's prevented (as the numeric
|
---|
433 | suffixes were already preventing clobbering), but rather the
|
---|
434 | multiple version saving that's prevented.
|
---|
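The numeric-suffix scheme described above (no `-N', `-nc', or `-r') can be mimicked with a few lines of shell. This is a sketch of the naming rule as the text states it, not Wget's actual code; `next_name' is a hypothetical helper.

```shell
# Sketch: pick the name a repeated download would be saved under,
# following the FILE, FILE.1, FILE.2, ... rule described above.
# next_name is a hypothetical helper, not a Wget command.
next_name() {
    base=$1
    if [ ! -e "$base" ]; then
        echo "$base"           # first download keeps the plain name
        return
    fi
    n=1
    while [ -e "$base.$n" ]; do
        n=$((n + 1))           # skip suffixes that are already taken
    done
    echo "$base.$n"
}
```

In a directory that already holds `index.html' and `index.html.1', the call `next_name index.html' prints `index.html.2'. With `-nc' no new name is chosen at all, because the download is skipped.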
435 |
|
---|
436 | When running Wget with `-r', but without `-N' or `-nc',
|
---|
437 | re-downloading a file will result in the new copy simply
|
---|
438 | overwriting the old. Adding `-nc' will prevent this behavior,
|
---|
439 | instead causing the original version to be preserved and any newer
|
---|
440 | copies on the server to be ignored.
|
---|
441 |
|
---|
442 | When running Wget with `-N', with or without `-r', the decision as
|
---|
443 | to whether or not to download a newer copy of a file depends on
|
---|
444 | the local and remote timestamp and size of the file (*note
|
---|
445 | Time-Stamping::). `-nc' may not be specified at the same time as
|
---|
446 | `-N'.
|
---|
447 |
|
---|
448 | Note that when `-nc' is specified, files with the suffixes `.html'
|
---|
449 | or `.htm' will be loaded from the local disk and parsed as if they
|
---|
450 | had been retrieved from the Web.
|
---|
451 |
|
---|
452 | `-c'
|
---|
453 | `--continue'
|
---|
454 | Continue getting a partially-downloaded file. This is useful when
|
---|
455 | you want to finish up a download started by a previous instance of
|
---|
456 | Wget, or by another program. For instance:
|
---|
457 |
|
---|
458 | wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
|
---|
459 |
|
---|
460 | If there is a file named `ls-lR.Z' in the current directory, Wget
|
---|
461 | will assume that it is the first portion of the remote file, and
|
---|
462 | will ask the server to continue the retrieval from an offset equal
|
---|
463 | to the length of the local file.
|
---|
464 |
|
---|
465 | Note that you don't need to specify this option if you just want
|
---|
466 | the current invocation of Wget to retry downloading a file should
|
---|
467 | the connection be lost midway through. This is the default
|
---|
468 | behavior. `-c' only affects resumption of downloads started
|
---|
469 | _prior_ to this invocation of Wget, and whose local files are
|
---|
470 | still sitting around.
|
---|
471 |
|
---|
472 | Without `-c', the previous example would just download the remote
|
---|
473 | file to `ls-lR.Z.1', leaving the truncated `ls-lR.Z' file alone.
|
---|
474 |
|
---|
475 | Beginning with Wget 1.7, if you use `-c' on a non-empty file, and
|
---|
476 | it turns out that the server does not support continued
|
---|
477 | downloading, Wget will refuse to start the download from scratch,
|
---|
478 | which would effectively ruin existing contents. If you really
|
---|
479 | want the download to start from scratch, remove the file.
|
---|
480 |
|
---|
481 | Also beginning with Wget 1.7, if you use `-c' on a file which is of
|
---|
482 | equal size as the one on the server, Wget will refuse to download
|
---|
483 | the file and print an explanatory message. The same happens when
|
---|
484 | the file is smaller on the server than locally (presumably because
|
---|
485 | it was changed on the server since your last download
|
---|
486 | attempt)--because "continuing" is not meaningful, no download
|
---|
487 | occurs.
|
---|
488 |
|
---|
489 | On the other side of the coin, while using `-c', any file that's
|
---|
490 | bigger on the server than locally will be considered an incomplete
|
---|
491 | download and only `(length(remote) - length(local))' bytes will be
|
---|
492 | downloaded and tacked onto the end of the local file. This
|
---|
493 | behavior can be desirable in certain cases--for instance, you can
|
---|
494 | use `wget -c' to download just the new portion that's been
|
---|
495 | appended to a data collection or log file.
|
---|
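The size comparison that `-c' is described as making can be summarized in a short shell sketch. It only restates the rules given in the text; `continue_plan' is a hypothetical helper, and the sizes are passed in as plain byte counts rather than obtained from a server.

```shell
# Sketch of the decision described above for `wget -c`.
# continue_plan is a hypothetical helper: $1 is the local file size,
# $2 the remote file size, both in bytes.
continue_plan() {
    local_size=$1
    remote_size=$2
    if [ "$remote_size" -gt "$local_size" ]; then
        # Only the missing tail is requested and appended.
        echo "append $((remote_size - local_size)) bytes starting at offset $local_size"
    else
        # Equal or smaller on the server: continuing is not meaningful.
        echo "no download"
    fi
}

continue_plan 1000 1500    # prints: append 500 bytes starting at offset 1000
continue_plan 1500 1500    # prints: no download
```

Note that the comparison looks at sizes only, so it cannot tell an appended file from one that was changed.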
496 |
|
---|
497 | However, if the file is bigger on the server because it's been
|
---|
498 | _changed_, as opposed to just _appended_ to, you'll end up with a
|
---|
499 | garbled file. Wget has no way of verifying that the local file is
|
---|
500 | really a valid prefix of the remote file. You need to be
|
---|
501 | especially careful of this when using `-c' in conjunction with
|
---|
502 | `-r', since every file will be considered as an "incomplete
|
---|
503 | download" candidate.
|
---|
504 |
|
---|
505 | Another instance where you'll get a garbled file if you try to use
|
---|
506 | `-c' is if you have a lame HTTP proxy that inserts a "transfer
|
---|
507 | interrupted" string into the local file. In the future a
|
---|
508 | "rollback" option may be added to deal with this case.
|
---|
509 |
|
---|
510 | Note that `-c' only works with FTP servers and with HTTP servers
|
---|
511 | that support the `Range' header.
|
---|
512 |
|
---|
513 | `--progress=TYPE'
|
---|
514 | Select the type of the progress indicator you wish to use. Legal
|
---|
515 | indicators are "dot" and "bar".
|
---|
516 |
|
---|
517 | The "bar" indicator is used by default. It draws an ASCII progress
|
---|
518 | bar graphic (a.k.a. "thermometer" display) indicating the status of
|
---|
519 | retrieval. If the output is not a TTY, the "dot" indicator will be used
|
---|
520 | by default.
|
---|
521 |
|
---|
522 | Use `--progress=dot' to switch to the "dot" display. It traces
|
---|
523 | the retrieval by printing dots on the screen, each dot
|
---|
524 | representing a fixed amount of downloaded data.
|
---|
525 |
|
---|
526 | When using the dotted retrieval, you may also set the "style" by
|
---|
527 | specifying the type as `dot:STYLE'. Different styles assign
|
---|
528 | different meaning to one dot. With the `default' style each dot
|
---|
529 | represents 1K, there are ten dots in a cluster and 50 dots in a
|
---|
530 | line. The `binary' style has a more "computer"-like
|
---|
531 | orientation--8K dots, 16-dot clusters and 48 dots per line (which
|
---|
532 | makes for 384K lines). The `mega' style is suitable for
|
---|
533 | downloading very large files--each dot represents 64K retrieved,
|
---|
534 | there are eight dots in a cluster, and 48 dots on each line (so
|
---|
535 | each line contains 3M).
|
---|
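The per-line totals quoted above follow directly from multiplying the size of one dot by the number of dots on a line. A small shell check of that arithmetic (`line_kb' is a hypothetical helper name):

```shell
# Kilobytes shown per line = kilobytes per dot * dots per line.
# line_kb is a hypothetical helper used only for this illustration.
line_kb() {
    echo $(( $1 * $2 ))
}

line_kb 1 50     # default style: prints 50
line_kb 8 48     # binary style:  prints 384
line_kb 64 48    # mega style:    prints 3072 (that is, 3M)
```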
536 |
|
---|
537 | Note that you can set the default style using the `progress'
|
---|
538 | command in `.wgetrc'. That setting may be overridden from the
|
---|
539 | command line. The exception is that, when the output is not a
|
---|
540 | TTY, the "dot" progress will be favored over "bar". To force the
|
---|
541 | bar output, use `--progress=bar:force'.
|
---|
542 |
|
---|
543 | `-N'
|
---|
544 | `--timestamping'
|
---|
545 | Turn on time-stamping. *Note Time-Stamping::, for details.
|
---|
546 |
|
---|
547 | `-S'
|
---|
548 | `--server-response'
|
---|
549 | Print the headers sent by HTTP servers and responses sent by FTP
|
---|
550 | servers.
|
---|
551 |
|
---|
552 | `--spider'
|
---|
553 | When invoked with this option, Wget will behave as a Web "spider",
|
---|
554 | which means that it will not download the pages, just check that
|
---|
555 | they are there. For example, you can use Wget to check your
|
---|
556 | bookmarks:
|
---|
557 |
|
---|
558 | wget --spider --force-html -i bookmarks.html
|
---|
559 |
|
---|
560 | This feature needs much more work for Wget to get close to the
|
---|
561 | functionality of real web spiders.
|
---|
562 |
|
---|
563 | `-T SECONDS'
|
---|
564 | `--timeout=SECONDS'
|
---|
565 | Set the network timeout to SECONDS seconds. This is equivalent to
|
---|
566 | specifying `--dns-timeout', `--connect-timeout', and
|
---|
567 | `--read-timeout', all at the same time.
|
---|
568 |
|
---|
569 | When interacting with the network, Wget can check for timeout and
|
---|
570 | abort the operation if it takes too long. This prevents anomalies
|
---|
571 | like hanging reads and infinite connects. The only timeout
|
---|
572 | enabled by default is a 900-second read timeout. Setting a
|
---|
573 | timeout to 0 disables it altogether. Unless you know what you are
|
---|
574 | doing, it is best not to change the default timeout settings.
|
---|
575 |
|
---|
576 | All timeout-related options accept decimal values, as well as
|
---|
577 | subsecond values. For example, `0.1' seconds is a legal (though
|
---|
578 | unwise) choice of timeout. Subsecond timeouts are useful for
|
---|
579 | checking server response times or for testing network latency.
|
---|
580 |
|
---|
581 | `--dns-timeout=SECONDS'
|
---|
582 | Set the DNS lookup timeout to SECONDS seconds. DNS lookups that
|
---|
583 | don't complete within the specified time will fail. By default,
|
---|
584 | there is no timeout on DNS lookups, other than that implemented by
|
---|
585 | system libraries.
|
---|
586 |
|
---|
587 | `--connect-timeout=SECONDS'
|
---|
588 | Set the connect timeout to SECONDS seconds. TCP connections that
|
---|
589 | take longer to establish will be aborted. By default, there is no
|
---|
590 | connect timeout, other than that implemented by system libraries.
|
---|
591 |
|
---|
592 | `--read-timeout=SECONDS'
|
---|
593 | Set the read (and write) timeout to SECONDS seconds. The "time"
|
---|
594 | of this timeout refers to "idle time": if, at any point in the
|
---|
595 | download, no data is received for more than the specified number
|
---|
596 | of seconds, reading fails and the download is restarted. This
|
---|
597 | option does not directly affect the duration of the entire
|
---|
598 | download.
|
---|
599 |
|
---|
600 | Of course, the remote server may choose to terminate the connection
|
---|
601 | sooner than this option requires. The default read timeout is 900
|
---|
602 | seconds.
|
---|
603 |
|
---|
604 | `--limit-rate=AMOUNT'
|
---|
605 | Limit the download speed to AMOUNT bytes per second. Amount may
|
---|
606 | be expressed in bytes, kilobytes with the `k' suffix, or megabytes
|
---|
607 | with the `m' suffix. For example, `--limit-rate=20k' will limit
|
---|
608 | the retrieval rate to 20KB/s. This is useful when, for whatever
|
---|
609 | reason, you don't want Wget to consume the entire available
|
---|
610 | bandwidth.
|
---|
611 |
|
---|
612 | This option allows the use of decimal numbers, usually in
|
---|
613 | conjunction with power suffixes; for example, `--limit-rate=2.5k'
|
---|
614 | is a legal value.
|
---|
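To make the suffixes concrete, the sketch below converts an AMOUNT string into bytes per second. It assumes that `k' and `m' stand for multiples of 1024 and 1024 * 1024 bytes, which the text above does not state explicitly; `rate_to_bytes' is a hypothetical helper and does not reproduce Wget's own parser.

```shell
# Sketch: convert a --limit-rate style amount to bytes per second.
# Assumption: k = 1024 bytes, m = 1024 * 1024 bytes.
# rate_to_bytes is a hypothetical helper name.
rate_to_bytes() {
    awk -v v="$1" 'BEGIN {
        mult = 1
        suffix = tolower(substr(v, length(v)))
        if (suffix == "k") {
            mult = 1024
            v = substr(v, 1, length(v) - 1)
        } else if (suffix == "m") {
            mult = 1024 * 1024
            v = substr(v, 1, length(v) - 1)
        }
        printf "%d\n", v * mult
    }'
}

rate_to_bytes 20k     # prints 20480
rate_to_bytes 2.5k    # prints 2560
```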
615 |
|
---|
616 | Note that Wget implements the limiting by sleeping the appropriate
|
---|
617 | amount of time after a network read that took less time than
|
---|
618 | specified by the rate. Eventually this strategy causes the TCP
|
---|
619 | transfer to slow down to approximately the specified rate.
|
---|
620 | However, it may take some time for this balance to be achieved, so
|
---|
621 | don't be surprised if limiting the rate doesn't work well with
|
---|
622 | very small files.
|
---|
623 |
|
---|
624 | `-w SECONDS'
|
---|
625 | `--wait=SECONDS'
|
---|
626 | Wait the specified number of seconds between the retrievals. Use
|
---|
627 | of this option is recommended, as it lightens the server load by
|
---|
628 | making the requests less frequent. Instead of in seconds, the
|
---|
629 | time can be specified in minutes using the `m' suffix, in hours
|
---|
630 | using the `h' suffix, or in days using the `d' suffix.
|
---|
631 |
|
---|
632 | Specifying a large value for this option is useful if the network
|
---|
633 | or the destination host is down, so that Wget can wait long enough
|
---|
634 | to reasonably expect the network error to be fixed before the
|
---|
635 | retry.
|
---|
636 |
|
---|
637 | `--waitretry=SECONDS'
|
---|
638 | If you don't want Wget to wait between _every_ retrieval, but only
|
---|
639 | between retries of failed downloads, you can use this option.
|
---|
640 | Wget will use "linear backoff", waiting 1 second after the first
|
---|
641 | failure on a given file, then waiting 2 seconds after the second
|
---|
642 | failure on that file, up to the maximum number of SECONDS you
|
---|
643 | specify. Therefore, a value of 10 will actually make Wget wait up
|
---|
644 | to (1 + 2 + ... + 10) = 55 seconds per file.
|
---|
645 |
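The arithmetic behind the 55-second figure can be checked with a short shell sketch. The helper name waitretry_total is hypothetical; it only sums the linear backoff waits described above and does not invoke Wget.

```shell
# Hypothetical helper, not part of Wget: total time spent waiting when
# every retry up to the --waitretry maximum is needed, that is
# 1 + 2 + ... + N seconds.
waitretry_total() {
    n=$1
    total=0
    i=1
    while [ "$i" -le "$n" ]; do
        total=$((total + i))
        i=$((i + 1))
    done
    echo "$total"
}

waitretry_total 10    # prints 55
```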
|
---|
646 | Note that this option is turned on by default in the global
|
---|
647 | `wgetrc' file.
|
---|
648 |
|
---|
649 | `--random-wait'
|
---|
650 | Some web sites may perform log analysis to identify retrieval
|
---|
651 | programs such as Wget by looking for statistically significant
|
---|
652 | similarities in the time between requests. This option causes the
|
---|
653 | time between requests to vary between 0 and 2 * WAIT seconds,
|
---|
654 | where WAIT was specified using the `--wait' option, in order to
|
---|
655 | mask Wget's presence from such analysis.
|
---|
656 |
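The interval described above can be pictured with the following sketch, which draws a value between 0 and 2 * WAIT seconds. It follows only the description in this manual; the helper name random_wait and the use of /dev/urandom are assumptions of the sketch, not a description of how Wget itself picks the delay.

```shell
# Hypothetical helper, not part of Wget: pick a delay between 0 and
# 2 * WAIT seconds, as the text above describes for --random-wait.
# Assumption: /dev/urandom is available as a source of random bytes.
random_wait() {
    wait=$1
    # Read two random bytes as one unsigned integer in 0..65535.
    r=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    # Scale that integer to the range 0 .. 2 * WAIT.
    LC_ALL=C awk -v r="$r" -v w="$wait" \
        'BEGIN { printf "%.2f\n", r / 65535 * 2 * w }'
}

random_wait 5    # prints a value between 0.00 and 10.00
```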
|
---|
657 | A recent article in a publication devoted to development on a
|
---|
658 | popular consumer platform provided code to perform this analysis
|
---|
659 | on the fly. Its author suggested blocking at the class C address
|
---|
660 | level to ensure automated retrieval programs were blocked despite
|
---|
661 | changing DHCP-supplied addresses.
|
---|
662 |
|
---|
663 | The `--random-wait' option was inspired by this ill-advised
|
---|
664 | recommendation to block many unrelated users from a web site due
|
---|
665 | to the actions of one.
|
---|
666 |
|
---|
667 | `--no-proxy'
|
---|
668 | Don't use proxies, even if the appropriate `*_proxy' environment
|
---|
669 | variable is defined.
|
---|
670 |
|
---|
671 | For more information about the use of proxies with Wget, *Note
|
---|
672 | Proxies::.
|
---|
673 |
|
---|
674 | `-Q QUOTA'
|
---|
675 | `--quota=QUOTA'
|
---|
676 | Specify download quota for automatic retrievals. The value can be
|
---|
677 | specified in bytes (default), kilobytes (with `k' suffix), or
|
---|
678 | megabytes (with `m' suffix).
|
---|
679 |
|
---|
680 | Note that quota will never affect downloading a single file. So
|
---|
681 | if you specify `wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz',
|
---|
682 | all of the `ls-lR.gz' will be downloaded. The same goes even when
|
---|
683 | several URLs are specified on the command-line. However, quota is
|
---|
684 | respected when retrieving either recursively, or from an input
|
---|
685 | file. Thus you may safely type `wget -Q2m -i sites'--download
|
---|
686 | will be aborted when the quota is exceeded.
|
---|
687 |
|
---|
688 | Setting quota to 0 or to `inf' unlimits the download quota.
|
---|
689 |
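The quota rules can be modelled with a small simulation. This is a simplified sketch, assuming that the quota is compared against the running total only after each file has finished, which is how the text above reads; the helper name simulate_quota and the file sizes are made up for illustration.

```shell
# Hypothetical model, not part of Wget: walk a list of file sizes, as
# with -i, and stop once the running total exceeds the quota.
# Assumption: the check happens only after a file is complete, so a
# single file is never cut short. A quota of 0 means no limit.
simulate_quota() {
    quota=$1
    shift
    total=0
    count=0
    for size in "$@"; do
        total=$((total + size))
        count=$((count + 1))
        if [ "$quota" -gt 0 ] && [ "$total" -gt "$quota" ]; then
            break
        fi
    done
    # Print the number of files fetched and the bytes downloaded.
    echo "$count $total"
}

simulate_quota 10 4 4 4 4    # prints 3 12
simulate_quota 10 50         # prints 1 50
simulate_quota 0 4 4         # prints 2 8
```

In the first call the third file pushes the total past the quota, so it is completed and the fourth is never started.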
|
---|
690 | `--no-dns-cache'
|
---|
691 | Turn off caching of DNS lookups. Normally, Wget remembers the IP
|
---|
692 | addresses it looked up from DNS so it doesn't have to repeatedly
|
---|
693 | contact the DNS server for the same (typically small) set of hosts
|
---|
694 | it retrieves from. This cache exists in memory only; a new Wget
|
---|
695 | run will contact DNS again.
|
---|
696 |
|
---|
697 | However, it has been reported that in some situations it is not
|
---|
698 | desirable to cache host names, even for the duration of a
|
---|
699 | short-running application like Wget. With this option Wget issues
|
---|
700 | a new DNS lookup (more precisely, a new call to `gethostbyname' or
|
---|
701 | `getaddrinfo') each time it makes a new connection. Please note
|
---|
702 | that this option will _not_ affect caching that might be performed
|
---|
703 | by the resolving library or by an external caching layer, such as
|
---|
704 | NSCD.
|
---|
705 |
|
---|
706 | If you don't understand exactly what this option does, you probably
|
---|
707 | won't need it.
|
---|
708 |
|
---|
709 | `--restrict-file-names=MODE'
|
---|
710 | Change which characters found in remote URLs may show up in local
|
---|
711 | file names generated from those URLs. Characters that are
|
---|
712 | "restricted" by this option are escaped, i.e. replaced with `%HH',
|
---|
713 | where `HH' is the hexadecimal number that corresponds to the
|
---|
714 | restricted character.
|
---|
715 |
|
---|
716 | By default, Wget escapes the characters that are not valid as part
|
---|
717 | of file names on your operating system, as well as control
|
---|
718 | characters that are typically unprintable. This option is useful
|
---|
719 | for changing these defaults, either because you are downloading to
|
---|
720 | a non-native partition, or because you want to disable escaping of
|
---|
721 | the control characters.
|
---|
722 |
|
---|
723 | When mode is set to "unix", Wget escapes the character `/' and the
|
---|
724 | control characters in the ranges 0-31 and 128-159. This is the
|
---|
725 | default on Unix-like operating systems.
|
---|
726 |
|
---|
727 | When mode is set to "windows", Wget escapes the characters `\',
|
---|
728 | `|', `/', `:', `?', `"', `*', `<', `>', and the control characters
|
---|
729 | in the ranges 0-31 and 128-159. In addition to this, Wget in
|
---|
730 | Windows mode uses `+' instead of `:' to separate host and port in
|
---|
731 | local file names, and uses `@' instead of `?' to separate the
|
---|
732 | query portion of the file name from the rest. Therefore, a URL
|
---|
733 | that would be saved as `www.xemacs.org:4300/search.pl?input=blah'
|
---|
734 | in Unix mode would be saved as
|
---|
735 | `www.xemacs.org+4300/search.pl@input=blah' in Windows mode. This
|
---|
736 | mode is the default on Windows.
|
---|
737 |
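The two transformations described so far can be sketched in shell. Both helper names are hypothetical, and the sketch is deliberately partial: it shows the %HH escape for one character and the two Windows-mode separator substitutions, not the complete rule set. Printing the hexadecimal digits in upper case is a choice of this sketch.

```shell
# Hypothetical helpers, not part of Wget.

# Replace one character with its %HH escape (HH is the hexadecimal code).
percent_escape() {
    # A leading quote makes printf use the character's numeric value.
    printf '%%%02X\n' "'$1"
}

# Apply only the two Windows-mode separator substitutions described
# above: a colon becomes a plus sign, a question mark becomes an at sign.
windows_separators() {
    printf '%s\n' "$1" | tr ':?' '+@'
}

percent_escape '|'    # prints %7C
windows_separators 'www.xemacs.org:4300/search.pl?input=blah'
# prints www.xemacs.org+4300/search.pl@input=blah
```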
|
---|
738 | If you append `,nocontrol' to the mode, as in `unix,nocontrol',
|
---|
739 | escaping of the control characters is also switched off. You can
|
---|
740 | use `--restrict-file-names=nocontrol' to turn off escaping of
|
---|
741 | control characters without affecting the choice of the OS to use
|
---|
742 | as file name restriction mode.
|
---|
743 |
|
---|
744 | `-4'
|
---|
745 | `--inet4-only'
|
---|
746 | `-6'
|
---|
747 | `--inet6-only'
|
---|
748 | Force connecting to IPv4 or IPv6 addresses. With `--inet4-only'
|
---|
749 | or `-4', Wget will only connect to IPv4 hosts, ignoring AAAA
|
---|
750 | records in DNS, and refusing to connect to IPv6 addresses
|
---|
751 | specified in URLs. Conversely, with `--inet6-only' or `-6', Wget
|
---|
752 | will only connect to IPv6 hosts and ignore A records and IPv4
|
---|
753 | addresses.
|
---|
754 |
|
---|
755 | Neither option should be needed normally. By default, an
|
---|
756 | IPv6-aware Wget will use the address family specified by the
|
---|
757 | host's DNS record. If the DNS responds with both IPv4 and IPv6
|
---|
758 | addresses, Wget will try them in sequence until it finds one it can
|
---|
759 | connect to. (Also see the `--prefer-family' option described below.)
|
---|
760 |
|
---|
761 | These options can be used to deliberately force the use of IPv4 or
|
---|
762 | IPv6 address families on dual family systems, usually to aid
|
---|
763 | debugging or to deal with broken network configuration. Only one
|
---|
764 | of `--inet6-only' and `--inet4-only' may be specified at the same
|
---|
765 | time. Neither option is available in Wget compiled without IPv6
|
---|
766 | support.
|
---|
767 |
|
---|
768 | `--prefer-family=IPv4/IPv6/none'
|
---|
769 | When given a choice of several addresses, connect to the addresses
|
---|
770 | with specified address family first. IPv4 addresses are preferred
|
---|
771 | by default.
|
---|
772 |
|
---|
773 | This avoids spurious errors and connect attempts when accessing
|
---|
774 | hosts that resolve to both IPv6 and IPv4 addresses from IPv4
|
---|
775 | networks. For example, `www.kame.net' resolves to
|
---|
776 | `2001:200:0:8002:203:47ff:fea5:3085' and to `203.178.141.194'.
|
---|
777 | When the preferred family is `IPv4', the IPv4 address is used
|
---|
778 | first; when the preferred family is `IPv6', the IPv6 address is
|
---|
779 | used first; if the specified value is `none', the address order
|
---|
780 | returned by DNS is used without change.
|
---|
781 |
|
---|
782 | Unlike `-4' and `-6', this option doesn't inhibit access to any
|
---|
783 | address family; it only changes the _order_ in which the addresses
|
---|
784 | are accessed. Also note that the reordering performed by this
|
---|
785 | option is "stable"--it doesn't affect the order of addresses of the
|
---|
786 | same family. That is, the relative order of all IPv4 addresses
|
---|
787 | and of all IPv6 addresses remains intact in all cases.
|
---|
788 |
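The stable reordering can be sketched as follows. The helper name prefer_family is hypothetical, and telling the families apart by looking for a colon is a simplification made for this sketch.

```shell
# Hypothetical helper, not part of Wget: move addresses of the preferred
# family to the front while keeping the relative order within each
# family. Assumption: an address containing a colon is IPv6.
prefer_family() {
    family=$1
    shift
    first=""
    rest=""
    for addr in "$@"; do
        case $addr in
            *:*) fam=IPv6 ;;
            *)   fam=IPv4 ;;
        esac
        if [ "$family" = none ] || [ "$fam" = "$family" ]; then
            first="$first $addr"
        else
            rest="$rest $addr"
        fi
    done
    # Unquoted on purpose so that the extra spaces collapse.
    echo $first $rest
}

prefer_family IPv4 2001:200:0:8002:203:47ff:fea5:3085 203.178.141.194
# prints 203.178.141.194 2001:200:0:8002:203:47ff:fea5:3085
```

With none as the first argument the list comes back in the order it was given.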
|
---|
789 | `--retry-connrefused'
|
---|
790 | Consider "connection refused" a transient error and try again.
|
---|
791 | Normally Wget gives up on a URL when it is unable to connect to the
|
---|
792 | site because failure to connect is taken as a sign that the server
|
---|
793 | is not running at all and that retries would not help. This
|
---|
794 | option is for mirroring unreliable sites whose servers tend to
|
---|
795 | disappear for short periods of time.
|
---|
796 |
|
---|
797 | `--user=USER'
|
---|
798 | `--password=PASSWORD'
|
---|
799 | Specify the username USER and password PASSWORD for both FTP and
|
---|
800 | HTTP file retrieval. These parameters can be overridden using the
|
---|
801 | `--ftp-user' and `--ftp-password' options for FTP connections and
|
---|
802 | the `--http-user' and `--http-password' options for HTTP
|
---|
803 | connections.
|
---|
804 |
|
---|
805 |
|
---|
806 | File: wget.info, Node: Directory Options, Next: HTTP Options, Prev: Download Options, Up: Invoking
|
---|
807 |
|
---|
808 | 2.6 Directory Options
|
---|
809 | =====================
|
---|
810 |
|
---|
811 | `-nd'
|
---|
812 | `--no-directories'
|
---|
813 | Do not create a hierarchy of directories when retrieving
|
---|
814 | recursively. With this option turned on, all files will get saved
|
---|
815 | to the current directory, without clobbering (if a name shows up
|
---|
816 | more than once, the filenames will get extensions `.n').
|
---|
817 |
|
---|
818 | `-x'
|
---|
819 | `--force-directories'
|
---|
820 | The opposite of `-nd'--create a hierarchy of directories, even if
|
---|
821 | one would not have been created otherwise. E.g. `wget -x
|
---|
822 | http://fly.srk.fer.hr/robots.txt' will save the downloaded file to
|
---|
823 | `fly.srk.fer.hr/robots.txt'.
|
---|
824 |
|
---|
825 | `-nH'
|
---|
826 | `--no-host-directories'
|
---|
827 | Disable generation of host-prefixed directories. By default,
|
---|
828 | invoking Wget with `-r http://fly.srk.fer.hr/' will create a
|
---|
829 | structure of directories beginning with `fly.srk.fer.hr/'. This
|
---|
830 | option disables such behavior.
|
---|
831 |
|
---|
832 | `--protocol-directories'
|
---|
833 | Use the protocol name as a directory component of local file
|
---|
834 | names. For example, with this option, `wget -r http://HOST' will
|
---|
835 | save to `http/HOST/...' rather than just to `HOST/...'.
|
---|
836 |
|
---|
837 | `--cut-dirs=NUMBER'
|
---|
838 | Ignore NUMBER directory components. This is useful for getting a
|
---|
839 | fine-grained control over the directory where recursive retrieval
|
---|
840 | will be saved.
|
---|
841 |
|
---|
842 | Take, for example, the directory at
|
---|
843 | `ftp://ftp.xemacs.org/pub/xemacs/'. If you retrieve it with `-r',
|
---|
844 | it will be saved locally under `ftp.xemacs.org/pub/xemacs/'.
|
---|
845 | While the `-nH' option can remove the `ftp.xemacs.org/' part, you
|
---|
846 | are still stuck with `pub/xemacs'. This is where `--cut-dirs'
|
---|
847 | comes in handy; it makes Wget not "see" NUMBER remote directory
|
---|
848 | components. Here are several examples of how the `--cut-dirs' option
|
---|
849 | works.
|
---|
850 |
|
---|
851 | No options -> ftp.xemacs.org/pub/xemacs/
|
---|
852 | -nH -> pub/xemacs/
|
---|
853 | -nH --cut-dirs=1 -> xemacs/
|
---|
854 | -nH --cut-dirs=2 -> .
|
---|
855 |
|
---|
856 | --cut-dirs=1 -> ftp.xemacs.org/xemacs/
|
---|
857 | ...
|
---|
858 |
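The rows of the table above can be reproduced with a short sketch that computes the local directory from a host name, a remote directory, and the two options. The helper name local_dir and its argument order are hypothetical, and the output omits the trailing slash shown in the table.

```shell
# Hypothetical helper, not part of Wget.
# Usage: local_dir HOST REMOTE-DIR NO-HOST CUT
#   NO-HOST is "yes" to model -nH, CUT is the --cut-dirs number.
local_dir() {
    host=$1
    path=$2
    no_host=$3
    cut=$4
    # Drop one leading directory component per unit of CUT.
    while [ "$cut" -gt 0 ] && [ -n "$path" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   path="" ;;
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" = yes ]; then
        result=$path
    else
        result=$host${path:+/$path}
    fi
    # An empty result means the current directory.
    echo "${result:-.}"
}

local_dir ftp.xemacs.org pub/xemacs no 0     # prints ftp.xemacs.org/pub/xemacs
local_dir ftp.xemacs.org pub/xemacs yes 1    # prints xemacs
local_dir ftp.xemacs.org pub/xemacs yes 2    # prints .
local_dir ftp.xemacs.org pub/xemacs no 1     # prints ftp.xemacs.org/xemacs
```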
|
---|
859 | If you just want to get rid of the directory structure, this
|
---|
860 | option is similar to a combination of `-nd' and `-P'. However,
|
---|
861 | unlike `-nd', `--cut-dirs' does not lose subdirectories--for
|
---|
862 | instance, with `-nH --cut-dirs=1', a `beta/' subdirectory will be
|
---|
863 | placed in `xemacs/beta', as one would expect.
|
---|
864 |
|
---|
865 | `-P PREFIX'
|
---|
866 | `--directory-prefix=PREFIX'
|
---|
867 | Set directory prefix to PREFIX. The "directory prefix" is the
|
---|
868 | directory where all other files and subdirectories will be saved
|
---|
869 | to, i.e. the top of the retrieval tree. The default is `.' (the
|
---|
870 | current directory).
|
---|
871 |
|
---|
872 |
|
---|
873 | File: wget.info, Node: HTTP Options, Next: HTTPS (SSL/TLS) Options, Prev: Directory Options, Up: Invoking
|
---|
874 |
|
---|
875 | 2.7 HTTP Options
|
---|
876 | ================
|
---|
877 |
|
---|
878 | `-E'
|
---|
879 | `--html-extension'
|
---|
880 | If a file of type `application/xhtml+xml' or `text/html' is
|
---|
881 | downloaded and the URL does not end with the regexp
|
---|
882 | `\.[Hh][Tt][Mm][Ll]?', this option will cause the suffix `.html'
|
---|
883 | to be appended to the local filename. This is useful, for
|
---|
884 | instance, when you're mirroring a remote site that uses `.asp'
|
---|
885 | pages, but you want the mirrored pages to be viewable on your
|
---|
886 | stock Apache server. Another good use for this is when you're
|
---|
887 | downloading CGI-generated materials. A URL like
|
---|
888 | `http://site.com/article.cgi?25' will be saved as
|
---|
889 | `article.cgi?25.html'.
|
---|
890 |
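The naming rule can be sketched with a shell pattern match that mirrors the regexp above. The helper name html_local_name is hypothetical, and the sketch assumes the document has already been identified as HTML or XHTML.

```shell
# Hypothetical helper, not part of Wget: append .html unless the name
# already ends in .htm or .html (in any letter case).
# Assumption: the content type is already known to be HTML or XHTML.
html_local_name() {
    name=$1
    case $name in
        *.[Hh][Tt][Mm][Ll] | *.[Hh][Tt][Mm]) printf '%s\n' "$name" ;;
        *) printf '%s.html\n' "$name" ;;
    esac
}

html_local_name 'article.cgi?25'    # prints article.cgi?25.html
html_local_name 'index.HTM'         # prints index.HTM
```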
|
---|
891 | Note that filenames changed in this way will be re-downloaded
|
---|
892 | every time you re-mirror a site, because Wget can't tell that the
|
---|
893 | local `X.html' file corresponds to remote URL `X' (since it
|
---|
894 | doesn't yet know that the URL produces output of type `text/html'
|
---|
895 | or `application/xhtml+xml'). To prevent this re-downloading, you
|
---|
896 | must use `-k' and `-K' so that the original version of the file
|
---|
897 | will be saved as `X.orig' (*note Recursive Retrieval Options::).
|
---|
898 |
|
---|
899 | `--http-user=USER'
|
---|
900 | `--http-password=PASSWORD'
|
---|
901 | Specify the username USER and password PASSWORD on an HTTP server.
|
---|
902 | According to the type of the challenge, Wget will encode them
|
---|
903 | using either the `basic' (insecure) or the `digest' authentication
|
---|
904 | scheme.
|
---|
905 |
|
---|
906 | Another way to specify username and password is in the URL itself
|
---|
907 | (*note URL Format::). Either method reveals your password to
|
---|
908 | anyone who bothers to run `ps'. To prevent the passwords from
|
---|
909 | being seen, store them in `.wgetrc' or `.netrc', and make sure to
|
---|
910 | protect those files from other users with `chmod'. If the
|
---|
911 | passwords are really important, do not leave them lying in those
|
---|
912 | files either--edit the files and delete them after Wget has
|
---|
913 | started the download.
|
---|
914 |
|
---|
915 | `--no-cache'
|
---|
916 | Disable server-side cache. In this case, Wget will send the remote
|
---|
917 | server an appropriate directive (`Pragma: no-cache') to get the
|
---|
918 | file from the remote service, rather than returning the cached
|
---|
919 | version. This is especially useful for retrieving and flushing
|
---|
920 | out-of-date documents on proxy servers.
|
---|
921 |
|
---|
922 | Caching is allowed by default.
|
---|
923 |
|
---|
924 | `--no-cookies'
|
---|
925 | Disable the use of cookies. Cookies are a mechanism for
|
---|
926 | maintaining server-side state. The server sends the client a
|
---|
927 | cookie using the `Set-Cookie' header, and the client responds with
|
---|
928 | the same cookie upon further requests. Since cookies allow the
|
---|
929 | server owners to keep track of visitors and for sites to exchange
|
---|
930 | this information, some consider them a breach of privacy. The
|
---|
931 | default is to use cookies; however, _storing_ cookies is not on by
|
---|
932 | default.
|
---|
933 |
|
---|
934 | `--load-cookies FILE'
|
---|
935 | Load cookies from FILE before the first HTTP retrieval. FILE is a
|
---|
936 | textual file in the format originally used by Netscape's
|
---|
937 | `cookies.txt' file.
|
---|
938 |
|
---|
939 | You will typically use this option when mirroring sites that
|
---|
940 | require that you be logged in to access some or all of their
|
---|
941 | content. The login process typically works by the web server
|
---|
942 | issuing an HTTP cookie upon receiving and verifying your
|
---|
943 | credentials. The cookie is then resent by the browser when
|
---|
944 | accessing that part of the site, and so proves your identity.
|
---|
945 |
|
---|
946 | Mirroring such a site requires Wget to send the same cookies your
|
---|
947 | browser sends when communicating with the site. This is achieved
|
---|
948 | by `--load-cookies'--simply point Wget to the location of the
|
---|
949 | `cookies.txt' file, and it will send the same cookies your browser
|
---|
950 | would send in the same situation. Different browsers keep textual
|
---|
951 | cookie files in different locations:
|
---|
952 |
|
---|
953 | Netscape 4.x.
|
---|
954 | The cookies are in `~/.netscape/cookies.txt'.
|
---|
955 |
|
---|
956 | Mozilla and Netscape 6.x.
|
---|
957 | Mozilla's cookie file is also named `cookies.txt', located
|
---|
958 | somewhere under `~/.mozilla', in the directory of your
|
---|
959 | profile. The full path usually ends up looking somewhat like
|
---|
960 | `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.
|
---|
961 |
|
---|
962 | Internet Explorer.
|
---|
963 | You can produce a cookie file Wget can use by using the File
|
---|
964 | menu, Import and Export, Export Cookies. This has been
|
---|
965 | tested with Internet Explorer 5; it is not guaranteed to work
|
---|
966 | with earlier versions.
|
---|
967 |
|
---|
968 | Other browsers.
|
---|
969 | If you are using a different browser to create your cookies,
|
---|
970 | `--load-cookies' will only work if you can locate or produce a
|
---|
971 | cookie file in the Netscape format that Wget expects.
|
---|
972 |
|
---|
973 | If you cannot use `--load-cookies', there might still be an
|
---|
974 | alternative. If your browser supports a "cookie manager", you can
|
---|
975 | use it to view the cookies used when accessing the site you're
|
---|
976 | mirroring. Write down the name and value of the cookie, and
|
---|
977 | manually instruct Wget to send those cookies, bypassing the
|
---|
978 | "official" cookie support:
|
---|
979 |
|
---|
980 | wget --no-cookies --header "Cookie: NAME=VALUE"
|
---|
981 |
|
---|
982 | `--save-cookies FILE'
|
---|
983 | Save cookies to FILE before exiting. This will not save cookies
|
---|
984 | that have expired or that have no expiry time (so-called "session
|
---|
985 | cookies"), but also see `--keep-session-cookies'.
|
---|
986 |
|
---|
987 | `--keep-session-cookies'
|
---|
988 | When specified, causes `--save-cookies' to also save session
|
---|
989 | cookies. Session cookies are normally not saved because they are
|
---|
990 | meant to be kept in memory and forgotten when you exit the browser.
|
---|
991 | Saving them is useful on sites that require you to log in or to
|
---|
992 | visit the home page before you can access some pages. With this
|
---|
993 | option, multiple Wget runs are considered a single browser session
|
---|
994 | as far as the site is concerned.
|
---|
995 |
|
---|
996 | Since the cookie file format does not normally carry session
|
---|
997 | cookies, Wget marks them with an expiry timestamp of 0. Wget's
|
---|
998 | `--load-cookies' recognizes those as session cookies, but it might
|
---|
999 | confuse other browsers. Also note that cookies so loaded will be
|
---|
1000 | treated as other session cookies, which means that if you want
|
---|
1001 | `--save-cookies' to preserve them again, you must use
|
---|
1002 | `--keep-session-cookies' again.
|
---|
1003 |
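To make the expiry timestamp of 0 concrete, the sketch below writes a small cookie file and lists the entries that would count as session cookies. It assumes the commonly documented Netscape layout of seven tab-separated fields; the helper names, the host, and the cookie names and values are all made up for illustration.

```shell
# Assumption: the commonly documented Netscape cookies.txt layout of
# seven tab-separated fields:
#   domain, subdomain flag, path, secure flag, expiry, name, value

# Hypothetical helper: write one persistent cookie and one session
# cookie (expiry timestamp 0) to the file named by the first argument.
write_demo_jar() {
    printf 'server.com\tFALSE\t/\tFALSE\t2147483647\tlang\thr\n' > "$1"
    printf 'server.com\tFALSE\t/\tFALSE\t0\tsession\tabc123\n' >> "$1"
}

# Hypothetical helper: print the names of cookies whose expiry is 0.
session_cookies() {
    awk -F '\t' '$5 == 0 { print $6 }' "$1"
}

jar=$(mktemp)
write_demo_jar "$jar"
session_cookies "$jar"    # prints session
rm -f "$jar"
```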
|
---|
1004 | `--ignore-length'
|
---|
1005 | Unfortunately, some HTTP servers (CGI programs, to be more
|
---|
1006 | precise) send out bogus `Content-Length' headers, which makes Wget
|
---|
1007 | go wild, as it thinks not all the document was retrieved. You can
|
---|
1008 | spot this syndrome if Wget retries getting the same document again
|
---|
1009 | and again, each time claiming that the (otherwise normal)
|
---|
1010 | connection has closed on the very same byte.
|
---|
1011 |
|
---|
1012 | With this option, Wget will ignore the `Content-Length' header--as
|
---|
1013 | if it never existed.
|
---|
1014 |
|
---|
1015 | `--header=HEADER-LINE'
|
---|
1016 | Send HEADER-LINE along with the rest of the headers in each HTTP
|
---|
1017 | request. The supplied header is sent as-is, which means it must
|
---|
1018 | contain name and value separated by colon, and must not contain
|
---|
1019 | newlines.
|
---|
1020 |
|
---|
1021 | You may define more than one additional header by specifying
|
---|
1022 | `--header' more than once.
|
---|
1023 |
|
---|
1024 | wget --header='Accept-Charset: iso-8859-2' \
|
---|
1025 | --header='Accept-Language: hr' \
|
---|
1026 | http://fly.srk.fer.hr/
|
---|
1027 |
|
---|
1028 | Specification of an empty string as the header value will clear all
|
---|
1029 | previous user-defined headers.
|
---|
1030 |
|
---|
1031 | As of Wget 1.10, this option can be used to override headers
|
---|
1032 | otherwise generated automatically. This example instructs Wget to
|
---|
1033 | connect to localhost, but to specify `foo.bar' in the `Host'
|
---|
1034 | header:
|
---|
1035 |
|
---|
1036 | wget --header="Host: foo.bar" http://localhost/
|
---|
1037 |
|
---|
1038 | In versions of Wget prior to 1.10 such use of `--header' caused
|
---|
1039 | sending of duplicate headers.
|
---|
1040 |
|
---|
1041 | `--proxy-user=USER'
|
---|
1042 | `--proxy-password=PASSWORD'
|
---|
1043 | Specify the username USER and password PASSWORD for authentication
|
---|
1044 | on a proxy server. Wget will encode them using the `basic'
|
---|
1045 | authentication scheme.
|
---|
1046 |
|
---|
1047 | Security considerations similar to those with `--http-password'
|
---|
1048 | pertain here as well.
|
---|
1049 |
|
---|
1050 | `--referer=URL'
|
---|
1051 | Include `Referer: URL' header in HTTP request. Useful for
|
---|
1052 | retrieving documents with server-side processing that assume they
|
---|
1053 | are always being retrieved by interactive web browsers and only
|
---|
1054 | come out properly when Referer is set to one of the pages that
|
---|
1055 | point to them.
|
---|
1056 |
|
---|
1057 | `--save-headers'
|
---|
1058 | Save the headers sent by the HTTP server to the file, preceding the
|
---|
1059 | actual contents, with an empty line as the separator.
|
---|
1060 |
|
---|
1061 | `-U AGENT-STRING'
|
---|
1062 | `--user-agent=AGENT-STRING'
|
---|
1063 | Identify as AGENT-STRING to the HTTP server.
|
---|
1064 |
|
---|
1065 | The HTTP protocol allows the clients to identify themselves using a
|
---|
1066 | `User-Agent' header field. This enables distinguishing the WWW
|
---|
1067 | software, usually for statistical purposes or for tracing of
|
---|
1068 | protocol violations. Wget normally identifies as `Wget/VERSION',
|
---|
1069 | VERSION being the current version number of Wget.
|
---|
1070 |
|
---|
1071 | However, some sites have been known to impose the policy of
|
---|
1072 | tailoring the output according to the `User-Agent'-supplied
|
---|
1073 | information. While this is not such a bad idea in theory, it has
|
---|
1074 | been abused by servers denying information to clients other than
|
---|
1075 | (historically) Netscape or, more frequently, Microsoft Internet
|
---|
1076 | Explorer. This option allows you to change the `User-Agent' line
|
---|
1077 | issued by Wget. Use of this option is discouraged, unless you
|
---|
1078 | really know what you are doing.
|
---|
1079 |
|
---|
1080 | Specifying an empty user agent with `--user-agent=""' instructs Wget
|
---|
1081 | not to send the `User-Agent' header in HTTP requests.
|
---|
1082 |
|
---|
1083 | `--post-data=STRING'
|
---|
1084 | `--post-file=FILE'
|
---|
1085 | Use POST as the method for all HTTP requests and send the
|
---|
1086 | specified data in the request body. `--post-data' sends STRING as
|
---|
1087 | data, whereas `--post-file' sends the contents of FILE. Other than
|
---|
1088 | that, they work in exactly the same way.
|
---|
1089 |
|
---|
1090 | Please be aware that Wget needs to know the size of the POST data
|
---|
1091 | in advance. Therefore the argument to `--post-file' must be a
|
---|
1092 | regular file; specifying a FIFO or something like `/dev/stdin'
|
---|
1093 | won't work. It's not quite clear how to work around this
|
---|
1094 | limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces
|
---|
1095 | "chunked" transfer that doesn't require knowing the request length
|
---|
1096 | in advance, a client can't use chunked unless it knows it's
|
---|
1097 | talking to an HTTP/1.1 server. And it can't know that until it
|
---|
1098 | receives a response, which in turn requires the request to have
|
---|
1099 | been completed - a chicken-and-egg problem.
|
---|
1100 |
|
---|
1101 | Note: if Wget is redirected after the POST request is completed, it
|
---|
1102 | will not send the POST data to the redirected URL. This is because
|
---|
1103 | URLs that process POST often respond with a redirection to a
|
---|
1104 | regular page, which does not desire or accept POST. It is not
|
---|
1105 | completely clear that this behavior is optimal; if it doesn't work
|
---|
1106 | out, it might be changed in the future.
|
---|
1107 |
|
---|
1108 | This example shows how to log in to a server using POST and then
|
---|
1109 | proceed to download the desired pages, presumably only accessible
|
---|
1110 | to authorized users:
|
---|
1111 |
|
---|
1112 | # Log in to the server. This only needs to be done once.
|
---|
1113 | wget --save-cookies cookies.txt \
|
---|
1114 | --post-data 'user=foo&password=bar' \
|
---|
1115 | http://server.com/auth.php
|
---|
1116 |
|
---|
1117 | # Now grab the page or pages we care about.
|
---|
1118 | wget --load-cookies cookies.txt \
|
---|
1119 | -p http://server.com/interesting/article.php
|
---|
1120 |
|
---|
1121 | If the server is using session cookies to track user
|
---|
1122 | authentication, the above will not work because `--save-cookies'
|
---|
1123 | will not save them (and neither will browsers) and the
|
---|
1124 | `cookies.txt' file will be empty. In that case use
|
---|
1125 | `--keep-session-cookies' along with `--save-cookies' to force
|
---|
1126 | saving of session cookies.
|
---|
1127 |
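A variant of the login example for servers that rely on session cookies might look like the sketch below. The function name login_and_fetch and the WGET variable are hypothetical conveniences of this sketch; the URLs and form fields are the placeholders used in the example above.

```shell
# WGET can be overridden (for example, WGET=echo) to print the
# arguments instead of running Wget.
WGET=${WGET:-wget}

# Hypothetical wrapper: log in, keeping session cookies in the jar,
# then fetch a page with the saved cookies.
# Usage: login_and_fetch COOKIE-FILE LOGIN-URL PAGE-URL
login_and_fetch() {
    jar=$1
    login_url=$2
    page_url=$3
    # --keep-session-cookies makes --save-cookies write session cookies.
    "$WGET" --save-cookies "$jar" --keep-session-cookies \
        --post-data 'user=foo&password=bar' \
        "$login_url" &&
    "$WGET" --load-cookies "$jar" \
        -p "$page_url"
}
```

It could then be called as login_and_fetch cookies.txt http://server.com/auth.php http://server.com/interesting/article.php.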
|
---|
1128 |
|
---|
1129 | File: wget.info, Node: HTTPS (SSL/TLS) Options, Next: FTP Options, Prev: HTTP Options, Up: Invoking
|
---|
1130 |
|
---|
1131 | 2.8 HTTPS (SSL/TLS) Options
|
---|
1132 | ===========================
|
---|
1133 |
|
---|
1134 | To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with
|
---|
1135 | an external SSL library, currently OpenSSL. If Wget is compiled
|
---|
1136 | without SSL support, none of these options are available.
|
---|
1137 |
|
---|
1138 | `--secure-protocol=PROTOCOL'
|
---|
1139 | Choose the secure protocol to be used. Legal values are `auto',
|
---|
1140 | `SSLv2', `SSLv3', and `TLSv1'. If `auto' is used, the SSL library
|
---|
1141 | is given the liberty of choosing the appropriate protocol
|
---|
1142 | automatically, which is achieved by sending an SSLv2 greeting and
|
---|
1143 | announcing support for SSLv3 and TLSv1. This is the default.
|
---|
1144 |
|
---|
1145 | Specifying `SSLv2', `SSLv3', or `TLSv1' forces the use of the
|
---|
1146 | corresponding protocol. This is useful when talking to old and
|
---|
1147 | buggy SSL server implementations that make it hard for OpenSSL to
|
---|
1148 | choose the correct protocol version. Fortunately, such servers are
|
---|
1149 | quite rare.
|
---|
1150 |
|
---|
1151 | `--no-check-certificate'
|
---|
1152 | Don't check the server certificate against the available
|
---|
1153 | certificate authorities. Also don't require the URL host name to
|
---|
1154 | match the common name presented by the certificate.
|
---|
1155 |
|
---|
1156 | As of Wget 1.10, the default is to verify the server's certificate
|
---|
1157 | against the recognized certificate authorities, breaking the SSL
|
---|
1158 | handshake and aborting the download if the verification fails.
|
---|
1159 | Although this provides more secure downloads, it does break
|
---|
1160 | interoperability with some sites that worked with previous Wget
|
---|
1161 | versions, particularly those using self-signed, expired, or
|
---|
1162 | otherwise invalid certificates. This option forces an "insecure"
|
---|
1163 | mode of operation that turns the certificate verification errors
|
---|
1164 | into warnings and allows you to proceed.
|
---|
1165 |
|
---|
1166 | If you encounter "certificate verification" errors or ones saying
|
---|
1167 | that "common name doesn't match requested host name", you can use
|
---|
1168 | this option to bypass the verification and proceed with the
|
---|
1169 | download. _Only use this option if you are otherwise convinced of
|
---|
1170 | the site's authenticity, or if you really don't care about the
|
---|
1171 | validity of its certificate._ It is almost always a bad idea not
|
---|
1172 | to check the certificates when transmitting confidential or
|
---|
1173 | important data.
|
---|
1174 |
|
---|
1175 | `--certificate=FILE'
|
---|
1176 | Use the client certificate stored in FILE. This is needed for
|
---|
1177 | servers that are configured to require certificates from the
|
---|
1178 | clients that connect to them. Normally a certificate is not
|
---|
1179 | required and this switch is optional.
|
---|
1180 |
|
---|
1181 | `--certificate-type=TYPE'
|
---|
1182 | Specify the type of the client certificate. Legal values are
|
---|
1183 | `PEM' (assumed by default) and `DER', also known as `ASN1'.
|
---|
1184 |
|
---|
1185 | `--private-key=FILE'
|
---|
1186 | Read the private key from FILE. This allows you to provide the
|
---|
1187 | private key in a file separate from the certificate.
|
---|
1188 |
|
---|
1189 | `--private-key-type=TYPE'
|
---|
1190 | Specify the type of the private key. Accepted values are `PEM'
|
---|
1191 | (the default) and `DER'.
|
---|
1192 |
|
---|
1193 | `--ca-certificate=FILE'
|
---|
1194 | Use FILE as the file with the bundle of certificate authorities
|
---|
1195 | ("CA") to verify the peers. The certificates must be in PEM
|
---|
1196 | format.
|
---|
1197 |
|
---|
1198 | Without this option Wget looks for CA certificates at the
|
---|
1199 | system-specified locations, chosen at OpenSSL installation time.
|
---|
1200 |
|
---|
1201 | `--ca-directory=DIRECTORY'
|
---|
1202 | Specifies the directory containing CA certificates in PEM format. Each
|
---|
1203 | file contains one CA certificate, and the file name is based on a
|
---|
1204 | hash value derived from the certificate. This is achieved by
|
---|
1205 | processing a certificate directory with the `c_rehash' utility
|
---|
1206 | supplied with OpenSSL. Using `--ca-directory' is more efficient
|
---|
1207 | than `--ca-certificate' when many certificates are installed
|
---|
1208 | because it allows Wget to fetch certificates on demand.
|
---|
1209 |
|
---|
1210 | Without this option Wget looks for CA certificates at the
|
---|
1211 | system-specified locations, chosen at OpenSSL installation time.
|
---|
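As an illustration of how the certificate options above can be combined, the following sketch assembles a Wget command line for a server that requires a client certificate. The command is only printed, not run. The file names, the CA directory, and the URL are hypothetical placeholders, and the CA directory is assumed to have been prepared with `c_rehash'.

```shell
# Hypothetical paths and host; substitute your own.
cert=certs/client.pem       # client certificate (PEM is the default type)
key=certs/client.key        # private key kept in a separate file
ca_dir=certs/ca             # directory assumed to be processed by c_rehash
url='https://secure.example.com/report.txt'

# Assemble the command line; it is only printed here.
set -- wget --certificate="$cert" --private-key="$key" \
    --ca-directory="$ca_dir" "$url"
echo "$@"
# To perform the download, run "$@" instead of echoing it.
```

Keeping the key in its own file means `--private-key' is needed alongside `--certificate'.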
1212 |
|
---|
1213 | `--random-file=FILE'
|
---|
1214 | Use FILE as the source of random data for seeding the
|
---|
1215 | pseudo-random number generator on systems without `/dev/random'.
|
---|
1216 |
|
---|
1217 | On such systems the SSL library needs an external source of
|
---|
1218 | randomness to initialize. Randomness may be provided by EGD (see
|
---|
1219 | `--egd-file' below) or read from an external source specified by
|
---|
1220 | the user. If this option is not specified, Wget looks for random
|
---|
1221 | data in `$RANDFILE' or, if that is unset, in `$HOME/.rnd'. If
|
---|
1222 | none of those are available, it is likely that SSL encryption will
|
---|
1223 | not be usable.
|
---|
1224 |
|
---|
1225 | If you're getting the "Could not seed OpenSSL PRNG; disabling SSL."
|
---|
1226 | error, you should provide random data using some of the methods
|
---|
1227 | described above.
|
---|
1228 |
|
---|
1229 | `--egd-file=FILE'
|
---|
1230 | Use FILE as the EGD socket. EGD stands for "Entropy Gathering
|
---|
1231 | Daemon", a user-space program that collects data from various
|
---|
1232 | unpredictable system sources and makes it available to other
|
---|
1233 | programs that might need it. Encryption software, such as the SSL
|
---|
1234 | library, needs sources of non-repeating randomness to seed the
|
---|
1235 | random number generator used to produce cryptographically strong
|
---|
1236 | keys.
|
---|
1237 |
|
---|
1238 | OpenSSL allows the user to specify his own source of entropy using
|
---|
1239 | the `RANDFILE' environment variable. If this variable is unset,
|
---|
1240 | or if the specified file does not produce enough randomness,
|
---|
1241 | OpenSSL will read random data from the EGD socket specified using this
|
---|
1242 | option.
|
---|
1243 |
|
---|
1244 | If this option is not specified (and the equivalent startup
|
---|
1245 | command is not used), EGD is never contacted. EGD is not needed
|
---|
1246 | on modern Unix systems that support `/dev/random'.
|
---|
1247 |
|
---|
1248 |
|
---|
1249 | File: wget.info, Node: FTP Options, Next: Recursive Retrieval Options, Prev: HTTPS (SSL/TLS) Options, Up: Invoking
|
---|
1250 |
|
---|
1251 | 2.9 FTP Options
|
---|
1252 | ===============
|
---|
1253 |
|
---|
1254 | `--ftp-user=USER'
|
---|
1255 | `--ftp-password=PASSWORD'
|
---|
1256 | Specify the username USER and password PASSWORD on an FTP server.
|
---|
1257 | Without this, or the corresponding startup option, the password
|
---|
1258 | defaults to `-wget@', normally used for anonymous FTP.
|
---|
1259 |
|
---|
1260 | Another way to specify username and password is in the URL itself
|
---|
1261 | (*note URL Format::). Either method reveals your password to
|
---|
1262 | anyone who bothers to run `ps'. To prevent the passwords from
|
---|
1263 | being seen, store them in `.wgetrc' or `.netrc', and make sure to
|
---|
1264 | protect those files from other users with `chmod'. If the
|
---|
1265 | passwords are really important, do not leave them lying in those
|
---|
1266 | files either--edit the files and delete them after Wget has
|
---|
1267 | started the download.
|
---|
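A minimal sketch of the advice above, assuming a `.netrc' entry of the usual `machine ... login ... password ...' form. It writes the file into a scratch directory so that nothing real is touched; in actual use the file would live in your home directory. The host name, USER and PASSWORD are placeholders.

```shell
# Work in a scratch directory so that no real file is touched.
demo_home=$(mktemp -d)
netrc="$demo_home/.netrc"

# Restrict permissions before any secret is written to the file.
umask 077
cat > "$netrc" <<'EOF'
machine ftp.example.com login USER password PASSWORD
EOF
chmod 600 "$netrc"      # readable and writable by the owner only

ls -l "$netrc"
```

Setting the umask first avoids a brief window in which the file exists with looser permissions.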
1268 |
|
---|
1269 | `--no-remove-listing'
|
---|
1270 | Don't remove the temporary `.listing' files generated by FTP
|
---|
1271 | retrievals. Normally, these files contain the raw directory
|
---|
1272 | listings received from FTP servers. Not removing them can be
|
---|
1273 | useful for debugging purposes, or when you want to be able to
|
---|
1274 | easily check on the contents of remote server directories (e.g. to
|
---|
1275 | verify that a mirror you're running is complete).
|
---|
1276 |
|
---|
1277 | Note that even though Wget writes to a known filename for this
|
---|
1278 | file, this is not a security hole in the scenario of a user making
|
---|
1279 | `.listing' a symbolic link to `/etc/passwd' or something and
|
---|
1280 | asking `root' to run Wget in his or her directory. Depending on
|
---|
1281 | the options used, either Wget will refuse to write to `.listing',
|
---|
1282 | making the globbing/recursion/time-stamping operation fail, or the
|
---|
1283 | symbolic link will be deleted and replaced with the actual
|
---|
1284 | `.listing' file, or the listing will be written to a
|
---|
1285 | `.listing.NUMBER' file.
|
---|
1286 |
|
---|
1287 | Even though this situation isn't a problem, `root' should
|
---|
1288 | never run Wget in a non-trusted user's directory. A user could do
|
---|
1289 | something as simple as linking `index.html' to `/etc/passwd' and
|
---|
1290 | asking `root' to run Wget with `-N' or `-r' so the file will be
|
---|
1291 | overwritten.
|
---|
1292 |
|
---|
1293 | `--no-glob'
|
---|
1294 | Turn off FTP globbing. Globbing refers to the use of shell-like
|
---|
1295 | special characters ("wildcards"), like `*', `?', `[' and `]' to
|
---|
1296 | retrieve more than one file from the same directory at once, like:
|
---|
1297 |
|
---|
1298 | wget ftp://gnjilux.srk.fer.hr/*.msg
|
---|
1299 |
|
---|
1300 | By default, globbing will be turned on if the URL contains a
|
---|
1301 | globbing character. This option may be used to turn globbing on
|
---|
1302 | or off permanently.
|
---|
1303 |
|
---|
1304 | You may have to quote the URL to protect it from being expanded by
|
---|
1305 | your shell. Globbing makes Wget look for a directory listing,
|
---|
1306 | which is system-specific. This is why it currently works only
|
---|
1307 | with Unix FTP servers (and the ones emulating Unix `ls' output).
|
---|
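The quoting advice above can be sketched as follows. The server name and path are hypothetical, and the commands are printed rather than run.

```shell
# Hypothetical FTP server and path.
# Single quotes keep the shell from expanding the asterisk itself.
url='ftp://ftp.example.com/pub/*.msg'

# Globbing is left to Wget, which asks the server for a listing.
set -- wget "$url"
echo "$@"

# With --no-glob, Wget would instead request a file literally named *.msg.
set -- wget --no-glob "$url"
echo "$@"
```

In both cases the asterisk reaches Wget unchanged; only the second form tells Wget not to interpret it.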
1308 |
|
---|
1309 | `--no-passive-ftp'
|
---|
1310 | Disable the use of the "passive" FTP transfer mode. Passive FTP
|
---|
1311 | mandates that the client connect to the server to establish the
|
---|
1312 | data connection rather than the other way around.
|
---|
1313 |
|
---|
1314 | If the machine is connected to the Internet directly, both passive
|
---|
1315 | and active FTP should work equally well. Behind most firewall and
|
---|
1316 | NAT configurations passive FTP has a better chance of working.
|
---|
1317 | However, in some rare firewall configurations, active FTP actually
|
---|
1318 | works when passive FTP doesn't. If you suspect this to be the
|
---|
1319 | case, use this option, or set `passive_ftp=off' in your init file.
|
---|
1320 |
|
---|
1321 | `--retr-symlinks'
|
---|
1322 | Usually, when retrieving FTP directories recursively and a symbolic
|
---|
1323 | link is encountered, the linked-to file is not downloaded.
|
---|
1324 | Instead, a matching symbolic link is created on the local
|
---|
1325 | filesystem. The pointed-to file will not be downloaded unless
|
---|
1326 | this recursive retrieval would have encountered it separately and
|
---|
1327 | downloaded it anyway.
|
---|
1328 |
|
---|
1329 | When `--retr-symlinks' is specified, however, symbolic links are
|
---|
1330 | traversed and the pointed-to files are retrieved. At this time,
|
---|
1331 | this option does not cause Wget to traverse symlinks to
|
---|
1332 | directories and recurse through them, but in the future it should
|
---|
1333 | be enhanced to do this.
|
---|
1334 |
|
---|
1335 | Note that when retrieving a file (not a directory) because it was
|
---|
1336 | specified on the command-line, rather than because it was recursed
|
---|
1337 | to, this option has no effect. Symbolic links are always
|
---|
1338 | traversed in this case.
|
---|
1339 |
|
---|
1340 | `--no-http-keep-alive'
|
---|
1341 | Turn off the "keep-alive" feature for HTTP downloads. Normally,
|
---|
1342 | Wget asks the server to keep the connection open so that, when you
|
---|
1343 | download more than one document from the same server, they get
|
---|
1344 | transferred over the same TCP connection. This saves time and at
|
---|
1345 | the same time reduces the load on the server.
|
---|
1346 |
|
---|
1347 | This option is useful when, for some reason, persistent
|
---|
1348 | (keep-alive) connections don't work for you, for example due to a
|
---|
1349 | server bug or due to the inability of server-side scripts to cope
|
---|
1350 | with the connections.
|
---|
1351 |
|
---|
1352 |
|
---|
1353 | File: wget.info, Node: Recursive Retrieval Options, Next: Recursive Accept/Reject Options, Prev: FTP Options, Up: Invoking
|
---|
1354 |
|
---|
1355 | 2.10 Recursive Retrieval Options
|
---|
1356 | ================================
|
---|
1357 |
|
---|
1358 | `-r'
|
---|
1359 | `--recursive'
|
---|
1360 | Turn on recursive retrieving. *Note Recursive Download::, for more
|
---|
1361 | details.
|
---|
1362 |
|
---|
1363 | `-l DEPTH'
|
---|
1364 | `--level=DEPTH'
|
---|
1365 | Specify the maximum recursion depth DEPTH (*note Recursive
|
---|
1366 | Download::). The default maximum depth is 5.
|
---|
1367 |
|
---|
1368 | `--delete-after'
|
---|
1369 | This option tells Wget to delete every single file it downloads,
|
---|
1370 | _after_ having done so. It is useful for pre-fetching popular
|
---|
1371 | pages through a proxy, e.g.:
|
---|
1372 |
|
---|
1373 | wget -r -nd --delete-after http://whatever.com/~popular/page/
|
---|
1374 |
|
---|
1375 | The `-r' option is to retrieve recursively, and `-nd' to not
|
---|
1376 | create directories.
|
---|
1377 |
|
---|
1378 | Note that `--delete-after' deletes files on the local machine. It
|
---|
1379 | does not issue the `DELE' command to remote FTP sites, for
|
---|
1380 | instance. Also note that when `--delete-after' is specified,
|
---|
1381 | `--convert-links' is ignored, so `.orig' files are simply not
|
---|
1382 | created in the first place.
|
---|
1383 |
|
---|
1384 | `-k'
|
---|
1385 | `--convert-links'
|
---|
1386 | After the download is complete, convert the links in the document
|
---|
1387 | to make them suitable for local viewing. This affects not only
|
---|
1388 | the visible hyperlinks, but any part of the document that links to
|
---|
1389 | external content, such as embedded images, links to style sheets,
|
---|
1390 | hyperlinks to non-HTML content, etc.
|
---|
1391 |
|
---|
1392 | Each link will be changed in one of the two ways:
|
---|
1393 |
|
---|
1394 | * The links to files that have been downloaded by Wget will be
|
---|
1395 | changed to refer to the file they point to as a relative link.
|
---|
1396 |
|
---|
1397 | Example: if the downloaded file `/foo/doc.html' links to
|
---|
1398 | `/bar/img.gif', also downloaded, then the link in `doc.html'
|
---|
1399 | will be modified to point to `../bar/img.gif'. This kind of
|
---|
1400 | transformation works reliably for arbitrary combinations of
|
---|
1401 | directories.
|
---|
1402 |
|
---|
1403 | * The links to files that have not been downloaded by Wget will
|
---|
1404 | be changed to include host name and absolute path of the
|
---|
1405 | location they point to.
|
---|
1406 |
|
---|
1407 | Example: if the downloaded file `/foo/doc.html' links to
|
---|
1408 | `/bar/img.gif' (or to `../bar/img.gif'), then the link in
|
---|
1409 | `doc.html' will be modified to point to
|
---|
1410 | `http://HOSTNAME/bar/img.gif'.
|
---|
1411 |
|
---|
1412 | Because of this, local browsing works reliably: if a linked file
|
---|
1413 | was downloaded, the link will refer to its local name; if it was
|
---|
1414 | not downloaded, the link will refer to its full Internet address
|
---|
1415 | rather than presenting a broken link. The fact that the former
|
---|
1416 | links are converted to relative links ensures that you can move
|
---|
1417 | the downloaded hierarchy to another directory.
|
---|
1418 |
|
---|
1419 | Note that only at the end of the download can Wget know which
|
---|
1420 | links have been downloaded. Because of that, the work done by
|
---|
1421 | `-k' will be performed at the end of all the downloads.
|
---|
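The two conversion rules described for `-k' can be modelled with a short shell sketch. This is an illustration of the rules as stated above, not Wget's actual implementation; the function names and the host `www.example.com' are made up for the example.

```shell
# relative_link FROM TO: print a relative link from the file FROM to the
# file TO.  Both are absolute paths on the same site.
relative_link() {
    from=${1%/*}/       # directory of the linking file, e.g. /foo/
    to=$2
    from=${from#/}
    to=${to#/}
    # Drop the leading directories that both paths share.
    while [ -n "$from" ]; do
        case $to in
            */*) ;;
            *) break ;;
        esac
        [ "${from%%/*}" = "${to%%/*}" ] || break
        from=${from#*/}
        to=${to#*/}
    done
    # Climb out of each directory that remains on the linking side.
    up=
    while [ -n "$from" ]; do
        up="../$up"
        from=${from#*/}
    done
    echo "$up$to"
}

# convert_link FROM TO DOWNLOADED HOST: apply one of the two rules.
convert_link() {
    if [ "$3" = yes ]; then
        relative_link "$1" "$2"     # target was downloaded: relative link
    else
        echo "http://$4$2"          # target was not: full address
    fi
}

convert_link /foo/doc.html /bar/img.gif yes www.example.com
convert_link /foo/doc.html /bar/img.gif no  www.example.com
```

The first call prints `../bar/img.gif' and the second prints `http://www.example.com/bar/img.gif', matching the two examples given for `-k'.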
1422 |
|
---|
1423 | `-K'
|
---|
1424 | `--backup-converted'
|
---|
1425 | When converting a file, back up the original version with a `.orig'
|
---|
1426 | suffix. Affects the behavior of `-N' (*note HTTP Time-Stamping
|
---|
1427 | Internals::).
|
---|
1428 |
|
---|
1429 | `-m'
|
---|
1430 | `--mirror'
|
---|
1431 | Turn on options suitable for mirroring. This option turns on
|
---|
1432 | recursion and time-stamping, sets infinite recursion depth and
|
---|
1433 | keeps FTP directory listings. It is currently equivalent to `-r
|
---|
1434 | -N -l inf --no-remove-listing'.
|
---|
1435 |
|
---|
1436 | `-p'
|
---|
1437 | `--page-requisites'
|
---|
1438 | This option causes Wget to download all the files that are
|
---|
1439 | necessary to properly display a given HTML page. This includes
|
---|
1440 | such things as inlined images, sounds, and referenced stylesheets.
|
---|
1441 |
|
---|
1442 | Ordinarily, when downloading a single HTML page, any requisite
|
---|
1443 | documents that may be needed to display it properly are not
|
---|
1444 | downloaded. Using `-r' together with `-l' can help, but since
|
---|
1445 | Wget does not ordinarily distinguish between external and inlined
|
---|
1446 | documents, one is generally left with "leaf documents" that are
|
---|
1447 | missing their requisites.
|
---|
1448 |
|
---|
1449 | For instance, say document `1.html' contains an `<IMG>' tag
|
---|
1450 | referencing `1.gif' and an `<A>' tag pointing to external document
|
---|
1451 | `2.html'. Say that `2.html' is similar but that its image is
|
---|
1452 | `2.gif' and it links to `3.html'. Say this continues up to some
|
---|
1453 | arbitrarily high number.
|
---|
1454 |
|
---|
1455 | If one executes the command:
|
---|
1456 |
|
---|
1457 | wget -r -l 2 http://SITE/1.html
|
---|
1458 |
|
---|
1459 | then `1.html', `1.gif', `2.html', `2.gif', and `3.html' will be
|
---|
1460 | downloaded. As you can see, `3.html' is without its requisite
|
---|
1461 | `3.gif' because Wget is simply counting the number of hops (up to
|
---|
1462 | 2) away from `1.html' in order to determine where to stop the
|
---|
1463 | recursion. However, with this command:
|
---|
1464 |
|
---|
1465 | wget -r -l 2 -p http://SITE/1.html
|
---|
1466 |
|
---|
1467 | all the above files _and_ `3.html''s requisite `3.gif' will be
|
---|
1468 | downloaded. Similarly,
|
---|
1469 |
|
---|
1470 | wget -r -l 1 -p http://SITE/1.html
|
---|
1471 |
|
---|
1472 | will cause `1.html', `1.gif', `2.html', and `2.gif' to be
|
---|
1473 | downloaded. One might think that:
|
---|
1474 |
|
---|
1475 | wget -r -l 0 -p http://SITE/1.html
|
---|
1476 |
|
---|
1477 | would download just `1.html' and `1.gif', but unfortunately this
|
---|
1478 | is not the case, because `-l 0' is equivalent to `-l inf'--that
|
---|
1479 | is, infinite recursion. To download a single HTML page (or a
|
---|
1480 | handful of them, all specified on the command-line or in a `-i'
|
---|
1481 | URL input file) and its (or their) requisites, simply leave off
|
---|
1482 | `-r' and `-l':
|
---|
1483 |
|
---|
1484 | wget -p http://SITE/1.html
|
---|
1485 |
|
---|
1486 | Note that Wget will behave as if `-r' had been specified, but only
|
---|
1487 | that single page and its requisites will be downloaded. Links
|
---|
1488 | from that page to external documents will not be followed.
|
---|
1489 | Actually, to download a single page and all its requisites (even
|
---|
1490 | if they exist on separate websites), and make sure the lot
|
---|
1491 | displays properly locally, this author likes to use a few options
|
---|
1492 | in addition to `-p':
|
---|
1493 |
|
---|
1494 | wget -E -H -k -K -p http://SITE/DOCUMENT
|
---|
1495 |
|
---|
1496 | To finish off this topic, it's worth knowing that Wget's idea of an
|
---|
1497 | external document link is any URL specified in an `<A>' tag, an
|
---|
1498 | `<AREA>' tag, or a `<LINK>' tag other than `<LINK
|
---|
1499 | REL="stylesheet">'.
|
---|
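The hop counting in the `1.html', `2.html', ... example above can be reproduced with a toy model. The sketch below does not contact any server. It only assumes the chain of pages described in the text, where page N carries image `N.gif' and links to page N+1, and lists what would be fetched for a given `-l' depth with and without `-p'. The function name is invented for the illustration.

```shell
# list_files DEPTH REQUISITES: list the files fetched from the chain of
# pages for `-r -l DEPTH'; REQUISITES is 1 when -p is also given.
list_files() {
    depth=$1
    requisites=$2
    n=1
    out=
    # Page n.html is n-1 hops from 1.html; its image n.gif is n hops away.
    while [ $((n - 1)) -le "$depth" ]; do
        out="$out $n.html"
        if [ "$n" -le "$depth" ] || [ "$requisites" -eq 1 ]; then
            out="$out $n.gif"
        fi
        n=$((n + 1))
    done
    echo "${out# }"
}

list_files 2 0      # wget -r -l 2
list_files 2 1      # wget -r -l 2 -p
list_files 1 1      # wget -r -l 1 -p
```

Only the last page reached differs between the first two runs: without `-p' its image lies one hop beyond the limit and is left out.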
1500 |
|
---|
1501 | `--strict-comments'
|
---|
1502 | Turn on strict parsing of HTML comments. The default is to
|
---|
1503 | terminate comments at the first occurrence of `-->'.
|
---|
1504 |
|
---|
1505 | According to specifications, HTML comments are expressed as SGML
|
---|
1506 | "declarations". Declaration is special markup that begins with
|
---|
1507 | `<!' and ends with `>', such as `<!DOCTYPE ...>', that may contain
|
---|
1508 | comments between a pair of `--' delimiters. HTML comments are
|
---|
1509 | "empty declarations", SGML declarations without any non-comment
|
---|
1510 | text. Therefore, `<!--foo-->' is a valid comment, and so is
|
---|
1511 | `<!--one-- --two-->', but `<!--1--2-->' is not.
|
---|
1512 |
|
---|
1513 | On the other hand, most HTML writers don't perceive comments as
|
---|
1514 | anything other than text delimited with `<!--' and `-->', which is
|
---|
1515 | not quite the same. For example, something like `<!------------>'
|
---|
1516 | works as a valid comment as long as the number of dashes is a
|
---|
1517 | multiple of four (!). If not, the comment technically lasts until
|
---|
1518 | the next `--', which may be at the other end of the document.
|
---|
1519 | Because of this, many popular browsers completely ignore the
|
---|
1520 | specification and implement what users have come to expect:
|
---|
1521 | comments delimited with `<!--' and `-->'.
|
---|
1522 |
|
---|
1523 | Until version 1.9, Wget interpreted comments strictly, which
|
---|
1524 | resulted in missing links in many web pages that displayed fine in
|
---|
1525 | browsers, but had the misfortune of containing non-compliant
|
---|
1526 | comments. Beginning with version 1.9, Wget has joined the ranks
|
---|
1527 | of clients that implement "naive" comments, terminating each
|
---|
1528 | comment at the first occurrence of `-->'.
|
---|
1529 |
|
---|
1530 | If, for whatever reason, you want strict comment parsing, use this
|
---|
1531 | option to turn it on.
|
---|
1532 |
|
---|
1533 |
|
---|
1534 | File: wget.info, Node: Recursive Accept/Reject Options, Prev: Recursive Retrieval Options, Up: Invoking
|
---|
1535 |
|
---|
1536 | 2.11 Recursive Accept/Reject Options
|
---|
1537 | ====================================
|
---|
1538 |
|
---|
1539 | `-A ACCLIST --accept ACCLIST'
|
---|
1540 | `-R REJLIST --reject REJLIST'
|
---|
1541 | Specify comma-separated lists of file name suffixes or patterns to
|
---|
1542 | accept or reject (*note Types of Files:: for more details).
|
---|
1543 |
|
---|
1544 | `-D DOMAIN-LIST'
|
---|
1545 | `--domains=DOMAIN-LIST'
|
---|
1546 | Set domains to be followed. DOMAIN-LIST is a comma-separated list
|
---|
1547 | of domains. Note that it does _not_ turn on `-H'.
|
---|
1548 |
|
---|
1549 | `--exclude-domains DOMAIN-LIST'
|
---|
1550 | Specify the domains that are _not_ to be followed. (*note
|
---|
1551 | Spanning Hosts::).
|
---|
1552 |
|
---|
1553 | `--follow-ftp'
|
---|
1554 | Follow FTP links from HTML documents. Without this option, Wget
|
---|
1555 | will ignore all the FTP links.
|
---|
1556 |
|
---|
1557 | `--follow-tags=LIST'
|
---|
1558 | Wget has an internal table of HTML tag / attribute pairs that it
|
---|
1559 | considers when looking for linked documents during a recursive
|
---|
1560 | retrieval. If a user wants only a subset of those tags to be
|
---|
1561 | considered, however, he or she should specify such tags in a
|
---|
1562 | comma-separated LIST with this option.
|
---|
1563 |
|
---|
1564 | `--ignore-tags=LIST'
|
---|
1565 | This is the opposite of the `--follow-tags' option. To skip
|
---|
1566 | certain HTML tags when recursively looking for documents to
|
---|
1567 | download, specify them in a comma-separated LIST.
|
---|
1568 |
|
---|
1569 | In the past, this option was the best bet for downloading a single
|
---|
1570 | page and its requisites, using a command-line like:
|
---|
1571 |
|
---|
1572 | wget --ignore-tags=a,area -H -k -K -r http://SITE/DOCUMENT
|
---|
1573 |
|
---|
1574 | However, the author of this option came across a page with tags
|
---|
1575 | like `<LINK REL="home" HREF="/">' and came to the realization that
|
---|
1576 | specifying tags to ignore was not enough. One can't just tell
|
---|
1577 | Wget to ignore `<LINK>', because then stylesheets will not be
|
---|
1578 | downloaded. Now the best bet for downloading a single page and
|
---|
1579 | its requisites is the dedicated `--page-requisites' option.
|
---|
1580 |
|
---|
1581 | `-H'
|
---|
1582 | `--span-hosts'
|
---|
1583 | Enable spanning across hosts when doing recursive retrieving
|
---|
1584 | (*note Spanning Hosts::).
|
---|
1585 |
|
---|
1586 | `-L'
|
---|
1587 | `--relative'
|
---|
1588 | Follow relative links only. Useful for retrieving a specific home
|
---|
1589 | page without any distractions, not even those from the same hosts
|
---|
1590 | (*note Relative Links::).
|
---|
1591 |
|
---|
1592 | `-I LIST'
|
---|
1593 | `--include-directories=LIST'
|
---|
1594 | Specify a comma-separated list of directories you wish to follow
|
---|
1595 | when downloading (*note Directory-Based Limits:: for more
|
---|
1596 | details). Elements of LIST may contain wildcards.
|
---|
1597 |
|
---|
1598 | `-X LIST'
|
---|
1599 | `--exclude-directories=LIST'
|
---|
1600 | Specify a comma-separated list of directories you wish to exclude
|
---|
1601 | from download (*note Directory-Based Limits:: for more details).
|
---|
1602 | Elements of LIST may contain wildcards.
|
---|
1603 |
|
---|
1604 | `-np'
|
---|
1605 |
|
---|
1606 | `--no-parent'
|
---|
1607 | Do not ever ascend to the parent directory when retrieving
|
---|
1608 | recursively. This is a useful option, since it guarantees that
|
---|
1609 | only the files _below_ a certain hierarchy will be downloaded.
|
---|
1610 | *Note Directory-Based Limits::, for more details.
|
---|
1611 |
|
---|
1612 |
|
---|
1613 | File: wget.info, Node: Recursive Download, Next: Following Links, Prev: Invoking, Up: Top
|
---|
1614 |
|
---|
1615 | 3 Recursive Download
|
---|
1616 | ********************
|
---|
1617 |
|
---|
1618 | GNU Wget is capable of traversing parts of the Web (or a single HTTP or
|
---|
1619 | FTP server), following links and directory structure. We refer to this
|
---|
1620 | as "recursive retrieval", or "recursion".
|
---|
1621 |
|
---|
1622 | With HTTP URLs, Wget retrieves and parses the HTML from the given
|
---|
1623 | URL, retrieving the files the HTML document was referring
|
---|
1624 | to, through markup like `href', or `src'. If the freshly downloaded
|
---|
1625 | file is also of type `text/html' or `application/xhtml+xml', it will be
|
---|
1626 | parsed and followed further.
|
---|
1627 |
|
---|
1628 | Recursive retrieval of HTTP and HTML content is "breadth-first".
|
---|
1629 | This means that Wget first downloads the requested HTML document, then
|
---|
1630 | the documents linked from that document, then the documents linked by
|
---|
1631 | them, and so on. In other words, Wget first downloads the documents at
|
---|
1632 | depth 1, then those at depth 2, and so on until the specified maximum
|
---|
1633 | depth.
|
---|
1634 |
|
---|
1635 | The maximum "depth" to which the retrieval may descend is specified
|
---|
1636 | with the `-l' option. The default maximum depth is five layers.
|
---|
1637 |
|
---|
1638 | When retrieving an FTP URL recursively, Wget will retrieve all the
|
---|
1639 | data from the given directory tree (including the subdirectories up to
|
---|
1640 | the specified depth) on the remote server, creating its mirror image
|
---|
1641 | locally. FTP retrieval is also limited by the `depth' parameter.
|
---|
1642 | Unlike HTTP recursion, FTP recursion is performed depth-first.
|
---|
1643 |
|
---|
1644 | By default, Wget will create a local directory tree, corresponding to
|
---|
1645 | the one found on the remote server.
|
---|
1646 |
|
---|
1647 | Recursive retrieving can find a number of applications, the most
|
---|
1648 | important of which is mirroring. It is also useful for WWW
|
---|
1649 | presentations, and any other opportunities where slow network
|
---|
1650 | connections should be bypassed by storing the files locally.
|
---|
1651 |
|
---|
1652 | You should be warned that recursive downloads can overload the remote
|
---|
1653 | servers. Because of that, many administrators frown upon them and may
|
---|
1654 | ban access from your site if they detect very fast downloads of big
|
---|
1655 | amounts of content. When downloading from Internet servers, consider
|
---|
1656 | using the `-w' option to introduce a delay between accesses to the
|
---|
1657 | server. The download will take a while longer, but the server
|
---|
1658 | administrator will not be alarmed by your rudeness.
|
---|
1659 |
|
---|
1660 | Of course, recursive download may cause problems on your machine. If
|
---|
1661 | left to run unchecked, it can easily fill up the disk. If downloading
|
---|
1662 | from a local network, it can also take bandwidth on the system, as well as
|
---|
1663 | consume memory and CPU.
|
---|
1664 |
|
---|
1665 | Try to specify the criteria that match the kind of download you are
|
---|
1666 | trying to achieve. If you want to download only one page, use
|
---|
1667 | `--page-requisites' without any additional recursion. If you want to
|
---|
1668 | download things under one directory, use `-np' to avoid downloading
|
---|
1669 | things from other directories. If you want to download all the files
|
---|
1670 | from one directory, use `-l 1' to make sure the recursion depth never
|
---|
1671 | exceeds one. *Note Following Links::, for more information about this.
|
---|
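As a sketch of the advice above, the following assembles a restrained recursive download: it stays below the starting directory, stops after one level, and waits between requests. The URL and the two-second delay are arbitrary choices for the example, and the command is printed rather than run.

```shell
# Hypothetical server and directory.
url='http://www.example.com/docs/'

# -r     recurse
# -np    never ascend to the parent directory
# -l 1   stop after one level of recursion
# -w 2   wait two seconds between retrievals
set -- wget -r -np -l 1 -w 2 "$url"
echo "$@"
# To perform the download, run "$@" instead of echoing it.
```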
1672 |
|
---|
1673 | Recursive retrieval should be used with care. Don't say you were not
|
---|
1674 | warned.
|
---|
1675 |
|
---|
1676 |
|
---|
1677 | File: wget.info, Node: Following Links, Next: Time-Stamping, Prev: Recursive Download, Up: Top
|
---|
1678 |
|
---|
1679 | 4 Following Links
|
---|
1680 | *****************
|
---|
1681 |
|
---|
1682 | When retrieving recursively, one does not wish to retrieve loads of
|
---|
1683 | unnecessary data. Most of the time the users bear in mind exactly what
|
---|
1684 | they want to download, and want Wget to follow only specific links.
|
---|
1685 |
|
---|
1686 | For example, if you wish to download the music archive from
|
---|
1687 | `fly.srk.fer.hr', you will not want to download all the home pages that
|
---|
1688 | happen to be referenced by an obscure part of the archive.
|
---|
1689 |
|
---|
1690 | Wget possesses several mechanisms that allow you to fine-tune which
|
---|
1691 | links it will follow.
|
---|
1692 |
|
---|
1693 | * Menu:
|
---|
1694 |
|
---|
1695 | * Spanning Hosts:: (Un)limiting retrieval based on host name.
|
---|
1696 | * Types of Files:: Getting only certain files.
|
---|
1697 | * Directory-Based Limits:: Getting only certain directories.
|
---|
1698 | * Relative Links:: Follow relative links only.
|
---|
1699 | * FTP Links:: Following FTP links.
|
---|
1700 |
|
---|
1701 |
|
---|
1702 | File: wget.info, Node: Spanning Hosts, Next: Types of Files, Up: Following Links
|
---|
1703 |
|
---|
1704 | 4.1 Spanning Hosts
|
---|
1705 | ==================
|
---|
1706 |
|
---|
1707 | Wget's recursive retrieval normally refuses to visit hosts different
|
---|
1708 | than the one you specified on the command line. This is a reasonable
|
---|
1709 | default; without it, every retrieval would have the potential to turn
|
---|
1710 | your Wget into a small version of Google.
|
---|
1711 |
|
---|
1712 | However, visiting different hosts, or "host spanning," is sometimes
|
---|
1713 | a useful option. Maybe the images are served from a different server.
|
---|
1714 | Maybe you're mirroring a site that consists of pages interlinked between
|
---|
1715 | three servers. Maybe the server has two equivalent names, and the HTML
|
---|
1716 | pages refer to both interchangeably.
|
---|
1717 |
|
---|
1718 | Span to any host--`-H'
|
---|
1719 | The `-H' option turns on host spanning, thus allowing Wget's
|
---|
1720 | recursive run to visit any host referenced by a link. Unless
|
---|
1721 | sufficient recursion-limiting criteria are applied, these
|
---|
1722 | foreign hosts will typically link to yet more hosts, and so on
|
---|
1723 | until Wget ends up sucking up much more data than you have
|
---|
1724 | intended.
|
---|
1725 |
|
---|
1726 | Limit spanning to certain domains--`-D'
|
---|
1727 | The `-D' option allows you to specify the domains that will be
|
---|
1728 | followed, thus limiting the recursion only to the hosts that
|
---|
1729 | belong to these domains. Obviously, this makes sense only in
|
---|
1730 | conjunction with `-H'. A typical example would be downloading the
|
---|
1731 | contents of `www.server.com', but allowing downloads from
|
---|
1732 | `images.server.com', etc.:
|
---|
1733 |
|
---|
1734 | wget -rH -Dserver.com http://www.server.com/
|
---|
1735 |
|
---|
1736 | You can specify more than one address by separating them with a
|
---|
1737 | comma, e.g. `-Ddomain1.com,domain2.com'.
|
---|
1738 |
|
---|
1739 | Keep download off certain domains--`--exclude-domains'
|
---|
1740 | If there are domains you want to exclude specifically, you can do
|
---|
1741 | it with `--exclude-domains', which accepts the same type of
|
---|
1742 | arguments as `-D', but will _exclude_ all the listed domains. For
|
---|
1743 | example, if you want to download all the hosts from the `foo.edu'
|
---|
1744 | domain, with the exception of `sunsite.foo.edu', you can do it like
|
---|
1745 | this:
|
---|
1746 |
|
---|
1747 | wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
|
---|
1748 | http://www.foo.edu/
|
---|
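The combined effect of `-D' and `--exclude-domains' in the `foo.edu' example can be modelled as below. This sketch assumes that `-H' and `-D' are both given and that a host belongs to a domain when its name ends with that domain. It is a simplification for illustration, not Wget's own matching code, and the function names are invented.

```shell
# domain_match HOST LIST: succeed if HOST ends with any element of the
# comma-separated LIST.
domain_match() {
    old_ifs=$IFS
    IFS=,
    matched=1
    for domain in $2; do
        case $1 in *"$domain") matched=0 ;; esac
    done
    IFS=$old_ifs
    return $matched
}

# host_allowed HOST DOMAINS EXCLUDED: the host must fall in DOMAINS and
# must not fall in EXCLUDED.
host_allowed() {
    domain_match "$1" "$2" && ! domain_match "$1" "$3"
}

for host in www.foo.edu sunsite.foo.edu www.bar.org; do
    if host_allowed "$host" foo.edu sunsite.foo.edu; then
        echo "follow $host"
    else
        echo "skip   $host"
    fi
done
```

Under these assumptions only `www.foo.edu' is followed: `sunsite.foo.edu' is excluded explicitly and `www.bar.org' lies outside the listed domain.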
1749 |
|
---|
1750 |
|
---|
1751 |
|
---|
1752 | File: wget.info, Node: Types of Files, Next: Directory-Based Limits, Prev: Spanning Hosts, Up: Following Links
|
---|
1753 |
|
---|
1754 | 4.2 Types of Files
|
---|
1755 | ==================
|
---|
1756 |
|
---|
1757 | When downloading material from the web, you will often want to restrict
|
---|
1758 | the retrieval to only certain file types. For example, if you are
|
---|
1759 | interested in downloading GIFs, you will not be overjoyed to get loads
|
---|
1760 | of PostScript documents, and vice versa.
|
---|
1761 |
|
---|
1762 | Wget offers two options to deal with this problem. Each option
|
---|
1763 | description lists a short name, a long name, and the equivalent command
|
---|
1764 | in `.wgetrc'.
|
---|
1765 |
|
---|
1766 | `-A ACCLIST'
|
---|
1767 | `--accept ACCLIST'
|
---|
1768 | `accept = ACCLIST'
|
---|
1769 | The argument to `--accept' option is a list of file suffixes or
|
---|
1770 | patterns that Wget will download during recursive retrieval. A
|
---|
1771 | suffix is the ending part of a file name, and consists of "normal"
|
---|
1772 | letters, e.g. `gif' or `.jpg'. A matching pattern contains
|
---|
1773 | shell-like wildcards, e.g. `books*' or `zelazny*196[0-9]*'.
|
---|
1774 |
|
---|
1775 | So, specifying `wget -A gif,jpg' will make Wget download only the
|
---|
1776 | files ending with `gif' or `jpg', i.e. GIFs and JPEGs. On the
|
---|
1777 | other hand, `wget -A "zelazny*196[0-9]*"' will download only files
|
---|
1778 | beginning with `zelazny' and containing numbers from 1960 to 1969
|
---|
1779 | anywhere within. Look up the manual of your shell for a
|
---|
1780 | description of how pattern matching works.
|
---|
1781 |
|
---|
1782 | Of course, any number of suffixes and patterns can be combined
|
---|
1783 | into a comma-separated list, and given as an argument to `-A'.
|
---|
1784 |
|
---|
1785 | `-R REJLIST'
|
---|
1786 | `--reject REJLIST'
|
---|
1787 | `reject = REJLIST'
|
---|
1788 | The `--reject' option works the same way as `--accept', only its
|
---|
1789 | logic is the reverse; Wget will download all files _except_ the
|
---|
1790 | ones matching the suffixes (or patterns) in the list.
|
---|
1791 |
|
---|
1792 | So, if you want to download a whole page except for the cumbersome
|
---|
1793 | MPEGs and .AU files, you can use `wget -R mpg,mpeg,au'.
|
---|
1794 | Analogously, to download all files except the ones beginning with
|
---|
1795 | `bjork', use `wget -R "bjork*"'. The quotes are to prevent
|
---|
1796 | expansion by the shell.
|
---|
1797 |
|
---|
1798 | The `-A' and `-R' options may be combined to achieve even better
|
---|
1799 | fine-tuning of which files to retrieve. E.g. `wget -A "*zelazny*" -R
|
---|
1800 | .ps' will download all the files having `zelazny' as a part of their
|
---|
1801 | name, but _not_ the PostScript files.
|
---|
1802 |
|
---|
1803 | Note that these two options do not affect the downloading of HTML
|
---|
1804 | files; Wget must load all the HTMLs to know where to go at
|
---|
1805 | all--recursive retrieval would make no sense otherwise.
|
---|
1806 |
|
---|
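The accept/reject rules described above can be approximated with ordinary shell pattern matching. The following is only a sketch under stated assumptions, not Wget's actual code: the function names `matches_list' and `would_download' are hypothetical, a list element is taken to be a pattern if it contains `*', `?' or `[', and anything else is taken to be a suffix. It does not model the special treatment of HTML files.

```shell
# matches_list NAME LIST  (hypothetical helper)
# Succeeds if NAME matches any element of the comma-separated LIST.
# Assumption: an element containing *, ? or [ is a shell-style pattern;
# any other element is a suffix that must match the end of NAME.
matches_list() {
    ml_name=$1
    ml_saved_ifs=$IFS
    IFS=,
    set -f                      # do not expand patterns against local files
    for ml_item in $2; do
        case $ml_item in
            *\**|*\?*|*\[*) ml_pattern=$ml_item ;;      # wildcard pattern
            *)              ml_pattern="*$ml_item" ;;   # suffix: match the tail
        esac
        case $ml_name in
            $ml_pattern) IFS=$ml_saved_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$ml_saved_ifs
    set +f
    return 1
}

# would_download NAME ACCLIST REJLIST  (hypothetical helper)
# Succeeds if NAME passes the accept list (when one is given) and is
# not caught by the reject list.
would_download() {
    if [ -n "$2" ] && ! matches_list "$1" "$2"; then
        return 1    # an accept list is given and NAME is not on it
    fi
    if [ -n "$3" ] && matches_list "$1" "$3"; then
        return 1    # NAME is on the reject list
    fi
    return 0
}

# Mirrors `wget -A "*zelazny*" -R .ps' from the text.
would_download "zelazny-1965.gif" "*zelazny*" ".ps" && echo "download"
would_download "zelazny-1965.ps"  "*zelazny*" ".ps" || echo "skip"
```

The lists are passed in quotes for the same reason as on the Wget command line: to keep the shell from expanding the wildcards itself.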
1807 |
|
---|
1808 | File: wget.info, Node: Directory-Based Limits, Next: Relative Links, Prev: Types of Files, Up: Following Links
|
---|
1809 |
|
---|
1810 | 4.3 Directory-Based Limits
|
---|
1811 | ==========================
|
---|
1812 |
|
---|
1813 | Regardless of other link-following facilities, it is often useful to
|
---|
1814 | place the restriction of what files to retrieve based on the directories
|
---|
1815 | those files are placed in. There can be many reasons for this--the
|
---|
1816 | home pages may be organized in a reasonable directory structure; or some
|
---|
1817 | directories may contain useless information, e.g. `/cgi-bin' or `/dev'
|
---|
1818 | directories.
|
---|
1819 |
|
---|
1820 | Wget offers three different options to deal with this requirement.
|
---|
1821 | Each option description lists a short name, a long name, and the
|
---|
1822 | equivalent command in `.wgetrc'.
|
---|
1823 |
|
---|
1824 | `-I LIST'
|
---|
1825 | `--include LIST'
|
---|
1826 | `include_directories = LIST'
|
---|
1827 | The `-I' option accepts a comma-separated list of directories included
|
---|
1828 | in the retrieval. Any other directories will simply be ignored.
|
---|
1829 | The directories are absolute paths.
|
---|
1830 |
|
---|
1831 | So, if you wish to download from `http://host/people/bozo/'
|
---|
1832 | following only links to bozo's colleagues in the `/people'
|
---|
1833 | directory and the bogus scripts in `/cgi-bin', you can specify:
|
---|
1834 |
|
---|
1835 | wget -I /people,/cgi-bin http://host/people/bozo/
|
---|
1836 |
|
---|
1837 | `-X LIST'
|
---|
1838 | `--exclude LIST'
|
---|
1839 | `exclude_directories = LIST'
|
---|
1840 | The `-X' option is exactly the reverse of `-I'--this is a list of
|
---|
1841 | directories _excluded_ from the download. E.g. if you do not want
|
---|
1842 | Wget to download things from the `/cgi-bin' directory, specify `-X
|
---|
1843 | /cgi-bin' on the command line.
|
---|
1844 |
|
---|
1845 | The same as with `-A'/`-R', these two options can be combined to
|
---|
1846 | get a better fine-tuning of downloading subdirectories. E.g. if
|
---|
1847 | you want to load all the files from the `/pub' hierarchy except for
|
---|
1848 | `/pub/worthless', specify `-I/pub -X/pub/worthless'.
|
---|
1849 |
|
---|
1850 | `-np'
|
---|
1851 | `--no-parent'
|
---|
1852 | `no_parent = on'
|
---|
1853 | The simplest, and often very useful way of limiting directories is
|
---|
1854 | disallowing retrieval of the links that refer to the hierarchy
|
---|
1855 | "above" the beginning directory, i.e. disallowing ascent to
|
---|
1856 | the parent directory/directories.
|
---|
1857 |
|
---|
1858 | The `--no-parent' option (short `-np') is useful in this case.
|
---|
1859 | Using it guarantees that you will never leave the existing
|
---|
1860 | hierarchy. Supposing you issue Wget with:
|
---|
1861 |
|
---|
1862 | wget -r --no-parent http://somehost/~luzer/my-archive/
|
---|
1863 |
|
---|
1864 | You may rest assured that none of the references to
|
---|
1865 | `/~his-girls-homepage/' or `/~luzer/all-my-mpegs/' will be
|
---|
1866 | followed. Only the archive you are interested in will be
|
---|
1867 | downloaded. Essentially, `--no-parent' is similar to
|
---|
1868 | `-I/~luzer/my-archive', only it handles redirections in a more
|
---|
1869 | intelligent fashion.
|
---|
1870 |
|
---|
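The way `-I' and `-X' combine can be pictured as two prefix tests on the directory part of a URL. The sketch below is an illustration under stated assumptions rather than Wget's implementation: the names `under_any' and `dir_allowed' are hypothetical, directories are compared as plain absolute paths, and wildcards in the lists are not handled.

```shell
# under_any DIR LIST  (hypothetical helper)
# Succeeds if DIR equals, or lies below, any directory in the
# comma-separated LIST of absolute paths.
under_any() {
    ua_saved_ifs=$IFS
    IFS=,
    set -f
    for ua_prefix in $2; do
        case $1/ in
            "${ua_prefix%/}"/*) IFS=$ua_saved_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$ua_saved_ifs
    set +f
    return 1
}

# dir_allowed DIR INCLUDE_LIST EXCLUDE_LIST  (hypothetical helper)
dir_allowed() {
    if [ -n "$2" ] && ! under_any "$1" "$2"; then
        return 1    # an include list is given and DIR is outside it
    fi
    if [ -n "$3" ] && under_any "$1" "$3"; then
        return 1    # DIR is inside an excluded directory
    fi
    return 0
}

# Mirrors `-I/pub -X/pub/worthless' from the text.
dir_allowed /pub/gnu            /pub /pub/worthless && echo "retrieve /pub/gnu"
dir_allowed /pub/worthless/junk /pub /pub/worthless || echo "skip /pub/worthless/junk"
```

Comparing against `DIR/' rather than `DIR' keeps `/pub' from matching an unrelated directory such as `/publications'. Under the same assumptions, `--no-parent' behaves roughly like an include list holding only the starting directory.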
1871 |
|
---|
1872 | File: wget.info, Node: Relative Links, Next: FTP Links, Prev: Directory-Based Limits, Up: Following Links
|
---|
1873 |
|
---|
1874 | 4.4 Relative Links
|
---|
1875 | ==================
|
---|
1876 |
|
---|
1877 | When `-L' is turned on, only the relative links are ever followed.
|
---|
1878 | Relative links are here defined as those that do not refer to the web
|
---|
1879 | server root. For example, these links are relative:
|
---|
1880 |
|
---|
1881 | <a href="foo.gif">
|
---|
1882 | <a href="foo/bar.gif">
|
---|
1883 | <a href="../foo/bar.gif">
|
---|
1884 |
|
---|
1885 | These links are not relative:
|
---|
1886 |
|
---|
1887 | <a href="/foo.gif">
|
---|
1888 | <a href="/foo/bar.gif">
|
---|
1889 | <a href="http://www.server.com/foo/bar.gif">
|
---|
1890 |
|
---|
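The distinction drawn above can be expressed as a small shell test. This is a sketch of the definition as stated here, not of Wget's link parser; the function name `is_relative_link' is hypothetical, and any link containing `://' is assumed to be an absolute URL.

```shell
# is_relative_link HREF  (hypothetical helper)
# Succeeds for links that count as relative under the definition above:
# no leading slash (web server root) and no scheme://host part.
is_relative_link() {
    case $1 in
        /*)    return 1 ;;   # refers to the web server root
        *://*) return 1 ;;   # absolute URL with scheme and host
        *)     return 0 ;;
    esac
}

for href in foo.gif foo/bar.gif ../foo/bar.gif /foo.gif \
            http://www.server.com/foo/bar.gif; do
    if is_relative_link "$href"; then
        echo "relative:     $href"
    else
        echo "not relative: $href"
    fi
done
```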
1891 | Using this option guarantees that recursive retrieval will not span
|
---|
1892 | hosts, even without `-H'. In simple cases it also allows downloads to
|
---|
1893 | "just work" without having to convert links.
|
---|
1894 |
|
---|
1895 | This option is probably not very useful and might be removed in a
|
---|
1896 | future release.
|
---|
1897 |
|
---|
1898 |
|
---|
1899 | File: wget.info, Node: FTP Links, Prev: Relative Links, Up: Following Links
|
---|
1900 |
|
---|
1901 | 4.5 Following FTP Links
|
---|
1902 | =======================
|
---|
1903 |
|
---|
1904 | The rules for FTP are somewhat specific, as it is necessary for them to
|
---|
1905 | be. FTP links in HTML documents are often included for purposes of
|
---|
1906 | reference, and it is often inconvenient to download them by default.
|
---|
1907 |
|
---|
1908 | To have FTP links followed from HTML documents, you need to specify
|
---|
1909 | the `--follow-ftp' option. Having done that, FTP links will span hosts
|
---|
1910 | regardless of the `-H' setting. This is logical, as FTP links rarely point
|
---|
1911 | to the same host where the HTTP server resides. For similar reasons,
|
---|
1912 | the `-L' option has no effect on such downloads. On the other hand,
|
---|
1913 | domain acceptance (`-D') and suffix rules (`-A' and `-R') apply
|
---|
1914 | normally.
|
---|
1915 |
|
---|
1916 | Also note that followed links to FTP directories will not be
|
---|
1917 | retrieved recursively further.
|
---|
1918 |
|
---|
1919 |
|
---|
1920 | File: wget.info, Node: Time-Stamping, Next: Startup File, Prev: Following Links, Up: Top
|
---|
1921 |
|
---|
1922 | 5 Time-Stamping
|
---|
1923 | ***************
|
---|
1924 |
|
---|
1925 | One of the most important aspects of mirroring information from the
|
---|
1926 | Internet is updating your archives.
|
---|
1927 |
|
---|
1928 | Downloading the whole archive again and again, just to replace a few
|
---|
1929 | changed files is expensive, both in terms of wasted bandwidth and money,
|
---|
1930 | and the time to do the update. This is why all the mirroring tools
|
---|
1931 | offer the option of incremental updating.
|
---|
1932 |
|
---|
1933 | Such an updating mechanism means that the remote server is scanned in
|
---|
1934 | search of "new" files. Only those new files will be downloaded in the
|
---|
1935 | place of the old ones.
|
---|
1936 |
|
---|
1937 | A file is considered new if one of these two conditions is met:
|
---|
1938 |
|
---|
1939 | 1. A file of that name does not already exist locally.
|
---|
1940 |
|
---|
1941 | 2. A file of that name does exist, but the remote file was modified
|
---|
1942 | more recently than the local file.
|
---|
1943 |
|
---|
1944 | To implement this, the program needs to be aware of the time of last
|
---|
1945 | modification of both local and remote files. We call this information
|
---|
1946 | the "time-stamp" of a file.
|
---|
1947 |
|
---|
1948 | The time-stamping in GNU Wget is turned on using the `--timestamping'
|
---|
1949 | (`-N') option, or through the `timestamping = on' directive in `.wgetrc'.
|
---|
1950 | With this option, for each file it intends to download, Wget will check
|
---|
1951 | whether a local file of the same name exists. If it does, and the
|
---|
1952 | remote file is not newer, Wget will not download it.
|
---|
1953 |
|
---|
1954 | If the local file does not exist, or the sizes of the files do not
|
---|
1955 | match, Wget will download the remote file no matter what the time-stamps
|
---|
1956 | say.
|
---|
1957 |
|
---|
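Taken together, the rules above amount to a short decision procedure. The sketch below restates them in shell; it is an illustration, not Wget's code. The function name `needs_download' is hypothetical, and time-stamps are assumed to be given as seconds since the epoch.

```shell
# needs_download LOCAL_MTIME LOCAL_SIZE REMOTE_MTIME REMOTE_SIZE
# (hypothetical helper)
# Times are seconds since the epoch; pass an empty LOCAL_MTIME when no
# local file of that name exists.  Succeeds if a download is called for.
needs_download() {
    [ -z "$1" ] && return 0          # no local file of that name
    [ "$2" -ne "$4" ] && return 0    # sizes differ: fetch regardless of dates
    [ "$3" -gt "$1" ] && return 0    # remote file was modified more recently
    return 1                         # local copy is the same age or newer
}

# Same size, remote copy newer: the file is fetched again.
needs_download 1100000000 4096 1200000000 4096 && echo "download"
# Same size, same time-stamp: the local copy is kept.
needs_download 1100000000 4096 1100000000 4096 || echo "keep local copy"
```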
1958 | * Menu:
|
---|
1959 |
|
---|
1960 | * Time-Stamping Usage::
|
---|
1961 | * HTTP Time-Stamping Internals::
|
---|
1962 | * FTP Time-Stamping Internals::
|
---|
1963 |
|
---|
1964 |
|
---|
1965 | File: wget.info, Node: Time-Stamping Usage, Next: HTTP Time-Stamping Internals, Up: Time-Stamping
|
---|
1966 |
|
---|
1967 | 5.1 Time-Stamping Usage
|
---|
1968 | =======================
|
---|
1969 |
|
---|
1970 | The usage of time-stamping is simple. Say you would like to download a
|
---|
1971 | file so that it keeps its date of modification.
|
---|
1972 |
|
---|
1973 | wget -S http://www.gnu.ai.mit.edu/
|
---|
1974 |
|
---|
1975 | A simple `ls -l' shows that the time stamp on the local file equals
|
---|
1976 | the state of the `Last-Modified' header, as returned by the server. As
|
---|
1977 | you can see, the time-stamping info is preserved locally, even without
|
---|
1978 | `-N' (at least for HTTP).
|
---|
1979 |
|
---|
1980 | Several days later, you would like Wget to check if the remote file
|
---|
1981 | has changed, and download it if it has.
|
---|
1982 |
|
---|
1983 | wget -N http://www.gnu.ai.mit.edu/
|
---|
1984 |
|
---|
1985 | Wget will ask the server for the last-modified date. If the local
|
---|
1986 | file has the same timestamp as the server, or a newer one, the remote
|
---|
1987 | file will not be re-fetched. However, if the remote file is more
|
---|
1988 | recent, Wget will proceed to fetch it.
|
---|
1989 |
|
---|
1990 | The same goes for FTP. For example:
|
---|
1991 |
|
---|
1992 | wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"
|
---|
1993 |
|
---|
1994 | (The quotes around that URL are to prevent the shell from trying to
|
---|
1995 | interpret the `*'.)
|
---|
1996 |
|
---|
1997 | After download, a local directory listing will show that the
|
---|
1998 | timestamps match those on the remote server. Reissuing the command
|
---|
1999 | with `-N' will make Wget re-fetch _only_ the files that have been
|
---|
2000 | modified since the last download.
|
---|
2001 |
|
---|
2002 | If you wished to mirror the GNU archive every week, you would use a
|
---|
2003 | command like the following, weekly:
|
---|
2004 |
|
---|
2005 | wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/
|
---|
2006 |
|
---|
2007 | Note that time-stamping will only work for files for which the server
|
---|
2008 | gives a timestamp. For HTTP, this depends on getting a `Last-Modified'
|
---|
2009 | header. For FTP, this depends on getting a directory listing with
|
---|
2010 | dates in a format that Wget can parse (*note FTP Time-Stamping
|
---|
2011 | Internals::).
|
---|
2012 |
|
---|
2013 |
|
---|
2014 | File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping
|
---|
2015 |
|
---|
2016 | 5.2 HTTP Time-Stamping Internals
|
---|
2017 | ================================
|
---|
2018 |
|
---|
2019 | Time-stamping in HTTP is implemented by checking the `Last-Modified'
|
---|
2020 | header. If you wish to retrieve the file `foo.html' through HTTP, Wget
|
---|
2021 | will check whether `foo.html' exists locally. If it doesn't,
|
---|
2022 | `foo.html' will be retrieved unconditionally.
|
---|
2023 |
|
---|
2024 | If the file does exist locally, Wget will first check its local
|
---|
2025 | time-stamp (similar to the way `ls -l' checks it), and then send a
|
---|
2026 | `HEAD' request to the remote server, demanding the information on the
|
---|
2027 | remote file.
|
---|
2028 |
|
---|
2029 | The `Last-Modified' header is examined to find which file was
|
---|
2030 | modified more recently (which makes it "newer"). If the remote file is
|
---|
2031 | newer, it will be downloaded; if it is older, Wget will give up.(1)
|
---|
2032 |
|
---|
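To see what the comparison works from, it can help to pull the relevant headers out of a saved response by hand. The sketch below is only an illustration: the function name `header_value' is hypothetical, and the response shown is made up rather than taken from a real server.

```shell
# header_value NAME  (hypothetical helper)
# Reads an HTTP response header block on standard input and prints the
# value of the first header called NAME (compared case-insensitively).
header_value() {
    tr -d '\r' | awk -v wanted="$1" '
        BEGIN { wanted = tolower(wanted) }
        !found {
            colon = index($0, ":")
            if (colon > 0 && tolower(substr($0, 1, colon - 1)) == wanted) {
                value = substr($0, colon + 1)
                sub(/^[ \t]+/, "", value)   # drop white space after the colon
                print value
                found = 1
            }
        }'
}

# A made-up header block, used only to exercise the function.
printf '%s\r\n' \
    'HTTP/1.1 200 OK' \
    'Last-Modified: Sat, 01 Jan 2005 12:00:00 GMT' \
    'Content-Length: 4096' |
    header_value last-modified
```

The same function applied with `Content-Length' yields the size used for the additional check described in the footnote.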
2033 | When `--backup-converted' (`-K') is specified in conjunction with
|
---|
2034 | `-N', server file `X' is compared to local file `X.orig', if extant,
|
---|
2035 | rather than being compared to local file `X', which will always differ
|
---|
2036 | if it's been converted by `--convert-links' (`-k').
|
---|
2037 |
|
---|
2038 | Arguably, HTTP time-stamping should be implemented using the
|
---|
2039 | `If-Modified-Since' request.
|
---|
2040 |
|
---|
2041 | ---------- Footnotes ----------
|
---|
2042 |
|
---|
2043 | (1) As an additional check, Wget will look at the `Content-Length'
|
---|
2044 | header, and compare the sizes; if they are not the same, the remote
|
---|
2045 | file will be downloaded no matter what the time-stamp says.
|
---|
2046 |
|
---|
2047 |
|
---|
2048 | File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping
|
---|
2049 |
|
---|
2050 | 5.3 FTP Time-Stamping Internals
|
---|
2051 | ===============================
|
---|
2052 |
|
---|
2053 | In theory, FTP time-stamping works much the same as HTTP, only FTP has
|
---|
2054 | no headers--time-stamps must be ferreted out of directory listings.
|
---|
2055 |
|
---|
2056 | If an FTP download is recursive or uses globbing, Wget will use the
|
---|
2057 | FTP `LIST' command to get a file listing for the directory containing
|
---|
2058 | the desired file(s). It will try to analyze the listing, treating it
|
---|
2059 | like Unix `ls -l' output, extracting the time-stamps. The rest is
|
---|
2060 | exactly the same as for HTTP. Note that when retrieving individual
|
---|
2061 | files from an FTP server without using globbing or recursion, listing
|
---|
2062 | files will not be downloaded (and thus files will not be time-stamped)
|
---|
2063 | unless `-N' is specified.
|
---|
2064 |
|
---|
2065 | The assumption that every directory listing is a Unix-style listing may
|
---|
2066 | sound extremely constraining, but in practice it is not, as many
|
---|
2067 | non-Unix FTP servers use the Unixoid listing format because most (all?)
|
---|
2068 | of the clients understand it. Bear in mind that RFC959 defines no
|
---|
2069 | standard way to get a file list, let alone the time-stamps. We can
|
---|
2070 | only hope that a future standard will define this.
|
---|
2071 |
|
---|
2072 | Another non-standard solution includes the use of the `MDTM' command
|
---|
2073 | that is supported by some FTP servers (including the popular
|
---|
2074 | `wu-ftpd'), which returns the exact time of the specified file. Wget
|
---|
2075 | may support this command in the future.
|
---|
2076 |
|
---|
2077 |
|
---|
2078 | File: wget.info, Node: Startup File, Next: Examples, Prev: Time-Stamping, Up: Top
|
---|
2079 |
|
---|
2080 | 6 Startup File
|
---|
2081 | **************
|
---|
2082 |
|
---|
2083 | Once you know how to change default settings of Wget through command
|
---|
2084 | line arguments, you may wish to make some of those settings permanent.
|
---|
2085 | You can do that in a convenient way by creating the Wget startup
|
---|
2086 | file--`.wgetrc'.
|
---|
2087 |
|
---|
2088 | Although `.wgetrc' is the "main" initialization file, it is
|
---|
2089 | convenient to have a special facility for storing passwords. Thus Wget
|
---|
2090 | reads and interprets the contents of `$HOME/.netrc', if it finds it.
|
---|
2091 | You can find the `.netrc' format in your system manuals.
|
---|
2092 |
|
---|
2093 | Wget reads `.wgetrc' upon startup, recognizing a limited set of
|
---|
2094 | commands.
|
---|
2095 |
|
---|
2096 | * Menu:
|
---|
2097 |
|
---|
2098 | * Wgetrc Location:: Location of various wgetrc files.
|
---|
2099 | * Wgetrc Syntax:: Syntax of wgetrc.
|
---|
2100 | * Wgetrc Commands:: List of available commands.
|
---|
2101 | * Sample Wgetrc:: A wgetrc example.
|
---|
2102 |
|
---|
2103 |
|
---|
2104 | File: wget.info, Node: Wgetrc Location, Next: Wgetrc Syntax, Up: Startup File
|
---|
2105 |
|
---|
2106 | 6.1 Wgetrc Location
|
---|
2107 | ===================
|
---|
2108 |
|
---|
2109 | When initializing, Wget will look for a "global" startup file,
|
---|
2110 | `/usr/local/etc/wgetrc' by default (or some prefix other than
|
---|
2111 | `/usr/local', if Wget was not installed there) and read commands from
|
---|
2112 | there, if it exists.
|
---|
2113 |
|
---|
2114 | Then it will look for the user's file. If the environment variable
|
---|
2115 | `WGETRC' is set, Wget will try to load that file. Failing that, no
|
---|
2116 | further attempts will be made.
|
---|
2117 |
|
---|
2118 | If `WGETRC' is not set, Wget will try to load `$HOME/.wgetrc'.
|
---|
2119 |
|
---|
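The choice of the user's startup file can be restated as a few lines of shell. This is a sketch of the rule as described above, not of Wget's own lookup code; the function name `user_wgetrc' is hypothetical, and an empty `WGETRC' is assumed to count as unset.

```shell
# user_wgetrc  (hypothetical helper)
# Prints the user startup file that the rules above would select.
user_wgetrc() {
    if [ -n "${WGETRC:-}" ]; then
        printf '%s\n' "$WGETRC"         # WGETRC is set: no other file is tried
    else
        printf '%s\n' "$HOME/.wgetrc"   # otherwise use the home directory
    fi
}

user_wgetrc
```

Note that the sketch only names the file; whether it exists and can be read is a separate question.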
2120 | The fact that the user's settings are loaded after the system-wide ones
|
---|
2121 | means that in case of collision the user's wgetrc _overrides_ the
|
---|
2122 | system-wide wgetrc (in `/usr/local/etc/wgetrc' by default). Fascist
|
---|
2123 | admins, away!
|
---|
2124 |
|
---|
2125 |
|
---|
2126 | File: wget.info, Node: Wgetrc Syntax, Next: Wgetrc Commands, Prev: Wgetrc Location, Up: Startup File
|
---|
2127 |
|
---|
2128 | 6.2 Wgetrc Syntax
|
---|
2129 | =================
|
---|
2130 |
|
---|
2131 | The syntax of a wgetrc command is simple:
|
---|
2132 |
|
---|
2133 | variable = value
|
---|
2134 |
|
---|
2135 | The "variable" will also be called "command". Valid "values" are
|
---|
2136 | different for different commands.
|
---|
2137 |
|
---|
2138 | The commands are case-insensitive and underscore-insensitive. Thus
|
---|
2139 | `DIr__PrefiX' is the same as `dirprefix'. Empty lines, lines beginning
|
---|
2140 | with `#' and lines containing white-space only are discarded.
|
---|
2141 |
|
---|
2142 | Commands that expect a comma-separated list will clear the list on an
|
---|
2143 | empty command. So, if you wish to reset the rejection list specified in
|
---|
2144 | global `wgetrc', you can do it with:
|
---|
2145 |
|
---|
2146 | reject =
|
---|
2147 |
|
---|
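The syntax rules above are simple enough to imitate in a few lines of shell. The following is a sketch under stated assumptions, not Wget's parser: the function names `normalize_command' and `parse_wgetrc_line' are hypothetical, and leading white space is assumed to be ignored before a line is examined.

```shell
# normalize_command NAME  (hypothetical helper)
# Lower-cases NAME and drops underscores, so that spellings the text
# treats as equal compare equal.
normalize_command() {
    printf '%s\n' "$1" | tr -d '_' | tr '[:upper:]' '[:lower:]'
}

# parse_wgetrc_line LINE  (hypothetical helper)
# Prints "command=value" for a command line.  Prints nothing for empty
# lines, white-space-only lines and lines beginning with `#'.
# Fails for anything else that lacks an `='.
parse_wgetrc_line() {
    # Assumption: surrounding white space is not significant.
    pw_line=$(printf '%s\n' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    case $pw_line in
        ''|'#'*) return 0 ;;    # discarded
        *=*)     ;;             # looks like "variable = value"
        *)       return 1 ;;    # not a command
    esac
    pw_name=$(printf '%s\n' "${pw_line%%=*}" | sed 's/[[:space:]]*$//')
    pw_value=$(printf '%s\n' "${pw_line#*=}" | sed 's/^[[:space:]]*//')
    printf '%s=%s\n' "$(normalize_command "$pw_name")" "$pw_value"
}

parse_wgetrc_line 'DIr__PrefiX = /tmp/downloads'
parse_wgetrc_line 'reject ='
```

The second call shows the "empty command" case: the value after `=' is empty, which is what clears a comma-separated list.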
2148 |
|
---|
2149 | File: wget.info, Node: Wgetrc Commands, Next: Sample Wgetrc, Prev: Wgetrc Syntax, Up: Startup File
|
---|
2150 |
|
---|
2151 | 6.3 Wgetrc Commands
|
---|
2152 | ===================
|
---|
2153 |
|
---|
2154 | The complete set of commands is listed below. Legal values are listed
|
---|
2155 | after the `='. Simple Boolean values can be set or unset using `on'
|
---|
2156 | and `off' or `1' and `0'. A fancier kind of Boolean allowed in some
|
---|
2157 | cases is the "lockable Boolean", which may be set to `on', `off',
|
---|
2158 | `always', or `never'. If an option is set to `always' or `never', that
|
---|
2159 | value will be locked in for the duration of the Wget
|
---|
2160 | invocation--command-line options will not override.
|
---|
2161 |
|
---|
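The effect of a lockable Boolean can be summarized as follows. This is a sketch of the behavior described above, not of Wget's option handling; the function name `effective_setting' is hypothetical, and the `1'/`0' spellings are left out for brevity.

```shell
# effective_setting WGETRC_VALUE [COMMAND_LINE_VALUE]  (hypothetical helper)
# Prints "on" or "off".  WGETRC_VALUE is on, off, always or never;
# COMMAND_LINE_VALUE is on, off, or omitted when no option was given.
effective_setting() {
    case $1 in
        always) echo on ;;           # locked: the command line cannot override
        never)  echo off ;;          # locked: the command line cannot override
        *)      echo "${2:-$1}" ;;   # unlocked: the command line wins if given
    esac
}

effective_setting never on    # prints "off": the locked value stays in force
effective_setting off on      # prints "on": an ordinary value is overridden
```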
2162 | Some commands take pseudo-arbitrary values. ADDRESS values can be
|
---|
2163 | hostnames or dotted-quad IP addresses. N can be any positive integer,
|
---|
2164 | or `inf' for infinity, where appropriate. STRING values can be any
|
---|
2165 | non-empty string.
|
---|
2166 |
|
---|
2167 | Most of these commands have direct command-line equivalents. Also,
|
---|
2168 | any wgetrc command can be specified on the command line using the
|
---|
2169 | `--execute' switch (*note Basic Startup Options::.)
|
---|
2170 |
|
---|
2171 | accept/reject = STRING
|
---|
2172 | Same as `-A'/`-R' (*note Types of Files::).
|
---|
2173 |
|
---|
2174 | add_hostdir = on/off
|
---|
2175 | Enable/disable host-prefixed file names. `-nH' disables it.
|
---|
2176 |
|
---|
2177 | continue = on/off
|
---|
2178 | If set to on, force continuation of preexistent partially retrieved
|
---|
2179 | files. See `-c' before setting it.
|
---|
2180 |
|
---|
2181 | background = on/off
|
---|
2182 | Enable/disable going to background--the same as `-b' (which
|
---|
2183 | enables it).
|
---|
2184 |
|
---|
2185 | backup_converted = on/off
|
---|
2186 | Enable/disable saving pre-converted files with the suffix
|
---|
2187 | `.orig'--the same as `-K' (which enables it).
|
---|
2188 |
|
---|
2189 | base = STRING
|
---|
2190 | Consider relative URLs in URL input files forced to be interpreted
|
---|
2191 | as HTML as being relative to STRING--the same as `--base=STRING'.
|
---|
2192 |
|
---|
2193 | bind_address = ADDRESS
|
---|
2194 | Bind to ADDRESS, like `--bind-address=ADDRESS'.
|
---|
2195 |
|
---|
2196 | ca_certificate = FILE
|
---|
2197 | Set the certificate authority bundle file to FILE. The same as
|
---|
2198 | `--ca-certificate=FILE'.
|
---|
2199 |
|
---|
2200 | ca_directory = DIRECTORY
|
---|
2201 | Set the directory used for certificate authorities. The same as
|
---|
2202 | `--ca-directory=DIRECTORY'.
|
---|
2203 |
|
---|
2204 | cache = on/off
|
---|
2205 | When set to off, disallow server-caching. See the `--no-cache'
|
---|
2206 | option.
|
---|
2207 |
|
---|
2208 | certificate = FILE
|
---|
2209 | Set the client certificate file name to FILE. The same as
|
---|
2210 | `--certificate=FILE'.
|
---|
2211 |
|
---|
2212 | certificate_type = STRING
|
---|
2213 | Specify the type of the client certificate, legal values being
|
---|
2214 | `PEM' (the default) and `DER' (aka ASN1). The same as
|
---|
2215 | `--certificate-type=STRING'.
|
---|
2216 |
|
---|
2217 | check_certificate = on/off
|
---|
2218 | If this is set to off, the server certificate is not checked
|
---|
2219 | against the specified certificate authorities. The default is "on".
|
---|
2220 | The same as `--check-certificate'.
|
---|
2221 |
|
---|
2222 | convert_links = on/off
|
---|
2223 | Convert non-relative links locally. The same as `-k'.
|
---|
2224 |
|
---|
2225 | cookies = on/off
|
---|
2226 | When set to off, disallow cookies. See the `--no-cookies' option.
|
---|
2227 |
|
---|
2228 | connect_timeout = N
|
---|
2229 | Set the connect timeout--the same as `--connect-timeout'.
|
---|
2230 |
|
---|
2231 | cut_dirs = N
|
---|
2232 | Ignore N remote directory components. Equivalent to
|
---|
2233 | `--cut-dirs=N'.
|
---|
2234 |
|
---|
2235 | debug = on/off
|
---|
2236 | Debug mode, same as `-d'.
|
---|
2237 |
|
---|
2238 | delete_after = on/off
|
---|
2239 | Delete after download--the same as `--delete-after'.
|
---|
2240 |
|
---|
2241 | dir_prefix = STRING
|
---|
2242 | Top of directory tree--the same as `-P STRING'.
|
---|
2243 |
|
---|
2244 | dirstruct = on/off
|
---|
2245 | Turning dirstruct on or off--the same as `-x' or `-nd',
|
---|
2246 | respectively.
|
---|
2247 |
|
---|
2248 | dns_cache = on/off
|
---|
2249 | Turn DNS caching on/off. Since DNS caching is on by default, this
|
---|
2250 | option is normally used to turn it off and is equivalent to
|
---|
2251 | `--no-dns-cache'.
|
---|
2252 |
|
---|
2253 | dns_timeout = N
|
---|
2254 | Set the DNS timeout--the same as `--dns-timeout'.
|
---|
2255 |
|
---|
2256 | domains = STRING
|
---|
2257 | Same as `-D' (*note Spanning Hosts::).
|
---|
2258 |
|
---|
2259 | dot_bytes = N
|
---|
2260 | Specify the number of bytes "contained" in a dot, as seen
|
---|
2261 | throughout the retrieval (1024 by default). You can postfix the
|
---|
2262 | value with `k' or `m', representing kilobytes and megabytes,
|
---|
2263 | respectively. With dot settings you can tailor the dot retrieval
|
---|
2264 | to suit your needs, or you can use the predefined "styles" (*note
|
---|
2265 | Download Options::).
|
---|
2266 |
|
---|
2267 | dots_in_line = N
|
---|
2268 | Specify the number of dots that will be printed in each line
|
---|
2269 | throughout the retrieval (50 by default).
|
---|
2270 |
|
---|
2271 | dot_spacing = N
|
---|
2272 | Specify the number of dots in a single cluster (10 by default).
|
---|
2273 |
|
---|
2274 | egd_file = FILE
|
---|
2275 | Use FILE as the EGD socket file name. The same as
|
---|
2276 | `--egd-file=FILE'.
|
---|
2277 |
|
---|
2278 | exclude_directories = STRING
|
---|
2279 | Specify a comma-separated list of directories you wish to exclude
|
---|
2280 | from download--the same as `-X STRING' (*note Directory-Based
|
---|
2281 | Limits::).
|
---|
2282 |
|
---|
2283 | exclude_domains = STRING
|
---|
2284 | Same as `--exclude-domains=STRING' (*note Spanning Hosts::).
|
---|
2285 |
|
---|
2286 | follow_ftp = on/off
|
---|
2287 | Follow FTP links from HTML documents--the same as `--follow-ftp'.
|
---|
2288 |
|
---|
2289 | follow_tags = STRING
|
---|
2290 | Only follow certain HTML tags when doing a recursive retrieval,
|
---|
2291 | just like `--follow-tags=STRING'.
|
---|
2292 |
|
---|
2293 | force_html = on/off
|
---|
2294 | If set to on, force the input filename to be regarded as an HTML
|
---|
2295 | document--the same as `-F'.
|
---|
2296 |
|
---|
2297 | ftp_password = STRING
|
---|
2298 | Set your FTP password to STRING. Without this setting, the
|
---|
2299 | password defaults to `-wget@', which is a useful default for
|
---|
2300 | anonymous FTP access.
|
---|
2301 |
|
---|
2302 | This command used to be named `passwd' prior to Wget 1.10.
|
---|
2303 |
|
---|
2304 | ftp_proxy = STRING
|
---|
2305 | Use STRING as FTP proxy, instead of the one specified in
|
---|
2306 | environment.
|
---|
2307 |
|
---|
2308 | ftp_user = STRING
|
---|
2309 | Set FTP user to STRING.
|
---|
2310 |
|
---|
2311 | This command used to be named `login' prior to Wget 1.10.
|
---|
2312 |
|
---|
2313 | glob = on/off
|
---|
2314 | Turn globbing on/off--the same as `--glob' and `--no-glob'.
|
---|
2315 |
|
---|
2316 | header = STRING
|
---|
2317 | Define a header for HTTP downloads, like using `--header=STRING'.
|
---|
2318 |
|
---|
2319 | html_extension = on/off
|
---|
2320 | Add a `.html' extension to `text/html' or `application/xhtml+xml'
|
---|
2321 | files without it, like `-E'.
|
---|
2322 |
|
---|
2323 | http_keep_alive = on/off
|
---|
2324 | Turn the keep-alive feature on or off (defaults to on). Turning it
|
---|
2325 | off is equivalent to `--no-http-keep-alive'.
|
---|
2326 |
|
---|
2327 | http_password = STRING
|
---|
2328 | Set HTTP password, equivalent to `--http-password=STRING'.
|
---|
2329 |
|
---|
2330 | http_proxy = STRING
|
---|
2331 | Use STRING as HTTP proxy, instead of the one specified in
|
---|
2332 | environment.
|
---|
2333 |
|
---|
2334 | http_user = STRING
|
---|
2335 | Set HTTP user to STRING, equivalent to `--http-user=STRING'.
|
---|
2336 |
|
---|
2337 | ignore_length = on/off
|
---|
2338 | When set to on, ignore `Content-Length' header; the same as
|
---|
2339 | `--ignore-length'.
|
---|
2340 |
|
---|
2341 | ignore_tags = STRING
|
---|
2342 | Ignore certain HTML tags when doing a recursive retrieval, like
|
---|
2343 | `--ignore-tags=STRING'.
|
---|
2344 |
|
---|
2345 | include_directories = STRING
|
---|
2346 | Specify a comma-separated list of directories you wish to follow
|
---|
2347 | when downloading--the same as `-I STRING'.
|
---|
2348 |
|
---|
2349 | inet4_only = on/off
|
---|
2350 | Force connecting to IPv4 addresses, off by default. You can put
|
---|
2351 | this in the global init file to disable Wget's attempts to resolve
|
---|
2352 | and connect to IPv6 hosts. Available only if Wget was compiled
|
---|
2353 | with IPv6 support. The same as `--inet4-only' or `-4'.
|
---|
2354 |
|
---|
2355 | inet6_only = on/off
|
---|
2356 | Force connecting to IPv6 addresses, off by default. Available
|
---|
2357 | only if Wget was compiled with IPv6 support. The same as
|
---|
2358 | `--inet6-only' or `-6'.
|
---|
2359 |
|
---|
2360 | input = FILE
|
---|
2361 | Read the URLs from FILE, like `-i FILE'.
|
---|
2362 |
|
---|
2363 | limit_rate = RATE
|
---|
2364 | Limit the download speed to no more than RATE bytes per second.
|
---|
2365 | The same as `--limit-rate=RATE'.
|
---|
2366 |
|
---|
2367 | load_cookies = FILE
|
---|
2368 | Load cookies from FILE. See `--load-cookies FILE'.
|
---|
2369 |
|
---|
2370 | logfile = FILE
|
---|
2371 | Set logfile to FILE, the same as `-o FILE'.
|
---|
2372 |
|
---|
2373 | mirror = on/off
|
---|
2374 | Turn mirroring on/off. The same as `-m'.
|
---|
2375 |
|
---|
2376 | netrc = on/off
|
---|
2377 | Turn reading netrc on or off.
|
---|
2378 |
|
---|
2379 | noclobber = on/off
|
---|
2380 | Same as `-nc'.
|
---|
2381 |
|
---|
2382 | no_parent = on/off
|
---|
2383 | Disallow retrieving outside the directory hierarchy, like
|
---|
2384 | `--no-parent' (*note Directory-Based Limits::).
|
---|
2385 |
|
---|
2386 | no_proxy = STRING
|
---|
2387 | Use STRING as the comma-separated list of domains to avoid in
|
---|
2388 | proxy loading, instead of the one specified in environment.
|
---|
2389 |
|
---|
2390 | output_document = FILE
|
---|
2391 | Set the output filename--the same as `-O FILE'.
|
---|
2392 |
|
---|
2393 | page_requisites = on/off
|
---|
2394 | Download all ancillary documents necessary for a single HTML page
|
---|
2395 | to display properly--the same as `-p'.
|
---|
2396 |
|
---|
2397 | passive_ftp = on/off/always/never
|
---|
2398 | Change setting of passive FTP, equivalent to the `--passive-ftp'
|
---|
2399 | option. Some scripts and `.pm' (Perl module) files download files
|
---|
2400 | using `wget --passive-ftp'. If your firewall does not allow this,
|
---|
2401 | you can set `passive_ftp = never' to override the command-line.
|
---|
2402 |
|
---|
2403 | password = STRING
|
---|
2404 | Specify password STRING for both FTP and HTTP file retrieval.
|
---|
2405 | This command can be overridden using the `ftp_password' and
|
---|
2406 | `http_password' commands for FTP and HTTP respectively.
|
---|
2407 |
|
---|
2408 | post_data = STRING
|
---|
2409 | Use POST as the method for all HTTP requests and send STRING in
|
---|
2410 | the request body. The same as `--post-data=STRING'.
|
---|
2411 |
|
---|
2412 | post_file = FILE
|
---|
2413 | Use POST as the method for all HTTP requests and send the contents
|
---|
2414 | of FILE in the request body. The same as `--post-file=FILE'.
|
---|
2415 |
|
---|
2416 | prefer_family = IPv4/IPv6/none
|
---|
2417 | When given a choice of several addresses, connect to the addresses
|
---|
2418 | with specified address family first. IPv4 addresses are preferred
|
---|
2419 | by default. The same as `--prefer-family', which see for a
|
---|
2420 | detailed discussion of why this is useful.
|
---|
2421 |
|
---|
2422 | private_key = FILE
|
---|
2423 | Set the private key file to FILE. The same as
|
---|
2424 | `--private-key=FILE'.
|
---|
2425 |
|
---|
2426 | private_key_type = STRING
|
---|
2427 | Specify the type of the private key, legal values being `PEM' (the
|
---|
2428 | default) and `DER' (aka ASN1). The same as
|
---|
2429 | `--private-key-type=STRING'.
|
---|
2430 |
|
---|
2431 | progress = STRING
|
---|
2432 | Set the type of the progress indicator. Legal types are `dot' and
|
---|
2433 | `bar'. Equivalent to `--progress=STRING'.
|
---|
2434 |
|
---|
2435 | protocol_directories = on/off
|
---|
2436 | When set, use the protocol name as a directory component of local
|
---|
2437 | file names. The same as `--protocol-directories'.
|
---|
2438 |
|
---|
2439 | proxy_user = STRING
|
---|
2440 | Set proxy authentication user name to STRING, like
|
---|
2441 | `--proxy-user=STRING'.
|
---|
2442 |
|
---|
2443 | proxy_password = STRING
|
---|
2444 | Set proxy authentication password to STRING, like
|
---|
2445 | `--proxy-password=STRING'.
|
---|
2446 |
|
---|
2447 | quiet = on/off
|
---|
2448 | Quiet mode--the same as `-q'.
|
---|
2449 |
|
---|
2450 | quota = QUOTA
|
---|
2451 | Specify the download quota, which is useful to put in the global
|
---|
2452 | `wgetrc'. When download quota is specified, Wget will stop
|
---|
2453 | retrieving after the download sum has become greater than quota.
|
---|
2454 | The quota can be specified in bytes (default), kbytes (`k'
|
---|
2455 | appended) or mbytes (`m' appended). Thus `quota = 5m' will set
|
---|
2456 | the quota to 5 megabytes. Note that the user's startup file
|
---|
2457 | overrides system settings.
|
---|
2458 |
|
---|
2459 | random_file = FILE
|
---|
2460 | Use FILE as a source of randomness on systems lacking
|
---|
2461 | `/dev/random'.
|
---|
2462 |
|
---|
2463 | read_timeout = N
|
---|
2464 | Set the read (and write) timeout--the same as `--read-timeout=N'.
|
---|
2465 |
|
---|
2466 | reclevel = N
|
---|
2467 | Recursion level (depth)--the same as `-l N'.
|
---|
2468 |
|
---|
2469 | recursive = on/off
|
---|
2470 | Recursive on/off--the same as `-r'.
|
---|
2471 |
|
---|
2472 | referer = STRING
|
---|
2473 | Set HTTP `Referer:' header just like `--referer=STRING'. (Note it
|
---|
2474 | was the folks who wrote the HTTP spec who got the spelling of
|
---|
2475 | "referrer" wrong.)
|
---|
2476 |
|
---|
2477 | relative_only = on/off
|
---|
2478 | Follow only relative links--the same as `-L' (*note Relative
|
---|
2479 | Links::).
|
---|
2480 |
|
---|
2481 | remove_listing = on/off
|
---|
2482 | If set to on, remove FTP listings downloaded by Wget. Setting it
|
---|
2483 | to off is the same as `--no-remove-listing'.
|
---|
2484 |
|
---|
2485 | restrict_file_names = unix/windows
|
---|
2486 | Restrict the file names generated by Wget from URLs. See
|
---|
2487 | `--restrict-file-names' for a more detailed description.
|
---|
2488 |
|
---|
2489 | retr_symlinks = on/off
|
---|
2490 | When set to on, retrieve symbolic links as if they were plain
|
---|
2491 | files; the same as `--retr-symlinks'.
|
---|
2492 |
|
---|
2493 | retry_connrefused = on/off
|
---|
2494 | When set to on, consider "connection refused" a transient
|
---|
2495 | error--the same as `--retry-connrefused'.
|
---|
2496 |
|
---|
2497 | robots = on/off
|
---|
2498 | Specify whether the norobots convention is respected by Wget, "on"
|
---|
2499 | by default. This switch controls both the `/robots.txt' and the
|
---|
2500 | `nofollow' aspect of the spec. *Note Robot Exclusion::, for more
|
---|
2501 | details about this. Be sure you know what you are doing before
|
---|
2502 | turning this off.
|
---|
2503 |
|
---|
2504 | save_cookies = FILE
|
---|
2505 | Save cookies to FILE. The same as `--save-cookies FILE'.
|
---|
2506 |
|
---|
2507 | secure_protocol = STRING
|
---|
2508 | Choose the secure protocol to be used. Legal values are `auto'
|
---|
2509 | (the default), `SSLv2', `SSLv3', and `TLSv1'. The same as
|
---|
2510 | `--secure-protocol=STRING'.
|
---|
2511 |
|
---|
2512 | server_response = on/off
|
---|
2513 | Choose whether or not to print the HTTP and FTP server
|
---|
2514 | responses--the same as `-S'.
|
---|
2515 |
|
---|
2516 | span_hosts = on/off
|
---|
2517 | Same as `-H'.
|
---|
2518 |
|
---|
2519 | strict_comments = on/off
|
---|
2520 | Same as `--strict-comments'.
|
---|
2521 |
|
---|
2522 | timeout = N
|
---|
2523 | Set all applicable timeout values to N, the same as `-T N'.
|
---|
2524 |
|
---|
2525 | timestamping = on/off
|
---|
2526 | Turn timestamping on/off. The same as `-N' (*note
|
---|
2527 | Time-Stamping::).
|
---|
2528 |
|
---|
2529 | tries = N
|
---|
2530 | Set number of retries per URL--the same as `-t N'.
|
---|
2531 |
|
---|
2532 | use_proxy = on/off
|
---|
2533 | When set to off, don't use proxy even when proxy-related
|
---|
2534 | environment variables are set. In that case it is the same as
|
---|
2535 | using `--no-proxy'.
|
---|
2536 |
|
---|
2537 | user = STRING
|
---|
2538 | Specify username STRING for both FTP and HTTP file retrieval.
|
---|
2539 | This command can be overridden using the `ftp_user' and
|
---|
2540 | `http_user' commands for FTP and HTTP respectively.
|
---|
2541 |
|
---|
2542 | verbose = on/off
|
---|
2543 | Turn verbose on/off--the same as `-v'/`-nv'.
|
---|
2544 |
|
---|
2545 | wait = N
|
---|
2546 | Wait N seconds between retrievals--the same as `-w N'.
|
---|
2547 |
|
---|
2548 | waitretry = N
|
---|
2549 | Wait up to N seconds between retries of failed retrievals
|
---|
2550 | only--the same as `--waitretry=N'. Note that this is turned on by
|
---|
2551 | default in the global `wgetrc'.
|
---|
2552 |
|
---|
2553 | randomwait = on/off
|
---|
2554 | Turn random between-request wait times on or off. The same as
|
---|
2555 | `--random-wait'.
|
---|
2556 |
|
---|
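The size suffixes accepted by `quota' can be illustrated with a short shell sketch. It is not Wget's own parser: the function name `quota_to_bytes' is hypothetical, and it assumes that `k' means 1024 bytes, that `m' means 1024*1024 bytes, that upper-case suffixes are accepted as the sample `wgetrc' suggests, and that `inf' means no limit.

```shell
# Convert a quota string such as 5m, 200k or 1024 to a byte count.
# Assumptions: k = 1024 bytes, m = 1024*1024 bytes, "inf" = no limit.
quota_to_bytes() {
    case $1 in
        inf)    echo unlimited ;;
        *[kK])  echo $(( ${1%[kK]} * 1024 )) ;;
        *[mM])  echo $(( ${1%[mM]} * 1024 * 1024 )) ;;
        *)      echo "$1" ;;
    esac
}

quota_to_bytes 5m
```

Under these assumptions `quota = 5m' corresponds to 5242880 bytes.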
2557 |
|
---|
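The effect of `waitretry' can be sketched in the shell, assuming the "linear backoff" that the sample `wgetrc' describes: one second after the first failure on a file, two after the second, and so on, never exceeding N. The function name `backoff_waits' is hypothetical and the loop only prints the waits; it does not call Wget.

```shell
# backoff_waits MAX FAILURES
# Print the assumed wait, in seconds, after each consecutive failure.
backoff_waits() {
    max=$1
    failures=$2
    n=1
    while [ "$n" -le "$failures" ]; do
        if [ "$n" -lt "$max" ]; then
            echo "$n"        # still below the cap: wait n seconds
        else
            echo "$max"      # cap reached: wait at most MAX seconds
        fi
        n=$((n + 1))
    done
}

backoff_waits 3 5
```

With a cap of 3 and five failures the printed waits are 1, 2, 3, 3 and 3 seconds.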
2558 | File: wget.info, Node: Sample Wgetrc, Prev: Wgetrc Commands, Up: Startup File
|
---|
2559 |
|
---|
2560 | 6.4 Sample Wgetrc
|
---|
2561 | =================
|
---|
2562 |
|
---|
2563 | This is the sample initialization file, as given in the distribution.
|
---|
2564 | It is divided into two sections--one for global usage (suitable for global
|
---|
2565 | startup file), and one for local usage (suitable for `$HOME/.wgetrc').
|
---|
2566 | Be careful about the things you change.
|
---|
2567 |
|
---|
2568 | Note that almost all the lines are commented out. For a command to
|
---|
2569 | have any effect, you must remove the `#' character at the beginning of
|
---|
2570 | its line.
|
---|
2571 |
|
---|
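Uncommenting a command can also be done mechanically. The following shell sketch works on a two-line stand-in for the sample file rather than on the real one; the temporary file and the choice of 45 tries are purely illustrative.

```shell
# Build a tiny stand-in for the sample file (two commented-out commands).
sample=$(mktemp)
printf '%s\n' '#tries = 20' '#wait = 0' > "$sample"

# Remove the leading '#' from the "tries" line only and change its value.
# The other commented-out line stays disabled.
uncommented=$(sed 's/^#tries = 20$/tries = 45/' "$sample")
printf '%s\n' "$uncommented"

rm -f "$sample"
```

Only the `tries' line loses its `#'; the `wait' line is printed unchanged and would still have no effect.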
2572 | ###
|
---|
2573 | ### Sample Wget initialization file .wgetrc
|
---|
2574 | ###
|
---|
2575 |
|
---|
2576 | ## You can use this file to change the default behaviour of wget or to
|
---|
2577 | ## avoid having to type many many command-line options. This file does
|
---|
2578 | ## not contain a comprehensive list of commands -- look at the manual
|
---|
2579 | ## to find out what you can put into this file.
|
---|
2580 | ##
|
---|
2581 | ## Wget initialization file can reside in /usr/local/etc/wgetrc
|
---|
2582 | ## (global, for all users) or $HOME/.wgetrc (for a single user).
|
---|
2583 | ##
|
---|
2584 | ## To use the settings in this file, you will have to uncomment them,
|
---|
2585 | ## as well as change them, in most cases, as the values on the
|
---|
2586 | ## commented-out lines are the default values (e.g. "off").
|
---|
2587 |
|
---|
2588 |
|
---|
2589 | ##
|
---|
2590 | ## Global settings (useful for setting up in /usr/local/etc/wgetrc).
|
---|
2591 | ## Think well before you change them, since they may reduce wget's
|
---|
2592 | ## functionality, and make it behave contrary to the documentation:
|
---|
2593 | ##
|
---|
2594 |
|
---|
2595 | # You can set retrieve quota for beginners by specifying a value
|
---|
2596 | # optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
|
---|
2597 | # default quota is unlimited.
|
---|
2598 | #quota = inf
|
---|
2599 |
|
---|
2600 | # You can lower (or raise) the default number of retries when
|
---|
2601 | # downloading a file (default is 20).
|
---|
2602 | #tries = 20
|
---|
2603 |
|
---|
2604 | # Lowering the maximum depth of the recursive retrieval is handy to
|
---|
2605 | # prevent newbies from going too "deep" when they unwittingly start
|
---|
2606 | # the recursive retrieval. The default is 5.
|
---|
2607 | #reclevel = 5
|
---|
2608 |
|
---|
2609 | # By default Wget uses "passive FTP" transfer where the client
|
---|
2610 | # initiates the data connection to the server rather than the other
|
---|
2611 | # way around. That is required on systems behind NAT where the client
|
---|
2612 | # computer cannot be easily reached from the Internet. However, some
|
---|
2613 | # firewall software explicitly supports active FTP and in fact has
|
---|
2614 | # problems supporting passive transfer. If you are in such an
|
---|
2615 | # environment, use "passive_ftp = off" to revert to active FTP.
|
---|
2616 | #passive_ftp = off
|
---|
2617 |
|
---|
2618 | # The "wait" command below makes Wget wait between every connection.
|
---|
2619 | # If, instead, you want Wget to wait only between retries of failed
|
---|
2620 | # downloads, set waitretry to maximum number of seconds to wait (Wget
|
---|
2621 | # will use "linear backoff", waiting 1 second after the first failure
|
---|
2622 | # on a file, 2 seconds after the second failure, etc. up to this max).
|
---|
2623 | waitretry = 10
|
---|
2624 |
|
---|
2625 |
|
---|
2626 | ##
|
---|
2627 | ## Local settings (for a user to set in his $HOME/.wgetrc). It is
|
---|
2628 | ## *highly* undesirable to put these settings in the global file, since
|
---|
2629 | ## they are potentially dangerous to "normal" users.
|
---|
2630 | ##
|
---|
2631 | ## Even when setting up your own ~/.wgetrc, you should know what you
|
---|
2632 | ## are doing before doing so.
|
---|
2633 | ##
|
---|
2634 |
|
---|
2635 | # Set this to on to use timestamping by default:
|
---|
2636 | #timestamping = off
|
---|
2637 |
|
---|
2638 | # It is a good idea to make Wget send your email address in a `From:'
|
---|
2639 | # header with your request (so that server administrators can contact
|
---|
2640 | # you in case of errors). Wget does *not* send `From:' by default.
|
---|
2641 | #header = From: Your Name <username@site.domain>
|
---|
2642 |
|
---|
2643 | # You can set up other headers, like Accept-Language. Accept-Language
|
---|
2644 | # is *not* sent by default.
|
---|
2645 | #header = Accept-Language: en
|
---|
2646 |
|
---|
2647 | # You can set the default proxies for Wget to use for http and ftp.
|
---|
2648 | # They will override the value in the environment.
|
---|
2649 | #http_proxy = http://proxy.yoyodyne.com:18023/
|
---|
2650 | #ftp_proxy = http://proxy.yoyodyne.com:18023/
|
---|
2651 |
|
---|
2652 | # If you do not want to use proxy at all, set this to off.
|
---|
2653 | #use_proxy = on
|
---|
2654 |
|
---|
2655 | # You can customize the retrieval outlook. Valid options are default,
|
---|
2656 | # binary, mega and micro.
|
---|
2657 | #dot_style = default
|
---|
2658 |
|
---|
2659 | # Setting this to off makes Wget not download /robots.txt. Be sure to
|
---|
2660 | # know *exactly* what /robots.txt is and how it is used before changing
|
---|
2661 | # the default!
|
---|
2662 | #robots = on
|
---|
2663 |
|
---|
2664 | # It can be useful to make Wget wait between connections. Set this to
|
---|
2665 | # the number of seconds you want Wget to wait.
|
---|
2666 | #wait = 0
|
---|
2667 |
|
---|
2668 | # You can force creating directory structure, even if a single file is being
|
---|
2669 | # retrieved, by setting this to on.
|
---|
2670 | #dirstruct = off
|
---|
2671 |
|
---|
2672 | # You can turn on recursive retrieving by default (don't do this if
|
---|
2673 | # you are not sure you know what it means) by setting this to on.
|
---|
2674 | #recursive = off
|
---|
2675 |
|
---|
2676 | # To always back up file X as X.orig before converting its links (due
|
---|
2677 | # to -k / --convert-links / convert_links = on having been specified),
|
---|
2678 | # set this variable to on:
|
---|
2679 | #backup_converted = off
|
---|
2680 |
|
---|
2681 | # To have Wget follow FTP links from HTML files by default, set this
|
---|
2682 | # to on:
|
---|
2683 | #follow_ftp = off
|
---|
2684 |
|
---|
2685 |
|
---|
2686 | File: wget.info, Node: Examples, Next: Various, Prev: Startup File, Up: Top
|
---|
2687 |
|
---|
2688 | 7 Examples
|
---|
2689 | **********
|
---|
2690 |
|
---|
2691 | The examples are divided into three sections loosely based on their
|
---|
2692 | complexity.
|
---|
2693 |
|
---|
2694 | * Menu:
|
---|
2695 |
|
---|
2696 | * Simple Usage:: Simple, basic usage of the program.
|
---|
2697 | * Advanced Usage:: Advanced tips.
|
---|
2698 | * Very Advanced Usage:: The hairy stuff.
|
---|
2699 |
|
---|
2700 |
|
---|
2701 | File: wget.info, Node: Simple Usage, Next: Advanced Usage, Up: Examples
|
---|
2702 |
|
---|
2703 | 7.1 Simple Usage
|
---|
2704 | ================
|
---|
2705 |
|
---|
2706 | * Say you want to download a URL. Just type:
|
---|
2707 |
|
---|
2708 | wget http://fly.srk.fer.hr/
|
---|
2709 |
|
---|
2710 | * But what will happen if the connection is slow, and the file is
|
---|
2711 | lengthy? The connection will probably fail before the whole file
|
---|
2712 | is retrieved, more than once. In this case, Wget will try getting
|
---|
2713 | the file until it either gets the whole of it, or exceeds the
|
---|
2714 | default number of retries (this being 20). It is easy to change
|
---|
2715 | the number of tries to 45, to ensure that the whole file will
|
---|
2716 | arrive safely:
|
---|
2717 |
|
---|
2718 | wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
|
---|
2719 |
|
---|
2720 | * Now let's leave Wget to work in the background, and write its
|
---|
2721 | progress to log file `log'. It is tiring to type `--tries', so we
|
---|
2722 | shall use `-t'.
|
---|
2723 |
|
---|
2724 | wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
|
---|
2725 |
|
---|
2726 | The ampersand at the end of the line makes sure that Wget works in
|
---|
2727 | the background. To unlimit the number of retries, use `-t inf'.
|
---|
2728 |
|
---|
2729 | * The usage of FTP is just as simple. Wget will take care of login and
|
---|
2730 | password.
|
---|
2731 |
|
---|
2732 | wget ftp://gnjilux.srk.fer.hr/welcome.msg
|
---|
2733 |
|
---|
2734 | * If you specify a directory, Wget will retrieve the directory
|
---|
2735 | listing, parse it and convert it to HTML. Try:
|
---|
2736 |
|
---|
2737 | wget ftp://ftp.gnu.org/pub/gnu/
|
---|
2738 | links index.html
|
---|
2739 |
|
---|
2740 |
|
---|
2741 | File: wget.info, Node: Advanced Usage, Next: Very Advanced Usage, Prev: Simple Usage, Up: Examples
|
---|
2742 |
|
---|
2743 | 7.2 Advanced Usage
|
---|
2744 | ==================
|
---|
2745 |
|
---|
2746 | * You have a file that contains the URLs you want to download? Use
|
---|
2747 | the `-i' switch:
|
---|
2748 |
|
---|
2749 | wget -i FILE
|
---|
2750 |
|
---|
2751 | If you specify `-' as file name, the URLs will be read from
|
---|
2752 | standard input.
|
---|
2753 |
|
---|
2754 | * Create a five levels deep mirror image of the GNU web site, with
|
---|
2755 | the same directory structure the original has, with only one try
|
---|
2756 | per document, saving the log of the activities to `gnulog':
|
---|
2757 |
|
---|
2758 | wget -r -t 1 http://www.gnu.org/ -o gnulog
|
---|
2759 |
|
---|
2760 | * The same as the above, but convert the links in the HTML files to
|
---|
2761 | point to local files, so you can view the documents off-line:
|
---|
2762 |
|
---|
2763 | wget --convert-links -r -t 1 http://www.gnu.org/ -o gnulog
|
---|
2764 |
|
---|
2765 | * Retrieve only one HTML page, but make sure that all the elements
|
---|
2766 | needed for the page to be displayed, such as inline images and
|
---|
2767 | external style sheets, are also downloaded. Also make sure the
|
---|
2768 | downloaded page references the downloaded links.
|
---|
2769 |
|
---|
2770 | wget -p --convert-links http://www.server.com/dir/page.html
|
---|
2771 |
|
---|
2772 | The HTML page will be saved to `www.server.com/dir/page.html', and
|
---|
2773 | the images, stylesheets, etc., somewhere under `www.server.com/',
|
---|
2774 | depending on where they were on the remote server.
|
---|
2775 |
|
---|
2776 | * The same as the above, but without the `www.server.com/' directory.
|
---|
2777 | In fact, I don't want to have all those random server directories
|
---|
2778 | anyway--just save _all_ those files under a `download/'
|
---|
2779 | subdirectory of the current directory.
|
---|
2780 |
|
---|
2781 | wget -p --convert-links -nH -nd -Pdownload \
|
---|
2782 | http://www.server.com/dir/page.html
|
---|
2783 |
|
---|
2784 | * Retrieve the index.html of `www.lycos.com', showing the original
|
---|
2785 | server headers:
|
---|
2786 |
|
---|
2787 | wget -S http://www.lycos.com/
|
---|
2788 |
|
---|
2789 | * Save the server headers with the file, perhaps for post-processing.
|
---|
2790 |
|
---|
2791 | wget --save-headers http://www.lycos.com/
|
---|
2792 | more index.html
|
---|
2793 |
|
---|
2794 | * Retrieve the first two levels of `wuarchive.wustl.edu', saving them
|
---|
2795 | to `/tmp'.
|
---|
2796 |
|
---|
2797 | wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
|
---|
2798 |
|
---|
2799 | * You want to download all the GIFs from a directory on an HTTP
|
---|
2800 | server. You tried `wget http://www.server.com/dir/*.gif', but that
|
---|
2801 | didn't work because HTTP retrieval does not support globbing. In
|
---|
2802 | that case, use:
|
---|
2803 |
|
---|
2804 | wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
|
---|
2805 |
|
---|
2806 | More verbose, but the effect is the same. `-r -l1' means to
|
---|
2807 | retrieve recursively (*note Recursive Download::), with maximum
|
---|
2808 | depth of 1. `--no-parent' means that references to the parent
|
---|
2809 | directory are ignored (*note Directory-Based Limits::), and
|
---|
2810 | `-A.gif' means to download only the GIF files. `-A "*.gif"' would
|
---|
2811 | have worked too.
|
---|
2812 |
|
---|
2813 | * Suppose you were in the middle of downloading, when Wget was
|
---|
2814 | interrupted. Now you do not want to clobber the files already
|
---|
2815 | present. The command would be:
|
---|
2816 |
|
---|
2817 | wget -nc -r http://www.gnu.org/
|
---|
2818 |
|
---|
2819 | * If you want to encode your own username and password to HTTP or
|
---|
2820 | FTP, use the appropriate URL syntax (*note URL Format::).
|
---|
2821 |
|
---|
2822 | wget ftp://hniksic:mypassword@unix.server.com/.emacs
|
---|
2823 |
|
---|
2824 | Note, however, that this usage is not advisable on multi-user
|
---|
2825 | systems because it reveals your password to anyone who looks at
|
---|
2826 | the output of `ps'.
|
---|
2827 |
|
---|
2828 | * You would like the output documents to go to standard output
|
---|
2829 | instead of to files?
|
---|
2830 |
|
---|
2831 | wget -O - http://jagor.srce.hr/ http://www.srce.hr/
|
---|
2832 |
|
---|
2833 | You can also combine the two options and make pipelines to
|
---|
2834 | retrieve the documents from remote hotlists:
|
---|
2835 |
|
---|
2836 | wget -O - http://cool.list.com/ | wget --force-html -i -
|
---|
2837 |
|
---|
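The difference between `-A.gif' and `-A "*.gif"' in the GIF example above can be approximated in the shell. This is a sketch of the rule, not Wget's code: it assumes that a pattern containing wildcard characters is matched against the whole file name, and that any other pattern is treated as a suffix. The function name `accepted' is hypothetical.

```shell
# accepted NAME PATTERN: succeed if NAME would pass a single -A PATTERN.
# Assumed rule: wildcard patterns match the whole name, others a suffix.
accepted() {
    name=$1
    pattern=$2
    case $pattern in
        *\**|*\?*|*\[*)
            # Pattern has a wildcard: glob-match the entire name.
            case $name in
                $pattern) return 0 ;;
            esac ;;
        *)
            # No wildcard: accept names that end with the pattern.
            case $name in
                *"$pattern") return 0 ;;
            esac ;;
    esac
    return 1
}

for p in .gif '*.gif'; do
    for f in logo.gif photo.jpg; do
        if accepted "$f" "$p"; then
            printf '%s\n' "-A $p accepts $f"
        else
            printf '%s\n' "-A $p rejects $f"
        fi
    done
done
```

Both spellings accept `logo.gif' and reject `photo.jpg', which is why either form works in that example.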
2838 |
|
---|
2839 | File: wget.info, Node: Very Advanced Usage, Prev: Advanced Usage, Up: Examples
|
---|
2840 |
|
---|
2841 | 7.3 Very Advanced Usage
|
---|
2842 | =======================
|
---|
2843 |
|
---|
2844 | * If you wish Wget to keep a mirror of a page (or FTP
|
---|
2845 | subdirectories), use `--mirror' (`-m'), which is the shorthand for
|
---|
2846 | `-r -l inf -N'. You can put Wget in the crontab file asking it to
|
---|
2847 | recheck a site each Sunday:
|
---|
2848 |
|
---|
2849 | crontab
|
---|
2850 | 0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
|
---|
2851 |
|
---|
2852 | * In addition to the above, you want the links to be converted for
|
---|
2853 | local viewing. But, after having read this manual, you know that
|
---|
2854 | link conversion doesn't play well with timestamping, so you also
|
---|
2855 | want Wget to back up the original HTML files before the
|
---|
2856 | conversion. Wget invocation would look like this:
|
---|
2857 |
|
---|
2858 | wget --mirror --convert-links --backup-converted \
|
---|
2859 | http://www.gnu.org/ -o /home/me/weeklog
|
---|
2860 |
|
---|
2861 | * But you've also noticed that local viewing doesn't work all that
|
---|
2862 | well when HTML files are saved under extensions other than `.html',
|
---|
2863 | perhaps because they were served as `index.cgi'. So you'd like
|
---|
2864 | Wget to rename all the files served with content-type `text/html'
|
---|
2865 | or `application/xhtml+xml' to `NAME.html'.
|
---|
2866 |
|
---|
2867 | wget --mirror --convert-links --backup-converted \
|
---|
2868 | --html-extension -o /home/me/weeklog \
|
---|
2869 | http://www.gnu.org/
|
---|
2870 |
|
---|
2871 | Or, with less typing:
|
---|
2872 |
|
---|
2873 | wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
|
---|
2874 |
|
---|
2875 |
|
---|
2876 | File: wget.info, Node: Various, Next: Appendices, Prev: Examples, Up: Top
|
---|
2877 |
|
---|
2878 | 8 Various
|
---|
2879 | *********
|
---|
2880 |
|
---|
2881 | This chapter contains all the stuff that could not fit anywhere else.
|
---|
2882 |
|
---|
2883 | * Menu:
|
---|
2884 |
|
---|
2885 | * Proxies:: Support for proxy servers
|
---|
2886 | * Distribution:: Getting the latest version.
|
---|
2887 | * Mailing List:: Wget mailing list for announcements and discussion.
|
---|
2888 | * Reporting Bugs:: How and where to report bugs.
|
---|
2889 | * Portability:: The systems Wget works on.
|
---|
2890 | * Signals:: Signal-handling performed by Wget.
|
---|
2891 |
|
---|
2892 |
|
---|
2893 | File: wget.info, Node: Proxies, Next: Distribution, Up: Various
|
---|
2894 |
|
---|
2895 | 8.1 Proxies
|
---|
2896 | ===========
|
---|
2897 |
|
---|
2898 | "Proxies" are special-purpose HTTP servers designed to transfer data
|
---|
2899 | from remote servers to local clients. One typical use of proxies is
|
---|
2900 | lightening network load for users behind a slow connection. This is
|
---|
2901 | achieved by channeling all HTTP and FTP requests through the proxy
|
---|
2902 | which caches the transferred data. When a cached resource is requested
|
---|
2903 | again, the proxy will return the data from cache. Another use for proxies
|
---|
2904 | is for companies that separate (for security reasons) their internal
|
---|
2905 | networks from the rest of the Internet. In order to obtain information
|
---|
2906 | from the Web, their users connect and retrieve remote data using an
|
---|
2907 | authorized proxy.
|
---|
2908 |
|
---|
2909 | Wget supports proxies for both HTTP and FTP retrievals. The
|
---|
2910 | standard way to specify proxy location, which Wget recognizes, is using
|
---|
2911 | the following environment variables:
|
---|
2912 |
|
---|
2913 | `http_proxy'
|
---|
2914 | This variable should contain the URL of the proxy for HTTP
|
---|
2915 | connections.
|
---|
2916 |
|
---|
2917 | `ftp_proxy'
|
---|
2918 | This variable should contain the URL of the proxy for FTP
|
---|
2919 | connections. It is quite common that `http_proxy' and `ftp_proxy' are
|
---|
2920 | set to the same URL.
|
---|
2921 |
|
---|
2922 | `no_proxy'
|
---|
2923 | This variable should contain a comma-separated list of domain
|
---|
2924 | extensions proxy should _not_ be used for. For instance, if the
|
---|
2925 | value of `no_proxy' is `.mit.edu', proxy will not be used to
|
---|
2926 | retrieve documents from MIT.
|
---|
2927 |
|
---|
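How a `no_proxy' list selects hosts can be sketched in the shell. The sketch assumes plain suffix matching of each comma-separated entry against the host name; the function name `bypasses_proxy' and the second list entry are illustrative, and the script neither exports the variable nor contacts any server.

```shell
# bypasses_proxy HOST LIST: succeed if HOST ends with any suffix in the
# comma-separated LIST (assumed suffix matching, for illustration only).
bypasses_proxy() {
    host=$1
    old_ifs=$IFS
    IFS=,
    for suffix in $2; do
        case $host in
            *"$suffix") IFS=$old_ifs; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

no_proxy=.mit.edu,.example.org
for h in web.mit.edu www.gnu.org; do
    if bypasses_proxy "$h" "$no_proxy"; then
        echo "$h: direct"
    else
        echo "$h: via proxy"
    fi
done
```

With this list a host such as `web.mit.edu' is fetched directly, while `www.gnu.org' still goes through the proxy.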
2928 | In addition to the environment variables, proxy location and settings
|
---|
2929 | may be specified from within Wget itself.
|
---|
2930 |
|
---|
2931 | `--no-proxy'
|
---|
2932 | `use_proxy = on/off'
|
---|
2933 | This option and the corresponding command may be used to suppress
|
---|
2934 | the use of proxy, even if the appropriate environment variables
|
---|
2935 | are set.
|
---|
2936 |
|
---|
2937 | `http_proxy = URL'
|
---|
2938 | `ftp_proxy = URL'
|
---|
2939 | `no_proxy = STRING'
|
---|
2940 | These startup file variables allow you to override the proxy
|
---|
2941 | settings specified by the environment.
|
---|
2942 |
|
---|
2943 | Some proxy servers require authorization to enable you to use them.
|
---|
2944 | The authorization consists of "username" and "password", which must be
|
---|
2945 | sent by Wget. As with HTTP authorization, several authentication
|
---|
2946 | schemes exist. For proxy authorization only the `Basic' authentication
|
---|
2947 | scheme is currently implemented.
|
---|
2948 |
|
---|
2949 | You may specify your username and password either through the proxy
|
---|
2950 | URL or through the command-line options. Assuming that the company's
|
---|
2951 | proxy is located at `proxy.company.com' at port 8001, a proxy URL
|
---|
2952 | location containing authorization data might look like this:
|
---|
2953 |
|
---|
2954 | http://hniksic:mypassword@proxy.company.com:8001/
|
---|
2955 |
|
---|
2956 | Alternatively, you may use the `--proxy-user' and `--proxy-password'
|
---|
2957 | options, and the equivalent `.wgetrc' settings `proxy_user' and
|
---|
2958 | `proxy_password' to set the proxy username and password.
|
---|
2959 |
|
---|
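One way to keep proxy credentials off the command line is to place them in a private startup file. The sketch below writes such a file with the placeholder host, port, user name and password used above; the temporary file is an assumption made for illustration, and the Wget invocation is shown only as a comment. It relies on the `WGETRC' environment variable naming the startup file to load.

```shell
# Write proxy settings to a private startup file instead of passing
# them on the command line. All values are placeholders.
rcfile=$(mktemp)
chmod 600 "$rcfile"          # readable and writable by the owner only
cat > "$rcfile" <<'EOF'
http_proxy = http://proxy.company.com:8001/
proxy_user = hniksic
proxy_password = mypassword
EOF

# Point Wget at the file for a single run (the URL is illustrative):
#   WGETRC=$rcfile wget http://www.gnu.org/

# Count the credential lines that were written.
grep -c '^proxy_' "$rcfile"
```

Remove the file when it is no longer needed; a permanent setup would put the same three lines in `$HOME/.wgetrc' with equally restrictive permissions.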
2960 |
|
---|
2961 | File: wget.info, Node: Distribution, Next: Mailing List, Prev: Proxies, Up: Various
|
---|
2962 |
|
---|
2963 | 8.2 Distribution
|
---|
2964 | ================
|
---|
2965 |
|
---|
2966 | Like all GNU utilities, the latest version of Wget can be found at the
|
---|
2967 | master GNU archive site ftp.gnu.org, and its mirrors. For example,
|
---|
2968 | Wget 1.10.2 can be found at
|
---|
2969 | `ftp://ftp.gnu.org/pub/gnu/wget/wget-1.10.2.tar.gz'
|
---|
2970 |
|
---|
2971 |
|
---|
2972 | File: wget.info, Node: Mailing List, Next: Reporting Bugs, Prev: Distribution, Up: Various
|
---|
2973 |
|
---|
2974 | 8.3 Mailing List
|
---|
2975 | ================
|
---|
2976 |
|
---|
2977 | There are several Wget-related mailing lists, all hosted by SunSITE.dk.
|
---|
2978 | The general discussion list is at <wget@sunsite.dk>. It is the
|
---|
2979 | preferred place for bug reports and suggestions, as well as for
|
---|
2980 | discussion of development. You are invited to subscribe.
|
---|
2981 |
|
---|
2982 | To subscribe, simply send mail to <wget-subscribe@sunsite.dk> and
|
---|
2983 | follow the instructions. Unsubscribe by mailing to
|
---|
2984 | <wget-unsubscribe@sunsite.dk>. The mailing list is archived at
|
---|
2985 | `http://www.mail-archive.com/wget%40sunsite.dk/' and at
|
---|
2986 | `http://news.gmane.org/gmane.comp.web.wget.general'.
|
---|
2987 |
|
---|
2988 | The second mailing list is at <wget-patches@sunsite.dk>, and is used
|
---|
2989 | to submit patches for review by Wget developers. A "patch" is a
|
---|
2990 | textual representation of change to source code, readable by both
|
---|
2991 | humans and programs. The file `PATCHES' that comes with Wget covers
|
---|
2992 | the creation and submitting of patches in detail. Please don't send
|
---|
2993 | general suggestions or bug reports to `wget-patches'; use it only for
|
---|
2994 | patch submissions.
|
---|
2995 |
|
---|
2996 | To subscribe, simply send mail to <wget-patches-subscribe@sunsite.dk>
|
---|
2997 | and follow the instructions. Unsubscribe by mailing to
|
---|
2998 | <wget-patches-unsubscribe@sunsite.dk>. The mailing list is archived at
|
---|
2999 | `http://news.gmane.org/gmane.comp.web.wget.patches'.
|
---|
3000 |
|
---|
3001 |
|
---|
3002 | File: wget.info, Node: Reporting Bugs, Next: Portability, Prev: Mailing List, Up: Various
|
---|
3003 |
|
---|
3004 | 8.4 Reporting Bugs
|
---|
3005 | ==================
|
---|
3006 |
|
---|
3007 | You are welcome to send bug reports about GNU Wget to
|
---|
3008 | <bug-wget@gnu.org>.
|
---|
3009 |
|
---|
3010 | Before actually submitting a bug report, please try to follow a few
|
---|
3011 | simple guidelines.
|
---|
3012 |
|
---|
3013 | 1. Please try to ascertain that the behavior you see really is a bug.
|
---|
3014 | If Wget crashes, it's a bug. If Wget does not behave as
|
---|
3015 | documented, it's a bug. If things work strangely, but you are not
|
---|
3016 | sure about the way they are supposed to work, it might well be a
|
---|
3017 | bug.
|
---|
3018 |
|
---|
3019 | 2. Try to repeat the bug in as simple circumstances as possible.
|
---|
3020 | E.g. if Wget crashes while downloading `wget -rl0 -kKE -t5 -Y0
|
---|
3021 | http://yoyodyne.com -o /tmp/log', you should try to see if the
|
---|
3022 | crash is repeatable, and if it will occur with a simpler set of
|
---|
3023 | options. You might even try to start the download at the page
|
---|
3024 | where the crash occurred to see if that page somehow triggered the
|
---|
3025 | crash.
|
---|
3026 |
|
---|
3027 | Also, while I will probably be interested to know the contents of
|
---|
3028 | your `.wgetrc' file, just dumping it into the debug message is
|
---|
3029 | probably a bad idea. Instead, you should first try to see if the
|
---|
3030 | bug repeats with `.wgetrc' moved out of the way. Only if it turns
|
---|
3031 | out that `.wgetrc' settings affect the bug, mail me the relevant
|
---|
3032 | parts of the file.
|
---|
3033 |
|
---|
3034 | 3. Please start Wget with the `-d' option and send us the resulting
|
---|
3035 | output (or relevant parts thereof). If Wget was compiled without
|
---|
3036 | debug support, recompile it--it is _much_ easier to trace bugs
|
---|
3037 | with debug support on.
|
---|
3038 |
|
---|
3039 | Note: please make sure to remove any potentially sensitive
|
---|
3040 | information from the debug log before sending it to the bug
|
---|
3041 | address. The `-d' won't go out of its way to collect sensitive
|
---|
3042 | information, but the log _will_ contain a fairly complete
|
---|
3043 | transcript of Wget's communication with the server, which may
|
---|
3044 | include passwords and pieces of downloaded data. Since the bug
|
---|
3045 | address is publicly archived, you may assume that all bug
|
---|
3046 | reports are visible to the public.
|
---|
3047 |
|
---|
3048 | 4. If Wget has crashed, try to run it in a debugger, e.g. `gdb `which
|
---|
3049 | wget` core' and type `where' to get the backtrace. This may not
|
---|
3050 | work if the system administrator has disabled core files, but it is
|
---|
3051 | safe to try.
|
---|
3052 |
|
---|
3053 |
|
---|
3054 | File: wget.info, Node: Portability, Next: Signals, Prev: Reporting Bugs, Up: Various
|
---|
3055 |
|
---|
3056 | 8.5 Portability
|
---|
3057 | ===============
|
---|
3058 |
|
---|
3059 | Like all GNU software, Wget works on the GNU system. However, since it
|
---|
3060 | uses GNU Autoconf for building and configuring, and mostly avoids using
|
---|
3061 | "special" features of any particular Unix, it should compile (and work)
|
---|
3062 | on all common Unix flavors.
|
---|
3063 |
|
---|
3064 | Various Wget versions have been compiled and tested under many kinds
|
---|
3065 | of Unix systems, including GNU/Linux, Solaris, SunOS 4.x, OSF (aka
|
---|
3066 | Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some of
|
---|
3067 | those systems are no longer in widespread use and may not be able to
|
---|
3068 | support recent versions of Wget. If Wget fails to compile on your
|
---|
3069 | system, we would like to know about it.
|
---|
3070 |
|
---|
3071 | Thanks to kind contributors, this version of Wget compiles and works
|
---|
3072 | on 32-bit Microsoft Windows platforms. It has been compiled
|
---|
3073 | successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC
|
---|
3074 | compilers. Naturally, it is crippled of some features available on
|
---|
3075 | Unix, but it should work as a substitute for people stuck with Windows.
|
---|
3076 | Note that Windows-specific portions of Wget are not guaranteed to be
|
---|
3077 | supported in the future, although this has been the case in practice
|
---|
3078 | for many years now. All questions and problems in Windows usage should
|
---|
3079 | be reported to the Wget mailing list at <wget@sunsite.dk> where the
|
---|
3080 | volunteers who maintain the Windows-related features might look at them.
|
---|
3081 |
|
---|
3082 |
|
---|
3083 | File: wget.info, Node: Signals, Prev: Portability, Up: Various
|
---|
3084 |
|
---|
3085 | 8.6 Signals
|
---|
3086 | ===========
|
---|
3087 |
|
---|
3088 | Since the purpose of Wget is background work, it catches the hangup
|
---|
3089 | signal (`SIGHUP') and ignores it. If the output was on standard
|
---|
3090 | output, it will be redirected to a file named `wget-log'. Otherwise,
|
---|
3091 | `SIGHUP' is ignored. This is convenient when you wish to redirect the
|
---|
3092 | output of Wget after having started it.
|
---|
3093 |
|
---|
3094 | $ wget http://www.gnus.org/dist/gnus.tar.gz &
|
---|
3095 | ...
|
---|
3096 | $ kill -HUP %%
|
---|
3097 | SIGHUP received, redirecting output to `wget-log'.
|
---|
3098 |
|
---|
3099 | Other than that, Wget will not try to interfere with signals in any
|
---|
3100 | way. `C-c', `kill -TERM' and `kill -KILL' should kill it alike.
|
---|
3101 |
|
---|
3102 |
|
---|
3103 | File: wget.info, Node: Appendices, Next: Copying, Prev: Various, Up: Top
|
---|
3104 |
|
---|
3105 | 9 Appendices
|
---|
3106 | ************
|
---|
3107 |
|
---|
3108 | This chapter contains some references I consider useful.
|
---|
3109 |
|
---|
3110 | * Menu:
|
---|
3111 |
|
---|
3112 | * Robot Exclusion:: Wget's support for RES.
|
---|
3113 | * Security Considerations:: Security with Wget.
|
---|
3114 | * Contributors:: People who helped.
|
---|
3115 |
|
---|
3116 |
|
---|
3117 | File: wget.info, Node: Robot Exclusion, Next: Security Considerations, Up: Appendices
|
---|
3118 |
|
---|
3119 | 9.1 Robot Exclusion
|
---|
3120 | ===================
|
---|
3121 |
|
---|
3122 | It is extremely easy to make Wget wander aimlessly around a web site,
|
---|
3123 | sucking up all the available data in the process. `wget -r SITE', and you're
|
---|
3124 | set. Great? Not for the server admin.
|
---|
3125 |
|
---|
3126 | As long as Wget is only retrieving static pages, and doing it at a
|
---|
3127 | reasonable rate (see the `--wait' option), there's not much of a
|
---|
3128 | problem. The trouble is that Wget can't tell the difference between the
|
---|
3129 | smallest static page and the most demanding CGI. A site I know has a
|
---|
3130 | section handled by a CGI Perl script that converts Info files to HTML on
|
---|
3131 | the fly. The script is slow, but works well enough for human users
|
---|
3132 | viewing an occasional Info file. However, when someone's recursive Wget
|
---|
3133 | download stumbles upon the index page that links to all the Info files
|
---|
3134 | through the script, the system is brought to its knees without providing
|
---|
3135 | anything useful to the user (this task of converting Info files could be
|
---|
3136 | done locally, and access to Info documentation for all installed GNU
|
---|
3137 | software on a system is available from the `info' command).
|
---|
3138 |
|
---|
3139 | To avoid this kind of accident, as well as to preserve privacy for
|
---|
3140 | documents that need to be protected from well-behaved robots, the
|
---|
3141 | concept of "robot exclusion" was invented. The idea is that the server
|
---|
3142 | administrators and document authors can specify which portions of the
|
---|
3143 | site they wish to protect from robots and those to which they will permit access.
|
---|
3144 |
|
---|
3145 | The most popular mechanism, and the de facto standard supported by
|
---|
3146 | all the major robots, is the "Robots Exclusion Standard" (RES) written
|
---|
3147 | by Martijn Koster et al. in 1994. It specifies the format of a text
|
---|
3148 | file containing directives that instruct the robots which URL paths to
|
---|
3149 | avoid. To be found by the robots, the specifications must be placed in
|
---|
3150 | `/robots.txt' in the server root, which the robots are expected to
|
---|
3151 | download and parse.
|
---|
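As a rough illustration of the file format described above, the following shell snippet writes a small `robots.txt' into a scratch directory. The directory name and the two excluded paths (`/cgi-bin/' and `/private/') are hypothetical; on a real server the file would live in the document root so that it is served as `/robots.txt'.

```shell
# Scratch directory standing in for the server's document root (hypothetical).
DOCROOT="${TMPDIR:-/tmp}/docroot-demo"
mkdir -p "$DOCROOT"

# Ask all robots to stay out of two example paths.
cat > "$DOCROOT/robots.txt" <<'EOF'
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
EOF
```

A robot that honors the standard would skip any URL whose path begins with one of the listed prefixes.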
3152 |
|
---|
3153 | Although Wget is not a web robot in the strictest sense of the word,
|
---|
3154 | it can download large parts of the site without the user's
|
---|
3155 | intervention to download each individual page. Because of that, Wget
|
---|
3156 | honors RES when downloading recursively. For instance, when you issue:
|
---|
3157 |
|
---|
3158 | wget -r http://www.server.com/
|
---|
3159 |
|
---|
3160 | First the index of `www.server.com' will be downloaded. If Wget
|
---|
3161 | finds that it wants to download more documents from that server, it will
|
---|
3162 | request `http://www.server.com/robots.txt' and, if found, use it for
|
---|
3163 | further downloads. `robots.txt' is loaded only once per server.
|
---|
3164 |
|
---|
3165 | Until version 1.8, Wget supported the first version of the standard,
|
---|
3166 | written by Martijn Koster in 1994 and available at
|
---|
3167 | `http://www.robotstxt.org/wc/norobots.html'. As of version 1.8, Wget
|
---|
3168 | has supported the additional directives specified in the internet draft
|
---|
3169 | `<draft-koster-robots-00.txt>' titled "A Method for Web Robots
|
---|
3170 | Control". The draft, which has, as far as I know, never made it to an RFC,
|
---|
3171 | is available at `http://www.robotstxt.org/wc/norobots-rfc.txt'.
|
---|
3172 |
|
---|
3173 | This manual no longer includes the text of the Robot Exclusion
|
---|
3174 | Standard.
|
---|
3175 |
|
---|
3176 | The second, lesser-known mechanism enables the author of an individual
|
---|
3177 | document to specify whether they want the links from the file to be
|
---|
3178 | followed by a robot. This is achieved using the `META' tag, like this:
|
---|
3179 |
|
---|
3180 | <meta name="robots" content="nofollow">
|
---|
3181 |
|
---|
3182 | This is explained in some detail at
|
---|
3183 | `http://www.robotstxt.org/wc/meta-user.html'. Wget supports this
|
---|
3184 | method of robot exclusion in addition to the usual `/robots.txt'
|
---|
3185 | exclusion.
|
---|
3186 |
|
---|
3187 | If you know what you are doing and really, really wish to turn off the
|
---|
3188 | robot exclusion, set the `robots' variable to `off' in your `.wgetrc'.
|
---|
3189 | You can achieve the same effect from the command line using the `-e'
|
---|
3190 | switch, e.g. `wget -e robots=off URL...'.
|
---|
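A minimal sketch of the two approaches just described. It writes the setting to a scratch file rather than to your real `.wgetrc'; the scratch path and the function name `fetch_ignoring_robots' are made up for illustration, and the two-second `--wait' is an arbitrary courtesy value, not a recommendation from this manual.

```shell
# Scratch startup file (hypothetical path); normally this line goes in ~/.wgetrc.
DEMO_WGETRC="${TMPDIR:-/tmp}/wgetrc-demo"
printf 'robots = off\n' > "$DEMO_WGETRC"

# Equivalent one-off form using -e; defined as a function so nothing is
# downloaded until you call it with a URL of your own.
fetch_ignoring_robots() {
    wget -e robots=off --wait=2 -r "$1"
}

# To use the scratch file instead:  WGETRC="$DEMO_WGETRC" wget -r URL
```

Setting the `WGETRC' environment variable points Wget at an alternate startup file, which keeps the experiment away from your regular configuration.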
3191 |
|
---|
3192 |
|
---|
3193 | File: wget.info, Node: Security Considerations, Next: Contributors, Prev: Robot Exclusion, Up: Appendices
|
---|
3194 |
|
---|
3195 | 9.2 Security Considerations
|
---|
3196 | ===========================
|
---|
3197 |
|
---|
3198 | When using Wget, you must be aware that it sends unencrypted passwords
|
---|
3199 | through the network, which may present a security problem. Here are the
|
---|
3200 | main issues, and some solutions.
|
---|
3201 |
|
---|
3202 | 1. The passwords on the command line are visible using `ps'. The best
|
---|
3203 | way around it is to use `wget -i -' and feed the URLs to Wget's
|
---|
3204 | standard input, each on a separate line, terminated by `C-d'.
|
---|
3205 | Another workaround is to use `.netrc' to store passwords; however,
|
---|
3206 | storing unencrypted passwords is also considered a security risk.
|
---|
3207 |
|
---|
3208 | 2. Using the insecure "basic" authentication scheme, unencrypted
|
---|
3209 | passwords are transmitted through the network routers and gateways.
|
---|
3210 |
|
---|
3211 | 3. The FTP passwords are also in no way encrypted. There is no good
|
---|
3212 | solution for this at the moment.
|
---|
3213 |
|
---|
3214 | 4. Although the "normal" output of Wget tries to hide the passwords,
|
---|
3215 | debugging logs show them, in all forms. This problem is avoided by
|
---|
3216 | being careful when you send debug logs (yes, even when you send
|
---|
3217 | them to me).
|
---|
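The workarounds from item 1 might look like the following sketch. The host name, user name, and password are placeholders, and the file is written to a scratch directory rather than to `~/.netrc'. Restricting the file mode does not encrypt anything; it only keeps other local users from reading the file.

```shell
# Scratch directory (hypothetical); a real file would be ~/.netrc.
DEMO_DIR="${TMPDIR:-/tmp}/wget-netrc-demo"
mkdir -p "$DEMO_DIR"

# Create the file under a restrictive umask so it is never world-readable.
(
    umask 077
    printf 'machine ftp.example.com login alice password s3cret\n' \
        > "$DEMO_DIR/netrc"
)
chmod 600 "$DEMO_DIR/netrc"

# Keep URLs (and any passwords embedded in them) off the command line
# by having Wget read them from standard input.
fetch_urls_from_stdin() {
    wget -i -
}
# e.g.  fetch_urls_from_stdin < "$DEMO_DIR/urls.txt"
```

Reading the URL list from a file or a pipe means the credentials never appear in Wget's argument list, which is what `ps' displays.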
3218 |
|
---|
3219 |
|
---|
3220 | File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
|
---|
3221 |
|
---|
3222 | 9.3 Contributors
|
---|
3223 | ================
|
---|
3224 |
|
---|
3225 | GNU Wget was written by Hrvoje Niksic <hniksic@xemacs.org>. However,
|
---|
3226 | its development could never have gone as far as it has, were it not for
|
---|
3227 | the help of many people, either with bug reports, feature proposals,
|
---|
3228 | patches, or letters saying "Thanks!".
|
---|
3229 |
|
---|
3230 | Special thanks go to the following people (in no particular order):
|
---|
3231 |
|
---|
3232 | * Karsten Thygesen--donated system resources such as the mailing
|
---|
3233 | list, web space, and FTP space, along with a lot of time to make
|
---|
3234 | these actually work.
|
---|
3235 |
|
---|
3236 | * Shawn McHorse--bug reports and patches.
|
---|
3237 |
|
---|
3238 | * Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization. Lots of
|
---|
3239 | portability fixes.
|
---|
3240 |
|
---|
3241 | * Gordon Matzigkeit--`.netrc' support.
|
---|
3242 |
|
---|
3243 | * Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
|
---|
3244 | suggestions and "philosophical" discussions.
|
---|
3245 |
|
---|
3246 | * Darko Budor--initial port to Windows.
|
---|
3247 |
|
---|
3248 | * Antonio Rosella--help and suggestions, plus the Italian
|
---|
3249 | translation.
|
---|
3250 |
|
---|
3251 | * Tomislav Petrovic, Mario Mikocevic--many bug reports and
|
---|
3252 | suggestions.
|
---|
3253 |
|
---|
3254 | * Francois Pinard--many thorough bug reports and discussions.
|
---|
3255 |
|
---|
3256 | * Karl Eichwalder--lots of help with internationalization and other
|
---|
3257 | things.
|
---|
3258 |
|
---|
3259 | * Junio Hamano--donated support for Opie and HTTP `Digest'
|
---|
3260 | authentication.
|
---|
3261 |
|
---|
3262 | * The people who provided donations for development, including Brian
|
---|
3263 | Gough.
|
---|
3264 |
|
---|
3265 | The following people have provided patches, bug/build reports, useful
|
---|
3266 | suggestions, beta testing services, fan mail and all the other things
|
---|
3267 | that make maintenance so much fun:
|
---|
3268 |
|
---|
3269 | Ian Abbott, Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron,
|
---|
3270 | Roger Beeman, Dan Berger, T. Bharath, Christian Biere, Paul Bludov,
|
---|
3271 | Daniel Bodea, Mark Boyns, John Burden, Wanderlei Cavassin, Gilles Cedoc,
|
---|
3272 | Tim Charron, Noel Cragg, Kristijan Conkas, John Daily, Andreas Damm,
|
---|
3273 | Ahmon Dancy, Andrew Davison, Bertrand Demiddelaer, Andrew Deryabin,
|
---|
3274 | Ulrich Drepper, Marc Duponcheel, Damir Dzeko, Alan Eldridge,
|
---|
3275 | Hans-Andreas Engel, Aleksandar Erkalovic, Andy Eskilsson, Christian
|
---|
3276 | Fraenkel, David Fritz, Charles C. Fu, FUJISHIMA Satsuki, Masashi Fujita,
|
---|
3277 | Howard Gayle, Marcel Gerrits, Lemble Gregory, Hans Grobler, Mathieu
|
---|
3278 | Guillaume, Dan Harkless, Aaron Hawley, Herold Heiko, Jochen Hein, Karl
|
---|
3279 | Heuer, HIROSE Masaaki, Ulf Harnhammar, Gregor Hoffleit, Erik Magnus
|
---|
3280 | Hulthen, Richard Huveneers, Jonas Jensen, Larry Jones, Simon Josefsson,
|
---|
3281 | Mario Juric, Hack Kampbjorn, Const Kaplinsky, Goran Kezunovic, Igor
|
---|
3282 | Khristophorov, Robert Kleine, KOJIMA Haime, Fila Kolodny, Alexander
|
---|
3283 | Kourakos, Martin Kraemer, Sami Krank, Simos KSenitellis, Christian
|
---|
3284 | Lackas, Hrvoje Lacko, Daniel S. Lewart, Nicolas Lichtmeier, Dave Love,
|
---|
3285 | Alexander V. Lukyanov, Thomas Lussnig, Andre Majorel, Aurelien Marchand,
|
---|
3286 | Matthew J. Mellon, Jordan Mendelson, Lin Zhe Min, Jan Minar, Tim Mooney,
|
---|
3287 | Keith Moore, Adam D. Moss, Simon Munton, Charlie Negyesi, R. K. Owen,
|
---|
3288 | Leonid Petrov, Simone Piunno, Andrew Pollock, Steve Pothier, Jan
|
---|
3289 | Prikryl, Marin Purgar, Csaba Raduly, Keith Refson, Bill Richardson,
|
---|
3290 | Tyler Riddle, Tobias Ringstrom, Juan Jose Rodriguez, Maciej W. Rozycki,
|
---|
3291 | Edward J. Sabol, Heinz Salzmann, Robert Schmidt, Nicolas Schodet,
|
---|
3292 | Andreas Schwab, Chris Seawood, Dennis Smit, Toomas Soome, Tage
|
---|
3293 | Stabell-Kulo, Philip Stadermann, Daniel Stenberg, Sven Sternberger,
|
---|
3294 | Markus Strasser, John Summerfield, Szakacsits Szabolcs, Mike Thomas,
|
---|
3295 | Philipp Thomas, Mauro Tortonesi, Dave Turner, Gisle Vanem, Russell
|
---|
3296 | Vincent, Zeljko Vrba, Charles G Waldman, Douglas E. Wegscheid, YAMAZAKI
|
---|
3297 | Makoto, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer.
|
---|
3298 |
|
---|
3299 | Apologies to all whom I accidentally left out, and many thanks to all
|
---|
3300 | the subscribers of the Wget mailing list.
|
---|
3301 |
|
---|
3302 |
|
---|
3303 | File: wget.info, Node: Copying, Next: Concept Index, Prev: Appendices, Up: Top
|
---|
3304 |
|
---|
3305 | 10 Copying
|
---|
3306 | **********
|
---|
3307 |
|
---|
3308 | GNU Wget is licensed under the GNU General Public License (GNU GPL),
|
---|
3309 | which makes it "free software". Please note that "free" in "free
|
---|
3310 | software" refers to liberty, not price. As some people like to point
|
---|
3311 | out, it's the "free" of "free speech", not the "free" of "free beer".
|
---|
3312 |
|
---|
3313 | The exact and legally binding distribution terms are spelled out
|
---|
3314 | below. The GPL guarantees that you have the right (freedom) to run and
|
---|
3315 | change GNU Wget and distribute it to others, and even--if you
|
---|
3316 | want--charge money for doing any of those things. With these rights
|
---|
3317 | comes the obligation to distribute the source code along with the
|
---|
3318 | software and to grant your recipients the same rights and impose the
|
---|
3319 | same restrictions.
|
---|
3320 |
|
---|
3321 | This licensing model is also known as "open source" because it,
|
---|
3322 | among other things, makes sure that all recipients will receive the
|
---|
3323 | source code along with the program, and be able to improve it. The GNU
|
---|
3324 | project prefers the term "free software" for reasons outlined at
|
---|
3325 | `http://www.gnu.org/philosophy/free-software-for-freedom.html'.
|
---|
3326 |
|
---|
3327 | The exact license terms are defined by this paragraph and the GNU
|
---|
3328 | General Public License it refers to:
|
---|
3329 |
|
---|
3330 | GNU Wget is free software; you can redistribute it and/or modify it
|
---|
3331 | under the terms of the GNU General Public License as published by
|
---|
3332 | the Free Software Foundation; either version 2 of the License, or
|
---|
3333 | (at your option) any later version.
|
---|
3334 |
|
---|
3335 | GNU Wget is distributed in the hope that it will be useful, but
|
---|
3336 | WITHOUT ANY WARRANTY; without even the implied warranty of
|
---|
3337 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
---|
3338 | General Public License for more details.
|
---|
3339 |
|
---|
3340 | A copy of the GNU General Public License is included as part of
|
---|
3341 | this manual; if you did not receive it, write to the Free Software
|
---|
3342 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
---|
3343 |
|
---|
3344 | In addition to this, this manual is free in the same sense:
|
---|
3345 |
|
---|
3346 | Permission is granted to copy, distribute and/or modify this
|
---|
3347 | document under the terms of the GNU Free Documentation License,
|
---|
3348 | Version 1.2 or any later version published by the Free Software
|
---|
3349 | Foundation; with the Invariant Sections being "GNU General Public
|
---|
3350 | License" and "GNU Free Documentation License", with no Front-Cover
|
---|
3351 | Texts, and with no Back-Cover Texts. A copy of the license is
|
---|
3352 | included in the section entitled "GNU Free Documentation License".
|
---|
3353 |
|
---|
3354 | The full texts of the GNU General Public License and of the GNU Free
|
---|
3355 | Documentation License are available below.
|
---|
3356 |
|
---|
3357 | * Menu:
|
---|
3358 |
|
---|
3359 | * GNU General Public License::
|
---|
3360 | * GNU Free Documentation License::
|
---|
3361 |
|
---|
3362 |
|
---|
3363 | File: wget.info, Node: GNU General Public License, Next: GNU Free Documentation License, Up: Copying
|
---|
3364 |
|
---|
3365 | 10.1 GNU General Public License
|
---|
3366 | ===============================
|
---|
3367 |
|
---|
3368 | Version 2, June 1991
|
---|
3369 |
|
---|
3370 | Copyright (C) 1989, 1991 Free Software Foundation, Inc.
|
---|
3371 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
|
---|
3372 |
|
---|
3373 | Everyone is permitted to copy and distribute verbatim copies
|
---|
3374 | of this license document, but changing it is not allowed.
|
---|
3375 |
|
---|
3376 | Preamble
|
---|
3377 | ========
|
---|
3378 |
|
---|
3379 | The licenses for most software are designed to take away your freedom
|
---|
3380 | to share and change it. By contrast, the GNU General Public License is
|
---|
3381 | intended to guarantee your freedom to share and change free
|
---|
3382 | software--to make sure the software is free for all its users. This
|
---|
3383 | General Public License applies to most of the Free Software
|
---|
3384 | Foundation's software and to any other program whose authors commit to
|
---|
3385 | using it. (Some other Free Software Foundation software is covered by
|
---|
3386 | the GNU Lesser General Public License instead.) You can apply it to
|
---|
3387 | your programs, too.
|
---|
3388 |
|
---|
3389 | When we speak of free software, we are referring to freedom, not
|
---|
3390 | price. Our General Public Licenses are designed to make sure that you
|
---|
3391 | have the freedom to distribute copies of free software (and charge for
|
---|
3392 | this service if you wish), that you receive source code or can get it
|
---|
3393 | if you want it, that you can change the software or use pieces of it in
|
---|
3394 | new free programs; and that you know you can do these things.
|
---|
3395 |
|
---|
3396 | To protect your rights, we need to make restrictions that forbid
|
---|
3397 | anyone to deny you these rights or to ask you to surrender the rights.
|
---|
3398 | These restrictions translate to certain responsibilities for you if you
|
---|
3399 | distribute copies of the software, or if you modify it.
|
---|
3400 |
|
---|
3401 | For example, if you distribute copies of such a program, whether
|
---|
3402 | gratis or for a fee, you must give the recipients all the rights that
|
---|
3403 | you have. You must make sure that they, too, receive or can get the
|
---|
3404 | source code. And you must show them these terms so they know their
|
---|
3405 | rights.
|
---|
3406 |
|
---|
3407 | We protect your rights with two steps: (1) copyright the software,
|
---|
3408 | and (2) offer you this license which gives you legal permission to copy,
|
---|
3409 | distribute and/or modify the software.
|
---|
3410 |
|
---|
3411 | Also, for each author's protection and ours, we want to make certain
|
---|
3412 | that everyone understands that there is no warranty for this free
|
---|
3413 | software. If the software is modified by someone else and passed on, we
|
---|
3414 | want its recipients to know that what they have is not the original, so
|
---|
3415 | that any problems introduced by others will not reflect on the original
|
---|
3416 | authors' reputations.
|
---|
3417 |
|
---|
3418 | Finally, any free program is threatened constantly by software
|
---|
3419 | patents. We wish to avoid the danger that redistributors of a free
|
---|
3420 | program will individually obtain patent licenses, in effect making the
|
---|
3421 | program proprietary. To prevent this, we have made it clear that any
|
---|
3422 | patent must be licensed for everyone's free use or not licensed at all.
|
---|
3423 |
|
---|
3424 | The precise terms and conditions for copying, distribution and
|
---|
3425 | modification follow.
|
---|
3426 |
|
---|
3427 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
---|
3428 | 0. This License applies to any program or other work which contains a
|
---|
3429 | notice placed by the copyright holder saying it may be distributed
|
---|
3430 | under the terms of this General Public License. The "Program",
|
---|
3431 | below, refers to any such program or work, and a "work based on
|
---|
3432 | the Program" means either the Program or any derivative work under
|
---|
3433 | copyright law: that is to say, a work containing the Program or a
|
---|
3434 | portion of it, either verbatim or with modifications and/or
|
---|
3435 | translated into another language. (Hereinafter, translation is
|
---|
3436 | included without limitation in the term "modification".) Each
|
---|
3437 | licensee is addressed as "you".
|
---|
3438 |
|
---|
3439 | Activities other than copying, distribution and modification are
|
---|
3440 | not covered by this License; they are outside its scope. The act
|
---|
3441 | of running the Program is not restricted, and the output from the
|
---|
3442 | Program is covered only if its contents constitute a work based on
|
---|
3443 | the Program (independent of having been made by running the
|
---|
3444 | Program). Whether that is true depends on what the Program does.
|
---|
3445 |
|
---|
3446 | 1. You may copy and distribute verbatim copies of the Program's
|
---|
3447 | source code as you receive it, in any medium, provided that you
|
---|
3448 | conspicuously and appropriately publish on each copy an appropriate
|
---|
3449 | copyright notice and disclaimer of warranty; keep intact all the
|
---|
3450 | notices that refer to this License and to the absence of any
|
---|
3451 | warranty; and give any other recipients of the Program a copy of
|
---|
3452 | this License along with the Program.
|
---|
3453 |
|
---|
3454 | You may charge a fee for the physical act of transferring a copy,
|
---|
3455 | and you may at your option offer warranty protection in exchange
|
---|
3456 | for a fee.
|
---|
3457 |
|
---|
3458 | 2. You may modify your copy or copies of the Program or any portion
|
---|
3459 | of it, thus forming a work based on the Program, and copy and
|
---|
3460 | distribute such modifications or work under the terms of Section 1
|
---|
3461 | above, provided that you also meet all of these conditions:
|
---|
3462 |
|
---|
3463 | a. You must cause the modified files to carry prominent notices
|
---|
3464 | stating that you changed the files and the date of any change.
|
---|
3465 |
|
---|
3466 | b. You must cause any work that you distribute or publish, that
|
---|
3467 | in whole or in part contains or is derived from the Program
|
---|
3468 | or any part thereof, to be licensed as a whole at no charge
|
---|
3469 | to all third parties under the terms of this License.
|
---|
3470 |
|
---|
3471 | c. If the modified program normally reads commands interactively
|
---|
3472 | when run, you must cause it, when started running for such
|
---|
3473 | interactive use in the most ordinary way, to print or display
|
---|
3474 | an announcement including an appropriate copyright notice and
|
---|
3475 | a notice that there is no warranty (or else, saying that you
|
---|
3476 | provide a warranty) and that users may redistribute the
|
---|
3477 | program under these conditions, and telling the user how to
|
---|
3478 | view a copy of this License. (Exception: if the Program
|
---|
3479 | itself is interactive but does not normally print such an
|
---|
3480 | announcement, your work based on the Program is not required
|
---|
3481 | to print an announcement.)
|
---|
3482 |
|
---|
3483 | These requirements apply to the modified work as a whole. If
|
---|
3484 | identifiable sections of that work are not derived from the
|
---|
3485 | Program, and can be reasonably considered independent and separate
|
---|
3486 | works in themselves, then this License, and its terms, do not
|
---|
3487 | apply to those sections when you distribute them as separate
|
---|
3488 | works. But when you distribute the same sections as part of a
|
---|
3489 | whole which is a work based on the Program, the distribution of
|
---|
3490 | the whole must be on the terms of this License, whose permissions
|
---|
3491 | for other licensees extend to the entire whole, and thus to each
|
---|
3492 | and every part regardless of who wrote it.
|
---|
3493 |
|
---|
3494 | Thus, it is not the intent of this section to claim rights or
|
---|
3495 | contest your rights to work written entirely by you; rather, the
|
---|
3496 | intent is to exercise the right to control the distribution of
|
---|
3497 | derivative or collective works based on the Program.
|
---|
3498 |
|
---|
3499 | In addition, mere aggregation of another work not based on the
|
---|
3500 | Program with the Program (or with a work based on the Program) on
|
---|
3501 | a volume of a storage or distribution medium does not bring the
|
---|
3502 | other work under the scope of this License.
|
---|
3503 |
|
---|
3504 | 3. You may copy and distribute the Program (or a work based on it,
|
---|
3505 | under Section 2) in object code or executable form under the terms
|
---|
3506 | of Sections 1 and 2 above provided that you also do one of the
|
---|
3507 | following:
|
---|
3508 |
|
---|
3509 | a. Accompany it with the complete corresponding machine-readable
|
---|
3510 | source code, which must be distributed under the terms of
|
---|
3511 | Sections 1 and 2 above on a medium customarily used for
|
---|
3512 | software interchange; or,
|
---|
3513 |
|
---|
3514 | b. Accompany it with a written offer, valid for at least three
|
---|
3515 | years, to give any third party, for a charge no more than your
|
---|
3516 | cost of physically performing source distribution, a complete
|
---|
3517 | machine-readable copy of the corresponding source code, to be
|
---|
3518 | distributed under the terms of Sections 1 and 2 above on a
|
---|
3519 | medium customarily used for software interchange; or,
|
---|
3520 |
|
---|
3521 | c. Accompany it with the information you received as to the offer
|
---|
3522 | to distribute corresponding source code. (This alternative is
|
---|
3523 | allowed only for noncommercial distribution and only if you
|
---|
3524 | received the program in object code or executable form with
|
---|
3525 | such an offer, in accord with Subsection b above.)
|
---|
3526 |
|
---|
3527 | The source code for a work means the preferred form of the work for
|
---|
3528 | making modifications to it. For an executable work, complete
|
---|
3529 | source code means all the source code for all modules it contains,
|
---|
3530 | plus any associated interface definition files, plus the scripts
|
---|
3531 | used to control compilation and installation of the executable.
|
---|
3532 | However, as a special exception, the source code distributed need
|
---|
3533 | not include anything that is normally distributed (in either
|
---|
3534 | source or binary form) with the major components (compiler,
|
---|
3535 | kernel, and so on) of the operating system on which the executable
|
---|
3536 | runs, unless that component itself accompanies the executable.
|
---|
3537 |
|
---|
3538 | If distribution of executable or object code is made by offering
|
---|
3539 | access to copy from a designated place, then offering equivalent
|
---|
3540 | access to copy the source code from the same place counts as
|
---|
3541 | distribution of the source code, even though third parties are not
|
---|
3542 | compelled to copy the source along with the object code.
|
---|
3543 |
|
---|
3544 | 4. You may not copy, modify, sublicense, or distribute the Program
|
---|
3545 | except as expressly provided under this License. Any attempt
|
---|
3546 | otherwise to copy, modify, sublicense or distribute the Program is
|
---|
3547 | void, and will automatically terminate your rights under this
|
---|
3548 | License. However, parties who have received copies, or rights,
|
---|
3549 | from you under this License will not have their licenses
|
---|
3550 | terminated so long as such parties remain in full compliance.
|
---|
3551 |
|
---|
3552 | 5. You are not required to accept this License, since you have not
|
---|
3553 | signed it. However, nothing else grants you permission to modify
|
---|
3554 | or distribute the Program or its derivative works. These actions
|
---|
3555 | are prohibited by law if you do not accept this License.
|
---|
3556 | Therefore, by modifying or distributing the Program (or any work
|
---|
3557 | based on the Program), you indicate your acceptance of this
|
---|
3558 | License to do so, and all its terms and conditions for copying,
|
---|
3559 | distributing or modifying the Program or works based on it.
|
---|
3560 |
|
---|
3561 | 6. Each time you redistribute the Program (or any work based on the
|
---|
3562 | Program), the recipient automatically receives a license from the
|
---|
3563 | original licensor to copy, distribute or modify the Program
|
---|
3564 | subject to these terms and conditions. You may not impose any
|
---|
3565 | further restrictions on the recipients' exercise of the rights
|
---|
3566 | granted herein. You are not responsible for enforcing compliance
|
---|
3567 | by third parties to this License.
|
---|
3568 |
|
---|
3569 | 7. If, as a consequence of a court judgment or allegation of patent
|
---|
3570 | infringement or for any other reason (not limited to patent
|
---|
3571 | issues), conditions are imposed on you (whether by court order,
|
---|
3572 | agreement or otherwise) that contradict the conditions of this
|
---|
3573 | License, they do not excuse you from the conditions of this
|
---|
3574 | License. If you cannot distribute so as to satisfy simultaneously
|
---|
3575 | your obligations under this License and any other pertinent
|
---|
3576 | obligations, then as a consequence you may not distribute the
|
---|
3577 | Program at all. For example, if a patent license would not permit
|
---|
3578 | royalty-free redistribution of the Program by all those who
|
---|
3579 | receive copies directly or indirectly through you, then the only
|
---|
3580 | way you could satisfy both it and this License would be to refrain
|
---|
3581 | entirely from distribution of the Program.
|
---|
3582 |
|
---|
3583 | If any portion of this section is held invalid or unenforceable
|
---|
3584 | under any particular circumstance, the balance of the section is
|
---|
3585 | intended to apply and the section as a whole is intended to apply
|
---|
3586 | in other circumstances.
|
---|
3587 |
|
---|
3588 | It is not the purpose of this section to induce you to infringe any
|
---|
3589 | patents or other property right claims or to contest validity of
|
---|
3590 | any such claims; this section has the sole purpose of protecting
|
---|
3591 | the integrity of the free software distribution system, which is
|
---|
3592 | implemented by public license practices. Many people have made
|
---|
3593 | generous contributions to the wide range of software distributed
|
---|
3594 | through that system in reliance on consistent application of that
|
---|
3595 | system; it is up to the author/donor to decide if he or she is
|
---|
willing to distribute software through any other system and a
licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed
to be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces,
the original copyright holder who places the Program under this
License may add an explicit geographical distribution limitation
excluding those countries, so that distribution is permitted only
in or among countries not thus excluded. In such case, this
License incorporates the limitation as if written in the body of
this License.

9. The Free Software Foundation may publish revised and/or new
versions of the General Public License from time to time. Such
new versions will be similar in spirit to the present version, but
may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the
Program specifies a version number of this License which applies
to it and "any later version", you have the option of following
the terms and conditions either of that version or of any later
version published by the Free Software Foundation. If the Program
does not specify a version number of this License, you may choose
any version ever published by the Free Software Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the
author to ask for permission. For software which is copyrighted
by the Free Software Foundation, write to the Free Software
Foundation; we sometimes make exceptions for this. Our decision
will be guided by the two goals of preserving the free status of
all derivatives of our free software and of promoting the sharing
and reuse of software generally.

NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
SERVICING, REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Programs
=======================================================

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
Copyright (C) YYYY NAME OF AUTHOR

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

Also add information on how to contact you by electronic and paper
mail.

If the program is interactive, make it output a short notice like
this when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than `show w' and `show
c'; they could even be mouse-clicks or menu items--whatever suits your
program.

You should also get your employer (if you work as a programmer) or
your school, if any, to sign a "copyright disclaimer" for the program,
if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

SIGNATURE OF TY COON, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your
program into proprietary programs. If your program is a subroutine
library, you may consider it more useful to permit linking proprietary
applications with the library. If this is what you want to do, use the
GNU Lesser General Public License instead of this License.

File: wget.info, Node: GNU Free Documentation License, Prev: GNU General Public License, Up: Copying

10.2 GNU Free Documentation License
===================================

Version 1.2, November 2002

Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or
noncommercially. Secondarily, this License preserves for the
author and publisher a way to get credit for their work, while not
being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.
It complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for
free software, because free software needs free documentation: a
free program should come with manuals providing the same freedoms
that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless
of subject matter or whether it is published as a printed book.
We recommend this License principally for works whose purpose is
instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium,
that contains a notice placed by the copyright holder saying it
can be distributed under the terms of this License. Such a notice
grants a world-wide, royalty-free license, unlimited in duration,
to use that work under the conditions stated herein. The
"Document", below, refers to any such manual or work. Any member
of the public is a licensee, and is addressed as "you". You
accept the license if you copy, modify or distribute the work in a
way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could
fall directly within that overall subject. (Thus, if the Document
is in part a textbook of mathematics, a Secondary Section may not
explain any mathematics.) The relationship could be a matter of
historical connection with the subject or with related matters, or
of legal, commercial, philosophical, ethical or political position
regarding them.

The "Invariant Sections" are certain Secondary Sections whose
titles are designated, as being those of Invariant Sections, in
the notice that says that the Document is released under this
License. If a section does not fit the above definition of
Secondary then it is not allowed to be designated as Invariant.
The Document may contain zero Invariant Sections. If the Document
does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
that says that the Document is released under this License. A
Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images
composed of pixels) generic paint programs or (for drawings) some
widely available drawing editor, and that is suitable for input to
text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an
otherwise Transparent file format whose markup, or absence of
markup, has been arranged to thwart or discourage subsequent
modification by readers is not Transparent. An image format is
not Transparent if used for any substantial amount of text. A
copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format,
SGML or XML using a publicly available DTD, and
standard-conforming simple HTML, PostScript or PDF designed for
human modification. Examples of transparent image formats include
PNG, XCF and JPG. Opaque formats include proprietary formats that
can be read and edited only by proprietary word processors, SGML or
XML for which the DTD and/or processing tools are not generally
available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the
material this License requires to appear in the title page. For
works in formats which do not have any title page as such, "Title
Page" means the text near the most prominent appearance of the
work's title, preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document
whose title either is precisely XYZ or contains XYZ in parentheses
following text that translates XYZ in another language. (Here XYZ
stands for a specific section name mentioned below, such as
"Acknowledgements", "Dedications", "Endorsements", or "History".)
To "Preserve the Title" of such a section when you modify the
Document means that it remains a section "Entitled XYZ" according
to this definition.

The Document may include Warranty Disclaimers next to the notice
which states that this License applies to the Document. These
Warranty Disclaimers are considered to be included by reference in
this License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and
has no effect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License
applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You
may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However,
you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow
the conditions in section 3.

You may also lend copies, under the same conditions stated above,
and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly
have printed covers) of the Document, numbering more than 100, and
the Document's license notice requires Cover Texts, you must
enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and
Back-Cover Texts on the back cover. Both covers must also clearly
and legibly identify you as the publisher of these copies. The
front cover must present the full title with all words of the
title equally prominent and visible. You may add other material
on the covers in addition. Copying with changes limited to the
covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in
other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto
adjacent pages.

If you publish or distribute Opaque copies of the Document
numbering more than 100, you must either include a
machine-readable Transparent copy along with each Opaque copy, or
state in or with each Opaque copy a computer-network location from
which the general network-using public has access to download
using public-standard network protocols a complete Transparent
copy of the Document, free of added material. If you use the
latter option, you must take reasonably prudent steps, when you
begin distribution of Opaque copies in quantity, to ensure that
this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you
distribute an Opaque copy (directly or through your agents or
retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of
the Document well before redistributing any large number of
copies, to give them a chance to provide you with an updated
version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document
under the conditions of sections 2 and 3 above, provided that you
release the Modified Version under precisely this License, with
the Modified Version filling the role of the Document, thus
licensing distribution and modification of the Modified Version to
whoever possesses a copy of it. In addition, you must do these
things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title
distinct from that of the Document, and from those of
previous versions (which should, if there were any, be listed
in the History section of the Document). You may use the
same title as a previous version if the original publisher of
that version gives permission.

B. List on the Title Page, as authors, one or more persons or
entities responsible for authorship of the modifications in
the Modified Version, together with at least five of the
principal authors of the Document (all of its principal
authors, if it has fewer than five), unless they release you
from this requirement.

C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license
notice giving the public permission to use the Modified
Version under the terms of this License, in the form shown in
the Addendum below.

G. Preserve in that license notice the full lists of Invariant
Sections and required Cover Texts given in the Document's
license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled "History", Preserve its Title,
and add to it an item stating at least the title, year, new
authors, and publisher of the Modified Version as given on
the Title Page. If there is no section Entitled "History" in
the Document, create one stating the title, year, authors,
and publisher of the Document as given on its Title Page,
then add an item describing the Modified Version as stated in
the previous sentence.

J. Preserve the network location, if any, given in the Document
for public access to a Transparent copy of the Document, and
likewise the network locations given in the Document for
previous versions it was based on. These may be placed in
the "History" section. You may omit a network location for a
work that was published at least four years before the
Document itself, or if the original publisher of the version
it refers to gives permission.

K. For any section Entitled "Acknowledgements" or "Dedications",
Preserve the Title of the section, and preserve in the
section all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section
titles.

M. Delete any section Entitled "Endorsements". Such a section
may not be included in the Modified Version.

N. Do not retitle any existing section to be Entitled
"Endorsements" or to conflict in title with any Invariant
Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no
material copied from the Document, you may at your option
designate some or all of these sections as invariant. To do this,
add their titles to the list of Invariant Sections in the Modified
Version's license notice. These titles must be distinct from any
other section titles.

You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text
has been approved by an organization as the authoritative
definition of a standard.

You may add a passage of up to five words as a Front-Cover Text,
and a passage of up to 25 words as a Back-Cover Text, to the end
of the list of Cover Texts in the Modified Version. Only one
passage of Front-Cover Text and one of Back-Cover Text may be
added by (or through arrangements made by) any one entity. If the
Document already includes a cover text for the same cover,
previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may
replace the old one, on explicit permission from the previous
publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this
License give permission to use their names for publicity for or to
assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under
this License, under the terms defined in section 4 above for
modified versions, provided that you include in the combination
all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your
combined work in its license notice, and that you preserve all
their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name
but different contents, make the title of each such section unique
by adding at the end of it, in parentheses, the name of the
original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in
the list of Invariant Sections in the license notice of the
combined work.

In the combination, you must combine any sections Entitled
"History" in the various original documents, forming one section
Entitled "History"; likewise combine any sections Entitled
"Acknowledgements", and any sections Entitled "Dedications". You
must delete all sections Entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the
documents in all other respects.

You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert
a copy of this License into the extracted document, and follow
this License in all other respects regarding verbatim copying of
that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other
separate and independent documents or works, in or on a volume of
a storage or distribution medium, is called an "aggregate" if the
copyright resulting from the compilation is not used to limit the
legal rights of the compilation's users beyond what the individual
works permit. When the Document is included in an aggregate, this
License does not apply to the other works in the aggregate which
are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half
of the entire aggregate, the Document's Cover Texts may be placed
on covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic
form. Otherwise they must appear on printed covers that bracket
the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section
4. Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also
include the original English version of this License and the
original versions of those notices and disclaimers. In case of a
disagreement between the translation and the original version of
this License or a notice or disclaimer, the original version will
prevail.

If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to
Preserve its Title (section 1) will typically require changing the
actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document
except as expressly provided for under this License. Any other
attempt to copy, modify, sublicense or distribute the Document is
void, and will automatically terminate your rights under this
License. However, parties who have received copies, or rights,
from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of
the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
`http://www.gnu.org/copyleft/'.

Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered
version of this License "or any later version" applies to it, you
have the option of following the terms and conditions either of
that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation. If
the Document does not specify a version number of this License,
you may choose any version ever published (not as a draft) by the
Free Software Foundation.

10.2.1 ADDENDUM: How to use this License for your documents
-----------------------------------------------------------

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and license
notices just after the title page:

Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover
Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with
the Front-Cover Texts being LIST, and with the Back-Cover Texts
being LIST.

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License, to
permit their use in free software.

4153 | File: wget.info, Node: Concept Index, Prev: Copying, Up: Top
|
---|
4154 |
|
---|
4155 | Concept Index
|
---|
4156 | *************
|
---|
4157 |
|
---|
4158 | [index]
|
---|
4159 | * Menu:
|
---|
4160 |
|
---|
4161 | * .html extension: HTTP Options. (line 6)
|
---|
4162 | * .listing files, removing: FTP Options. (line 21)
|
---|
4163 | * .netrc: Startup File. (line 6)
|
---|
4164 | * .wgetrc: Startup File. (line 6)
|
---|
4165 | * accept directories: Directory-Based Limits.
|
---|
4166 | (line 17)
|
---|
4167 | * accept suffixes: Types of Files. (line 15)
|
---|
4168 | * accept wildcards: Types of Files. (line 15)
|
---|
4169 | * append to log: Logging and Input File Options.
|
---|
4170 | (line 11)
|
---|
4171 | * arguments: Invoking. (line 6)
|
---|
4172 | * authentication <1>: HTTP Options. (line 27)
|
---|
4173 | * authentication: Download Options. (line 409)
|
---|
4174 | * backing up converted files: Recursive Retrieval Options.
|
---|
4175 | (line 71)
|
---|
4176 | * bandwidth, limit: Download Options. (line 216)
|
---|
4177 | * base for relative links in input file: Logging and Input File Options.
|
---|
4178 | (line 68)
|
---|
4179 | * bind address: Download Options. (line 6)
|
---|
4180 | * bug reports: Reporting Bugs. (line 6)
|
---|
4181 | * bugs: Reporting Bugs. (line 6)
|
---|
4182 | * cache: HTTP Options. (line 43)
|
---|
4183 | * caching of DNS lookups: Download Options. (line 302)
|
---|
4184 | * client IP address: Download Options. (line 6)
|
---|
4185 | * clobbering, file: Download Options. (line 30)
|
---|
4186 | * command line: Invoking. (line 6)
|
---|
4187 | * comments, HTML: Recursive Retrieval Options.
|
---|
4188 | (line 149)
|
---|
4189 | * connect timeout: Download Options. (line 199)
|
---|
4190 | * Content-Length, ignore: HTTP Options. (line 132)
|
---|
4191 | * continue retrieval: Download Options. (line 64)
|
---|
4192 | * contributors: Contributors. (line 6)
|
---|
4193 | * conversion of links: Recursive Retrieval Options.
|
---|
4194 | (line 32)
|
---|
4195 | * cookies: HTTP Options. (line 52)
|
---|
4196 | * cookies, loading: HTTP Options. (line 62)
|
---|
4197 | * cookies, saving: HTTP Options. (line 110)
|
---|
4198 | * cookies, session: HTTP Options. (line 115)
|
---|
4199 | * copying: Copying. (line 6)
|
---|
4200 | * cut directories: Directory Options. (line 32)
|
---|
4201 | * debug: Logging and Input File Options.
|
---|
4202 | (line 17)
|
---|
4203 | * delete after retrieval: Recursive Retrieval Options.
|
---|
4204 | (line 16)
|
---|
4205 | * directories: Directory-Based Limits.
|
---|
4206 | (line 6)
|
---|
4207 | * directories, exclude: Directory-Based Limits.
|
---|
4208 | (line 30)
|
---|
4209 | * directories, include: Directory-Based Limits.
|
---|
4210 | (line 17)
|
---|
4211 | * directory limits: Directory-Based Limits.
|
---|
4212 | (line 6)
|
---|
4213 | * directory prefix: Directory Options. (line 60)
|
---|
4214 | * DNS cache: Download Options. (line 302)
|
---|
4215 | * DNS timeout: Download Options. (line 193)
|
---|
4216 | * dot style: Download Options. (line 125)
|
---|
4217 | * downloading multiple times: Download Options. (line 30)
|
---|
4218 | * EGD: HTTPS (SSL/TLS) Options.
|
---|
4219 | (line 101)
|
---|
4220 | * entropy, specifying source of: HTTPS (SSL/TLS) Options.
|
---|
4221 | (line 85)
|
---|
4222 | * examples: Examples. (line 6)
|
---|
4223 | * exclude directories: Directory-Based Limits.
|
---|
4224 | (line 30)
|
---|
4225 | * execute wgetrc command: Basic Startup Options.
|
---|
4226 | (line 19)
|
---|
4227 | * FDL, GNU Free Documentation License: GNU Free Documentation License.
|
---|
4228 | (line 6)
|
---|
4229 | * features: Overview. (line 6)
|
---|
4230 | * file names, restrict: Download Options. (line 321)
|
---|
4231 | * filling proxy cache: Recursive Retrieval Options.
|
---|
4232 | (line 16)
|
---|
4233 | * follow FTP links: Recursive Accept/Reject Options.
|
---|
4234 | (line 20)
|
---|
4235 | * following ftp links: FTP Links. (line 6)
|
---|
4236 | * following links: Following Links. (line 6)
|
---|
4237 | * force html: Logging and Input File Options.
|
---|
4238 | (line 61)
|
---|
4239 | * free software: Copying. (line 6)
|
---|
4240 | * ftp authentication: FTP Options. (line 6)
|
---|
4241 | * ftp password: FTP Options. (line 6)
|
---|
4242 | * ftp time-stamping: FTP Time-Stamping Internals.
|
---|
4243 | (line 6)
|
---|
4244 | * ftp user: FTP Options. (line 6)
|
---|
4245 | * GFDL: Copying. (line 6)
|
---|
4246 | * globbing, toggle: FTP Options. (line 45)
|
---|
4247 | * GPL: Copying. (line 6)
|
---|
4248 | * hangup: Signals. (line 6)
|
---|
4249 | * header, add: HTTP Options. (line 143)
|
---|
4250 | * hosts, spanning: Spanning Hosts. (line 6)
|
---|
4251 | * HTML comments: Recursive Retrieval Options.
|
---|
4252 | (line 149)
|
---|
4253 | * http password: HTTP Options. (line 27)
|
---|
4254 | * http referer: HTTP Options. (line 178)
|
---|
4255 | * http time-stamping: HTTP Time-Stamping Internals.
|
---|
4256 | (line 6)
|
---|
4257 | * http user: HTTP Options. (line 27)
|
---|
4258 | * ignore length: HTTP Options. (line 132)
|
---|
4259 | * include directories: Directory-Based Limits.
|
---|
4260 | (line 17)
|
---|
4261 | * incomplete downloads: Download Options. (line 64)
|
---|
4262 | * incremental updating: Time-Stamping. (line 6)
|
---|
4263 | * input-file: Logging and Input File Options.
|
---|
4264 | (line 43)
|
---|
4265 | * invoking: Invoking. (line 6)
|
---|
4266 | * IP address, client: Download Options. (line 6)
|
---|
4267 | * IPv6: Download Options. (line 356)
|
---|
4268 | * Keep-Alive, turning off: FTP Options. (line 92)
|
---|
4269 | * latest version: Distribution. (line 6)
|
---|
4270 | * limit bandwidth: Download Options. (line 216)
|
---|
4271 | * link conversion: Recursive Retrieval Options.
|
---|
4272 | (line 32)
|
---|
4273 | * links: Following Links. (line 6)
|
---|
4274 | * list: Mailing List. (line 6)
|
---|
4275 | * loading cookies: HTTP Options. (line 62)
|
---|
4276 | * location of wgetrc: Wgetrc Location. (line 6)
|
---|
4277 | * log file: Logging and Input File Options.
|
---|
4278 | (line 6)
|
---|
4279 | * mailing list: Mailing List. (line 6)
|
---|
4280 | * mirroring: Very Advanced Usage. (line 6)
|
---|
4281 | * no parent: Directory-Based Limits.
|
---|
4282 | (line 43)
|
---|
4283 | * no-clobber: Download Options. (line 30)
|
---|
4284 | * nohup: Invoking. (line 6)
|
---|
4285 | * number of retries: Download Options. (line 12)
|
---|
4286 | * operating systems: Portability. (line 6)
|
---|
4287 | * option syntax: Option Syntax. (line 6)
|
---|
4288 | * output file: Logging and Input File Options.
|
---|
4289 | (line 6)
|
---|
4290 | * overview: Overview. (line 6)
|
---|
4291 | * page requisites: Recursive Retrieval Options.
|
---|
4292 | (line 84)
|
---|
4293 | * passive ftp: FTP Options. (line 61)
|
---|
4294 | * password: Download Options. (line 409)
|
---|
4295 | * pause: Download Options. (line 236)
|
---|
4296 | * Persistent Connections, disabling: FTP Options. (line 92)
|
---|
4297 | * portability: Portability. (line 6)
|
---|
4298 | * POST: HTTP Options. (line 211)
|
---|
4299 | * progress indicator: Download Options. (line 125)
|
---|
4300 | * proxies: Proxies. (line 6)
|
---|
4301 | * proxy <1>: HTTP Options. (line 43)
|
---|
4302 | * proxy: Download Options. (line 279)
|
---|
4303 | * proxy authentication: HTTP Options. (line 169)
|
---|
4304 | * proxy filling: Recursive Retrieval Options.
|
---|
4305 | (line 16)
|
---|
4306 | * proxy password: HTTP Options. (line 169)
|
---|
4307 | * proxy user: HTTP Options. (line 169)
|
---|
4308 | * quiet: Logging and Input File Options.
|
---|
4309 | (line 28)
|
---|
4310 | * quota: Download Options. (line 286)
|
---|
4311 | * random wait: Download Options. (line 261)
|
---|
4312 | * randomness, specifying source of: HTTPS (SSL/TLS) Options.
|
---|
4313 | (line 85)
|
---|
4314 | * rate, limit: Download Options. (line 216)
|
---|
4315 | * read timeout: Download Options. (line 204)
|
---|
4316 | * recursion: Recursive Download. (line 6)
|
---|
4317 | * recursive download: Recursive Download. (line 6)
|
---|
4318 | * redirecting output: Advanced Usage. (line 88)
|
---|
4319 | * referer, http: HTTP Options. (line 178)
|
---|
4320 | * reject directories: Directory-Based Limits.
|
---|
4321 | (line 30)
|
---|
4322 | * reject suffixes: Types of Files. (line 34)
|
---|
4323 | * reject wildcards: Types of Files. (line 34)
|
---|
4324 | * relative links: Relative Links. (line 6)
|
---|
4325 | * reporting bugs: Reporting Bugs. (line 6)
|
---|
4326 | * required images, downloading: Recursive Retrieval Options.
|
---|
4327 | (line 84)
|
---|
4328 | * resume download: Download Options. (line 64)
|
---|
4329 | * retries: Download Options. (line 12)
|
---|
4330 | * retries, waiting between: Download Options. (line 249)
|
---|
4331 | * retrieving: Recursive Download. (line 6)
|
---|
4332 | * robot exclusion: Robot Exclusion. (line 6)
|
---|
4333 | * robots.txt: Robot Exclusion. (line 6)
|
---|
4334 | * sample wgetrc: Sample Wgetrc. (line 6)
|
---|
4335 | * saving cookies: HTTP Options. (line 110)
|
---|
4336 | * security: Security Considerations.
|
---|
4337 | (line 6)
|
---|
4338 | * server maintenance: Robot Exclusion. (line 6)
|
---|
4339 | * server response, print: Download Options. (line 159)
|
---|
4340 | * server response, save: HTTP Options. (line 185)
|
---|
4341 | * session cookies: HTTP Options. (line 115)
|
---|
4342 | * signal handling: Signals. (line 6)
|
---|
4343 | * spanning hosts: Spanning Hosts. (line 6)
|
---|
4344 | * spider: Download Options. (line 164)
|
---|
4345 | * SSL: HTTPS (SSL/TLS) Options.
|
---|
4346 | (line 6)
|
---|
4347 | * SSL certificate: HTTPS (SSL/TLS) Options.
|
---|
4348 | (line 47)
|
---|
4349 | * SSL certificate authority: HTTPS (SSL/TLS) Options.
|
---|
4350 | (line 73)
|
---|
4351 | * SSL certificate type, specify: HTTPS (SSL/TLS) Options.
|
---|
4352 | (line 53)
|
---|
4353 | * SSL certificate, check: HTTPS (SSL/TLS) Options.
|
---|
4354 | (line 23)
|
---|
4355 | * SSL protocol, choose: HTTPS (SSL/TLS) Options.
|
---|
4356 | (line 10)
|
---|
4357 | * startup: Startup File. (line 6)
|
---|
4358 | * startup file: Startup File. (line 6)
|
---|
4359 | * suffixes, accept: Types of Files. (line 15)
|
---|
4360 | * suffixes, reject: Types of Files. (line 34)
|
---|
4361 | * symbolic links, retrieving: FTP Options. (line 73)
|
---|
4362 | * syntax of options: Option Syntax. (line 6)
|
---|
4363 | * syntax of wgetrc: Wgetrc Syntax. (line 6)
|
---|
4364 | * tag-based recursive pruning: Recursive Accept/Reject Options.
|
---|
4365 | (line 24)
|
---|
4366 | * time-stamping: Time-Stamping. (line 6)
|
---|
4367 | * time-stamping usage: Time-Stamping Usage. (line 6)
|
---|
4368 | * timeout: Download Options. (line 175)
|
---|
4369 | * timeout, connect: Download Options. (line 199)
|
---|
4370 | * timeout, DNS: Download Options. (line 193)
|
---|
4371 | * timeout, read: Download Options. (line 204)
|
---|
4372 | * timestamping: Time-Stamping. (line 6)
|
---|
4373 | * tries: Download Options. (line 12)
|
---|
4374 | * types of files: Types of Files. (line 6)
|
---|
4375 | * updating the archives: Time-Stamping. (line 6)
|
---|
4376 | * URL: URL Format. (line 6)
|
---|
4377 | * URL syntax: URL Format. (line 6)
|
---|
4378 | * usage, time-stamping: Time-Stamping Usage. (line 6)
|
---|
4379 | * user: Download Options. (line 409)
|
---|
4380 | * user-agent: HTTP Options. (line 189)
|
---|
4381 | * various: Various. (line 6)
|
---|
4382 | * verbose: Logging and Input File Options.
|
---|
4383 | (line 32)
|
---|
4384 | * wait: Download Options. (line 236)
|
---|
4385 | * wait, random: Download Options. (line 261)
|
---|
4386 | * waiting between retries: Download Options. (line 249)
|
---|
4387 | * Wget as spider: Download Options. (line 164)
|
---|
4388 | * wgetrc: Startup File. (line 6)
|
---|
4389 | * wgetrc commands: Wgetrc Commands. (line 6)
|
---|
4390 | * wgetrc location: Wgetrc Location. (line 6)
|
---|
4391 | * wgetrc syntax: Wgetrc Syntax. (line 6)
|
---|
4392 | * wildcards, accept: Types of Files. (line 15)
|
---|
4393 | * wildcards, reject: Types of Files. (line 34)
|
---|
4394 | * Windows file names: Download Options. (line 321)
|
---|
4395 |
|
---|
4396 |
|
---|
4397 |
|
---|
4398 | Tag Table:
|
---|
4399 | Node: Top974
|
---|
4400 | Node: Overview1845
|
---|
4401 | Node: Invoking5380
|
---|
4402 | Node: URL Format6217
|
---|
4403 | Ref: URL Format-Footnote-18790
|
---|
4404 | Node: Option Syntax8892
|
---|
4405 | Node: Basic Startup Options11571
|
---|
4406 | Node: Logging and Input File Options12376
|
---|
4407 | Node: Download Options15029
|
---|
4408 | Node: Directory Options34648
|
---|
4409 | Node: HTTP Options37353
|
---|
4410 | Node: HTTPS (SSL/TLS) Options49262
|
---|
4411 | Node: FTP Options54937
|
---|
4412 | Node: Recursive Retrieval Options59990
|
---|
4413 | Node: Recursive Accept/Reject Options67858
|
---|
4414 | Node: Recursive Download70857
|
---|
4415 | Node: Following Links73968
|
---|
4416 | Node: Spanning Hosts74905
|
---|
4417 | Node: Types of Files77078
|
---|
4418 | Node: Directory-Based Limits79538
|
---|
4419 | Node: Relative Links82188
|
---|
4420 | Node: FTP Links83025
|
---|
4421 | Node: Time-Stamping83892
|
---|
4422 | Node: Time-Stamping Usage85536
|
---|
4423 | Node: HTTP Time-Stamping Internals87362
|
---|
4424 | Ref: HTTP Time-Stamping Internals-Footnote-188638
|
---|
4425 | Node: FTP Time-Stamping Internals88837
|
---|
4426 | Node: Startup File90303
|
---|
4427 | Node: Wgetrc Location91177
|
---|
4428 | Node: Wgetrc Syntax91976
|
---|
4429 | Node: Wgetrc Commands92696
|
---|
4430 | Node: Sample Wgetrc106122
|
---|
4431 | Node: Examples111324
|
---|
4432 | Node: Simple Usage111664
|
---|
4433 | Node: Advanced Usage113068
|
---|
4434 | Node: Very Advanced Usage116769
|
---|
4435 | Node: Various118264
|
---|
4436 | Node: Proxies118789
|
---|
4437 | Node: Distribution121516
|
---|
4438 | Node: Mailing List121862
|
---|
4439 | Node: Reporting Bugs123219
|
---|
4440 | Node: Portability125576
|
---|
4441 | Node: Signals127018
|
---|
4442 | Node: Appendices127701
|
---|
4443 | Node: Robot Exclusion128023
|
---|
4444 | Node: Security Considerations131800
|
---|
4445 | Node: Contributors132984
|
---|
4446 | Node: Copying136681
|
---|
4447 | Node: GNU General Public License139393
|
---|
4448 | Node: GNU Free Documentation License158604
|
---|
4449 | Node: Concept Index181038
|
---|
4450 |
|
---|
4451 | End Tag Table
|
---|