source: python/trunk/Doc/library/urllib2.rst

Last change on this file was 391, checked in by dmik, 11 years ago

python: Merge vendor 2.7.6 to trunk.

:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.


The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   .. warning::
      HTTPS requests do not do any verification of the server's certificate.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format. The :mod:`urllib2` module sends HTTP/1.1
   requests with a ``Connection: close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). In practice, this only works for HTTP, HTTPS
   and FTP connections.

   This function returns a file-like object with three additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`getcode` --- return the HTTP status code of the response.

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set), a
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   .. versionchanged:: 2.6
      *timeout* was added.

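A minimal offline sketch of :func:`urlopen` and the three extra methods of its return value. It assumes a POSIX system, so that a temporary file's absolute path can be turned into a ``file://`` URL by simple concatenation; the ``try``/``except`` import is only there so that the same snippet also runs with the Python 3 module names mentioned in the note at the top of this page.

```python
import os
import tempfile

try:
    import urllib2  # Python 2
except ImportError:  # Python 3 keeps these names in urllib.request
    import urllib.request as urllib2

# Create a small local file so the sketch needs no network access.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'hello urllib2')

url = 'file://' + path  # assumes a POSIX absolute path such as /tmp/tmpab12.txt

# timeout is accepted, but only HTTP, HTTPS and FTP connections honour it.
response = urllib2.urlopen(url, timeout=10)
try:
    body = response.read()
    final_url = response.geturl()  # URL of the resource actually retrieved
    status = response.getcode()  # None, because this is not an HTTP response
    length = response.info()['Content-length']
finally:
    response.close()
    os.remove(path)
```

With an ``http://`` URL the same calls apply, but :meth:`getcode` then returns the HTTP status code and *timeout* actually takes effect.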

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected),
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be imported),
   :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` attribute to modify its position in the handlers
   list.

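The sketch below illustrates these chaining rules with a hypothetical ``HelloHandler`` for a made-up ``hello:`` URL scheme; neither is part of :mod:`urllib2`. Defining a method named ``hello_open`` is what registers the handler for that scheme, and passing the class rather than an instance relies on :func:`build_opener` calling the constructor without arguments. The ``addinfourl`` wrapper is an implementation detail of :mod:`urllib` (``urllib.response`` in Python 3), used here only as a convenient file-like response.

```python
import email
import io

try:
    import urllib2  # Python 2
    from urllib import addinfourl
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    from urllib.response import addinfourl


class HelloHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``hello:`` URL scheme."""

    # Handlers are sorted by handler_order (BaseHandler's default is 500);
    # lower values are consulted earlier.
    handler_order = 400

    def hello_open(self, req):
        # The name <protocol>_open registers this method for hello: URLs.
        url = req.get_full_url()
        body = b'hello from ' + url.encode('ascii')
        headers = email.message_from_string('Content-type: text/plain\n')
        return addinfourl(io.BytesIO(body), headers, url)


# Passing the class: build_opener() instantiates it with no arguments and
# also adds the default handlers listed above.
opener = urllib2.build_opener(HelloHandler)
response = opener.open('hello://world')
```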
The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Though it is an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance.

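One way to tell the two exceptions apart, sketched so that it runs without a network: a made-up ``nosuchscheme:`` URL makes :class:`UnknownHandler` raise :exc:`URLError`, and a hand-built :exc:`HTTPError` (with placeholder URL, status and body) shows that it can also be read like a response. Because :exc:`HTTPError` is a subclass of :exc:`URLError`, it has to be caught first.

```python
import io

try:
    import urllib2  # Python 2
except ImportError:  # Python 3: urllib.request also exposes URLError and HTTPError
    import urllib.request as urllib2


def describe_failure(opener, url):
    """Return a short description of why opening *url* failed, or None."""
    try:
        opener.open(url).close()
    except urllib2.HTTPError as e:  # must precede URLError: it is a subclass
        return 'HTTP %d: %s' % (e.code, e.reason)
    except urllib2.URLError as e:
        return 'URL error: %s' % (e.reason,)
    return None


# No handler knows the made-up scheme, so UnknownHandler raises URLError.
message = describe_failure(urllib2.build_opener(), 'nosuchscheme://example')

# A hand-built HTTPError (placeholder values) is also a file-like response.
err = urllib2.HTTPError('http://www.example.com/', 404, 'Not Found', {},
                        io.BytesIO(b'missing'))
body = err.read()
```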
The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself;
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`. It defaults to ``False``. An unverifiable request is one whose
   URL the user did not have the option to approve. For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

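A short sketch of building a ``POST`` request with form data and a custom header. The URL and the ``User-Agent`` value are placeholders, and nothing is sent until the request is passed to :func:`urlopen` or :meth:`OpenerDirector.open`.

```python
try:
    import urllib2  # Python 2
    from urllib import urlencode
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    from urllib.parse import urlencode

# A sequence of 2-tuples keeps the field order predictable.
data = urlencode([('q', 'python urllib2'), ('page', '2')]).encode('ascii')
req = urllib2.Request('http://www.example.com/search', data,
                      {'User-Agent': 'example-client/0.1'})

method = req.get_method()  # 'POST', because data was supplied
agent = req.get_header('User-agent')  # the key is stored as 'User-agent'
```

Since *origin_req_host* and *unverifiable* were not passed, the request falls back to the defaults described above: the host of *url* and ``False``.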

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all may be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples ``(header_name, header_value)`` of the Request headers.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`. See the
   documentation for the :class:`Request` constructor.

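A sketch of the header-related methods on a request that is never sent (the URL and the ``X-Trace-Id`` header are placeholders). Header names are stored as ``str.capitalize()`` returns them, so ``'X-Trace-Id'`` becomes ``'X-trace-id'``, and the lookups done by :meth:`Request.has_header` and :meth:`Request.get_header` are exact.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3 name
    import urllib.request as urllib2

req = urllib2.Request('http://www.example.com/')
req.add_header('Accept', 'text/plain')
req.add_header('Accept', 'text/html')  # same name: overwrites the earlier value
# Hypothetical header that should not be sent again after a redirect.
req.add_unredirected_header('X-Trace-Id', 'abc123')

# Names are stored as str.capitalize() returns them, e.g. 'X-trace-id'.
accept = req.get_header('Accept')
trace = req.get_header('X-trace-id')
missing = req.get_header('X-missing', 'n/a')
items = sorted(req.header_items())
```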
.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following
   methods are searched, and added to the possible chains (note that HTTP errors
   are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature actually works only for
   HTTP, HTTPS and FTP connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in the three stages listed below. Within each
stage, the order in which these methods are called is determined by sorting the
handler instances by their :attr:`handler_order` attribute.

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`. If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.

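The three stages can be traced with a hypothetical processor for a made-up ``trace:`` scheme; the class, the scheme and the ``calls`` list exist only for illustration. Its :meth:`default_open` deliberately returns ``None`` so that the protocol-specific ``trace_open`` is tried next, and ``addinfourl`` (``urllib.response`` in Python 3) is used only as a convenient file-like response.

```python
import email
import io

try:
    import urllib2  # Python 2
    from urllib import addinfourl
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    from urllib.response import addinfourl

calls = []  # order in which OpenerDirector invoked the hooks below


class TracingProcessor(urllib2.BaseHandler):
    """Hypothetical processor for a made-up ``trace:`` URL scheme."""

    def trace_request(self, req):  # stage 1: pre-process the request
        calls.append('trace_request')
        return req

    def default_open(self, req):  # stage 2: tried before any <protocol>_open
        calls.append('default_open')
        return None  # decline, so the protocol-specific method is tried

    def trace_open(self, req):  # stage 2: open trace: URLs
        calls.append('trace_open')
        headers = email.message_from_string('Content-type: text/plain\n')
        return addinfourl(io.BytesIO(b'ok'), headers, req.get_full_url())

    def trace_response(self, req, response):  # stage 3: post-process
        calls.append('trace_response')
        return response


opener = urllib2.build_opener(TracingProcessor())
body = opener.open('trace://demo').read()
```

The class is named ``*Processor`` because it defines :samp:`{protocol}_request` and :samp:`{protocol}_response` methods, following the naming convention noted for :class:`BaseHandler` subclasses.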

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attributes and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of :meth:`OpenerDirector.open`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

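A sketch of :meth:`http_error_nnn` for code 404 that runs without a network. A hypothetical ``FakeHTTPHandler`` (with a lower :attr:`handler_order` than :class:`HTTPHandler`) answers every ``http:`` request with a 404; :class:`HTTPErrorProcessor` then passes it to :meth:`OpenerDirector.error`, which calls the hypothetical ``NotFoundFallbackHandler.http_error_404`` and returns its replacement response instead of raising :exc:`HTTPError`. The URL is a placeholder and is never contacted.

```python
import email
import io

try:
    import urllib2  # Python 2
    from urllib import addinfourl
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    from urllib.response import addinfourl


def make_response(body, url, code, msg):
    """Build a minimal file-like response carrying .code, .msg and .info()."""
    resp = addinfourl(io.BytesIO(body), email.message_from_string(''), url, code)
    resp.msg = msg  # HTTPErrorProcessor reads code, msg and info()
    return resp


class FakeHTTPHandler(urllib2.BaseHandler):
    """Hypothetical stand-in for HTTPHandler, so no network is needed."""

    handler_order = 400  # consulted before HTTPHandler (default 500)

    def http_open(self, req):
        return make_response(b'gone', req.get_full_url(), 404, 'Not Found')


class NotFoundFallbackHandler(urllib2.BaseHandler):
    """Hypothetical handler that replaces any 404 with a placeholder page."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return make_response(b'placeholder', req.get_full_url(), 200, 'OK')


# ProxyHandler({}) keeps proxy environment variables out of the picture.
opener = urllib2.build_opener(urllib2.ProxyHandler({}), FakeHTTPHandler,
                              NotFoundFallbackHandler)
response = opener.open('http://www.example.com/missing')
```

Without ``NotFoundFallbackHandler``, the same request would end in :class:`HTTPDefaultErrorHandler` raising :exc:`HTTPError` with code 404.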

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.

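A minimal sketch of wiring an explicit cookie jar into an opener; the jar is an in-memory :class:`cookielib.CookieJar` (``http.cookiejar`` in Python 3), and no requests are made here.

```python
try:
    import urllib2  # Python 2
    import cookielib
except ImportError:  # Python 3 equivalents
    import urllib.request as urllib2
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()  # in-memory store; nothing is saved to disk
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)
# Responses fetched through opener would store their cookies in jar, and
# later requests through the same opener would send them back.
```

When no *cookiejar* argument is given, :class:`HTTPCookieProcessor` creates a fresh :class:`cookielib.CookieJar` of its own.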

.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

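A sketch of the catch-all realm of :class:`HTTPPasswordMgrWithDefaultRealm`; the URI, user name, password and the ``Members`` realm are placeholders.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3 name
    import urllib.request as urllib2

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Realm None is the catch-all; the URI and credentials are placeholders.
mgr.add_password(None, 'http://www.example.com/private/', 'alice', 's3cret')

# 'Members' has no entry of its own, so the None realm is searched instead.
inside = mgr.find_user_password('Members', 'http://www.example.com/private/report.html')
outside = mgr.find_user_password('Members', 'http://www.example.com/public/')
```

A URI below ``/private/`` matches the stored entry, while ``/public/`` does not, so the second lookup returns ``(None, None)``.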

.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. Unless the URL supplies a user name and
   password, the login is done with an empty user name and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.

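A sketch of configuring :class:`CacheFTPHandler` and installing it in an opener; no FTP connection is made. Since it is a subclass of :class:`FTPHandler`, :func:`build_opener` does not add the default :class:`FTPHandler` alongside it.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3 name
    import urllib.request as urllib2

cache = urllib2.CacheFTPHandler()
cache.setTimeout(30)  # cached connections expire 30 seconds after last use
cache.setMaxConns(4)  # keep at most four connections in the cache

# CacheFTPHandler subclasses FTPHandler, so no default FTPHandler is added.
opener = urllib2.build_opener(cache)
```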

.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.

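When no handler claims a URL's scheme, the default opener falls through to :meth:`UnknownHandler.unknown_open`, so the failure surfaces as :exc:`URLError`. A minimal sketch: ``bogus`` is a made-up scheme, and the Python 3 fallback import is an assumption.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3: urllib.request re-exports URLError
    import urllib.request as urllib2

reason = None
try:
    # No default handler defines bogus_open(), so UnknownHandler is reached.
    urllib2.urlopen('bogus://www.example.com/')
except urllib2.URLError as exc:
    reason = exc.reason

print(reason)  # unknown url type: bogus
```
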

.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response()

   Process HTTP error responses.

   For successful (2xx) status codes, the response object is returned
   immediately.

   For other status codes, this simply passes the job on to the
   :samp:`http_error_{code}` handler methods, via
   :meth:`OpenerDirector.error`. Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.

.. method:: HTTPErrorProcessor.https_response()

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.

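Because :func:`build_opener` uses a supplied :class:`HTTPErrorProcessor` instance in place of the default one, subclassing it is one way to change which responses count as errors. The sketch below is a hypothetical ``PassThroughErrorProcessor`` that returns every response unchanged, so a 404 comes back as an ordinary response instead of raising :exc:`HTTPError`. Since redirect and authentication handlers are also reached through :meth:`OpenerDirector.error`, this processor disables them as well. The Python 3 fallback import is an assumption.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3: the same classes live in urllib.request
    import urllib.request as urllib2


class PassThroughErrorProcessor(urllib2.HTTPErrorProcessor):
    """Hypothetical processor that never routes responses to error handlers."""

    def http_response(self, request, response):
        # Skip the parent.error() call: no HTTPError is raised and no
        # redirect or authentication handler is consulted.
        return response

    https_response = http_response


# The instance replaces the default HTTPErrorProcessor in the opener.
opener = urllib2.build_opener(PassThroughErrorProcessor())
# resp = opener.open('http://www.example.com/missing')  # resp.getcode() may be 404
```
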

.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('https://mahler:8092/site-updates.py')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

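A minimal sketch of the environment-variable lookup. The proxy address is a placeholder that the snippet sets itself, and the ``proxies`` attribute it inspects is an implementation detail of :class:`ProxyHandler` rather than documented API. Passing an empty dictionary instead disables proxies altogether. The Python 3 fallback import is an assumption.

```python
import os

try:
    import urllib2  # Python 2
except ImportError:  # Python 3: the same classes live in urllib.request
    import urllib.request as urllib2

# Placeholder proxy address, set only for this demonstration.
os.environ['http_proxy'] = 'http://proxy.example.com:3128/'

# With no argument, ProxyHandler reads the <scheme>_proxy variables when created.
env_handler = urllib2.ProxyHandler()
print(env_handler.proxies.get('http'))

# An empty dictionary means "use no proxy", whatever the environment says.
no_proxy_opener = urllib2.build_opener(urllib2.ProxyHandler({}))
```
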
This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

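The *headers* argument of :class:`Request` takes a dictionary. A minimal sketch with a placeholder URL and no network I/O; note that :class:`Request` stores each header name with :meth:`str.capitalize` applied, so ``'User-Agent'`` is kept as ``'User-agent'``. The Python 3 fallback import is an assumption.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3: the same classes live in urllib.request
    import urllib.request as urllib2

req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/',
                               'User-Agent': 'Mozilla/5.0'})

# Header names are stored capitalized, so look them up that way.
print(req.get_header('Referer'))     # http://www.python.org/
print(req.get_header('User-agent'))  # Mozilla/5.0
print(req.get_header('User-Agent'))  # None: this spelling is not a stored key
```
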
:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).

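A minimal sketch that makes those automatic headers visible without sending anything. It calls ``http_request``, the request pre-processing hook of the default :class:`HTTPHandler`, by hand on a placeholder POST request; normally the opener runs this hook for you just before connecting, so calling it directly is for illustration only. The Python 3 fallback import is an assumption.

```python
try:
    import urllib2  # Python 2
except ImportError:  # Python 3: the same classes live in urllib.request
    import urllib.request as urllib2

opener = urllib2.build_opener()
# Pick out the HTTPHandler that build_opener() installed by default.
http_handler = [h for h in opener.handlers if isinstance(h, urllib2.HTTPHandler)][0]

req = urllib2.Request('http://www.example.com/form', data=b'key=value')
# Run the pre-processing hook by hand; it fills in the standard headers
# (plus the opener's User-agent) but performs no network I/O.
http_handler.http_request(req)

for name in ('Host', 'Content-type', 'Content-length', 'User-agent'):
    print('%s: %s' % (name, req.get_header(name)))
```
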