urlparse.rst

Last change on this file was 391, checked in by dmik, 11 years ago
python: Merge vendor 2.7.6 to trunk.
:mod:`urlparse` --- Parse URLs into components
==============================================

.. module:: urlparse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

.. note::
   The :mod:`urlparse` module is renamed to :mod:`urllib.parse` in Python 3.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.

**Source code:** :source:`Lib/urlparse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

.. versionadded:: 2.5
   Support for the ``sftp`` and ``sips`` schemes.

The :mod:`urlparse` module defines the following functions:


.. function:: urlparse(urlstring[, scheme[, allow_fragments]])

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   % escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urlparse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'


   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
   a netloc only if it is properly introduced by '//'. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urlparse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized, even if the URL's addressing scheme normally does support them;
   a ``#`` and the text after it then stay in the preceding component. The
   default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 2.5
      Added attributes to return value.

   .. versionchanged:: 2.7
      Added IPv6 URL parsing capabilities.
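
The optional arguments and the convenience attributes of :func:`urlparse` can be
seen together in a short sketch. The ``try``/``except`` import is an assumption
added here so that the snippet also runs on Python 3, where the module is named
:mod:`urllib.parse`. The user name, password and port in the last URL are made
up for illustration.

```python
try:
    from urlparse import urlparse      # Python 2 name, as documented here
except ImportError:
    from urllib.parse import urlparse  # Python 3 name (portability assumption)

# The URL has no scheme of its own, so the caller's default is used.
with_default = urlparse('//www.cwi.nl/%7Eguido/Python.html', 'https')
print(with_default.scheme)             # https

# The URL names its own scheme, so the default is ignored.
explicit = urlparse('http://www.cwi.nl/%7Eguido/Python.html', 'https')
print(explicit.scheme)                 # http

# With allow_fragments false, '#' is not treated as a delimiter.
no_frag = urlparse('http://www.cwi.nl/doc/#intro', allow_fragments=False)
print(no_frag.path)                    # /doc/#intro
print(repr(no_frag.fragment))          # ''

# The convenience attributes split the network location further.
# The credentials and port below are illustrative only.
o = urlparse('http://guido:secret@WWW.CWI.NL:8080/%7Eguido/Python.html')
print(o.netloc)                        # guido:secret@WWW.CWI.NL:8080
print(o.username)                      # guido
print(o.password)                      # secret
print(o.hostname)                      # www.cwi.nl
print(o.port)                          # 8080

# Attributes without a tuple index are None when the part is missing.
bare = urlparse('help/Python.html')
print(bare.hostname)                   # None
print(bare.port)                       # None
```

Note that :attr:`netloc` keeps the original spelling of the host, while
:attr:`hostname` is folded to lower case.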


.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]])

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.urlencode` function, with a true *doseq* argument so
   that each list is expanded into repeated fields, to convert such dictionaries
   into query strings.

   .. versionadded:: 2.6
      Copied from the :mod:`cgi` module.
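
The effect of the two flags of :func:`parse_qs` is easiest to see on a small
query string. The following is a minimal sketch; the query string itself is made
up, and the ``try``/``except`` import is an assumption that lets the snippet run
on Python 3 as well.

```python
try:
    from urlparse import parse_qs
    from urllib import urlencode
except ImportError:
    from urllib.parse import parse_qs, urlencode  # Python 3 names

qs = 'lang=en&lang=fr&q=&page=2'

# By default the blank value of 'q' is dropped.
default = parse_qs(qs)
# default == {'lang': ['en', 'fr'], 'page': ['2']}

# With keep_blank_values true, 'q' is kept with an empty string.
with_blanks = parse_qs(qs, keep_blank_values=True)
# with_blanks == {'lang': ['en', 'fr'], 'q': [''], 'page': ['2']}

# A field without '=' is a parsing error: skipped by default,
# reported as ValueError when strict_parsing is true.
lenient = parse_qs('a=1&b')
try:
    parse_qs('a=1&b', strict_parsing=True)
    strict_failed = False
except ValueError:
    strict_failed = True

# doseq=True expands each list of values into repeated name=value fields.
round_trip = parse_qs(urlencode(with_blanks, doseq=True),
                      keep_blank_values=True)
```

Because dictionaries do not promise a field order, the round trip is compared
after parsing again rather than as a string.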


.. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]])

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionadded:: 2.6
      Copied from the :mod:`cgi` module.
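
Unlike :func:`parse_qs`, the list returned by :func:`parse_qsl` keeps the order
of the fields and repeats duplicate names. A small sketch, using a made-up query
string and an import fallback that is only an assumption for Python 3:

```python
try:
    from urlparse import parse_qsl
    from urllib import urlencode
except ImportError:
    from urllib.parse import parse_qsl, urlencode  # Python 3 names

qs = 'b=2&a=1&b=3&empty='

# Order and duplicates are preserved; the blank value is dropped by default.
pairs = parse_qsl(qs)
# pairs == [('b', '2'), ('a', '1'), ('b', '3')]

# keep_blank_values retains the field with an empty value.
all_pairs = parse_qsl(qs, keep_blank_values=True)
# all_pairs == [('b', '2'), ('a', '1'), ('b', '3'), ('empty', '')]

# A list of pairs converts back to a query string in the same order.
rebuilt = urlencode(all_pairs)
print(rebuilt)  # b=2&a=1&b=3&empty=
```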


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
   can be any six-item iterable. This may result in a slightly different, but
   equivalent URL, if the URL that was parsed originally had unnecessary delimiters
   (for example, a ? with an empty query; the RFC states that these are
   equivalent).
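
As a brief illustration of :func:`urlunparse`, the sketch below rebuilds a URL
that carried an unnecessary ``?`` and then assembles URLs from plain six-item
sequences. The component values are illustrative, and the import fallback is an
assumption for running under Python 3.

```python
try:
    from urlparse import urlparse, urlunparse
except ImportError:
    from urllib.parse import urlparse, urlunparse  # Python 3 names

# The trailing '?' introduces an empty query and is dropped on the way back.
original = 'http://www.cwi.nl/%7Eguido/Python.html?'
rebuilt = urlunparse(urlparse(original))
print(rebuilt)      # http://www.cwi.nl/%7Eguido/Python.html

# Any six-item iterable works, in the order
# (scheme, netloc, path, params, query, fragment).
from_list = urlunparse(['http', 'www.python.org', '/doc/', '', 'q=1', 'top'])
print(from_list)    # http://www.python.org/doc/?q=1#top

# A non-empty params item is attached to the path with ';'.
with_params = urlunparse(('http', 'www.cwi.nl', '/a/b', 'v=1', '', ''))
print(with_params)  # http://www.cwi.nl/a/b;v=1
```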


.. function:: urlsplit(urlstring[, scheme[, allow_fragments]])

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionadded:: 2.2

   .. versionchanged:: 2.5
      Added attributes to return value.
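
The difference between :func:`urlparse` and :func:`urlsplit` shows up when more
than one path segment carries parameters. The sketch below uses a made-up URL,
and ``split_segment_params`` is a hypothetical helper, not part of the module,
standing in for the "separate function" mentioned above. The import fallback is
an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlsplit
except ImportError:
    from urllib.parse import urlparse, urlsplit  # Python 3 names

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=3#frag'

# urlparse splits off only the parameters of the last path segment.
parsed = urlparse(url)
print(parsed.path)    # /a;x=1/b
print(parsed.params)  # y=2

# urlsplit leaves every parameter inside the path and returns five items.
split = urlsplit(url)
print(split.path)     # /a;x=1/b;y=2
print(len(split))     # 5


def split_segment_params(path):
    """Hypothetical helper: pair each path segment with its ';' parameters."""
    result = []
    for segment in path.split('/'):
        name, _, params = segment.partition(';')
        result.append((name, params))
    return result


segments = split_segment_params(split.path)
# segments == [('', ''), ('a', 'x=1'), ('b', 'y=2')]
```

The leading empty pair comes from the slash that starts the path.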


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
   URL as a string. The *parts* argument can be any five-item iterable. This may
   result in a slightly different, but equivalent URL, if the URL that was parsed
   originally had unnecessary delimiters (for example, a ? with an empty query; the
   RFC states that these are equivalent).

   .. versionadded:: 2.2
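
Since :func:`urlunsplit` accepts any five-item iterable, one way to change a
single component is to copy the result of :func:`urlsplit` into a list, edit it
by index, and join it again. This is only a sketch with illustrative values; the
import fallback is an assumption for Python 3.

```python
try:
    from urlparse import urlsplit, urlunsplit
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # Python 3 names

parts = list(urlsplit('http://www.cwi.nl/%7Eguido/Python.html?q=1#top'))

parts[3] = 'q=2'  # index 3 is the query component
parts[4] = ''     # index 4 is the fragment identifier; empty means "none"

edited = urlunsplit(parts)
print(edited)     # http://www.cwi.nl/%7Eguido/Python.html?q=2
```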


.. function:: urljoin(base, url[, allow_fragments])

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the path,
   to provide missing components in the relative URL. For example:

      >>> from urlparse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

   .. doctest::

      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
      ...         '//www.python.org/%7Eguido')
      'http://www.python.org/%7Eguido'

   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
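
The preprocessing suggested above could look like the following sketch.
``join_path_only`` is a hypothetical helper name, not part of the module, and
the import fallback is an assumption for running under Python 3.

```python
try:
    from urlparse import urljoin, urlsplit, urlunsplit
except ImportError:
    from urllib.parse import urljoin, urlsplit, urlunsplit  # Python 3 names


def join_path_only(base, url):
    """Hypothetical helper: join url to base, ignoring any scheme or host in url."""
    parts = urlsplit(url)
    # Keep only path, query and fragment; blank out scheme and netloc.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Plain urljoin lets the host of the second URL win.
plain = urljoin(base, '//www.python.org/%7Eguido')
print(plain)     # http://www.python.org/%7Eguido

# After stripping scheme and netloc, the base host is kept.
stripped = join_path_only(base, '//www.python.org/%7Eguido')
print(stripped)  # http://www.cwi.nl/%7Eguido
```

Whether dropping the host is appropriate depends on the application; the sketch
only shows the mechanics.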


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, returns a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate string.
   If there is no fragment identifier in *url*, returns *url* unmodified and an
   empty string.
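
A short sketch of both cases of :func:`urldefrag`, with an illustrative fragment
name and an import fallback that is an assumption for Python 3:

```python
try:
    from urlparse import urldefrag
except ImportError:
    from urllib.parse import urldefrag  # Python 3 name

# With a fragment: the URL without it, and the fragment on its own.
url, fragment = urldefrag('http://www.cwi.nl/%7Eguido/Python.html#intro')
print(url)          # http://www.cwi.nl/%7Eguido/Python.html
print(fragment)     # intro

# Without a fragment: the URL comes back unchanged with an empty string.
same, empty = urldefrag('http://www.cwi.nl/%7Eguido/Python.html')
print(same)         # http://www.cwi.nl/%7Eguido/Python.html
print(repr(empty))  # ''
```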


.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urlparse module
      should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.


.. _urlparse-result-object:

Results of :func:`urlparse` and :func:`urlsplit`
------------------------------------------------

The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
subclasses of the :class:`tuple` type. These subclasses add the attributes
described in those functions, as well as provide an additional method:


.. method:: ParseResult.geturl()

   Return the re-combined version of the original URL as a string. This may differ
   from the original URL in that the scheme will always be normalized to lower case
   and empty components may be dropped. Specifically, empty parameters, queries,
   and fragment identifiers will be removed.

   The result of this method is a fixpoint if passed back through the original
   parsing function:

      >>> import urlparse
      >>> url = 'HTTP://www.Python.org/doc/#'

      >>> r1 = urlparse.urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'

      >>> r2 = urlparse.urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'

   .. versionadded:: 2.5

The following classes provide the implementations of the parse results:


.. class:: BaseResult

   Base class for the concrete result classes. This provides most of the attribute
   definitions. It does not provide a :meth:`geturl` method. It is derived from
   :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
   methods.


.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results. The :meth:`__new__` method is
   overridden to support checking that the right number of arguments is passed.


.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results. The :meth:`__new__` method is
   overridden to support checking that the right number of arguments is passed.
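
The concrete result classes can also be constructed directly from their
components, which gives access to :meth:`geturl` and the convenience attributes
without parsing a string first. The sketch below uses illustrative values, and
the import fallback is an assumption for running under Python 3.

```python
try:
    from urlparse import ParseResult, SplitResult
except ImportError:
    from urllib.parse import ParseResult, SplitResult  # Python 3 names

# Five components for SplitResult, in tuple order.
split = SplitResult('http', 'www.python.org:8080', '/doc/', 'q=1', 'top')
print(split.geturl())    # http://www.python.org:8080/doc/?q=1#top
print(split.hostname)    # www.python.org
print(split.port)        # 8080

# Six components for ParseResult; empty strings mean "not present".
parsed = ParseResult('http', 'www.python.org', '/doc/', '', '', '')
print(parsed.geturl())   # http://www.python.org/doc/

# Passing the wrong number of arguments is rejected.
try:
    SplitResult('http', 'www.python.org')
    rejected = False
except TypeError:
    rejected = True
print(rejected)          # True
```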