:mod:`robotparser` --- Parser for robots.txt
=============================================

.. module:: robotparser
   :synopsis: Loads a robots.txt file and answers questions about
              fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

.. note::
   The :mod:`robotparser` module has been renamed :mod:`urllib.robotparser` in
   Python 3.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file. For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.


.. class:: RobotFileParser(url='')

   This class provides methods to read, parse and answer questions about the
   :file:`robots.txt` file at *url*.


   .. method:: set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.


   .. method:: read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.


   .. method:: parse(lines)

      Parses the *lines* argument.


   .. method:: can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.


   .. method:: mtime()

      Returns the time the ``robots.txt`` file was last fetched. This is
      useful for long-running web spiders that need to check for new
      ``robots.txt`` files periodically.


   .. method:: modified()

      Sets the time the ``robots.txt`` file was last fetched to the current
      time.

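As a rough illustration of how :meth:`mtime` and :meth:`modified` might
support a long-running spider, the sketch below decides whether a parser's
rules are old enough to warrant fetching again. The names ``MAX_AGE`` and
``is_stale`` are hypothetical and not part of the module. The sketch calls
:meth:`modified` explicitly rather than assuming that :meth:`read` or
:meth:`parse` record the fetch time themselves, since that may differ between
Python versions. The import fallback only exists so the same code also runs
under the Python 3 module name.

```python
import time

try:
    import robotparser                 # Python 2 module name
except ImportError:
    from urllib import robotparser     # renamed in Python 3

MAX_AGE = 3600  # hypothetical refresh interval, in seconds


def is_stale(rp, max_age=MAX_AGE, now=None):
    """Return True if rp was never marked as fetched or is older than max_age."""
    if now is None:
        now = time.time()
    last = rp.mtime()
    # mtime() is 0 until modified() has been called at least once.
    return not last or (now - last) > max_age


rp = robotparser.RobotFileParser()
stale_before = is_stale(rp)            # nothing has been fetched yet

# Feed rules from memory (no network access), then record the fetch time.
rp.parse(["User-agent: *", "Disallow: /private/"])
rp.modified()

stale_now = is_stale(rp)
stale_later = is_stale(rp, now=time.time() + 2 * MAX_AGE)
```

A spider could call such a helper before each batch of requests and invoke
:meth:`read` followed by :meth:`modified` whenever it returns true.
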
The following example demonstrates basic use of the RobotFileParser class. ::

   >>> import robotparser
   >>> rp = robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True

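The example above needs network access. For experimenting offline, a minimal
sketch along the following lines feeds rules to :meth:`parse` as a list of
lines and then queries :meth:`can_fetch`. The rule set, the ``badbot`` user
agent and the ``www.example.com`` host are made up for illustration. The
explicit :meth:`modified` call is a precaution, on the assumption that some
versions refuse every request until a fetch time has been recorded.

```python
try:
    import robotparser                 # Python 2 module name
except ImportError:
    from urllib import robotparser     # renamed in Python 3

# Hypothetical robots.txt content: ban one crawler entirely and keep
# everyone else out of /cgi-bin/.
RULES = """\
User-agent: badbot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())           # parse() takes an iterable of lines
rp.modified()                          # mark the rules as freshly loaded

search_ok = rp.can_fetch(
    "*", "http://www.example.com/cgi-bin/search?city=San+Francisco")
home_ok = rp.can_fetch("*", "http://www.example.com/")
badbot_ok = rp.can_fetch("badbot", "http://www.example.com/")

print(search_ok, home_ok, badbot_ok)
```

With these rules, the generic agent is refused under ``/cgi-bin/`` but may
fetch the front page, while ``badbot`` is refused everywhere.
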
