Timestamp:
Mar 19, 2014, 11:31:01 PM (11 years ago)
Author:
dmik
Message:

python: Merge vendor 2.7.6 to trunk.

Location:
python/trunk
Files:
2 edited

  • python/trunk

  • python/trunk/Doc/howto/regex.rst

--- python/trunk/Doc/howto/regex.rst (r2)
+++ python/trunk/Doc/howto/regex.rst (r391)
@@ -6,5 +6,4 @@
 
 :Author: A.M. Kuchling <amk@amk.ca>
-:Release: 0.05
 
 .. TODO:
@@ -83,5 +82,5 @@
 in the rest of this HOWTO. ::
 
-   . ^ $ * + ? { [ ] \ | ( )
+   . ^ $ * + ? { } [ ] \ | ( )
 
 The first metacharacters we'll look at are ``[`` and ``]``. They're used for
@@ -114,5 +113,8 @@
 of characters that are often useful, such as the set of digits, the set of
 letters, or the set of anything that isn't whitespace.  The following predefined
-special sequences are available:
+special sequences are a subset of those available. The equivalent classes are
+for byte string patterns. For a complete list of sequences and expanded class
+definitions for Unicode string patterns, see the last part of
+:ref:`Regular Expression Syntax <re-syntax>`.
 
 ``\d``
@@ -264,6 +266,6 @@
    >>> import re
    >>> p = re.compile('ab*')
-   >>> print p
-   <_sre.SRE_Pattern object at 80b4150>
+   >>> p  #doctest: +ELLIPSIS
+   <_sre.SRE_Pattern object at 0x...>
 
 :func:`re.compile` also accepts an optional *flags* argument, used to enable
@@ -358,11 +360,11 @@
 
 :meth:`match` and :meth:`search` return ``None`` if no match can be found.  If
-they're successful, a ``MatchObject`` instance is returned, containing
-information about the match: where it starts and ends, the substring it matched,
-and more.
+they're successful, a :ref:`match object <match-objects>` instance is returned,
+containing information about the match: where it starts and ends, the substring
+it matched, and more.
 
 You can learn about this by interactively experimenting with the :mod:`re`
 module.  If you have Tkinter available, you may also want to look at
-:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+:source:`Tools/scripts/redemo.py`, a demonstration program included with the
 Python distribution.  It allows you to enter REs and strings, and displays
 whether the RE matches or fails. :file:`redemo.py` can be quite useful when
@@ -377,6 +379,6 @@
    >>> import re
    >>> p = re.compile('[a-z]+')
-   >>> p
-   <_sre.SRE_Pattern object at 80c3c28>
+   >>> p  #doctest: +ELLIPSIS
+   <_sre.SRE_Pattern object at 0x...>
 
 Now, you can try matching various strings against the RE ``[a-z]+``.  An empty
@@ -391,14 +393,14 @@
 
 Now, let's try it on a string that it should match, such as ``tempo``.  In this
-case, :meth:`match` will return a :class:`MatchObject`, so you should store the
-result in a variable for later use. ::
+case, :meth:`match` will return a :ref:`match object <match-objects>`, so you
+should store the result in a variable for later use. ::
 
    >>> m = p.match('tempo')
-   >>> print m
-   <_sre.SRE_Match object at 80c4f68>
-
-Now you can query the :class:`MatchObject` for information about the matching
-string.   :class:`MatchObject` instances also have several methods and
-attributes; the most important ones are:
+   >>> m  #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
+
+Now you can query the :ref:`match object <match-objects>` for information
+about the matching string.  :ref:`match object <match-objects>` instances
+also have several methods and attributes; the most important ones are:
 
 +------------------+--------------------------------------------+
@@ -434,6 +436,6 @@
    >>> print p.match('::: message')
    None
-   >>> m = p.search('::: message') ; print m
-   <re.MatchObject instance at 80c9650>
+   >>> m = p.search('::: message'); print m  #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
    >>> m.group()
    'message'
@@ -441,6 +443,7 @@
    (4, 11)
 
-In actual programs, the most common style is to store the :class:`MatchObject`
-in a variable, and then check if it was ``None``.  This usually looks like::
+In actual programs, the most common style is to store the
+:ref:`match object <match-objects>` in a variable, and then check if it was
+``None``.  This usually looks like::
 
    p = re.compile( ... )
@@ -459,10 +462,10 @@
 
 :meth:`findall` has to create the entire list before it can be returned as the
-result.  The :meth:`finditer` method returns a sequence of :class:`MatchObject`
-instances as an :term:`iterator`. [#]_ ::
+result.  The :meth:`finditer` method returns a sequence of
+:ref:`match object <match-objects>` instances as an :term:`iterator`. [#]_ ::
 
    >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
-   >>> iterator
-   <callable-iterator object at 0x401833ac>
+   >>> iterator  #doctest: +ELLIPSIS
+   <callable-iterator object at 0x...>
    >>> for match in iterator:
    ...     print match.span()
@@ -481,10 +484,10 @@
 take the same arguments as the corresponding pattern method, with
 the RE string added as the first argument, and still return either ``None`` or a
-:class:`MatchObject` instance. ::
+:ref:`match object <match-objects>` instance. ::
 
    >>> print re.match(r'From\s+', 'Fromage amk')
    None
-   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
-   <re.MatchObject instance at 80c5978>
+   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')  #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
 
 Under the hood, these functions simply create a pattern object for you
@@ -500,5 +503,5 @@
 the definitions in one place, in a section of code that compiles all the REs
 ahead of time.  To take an example from the standard library, here's an extract
-from :file:`xmllib.py`::
+from the deprecated :mod:`xmllib` module::
 
    ref = re.compile( ... )
@@ -686,6 +689,6 @@
    line, the RE to use is ``^From``. ::
 
-      >>> print re.search('^From', 'From Here to Eternity')
-      <re.MatchObject instance at 80c1520>
+      >>> print re.search('^From', 'From Here to Eternity')  #doctest: +ELLIPSIS
+      <_sre.SRE_Match object at 0x...>
       >>> print re.search('^From', 'Reciting From Memory')
       None
@@ -698,10 +701,10 @@
    or any location followed by a newline character.     ::
 
-      >>> print re.search('}$', '{block}')
-      <re.MatchObject instance at 80adfa8>
+      >>> print re.search('}$', '{block}')  #doctest: +ELLIPSIS
+      <_sre.SRE_Match object at 0x...>
       >>> print re.search('}$', '{block} ')
       None
-      >>> print re.search('}$', '{block}\n')
-      <re.MatchObject instance at 80adfa8>
+      >>> print re.search('}$', '{block}\n')  #doctest: +ELLIPSIS
+      <_sre.SRE_Match object at 0x...>
 
    To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
@@ -727,6 +730,6 @@
 
       >>> p = re.compile(r'\bclass\b')
-      >>> print p.search('no class at all')
-      <re.MatchObject instance at 80c8f28>
+      >>> print p.search('no class at all')  #doctest: +ELLIPSIS
+      <_sre.SRE_Match object at 0x...>
       >>> print p.search('the declassified algorithm')
       None
@@ -745,6 +748,6 @@
       >>> print p.search('no class at all')
       None
-      >>> print p.search('\b' + 'class' + '\b')
-      <re.MatchObject instance at 80c3ee0>
+      >>> print p.search('\b' + 'class' + '\b')  #doctest: +ELLIPSIS
+      <_sre.SRE_Match object at 0x...>
 
    Second, inside a character class, where there's no use for this assertion,
@@ -790,7 +793,7 @@
 to :meth:`group`, :meth:`start`, :meth:`end`, and :meth:`span`.  Groups are
 numbered starting with 0.  Group 0 is always present; it's the whole RE, so
-:class:`MatchObject` methods all have group 0 as their default argument.  Later
-we'll see how to express groups that don't capture the span of text that they
-match. ::
+:ref:`match object <match-objects>` methods all have group 0 as their default
+argument.  Later we'll see how to express groups that don't capture the span
+of text that they match. ::
 
    >>> p = re.compile('(a)b')
@@ -912,8 +915,8 @@
 ``(?P<name>...)``.  *name* is, obviously, the name of the group.  Named groups
 also behave exactly like capturing groups, and additionally associate a name
-with a group.  The :class:`MatchObject` methods that deal with capturing groups
-all accept either integers that refer to the group by number or strings that
-contain the desired group's name.  Named groups are still given numbers, so you
-can retrieve information about a group in two ways::
+with a group.  The :ref:`match object <match-objects>` methods that deal with
+capturing groups all accept either integers that refer to the group by number
+or strings that contain the desired group's name.  Named groups are still
+given numbers, so you can retrieve information about a group in two ways::
 
    >>> p = re.compile(r'(?P<word>\b\w+\b)')
@@ -1179,14 +1182,14 @@
 *replacement* can also be a function, which gives you even more control.  If
 *replacement* is a function, the function is called for every non-overlapping
-occurrence of *pattern*.  On each call, the function is  passed a
-:class:`MatchObject` argument for the match and can use this information to
-compute the desired replacement string and return it.
-
-In the following example, the replacement function translates  decimals into
+occurrence of *pattern*.  On each call, the function is passed a
+:ref:`match object <match-objects>` argument for the match and can use this
+information to compute the desired replacement string and return it.
+
+In the following example, the replacement function translates decimals into
 hexadecimal::
 
-   >>> def hexrepl( match ):
+   >>> def hexrepl(match):
    ...     "Return the hex string for a decimal number"
-   ...     value = int( match.group() )
+   ...     value = int(match.group())
    ...     return hex(value)
    ...
@@ -1316,5 +1319,5 @@
 
 
-Not Using re.VERBOSE
+Using re.VERBOSE
 --------------------
 
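
Most of the doctest changes above replace a hard-coded object address with `0x...` and add a `#doctest: +ELLIPSIS` directive, so the example output no longer depends on where the object happens to live in memory. The following is a minimal sketch of that behaviour, not part of the changeset. It assumes Python 3 on CPython and uses a plain `object()`, whose repr also embeds an address; `SAMPLE` and `run_sample` are hypothetical names chosen for illustration.

```python
import doctest

# Hypothetical doctest text (not taken from regex.rst): the repr of a plain
# object() on CPython contains a memory address that differs between runs.
SAMPLE = """
>>> object()  #doctest: +ELLIPSIS
<object object at 0x...>
"""


def run_sample(text):
    """Run the doctest examples in *text* and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the counts are of interest here.
    result = runner.run(test, out=lambda message: None)
    return result.failed, result.attempted


if __name__ == "__main__":
    # With the directive, "..." matches whatever address is printed.
    print(run_sample(SAMPLE))
    # Without it, "0x..." is compared literally and the example fails.
    print(run_sample(SAMPLE.replace("  #doctest: +ELLIPSIS", "")))
```

Removing the directive makes the same example fail, which suggests why the old outputs such as `80b4150` could never pass as doctests.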