Timestamp:
Mar 19, 2014, 11:31:01 PM (11 years ago)
Author:
dmik
Message:

python: Merge vendor 2.7.6 to trunk.

Location:
python/trunk
Files:
2 edited

  • python/trunk

  • python/trunk/Doc/library/re.rst

    r2 r391  
    157157
    158158``[]``
    159    Used to indicate a set of characters.  Characters can be listed individually, or
    160    a range of characters can be indicated by giving two characters and separating
    161    them by a ``'-'``.  Special characters are not active inside sets.  For example,
    162    ``[akm$]`` will match any of the characters ``'a'``, ``'k'``,
    163    ``'m'``, or ``'$'``; ``[a-z]`` will match any lowercase letter, and
    164    ``[a-zA-Z0-9]`` matches any letter or digit.  Character classes such
    165    as ``\w`` or ``\S`` (defined below) are also acceptable inside a
    166    range, although the characters they match depends on whether :const:`LOCALE`
    167    or  :const:`UNICODE` mode is in force.  If you want to include a
    168    ``']'`` or a ``'-'`` inside a set, precede it with a backslash, or
    169    place it as the first character.  The pattern ``[]]`` will match
    170    ``']'``, for example.
    171 
    172    You can match the characters not within a range by :dfn:`complementing` the set.
    173    This is indicated by including a ``'^'`` as the first character of the set;
    174    ``'^'`` elsewhere will simply match the ``'^'`` character.  For example,
    175    ``[^5]`` will match any character except ``'5'``, and ``[^^]`` will match any
    176    character except ``'^'``.
    177 
    178    Note that inside ``[]`` the special forms and special characters lose
    179    their meanings and only the syntaxes described here are valid. For
    180    example, ``+``, ``*``, ``(``, ``)``, and so on are treated as
    181    literals inside ``[]``, and backreferences cannot be used inside
    182    ``[]``.
     159   Used to indicate a set of characters.  In a set:
     160
     161   * Characters can be listed individually, e.g. ``[amk]`` will match ``'a'``,
     162     ``'m'``, or ``'k'``.
     163
     164   * Ranges of characters can be indicated by giving two characters and separating
     165     them by a ``'-'``, for example ``[a-z]`` will match any lowercase ASCII letter,
     166     ``[0-5][0-9]`` will match all the two-digit numbers from ``00`` to ``59``, and
     167     ``[0-9A-Fa-f]`` will match any hexadecimal digit.  If ``-`` is escaped (e.g.
     168     ``[a\-z]``) or if it's placed as the first or last character (e.g. ``[a-]``),
     169     it will match a literal ``'-'``.
     170
     171   * Special characters lose their special meaning inside sets.  For example,
     172     ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
     173     ``'*'``, or ``')'``.
     174
     175   * Character classes such as ``\w`` or ``\S`` (defined below) are also accepted
     176     inside a set, although the characters they match depend on whether
     177     :const:`LOCALE` or  :const:`UNICODE` mode is in force.
     178
     179   * Characters that are not within a range can be matched by :dfn:`complementing`
     180     the set.  If the first character of the set is ``'^'``, all the characters
     181     that are *not* in the set will be matched.  For example, ``[^5]`` will match
     182     any character except ``'5'``, and ``[^^]`` will match any character except
     183     ``'^'``.  ``^`` has no special meaning if it's not the first character in
     184     the set.
     185
     186   * To match a literal ``']'`` inside a set, precede it with a backslash, or
     187     place it at the beginning of the set.  For example, both ``[()[\]{}]`` and
     188     ``[]()[{}]`` will match a parenthesis.
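
The set rules listed above can be checked with a short sketch. It uses only the standard `re` module; the sample strings and variable names are illustrative and do not come from the documentation itself.

```python
import re

# Characters listed individually.
listed = re.findall(r'[amk]', 'makes')

# Ranges: two-digit numbers from 00 to 59, and hexadecimal digits.
minutes = re.findall(r'[0-5][0-9]', '07 42 75')
hex_digits = re.findall(r'[0-9A-Fa-f]', 'xFz9g')

# An escaped '-' is a literal dash rather than a range operator.
dashes = re.findall(r'[a\-z]', 'a-m-z')

# Special characters lose their special meaning inside a set.
literals = re.findall(r'[(+*)]', '(1+2)*3')

# A leading '^' complements the set.
not_five = re.findall(r'[^5]', '1525')

# A ']' placed first in the set is taken literally.
brackets = re.findall(r'[]()[{}]', 'f(x)[0]{}')

print(listed, minutes, hex_digits, dashes, literals, not_five, brackets)
```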
    183189
    184190``'|'``
     
    225231
    226232``(?:...)``
    227    A non-grouping version of regular parentheses. Matches whatever regular
     233   A non-capturing version of regular parentheses. Matches whatever regular
    228234   expression is inside the parentheses, but the substring matched by the group
    229235   *cannot* be retrieved after performing a match or referenced later in the
     
    232238``(?P<name>...)``
    233239   Similar to regular parentheses, but the substring matched by the group is
    234    accessible within the rest of the regular expression via the symbolic group
    235    name *name*.  Group names must be valid Python identifiers, and each group
    236    name must be defined only once within a regular expression.  A symbolic group
    237    is also a numbered group, just as if the group were not named.  So the group
    238    named ``id`` in the example below can also be referenced as the numbered group
    239    ``1``.
    240 
    241    For example, if the pattern is ``(?P<id>[a-zA-Z_]\w*)``, the group can be
    242    referenced by its name in arguments to methods of match objects, such as
    243    ``m.group('id')`` or ``m.end('id')``, and also by name in the regular
    244    expression itself (using ``(?P=id)``) and replacement text given to
    245    ``.sub()`` (using ``\g<id>``).
     240   accessible via the symbolic group name *name*.  Group names must be valid
     241   Python identifiers, and each group name must be defined only once within a
     242   regular expression.  A symbolic group is also a numbered group, just as if
     243   the group were not named.
     244
     245   Named groups can be referenced in three contexts.  If the pattern is
     246   ``(?P<quote>['"]).*?(?P=quote)`` (i.e. matching a string quoted with either
     247   single or double quotes):
     248
     249   +---------------------------------------+----------------------------------+
     250   | Context of reference to group "quote" | Ways to reference it             |
     251   +=======================================+==================================+
     252   | in the same pattern itself            | * ``(?P=quote)`` (as shown)      |
     253   |                                       | * ``\1``                         |
     254   +---------------------------------------+----------------------------------+
     255   | when processing match object ``m``    | * ``m.group('quote')``           |
     256   |                                       | * ``m.end('quote')`` (etc.)      |
     257   +---------------------------------------+----------------------------------+
     258   | in a string passed to the ``repl``    | * ``\g<quote>``                  |
     259   | argument of ``re.sub()``              | * ``\g<1>``                      |
     260   |                                       | * ``\1``                         |
     261   +---------------------------------------+----------------------------------+
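
As a rough illustration of the three contexts in the table, the sketch below applies the ``quote`` pattern to a made-up sample string. The names `text`, `replaced` and `replaced_by_number` are hypothetical; only standard `re` calls are used.

```python
import re

pattern = re.compile(r'''(?P<quote>['"]).*?(?P=quote)''')
text = '''say "hello" or 'bye' now'''

# 1. In the pattern itself: (?P=quote) and \1 refer to the same group.
numbered = re.compile(r'''(?P<quote>['"]).*?\1''')
first_by_name = pattern.search(text).group()
first_by_number = numbered.search(text).group()

# 2. On the match object: look the group up by name.
m = pattern.search(text)
quote_char = m.group('quote')
quote_end = m.end('quote')

# 3. In the repl string of sub(): \g<quote>, \g<1> and \1 are interchangeable.
replaced = pattern.sub(r'\g<quote>...\g<quote>', text)
replaced_by_number = pattern.sub(r'\g<1>...\1', text)

print(first_by_name, quote_char, quote_end, replaced)
```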
    246262
    247263``(?P=name)``
    248    Matches whatever text was matched by the earlier group named *name*.
     264   A backreference to a named group; it matches whatever text was matched by the
     265   earlier group named *name*.
    249266
    250267``(?#...)``
     
    268285   The contained pattern must only match strings of some fixed length, meaning that
    269286   ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not.  Note that
    270    patterns which start with positive lookbehind assertions will never match at the
     287   patterns which start with positive lookbehind assertions will not match at the
    271288   beginning of the string being searched; you will most likely want to use the
    272289   :func:`search` function rather than the :func:`match` function:
     
    306323   Matches the contents of the group of the same number.  Groups are numbered
    307324   starting from 1.  For example, ``(.+) \1`` matches ``'the the'`` or ``'55 55'``,
    308    but not ``'the end'`` (note the space after the group).  This special sequence
     325   but not ``'thethe'`` (note the space after the group).  This special sequence
    309326   can only be used to match one of the first 99 groups.  If the first digit of
    310327   *number* is 0, or *number* is 3 octal digits long, it will not be interpreted as
     
    320337   defined as a sequence of alphanumeric or underscore characters, so the end of a
    321338   word is indicated by whitespace or a non-alphanumeric, non-underscore character.
    322    Note that  ``\b`` is defined as the boundary between ``\w`` and ``\ W``, so the
    323    precise set of characters deemed to be alphanumeric depends on the values of the
    324    ``UNICODE`` and ``LOCALE`` flags.  Inside a character range, ``\b`` represents
    325    the backspace character, for compatibility with Python's string literals.
     339   Note that formally, ``\b`` is defined as the boundary between a ``\w`` and
     340   a ``\W`` character (or vice versa), or between ``\w`` and the beginning/end
     341   of the string, so the precise set of characters deemed to be alphanumeric
     342   depends on the values of the ``UNICODE`` and ``LOCALE`` flags.
     343   For example, ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``,
     344   ``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``.
     345   Inside a character range, ``\b`` represents the backspace character, for
     346   compatibility with Python's string literals.
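
The ``\b`` examples above can be tried out with a small sketch. The candidate list is taken from the text; the variable names are illustrative.

```python
import re

word = re.compile(r'\bfoo\b')
candidates = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']

# Keep only the strings in which 'foo' appears as a whole word.
whole_word = [s for s in candidates if word.search(s)]

# Inside a character set, \b stands for the backspace character (0x08).
backspace = re.match(r'[\b]', '\x08')

print(whole_word, backspace is not None)
```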
    326347
    327348``\B``
    328349   Matches the empty string, but only when it is *not* at the beginning or end of a
    329    word.  This is just the opposite of ``\b``, so is also subject to the settings
     350   word.  This means that ``r'py\B'`` matches ``'python'``, ``'py3'``, ``'py2'``,
     351   but not ``'py'``, ``'py.'``, or ``'py!'``.
     352   ``\B`` is just the opposite of ``\b``, so is also subject to the settings
    330353   of ``LOCALE`` and ``UNICODE``.
    331354
     
    333356   When the :const:`UNICODE` flag is not specified, matches any decimal digit; this
    334357   is equivalent to the set ``[0-9]``.  With :const:`UNICODE`, it will match
    335    whatever is classified as a digit in the Unicode character properties database.
     358   whatever is classified as a decimal digit in the Unicode character properties
     359   database.
    336360
    337361``\D``
     
    342366
    343367``\s``
    344    When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
    345    any whitespace character; this is equivalent to the set ``[ \t\n\r\f\v]``. With
    346    :const:`LOCALE`, it will match this set plus whatever characters are defined as
    347    space for the current locale. If :const:`UNICODE` is set, this will match the
    348    characters ``[ \t\n\r\f\v]`` plus whatever is classified as space in the Unicode
    349    character properties database.
     368   When the :const:`UNICODE` flag is not specified, it matches any whitespace
     369   character; this is equivalent to the set ``[ \t\n\r\f\v]``. The
     370   :const:`LOCALE` flag has no extra effect on matching of the space.
     371   If :const:`UNICODE` is set, this will match the characters ``[ \t\n\r\f\v]``
     372   plus whatever is classified as space in the Unicode character properties
     373   database.
    350374
    351375``\S``
    352    When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
    353    any non-whitespace character; this is equivalent to the set ``[^ \t\n\r\f\v]``
    354    With :const:`LOCALE`, it will match any character not in this set, and not
    355    defined as space in the current locale. If :const:`UNICODE` is set, this will
    356    match anything other than ``[ \t\n\r\f\v]`` and characters marked as space in
    357    the Unicode character properties database.
     376   When the :const:`UNICODE` flag is not specified, matches any non-whitespace
     377   character; this is equivalent to the set ``[^ \t\n\r\f\v]``.  The
     378   :const:`LOCALE` flag has no extra effect on non-whitespace match.  If
     379   :const:`UNICODE` is set, then any character not marked as space in the
     380   Unicode character properties database is matched.
     381
    358382
    359383``\w``
     
    370394   With :const:`LOCALE`, it will match any character not in the set ``[0-9_]``, and
    371395   not defined as alphanumeric for the current locale. If :const:`UNICODE` is set,
    372    this will match anything other than ``[0-9_]`` and characters marked as
    373    alphanumeric in the Unicode character properties database.
     396   this will match anything other than ``[0-9_]`` plus characters classified as
     397   not alphanumeric in the Unicode character properties database.
    374398
    375399``\Z``
    376400   Matches only at the end of the string.
     401
     402If both :const:`LOCALE` and :const:`UNICODE` flags are included for a
     403particular sequence, then the :const:`LOCALE` flag takes effect first, followed by
     404the :const:`UNICODE` flag.
    377405
    378406Most of the standard escapes supported by Python string literals are also
     
    382410   \r      \t      \v      \x
    383411   \\
     412
     413(Note that ``\b`` is used to represent word boundaries, and means "backspace"
     414only inside character classes.)
    384415
    385416Octal escapes are included in a limited form: If the first digit is a 0, or if
     
    389420
    390421
    391 .. _matching-searching:
    392 
    393 Matching vs Searching
    394 ---------------------
    395 
    396 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
    397 
    398 
    399 Python offers two different primitive operations based on regular expressions:
    400 **match** checks for a match only at the beginning of the string, while
    401 **search** checks for a match anywhere in the string (this is what Perl does
    402 by default).
    403 
    404 Note that match may differ from search even when using a regular expression
    405 beginning with ``'^'``: ``'^'`` matches only at the start of the string, or in
    406 :const:`MULTILINE` mode also immediately following a newline.  The "match"
    407 operation succeeds only if the pattern matches at the start of the string
    408 regardless of mode, or at the starting position given by the optional *pos*
    409 argument regardless of whether a newline precedes it.
    410 
    411    >>> re.match("c", "abcdef")  # No match
    412    >>> re.search("c", "abcdef") # Match
    413    <_sre.SRE_Match object at ...>
    414 
    415 
    416422.. _contents-of-module-re:
    417423
     
    425431
    426432
    427 .. function:: compile(pattern[, flags])
     433.. function:: compile(pattern, flags=0)
    428434
    429435   Compile a regular expression pattern into a regular expression object, which
     
    454460      programs that use only a few regular expressions at a time needn't worry
    455461      about compiling regular expressions.
     462
     463
     464.. data:: DEBUG
     465
     466   Display debug information about the compiled expression.
    456467
    457468
     
    515526
    516527
    517 .. function:: search(pattern, string[, flags])
     528.. function:: search(pattern, string, flags=0)
    518529
    519530   Scan through *string* looking for a location where the regular expression
     
    524535
    525536
    526 .. function:: match(pattern, string[, flags])
     537.. function:: match(pattern, string, flags=0)
    527538
    528539   If zero or more characters at the beginning of *string* match the regular
     
    531542   different from a zero-length match.
    532543
    533    .. note::
    534 
    535       If you want to locate a match anywhere in *string*, use :func:`search`
    536       instead.
    537 
    538 
    539 .. function:: split(pattern, string[, maxsplit=0])
     544   Note that even in :const:`MULTILINE` mode, :func:`re.match` will only match
     545   at the beginning of the string and not at the beginning of each line.
     546
     547   If you want to locate a match anywhere in *string*, use :func:`search`
     548   instead (see also :ref:`search-vs-match`).
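
A short sketch of the :const:`MULTILINE` behaviour described above, using a made-up two-line string; the variable names are illustrative.

```python
import re

text = 'first\nsecond'

# match() is anchored at the start of the string, even with MULTILINE.
at_start = re.match(r'second', text, re.MULTILINE)

# search() with '^' in MULTILINE mode can match at the start of any line.
at_line = re.search(r'^second', text, re.MULTILINE)

print(at_start, at_line.start())
```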
     549
     550
     551.. function:: split(pattern, string, maxsplit=0, flags=0)
    540552
    541553   Split *string* by the occurrences of *pattern*.  If capturing parentheses are
     
    552564      >>> re.split('\W+', 'Words, words, words.', 1)
    553565      ['Words', 'words, words.']
     566      >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
     567      ['0', '3', '9']
    554568
    555569   If there are capturing groups in the separator and it matches at the start of
     
    572586      ['foo\n\nbar\n']
    573587
    574 
    575 .. function:: findall(pattern, string[, flags])
     588   .. versionchanged:: 2.7
     589      Added the optional flags argument.
     590
     591
     592.. function:: findall(pattern, string, flags=0)
    576593
    577594   Return all non-overlapping matches of *pattern* in *string*, as a list of
     
    588605
    589606
    590 .. function:: finditer(pattern, string[, flags])
     607.. function:: finditer(pattern, string, flags=0)
    591608
    592609   Return an :term:`iterator` yielding :class:`MatchObject` instances over all
     
    602619
    603620
    604 .. function:: sub(pattern, repl, string[, count])
     621.. function:: sub(pattern, repl, string, count=0, flags=0)
    605622
    606623   Return the string obtained by replacing the leftmost non-overlapping occurrences
     
    608625   *string* is returned unchanged.  *repl* can be a string or a function; if it is
    609626   a string, any backslash escapes in it are processed.  That is, ``\n`` is
    610    converted to a single newline character, ``\r`` is converted to a linefeed, and
     627   converted to a single newline character, ``\r`` is converted to a carriage return, and
    611628   so forth.  Unknown escapes such as ``\j`` are left alone.  Backreferences, such
    612629   as ``\6``, are replaced with the substring matched by group 6 in the pattern.
     
    627644      >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
    628645      'pro--gram files'
    629 
    630    The pattern may be a string or an RE object; if you need to specify regular
    631    expression flags, you must use a RE object, or use embedded modifiers in a
    632    pattern; for example, ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
     646      >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
     647      'Baked Beans & Spam'
     648
     649   The pattern may be a string or an RE object.
    633650
    634651   The optional argument *count* is the maximum number of pattern occurrences to be
     
    638655   ``'-a-b-c-'``.
    639656
    640    In addition to character escapes and backreferences as described above,
     657   In string-type *repl* arguments, in addition to the character escapes and
     658   backreferences described above,
    641659   ``\g<name>`` will use the substring matched by the group named ``name``, as
    642660   defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
     
    647665   substring matched by the RE.
    648666
    649 
    650 .. function:: subn(pattern, repl, string[, count])
     667   .. versionchanged:: 2.7
     668      Added the optional flags argument.
     669
     670
     671.. function:: subn(pattern, repl, string, count=0, flags=0)
    651672
    652673   Perform the same operation as :func:`sub`, but return a tuple ``(new_string,
    653674   number_of_subs_made)``.
     675
     676   .. versionchanged:: 2.7
     677      Added the optional flags argument.
    654678
    655679
     
    659683   want to match an arbitrary literal string that may have regular expression
    660684   metacharacters in it.
     685
     686
     687.. function:: purge()
     688
     689   Clear the regular expression cache.
    661690
    662691
     
    674703--------------------------
    675704
    676 Compiled regular expression objects support the following methods and
    677 attributes:
    678 
    679 
    680 .. method:: RegexObject.match(string[, pos[, endpos]])
    681 
    682    If zero or more characters at the beginning of *string* match this regular
    683    expression, return a corresponding :class:`MatchObject` instance.  Return
    684    ``None`` if the string does not match the pattern; note that this is different
    685    from a zero-length match.
    686 
    687    .. note::
    688 
    689       If you want to locate a match anywhere in *string*, use
    690       :meth:`~RegexObject.search` instead.
    691 
    692    The optional second parameter *pos* gives an index in the string where the
    693    search is to start; it defaults to ``0``.  This is not completely equivalent to
    694    slicing the string; the ``'^'`` pattern character matches at the real beginning
    695    of the string and at positions just after a newline, but not necessarily at the
    696    index where the search is to start.
    697 
    698    The optional parameter *endpos* limits how far the string will be searched; it
    699    will be as if the string is *endpos* characters long, so only the characters
    700    from *pos* to ``endpos - 1`` will be searched for a match.  If *endpos* is less
    701    than *pos*, no match will be found, otherwise, if *rx* is a compiled regular
    702    expression object, ``rx.match(string, 0, 50)`` is equivalent to
    703    ``rx.match(string[:50], 0)``.
     705.. class:: RegexObject
     706
     707   The :class:`RegexObject` class supports the following methods and attributes:
     708
     709   .. method:: RegexObject.search(string[, pos[, endpos]])
     710
     711      Scan through *string* looking for a location where this regular expression
     712      produces a match, and return a corresponding :class:`MatchObject` instance.
     713      Return ``None`` if no position in the string matches the pattern; note that this
     714      is different from finding a zero-length match at some point in the string.
     715
     716      The optional second parameter *pos* gives an index in the string where the
     717      search is to start; it defaults to ``0``.  This is not completely equivalent to
     718      slicing the string; the ``'^'`` pattern character matches at the real beginning
     719      of the string and at positions just after a newline, but not necessarily at the
     720      index where the search is to start.
     721
     722      The optional parameter *endpos* limits how far the string will be searched; it
     723      will be as if the string is *endpos* characters long, so only the characters
     724      from *pos* to ``endpos - 1`` will be searched for a match.  If *endpos* is less
     725      than *pos*, no match will be found, otherwise, if *rx* is a compiled regular
     726      expression object, ``rx.search(string, 0, 50)`` is equivalent to
     727      ``rx.search(string[:50], 0)``.
     728
     729      >>> pattern = re.compile("d")
     730      >>> pattern.search("dog")     # Match at index 0
     731      <_sre.SRE_Match object at ...>
     732      >>> pattern.search("dog", 1)  # No match; search doesn't include the "d"
     733
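
The difference between *pos* and slicing, and the *endpos* equivalence stated above, might be sketched as follows. The patterns and strings are illustrative assumptions, not part of the original text.

```python
import re

anchored = re.compile(r'^b')

# '^' still refers to the real start of the string, so starting at pos=1 fails...
with_pos = anchored.search('ab', 1)
# ...whereas slicing produces a new string whose real start is 'b'.
sliced = anchored.search('ab'[1:])

# endpos behaves like truncating the string to that length.
digits = re.compile(r'\d+')
text = 'abc123def'
limited = digits.search(text, 0, 5).group()
truncated = digits.search(text[:5], 0).group()

print(with_pos, sliced.group(), limited, truncated)
```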
     734
     735   .. method:: RegexObject.match(string[, pos[, endpos]])
     736
     737      If zero or more characters at the *beginning* of *string* match this regular
     738      expression, return a corresponding :class:`MatchObject` instance.  Return
     739      ``None`` if the string does not match the pattern; note that this is different
     740      from a zero-length match.
     741
     742      The optional *pos* and *endpos* parameters have the same meaning as for the
     743      :meth:`~RegexObject.search` method.
    704744
    705745      >>> pattern = re.compile("o")
    706       >>> pattern.match("dog")      # No match as "o" is not at the start of "dog."
     746      >>> pattern.match("dog")      # No match as "o" is not at the start of "dog".
    707747      >>> pattern.match("dog", 1)   # Match as "o" is the 2nd character of "dog".
    708748      <_sre.SRE_Match object at ...>
    709749
    710 
    711 .. method:: RegexObject.search(string[, pos[, endpos]])
    712 
    713    Scan through *string* looking for a location where this regular expression
    714    produces a match, and return a corresponding :class:`MatchObject` instance.
    715    Return ``None`` if no position in the string matches the pattern; note that this
    716    is different from finding a zero-length match at some point in the string.
    717 
    718    The optional *pos* and *endpos* parameters have the same meaning as for the
    719    :meth:`~RegexObject.match` method.
    720 
    721 
    722 .. method:: RegexObject.split(string[, maxsplit=0])
    723 
    724    Identical to the :func:`split` function, using the compiled pattern.
    725 
    726 
    727 .. method:: RegexObject.findall(string[, pos[, endpos]])
    728 
    729    Identical to the :func:`findall` function, using the compiled pattern.
    730 
    731 
    732 .. method:: RegexObject.finditer(string[, pos[, endpos]])
    733 
    734    Identical to the :func:`finditer` function, using the compiled pattern.
    735 
    736 
    737 .. method:: RegexObject.sub(repl, string[, count=0])
    738 
    739    Identical to the :func:`sub` function, using the compiled pattern.
    740 
    741 
    742 .. method:: RegexObject.subn(repl, string[, count=0])
    743 
    744    Identical to the :func:`subn` function, using the compiled pattern.
    745 
    746 
    747 .. attribute:: RegexObject.flags
    748 
    749    The flags argument used when the RE object was compiled, or ``0`` if no flags
    750    were provided.
    751 
    752 
    753 .. attribute:: RegexObject.groups
    754 
    755    The number of capturing groups in the pattern.
    756 
    757 
    758 .. attribute:: RegexObject.groupindex
    759 
    760    A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group
    761    numbers.  The dictionary is empty if no symbolic groups were used in the
    762    pattern.
    763 
    764 
    765 .. attribute:: RegexObject.pattern
    766 
    767    The pattern string from which the RE object was compiled.
     750      If you want to locate a match anywhere in *string*, use
     751      :meth:`~RegexObject.search` instead (see also :ref:`search-vs-match`).
     752
     753
     754   .. method:: RegexObject.split(string, maxsplit=0)
     755
     756      Identical to the :func:`split` function, using the compiled pattern.
     757
     758
     759   .. method:: RegexObject.findall(string[, pos[, endpos]])
     760
     761      Similar to the :func:`findall` function, using the compiled pattern, but
     762      also accepts optional *pos* and *endpos* parameters that limit the search
     763      region as for :meth:`match`.
     764
     765
     766   .. method:: RegexObject.finditer(string[, pos[, endpos]])
     767
     768      Similar to the :func:`finditer` function, using the compiled pattern, but
     769      also accepts optional *pos* and *endpos* parameters that limit the search
     770      region as for :meth:`match`.
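
As a minimal sketch of how *pos* and *endpos* narrow the search region for the compiled-pattern versions of ``findall`` and ``finditer`` (the sample string is illustrative):

```python
import re

digit = re.compile(r'\d')
text = 'a1b2c3'

everything = digit.findall(text)        # whole string
from_two = digit.findall(text, 2)       # search starts at index 2
window = digit.findall(text, 2, 4)      # only indices 2 and 3 are searched

# finditer() accepts the same optional arguments and yields match objects.
starts = [m.start() for m in digit.finditer(text, 2)]

print(everything, from_two, window, starts)
```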
     771
     772
     773   .. method:: RegexObject.sub(repl, string, count=0)
     774
     775      Identical to the :func:`sub` function, using the compiled pattern.
     776
     777
     778   .. method:: RegexObject.subn(repl, string, count=0)
     779
     780      Identical to the :func:`subn` function, using the compiled pattern.
     781
     782
     783   .. attribute:: RegexObject.flags
     784
     785      The regex matching flags.  This is a combination of the flags given to
     786      :func:`.compile` and any ``(?...)`` inline flags in the pattern.
     787
     788
     789   .. attribute:: RegexObject.groups
     790
     791      The number of capturing groups in the pattern.
     792
     793
     794   .. attribute:: RegexObject.groupindex
     795
     796      A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group
     797      numbers.  The dictionary is empty if no symbolic groups were used in the
     798      pattern.
     799
     800
     801   .. attribute:: RegexObject.pattern
     802
     803      The pattern string from which the RE object was compiled.
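
The attributes above can be inspected on any compiled pattern. The sketch below uses a hypothetical three-group pattern. The exact integer value of ``flags`` may differ between Python versions, so only individual flag bits are tested.

```python
import re

source = r'(?i)(?P<first>\w+) (\w+) (?P<last>\w+)'
compiled = re.compile(source, re.MULTILINE)

# flags combines the compile() argument with inline (?...) flags.
has_ignorecase = bool(compiled.flags & re.IGNORECASE)
has_multiline = bool(compiled.flags & re.MULTILINE)
has_dotall = bool(compiled.flags & re.DOTALL)

group_count = compiled.groups             # named and unnamed capturing groups
name_to_number = dict(compiled.groupindex)
original = compiled.pattern

print(has_ignorecase, has_multiline, has_dotall, group_count, name_to_number)
```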
    768804
    769805
     
    773809-------------
    774810
    775 Match objects always have a boolean value of :const:`True`, so that you can test
    776 whether e.g. :func:`match` resulted in a match with a simple if statement.  They
    777 support the following methods and attributes:
    778 
    779 
    780 .. method:: MatchObject.expand(template)
    781 
    782    Return the string obtained by doing backslash substitution on the template
    783    string *template*, as done by the :meth:`~RegexObject.sub` method.  Escapes
    784    such as ``\n`` are converted to the appropriate characters, and numeric
    785    backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``,
    786    ``\g<name>``) are replaced by the contents of the corresponding group.
    787 
    788 
    789 .. method:: MatchObject.group([group1, ...])
    790 
    791    Returns one or more subgroups of the match.  If there is a single argument, the
    792    result is a single string; if there are multiple arguments, the result is a
    793    tuple with one item per argument. Without arguments, *group1* defaults to zero
    794    (the whole match is returned). If a *groupN* argument is zero, the corresponding
    795    return value is the entire matching string; if it is in the inclusive range
    796    [1..99], it is the string matching the corresponding parenthesized group.  If a
    797    group number is negative or larger than the number of groups defined in the
    798    pattern, an :exc:`IndexError` exception is raised. If a group is contained in a
    799    part of the pattern that did not match, the corresponding result is ``None``.
    800    If a group is contained in a part of the pattern that matched multiple times,
    801    the last match is returned.
    802 
    803       >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
    804       >>> m.group(0)       # The entire match
    805       'Isaac Newton'
    806       >>> m.group(1)       # The first parenthesized subgroup.
    807       'Isaac'
    808       >>> m.group(2)       # The second parenthesized subgroup.
    809       'Newton'
    810       >>> m.group(1, 2)    # Multiple arguments give us a tuple.
    811       ('Isaac', 'Newton')
    812 
    813    If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN*
    814    arguments may also be strings identifying groups by their group name.  If a
    815    string argument is not used as a group name in the pattern, an :exc:`IndexError`
    816    exception is raised.
    817 
    818    A moderately complicated example:
    819 
    820       >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
    821       >>> m.group('first_name')
    822       'Malcolm'
    823       >>> m.group('last_name')
    824       'Reynolds'
    825 
    826    Named groups can also be referred to by their index:
    827 
    828       >>> m.group(1)
    829       'Malcolm'
    830       >>> m.group(2)
    831       'Reynolds'
    832 
    833    If a group matches multiple times, only the last match is accessible:
    834 
    835       >>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
    836       >>> m.group(1)                        # Returns only the last match.
    837       'c3'
    838 
    839 
    840 .. method:: MatchObject.groups([default])
    841 
    842    Return a tuple containing all the subgroups of the match, from 1 up to however
    843    many groups are in the pattern.  The *default* argument is used for groups that
    844    did not participate in the match; it defaults to ``None``.  (Incompatibility
    845    note: in the original Python 1.5 release, if the tuple was one element long, a
    846    string would be returned instead.  In later versions (from 1.5.1 on), a
    847    singleton tuple is returned in such cases.)
    848 
    849    For example:
    850 
    851       >>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
    852       >>> m.groups()
    853       ('24', '1632')
    854 
    855    If we make the decimal place and everything after it optional, not all groups
    856    might participate in the match.  These groups will default to ``None`` unless
    857    the *default* argument is given:
    858 
    859       >>> m = re.match(r"(\d+)\.?(\d+)?", "24")
    860       >>> m.groups()      # Second group defaults to None.
    861       ('24', None)
    862       >>> m.groups('0')   # Now, the second group defaults to '0'.
    863       ('24', '0')
    864 
    865 
    866 .. method:: MatchObject.groupdict([default])
    867 
    868    Return a dictionary containing all the *named* subgroups of the match, keyed by
    869    the subgroup name.  The *default* argument is used for groups that did not
    870    participate in the match; it defaults to ``None``.  For example:
    871 
    872       >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
    873       >>> m.groupdict()
    874       {'first_name': 'Malcolm', 'last_name': 'Reynolds'}
    875 
    876 
    877 .. method:: MatchObject.start([group])
    878             MatchObject.end([group])
    879 
    880    Return the indices of the start and end of the substring matched by *group*;
    881    *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
    882    *group* exists but did not contribute to the match.  For a match object *m*, and
    883    a group *g* that did contribute to the match, the substring matched by group *g*
    884    (equivalent to ``m.group(g)``) is ::
    885 
    886       m.string[m.start(g):m.end(g)]
    887 
    888    Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a
    889    null string.  For example, after ``m = re.search('b(c?)', 'cba')``,
    890    ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
    891    2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
    892 
    893    An example that will remove *remove_this* from email addresses:
    894 
    895       >>> email = "tony@tiremove_thisger.net"
    896       >>> m = re.search("remove_this", email)
    897       >>> email[:m.start()] + email[m.end():]
    898       'tony@tiger.net'
    899 
    900 
    901 .. method:: MatchObject.span([group])
    902 
    903    For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group),
    904    m.end(group))``. Note that if *group* did not contribute to the match, this is
    905    ``(-1, -1)``.  *group* defaults to zero, the entire match.
    906 
    907 
    908 .. attribute:: MatchObject.pos
    909 
    910    The value of *pos* which was passed to the :meth:`~RegexObject.search` or
    911    :meth:`~RegexObject.match` method of the :class:`RegexObject`.  This is the
    912    index into the string at which the RE engine started looking for a match.
    913 
    914 
    915 .. attribute:: MatchObject.endpos
    916 
    917    The value of *endpos* which was passed to the :meth:`~RegexObject.search` or
    918    :meth:`~RegexObject.match` method of the :class:`RegexObject`.  This is the
    919    index into the string beyond which the RE engine will not go.
    920 
    921 
    922 .. attribute:: MatchObject.lastindex
    923 
    924    The integer index of the last matched capturing group, or ``None`` if no group
    925    was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and
    926    ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
    927    the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
    928    string.
    929 
    930 
    931 .. attribute:: MatchObject.lastgroup
    932 
    933    The name of the last matched capturing group, or ``None`` if the group didn't
    934    have a name, or if no group was matched at all.
    935 
    936 
    937 .. attribute:: MatchObject.re
    938 
    939    The regular expression object whose :meth:`~RegexObject.match` or
    940    :meth:`~RegexObject.search` method produced this :class:`MatchObject`
    941    instance.
    942 
    943 
    944 .. attribute:: MatchObject.string
    945 
    946    The string passed to :meth:`~RegexObject.match` or
    947    :meth:`~RegexObject.search`.
     811.. class:: MatchObject
     812
     813   Match objects always have a boolean value of ``True``.
     814   Since :meth:`~RegexObject.match` and :meth:`~RegexObject.search` return ``None``
     815   when there is no match, you can test whether there was a match with a simple
     816   ``if`` statement::
     817
     818      match = re.search(pattern, string)
     819      if match:
     820          process(match)
     821
     822   Match objects support the following methods and attributes:
     823
     824
     825   .. method:: MatchObject.expand(template)
     826
     827      Return the string obtained by doing backslash substitution on the template
     828      string *template*, as done by the :meth:`~RegexObject.sub` method.  Escapes
     829      such as ``\n`` are converted to the appropriate characters, and numeric
     830      backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``,
     831      ``\g<name>``) are replaced by the contents of the corresponding group.
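
A minimal sketch of the substitution described above. The pattern, the sample string, and the variable names are illustrative assumptions, not taken from the original text:

```python
import re

# Two named groups, so both numeric and named backreferences can be shown.
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Isaac Newton")

# \2 is a numeric backreference, \g<first> a named one.
swapped = m.expand(r"\2, \g<first>")

# The two-character escape \n in the template becomes a real newline.
with_newline = m.expand(r"\g<last>\n\1")
```

Here ``swapped`` is ``'Newton, Isaac'`` and ``with_newline`` holds the two names separated by a newline character.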
     832
     833
     834   .. method:: MatchObject.group([group1, ...])
     835
     836      Returns one or more subgroups of the match.  If there is a single argument, the
     837      result is a single string; if there are multiple arguments, the result is a
     838      tuple with one item per argument. Without arguments, *group1* defaults to zero
     839      (the whole match is returned). If a *groupN* argument is zero, the corresponding
     840      return value is the entire matching string; if it is in the inclusive range
     841      [1..99], it is the string matching the corresponding parenthesized group.  If a
     842      group number is negative or larger than the number of groups defined in the
     843      pattern, an :exc:`IndexError` exception is raised. If a group is contained in a
     844      part of the pattern that did not match, the corresponding result is ``None``.
     845      If a group is contained in a part of the pattern that matched multiple times,
     846      the last match is returned.
     847
     848         >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
     849         >>> m.group(0)       # The entire match
     850         'Isaac Newton'
     851         >>> m.group(1)       # The first parenthesized subgroup.
     852         'Isaac'
     853         >>> m.group(2)       # The second parenthesized subgroup.
     854         'Newton'
     855         >>> m.group(1, 2)    # Multiple arguments give us a tuple.
     856         ('Isaac', 'Newton')
     857
     858      If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN*
     859      arguments may also be strings identifying groups by their group name.  If a
     860      string argument is not used as a group name in the pattern, an :exc:`IndexError`
     861      exception is raised.
     862
     863      A moderately complicated example:
     864
     865         >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
     866         >>> m.group('first_name')
     867         'Malcolm'
     868         >>> m.group('last_name')
     869         'Reynolds'
     870
     871      Named groups can also be referred to by their index:
     872
     873         >>> m.group(1)
     874         'Malcolm'
     875         >>> m.group(2)
     876         'Reynolds'
     877
     878      If a group matches multiple times, only the last match is accessible:
     879
     880         >>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
     881         >>> m.group(1)                        # Returns only the last match.
     882         'c3'
     883
     884
     885   .. method:: MatchObject.groups([default])
     886
     887      Return a tuple containing all the subgroups of the match, from 1 up to however
     888      many groups are in the pattern.  The *default* argument is used for groups that
     889      did not participate in the match; it defaults to ``None``.  (Incompatibility
     890      note: in the original Python 1.5 release, if the tuple was one element long, a
     891      string would be returned instead.  In later versions (from 1.5.1 on), a
     892      singleton tuple is returned in such cases.)
     893
     894      For example:
     895
     896         >>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
     897         >>> m.groups()
     898         ('24', '1632')
     899
     900      If we make the decimal place and everything after it optional, not all groups
     901      might participate in the match.  These groups will default to ``None`` unless
     902      the *default* argument is given:
     903
     904         >>> m = re.match(r"(\d+)\.?(\d+)?", "24")
     905         >>> m.groups()      # Second group defaults to None.
     906         ('24', None)
     907         >>> m.groups('0')   # Now, the second group defaults to '0'.
     908         ('24', '0')
     909
     910
     911   .. method:: MatchObject.groupdict([default])
     912
     913      Return a dictionary containing all the *named* subgroups of the match, keyed by
     914      the subgroup name.  The *default* argument is used for groups that did not
     915      participate in the match; it defaults to ``None``.  For example:
     916
     917         >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
     918         >>> m.groupdict()
     919         {'first_name': 'Malcolm', 'last_name': 'Reynolds'}
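
The example above does not exercise the *default* argument. The following sketch does, using an illustrative pattern (an assumption, not from the original) in which the second named group is optional:

```python
import re

# The fractional part is optional, so "frac" may not participate in the match.
m = re.match(r"(?P<whole>\d+)(?:\.(?P<frac>\d+))?", "24")

no_default = m.groupdict()       # non-participating groups map to None
with_default = m.groupdict("0")  # non-participating groups map to '0'
```

Groups that did take part in the match keep their matched text in both calls; only the entry for ``frac`` differs.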
     920
     921
     922   .. method:: MatchObject.start([group])
     923               MatchObject.end([group])
     924
     925      Return the indices of the start and end of the substring matched by *group*;
     926      *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
     927      *group* exists but did not contribute to the match.  For a match object *m*, and
     928      a group *g* that did contribute to the match, the substring matched by group *g*
     929      (equivalent to ``m.group(g)``) is ::
     930
     931         m.string[m.start(g):m.end(g)]
     932
     933      Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a
     934      null string.  For example, after ``m = re.search('b(c?)', 'cba')``,
     935      ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
     936      2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
     937
     938      An example that will remove *remove_this* from email addresses:
     939
     940         >>> email = "tony@tiremove_thisger.net"
     941         >>> m = re.search("remove_this", email)
     942         >>> email[:m.start()] + email[m.end():]
     943         'tony@tiger.net'
     944
     945
     946   .. method:: MatchObject.span([group])
     947
     948      For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group),
     949      m.end(group))``. Note that if *group* did not contribute to the match, this is
     950      ``(-1, -1)``.  *group* defaults to zero, the entire match.
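
As a short illustration (the pattern and input string are made up for this sketch), the spans of the whole match, of a group that matched, and of a group that did not contribute can be compared directly:

```python
import re

# Group 2 is optional and does not occur in the input.
m = re.search(r"(\d+)(x)?", "abc 123 def")

whole = m.span()     # span of the entire match
digits = m.span(1)   # span of the first group
unused = m.span(2)   # group 2 did not contribute to the match
```

``whole`` and ``digits`` are both ``(4, 7)``, while ``unused`` is ``(-1, -1)``.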
     951
     952
     953   .. attribute:: MatchObject.pos
     954
     955      The value of *pos* which was passed to the :meth:`~RegexObject.search` or
     956      :meth:`~RegexObject.match` method of the :class:`RegexObject`.  This is the
     957      index into the string at which the RE engine started looking for a match.
     958
     959
     960   .. attribute:: MatchObject.endpos
     961
     962      The value of *endpos* which was passed to the :meth:`~RegexObject.search` or
     963      :meth:`~RegexObject.match` method of the :class:`RegexObject`.  This is the
     964      index into the string beyond which the RE engine will not go.
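
A small sketch of how *pos* and *endpos* are reflected on the match object. The compiled pattern and the sample text are assumptions chosen for illustration:

```python
import re

pattern = re.compile(r"\d+")
text = "ab12cd34"

# Start looking at index 4; endpos is not given, so the whole string is used.
late = pattern.search(text, 4)

# Restrict the search to text[0:3], which cuts the first number short.
clipped = pattern.search(text, 0, 3)
```

``late.group()`` is ``'34'`` with ``late.pos == 4`` and ``late.endpos == len(text)``, while ``clipped.group()`` is only ``'1'`` because the engine does not look past index 3.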
     965
     966
     967   .. attribute:: MatchObject.lastindex
     968
     969      The integer index of the last matched capturing group, or ``None`` if no group
     970      was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and
     971      ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
     972      the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
     973      string.
     974
     975
     976   .. attribute:: MatchObject.lastgroup
     977
     978      The name of the last matched capturing group, or ``None`` if the group didn't
     979      have a name, or if no group was matched at all.
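
The relationship between ``lastindex`` and ``lastgroup`` can be sketched as follows. The group names ``outer`` and ``second`` are hypothetical and chosen only for this illustration:

```python
import re

# Nested groups: the outer group closes last, so it is the "last matched" one.
nested = re.match(r"(?P<outer>(a)(b))", "ab")

# Sibling groups: the second group is the last to match, and it has a name.
named = re.match(r"(a)(?P<second>b)", "ab")

# The last matched group has no name.
unnamed = re.match(r"(a)(b)", "ab")

# No capturing groups at all.
plain = re.match(r"ab", "ab")
```

``nested`` reports index 1 and name ``'outer'``, ``named`` reports index 2 and name ``'second'``, ``unnamed`` reports index 2 with ``lastgroup`` equal to ``None``, and ``plain`` reports ``None`` for both attributes.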
     980
     981
     982   .. attribute:: MatchObject.re
     983
     984      The regular expression object whose :meth:`~RegexObject.match` or
     985      :meth:`~RegexObject.search` method produced this :class:`MatchObject`
     986      instance.
     987
     988
     989   .. attribute:: MatchObject.string
     990
     991      The string passed to :meth:`~RegexObject.match` or
     992      :meth:`~RegexObject.search`.
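
One possible use of the ``re`` and ``string`` attributes, sketched under the assumption that only the match object is at hand, is to continue searching where the previous match ended:

```python
import re

pattern = re.compile(r"\d+")
first = pattern.search("10 20 30")

# Resume the search from the end of the first match, using only the
# pattern and the string stored on the match object itself.
second = first.re.search(first.string, first.end())
```

``first.re`` is the same object as ``pattern``, and ``second.group()`` is ``'20'``.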
    948993
    949994
     
    9671012Suppose you are writing a poker program where a player's hand is represented as
    9681013a 5-character string with each character representing a card, "a" for ace, "k"
    969 for king, "q" for queen, j for jack, "0" for 10, and "1" through "9"
     1014for king, "q" for queen, "j" for jack, "t" for 10, and "2" through "9"
    9701015representing the card with that value.
    9711016
    9721017To see if a given string is a valid hand, one could do the following:
    9731018
    974    >>> valid = re.compile(r"[0-9akqj]{5}$")
    975    >>> displaymatch(valid.match("ak05q"))  # Valid.
    976    "<Match: 'ak05q', groups=()>"
    977    >>> displaymatch(valid.match("ak05e"))  # Invalid.
    978    >>> displaymatch(valid.match("ak0"))    # Invalid.
     1019   >>> valid = re.compile(r"^[a2-9tjqk]{5}$")
     1020   >>> displaymatch(valid.match("akt5q"))  # Valid.
     1021   "<Match: 'akt5q', groups=()>"
     1022   >>> displaymatch(valid.match("akt5e"))  # Invalid.
     1023   >>> displaymatch(valid.match("akt"))    # Invalid.
    9791024   >>> displaymatch(valid.match("727ak"))  # Valid.
    9801025   "<Match: '727ak', groups=()>"
     
    10151060.. index:: single: scanf()
    10161061
    1017 Python does not currently have an equivalent to :cfunc:`scanf`.  Regular
     1062Python does not currently have an equivalent to :c:func:`scanf`.  Regular
    10181063expressions are generally more powerful, though also more verbose, than
    1019 :cfunc:`scanf` format strings.  The table below offers some more-or-less
    1020 equivalent mappings between :cfunc:`scanf` format tokens and regular
     1064:c:func:`scanf` format strings.  The table below offers some more-or-less
     1065equivalent mappings between :c:func:`scanf` format tokens and regular
    10211066expressions.
    10221067
    10231068+--------------------------------+---------------------------------------------+
    1024 | :cfunc:`scanf` Token           | Regular Expression                          |
     1069| :c:func:`scanf` Token          | Regular Expression                          |
    10251070+================================+=============================================+
    10261071| ``%c``                         | ``.``                                       |
     
    10341079| ``%i``                         | ``[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)``     |
    10351080+--------------------------------+---------------------------------------------+
    1036 | ``%o``                         | ``0[0-7]*``                                 |
     1081| ``%o``                         | ``[-+]?[0-7]+``                             |
    10371082+--------------------------------+---------------------------------------------+
    10381083| ``%s``                         | ``\S+``                                     |
     
    10401085| ``%u``                         | ``\d+``                                     |
    10411086+--------------------------------+---------------------------------------------+
    1042 | ``%x``, ``%X``                 | ``0[xX][\dA-Fa-f]+``                        |
     1087| ``%x``, ``%X``                 | ``[-+]?(0[xX])?[\dA-Fa-f]+``                |
    10431088+--------------------------------+---------------------------------------------+
    10441089
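
As a rough sketch of how the ``%x`` and ``%o`` rows of the table might be used, the helpers below match a number at the start of a string and convert it with :func:`int`. The names ``scan_hex`` and ``scan_oct`` are hypothetical and not part of the :mod:`re` module:

```python
import re

# Patterns copied from the %x/%X and %o rows of the table above.
HEX = re.compile(r"[-+]?(0[xX])?[\dA-Fa-f]+")
OCT = re.compile(r"[-+]?[0-7]+")

def scan_hex(text):
    """Return the hexadecimal integer at the start of text, or None."""
    m = HEX.match(text)
    return int(m.group(0), 16) if m else None

def scan_oct(text):
    """Return the octal integer at the start of text, or None."""
    m = OCT.match(text)
    return int(m.group(0), 8) if m else None
```

For instance, ``scan_hex("0x1F rest")`` gives ``31`` and ``scan_oct("+17")`` gives ``15``, while input that does not start with a number of the expected form yields ``None``.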
     
    10471092   /usr/sbin/sendmail - 0 errors, 4 warnings
    10481093
    1049 you would use a :cfunc:`scanf` format like ::
     1094you would use a :c:func:`scanf` format like ::
    10501095
    10511096   %s - %d errors, %d warnings
     
    10561101
    10571102
    1058 Avoiding recursion
    1059 ^^^^^^^^^^^^^^^^^^
    1060 
    1061 If you create regular expressions that require the engine to perform a lot of
    1062 recursion, you may encounter a :exc:`RuntimeError` exception with the message
    1063 ``maximum recursion limit`` exceeded. For example, ::
    1064 
    1065    >>> s = 'Begin ' + 1000*'a very long string ' + 'end'
    1066    >>> re.match('Begin (\w| )*? end', s).end()
    1067    Traceback (most recent call last):
    1068      File "<stdin>", line 1, in ?
    1069      File "/usr/local/lib/python2.5/re.py", line 132, in match
    1070        return _compile(pattern, flags).match(string)
    1071    RuntimeError: maximum recursion limit exceeded
    1072 
    1073 You can often restructure your regular expression to avoid recursion.
    1074 
    1075 Starting with Python 2.3, simple uses of the ``*?`` pattern are special-cased to
    1076 avoid recursion.  Thus, the above regular expression can avoid recursion by
    1077 being recast as ``Begin [a-zA-Z0-9_ ]*?end``.  As a further benefit, such
    1078 regular expressions will run faster than their recursive equivalents.
    1079 
     1103.. _search-vs-match:
    10801104
    10811105search() vs. match()
    10821106^^^^^^^^^^^^^^^^^^^^
    10831107
    1084 In a nutshell, :func:`match` only attempts to match a pattern at the beginning
    1085 of a string where :func:`search` will match a pattern anywhere in a string.
    1086 For example:
    1087 
    1088    >>> re.match("o", "dog")  # No match as "o" is not the first letter of "dog".
    1089    >>> re.search("o", "dog") # Match as search() looks everywhere in the string.
     1108.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
     1109
     1110Python offers two different primitive operations based on regular expressions:
     1111:func:`re.match` checks for a match only at the beginning of the string, while
     1112:func:`re.search` checks for a match anywhere in the string (this is what Perl
     1113does by default).
     1114
     1115For example::
     1116
     1117   >>> re.match("c", "abcdef")  # No match
     1118   >>> re.search("c", "abcdef") # Match
    10901119   <_sre.SRE_Match object at ...>
    10911120
    1092 .. note::
    1093 
    1094    The following applies only to regular expression objects like those created
    1095    with ``re.compile("pattern")``, not the primitives ``re.match(pattern,
    1096    string)`` or ``re.search(pattern, string)``.
    1097 
    1098 :func:`match` has an optional second parameter that gives an index in the string
    1099 where the search is to start::
    1100 
    1101    >>> pattern = re.compile("o")
    1102    >>> pattern.match("dog")      # No match as "o" is not at the start of "dog."
    1103 
    1104    # Equivalent to the above expression as 0 is the default starting index:
    1105    >>> pattern.match("dog", 0)
    1106 
    1107    # Match as "o" is the 2nd character of "dog" (index 0 is the first):
    1108    >>> pattern.match("dog", 1)
     1121Regular expressions beginning with ``'^'`` can be used with :func:`search` to
     1122restrict the match to the beginning of the string::
     1123
     1124   >>> re.match("c", "abcdef")  # No match
     1125   >>> re.search("^c", "abcdef") # No match
     1126   >>> re.search("^a", "abcdef")  # Match
    11091127   <_sre.SRE_Match object at ...>
    1110    >>> pattern.match("dog", 2)   # No match as "o" is not the 3rd character of "dog."
     1128
     1129Note however that in :const:`MULTILINE` mode :func:`match` only matches at the
     1130beginning of the string, whereas using :func:`search` with a regular expression
     1131beginning with ``'^'`` will match at the beginning of each line.
     1132
     1133   >>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
     1134   >>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
     1135   <_sre.SRE_Match object at ...>
    11111136
    11121137
     
    11221147triple-quoted string syntax:
    11231148
    1124    >>> input = """Ross McFluff: 834.345.1254 155 Elm Street
     1149   >>> text = """Ross McFluff: 834.345.1254 155 Elm Street
    11251150   ...
    11261151   ... Ronald Heathmore: 892.345.3428 436 Finley Avenue
     
    11361161   :options: +NORMALIZE_WHITESPACE
    11371162
    1138    >>> entries = re.split("\n+", input)
     1163   >>> entries = re.split("\n+", text)
    11391164   >>> entries
    11401165   ['Ross McFluff: 834.345.1254 155 Elm Street',
     
    11831208   ...   return m.group(1) + "".join(inner_word) + m.group(3)
    11841209   >>> text = "Professor Abdolmalek, please report your absences promptly."
    1185    >>> re.sub("(\w)(\w+)(\w)", repl, text)
     1210   >>> re.sub(r"(\w)(\w+)(\w)", repl, text)
    11861211   'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'
    1187    >>> re.sub("(\w)(\w+)(\w)", repl, text)
     1212   >>> re.sub(r"(\w)(\w+)(\w)", repl, text)
    11881213   'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'
    11891214