Changeset 391 for python/trunk/Doc/library/re.rst
- Timestamp:
- Mar 19, 2014, 11:31:01 PM
- Location:
- python/trunk
- Files:
-
- 2 edited
python/trunk
-
Property svn:mergeinfo
set to
/python/vendor/Python-2.7.6 merged eligible /python/vendor/current merged eligible
-
Property svn:mergeinfo
set to
-
python/trunk/Doc/library/re.rst
r2 r391 157 157 158 158 ``[]`` 159 Used to indicate a set of characters. Characters can be listed individually, or 160 a range of characters can be indicated by giving two characters and separating 161 them by a ``'-'``. Special characters are not active inside sets. For example, 162 ``[akm$]`` will match any of the characters ``'a'``, ``'k'``, 163 ``'m'``, or ``'$'``; ``[a-z]`` will match any lowercase letter, and 164 ``[a-zA-Z0-9]`` matches any letter or digit. Character classes such 165 as ``\w`` or ``\S`` (defined below) are also acceptable inside a 166 range, although the characters they match depends on whether :const:`LOCALE` 167 or :const:`UNICODE` mode is in force. If you want to include a 168 ``']'`` or a ``'-'`` inside a set, precede it with a backslash, or 169 place it as the first character. The pattern ``[]]`` will match 170 ``']'``, for example. 171 172 You can match the characters not within a range by :dfn:`complementing` the set. 173 This is indicated by including a ``'^'`` as the first character of the set; 174 ``'^'`` elsewhere will simply match the ``'^'`` character. For example, 175 ``[^5]`` will match any character except ``'5'``, and ``[^^]`` will match any 176 character except ``'^'``. 177 178 Note that inside ``[]`` the special forms and special characters lose 179 their meanings and only the syntaxes described here are valid. For 180 example, ``+``, ``*``, ``(``, ``)``, and so on are treated as 181 literals inside ``[]``, and backreferences cannot be used inside 182 ``[]``. 159 Used to indicate a set of characters. In a set: 160 161 * Characters can be listed individually, e.g. ``[amk]`` will match ``'a'``, 162 ``'m'``, or ``'k'``. 163 164 * Ranges of characters can be indicated by giving two characters and separating 165 them by a ``'-'``, for example ``[a-z]`` will match any lowercase ASCII letter, 166 ``[0-5][0-9]`` will match all the two-digits numbers from ``00`` to ``59``, and 167 ``[0-9A-Fa-f]`` will match any hexadecimal digit. 
If ``-`` is escaped (e.g. 168 ``[a\-z]``) or if it's placed as the first or last character (e.g. ``[a-]``), 169 it will match a literal ``'-'``. 170 171 * Special characters lose their special meaning inside sets. For example, 172 ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``, 173 ``'*'``, or ``')'``. 174 175 * Character classes such as ``\w`` or ``\S`` (defined below) are also accepted 176 inside a set, although the characters they match depend on whether 177 :const:`LOCALE` or :const:`UNICODE` mode is in force. 178 179 * Characters that are not within a range can be matched by :dfn:`complementing` 180 the set. If the first character of the set is ``'^'``, all the characters 181 that are *not* in the set will be matched. For example, ``[^5]`` will match 182 any character except ``'5'``, and ``[^^]`` will match any character except 183 ``'^'``. ``^`` has no special meaning if it's not the first character in 184 the set. 185 186 * To match a literal ``']'`` inside a set, precede it with a backslash, or 187 place it at the beginning of the set. For example, both ``[()[\]{}]`` and 188 ``[]()[{}]`` will match a parenthesis. 183 189 184 190 ``'|'`` … … 225 231 226 232 ``(?:...)`` 227 A non-grouping version of regular parentheses. Matches whatever regular 233 A non-capturing version of regular parentheses. Matches whatever regular 228 234 expression is inside the parentheses, but the substring matched by the group 229 235 *cannot* be retrieved after performing a match or referenced later in the … … 232 238 ``(?P<name>...)`` 233 239 Similar to regular parentheses, but the substring matched by the group is 234 accessible within the rest of the regular expression via the symbolic group 235 name *name*. Group names must be valid Python identifiers, and each group 236 name must be defined only once within a regular expression. A symbolic group 237 is also a numbered group, just as if the group were not named. 
So the group 238 named ``id`` in the example below can also be referenced as the numbered group 239 ``1``. 240 241 For example, if the pattern is ``(?P<id>[a-zA-Z_]\w*)``, the group can be 242 referenced by its name in arguments to methods of match objects, such as 243 ``m.group('id')`` or ``m.end('id')``, and also by name in the regular 244 expression itself (using ``(?P=id)``) and replacement text given to 245 ``.sub()`` (using ``\g<id>``). 240 accessible via the symbolic group name *name*. Group names must be valid 241 Python identifiers, and each group name must be defined only once within a 242 regular expression. A symbolic group is also a numbered group, just as if 243 the group were not named. 244 245 Named groups can be referenced in three contexts. If the pattern is 246 ``(?P<quote>['"]).*?(?P=quote)`` (i.e. matching a string quoted with either 247 single or double quotes): 248 249 +---------------------------------------+----------------------------------+ 250 | Context of reference to group "quote" | Ways to reference it | 251 +=======================================+==================================+ 252 | in the same pattern itself | * ``(?P=quote)`` (as shown) | 253 | | * ``\1`` | 254 +---------------------------------------+----------------------------------+ 255 | when processing match object ``m`` | * ``m.group('quote')`` | 256 | | * ``m.end('quote')`` (etc.) | 257 +---------------------------------------+----------------------------------+ 258 | in a string passed to the ``repl`` | * ``\g<quote>`` | 259 | argument of ``re.sub()`` | * ``\g<1>`` | 260 | | * ``\1`` | 261 +---------------------------------------+----------------------------------+ 246 262 247 263 ``(?P=name)`` 248 Matches whatever text was matched by the earlier group named *name*. 264 A backreference to a named group; it matches whatever text was matched by the 265 earlier group named *name*. 
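The three reference contexts listed in the new table can be exercised with a short script. This is a minimal sketch and not part of the changeset: the sample string and the variable names are illustrative assumptions, and the code is written to run on both Python 2.7 and Python 3.

```python
import re

# Pattern from the documentation text: a string quoted with either quote style.
pattern = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")

sample = 'say "hello" and \'bye\''   # hypothetical input

# 1. Inside the pattern itself: (?P=quote) must repeat the opening quote.
m = pattern.search(sample)
print(m.group(0))

# 2. On the match object: the group is reachable by name or by number.
print(m.group('quote') == m.group(1))
print(m.end('quote'))

# 3. In the repl string of sub(): these three spellings are interchangeable.
by_name = pattern.sub(r'\g<quote>X\g<quote>', sample)
by_number = pattern.sub(r'\g<1>X\g<1>', sample)
by_digit = pattern.sub(r'\1X\1', sample)
print(by_name)
```

Using the named form in `repl` strings avoids ambiguity when a backreference is followed by a literal digit.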
249 266 250 267 ``(?#...)`` … … 268 285 The contained pattern must only match strings of some fixed length, meaning that 269 286 ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Note that 270 patterns which start with positive lookbehind assertions will never match at the 287 patterns which start with positive lookbehind assertions will not match at the 271 288 beginning of the string being searched; you will most likely want to use the 272 289 :func:`search` function rather than the :func:`match` function: … … 306 323 Matches the contents of the group of the same number. Groups are numbered 307 324 starting from 1. For example, ``(.+) \1`` matches ``'the the'`` or ``'55 55'``, 308 but not ``'the end'`` (note the space after the group). This special sequence 325 but not ``'thethe'`` (note the space after the group). This special sequence 309 326 can only be used to match one of the first 99 groups. If the first digit of 310 327 *number* is 0, or *number* is 3 octal digits long, it will not be interpreted as … … 320 337 defined as a sequence of alphanumeric or underscore characters, so the end of a 321 338 word is indicated by whitespace or a non-alphanumeric, non-underscore character. 322 Note that ``\b`` is defined as the boundary between ``\w`` and ``\W``, so the 323 precise set of characters deemed to be alphanumeric depends on the values of the 324 ``UNICODE`` and ``LOCALE`` flags. Inside a character range, ``\b`` represents 325 the backspace character, for compatibility with Python's string literals. 339 Note that formally, ``\b`` is defined as the boundary between a ``\w`` and 340 a ``\W`` character (or vice versa), or between ``\w`` and the beginning/end 341 of the string, so the precise set of characters deemed to be alphanumeric 342 depends on the values of the ``UNICODE`` and ``LOCALE`` flags. 343 For example, ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``, 344 ``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``. 
345 Inside a character range, ``\b`` represents the backspace character, for 346 compatibility with Python's string literals. 326 347 327 348 ``\B`` 328 349 Matches the empty string, but only when it is *not* at the beginning or end of a 329 word. This is just the opposite of ``\b``, so is also subject to the settings 350 word. This means that ``r'py\B'`` matches ``'python'``, ``'py3'``, ``'py2'``, 351 but not ``'py'``, ``'py.'``, or ``'py!'``. 352 ``\B`` is just the opposite of ``\b``, so is also subject to the settings 330 353 of ``LOCALE`` and ``UNICODE``. 331 354 … … 333 356 When the :const:`UNICODE` flag is not specified, matches any decimal digit; this 334 357 is equivalent to the set ``[0-9]``. With :const:`UNICODE`, it will match 335 whatever is classified as a digit in the Unicode character properties database. 358 whatever is classified as a decimal digit in the Unicode character properties 359 database. 336 360 337 361 ``\D`` … … 342 366 343 367 ``\s`` 344 When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches 345 any whitespace character; this is equivalent to the set ``[ \t\n\r\f\v]``. With 346 :const:`LOCALE`, it will match this set plus whatever characters are defined as 347 space for the current locale. If :const:`UNICODE` is set, this will match the 348 characters ``[ \t\n\r\f\v]`` plus whatever is classified as space in the Unicode 349 character properties database. 368 When the :const:`UNICODE` flag is not specified, it matches any whitespace 369 character; this is equivalent to the set ``[ \t\n\r\f\v]``. The 370 :const:`LOCALE` flag has no extra effect on matching of the space. 371 If :const:`UNICODE` is set, this will match the characters ``[ \t\n\r\f\v]`` 372 plus whatever is classified as space in the Unicode character properties 373 database. 
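The word-boundary examples added for ``\b`` and ``\B`` in this hunk can be checked directly. The sketch below is not part of the changeset; the list names are illustrative assumptions, and the sample strings are the ones quoted in the new documentation text.

```python
import re

# Sample strings taken from the documentation text for \b and \B.
foo_samples = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']
py_samples = ['python', 'py3', 'py2', 'py', 'py.', 'py!']

# \b requires a word/non-word transition (or string edge) on each side of "foo".
foo_hits = [s for s in foo_samples if re.search(r'\bfoo\b', s)]

# \B requires that "py" is NOT followed by a word boundary.
py_hits = [s for s in py_samples if re.search(r'py\B', s)]

# Inside a character class, \b means the backspace character instead.
backspace = re.search(r'[\b]', 'a\bc')

print(foo_hits)
print(py_hits)
print(repr(backspace.group()))
```

Raw strings matter here: in a non-raw literal, ``'\b'`` is already a backspace before the regular expression engine sees it.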
350 374 351 375 ``\S`` 352 When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches 353 any non-whitespace character; this is equivalent to the set ``[^ \t\n\r\f\v]`` 354 With :const:`LOCALE`, it will match any character not in this set, and not 355 defined as space in the current locale. If :const:`UNICODE` is set, this will 356 match anything other than ``[ \t\n\r\f\v]`` and characters marked as space in 357 the Unicode character properties database. 376 When the :const:`UNICODE` flag is not specified, matches any non-whitespace 377 character; this is equivalent to the set ``[^ \t\n\r\f\v]``. The 378 :const:`LOCALE` flag has no extra effect on non-whitespace match. If 379 :const:`UNICODE` is set, then any character not marked as space in the 380 Unicode character properties database is matched. 381 358 382 359 383 ``\w`` … … 370 394 With :const:`LOCALE`, it will match any character not in the set ``[0-9_]``, and 371 395 not defined as alphanumeric for the current locale. If :const:`UNICODE` is set, 372 this will match anything other than ``[0-9_]`` and characters marked as 373 alphanumeric in the Unicode character properties database. 396 this will match anything other than ``[0-9_]`` plus characters classified as 397 not alphanumeric in the Unicode character properties database. 374 398 375 399 ``\Z`` 376 400 Matches only at the end of the string. 401 402 If both :const:`LOCALE` and :const:`UNICODE` flags are included for a 403 particular sequence, then :const:`LOCALE` flag takes effect first followed by 404 the :const:`UNICODE`. 377 405 378 406 Most of the standard escapes supported by Python string literals are also … … 382 410 \r \t \v \x 383 411 \\ 412 413 (Note that ``\b`` is used to represent word boundaries, and means "backspace" 414 only inside character classes.) 384 415 385 416 Octal escapes are included in a limited form: If the first digit is a 0, or if … … 389 420 390 421 391 .. 
_matching-searching: 392 393 Matching vs Searching 394 --------------------- 395 396 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> 397 398 399 Python offers two different primitive operations based on regular expressions: 400 **match** checks for a match only at the beginning of the string, while 401 **search** checks for a match anywhere in the string (this is what Perl does 402 by default). 403 404 Note that match may differ from search even when using a regular expression 405 beginning with ``'^'``: ``'^'`` matches only at the start of the string, or in 406 :const:`MULTILINE` mode also immediately following a newline. The "match" 407 operation succeeds only if the pattern matches at the start of the string 408 regardless of mode, or at the starting position given by the optional *pos* 409 argument regardless of whether a newline precedes it. 410 411 >>> re.match("c", "abcdef") # No match 412 >>> re.search("c", "abcdef") # Match 413 <_sre.SRE_Match object at ...> 414 415 416 422 .. _contents-of-module-re: 417 423 … … 425 431 426 432 427 .. function:: compile(pattern[, flags]) 433 .. function:: compile(pattern, flags=0) 428 434 429 435 Compile a regular expression pattern into a regular expression object, which … … 454 460 programs that use only a few regular expressions at a time needn't worry 455 461 about compiling regular expressions. 462 463 464 .. data:: DEBUG 465 466 Display debug information about compiled expression. 456 467 457 468 … … 515 526 516 527 517 .. function:: search(pattern, string[, flags]) 528 .. function:: search(pattern, string, flags=0) 518 529 519 530 Scan through *string* looking for a location where the regular expression … … 524 535 525 536 526 .. function:: match(pattern, string[, flags]) 537 .. function:: match(pattern, string, flags=0) 527 538 528 539 If zero or more characters at the beginning of *string* match the regular … … 531 542 different from a zero-length match. 532 543 533 .. 
note:: 534 535 If you want to locate a match anywhere in *string*, use :func:`search` 536 instead. 537 538 539 .. function:: split(pattern, string[, maxsplit=0]) 544 Note that even in :const:`MULTILINE` mode, :func:`re.match` will only match 545 at the beginning of the string and not at the beginning of each line. 546 547 If you want to locate a match anywhere in *string*, use :func:`search` 548 instead (see also :ref:`search-vs-match`). 549 550 551 .. function:: split(pattern, string, maxsplit=0, flags=0) 540 552 541 553 Split *string* by the occurrences of *pattern*. If capturing parentheses are … … 552 564 >>> re.split('\W+', 'Words, words, words.', 1) 553 565 ['Words', 'words, words.'] 566 >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE) 567 ['0', '3', '9'] 554 568 555 569 If there are capturing groups in the separator and it matches at the start of … … 572 586 ['foo\n\nbar\n'] 573 587 574 575 .. function:: findall(pattern, string[, flags]) 588 .. versionchanged:: 2.7 589 Added the optional flags argument. 590 591 592 .. function:: findall(pattern, string, flags=0) 576 593 577 594 Return all non-overlapping matches of *pattern* in *string*, as a list of … … 588 605 589 606 590 .. function:: finditer(pattern, string[, flags]) 607 .. function:: finditer(pattern, string, flags=0) 591 608 592 609 Return an :term:`iterator` yielding :class:`MatchObject` instances over all … … 602 619 603 620 604 .. function:: sub(pattern, repl, string[, count]) 621 .. function:: sub(pattern, repl, string, count=0, flags=0) 605 622 606 623 Return the string obtained by replacing the leftmost non-overlapping occurrences … … 608 625 *string* is returned unchanged. *repl* can be a string or a function; if it is 609 626 a string, any backslash escapes in it are processed. That is, ``\n`` is 610 converted to a single newline character, ``\r`` is converted to a linefeed, and 627 converted to a single newline character, ``\r`` is converted to a carriage return, and 611 628 so forth. 
Unknown escapes such as ``\j`` are left alone. Backreferences, such 612 629 as ``\6``, are replaced with the substring matched by group 6 in the pattern. … … 627 644 >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files') 628 645 'pro--gram files' 629 630 The pattern may be a string or an RE object; if you need to specify regular 631 expression flags, you must use a RE object, or use embedded modifiers in a 632 pattern; for example, ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``. 646 >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE) 647 'Baked Beans & Spam' 648 649 The pattern may be a string or an RE object. 633 650 634 651 The optional argument *count* is the maximum number of pattern occurrences to be … … 638 655 ``'-a-b-c-'``. 639 656 640 In addition to character escapes and backreferences as described above, 657 In string-type *repl* arguments, in addition to the character escapes and 658 backreferences described above, 641 659 ``\g<name>`` will use the substring matched by the group named ``name``, as 642 660 defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding … … 647 665 substring matched by the RE. 648 666 649 650 .. function:: subn(pattern, repl, string[, count]) 667 .. versionchanged:: 2.7 668 Added the optional flags argument. 669 670 671 .. function:: subn(pattern, repl, string, count=0, flags=0) 651 672 652 673 Perform the same operation as :func:`sub`, but return a tuple ``(new_string, 653 674 number_of_subs_made)``. 675 676 .. versionchanged:: 2.7 677 Added the optional flags argument. 654 678 655 679 … … 659 683 want to match an arbitrary literal string that may have regular expression 660 684 metacharacters in it. 685 686 687 .. function:: purge() 688 689 Clear the regular expression cache. 661 690 662 691 … … 674 703 -------------------------- 675 704 676 Compiled regular expression objects support the following methods and 677 attributes: 678 679 680 .. 
method:: RegexObject.match(string[, pos[, endpos]]) 681 682 If zero or more characters at the beginning of *string* match this regular 683 expression, return a corresponding :class:`MatchObject` instance. Return 684 ``None`` if the string does not match the pattern; note that this is different 685 from a zero-length match. 686 687 .. note:: 688 689 If you want to locate a match anywhere in *string*, use 690 :meth:`~RegexObject.search` instead. 691 692 The optional second parameter *pos* gives an index in the string where the 693 search is to start; it defaults to ``0``. This is not completely equivalent to 694 slicing the string; the ``'^'`` pattern character matches at the real beginning 695 of the string and at positions just after a newline, but not necessarily at the 696 index where the search is to start. 697 698 The optional parameter *endpos* limits how far the string will be searched; it 699 will be as if the string is *endpos* characters long, so only the characters 700 from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less 701 than *pos*, no match will be found, otherwise, if *rx* is a compiled regular 702 expression object, ``rx.match(string, 0, 50)`` is equivalent to 703 ``rx.match(string[:50], 0)``. 705 .. class:: RegexObject 706 707 The :class:`RegexObject` class supports the following methods and attributes: 708 709 .. method:: RegexObject.search(string[, pos[, endpos]]) 710 711 Scan through *string* looking for a location where this regular expression 712 produces a match, and return a corresponding :class:`MatchObject` instance. 713 Return ``None`` if no position in the string matches the pattern; note that this 714 is different from finding a zero-length match at some point in the string. 715 716 The optional second parameter *pos* gives an index in the string where the 717 search is to start; it defaults to ``0``. 
This is not completely equivalent to 718 slicing the string; the ``'^'`` pattern character matches at the real beginning 719 of the string and at positions just after a newline, but not necessarily at the 720 index where the search is to start. 721 722 The optional parameter *endpos* limits how far the string will be searched; it 723 will be as if the string is *endpos* characters long, so only the characters 724 from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less 725 than *pos*, no match will be found, otherwise, if *rx* is a compiled regular 726 expression object, ``rx.search(string, 0, 50)`` is equivalent to 727 ``rx.search(string[:50], 0)``. 728 729 >>> pattern = re.compile("d") 730 >>> pattern.search("dog") # Match at index 0 731 <_sre.SRE_Match object at ...> 732 >>> pattern.search("dog", 1) # No match; search doesn't include the "d" 733 734 735 .. method:: RegexObject.match(string[, pos[, endpos]]) 736 737 If zero or more characters at the *beginning* of *string* match this regular 738 expression, return a corresponding :class:`MatchObject` instance. Return 739 ``None`` if the string does not match the pattern; note that this is different 740 from a zero-length match. 741 742 The optional *pos* and *endpos* parameters have the same meaning as for the 743 :meth:`~RegexObject.search` method. 704 744 705 745 >>> pattern = re.compile("o") 706 >>> pattern.match("dog") # No match as "o" is not at the start of "dog." 746 >>> pattern.match("dog") # No match as "o" is not at the start of "dog". 707 747 >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog". 708 748 <_sre.SRE_Match object at ...> 709 749 710 711 .. method:: RegexObject.search(string[, pos[, endpos]]) 712 713 Scan through *string* looking for a location where this regular expression 714 produces a match, and return a corresponding :class:`MatchObject` instance. 
715 Return ``None`` if no position in the string matches the pattern; note that this 716 is different from finding a zero-length match at some point in the string. 717 718 The optional *pos* and *endpos* parameters have the same meaning as for the 719 :meth:`~RegexObject.match` method. 720 721 722 .. method:: RegexObject.split(string[, maxsplit=0]) 723 724 Identical to the :func:`split` function, using the compiled pattern. 725 726 727 .. method:: RegexObject.findall(string[, pos[, endpos]]) 728 729 Identical to the :func:`findall` function, using the compiled pattern. 730 731 732 .. method:: RegexObject.finditer(string[, pos[, endpos]]) 733 734 Identical to the :func:`finditer` function, using the compiled pattern. 735 736 737 .. method:: RegexObject.sub(repl, string[, count=0]) 738 739 Identical to the :func:`sub` function, using the compiled pattern. 740 741 742 .. method:: RegexObject.subn(repl, string[, count=0]) 743 744 Identical to the :func:`subn` function, using the compiled pattern. 745 746 747 .. attribute:: RegexObject.flags 748 749 The flags argument used when the RE object was compiled, or ``0`` if no flags 750 were provided. 751 752 753 .. attribute:: RegexObject.groups 754 755 The number of capturing groups in the pattern. 756 757 758 .. attribute:: RegexObject.groupindex 759 760 A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group 761 numbers. The dictionary is empty if no symbolic groups were used in the 762 pattern. 763 764 765 .. attribute:: RegexObject.pattern 766 767 The pattern string from which the RE object was compiled. 750 If you want to locate a match anywhere in *string*, use 751 :meth:`~RegexObject.search` instead (see also :ref:`search-vs-match`). 752 753 754 .. method:: RegexObject.split(string, maxsplit=0) 755 756 Identical to the :func:`split` function, using the compiled pattern. 757 758 759 .. 
method:: RegexObject.findall(string[, pos[, endpos]]) 760 761 Similar to the :func:`findall` function, using the compiled pattern, but 762 also accepts optional *pos* and *endpos* parameters that limit the search 763 region like for :meth:`match`. 764 765 766 .. method:: RegexObject.finditer(string[, pos[, endpos]]) 767 768 Similar to the :func:`finditer` function, using the compiled pattern, but 769 also accepts optional *pos* and *endpos* parameters that limit the search 770 region like for :meth:`match`. 771 772 773 .. method:: RegexObject.sub(repl, string, count=0) 774 775 Identical to the :func:`sub` function, using the compiled pattern. 776 777 778 .. method:: RegexObject.subn(repl, string, count=0) 779 780 Identical to the :func:`subn` function, using the compiled pattern. 781 782 783 .. attribute:: RegexObject.flags 784 785 The regex matching flags. This is a combination of the flags given to 786 :func:`.compile` and any ``(?...)`` inline flags in the pattern. 787 788 789 .. attribute:: RegexObject.groups 790 791 The number of capturing groups in the pattern. 792 793 794 .. attribute:: RegexObject.groupindex 795 796 A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group 797 numbers. The dictionary is empty if no symbolic groups were used in the 798 pattern. 799 800 801 .. attribute:: RegexObject.pattern 802 803 The pattern string from which the RE object was compiled. 768 804 769 805 … … 773 809 ------------- 774 810 775 Match objects always have a boolean value of :const:`True`, so that you can test 776 whether e.g. :func:`match` resulted in a match with a simple if statement. They 777 support the following methods and attributes: 778 779 780 .. method:: MatchObject.expand(template) 781 782 Return the string obtained by doing backslash substitution on the template 783 string *template*, as done by the :meth:`~RegexObject.sub` method. 
Escapes 784 such as ``\n`` are converted to the appropriate characters, and numeric 785 backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``, 786 ``\g<name>``) are replaced by the contents of the corresponding group. 787 788 789 .. method:: MatchObject.group([group1, ...]) 790 791 Returns one or more subgroups of the match. If there is a single argument, the 792 result is a single string; if there are multiple arguments, the result is a 793 tuple with one item per argument. Without arguments, *group1* defaults to zero 794 (the whole match is returned). If a *groupN* argument is zero, the corresponding 795 return value is the entire matching string; if it is in the inclusive range 796 [1..99], it is the string matching the corresponding parenthesized group. If a 797 group number is negative or larger than the number of groups defined in the 798 pattern, an :exc:`IndexError` exception is raised. If a group is contained in a 799 part of the pattern that did not match, the corresponding result is ``None``. 800 If a group is contained in a part of the pattern that matched multiple times, 801 the last match is returned. 802 803 >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") 804 >>> m.group(0) # The entire match 805 'Isaac Newton' 806 >>> m.group(1) # The first parenthesized subgroup. 807 'Isaac' 808 >>> m.group(2) # The second parenthesized subgroup. 809 'Newton' 810 >>> m.group(1, 2) # Multiple arguments give us a tuple. 811 ('Isaac', 'Newton') 812 813 If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN* 814 arguments may also be strings identifying groups by their group name. If a 815 string argument is not used as a group name in the pattern, an :exc:`IndexError` 816 exception is raised. 
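The lookup rules for ``MatchObject.group`` described above (a group that did not participate yields ``None``, while an unknown name or out-of-range number raises :exc:`IndexError`) can be sketched as follows. This is an illustrative example and not part of the changeset: the pattern, the input string, and the ``group_or`` helper are hypothetical.

```python
import re

# Hypothetical key=value pattern; the "value" group is optional.
m = re.match(r"(?P<key>\w+)=(?P<value>\w+)?", "color=")

print(m.group('key'))           # a group that matched
print(m.group('value'))         # defined in the pattern but did not participate
print(m.group('key', 'value'))  # several arguments give a tuple


def group_or(match, ref, fallback):
    """Hypothetical helper: return match.group(ref), or fallback if ref is undefined."""
    try:
        return match.group(ref)
    except IndexError:
        return fallback


unknown_name = group_or(m, 'colour', 'no such group')
unknown_number = group_or(m, 3, 'no such group')
print(unknown_name)
print(unknown_number)
```

The distinction matters in practice: ``None`` signals an optional group that was skipped, whereas :exc:`IndexError` usually signals a mistake in the calling code.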
817 818 A moderately complicated example: 819 820 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") 821 >>> m.group('first_name') 822 'Malcolm' 823 >>> m.group('last_name') 824 'Reynolds' 825 826 Named groups can also be referred to by their index: 827 828 >>> m.group(1) 829 'Malcolm' 830 >>> m.group(2) 831 'Reynolds' 832 833 If a group matches multiple times, only the last match is accessible: 834 835 >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times. 836 >>> m.group(1) # Returns only the last match. 837 'c3' 838 839 840 .. method:: MatchObject.groups([default]) 841 842 Return a tuple containing all the subgroups of the match, from 1 up to however 843 many groups are in the pattern. The *default* argument is used for groups that 844 did not participate in the match; it defaults to ``None``. (Incompatibility 845 note: in the original Python 1.5 release, if the tuple was one element long, a 846 string would be returned instead. In later versions (from 1.5.1 on), a 847 singleton tuple is returned in such cases.) 848 849 For example: 850 851 >>> m = re.match(r"(\d+)\.(\d+)", "24.1632") 852 >>> m.groups() 853 ('24', '1632') 854 855 If we make the decimal place and everything after it optional, not all groups 856 might participate in the match. These groups will default to ``None`` unless 857 the *default* argument is given: 858 859 >>> m = re.match(r"(\d+)\.?(\d+)?", "24") 860 >>> m.groups() # Second group defaults to None. 861 ('24', None) 862 >>> m.groups('0') # Now, the second group defaults to '0'. 863 ('24', '0') 864 865 866 .. method:: MatchObject.groupdict([default]) 867 868 Return a dictionary containing all the *named* subgroups of the match, keyed by 869 the subgroup name. The *default* argument is used for groups that did not 870 participate in the match; it defaults to ``None``. 
For example: 871 872 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") 873 >>> m.groupdict() 874 {'first_name': 'Malcolm', 'last_name': 'Reynolds'} 875 876 877 .. method:: MatchObject.start([group]) 878 MatchObject.end([group]) 879 880 Return the indices of the start and end of the substring matched by *group*; 881 *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if 882 *group* exists but did not contribute to the match. For a match object *m*, and 883 a group *g* that did contribute to the match, the substring matched by group *g* 884 (equivalent to ``m.group(g)``) is :: 885 886 m.string[m.start(g):m.end(g)] 887 888 Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a 889 null string. For example, after ``m = re.search('b(c?)', 'cba')``, 890 ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both 891 2, and ``m.start(2)`` raises an :exc:`IndexError` exception. 892 893 An example that will remove *remove_this* from email addresses: 894 895 >>> email = "tony@tiremove_thisger.net" 896 >>> m = re.search("remove_this", email) 897 >>> email[:m.start()] + email[m.end():] 898 'tony@tiger.net' 899 900 901 .. method:: MatchObject.span([group]) 902 903 For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group), 904 m.end(group))``. Note that if *group* did not contribute to the match, this is 905 ``(-1, -1)``. *group* defaults to zero, the entire match. 906 907 908 .. attribute:: MatchObject.pos 909 910 The value of *pos* which was passed to the :meth:`~RegexObject.search` or 911 :meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the 912 index into the string at which the RE engine started looking for a match. 913 914 915 .. attribute:: MatchObject.endpos 916 917 The value of *endpos* which was passed to the :meth:`~RegexObject.search` or 918 :meth:`~RegexObject.match` method of the :class:`RegexObject`. 
This is the 919 index into the string beyond which the RE engine will not go. 920 921 922 .. attribute:: MatchObject.lastindex 923 924 The integer index of the last matched capturing group, or ``None`` if no group 925 was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and 926 ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while 927 the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same 928 string. 929 930 931 .. attribute:: MatchObject.lastgroup 932 933 The name of the last matched capturing group, or ``None`` if the group didn't 934 have a name, or if no group was matched at all. 935 936 937 .. attribute:: MatchObject.re 938 939 The regular expression object whose :meth:`~RegexObject.match` or 940 :meth:`~RegexObject.search` method produced this :class:`MatchObject` 941 instance. 942 943 944 .. attribute:: MatchObject.string 945 946 The string passed to :meth:`~RegexObject.match` or 947 :meth:`~RegexObject.search`. 811 .. class:: MatchObject 812 813 Match objects always have a boolean value of ``True``. 814 Since :meth:`~regex.match` and :meth:`~regex.search` return ``None`` 815 when there is no match, you can test whether there was a match with a simple 816 ``if`` statement:: 817 818 match = re.search(pattern, string) 819 if match: 820 process(match) 821 822 Match objects support the following methods and attributes: 823 824 825 .. method:: MatchObject.expand(template) 826 827 Return the string obtained by doing backslash substitution on the template 828 string *template*, as done by the :meth:`~RegexObject.sub` method. Escapes 829 such as ``\n`` are converted to the appropriate characters, and numeric 830 backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``, 831 ``\g<name>``) are replaced by the contents of the corresponding group. 832 833 834 .. method:: MatchObject.group([group1, ...]) 835 836 Returns one or more subgroups of the match. 
If there is a single argument, the 837 result is a single string; if there are multiple arguments, the result is a 838 tuple with one item per argument. Without arguments, *group1* defaults to zero 839 (the whole match is returned). If a *groupN* argument is zero, the corresponding 840 return value is the entire matching string; if it is in the inclusive range 841 [1..99], it is the string matching the corresponding parenthesized group. If a 842 group number is negative or larger than the number of groups defined in the 843 pattern, an :exc:`IndexError` exception is raised. If a group is contained in a 844 part of the pattern that did not match, the corresponding result is ``None``. 845 If a group is contained in a part of the pattern that matched multiple times, 846 the last match is returned. 847 848 >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") 849 >>> m.group(0) # The entire match 850 'Isaac Newton' 851 >>> m.group(1) # The first parenthesized subgroup. 852 'Isaac' 853 >>> m.group(2) # The second parenthesized subgroup. 854 'Newton' 855 >>> m.group(1, 2) # Multiple arguments give us a tuple. 856 ('Isaac', 'Newton') 857 858 If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN* 859 arguments may also be strings identifying groups by their group name. If a 860 string argument is not used as a group name in the pattern, an :exc:`IndexError` 861 exception is raised. 862 863 A moderately complicated example: 864 865 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") 866 >>> m.group('first_name') 867 'Malcolm' 868 >>> m.group('last_name') 869 'Reynolds' 870 871 Named groups can also be referred to by their index: 872 873 >>> m.group(1) 874 'Malcolm' 875 >>> m.group(2) 876 'Reynolds' 877 878 If a group matches multiple times, only the last match is accessible: 879 880 >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times. 881 >>> m.group(1) # Returns only the last match. 882 'c3' 883 884 885 .. 
method:: MatchObject.groups([default]) 886 887 Return a tuple containing all the subgroups of the match, from 1 up to however 888 many groups are in the pattern. The *default* argument is used for groups that 889 did not participate in the match; it defaults to ``None``. (Incompatibility 890 note: in the original Python 1.5 release, if the tuple was one element long, a 891 string would be returned instead. In later versions (from 1.5.1 on), a 892 singleton tuple is returned in such cases.) 893 894 For example: 895 896 >>> m = re.match(r"(\d+)\.(\d+)", "24.1632") 897 >>> m.groups() 898 ('24', '1632') 899 900 If we make the decimal place and everything after it optional, not all groups 901 might participate in the match. These groups will default to ``None`` unless 902 the *default* argument is given: 903 904 >>> m = re.match(r"(\d+)\.?(\d+)?", "24") 905 >>> m.groups() # Second group defaults to None. 906 ('24', None) 907 >>> m.groups('0') # Now, the second group defaults to '0'. 908 ('24', '0') 909 910 911 .. method:: MatchObject.groupdict([default]) 912 913 Return a dictionary containing all the *named* subgroups of the match, keyed by 914 the subgroup name. The *default* argument is used for groups that did not 915 participate in the match; it defaults to ``None``. For example: 916 917 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") 918 >>> m.groupdict() 919 {'first_name': 'Malcolm', 'last_name': 'Reynolds'} 920 921 922 .. method:: MatchObject.start([group]) 923 MatchObject.end([group]) 924 925 Return the indices of the start and end of the substring matched by *group*; 926 *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if 927 *group* exists but did not contribute to the match. 
For a match object *m*, and 928 a group *g* that did contribute to the match, the substring matched by group *g* 929 (equivalent to ``m.group(g)``) is :: 930 931 m.string[m.start(g):m.end(g)] 932 933 Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a 934 null string. For example, after ``m = re.search('b(c?)', 'cba')``, 935 ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both 936 2, and ``m.start(2)`` raises an :exc:`IndexError` exception. 937 938 An example that will remove *remove_this* from email addresses: 939 940 >>> email = "tony@tiremove_thisger.net" 941 >>> m = re.search("remove_this", email) 942 >>> email[:m.start()] + email[m.end():] 943 'tony@tiger.net' 944 945 946 .. method:: MatchObject.span([group]) 947 948 For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group), 949 m.end(group))``. Note that if *group* did not contribute to the match, this is 950 ``(-1, -1)``. *group* defaults to zero, the entire match. 951 952 953 .. attribute:: MatchObject.pos 954 955 The value of *pos* which was passed to the :meth:`~RegexObject.search` or 956 :meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the 957 index into the string at which the RE engine started looking for a match. 958 959 960 .. attribute:: MatchObject.endpos 961 962 The value of *endpos* which was passed to the :meth:`~RegexObject.search` or 963 :meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the 964 index into the string beyond which the RE engine will not go. 965 966 967 .. attribute:: MatchObject.lastindex 968 969 The integer index of the last matched capturing group, or ``None`` if no group 970 was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and 971 ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while 972 the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same 973 string. 974 975 976 .. 
attribute:: MatchObject.lastgroup 977 978 The name of the last matched capturing group, or ``None`` if the group didn't 979 have a name, or if no group was matched at all. 980 981 982 .. attribute:: MatchObject.re 983 984 The regular expression object whose :meth:`~RegexObject.match` or 985 :meth:`~RegexObject.search` method produced this :class:`MatchObject` 986 instance. 987 988 989 .. attribute:: MatchObject.string 990 991 The string passed to :meth:`~RegexObject.match` or 992 :meth:`~RegexObject.search`. 948 993 949 994 … … 967 1012 Suppose you are writing a poker program where a player's hand is represented as 968 1013 a 5-character string with each character representing a card, "a" for ace, "k" 969 for king, "q" for queen, j for jack, "0" for 10, and "1" through "9"1014 for king, "q" for queen, "j" for jack, "t" for 10, and "2" through "9" 970 1015 representing the card with that value. 971 1016 972 1017 To see if a given string is a valid hand, one could do the following: 973 1018 974 >>> valid = re.compile(r" [0-9akqj]{5}$")975 >>> displaymatch(valid.match("ak 05q")) # Valid.976 "<Match: 'ak 05q', groups=()>"977 >>> displaymatch(valid.match("ak 05e")) # Invalid.978 >>> displaymatch(valid.match("ak 0")) # Invalid.1019 >>> valid = re.compile(r"^[a2-9tjqk]{5}$") 1020 >>> displaymatch(valid.match("akt5q")) # Valid. 1021 "<Match: 'akt5q', groups=()>" 1022 >>> displaymatch(valid.match("akt5e")) # Invalid. 1023 >>> displaymatch(valid.match("akt")) # Invalid. 979 1024 >>> displaymatch(valid.match("727ak")) # Valid. 980 1025 "<Match: '727ak', groups=()>" … … 1015 1060 .. index:: single: scanf() 1016 1061 1017 Python does not currently have an equivalent to :c func:`scanf`. Regular1062 Python does not currently have an equivalent to :c:func:`scanf`. Regular 1018 1063 expressions are generally more powerful, though also more verbose, than 1019 :c func:`scanf` format strings. 
The table below offers some more-or-less1020 equivalent mappings between :c func:`scanf` format tokens and regular1064 :c:func:`scanf` format strings. The table below offers some more-or-less 1065 equivalent mappings between :c:func:`scanf` format tokens and regular 1021 1066 expressions. 1022 1067 1023 1068 +--------------------------------+---------------------------------------------+ 1024 | :c func:`scanf` Token| Regular Expression |1069 | :c:func:`scanf` Token | Regular Expression | 1025 1070 +================================+=============================================+ 1026 1071 | ``%c`` | ``.`` | … … 1034 1079 | ``%i`` | ``[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)`` | 1035 1080 +--------------------------------+---------------------------------------------+ 1036 | ``%o`` | `` 0[0-7]*``|1081 | ``%o`` | ``[-+]?[0-7]+`` | 1037 1082 +--------------------------------+---------------------------------------------+ 1038 1083 | ``%s`` | ``\S+`` | … … 1040 1085 | ``%u`` | ``\d+`` | 1041 1086 +--------------------------------+---------------------------------------------+ 1042 | ``%x``, ``%X`` | `` 0[xX][\dA-Fa-f]+``|1087 | ``%x``, ``%X`` | ``[-+]?(0[xX])?[\dA-Fa-f]+`` | 1043 1088 +--------------------------------+---------------------------------------------+ 1044 1089 … … 1047 1092 /usr/sbin/sendmail - 0 errors, 4 warnings 1048 1093 1049 you would use a :c func:`scanf` format like ::1094 you would use a :c:func:`scanf` format like :: 1050 1095 1051 1096 %s - %d errors, %d warnings … … 1056 1101 1057 1102 1058 Avoiding recursion 1059 ^^^^^^^^^^^^^^^^^^ 1060 1061 If you create regular expressions that require the engine to perform a lot of 1062 recursion, you may encounter a :exc:`RuntimeError` exception with the message 1063 ``maximum recursion limit`` exceeded. For example, :: 1064 1065 >>> s = 'Begin ' + 1000*'a very long string ' + 'end' 1066 >>> re.match('Begin (\w| )*? end', s).end() 1067 Traceback (most recent call last): 1068 File "<stdin>", line 1, in ? 
1069 File "/usr/local/lib/python2.5/re.py", line 132, in match 1070 return _compile(pattern, flags).match(string) 1071 RuntimeError: maximum recursion limit exceeded 1072 1073 You can often restructure your regular expression to avoid recursion. 1074 1075 Starting with Python 2.3, simple uses of the ``*?`` pattern are special-cased to 1076 avoid recursion. Thus, the above regular expression can avoid recursion by 1077 being recast as ``Begin [a-zA-Z0-9_ ]*?end``. As a further benefit, such 1078 regular expressions will run faster than their recursive equivalents. 1079 1103 .. _search-vs-match: 1080 1104 1081 1105 search() vs. match() 1082 1106 ^^^^^^^^^^^^^^^^^^^^ 1083 1107 1084 In a nutshell, :func:`match` only attempts to match a pattern at the beginning 1085 of a string where :func:`search` will match a pattern anywhere in a string. 1086 For example: 1087 1088 >>> re.match("o", "dog") # No match as "o" is not the first letter of "dog". 1089 >>> re.search("o", "dog") # Match as search() looks everywhere in the string. 1108 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> 1109 1110 Python offers two different primitive operations based on regular expressions: 1111 :func:`re.match` checks for a match only at the beginning of the string, while 1112 :func:`re.search` checks for a match anywhere in the string (this is what Perl 1113 does by default). 1114 1115 For example:: 1116 1117 >>> re.match("c", "abcdef") # No match 1118 >>> re.search("c", "abcdef") # Match 1090 1119 <_sre.SRE_Match object at ...> 1091 1120 1092 .. note:: 1093 1094 The following applies only to regular expression objects like those created 1095 with ``re.compile("pattern")``, not the primitives ``re.match(pattern, 1096 string)`` or ``re.search(pattern, string)``. 
1097 1098 :func:`match` has an optional second parameter that gives an index in the string 1099 where the search is to start:: 1100 1101 >>> pattern = re.compile("o") 1102 >>> pattern.match("dog") # No match as "o" is not at the start of "dog." 1103 1104 # Equivalent to the above expression as 0 is the default starting index: 1105 >>> pattern.match("dog", 0) 1106 1107 # Match as "o" is the 2nd character of "dog" (index 0 is the first): 1108 >>> pattern.match("dog", 1) 1121 Regular expressions beginning with ``'^'`` can be used with :func:`search` to 1122 restrict the match at the beginning of the string:: 1123 1124 >>> re.match("c", "abcdef") # No match 1125 >>> re.search("^c", "abcdef") # No match 1126 >>> re.search("^a", "abcdef") # Match 1109 1127 <_sre.SRE_Match object at ...> 1110 >>> pattern.match("dog", 2) # No match as "o" is not the 3rd character of "dog." 1128 1129 Note however that in :const:`MULTILINE` mode :func:`match` only matches at the 1130 beginning of the string, whereas using :func:`search` with a regular expression 1131 beginning with ``'^'`` will match at the beginning of each line. 1132 1133 >>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match 1134 >>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match 1135 <_sre.SRE_Match object at ...> 1111 1136 1112 1137 … … 1122 1147 triple-quoted string syntax: 1123 1148 1124 >>> input = """Ross McFluff: 834.345.1254 155 Elm Street1149 >>> text = """Ross McFluff: 834.345.1254 155 Elm Street 1125 1150 ... 1126 1151 ... Ronald Heathmore: 892.345.3428 436 Finley Avenue … … 1136 1161 :options: +NORMALIZE_WHITESPACE 1137 1162 1138 >>> entries = re.split("\n+", input)1163 >>> entries = re.split("\n+", text) 1139 1164 >>> entries 1140 1165 ['Ross McFluff: 834.345.1254 155 Elm Street', … … 1183 1208 ... return m.group(1) + "".join(inner_word) + m.group(3) 1184 1209 >>> text = "Professor Abdolmalek, please report your absences promptly." 
1185 >>> re.sub( "(\w)(\w+)(\w)", repl, text) 1210 >>> re.sub(r"(\w)(\w+)(\w)", repl, text) 1186 1211 'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.' 1187 >>> re.sub( "(\w)(\w+)(\w)", repl, text) 1212 >>> re.sub(r"(\w)(\w+)(\w)", repl, text) 1188 1213 'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.' 1189 1214
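The behaviours documented in the revised text of this changeset (``match()`` versus ``search()``, the effect of ``MULTILINE``, and the position and default handling on match objects) can be checked with a short script. This is a minimal sketch and not part of the changeset itself; the variable names are illustrative only, and the patterns and input strings are the ones used in the examples above.

```python
import re

# match() only tries position 0; search() scans forward for the first match.
at_start = re.match("c", "abcdef")    # no match: 'c' is not at index 0
anywhere = re.search("c", "abcdef")   # match object for the 'c' in "abcdef"

# In MULTILINE mode '^' also matches right after each newline, but match()
# is still tied to the very start of the string.
ml_match = re.match("X", "A\nB\nX", re.MULTILINE)
ml_search = re.search("^X", "A\nB\nX", re.MULTILINE)

# start(), end() and span() report where a group matched. Here group 1
# matches the empty string, so its start and end are equal.
m = re.search("b(c?)", "cba")
whole_span = m.span(0)
empty_span = m.span(1)

# groups() substitutes *default* for groups that did not participate.
g = re.match(r"(\d+)\.?(\d+)?", "24")
no_default = g.groups()
with_default = g.groups("0")

print(at_start, anywhere.start(), ml_match, ml_search.start())
print(whole_span, empty_span, no_default, with_default)
```

Comparing positions and tuples rather than the ``repr`` of the match object keeps the check independent of the Python version, since the ``<_sre.SRE_Match object at ...>`` form shown above is specific to the interpreter that produced it.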