Changeset 391 for python/trunk/Doc/howto/regex.rst
- Timestamp: Mar 19, 2014, 11:31:01 PM
- Location: python/trunk
- Files: 2 edited
python/trunk
- Property svn:mergeinfo set to /python/vendor/Python-2.7.6, /python/vendor/current
- Property svn:mergeinfo set to
python/trunk/Doc/howto/regex.rst
--- python/trunk/Doc/howto/regex.rst (r2)
+++ python/trunk/Doc/howto/regex.rst (r391)
@@ -6,5 +6,4 @@
 
 :Author: A.M. Kuchling <amk@amk.ca>
-:Release: 0.05
 
 .. TODO:
@@ -83,5 +82,5 @@
 in the rest of this HOWTO. ::
 
-   . ^ $ * + ? { [ ] \ | ( )
+   . ^ $ * + ? { } [ ] \ | ( )
 
 The first metacharacters we'll look at are ``[`` and ``]``. They're used for
@@ -114,5 +113,8 @@
 of characters that are often useful, such as the set of digits, the set of
 letters, or the set of anything that isn't whitespace. The following predefined
-special sequences are available:
+special sequences are a subset of those available. The equivalent classes are
+for byte string patterns. For a complete list of sequences and expanded class
+definitions for Unicode string patterns, see the last part of
+:ref:`Regular Expression Syntax <re-syntax>`.
 
 ``\d``
@@ -264,6 +266,6 @@
    >>> import re
    >>> p = re.compile('ab*')
-   >>> print p
-   <_sre.SRE_Pattern object at 80b4150>
+   >>> p #doctest: +ELLIPSIS
+   <_sre.SRE_Pattern object at 0x...>
 
 :func:`re.compile` also accepts an optional *flags* argument, used to enable
@@ -358,11 +360,11 @@
 
 :meth:`match` and :meth:`search` return ``None`` if no match can be found. If
-they're successful, a ``MatchObject`` instance is returned, containing
-information about the match: where it starts and ends, the substring it matched,
-and more.
+they're successful, a :ref:`match object <match-objects>` instance is returned,
+containing information about the match: where it starts and ends, the substring
+it matched, and more.
 
 You can learn about this by interactively experimenting with the :mod:`re`
 module. If you have Tkinter available, you may also want to look at
-:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+:source:`Tools/scripts/redemo.py`, a demonstration program included with the
 Python distribution. It allows you to enter REs and strings, and displays
 whether the RE matches or fails. :file:`redemo.py` can be quite useful when
@@ -377,6 +379,6 @@
    >>> import re
    >>> p = re.compile('[a-z]+')
-   >>> p
-   <_sre.SRE_Pattern object at 80c3c28>
+   >>> p #doctest: +ELLIPSIS
+   <_sre.SRE_Pattern object at 0x...>
 
 Now, you can try matching various strings against the RE ``[a-z]+``. An empty
@@ -391,14 +393,14 @@
 
 Now, let's try it on a string that it should match, such as ``tempo``. In this
-case, :meth:`match` will return a :class:`MatchObject`, so you should store the
-result in a variable for later use. ::
+case, :meth:`match` will return a :ref:`match object <match-objects>`, so you
+should store the result in a variable for later use. ::
 
    >>> m = p.match('tempo')
-   >>> print m
-   <_sre.SRE_Match object at 80c4f68>
-
-Now you can query the :class:`MatchObject` for information about the matching
-string. :class:`MatchObject` instances also have several methods and
-attributes; the most important ones are:
+   >>> m #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
+
+Now you can query the :ref:`match object <match-objects>` for information
+about the matching string. :ref:`match object <match-objects>` instances
+also have several methods and attributes; the most important ones are:
 
 +------------------+--------------------------------------------+
@@ -434,6 +436,6 @@
    >>> print p.match('::: message')
    None
-   >>> m = p.search('::: message') ; print m
-   <re.MatchObject instance at 80c9650>
+   >>> m = p.search('::: message'); print m #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
    >>> m.group()
    'message'
@@ -441,6 +443,7 @@
    (4, 11)
 
-In actual programs, the most common style is to store the :class:`MatchObject`
-in a variable, and then check if it was ``None``. This usually looks like::
+In actual programs, the most common style is to store the
+:ref:`match object <match-objects>` in a variable, and then check if it was
+``None``. This usually looks like::
 
    p = re.compile( ... )
@@ -459,10 +462,10 @@
 
 :meth:`findall` has to create the entire list before it can be returned as the
-result. The :meth:`finditer` method returns a sequence of :class:`MatchObject`
-instances as an :term:`iterator`. [#]_ ::
+result. The :meth:`finditer` method returns a sequence of
+:ref:`match object <match-objects>` instances as an :term:`iterator`. [#]_ ::
 
    >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
-   >>> iterator
-   <callable-iterator object at 0x401833ac>
+   >>> iterator #doctest: +ELLIPSIS
+   <callable-iterator object at 0x...>
    >>> for match in iterator:
    ...     print match.span()
@@ -481,10 +484,10 @@
 take the same arguments as the corresponding pattern method, with
 the RE string added as the first argument, and still return either ``None`` or a
-:class:`MatchObject` instance. ::
+:ref:`match object <match-objects>` instance. ::
 
    >>> print re.match(r'From\s+', 'Fromage amk')
    None
-   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
-   <re.MatchObject instance at 80c5978>
+   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
 
 Under the hood, these functions simply create a pattern object for you
@@ -500,5 +503,5 @@
 the definitions in one place, in a section of code that compiles all the REs
 ahead of time. To take an example from the standard library, here's an extract
-from :file:`xmllib.py`::
+from the deprecated :mod:`xmllib` module::
 
    ref = re.compile( ... )
@@ -686,6 +689,6 @@
 line, the RE to use is ``^From``. ::
 
-   >>> print re.search('^From', 'From Here to Eternity')
-   <re.MatchObject instance at 80c1520>
+   >>> print re.search('^From', 'From Here to Eternity') #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
    >>> print re.search('^From', 'Reciting From Memory')
    None
@@ -698,10 +701,10 @@
 or any location followed by a newline character. ::
 
-   >>> print re.search('}$', '{block}')
-   <re.MatchObject instance at 80adfa8>
+   >>> print re.search('}$', '{block}') #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
    >>> print re.search('}$', '{block} ')
    None
-   >>> print re.search('}$', '{block}\n')
-   <re.MatchObject instance at 80adfa8>
+   >>> print re.search('}$', '{block}\n') #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
 
 To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
@@ -727,6 +730,6 @@
 
    >>> p = re.compile(r'\bclass\b')
-   >>> print p.search('no class at all')
-   <re.MatchObject instance at 80c8f28>
+   >>> print p.search('no class at all') #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
    >>> print p.search('the declassified algorithm')
    None
@@ -745,6 +748,6 @@
    >>> print p.search('no class at all')
    None
-   >>> print p.search('\b' + 'class' + '\b')
-   <re.MatchObject instance at 80c3ee0>
+   >>> print p.search('\b' + 'class' + '\b') #doctest: +ELLIPSIS
+   <_sre.SRE_Match object at 0x...>
 
 Second, inside a character class, where there's no use for this assertion,
@@ -790,7 +793,7 @@
 to :meth:`group`, :meth:`start`, :meth:`end`, and :meth:`span`. Groups are
 numbered starting with 0. Group 0 is always present; it's the whole RE, so
-:class:`MatchObject` methods all have group 0 as their default argument. Later
-we'll see how to express groups that don't capture the span of text that they
-match. ::
+:ref:`match object <match-objects>` methods all have group 0 as their default
+argument. Later we'll see how to express groups that don't capture the span
+of text that they match. ::
 
    >>> p = re.compile('(a)b')
@@ -912,8 +915,8 @@
 ``(?P<name>...)``. *name* is, obviously, the name of the group. Named groups
 also behave exactly like capturing groups, and additionally associate a name
-with a group. The :class:`MatchObject` methods that deal with capturing groups
-all accept either integers that refer to the group by number or strings that
-contain the desired group's name. Named groups are still given numbers, so you
-can retrieve information about a group in two ways::
+with a group. The :ref:`match object <match-objects>` methods that deal with
+capturing groups all accept either integers that refer to the group by number
+or strings that contain the desired group's name. Named groups are still
+given numbers, so you can retrieve information about a group in two ways::
 
    >>> p = re.compile(r'(?P<word>\b\w+\b)')
@@ -1179,14 +1182,14 @@
 *replacement* can also be a function, which gives you even more control. If
 *replacement* is a function, the function is called for every non-overlapping
-occurrence of *pattern*. On each call, the function is
-:class:`MatchObject` argument for the match and can use this information to
-compute the desired replacement string and return it.
-
-In the following example, the replacement function translates
+occurrence of *pattern*. On each call, the function is passed a
+:ref:`match object <match-objects>` argument for the match and can use this
+information to compute the desired replacement string and return it.
+
+In the following example, the replacement function translates decimals into
 hexadecimal::
 
-   >>> def hexrepl( match):
+   >>> def hexrepl(match):
    ...     "Return the hex string for a decimal number"
-   ...     value = int( match.group())
+   ...     value = int(match.group())
    ...     return hex(value)
    ...
@@ -1316,5 +1319,5 @@
 
 
-Not Using re.VERBOSE
+Using re.VERBOSE
 --------------------
 
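
Most hunks in this changeset replace a hard-coded object address with `0x...` and add a `#doctest: +ELLIPSIS` directive to the example line. The following is a minimal sketch of why the directive is needed, not code from the changeset itself. The `SAMPLE` text and the `run_sample` helper are hypothetical names introduced for illustration, and `object()` stands in for the pattern and match objects because its repr contains a memory address on CPython regardless of Python version.

```python
import doctest

# Hypothetical sample, not taken from regex.rst. object() is used because its
# repr includes an address that changes between runs, like the reprs in the diff.
SAMPLE = """
>>> object()  #doctest: +ELLIPSIS
<object object at 0x...>
"""


def run_sample(text):
    """Run the doctest examples found in text; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report so the comparison below stays quiet.
    result = runner.run(test, out=lambda message: None)
    return result.failed, result.attempted


# With the directive, "..." in the expected output matches any address.
with_directive = run_sample(SAMPLE)

# Without it, "0x..." is compared literally and the example fails.
without_directive = run_sample(SAMPLE.replace("  #doctest: +ELLIPSIS", ""))
```

Under these assumptions `with_directive` reports no failures, while `without_directive` reports one, which is the situation the old hard-coded addresses such as `80b4150` would have produced under doctest.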