Timestamp:
Mar 19, 2014, 11:31:01 PM
Author:
dmik
Message:

python: Merge vendor 2.7.6 to trunk.

Location:
python/trunk
Files:
2 edited

  • python/trunk

  • python/trunk/Doc/howto/unicode.rst

--- python/trunk/Doc/howto/unicode.rst (r2)
+++ python/trunk/Doc/howto/unicode.rst (r391)
@@ -3,8 +3,10 @@
 *****************
 
-:Release: 1.02
-
-This HOWTO discusses Python's support for Unicode, and explains various problems
-that people commonly encounter when trying to work with Unicode.
+:Release: 1.03
+
+This HOWTO discusses Python 2.x's support for Unicode, and explains
+various problems that people commonly encounter when trying to work
+with Unicode.  For the Python 3 version, see
+<http://docs.python.org/py3k/howto/unicode.html>.
 
 Introduction to Unicode
@@ -145,6 +147,7 @@
    handle content with embedded zero bytes.
 
-Generally people don't use this encoding, instead choosing other encodings that
-are more efficient and convenient.
+Generally people don't use this encoding, instead choosing other
+encodings that are more efficient and convenient.  UTF-8 is probably
+the most commonly supported encoding; it will be discussed below.
 
 Encodings don't have to handle every possible Unicode character, and most
@@ -223,6 +226,6 @@
 
 
-Python's Unicode Support
-========================
+Python 2.x's Unicode Support
+============================
 
 Now that you've learned the rudiments of Unicode, we can look at Python's
@@ -251,9 +254,9 @@
     >>> type(s)
     <type 'unicode'>
-    >>> unicode('abcdef' + chr(255))
+    >>> unicode('abcdef' + chr(255))    #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
+    ...
     UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
-                        ordinal not in range(128)
+    ordinal not in range(128)
 
 The ``errors`` argument specifies the response when the input string can't be
@@ -263,9 +266,9 @@
 Unicode result).  The following examples show the differences::
 
-    >>> unicode('\x80abc', errors='strict')
+    >>> unicode('\x80abc', errors='strict')     #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
+        ...
     UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
-                        ordinal not in range(128)
+    ordinal not in range(128)
     >>> unicode('\x80abc', errors='replace')
     u'\ufffdabc'
@@ -273,5 +276,5 @@
     u'abc'
 
-Encodings are specified as strings containing the encoding's name.  Python 2.4
+Encodings are specified as strings containing the encoding's name.  Python 2.7
 comes with roughly 100 different encodings; see the Python Library Reference at
 :ref:`standard-encodings` for a list.  Some encodings
@@ -310,8 +313,9 @@
 than 127 will cause an exception::
 
-    >>> s.find('Was\x9f')
+    >>> s.find('Was\x9f')                   #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
+        ...
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3:
+    ordinal not in range(128)
     >>> s.find(u'Was\x9f')
     -1
@@ -331,8 +335,9 @@
     >>> u.encode('utf-8')
     '\xea\x80\x80abcd\xde\xb4'
-    >>> u.encode('ascii')
+    >>> u.encode('ascii')                       #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
+        ...
+    UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in
+    position 0: ordinal not in range(128)
     >>> u.encode('ascii', 'ignore')
     'abcd'
@@ -382,7 +387,7 @@
 
     >>> s = u"a\xac\u1234\u20ac\U00008000"
-               ^^^^ two-digit hex escape
-                   ^^^^^^ four-digit Unicode escape
-                               ^^^^^^^^^^ eight-digit Unicode escape
+    ... #      ^^^^ two-digit hex escape
+    ... #          ^^^^^^ four-digit Unicode escape
+    ... #                      ^^^^^^^^^^ eight-digit Unicode escape
     >>> for c in s:  print ord(c),
     ...
@@ -428,8 +433,16 @@
 When you run it with Python 2.4, it will output the following warning::
 
-    amk:~$ python p263.py
+    amk:~$ python2.4 p263.py
     sys:1: DeprecationWarning: Non-ASCII character '\xe9'
          in file p263.py on line 2, but no encoding declared;
          see http://www.python.org/peps/pep-0263.html for details
+
+Python 2.5 and higher are stricter and will produce a syntax error::
+
+    amk:~$ python2.5 p263.py
+    File "/tmp/p263.py", line 2
+    SyntaxError: Non-ASCII character '\xc3' in file /tmp/p263.py
+      on line 2, but no encoding declared; see
+      http://www.python.org/peps/pep-0263.html for details
 
 
@@ -473,5 +486,5 @@
 "Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
 other".  See
-<http://unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values> for a
+<http://www.unicode.org/reports/tr44/#General_Category_Values> for a
 list of category codes.
 
@@ -694,5 +707,9 @@
 Version 1.02: posted August 16 2005.  Corrects factual errors.
 
-
+Version 1.03: posted June 20 2010.  Notes that Python 3.x is not covered,
+and that the HOWTO only covers 2.x.
+
+
+.. comment Describe Python 3.x support (new section? new document?)
 .. comment Additional topic: building Python w/ UCS2 or UCS4 support
 .. comment Describe obscure -U switch somewhere?
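
The doctest hunks in this changeset exercise the ``errors`` argument when decoding and encoding. As a rough illustration only, the same behaviour can be sketched in Python 3 syntax. This is an assumption made so the snippet runs on a current interpreter: the HOWTO itself targets Python 2.x, where ``unicode(...)`` and ``u''`` literals are used instead of ``bytes.decode``. The helper name ``decode_with`` is hypothetical, and the value of ``text`` is inferred from the UTF-8 bytes shown in the diff rather than taken from it.

```python
# Python 3 sketch; the changeset's doctests are Python 2.x and use
# unicode('\x80abc', errors=...) where this uses bytes.decode(...).
raw = b'\x80abc'  # 0x80 is not a valid ASCII byte


def decode_with(errors):
    """Hypothetical helper: decode `raw` as ASCII with the given error
    handler, returning the text or the name of the exception raised."""
    try:
        return raw.decode('ascii', errors=errors)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# 'strict' raises, 'replace' substitutes U+FFFD, 'ignore' drops the byte.
results = {mode: decode_with(mode) for mode in ('strict', 'replace', 'ignore')}

# Encoding direction. The string is inferred from the bytes
# '\xea\x80\x80abcd\xde\xb4' shown in the diff (an assumption).
text = '\ua000abcd\u07b4'
encoded_utf8 = text.encode('utf-8')
encoded_ignore = text.encode('ascii', 'ignore')

print(results)
print(encoded_utf8, encoded_ignore)
```

The ``strict`` handler is the default, which is why the tracebacks in the doctests appear when no ``errors`` argument is given.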