unicode.rst

Timestamp:

Mar 19, 2014, 11:31:01 PM

Author:

dmik

Message:

python: Merge vendor 2.7.6 to trunk.

Location:

python/trunk

Files:

: 2 edited

. (modified) (1 prop)
Doc/howto/unicode.rst (modified) (12 diffs)

python/trunk
- Property svn:mergeinfo set to
  /python/vendor/Python-2.7.6 merged eligible
  /python/vendor/current merged eligible

python/trunk/Doc/howto/unicode.rst

-              r2
+              r391
 *****************
-:Release: 1.02
-This HOWTO discusses Python's support for Unicode, and explains various problems
-that people commonly encounter when trying to work with Unicode.
+:Release: 1.03
+This HOWTO discusses Python 2.x's support for Unicode, and explains
+various problems that people commonly encounter when trying to work
+with Unicode.  For the Python 3 version, see
+<http://docs.python.org/py3k/howto/unicode.html>.
 Introduction to Unicode
 …
    handle content with embedded zero bytes.
-Generally people don't use this encoding, instead choosing other encodings that
-are more efficient and convenient.
+Generally people don't use this encoding, instead choosing other
+encodings that are more efficient and convenient.  UTF-8 is probably
+the most commonly supported encoding; it will be discussed below.
 Encodings don't have to handle every possible Unicode character, and most
 …
-Python's Unicode Support
-========================
+Python 2.x's Unicode Support
+============================
 Now that you've learned the rudiments of Unicode, we can look at Python's
 …
     >>> type(s)
     <type 'unicode'>
-    >>> unicode('abcdef' + chr(255))
+    >>> unicode('abcdef' + chr(255))    #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
+    ...
     UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
-                        ordinal not in range(128)
+    ordinal not in range(128)
 The ``errors`` argument specifies the response when the input string can't be
 …
 Unicode result).  The following examples show the differences::
-    >>> unicode('\x80abc', errors='strict')
+    >>> unicode('\x80abc', errors='strict')     #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
+        ...
     UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
-                        ordinal not in range(128)
+    ordinal not in range(128)
     >>> unicode('\x80abc', errors='replace')
     u'\ufffdabc'
 …
     u'abc'
-Encodings are specified as strings containing the encoding's name.  Python 2.4
+Encodings are specified as strings containing the encoding's name.  Python 2.7
 comes with roughly 100 different encodings; see the Python Library Reference at
 :ref:`standard-encodings` for a list.  Some encodings
 …
 than 127 will cause an exception::
-    >>> s.find('Was\x9f')
+    >>> s.find('Was\x9f')                   #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
+        ...
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3:
+    ordinal not in range(128)
     >>> s.find(u'Was\x9f')
     -1
 …
     >>> u.encode('utf-8')
     '\xea\x80\x80abcd\xde\xb4'
-    >>> u.encode('ascii')
+    >>> u.encode('ascii')                       #doctest: +NORMALIZE_WHITESPACE
     Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-    UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
+        ...
+    UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in
+    position 0: ordinal not in range(128)
     >>> u.encode('ascii', 'ignore')
     'abcd'
 …
     >>> s = u"a\xac\u1234\u20ac\U00008000"
-               ^^^^ two-digit hex escape
-                   ^^^^^^ four-digit Unicode escape
-                               ^^^^^^^^^^ eight-digit Unicode escape
+    ... #      ^^^^ two-digit hex escape
+    ... #          ^^^^^^ four-digit Unicode escape
+    ... #                      ^^^^^^^^^^ eight-digit Unicode escape
     >>> for c in s:  print ord(c),
     ...
 …
 When you run it with Python 2.4, it will output the following warning::
-    amk:~$ python p263.py
+    amk:~$ python2.4 p263.py
     sys:1: DeprecationWarning: Non-ASCII character '\xe9'
          in file p263.py on line 2, but no encoding declared;
          see http://www.python.org/peps/pep-0263.html for details
+Python 2.5 and higher are stricter and will produce a syntax error::
+    amk:~$ python2.5 p263.py
+    File "/tmp/p263.py", line 2
+    SyntaxError: Non-ASCII character '\xc3' in file /tmp/p263.py
+      on line 2, but no encoding declared; see
+      http://www.python.org/peps/pep-0263.html for details
 …
 "Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
 other".  See
-<http://unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values> for a
+<http://www.unicode.org/reports/tr44/#General_Category_Values> for a
 list of category codes.
 …
 Version 1.02: posted August 16 2005.  Corrects factual errors.
+Version 1.03: posted June 20 2010.  Notes that Python 3.x is not covered,
+and that the HOWTO only covers 2.x.
+.. comment Describe Python 3.x support (new section? new document?)
 .. comment Additional topic: building Python w/ UCS2 or UCS4 support
 .. comment Describe obscure -U switch somewhere?
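
Reviewer note: several hunks above add ``#doctest: +NORMALIZE_WHITESPACE`` and wrap the exception message onto a second line. The sketch below is one way to check what that directive changes. It is not part of the changeset. The names ``WITH_DIRECTIVE``, ``WITHOUT_DIRECTIVE`` and ``count_failures`` are hypothetical, and the example uses ``b'...'.decode('ascii')`` instead of ``unicode(...)`` on the assumption that it should run on Python 3 as well as 2.7.

```python
import doctest

# A doctest whose expected exception message is wrapped over two lines,
# in the style the changeset introduces. The traceback stack is replaced
# by "...", which doctest ignores.
WITH_DIRECTIVE = r"""
>>> b'abcdef\xff'.decode('ascii')    #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
ordinal not in range(128)
"""

# The same example with the directive stripped out.
WITHOUT_DIRECTIVE = WITH_DIRECTIVE.replace("#doctest: +NORMALIZE_WHITESPACE", "")


def count_failures(text):
    """Run the doctest examples in ``text`` and return how many failed."""
    test = doctest.DocTestParser().get_doctest(text, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are of interest here.
    results = runner.run(test, out=lambda message: None)
    return results.failed


if __name__ == "__main__":
    print(count_failures(WITH_DIRECTIVE))
    print(count_failures(WITHOUT_DIRECTIVE))
```

With the directive, the line break inside the expected message is treated as equivalent to the single space in the real message, so the example passes. Without it, the wrapped message no longer matches and the example fails. That is presumably why the rewrapped tracebacks in this changeset carry the directive.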

Changeset 391 for python/trunk/Doc/howto/unicode.rst
