Changeset 391 for python/trunk/Lib/lib2to3/pgen2/tokenize.py
- Timestamp: Mar 19, 2014, 11:31:01 PM (11 years ago)
- Location: python/trunk
- Files: 2 edited
Legend: Unmodified | Added | Removed
python/trunk
- Property svn:mergeinfo set to:
  - /python/vendor/Python-2.7.6 (merged, eligible)
  - /python/vendor/current (merged, eligible)
python/trunk/Lib/lib2to3/pgen2/tokenize.py (r2 -> r391)

@@ -38 +38 @@
     "generate_tokens", "untokenize"]
 del token
+
+try:
+    bytes
+except NameError:
+    # Support bytes type in Python <= 2.5, so 2to3 turns itself into
+    # valid Python 3 code.
+    bytes = str

 def group(*choices): return '(' + '|'.join(choices) + ')'

@@ -230 +237 @@
     toks_append(tokval)

-cookie_re = re.compile("coding[:=]\s*([-\w.]+)")
+cookie_re = re.compile(r'^[ \t\f]*#.*coding[:=][ \t]*([-\w.]+)')

 def _get_normal_name(orig_enc):

@@ -254 +261 @@
     It detects the encoding from the presence of a utf-8 bom or an encoding
-    cookie as specified in pep-0263. If both a bom and a cookie are present,
-    but disagree, a SyntaxError will be raised. If the encoding cookie is an
-    invalid charset, raise a SyntaxError.
+    cookie as specified in pep-0263. If both a bom and a cookie are present, but
+    disagree, a SyntaxError will be raised. If the encoding cookie is an invalid
+    charset, raise a SyntaxError. Note that if a utf-8 bom is found,
+    'utf-8-sig' is returned.

     If no encoding is specified, then the default of 'utf-8' will be returned.

@@ -262 +270 @@
     bom_found = False
     encoding = None
+    default = 'utf-8'
     def read_or_stop():
         try:
             return readline()
         except StopIteration:
-            return b''
+            return bytes()

     def find_cookie(line):

@@ -273 +282 @@
         except UnicodeDecodeError:
             return None
-
-        matches = cookie_re.findall(line_string)
-        if not matches:
+        match = cookie_re.match(line_string)
+        if not match:
             return None
-        encoding = _get_normal_name(matches[0])
+        encoding = _get_normal_name(match.group(1))
         try:
             codec = lookup(encoding)

@@ -288 +296 @@
                 # This behaviour mimics the Python interpreter
                 raise SyntaxError('encoding problem: utf-8')
-            else:
-                # Allow it to be properly encoded and decoded.
-                encoding = 'utf-8-sig'
+            encoding += '-sig'
         return encoding

@@ -297 +303 @@
         bom_found = True
         first = first[3:]
+        default = 'utf-8-sig'
     if not first:
-        return 'utf-8', []
+        return default, []

     encoding = find_cookie(first)

@@ -306 +313 @@
     second = read_or_stop()
     if not second:
-        return 'utf-8', [first]
+        return default, [first]

     encoding = find_cookie(second)
     if encoding:
         return encoding, [first, second]

-    return 'utf-8', [first, second]
+    return default, [first, second]

 def untokenize(iterable):
Note:
See TracChangeset
for help on using the changeset viewer.