Python 2.2 有什么新变化

作者

A.M. Kuchling

簡介

本文本介绍了 Python 2.2.2 的新增特性,该版本发布于 2002 年 10 月 14日。 Python 2.2.2 是 Python 2.2 的问题修正发布版,最初发布于 2001 年 12 月 21 日。

Python 2.2 可以被看作是 "清理发布版"。 有一些特性如生成器和迭代器等是全新的,但大多数变化,尽管可能是重大而深远的,都是为了清理语言设计中的不规范和阴暗角落。

本文并不试图提供对新特性的完整规范说明,而是提供一个便捷的概览。 要获取全部细节,你应该参阅 Python 2.2 的文档,比如 Python 库参考Python 参考指南。 如果你想要了解某项更改的完整实现和设计理念,请参阅特定新特性的 PEP。

PEP 252 和 253:类型和类的修改

Python 2.2 中最大且影响最深远的改变是针对 Python 的对象和类模型。 这些变化应该是向下兼容的,因此你的代码将能继续运行而无需修改,但这些变化提供了一些很棒的新功能。 在开始本文最长和最复杂的部分之前,我提供对这些变化的概览并附带一些注释。

A long time ago I wrote a Web page listing flaws in Python's design. One of the most significant flaws was that it's impossible to subclass Python types implemented in C. In particular, it's not possible to subclass built-in types, so you can't just subclass, say, lists in order to add a single useful method to them. The UserList module provides a class that supports all of the methods of lists and that can be subclassed further, but there's lots of C code that expects a regular Python list and won't accept a UserList instance.

Python 2.2 修正了此问题,并在此过程中添加了一些令人激动的新功能。 简明概述如下:

  • 你可以继承内置类型,例如列表和整数,并且你的子类应该在任何需要原始类型的地方正常工作。这使得 Python 的面向对象编程更加灵活和强大。

  • 现在,除了之前版本的 Python 中可用的实例方法外,还可以定义静态方法和类方法。这使得你可以更灵活地组织类的行为。

  • It's also possible to automatically call methods on accessing or setting an instance attribute by using a new mechanism called properties. Many uses of __getattr__() can be rewritten to use properties instead, making the resulting code simpler and faster. As a small side benefit, attributes can now have docstrings, too.

  • 可以使用 __slots__ 限制实例的合法属性列表,从而防止拼写错误,并且在未来的 Python 版本中可能进行更多的优化。

一些用户对这些变化表示担忧。确实,他们说,新功能很棒,可以实现以前版本的 Python 无法做到的各种技巧,但它们也使语言变得更加复杂。一些人表示,他们一直推荐 Python 是因为它的简单性,现在感觉这种简单性正在丧失。

个人而言,我认为没有必要担心。许多新功能相当深奥,你可以编写大量 Python 代码而不需要了解它们。编写一个简单的类并不比以前更难,因此除非确实需要,否则你不必费心去学习或教授这些新功能。一些以前只有在 C 语言中才能实现的非常复杂的任务,现在可以用纯 Python 实现,在我看来,这一切都更好了。

我不会尝试涵盖所有为了使新功能生效而需要的每一个边缘情况和小改动。相反,本节将只勾勒出大致的轮廓。有关 Python 2.2 新对象模型的更多信息,请参见 相关链接 的“相关链接”部分。

旧式类和新式类

首先,你应该知道 Python 2.2 实际上有两种类型的类:经典类(或旧式类)和新式类。旧式类模型与早期版本的 Python 中的类模型完全相同。本节描述的所有新功能仅适用于新式类。这种分歧并不是永久的;最终,旧式类将被淘汰,可能在 Python 3.0 中被移除。

那么如何定义一个新式类呢?你可以通过继承一个现有的新式类来实现。大多数 Python 内置类型,如整数、列表、字典,甚至文件,现在都是新式类。此外,还添加了一个名为 object 的新式类,它是所有内置类型的基类,因此如果没有合适的内置类型,你可以直接继承 object 类:

class C(object):
    def __init__ (self):
        ...
    ...

This means that class statements that don't have any base classes are always classic classes in Python 2.2. (Actually you can also change this by setting a module-level variable named __metaclass__ --- see PEP 253 for the details --- but it's easier to just subclass object.)

内置类型的类型对象在 Python 2.2 中作为内置对象提供,使用了一种巧妙的技巧命名。Python 一直有名为 int()float()str() 的内置函数。在 Python 2.2 中,它们不再是函数,而是作为被调用时表现为工厂的类型对象。

>>> int
<type 'int'>
>>> int('123')
123

To make the set of types complete, new type objects such as dict() and file() have been added. Here's a more interesting example, adding a lock() method to file objects:

class LockableFile(file):
    def lock (self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)

The now-obsolete posixfile module contained a class that emulated all of a file object's methods and also added a lock() method, but this class couldn't be passed to internal functions that expected a built-in file, something which is possible with our new LockableFile.

描述器

In previous versions of Python, there was no consistent way to discover what attributes and methods were supported by an object. There were some informal conventions, such as defining __members__ and __methods__ attributes that were lists of names, but often the author of an extension type or a class wouldn't bother to define them. You could fall back on inspecting the __dict__ of an object, but when class inheritance or an arbitrary __getattr__() hook were in use this could still be inaccurate.

新类模型的一个核心理念是正式化了使用描述符来描述对象属性的 API。描述符指定属性的值,说明它是方法还是字段。通过描述符 API,静态方法和类方法成为可能,以及其他更复杂的构造。

属性描述符是存在于类对象内部的对象,它们自身具有一些属性。描述符协议由三个主要方法组成:

  • __name__ 是属性的名称。

  • __doc__ 是属性的文档字符串。

  • __get__(object) 是一个从 object 中提取属性值的方法。

  • __set__(object, value)object 上的属性设为 value

  • __delete__(object, value) 将删除 objectvalue 属性。

例如,当你写下 obj.x,Python 实际要执行的步骤是:

descriptor = obj.__class__.x
descriptor.__get__(obj)

For methods, descriptor.__get__() returns a temporary object that's callable, and wraps up the instance and the method to be called on it. This is also why static methods and class methods are now possible; they have descriptors that wrap up just the method, or the method and the class. As a brief explanation of these new kinds of methods, static methods aren't passed the instance, and therefore resemble regular functions. Class methods are passed the class of the object, but not the object itself. Static and class methods are defined like this:

class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)

The staticmethod() function takes the function f(), and returns it wrapped up in a descriptor so it can be stored in the class object. You might expect there to be special syntax for creating such methods (def static f, defstatic f(), or something like that) but no such syntax has been defined yet; that's been left for future versions of Python.

更多的新功能,如 __slots__ 和属性,也作为新类型的描述符实现。编写一个实现新功能的描述符类并不困难。例如,可以编写一个描述符类,使其能够为方法编写类似 Eiffel 风格的前置条件和后置条件。使用该功能的类可能定义如下:

from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...

    f = eiffelmethod(f, pre_f, post_f)

Note that a person using the new eiffelmethod() doesn't have to understand anything about descriptors. This is why I think the new features don't increase the basic complexity of the language. There will be a few wizards who need to know about it in order to write eiffelmethod() or the ZODB or whatever, but most users will just write code on top of the resulting libraries and ignore the implementation details.

多重继承:钻石规则

通过改变名称解析规则,多重继承也变得更加有用。 请看下面这组类(图表摘自 PEP 253 ,作者 Guido van Rossum):

      class A:
        ^ ^  def save(self): ...
       /   \
      /     \
     /       \
    /         \
class B     class C:
    ^         ^  def save(self): ...
     \       /
      \     /
       \   /
        \ /
      class D

The lookup rule for classic classes is simple but not very smart; the base classes are searched depth-first, going from left to right. A reference to D.save() will search the classes D, B, and then A, where save() would be found and returned. C.save() would never be found at all. This is bad, because if C's save() method is saving some internal state specific to C, not calling it will result in that state never getting saved.

新式类遵循一种不同的算法,虽然解释起来有点复杂,但在这种情况下能做正确的事情。(请注意,Python 2.3 改变了这个算法,在大多数情况下会产生相同的结果,但对于非常复杂的继承图会产生更有用的结果。)

  1. List all the base classes, following the classic lookup rule and include a class multiple times if it's visited repeatedly. In the above example, the list of visited classes is [D, B, A, C, A].

  2. Scan the list for duplicated classes. If any are found, remove all but one occurrence, leaving the last one in the list. In the above example, the list becomes [D, B, C, A] after dropping duplicates.

Following this rule, referring to D.save() will return C.save(), which is the behaviour we're after. This lookup rule is the same as the one followed by Common Lisp. A new built-in function, super(), provides a way to get at a class's superclasses without having to reimplement Python's algorithm. The most commonly used form will be super(class, obj), which returns a bound superclass object (not the actual class object). This form will be used in methods to call a method in the superclass; for example, D's save() method would look like this:

class D (B,C):
    def save (self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...

super() 在以 super(class)super(class1, class2) 形式调用时也可以返回未绑定的超类对象,但这可能并不常用。

属性访问

A fair number of sophisticated Python classes define hooks for attribute access using __getattr__(); most commonly this is done for convenience, to make code more readable by automatically mapping an attribute access such as obj.parent into a method call such as obj.get_parent. Python 2.2 adds some new ways of controlling attribute access.

首先,新式类仍然支持 __getattr__(attr_name),关于它的任何内容都没有改变。 和以前一样,当试图访问 obj.foo 时,如果在实例的字典中找不到名为 foo 的属性,就会调用它。

New-style classes also support a new method, __getattribute__(attr_name). The difference between the two methods is that __getattribute__() is always called whenever any attribute is accessed, while the old __getattr__() is only called if foo isn't found in the instance's dictionary.

However, Python 2.2's support for properties will often be a simpler way to trap attribute references. Writing a __getattr__() method is complicated because to avoid recursion you can't use regular attribute accesses inside them, and instead have to mess around with the contents of __dict__. __getattr__() methods also end up being called by Python when it checks for other methods such as __repr__() or __coerce__(), and so have to be written with this in mind. Finally, calling a function on every attribute access results in a sizable performance loss.

property is a new built-in type that packages up three functions that get, set, or delete an attribute, and a docstring. For example, if you want to define a size attribute that's computed, but also settable, you could write:

class C(object):
    def get_size (self):
        result = ... computation ...
        return result
    def set_size (self, size):
        ... compute something based on the size
        and set internal state appropriately ...

    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")

That is certainly clearer and easier to write than a pair of __getattr__()/__setattr__() methods that check for the size attribute and handle it specially while retrieving all other attributes from the instance's __dict__. Accesses to size are also the only ones which have to perform the work of calling a function, so references to other attributes run at their usual speed.

最后,可以使用新的类属性 __slots__ 来限制对象上可以引用的属性列表。Python 对象通常非常动态,可以随时通过简单地 obj.new_attr=1 来定义一个新属性。新式类可以定义一个名为 __slots__ 的类属性,以将合法属性限制为特定的一组名称。一个例子可以更清楚地说明这一点:

>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'

注意,当尝试为未列在 __slots__ 中的属性赋值时,会引发 AttributeError

PEP 234: 迭代器

Python 2.2 的另一个重要新增功能是在 C 和 Python 两个层面上引入了迭代接口。对象可以定义如何被调用者循环遍历。

In Python versions up to 2.1, the usual way to make for item in obj work is to define a __getitem__() method that looks something like this:

def __getitem__(self, index):
    return <next item>

__getitem__() is more properly used to define an indexing operation on an object so that you can write obj[5] to retrieve the sixth element. It's a bit misleading when you're using this only to support for loops. Consider some file-like object that wants to be looped over; the index parameter is essentially meaningless, as the class probably assumes that a series of __getitem__() calls will be made with index incrementing by one each time. In other words, the presence of the __getitem__() method doesn't mean that using file[5] to randomly access the sixth element will work, though it really should.

In Python 2.2, iteration can be implemented separately, and __getitem__() methods can be limited to classes that really do support random access. The basic idea of iterators is simple. A new built-in function, iter(obj) or iter(C, sentinel), is used to get an iterator. iter(obj) returns an iterator for the object obj, while iter(C, sentinel) returns an iterator that will invoke the callable object C until it returns sentinel to signal that the iterator is done.

Python classes can define an __iter__() method, which should create and return a new iterator for the object; if the object is its own iterator, this method can just return self. In particular, iterators will usually be their own iterators. Extension types implemented in C can implement a tp_iter function in order to return an iterator, and extension types that want to behave as iterators can define a tp_iternext function.

总结一下,迭代器实际上做什么?它们有一个必需的方法 next(),该方法不接受任何参数并返回下一个值。当没有更多的值可以返回时,调用 next() 应该引发 StopIteration 异常。以下是一个简单的例子来说明迭代器的工作原理:

>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>

In 2.2, Python's for statement no longer expects a sequence; it expects something for which iter() will return an iterator. For backward compatibility and convenience, an iterator is automatically constructed for sequences that don't implement __iter__() or a tp_iter slot, so for i in [1,2,3] will still work. Wherever the Python interpreter loops over a sequence, it's been changed to use the iterator protocol. This means you can do things like this:

>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)

迭代器支持已被添加到Python的一些基本类型中。对字典调用 iter() 会返回一个迭代器,该迭代器遍历字典的键,如下所示:

>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10

That's just the default behaviour. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator. In a minor related change, the in operator now works on dictionaries, so key in dict is now equivalent to dict.has_key(key).

文件也提供了一个迭代器,它会调用 readline() 方法,直到文件中没有更多的行。这意味着你现在可以使用类似这样的代码来读取文件的每一行:

for line in file:
    # do something for each line
    ...

请注意,你只能在迭代器中向前移动;没有办法获取前一个元素、重置迭代器或复制迭代器。一个迭代器对象可以提供这些额外的功能,但迭代器协议只要求有一个 next() 方法。

也參考

PEP 234 - 迭代器

由 Ka-Ping Yee 和 GvR 撰写;由 Python Labs 小组(主要由 GvR 和 Tim Peters)实现。

PEP 255: 简单的生成器

生成器是另一个新增特性,它是与迭代器的引入相互关联的。

你一定熟悉在Python或C语言中函数调用的工作方式。当你调用一个函数时,它会获得一个私有命名空间,在这个命名空间中创建其局部变量。当函数执行到 return 语句时,这些局部变量会被销毁,并将结果值返回给调用者。稍后对同一个函数的调用将获得一套全新的局部变量。但,如果局部变量在函数退出时不被丢弃呢?如果你可以在函数停止的地方稍后恢复执行呢?这就是生成器所提供的功能;它们可以被视为可恢复的函数。

这里是一个生成器函数的最简示例:

def generate_ints(N):
    for i in range(N):
        yield i

一个新的关键字 yield 被引入用于生成器。任何包含 yield 语句的函数都是生成器函数;这由Python的字节码编译器检测到,并因此对函数进行特殊编译。由于引入了一个新的关键字,生成器必须通过在模块的源代码顶部附近包含一条 from __future__ import generators 语句来显式启用。在Python 2.3中,这条语句将变得不再必要。

当你调用一个生成器函数时,它不会返回单个值;相反,它返回一个支持迭代器协议的生成器对象。在执行 yield 语句时,生成器输出 i 的值,类似于 return 语句。yieldreturn 语句之间的重大区别在于,当到达 yield 时,生成器的执行状态被挂起,并且局部变量被保留。在下一次调用生成器的 next() 方法时,函数将立即在 yield 语句之后恢复执行。(由于复杂的原因,yield 语句不允许在 try ... finally 语句的 try 块中使用;请阅读 PEP 255 以获得关于 yield 和异常交互的详细解释。)

这里是 generate_ints() 生成器的用法示例:

>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration

你可以等价地写成 for i in generate_ints(5)a,b,c = generate_ints(3)

在生成器函数内部, return 语句只能不带值使用,并表示值的生成过程结束;之后,生成器不能再返回任何值。在生成器函数内部,带值的 return,例如 return 5,是语法错误。生成器结果的结束也可以通过手动引发 StopIteration 异常来指示,或者只是让执行流自然地从函数底部流出。

你可以通过编写自己的类并将生成器的所有局部变量存储为实例变量,手动实现生成器的效果。例如,返回一个整数列表可以通过将 self.count 设置为0,并让 next() 方法递增 self.count 并返回它。然而,对于一个中等复杂的生成器,编写一个相应的类将会更加混乱。Lib/test/test_generators.py 包含了一些更有趣的例子。其中最简单的一个使用生成器递归实现了树的中序遍历:

# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

Lib/test/test_generators.py 中还有另外两个例子,它们分别解决了N皇后问题(在$NxN$的棋盘上放置$N$个皇后,使得没有任何皇后威胁到其他皇后)和骑士巡游问题(在$NxN$的棋盘上,骑士访问每一个方格且不重复访问任何方格的路径)。

The idea of generators comes from other programming languages, especially Icon (https://www.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like:

sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)

In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.

Python 并没有像 Icon 那样将生成器作为核心概念来采纳。生成器被认为是 Python 核心语言的一部分,但学习或使用它们并不是强制性的;如果它们不能解决你的问题,可以完全忽略它们。与 Icon 相比,Python 的一个新颖特性是生成器的状态表示为一个具体对象(迭代器),该对象可以传递给其他函数或存储在数据结构中。

也參考

PEP 255 - 简单生成器

由 Neil Schemenauer, Tim Peters, Magnus Lie Hetland 撰写。 主要由 Neil Schemenauer 和 Tim Peters 实现,并包含来自 Python Labs 团队的修正。

PEP 237: 统一长整数和整数

In recent versions, the distinction between regular integers, which are 32-bit values on most machines, and long integers, which can be of arbitrary size, was becoming an annoyance. For example, on platforms that support files larger than 2**32 bytes, the tell() method of file objects has to return a long integer. However, there were various bits of Python that expected plain integers and would raise an error if a long integer was provided instead. For example, in Python 1.5, only regular integers could be used as a slice index, and 'abc'[1L:] would raise a TypeError exception with the message 'slice index must be int'.

Python 2.2 将根据需要将数值从短整数转换为长整数。'L' 后缀不再需要用于表示长整数字面量,因为现在编译器会自动选择适当的类型。(在未来的 2.x 版本的 Python 中,使用 'L' 后缀将被不鼓励,并在 Python 2.4 中触发警告,可能在 Python 3.0 中被移除。)许多以前会引发 OverflowError 的操作现在会返回一个长整数作为结果。例如:

>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L

在大多数情况下,整数和长整数现在将被视为相同。你仍然可以使用内置的 type() 函数区分它们,但这很少需要。

也參考

PEP 237 - 统一长整数和整数

由 Moshe Zadka 和 Guido van Rossum 撰写 ; 大部分由 Guido van Rossum 实现。

PEP 238:修改除法运算符

Python 2.2中最具争议的变化预示着修复一个自Python诞生以来的旧设计缺陷的努力的开始。目前,Python的除法操作符 / 在接收两个整数参数时表现得像C语言的除法操作符:它返回一个被截断为整数的结果。例如,3/2 是1,而不是1.5,(-1)/2 是-1,而不是-0.5。这意味着除法的结果可能会根据两个操作数的类型而意外变化,并且由于Python是动态类型的,确定操作数的可能类型可能会很困难。

(争议在于这是否*真的*算是一个设计缺陷,以及是否值得为了修复它而破坏现有代码。这在python-dev上引发了无休止的讨论,并在2001年7月爆发成一场在 comp.lang.python 的充满讽刺性言辞的风暴。我不会在这里为任何一方辩护,只会描述在2.2中实现的内容。请阅读 PEP 238 以获取争论和反驳的摘要。)

由于这一变化可能会破坏现有代码,因此它正在非常逐步地引入。Python 2.2 开始了这一过渡,但直到 Python 3.0 这一转换才会完全完成。

首先,我将借用一些来自 PEP 238 的术语。“真除法”是大多数非程序员所熟悉的除法:3/2是1.5,1/4是0.25,等等。“地板除法”是Python的 / 操作符在给定整数操作数时当前执行的操作;其结果是真除法返回值的地板值。“经典除法”是当前 / 操作符的混合行为;当操作数是整数时,它返回地板除法的结果,而当其中一个操作数是浮点数时,它返回真除法的结果。

Python 2.2 引入了以下变化:

  • 一个新的操作符,// 是地板除法操作符。 (是的,我们知道它看起来像 C++ 的注释符号。) // 始终 执行地板除法,无论其操作数的类型是什么,因此 1 // 2 是 0,1.0 // 2.0 也是0.0。

    // 操作符在Python 2.2中始终可用;你不需要通过 __future__ 语句来启用它。

  • 通过在模块中包含 from __future__ import division/``操作符将被更改为返回真除法的结果,因此 ``1/2 是0.5。如果没有这条 __future__ 语句,/ 仍然表示经典除法。/ 的默认含义在Python 3.0之前不会改变。

  • Classes can define methods called __truediv__() and __floordiv__() to overload the two division operators. At the C level, there are also slots in the PyNumberMethods structure so extension types can define the two operators.

  • Python 2.2 支持一些命令行参数,用于测试代码是否能在除法语义改变的情况下正常工作。运行 Python 并使用 -Q warn 选项时,当对两个整数应用除法时会发出警告。你可以利用这个功能找到受影响的代码并进行修复。默认情况下,Python 2.2 会执行经典除法而不会发出警告;在 Python 2.3 中,警告将默认开启。

也參考

PEP 238:改变除法运算符

由 Moshe Zadka 和 Guido van Rossum 撰写 ; 由 Guido van Rossum 实现。

Unicode 的改变

Python的Unicode支持在2.2版本中有所增强。Unicode字符串通常以UCS-2形式存储,即16位无符号整数。通过向配置脚本提供 --enable-unicode=ucs4 选项,Python 2.2也可以编译为使用UCS-4(32位无符号整数)作为其内部编码。(也可以指定 --disable-unicode 选项来完全禁用Unicode支持。)

When built to use UCS-4 (a "wide Python"), the interpreter can natively handle Unicode characters from U+000000 to U+110000, so the range of legal values for the unichr() function is expanded accordingly. Using an interpreter compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still cause unichr() to raise a ValueError exception. This is all described in PEP 261, "Support for 'wide' Unicode characters"; consult it for further details.

Another change is simpler to explain. Since their introduction, Unicode strings have supported an encode() method to convert the string to a selected encoding such as UTF-8 or Latin-1. A symmetric decode([*encoding*]) method has been added to 8-bit strings (though not to Unicode strings) in 2.2. decode() assumes that the string is in the specified encoding and decodes it, returning whatever is returned by the codec.

利用这一新特性,编解码器被添加用于与Unicode不直接相关的任务。例如,已经添加了用于uu编码、MIME的base64编码以及使用 zlib 模块进行压缩的编解码器:

>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'

To convert a class instance to Unicode, a __unicode__() method can be defined by a class, analogous to __str__().

encode(), decode(), and __unicode__() were implemented by Marc-André Lemburg. The changes to support using UCS-4 internally were implemented by Fredrik Lundh and Martin von Löwis.

也參考

PEP 261 - 对 '宽' Unicode 字符的支持

由 Paul Prescod 编写。

PEP 227: 嵌套的作用域

在Python 2.1中,静态嵌套作用域作为一个可选特性被添加,需要通过 from __future__ import nested_scopes 指令来启用。在2.2版本中,嵌套作用域不再需要特别启用,现在总是存在。本节的其余部分是从我的《Python 2.1的新特性》文档中复制的嵌套作用域描述;如果你在2.1发布时已经阅读过,可以跳过本节的其余部分。

Python 2.1 中的最大改变是 Python 的作用域规则,在Python 2.2中得到完善。 在 Python 2.0 中,任意给定的时刻至多使用三个命名空间来查找变量名称:局部、模块和内置命名空间。 这往往会导致令人吃惊的结果因为它与人们直觉上的预期不相匹配。 例如,一个嵌套的递归函数将不起作用:

def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...

The function g() will always raise a NameError exception, because the binding of the name g isn't in either its local namespace or in the module-level namespace. This isn't much of a problem in practice (how often do you recursively define interior functions like this?), but this also made using the lambda expression clumsier, and this was a problem in practice. In code which uses lambda you can often find local variables being copied by passing them as the default values of arguments.

def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L

结果将会严重损害以高度函数式风格编写的 Python 代码的可读性。

Python 2.2 最显著的改变是增加了静态作用域这一语言特征来解决此问题。 作为它的第一项影响,在上述示例中的 name=name 默认参数现在将不再必要。 简单地说,当一个函数内部的给定变量名没有被赋值时(通过赋值语句,或者 def, classimport 语句),对该变量的引用将在外层作用域的局部命名空间中查找。 对于该规则的更详细解释,以及具体实现的分析,请参阅相应的 PEP。

对于同时在模块层级和包含下层函数定义的函数内部局部变量使用了相同变量名的代码来说这项改变可能会导致一些兼容性问题。 不过这看来不太可能发生,因为阅读这样的代码本来就会相当令人困惑。

此项改变的一个附带影响是在特定条件下函数作用域内部 from module import *exec 语句将不允许使用。 Python 参考手册已经写明 from module import * 仅在模块最高层级上是可用的,但此前 CPython 解释器从未强制实施此规则。 作为嵌套作用域具体实现的一部分,将 Python 源码转为字节码的编译器会生成不同的代码来访问某个包含作用域内的变量。 from module import *exec 会使得编译器无法正确执行,因为它们会向局部命名空间添加在编译时还不存在的名称。 为此,如果一个函数包含带有自由变量的函数定义或 lambda 表达式,编译器将通过引发 SyntaxError 异常来提示。

为了使前面的解释更清楚,下面是一个例子:

x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x

包含 exec 语句的第 4 行有语法错误,因为 exec 将定义一个名为 x 的新局部变量,它的值应当被 g() 访问。

这应该不会是太大的限制,因为 exec 在多数 Python 代码中都极少被使用(而当它被使用时,往往也是个存在糟糕设计的信号)。

也參考

PEP 227 - 静态嵌套作用域

由 Jeremy Hylton 撰写并实现。

新增和改进的模块

  • The xmlrpclib module was contributed to the standard library by Fredrik Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple remote procedure call protocol built on top of HTTP and XML. For example, the following snippet retrieves a list of RSS channels from the O'Reilly Network, and then lists the recent headlines for one channel:

    import xmlrpclib
    s = xmlrpclib.Server(
          'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
    channels = s.meerkat.getChannels()
    # channels is a list of dictionaries, like this:
    # [{'id': 4, 'title': 'Freshmeat Daily News'}
    #  {'id': 190, 'title': '32Bits Online'},
    #  {'id': 4549, 'title': '3DGamers'}, ... ]
    
    # Get the items for one channel
    items = s.meerkat.getItems( {'channel': 4} )
    
    # 'items' is another list of dictionaries, like this:
    # [{'link': 'http://freshmeat.net/releases/52719/',
    #   'description': 'A utility which converts HTML to XSL FO.',
    #   'title': 'html2fo 0.3 (Default)'}, ... ]
    

    The SimpleXMLRPCServer module makes it easy to create straightforward XML-RPC servers. See http://xmlrpc.scripting.com/ for more information about XML-RPC.

  • 新的 hmac 模块实现了由 RFC 2104 描述的HMAC算法。(由Gerhard Häring贡献。)

  • Several functions that originally returned lengthy tuples now return pseudo-sequences that still behave like tuples but also have mnemonic attributes such as memberst_mtime or tm_year. The enhanced functions include stat(), fstat(), statvfs(), and fstatvfs() in the os module, and localtime(), gmtime(), and strptime() in the time module.

    例如,使用旧的元组来获取文件的大小时,你可能会写成 file_size = os.stat(filename)[stat.ST_SIZE] ,但现在可以更清晰地写成 file_size = os.stat(filename).st_size

    此特性的初始补丁由 Nick Mathewson 贡献。

  • Python 的分析器进行了大量的重构,并纠正了其输出中的各种错误。(由 Fred L. Drake, Jr. 和 Tim Peters 贡献。)

  • socket 模块可以编译为支持IPv6;为Python的配置脚本指定 --enable-ipv6 选项。(由Jun-ichiro "itojun" Hagino贡献。)

  • Two new format characters were added to the struct module for 64-bit integers on platforms that support the C long long type. q is for a signed 64-bit integer, and Q is for an unsigned one. The value is returned in Python's long integer type. (Contributed by Tim Peters.)

  • 在解释器的交互模式下,有一个新的内置函数 help(),它使用在Python 2.1 中引入的 pydoc 模块来提供交互式帮助。 help(object) 显示关于*object*的任何可用帮助文本。不带参数调用 help() 会进入一个在线帮助工具,在那里你可以输入函数、类或模块的名称来阅读它们的帮助文本。(由Guido van Rossum贡献,使用Ka-Ping Yee的 pydoc 模块。)

  • Various bugfixes and performance improvements have been made to the SRE engine underlying the re module. For example, the re.sub() and re.split() functions have been rewritten in C. Another contributed patch speeds up certain Unicode character ranges by a factor of two, and a new finditer() method that returns an iterator over all the non-overlapping matches in a given string. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von Löwis.)

  • smtplib 模块现在支持 RFC 2487:"Secure SMTP over TLS",因此现在可以加密Python程序与接收消息的邮件传输代理之间的SMTP流量。smtplib 还支持SMTP身份验证。(由Gerhard Häring贡献。)

  • imaplib 模块由 Piers Lauder 维护,支持几个新扩展: RFC 2342 中定义的 NAMESPACE 扩展、SORT、GETACL和SETACL。(由 Anthony Baxter 和 Michel Pelletier 贡献。)

  • The rfc822 module's parsing of email addresses is now compliant with RFC 2822, an update to RFC 822. (The module's name is not going to be changed to rfc2822.) A new package, email, has also been added for parsing and generating e-mail messages. (Contributed by Barry Warsaw, and arising out of his work on Mailman.)

  • The difflib module now contains a new Differ class for producing human-readable lists of changes (a "delta") between two sequences of lines of text. There are also two generator functions, ndiff() and restore(), which respectively return a delta from two sequences, or one of the original sequences from a delta. (Grunt work contributed by David Goodger, from ndiff.py code by Tim Peters who then did the generatorization.)

  • New constants ascii_letters, ascii_lowercase, and ascii_uppercase were added to the string module. There were several modules in the standard library that used string.letters to mean the ranges A-Za-z, but that assumption is incorrect when locales are in use, because string.letters varies depending on the set of legal characters defined by the current locale. The buggy modules have all been fixed to use ascii_letters instead. (Reported by an unknown person; fixed by Fred L. Drake, Jr.)

  • The mimetypes module now makes it easier to use alternative MIME-type databases by the addition of a MimeTypes class, which takes a list of filenames to be parsed. (Contributed by Fred L. Drake, Jr.)

  • A Timer class was added to the threading module that allows scheduling an activity to happen at some future time. (Contributed by Itamar Shtull-Trauring.)

解释器的改变和修正

有些变化只会影响那些在 C 级别处理 Python 解释器的人,因为他们正在编写 Python 扩展模块、嵌入解释器或仅仅是在修改解释器本身。如果你只编写 Python 代码,这里描述的变化对你几乎没有影响。

  • 性能分析和追踪函数现在可以用 C 语言来实现,相比基于 Python 的函数能够显著提高运行速度并能够减少性能分析和追踪的资源开销。 Python 开发环境的编写者对此将会很感兴趣。 Python 的 API 增加了两个新的 C 函数,PyEval_SetProfile()PyEval_SetTrace()。 现有的 sys.setprofile()sys.settrace() 函数仍然存在,并已简单地更改为使用新的 C 层级接口。 (由 Fred L. Drake, Jr. 贡献。)

  • Another low-level API, primarily of interest to implementors of Python debuggers and development tools, was added. PyInterpreterState_Head() and PyInterpreterState_Next() let a caller walk through all the existing interpreter objects; PyInterpreterState_ThreadHead() and PyThreadState_Next() allow looping over all the thread states for a given interpreter. (Contributed by David Beazley.)

  • 垃圾收集器的 C 级接口已经发生了变化,使得编写支持垃圾收集的扩展类型和调试函数误用变得更容易。各种函数的语义略有不同,因此需要重命名一系列函数。使用旧 API 的扩展仍然可以编译,但不会参与垃圾收集,因此应优先考虑将它们更新为 2.2 版本。

    要将一个扩展模块升级至新 API,请执行下列步骤:

  • Py_TPFLAGS_GC() 重命名为 PyTPFLAGS_HAVE_GC()

  • 使用 PyObject_GC_New()PyObject_GC_NewVar() 来分配

    对象,并使用 PyObject_GC_Del() 来释放它们。

  • PyObject_GC_Init() 重命名为 PyObject_GC_Track()

    PyObject_GC_Fini() 重命名为 PyObject_GC_UnTrack()

  • PyGC_HEAD_SIZE() 从对象大小计算中移除。

  • 移除对 PyObject_AS_GC()PyObject_FROM_GC() 的调用。

  • PyArg_ParseTuple() 添加了一个新的 et 格式序列;et 接受一个形参和一个编码格式名称,如果该形参值是一个 Unicode 字符串则将其转换为给定的编码格式,或者如果它是一个 8 比特位字符串则让其保持原样,即假定它已经使用了适当的编码格式。 这不同于 es 格式字符,它假定该 8 比特位字符串是使用 Python 默认的 ASCII 编码格式并将其转换为指定的新编码格式。 (由 M.-A. Lemburg 贡献,用于下一节所描述的 Windows 上的 MBCS 支持。)

  • A different argument parsing function, PyArg_UnpackTuple(), has been added that's simpler and presumably faster. Instead of specifying a format string, the caller simply gives the minimum and maximum number of arguments expected, and a set of pointers to PyObject* variables that will be filled in with argument values.

  • Two new flags METH_NOARGS and METH_O are available in method definition tables to simplify implementation of methods with no arguments or a single untyped argument. Calling such methods is more efficient than calling a corresponding method that uses METH_VARARGS. Also, the old METH_OLDARGS style of writing C methods is now officially deprecated.

  • Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf() were added to provide cross-platform implementations for the relatively new snprintf() and vsnprintf() C lib APIs. In contrast to the standard sprintf() and vsprintf() functions, the Python versions check the bounds of the buffer used to protect against buffer overruns. (Contributed by M.-A. Lemburg.)

  • _PyTuple_Resize() 函数去掉了一个未使用的形参,因此现在它接受 2 个形参而不是 3 个。 第三个参数从未被使用,在将代码从较早的版本移植到 Python 2.2 时可以简单地丢弃它。

其他的改变和修正

像往常一样,源代码树中散布着许多其他改进和错误修复。通过搜索 CVS 更改日志,可以发现 Python 2.1 到 2.2 之间应用了 527 个补丁并修复了 683 个错误;2.2.1 应用了 139 个补丁并修复了 143 个错误;2.2.2 应用了 106 个补丁并修复了 82 个错误。这些数字可能是低估的。

一些较为重要的改变:

  • 适用于 MacOS 的 Python 端口代码现在保存在主 Python CVS 树中,由 Jack Jansen 维护,并且为了支持 MacOS X,进行了许多更改。

    最重要的变化是能够将 Python 作为框架来进行构建,这可以通过在编译 Python 时向配置脚本提供 --enable-framework 选项来启用。 根据 Jack Jansen 的说法,“这会将一个独立的 Python 安装版加上 OS X 框架‘粘合起来’放到 /Library/Frameworks/Python.framework 中(或者其他选定的位置)。 就目前而言这样做并没有什么直接的额外好处(实际上,这样做还存在必须更改 PATH 才能找到Python 的坏处),但它是创建完整的 Python 应用程序、移植 MacPython IDE、并可能使用 Python 作为标准 OSA 脚本语言及其他更多功能的基础。”

    作为 MacOS API 如 windowing, QuickTime, scripting 等的接口的许多 MacPython 工具箱模块已被移植到 OS X,但它们在 setup.py 中被注释掉了。 希望尝试这些模块的人可以手动取消注释它们。

  • 现在将关键字参数传给不接受它们的内置函数会导致引发 TypeError 异常,并附带消息 "function takes no keyword arguments"。

  • 在 Python 2.1 中作为扩展模块加入的弱引用现在已成为核心组成部分,因为它们被用于新式类的实现。 为此 ReferenceError 异常也已从 weakref 模块移出成为一个内置异常。

  • 由 Tim Peters 编写的新脚本 Tools/scripts/cleanfuture.py 可自动从 Python 源代码移除过时的 __future__ 语句。

  • 向内置函数 compile() 添加了一个额外的 flags 参数,以便现在 __future__ 语句的行为能在模拟的 shell,例如由 IDLE 和其他开发环境所提供的此类工具中被正确地观察。 此特性的描述参见 PEP 264。 (由 Michael Hudson 贡献。)

  • Python 1.6 引入的新许可证与 GPL 不兼容。通过对 2.2 许可证进行一些小的文本修改,这个问题得以解决,因此现在可以合法地将 Python 嵌入到 GPL 授权的程序中。请注意,Python 本身并不是在 GPL 授权下,而是采用一个与 BSD 许可证本质上等效的许可证,这与之前的情况一样。这些许可证更改也应用到了 Python 2.0.1 和 2.1.1 版本中。

  • 在 Windows 上,当 Python 遇到一个 Unicode 文件名时,现在会将其转换为 MBCS 编码的字符串,这种编码由 Microsoft 文件 API 使用。由于文件 API 明确使用 MBCS 编码,Python 默认选择 ASCII 作为编码方式显得很不方便。在 Unix 上,如果 locale.nl_langinfo(CODESET) 可用,Python 将使用本地字符集。(Windows 支持由 Mark Hammond 提供,Marc-André Lemburg 提供协助。Unix 支持由 Martin von Löwis 添加。)

  • 大文件支持目前已在 Windows 上启用。 (由 Tim Peters 贡献。)

  • Tools/scripts/ftpmirror.py 脚本现在会解析 .netrc 文件,如果存在的话。 (由 Mike Romberg 贡献。)

  • Some features of the object returned by the xrange() function are now deprecated, and trigger warnings when they're accessed; they'll disappear in Python 2.3. xrange objects tried to pretend they were full sequence types by supporting slicing, sequence multiplication, and the in operator, but these features were rarely used and therefore buggy. The tolist() method and the start, stop, and step attributes are also being deprecated. At the C level, the fourth argument to the PyRange_New() function, repeat, has also been deprecated.

  • 字典实现中有一堆补丁,主要是为了修复潜在的核心转储问题,这些问题发生在字典中包含的对象悄悄改变其哈希值,或者在它们所包含的字典中发生突变时。那段时间,python-dev 邮件列表进入了一个微妙的节奏:Michael Hudson 发现一个导致核心转储的案例,Tim Peters 修复这个 bug,接着 Michael 又发现另一个案例,如此反复循环。

  • 在 Windows 上,Python 现在可以使用 Borland C 编译,这要归功于 Stephen Hansen 提供的多个补丁,尽管结果还不完全可用。(但这*确实*是一个进步……)

  • 另一个 Windows 改进:Wise Solutions 慷慨地向 PythonLabs 提供了他们的 InstallerMaster 8.1 系统。早期的 PythonLabs Windows 安装程序使用的是 Wise 5.0a,已经开始显得过时。(由 Tim Peters 打包。)

  • 在 Windows 上现在将会导入以 .pyw 结尾的文件。 .pyw 是 Windows 专属的,用来指明一个脚本需要使用 PYTHONW.EXE 而不是 PYTHON.EXE 来运行以避免弹出 DOS 控制台来显示输出。 该补丁使得导入这样的脚本成为可能,让它们也可以作为模块来使用。 (由 David Bolen 实现。)

  • 在 Python 会使用 C dlopen() 函数来加载扩展模块的平台上,现在可以使用 sys.getdlopenflags()sys.setdlopenflags() 等函数来设置 dlopen() 所使用的旗标。 (由 Bram Stolk 贡献。)

  • The pow() built-in function no longer supports 3 arguments when floating-point numbers are supplied. pow(x, y, z) returns (x**y) % z, but this is never useful for floating point numbers, and the final result varies unpredictably depending on the platform. A call such as pow(2.0, 8.0, 7.0) will now raise a TypeError exception.

致谢

作者感谢以下人员为本文的各种草案提供建议,更正和帮助: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg, Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.