What's New in Python 2.2

Author:

A.M. Kuchling

Introduction

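This article explains the new features in Python 2.2.2, released on October 14, 2002. Python 2.2.2 is a bugfix release of Python 2.2, originally released on December 21, 2001.

Python 2.2 can be thought of as the "cleanup release". There are some features such as generators and iterators that are completely new, but most of the changes, significant and far-reaching though they may be, are aimed at cleaning up irregularities and dark corners of the language design.

This article doesn't attempt to provide a complete specification of the new features, but instead provides a convenient overview. For full details, you should refer to the documentation for Python 2.2, such as the Python Library Reference and the Python Reference Manual. If you want to understand the complete implementation and design rationale for a change, refer to the PEP for a particular new feature.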

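PEPs 252 and 253: Type and Class Changes

The largest and most far-reaching changes in Python 2.2 are to Python's model of objects and classes. The changes should be backward compatible, so it's likely that your code will continue to run unchanged, but the changes provide some amazing new capabilities. Before beginning this, the longest and most complicated section of this article, I'll provide an overview of the changes and offer some comments.

A long time ago I wrote a web page listing flaws in Python's design. One of the most significant flaws was that it's impossible to subclass Python types implemented in C. In particular, it's not possible to subclass built-in types, so you can't just subclass, say, lists in order to add a single useful method to them. The UserList module provides a class that supports all of the methods of lists and that can be subclassed further, but there's lots of C code that expects a regular Python list and won't accept a UserList instance.

Python 2.2 fixes this, and in the process adds some exciting new capabilities. A brief summary: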

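  • You can subclass built-in types such as lists and even integers, and your subclasses should work in every place that requires the original type.

  • It's now possible to define static and class methods, in addition to the instance methods available in previous versions of Python.

  • It's also possible to automatically call methods on accessing or setting an instance attribute by using a new mechanism called properties. Many uses of __getattr__() can be rewritten to use properties instead, making the resulting code simpler and faster. As a small side benefit, attributes can now have docstrings, too.

  • The list of legal attributes for an instance can be limited to a particular set using __slots__, making it possible to safeguard against typos and perhaps make more optimizations possible in future versions of Python.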

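Some users have voiced concern about all these changes. Sure, they say, the new features are neat and lend themselves to all sorts of tricks that weren't possible in previous versions of Python, but they also make the language more complicated. Some people have said that they've always recommended Python for its simplicity, and feel that its simplicity is being lost.

Personally, I think there's no need to worry. Many of the new features are quite esoteric, and you can write a lot of Python code without ever needing to be aware of them. Writing a simple class is no more difficult than it ever was, so you don't need to bother learning or teaching the new features unless they're actually needed. Some very complicated tasks that were previously only possible from C will now be possible in pure Python, and to my mind that's all for the better.

I'm not going to attempt to cover every single corner case and small change that were required to make the new features work. Instead this section will paint only the broad strokes. See the "Related Links" section for further sources of information about Python 2.2's new object model.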

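Old and New Classes

First, you should know that Python 2.2 really has two kinds of classes: classic or old-style classes, and new-style classes. The old-style class model is exactly the same as the class model in earlier versions of Python. All the new features described in this section apply only to new-style classes. This divergence isn't intended to last forever; eventually old-style classes will be dropped, possibly in Python 3.0.

So how do you define a new-style class? You do it by subclassing an existing new-style class. Most of Python's built-in types, such as integers, lists, dictionaries, and even files, are new-style classes now. A new-style class named object, the base class for all built-in types, has also been added, so if no built-in type is suitable, you can just subclass object: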

class C(object):
    def __init__ (self):
        ...
    ...

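This means that class statements that don't have any base classes are always classic classes in Python 2.2. (Actually you can also change this by setting a module-level variable named __metaclass__, as described in PEP 253, but it's easier to just subclass object.)

The type objects for the built-in types are available as built-ins, named using a clever trick. Python has always had built-in functions named int(), float(), and str(). In 2.2, they aren't functions any more, but type objects that behave as factories when called.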

>>> int
<type 'int'>
>>> int('123')
123

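To make the set of types complete, new type objects such as dict() and file() have been added. Here's a more interesting example, adding a lock() method to file objects: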

class LockableFile(file):
    def lock (self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)

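The now-obsolete posixfile module contained a class that emulated all of a file object's methods and also added a lock() method, but this class couldn't be passed to internal functions that expected a built-in file, something which is possible with our new LockableFile.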
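
The built-in file type no longer exists in Python 3, so the following sketch shows the same idea using list instead. It is written for a current Python 3 interpreter. The names Stack, push() and peek() are made up for illustration, and the json module stands in for "C code that expects a real list".

```python
import json

class Stack(list):
    """Hypothetical list subclass that adds stack-style helpers."""

    def push(self, item):
        # append() is inherited unchanged from the built-in list type
        self.append(item)

    def peek(self):
        # Return the top item without removing it
        return self[-1]

s = Stack([1, 2])
s.push(3)
print(s.peek())
print(isinstance(s, list))
# json's encoder expects a list; the subclass is accepted as one
print(json.dumps(s))
```

Because Stack really is a list, every inherited method (pop(), sort(), slicing) keeps working without any wrapper code.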

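Descriptors

In previous versions of Python, there was no consistent way to discover what attributes and methods were supported by an object. There were some informal conventions, such as defining __members__ and __methods__ attributes that were lists of names, but often the author of an extension type or a class wouldn't bother to define them. You could fall back on inspecting the __dict__ of an object, but when class inheritance or an arbitrary __getattr__() hook were in use this could still be inaccurate.

The one big idea underlying the new class model is that an API for describing the attributes of an object using descriptors has been formalized. Descriptors specify the value of an attribute, stating whether it's a method or a field. With the descriptor API, static methods and class methods become possible, as well as more exotic constructs.

Attribute descriptors are objects that live inside class objects, and have a few attributes of their own:

  • __name__ is the attribute's name.

  • __doc__ is the attribute's docstring.

  • __get__(object) is a method that retrieves the attribute value from object.

  • __set__(object, value) sets the attribute on object to value.

  • __delete__(object) deletes the attribute from object.

For example, when you write obj.x, the steps that Python actually performs are roughly: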

descriptor = obj.__class__.__dict__['x']   # simplified: base classes are not searched here
descriptor.__get__(obj, obj.__class__)
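
The following minimal data descriptor makes the protocol concrete. It is a sketch for a current Python 3 interpreter, where every class is new-style. Typed and Point are hypothetical names. Python calls __get__() with both the instance and its class, and passes None as the instance when the attribute is looked up on the class itself.

```python
class Typed(object):
    """Hypothetical data descriptor that only accepts values of one type."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor object
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if not isinstance(value, self.kind):
            raise TypeError('%s must be %s' % (self.name, self.kind.__name__))
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        del obj.__dict__[self.name]

class Point(object):
    x = Typed('x', int)

p = Point()
p.x = 3        # runs Typed.__set__(descriptor, p, 3)
print(p.x)     # runs Typed.__get__(descriptor, p, Point)
```

Because Typed defines __set__(), it takes priority over the instance dictionary. For that reason, storing the value under the same name in obj.__dict__ does not bypass the descriptor.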

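For methods, descriptor.__get__() returns a temporary object that's callable, and wraps up the instance and the method to be called on it. This is also why static methods and class methods are now possible; they have descriptors that wrap up just the method, or the method and the class. As a brief explanation of these new kinds of methods, static methods aren't passed the instance, and therefore resemble regular functions. Class methods are passed the class of the object, but not the object itself. Static and class methods are defined like this: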

class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)

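The staticmethod() function takes the function f() and returns it wrapped up in a descriptor so it can be stored in the class object. You might expect there to be special syntax for creating such methods (def static f, defstatic f(), or something like that) but no such syntax has been defined yet; that's been left for future versions of Python.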
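
The following sketch uses the 2.2-style spelling (f = staticmethod(f)) to show what each kind of method receives. The spelling still works today. Counter and SubCounter are illustrative names only.

```python
class Counter(object):
    instances = 0

    def __init__(self):
        Counter.instances += 1

    def describe(kind):
        # Static method: no implicit first argument at all
        return 'counter of %s' % kind
    describe = staticmethod(describe)

    def how_many(cls):
        # Class method: receives the class it was called on, not an instance
        return '%s: %d' % (cls.__name__, cls.instances)
    how_many = classmethod(how_many)

class SubCounter(Counter):
    pass

Counter()
SubCounter()
print(Counter.describe('apples'))
print(SubCounter.how_many())
```

Calling how_many() through SubCounter passes SubCounter as cls. This is what makes class methods useful as alternative constructors that respect subclassing.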

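More new features, such as slots and properties, are also implemented as new kinds of descriptors, and it's not difficult to write a descriptor class that does something novel. For example, it would be possible to write a descriptor class that made it possible to write Eiffel-style preconditions and postconditions for a method. A class that used this feature might be defined like this: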

from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...

    f = eiffelmethod(f, pre_f, post_f)

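Note that a person using the new eiffelmethod() doesn't have to understand anything about descriptors. This is why I think the new features don't increase the basic complexity of the language. There will be a few wizards who need to know about it in order to write eiffelmethod() or the ZODB or whatever, but most users will just write code on top of the resulting libraries and ignore the implementation details.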
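
The eiffel module above is not part of the standard library. The following is one hypothetical way eiffelmethod() could be written as a descriptor. Its __get__() returns a wrapper that runs the precondition, the method, and then the postcondition. The Account class and its checks are invented for illustration.

```python
class eiffelmethod(object):
    """Hypothetical descriptor adding pre- and postcondition checks to a method."""

    def __init__(self, method, pre=None, post=None):
        self.method = method
        self.pre = pre
        self.post = post

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Looked up on the class itself: hand back the descriptor
            return self

        def checked(*args, **kwargs):
            if self.pre is not None:
                self.pre(obj)                        # check preconditions
            result = self.method(obj, *args, **kwargs)
            if self.post is not None:
                self.post(obj)                       # check postconditions
            return result
        return checked


class Account(object):
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # The actual function
        self.balance -= amount
        return self.balance

    def pre_withdraw(self):
        if self.balance <= 0:
            raise ValueError('precondition failed: balance must be positive')

    def post_withdraw(self):
        if self.balance < 0:
            raise ValueError('postcondition failed: balance is negative')

    withdraw = eiffelmethod(withdraw, pre_withdraw, post_withdraw)


acct = Account(10)
print(acct.withdraw(3))
```

The checks raise ValueError rather than using assert, so they still run when Python is started with -O.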

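Multiple Inheritance: The Diamond Rule

Multiple inheritance has also been made more useful through changing the rules under which names are resolved. Consider this set of classes (diagram taken from PEP 253 by Guido van Rossum):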

      class A:
        ^ ^  def save(self): ...
       /   \
      /     \
     /       \
    /         \
class B     class C:
    ^         ^  def save(self): ...
     \       /
      \     /
       \   /
        \ /
      class D

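The lookup rule for classic classes is simple but not very smart; the base classes are searched depth-first, going from left to right. A reference to D.save() will search the classes D, B, and then A, where save() would be found and returned. C.save() would never be found at all. This is bad, because if C's save() method is saving some internal state specific to C, not calling it will result in that state never getting saved.

New-style classes follow a different algorithm that's a bit more complicated to explain, but does the right thing in this situation. (Note that Python 2.3 changes this algorithm to one that produces the same results in most cases, but produces more useful results for really complicated inheritance graphs.)

  1. List all the base classes, following the classic lookup rule, and include a class multiple times if it's visited repeatedly. In the above example, the list of visited classes is [D, B, A, C, A].

  2. Scan the list for duplicated classes. If any are found, remove all but one occurrence, leaving the last one in the list. In the above example, the list becomes [D, B, C, A] after dropping duplicates.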
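
The two steps can be sketched directly in Python. The helper names classic_order() and keep_last_duplicates() are hypothetical. Under Python 3, A implicitly derives from object, so object also shows up in the lists. For this diamond the result agrees with the __mro__ that modern Python computes using the algorithm introduced in 2.3.

```python
class A(object):
    def save(self):
        return 'A.save'

class B(A):
    pass

class C(A):
    def save(self):
        return 'C.save'

class D(B, C):
    pass

def classic_order(cls):
    """Step 1: depth-first, left-to-right list of classes, duplicates included."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order

def keep_last_duplicates(order):
    """Step 2: drop every occurrence of a class except the last one."""
    return [c for i, c in enumerate(order) if c not in order[i + 1:]]

print([c.__name__ for c in classic_order(D)])
print([c.__name__ for c in keep_last_duplicates(classic_order(D))])
print(D().save())
```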

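Following this rule, referring to D.save() will return C.save(), which is the behaviour we're after. This lookup rule is the same as the one followed by Common Lisp. A new built-in function, super(), provides a way to get at a class's superclasses without having to reimplement Python's algorithm. The most commonly used form will be super(class, obj), which returns a bound superclass object (not the actual class object). This form will be used in methods to call a method in the superclass; for example, D's save() method would look like this: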

class D (B,C):
    def save (self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...

super() can also return unbound superclass objects when called as super(class) or super(class1, class2), but this probably won't often be useful.
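
When every save() in the diamond calls super(), each class's method runs exactly once, in the lookup order described above. super(B, self) does not always mean "A": for a D instance, it continues with C. The following is a small sketch of that cooperative pattern. The log list exists only to record the call order.

```python
log = []

class A(object):
    def save(self):
        log.append('A')

class B(A):
    def save(self):
        log.append('B')
        super(B, self).save()    # for a D instance this continues with C, not A

class C(A):
    def save(self):
        log.append('C')
        super(C, self).save()

class D(B, C):
    def save(self):
        log.append('D')
        super(D, self).save()

D().save()
print(log)
```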

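Attribute Access

A fair number of sophisticated Python classes define hooks for attribute access using __getattr__(); most commonly this is done for convenience, making code more readable by automatically mapping an attribute access such as obj.parent into a method call such as obj.get_parent. Python 2.2 adds some new ways of controlling attribute access.

First, __getattr__(attr_name) is still supported by new-style classes, and nothing about it has changed. As before, it will be called when an attempt is made to access obj.foo and no attribute named foo is found in the instance's dictionary.

New-style classes also support a new method, __getattribute__(attr_name). The difference between the two methods is that __getattribute__() is always called whenever any attribute is accessed, while the old __getattr__() is only called if foo isn't found in the instance's dictionary.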

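However, Python 2.2's support for properties will often be a simpler way to trap attribute references. Writing a __getattr__() method is complicated because to avoid recursion you can't use regular attribute accesses inside it, and instead have to mess around with the contents of __dict__. __getattr__() methods also end up being called by Python when it checks for other methods such as __repr__() or __coerce__(), and so have to be written with this in mind. Finally, calling a function on every attribute access results in a sizable performance loss.

property is a new built-in type that packages up three functions that get, set, or delete an attribute, and a docstring. For example, if you want to define a size attribute that's computed, but also settable, you could write: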

class C(object):
    def get_size(self):
        result = ...   # computation goes here
        return result
    def set_size(self, size):
        # compute something based on the size
        # and set internal state appropriately
        ...

    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")

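That is certainly clearer and easier to write than a pair of __getattr__()/__setattr__() methods that check for the size attribute and handle it specially while retrieving all other attributes from the instance's __dict__. Accesses to size are also the only ones which have to perform the work of calling a function, so references to other attributes run at their usual speed.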
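
The following is a concrete, runnable version of the schematic class C above. It is a sketch: Buffer and its truncate-or-pad rule are invented for illustration. Because the deleter is passed as None, del b.size raises AttributeError.

```python
class Buffer(object):
    def __init__(self, data=''):
        self._data = data

    def get_size(self):
        # Computed from the internal state on every access
        return len(self._data)

    def set_size(self, size):
        # Truncate, or pad with spaces, so that len(self._data) == size
        if size < 0:
            raise ValueError('size must be non-negative')
        self._data = self._data[:size].ljust(size)

    size = property(get_size, set_size, None, "Storage size of this instance")

b = Buffer('hello')
print(b.size)
b.size = 8
print(repr(b._data))
b.size = 2
print(repr(b._data))
print(Buffer.size.__doc__)
```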

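Finally, it's possible to constrain the list of attributes that can be referenced on an object using the new __slots__ class attribute. Python objects are usually very dynamic; at any time it's possible to define a new attribute on an instance by just doing obj.new_attr=1. A new-style class can define a class attribute named __slots__ to limit the legal attributes to a particular set of names. An example will make this clear: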

>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'

Note how you get an AttributeError on the attempt to assign to an attribute not listed in __slots__.
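
The following sketch shows the typo protection mentioned at the start of this section. It is written for a current Python 3 interpreter, and Point is a hypothetical class. A misspelled attribute name is rejected instead of silently creating a new attribute. Instances of a slotted class also have no per-instance __dict__.

```python
class Point(object):
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 10                  # listed in __slots__: allowed
try:
    p.X = 3               # typo for x: not listed, so rejected
except AttributeError as exc:
    print('rejected:', exc)
```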

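PEP 234: Iterators

Another significant addition to 2.2 is an iteration interface at both the C and Python levels. Objects can define how they can be looped over by callers.

In Python versions up to 2.1, the usual way to make for item in obj work is to define a __getitem__() method that looks something like this: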

def __getitem__(self, index):
    # called with index = 0, 1, 2, ...; raising IndexError ends the loop
    return <next item>

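__getitem__() is more properly used to define an indexing operation on an object so that you can write obj[5] to retrieve the sixth element. It's a bit misleading when you're using this only to support for loops. Consider some file-like object that wants to be looped over; the index parameter is essentially meaningless, as the class probably assumes that a series of __getitem__() calls will be made with index incrementing by one each time. In other words, the presence of the __getitem__() method doesn't mean that using file[5] to randomly access the sixth element will work, though it really should.

In Python 2.2, iteration can be implemented separately, and __getitem__() methods can be limited to classes that really do support random access. The basic idea of iterators is simple. A new built-in function, iter(obj) or iter(C, sentinel), is used to get an iterator. iter(obj) returns an iterator for the object obj, while iter(C, sentinel) returns an iterator that will invoke the callable object C until it returns sentinel to signal that the iterator is done.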
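
The following sketch shows both forms of iter(). io.BytesIO stands in for a real binary file. The built-in next() function used here arrived in later Python versions. In 2.2 you would call the iterator's next() method instead.

```python
import io

# iter(obj): ask a container for an iterator over itself
letters = iter(['a', 'b'])
print(next(letters))

# iter(C, sentinel): call C() repeatedly until it returns the sentinel b''
stream = io.BytesIO(b'abcdefghij')
chunks = list(iter(lambda: stream.read(4), b''))
print(chunks)
```

The two-argument form is handy for read loops, where an empty result marks the end of the data.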

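Python classes can define an __iter__() method, which should create and return a new iterator for the object; if the object is its own iterator, this method can just return self. In particular, iterators will usually be their own iterators. Extension types implemented in C can implement a tp_iter function in order to return an iterator, and extension types that want to behave as iterators can define a tp_iternext function.

So, after all this, what do iterators actually do? They have one required method, next(), which takes no arguments and returns the next value. When there are no more values to be returned, calling next() should raise the StopIteration exception. Here's a simple example of iterators at work: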

>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
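
A class-based iterator needs only the two methods described above. Countdown is an illustrative name. Python 3 renamed the protocol method next() to __next__(), so this sketch defines next() as in 2.2 and aliases it so the class works on either version.

```python
class Countdown(object):
    """Iterator counting down from start to 1 (illustrative)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterator
        return self

    def next(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Python 3 looks for __next__() instead of next()
    __next__ = next

print(list(Countdown(3)))
it = iter(Countdown(2))
print(it.next())
```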

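In 2.2, Python's for statement no longer expects a sequence; it expects something for which iter() will return an iterator. For backward compatibility and convenience, an iterator is automatically constructed for sequences that don't implement __iter__() or a tp_iter slot, so for i in [1,2,3] will still work. Wherever the Python interpreter loops over a sequence, it's been changed to use the iterator protocol. This means you can do things like this: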

>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)

Iterator support has been added to some of Python's basic types. Calling iter() on a dictionary will return an iterator which loops over its keys:

>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10

That's just the default behaviour. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator. In a minor related change, the in operator now works on dictionaries, so key in dict is now equivalent to dict.has_key(key).

Files also provide an iterator, which calls the readline() method until there are no more lines in the file. This means you can now read each line of a file using code like this:

for line in file:
    # do something for each line
    ...

Note that you can only go forward in an iterator; there's no way to get the previous element, reset the iterator, or make a copy of it. An iterator object could provide such additional capabilities, but the iterator protocol only requires a next() method.
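
The following sketch uses io.StringIO as a stand-in for a file on disk. It shows both line iteration and the forward-only nature of the iterator. Once the loop has consumed the lines, iterating again yields nothing. Going back requires a file method such as seek(), which is not part of the iterator protocol.

```python
import io

f = io.StringIO('first\nsecond\nthird\n')
lines = [line.rstrip('\n') for line in f]   # each step reads one more line
print(lines)

leftover = list(f)     # the iterator is exhausted; there is no rewind
print(leftover)

f.seek(0)              # rewinding is a file operation, not an iterator one
print(repr(f.readline()))
```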

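See also

PEP 234 - Iterators

Written by Ka-Ping Yee and GvR; implemented by the Python Labs crew, mostly by GvR and Tim Peters.

PEP 255: Simple Generators

Generators are another new feature, one that interacts with the introduction of iterators.

You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But, what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function: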

def generate_ints(N):
    for i in range(N):
        yield i

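A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; this is detected by Python's bytecode compiler, which compiles the function specially as a result. Because a new keyword was introduced, generators must be explicitly enabled in a module by including a from __future__ import generators statement near the top of the module's source code. In Python 2.3 this statement will become unnecessary.

When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)

Here's a sample usage of the generate_ints() generator: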

>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration

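You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).

Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.

You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively.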

# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
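
The inorder() generator needs some tree to walk. The following sketch pairs it with a small hypothetical Tree class. test_generators.py defines its own helpers, which may differ. A missing child is represented by None, which the "if t:" test skips.

```python
class Tree(object):
    """Minimal binary tree node (hypothetical helper for this sketch)."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    # Yield labels from the left subtree, then this node, then the right subtree
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

tree = Tree('d',
            Tree('b', Tree('a'), Tree('c')),
            Tree('f', Tree('e'), Tree('g')))
print(''.join(inorder(tree)))
```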

Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing N queens on an N×N chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an N×N chessboard without visiting any square twice).

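The idea of generators comes from other programming languages, especially Icon (https://www2.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www2.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like: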

sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)

In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.
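
The Icon example can be imitated in Python with an explicit generator. In the following sketch, icon_find() is a hypothetical helper that yields 1-based positions, as Icon's find() does. Icon's automatic retry of the failed comparison is spelled out as "take the first generated value that passes the test".

```python
def icon_find(sub, s):
    """Yield every 1-based position of sub in s, like Icon's find()."""
    start = s.find(sub)
    while start != -1:
        yield start + 1                 # Icon counts positions from 1
        start = s.find(sub, start + 1)

sentence = "Store it in the neighboring harbor"
print(list(icon_find("or", sentence)))

# Emulate Icon's retry: keep drawing values until the comparison succeeds
i = next((pos for pos in icon_find("or", sentence) if pos > 5), None)
if i is not None:
    print(i)
```

Unlike Icon, Python only retries because the code asks it to. The generator object is an ordinary value that could equally be stored or passed elsewhere.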

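Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.

See also

PEP 255 - Simple Generators

Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.

PEP 237: Unifying Long Integers and Integers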

In recent versions, the distinction between regular integers, which are 32-bit values on most machines, and long integers, which can be of arbitrary size, was becoming an annoyance. For example, on platforms that support files larger than 2**32 bytes, the tell() method of file objects has to return a long integer. However, there were various bits of Python that expected plain integers and would raise an error if a long integer was provided instead. For example, in Python 1.5, only regular integers could be used as a slice index, and 'abc'[1L:] would raise a TypeError exception with the message 'slice index must be int'.

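Python 2.2 will automatically convert numbers from short to long integers as required. The 'L' suffix is no longer needed to indicate that a literal is a long integer, as now the compiler will choose the appropriate type. (Using the 'L' suffix will be discouraged in future 2.x versions of Python, triggering a warning in Python 2.4, and probably dropped in Python 3.0.) Many operations that used to raise an OverflowError will now return a long integer as their result. For example: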

>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L

In most cases, integers and long integers will now be treated identically. You can still distinguish them with the type() built-in function, but that's rarely needed.

See also

PEP 237 - Unifying Long Integers and Integers

Written by Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van Rossum.

PEP 238: Changing the Division Operator

The most controversial change in Python 2.2 heralds the start of an effort to fix an old design flaw that's been in Python from the beginning. Currently Python's division operator, /, behaves like C's division operator when presented with two integer arguments: it returns an integer result that's truncated down when there would be a fractional part. For example, 3/2 is 1, not 1.5, and (-1)/2 is -1, not -0.5. This means that the results of division can vary unexpectedly depending on the type of the two operands and because Python is dynamically typed, it can be difficult to determine the possible types of the operands.

(The controversy is over whether this is really a design flaw, and whether it's worth breaking existing code to fix this. It's caused endless discussions on python-dev, and in July 2001 erupted into a storm of acidly sarcastic postings on comp.lang.python. I won't argue for either side here and will stick to describing what's implemented in 2.2. Read PEP 238 for a summary of arguments and counter-arguments.)

Because this change might break code, it's being introduced very gradually. Python 2.2 begins the transition, but the switch won't be complete until Python 3.0.

First, I'll borrow some terminology from PEP 238. "True division" is the division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25, and so forth. "Floor division" is what Python's / operator currently does when given integer operands; the result is the floor of the value returned by true division. "Classic division" is the current mixed behaviour of /; it returns the result of floor division when the operands are integers, and returns the result of true division when one of the operands is a floating-point number.

Python 2.2 does the following:

  • A new operator, //, is the floor division operator. (Yes, we know it looks like C++'s comment symbol.) // always performs floor division no matter what the types of its operands are, so 1 // 2 is 0 and 1.0 // 2.0 is also 0.0.

    // is always available in Python 2.2; you don't need to enable it using a __future__ statement.

  • By including a from __future__ import division in a module, the / operator will be changed to return the result of true division, so 1/2 is 0.5. Without the __future__ statement, / still means classic division. The default meaning of / will not change until Python 3.0.

  • Classes can define methods called __truediv__() and __floordiv__() to overload the two division operators. At the C level, there are also slots in the PyNumberMethods structure so extension types can define the two operators.

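  • Python 2.2 supports some command-line arguments for testing whether code will work with the changed division semantics. Running python with -Q warn will cause a warning to be issued whenever division is applied to two integers. You can use this to find code that's affected by the change and fix it. By default, Python 2.2 will simply perform classic division without a warning; the warning will be turned on by default in Python 2.3.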
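
The following sketch contrasts the operators and shows the two overloading hooks. It is written for Python 3, where / is already true division. Under Python 2.2, the / lines would need from __future__ import division at the top of the module. Meters is a hypothetical class.

```python
class Meters(object):
    """Hypothetical length type overloading both division operators."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # Called for / under true division
        return Meters(self.value / float(other))

    def __floordiv__(self, other):
        # Called for //
        return Meters(self.value // other)

print(7 // 2, -7 // 2, 7.0 // 2.0)    # floor division: 3 -4 3.0
print(7 / 2)                           # true division: 3.5
print((Meters(7) / 2).value, (Meters(7) // 2).value)   # 3.5 3
```

Note that floor division rounds toward negative infinity, so -7 // 2 is -4, not -3.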

See also

PEP 238 - Changing the Division Operator

Written by Moshe Zadka and Guido van Rossum. Implemented by Guido van Rossum.

Unicode Changes

Python's Unicode support has been enhanced a bit in 2.2. Unicode strings are usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by supplying --enable-unicode=ucs4 to the configure script. (It's also possible to specify --disable-unicode to completely disable Unicode support.)

When built to use UCS-4 (a "wide Python"), the interpreter can natively handle Unicode characters from U+000000 to U+10FFFF, so the range of legal values for the unichr() function is expanded accordingly. Using an interpreter compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still cause unichr() to raise a ValueError exception. This is all described in PEP 261, "Support for 'wide' Unicode characters"; consult it for further details.

Another change is simpler to explain. Since their introduction, Unicode strings have supported an encode() method to convert the string to a selected encoding such as UTF-8 or Latin-1. A symmetric decode([encoding]) method has been added to 8-bit strings (though not to Unicode strings) in 2.2. decode() assumes that the string is in the specified encoding and decodes it, returning whatever is returned by the codec.

Using this new feature, codecs have been added for tasks not directly related to Unicode. For example, codecs have been added for uu-encoding, MIME's base64 encoding, and compression with the zlib module:

>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
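
In Python 3, these bytes-to-bytes and str-to-str codecs are no longer reached through string methods. They are still available through codecs.encode() and codecs.decode(). The following sketch repeats the session above that way. The uu example is omitted.

```python
import codecs

s = b"Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n"
data = codecs.encode(s, 'zlib_codec')          # bytes -> compressed bytes
print(codecs.decode(data, 'zlib_codec') == s)
print(codecs.encode(s, 'base64_codec'))        # bytes -> base64 bytes
print(codecs.encode('sheesh', 'rot_13'))       # str -> str
```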

To convert a class instance to Unicode, a __unicode__() method can be defined by a class, analogous to __str__().

encode(), decode(), and __unicode__() were implemented by Marc-André Lemburg. The changes to support using UCS-4 internally were implemented by Fredrik Lundh and Martin von Löwis.

See also

PEP 261 - Support for 'wide' Unicode characters

Written by Paul Prescod.

PEP 227: Nested Scopes

In Python 2.1, statically nested scopes were added as an optional feature, to be enabled by a from __future__ import nested_scopes directive. In 2.2 nested scopes no longer need to be specially enabled, and are now always present. The rest of this section is a copy of the description of nested scopes from my "What's New in Python 2.1" document; if you read it when 2.1 came out, you can skip the rest of this section.

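The largest change in Python 2.1 was to Python's scoping rules, and that change is completed in Python 2.2. In Python 2.0, at any given time there are at most three namespaces used to look up variable names: local, module-level, and the built-in namespace. This often surprised people because it didn't match their intuitive expectations. For example, a nested recursive function definition doesn't work: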

def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...

The function g() will always raise a NameError exception, because the binding of the name g isn't in either its local namespace or in the module-level namespace. This isn't much of a problem in practice (how often do you recursively define interior functions like this?), but this also made using the lambda expression clumsier, and this was a problem in practice. In code which uses lambda you can often find local variables being copied by passing them as the default values of arguments.

def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L

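The readability of Python code written in a strongly functional style suffers greatly as a result.

The most significant change to Python 2.2 is that static scoping has been added to the language to fix this problem. As a first effect, the name=name default argument is now unnecessary in the above example. Put simply, when a given variable name is not assigned a value within a function (by an assignment, or the def, class, or import statements), references to the variable will be looked up in the local namespace of the enclosing scope. A more detailed explanation of the rules, and a dissection of the implementation, can be found in the PEP.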
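
Both earlier examples now work as written. The following sketch fills them in with hypothetical names, nested_count() and Registry, and adds a base case so the recursion ends. Under Python 3, filter() returns an iterator, so the result is wrapped in list().

```python
def nested_count(n):
    def g(value):
        # g finds its own name in the enclosing scope of nested_count()
        if value == 0:
            return 0
        return g(value - 1) + 1
    return g(n)

class Registry(object):
    def __init__(self, items):
        self.list_attribute = items

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # No name=name default needed: the lambda sees find()'s 'name'
        return list(filter(lambda x: x == name, self.list_attribute))

print(nested_count(4))
print(Registry(['spam', 'eggs', 'spam']).find('spam'))
```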

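This change may cause some compatibility problems for code where the same variable name is used both at the module level and as a local variable within a function that contains further function definitions. This seems rather unlikely though, since such code would have been pretty confusing to read in the first place.

One side effect of the change is that the from module import * and exec statements have been made illegal inside a function scope under certain conditions. The Python reference manual has said all along that from module import * is only legal at the top level of a module, but the CPython interpreter has never enforced this before. As part of the implementation of nested scopes, the compiler which turns Python source into bytecodes has to generate different code to access variables in a containing scope. from module import * and exec make it impossible for the compiler to figure this out, because they add names to the local namespace that are unknowable at compile time. Therefore, if a function contains function definitions or lambda expressions with free variables, the compiler will flag this by raising a SyntaxError exception.

To make the preceding explanation a bit clearer, here's an example: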

x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x

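Line 4 containing the exec statement is a syntax error, since exec would define a new local variable named x whose value should be accessed by g().

This shouldn't be much of a limitation, since exec is rarely used in most Python code (and when it is used, it's often a sign of a poor design anyway).

See also

PEP 227 - Statically Nested Scopes

Written and implemented by Jeremy Hylton.

New and Improved Modules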

  • The xmlrpclib module was contributed to the standard library by Fredrik Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple remote procedure call protocol built on top of HTTP and XML. For example, the following snippet retrieves a list of RSS channels from the O'Reilly Network, and then lists the recent headlines for one channel:

    import xmlrpclib
    s = xmlrpclib.Server(
          'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
    channels = s.meerkat.getChannels()
    # channels is a list of dictionaries, like this:
    # [{'id': 4, 'title': 'Freshmeat Daily News'},
    #  {'id': 190, 'title': '32Bits Online'},
    #  {'id': 4549, 'title': '3DGamers'}, ... ]
    
    # Get the items for one channel
    items = s.meerkat.getItems( {'channel': 4} )
    
    # 'items' is another list of dictionaries, like this:
    # [{'link': 'http://freshmeat.net/releases/52719/',
    #   'description': 'A utility which converts HTML to XSL FO.',
    #   'title': 'html2fo 0.3 (Default)'}, ... ]
    

    The SimpleXMLRPCServer module makes it easy to create straightforward XML-RPC servers. See http://xmlrpc.scripting.com/ for more information about XML-RPC.

  • The new hmac module implements the HMAC algorithm described by RFC 2104. (Contributed by Gerhard Häring.)

  • Several functions that originally returned lengthy tuples now return pseudo-sequences that still behave like tuples but also have mnemonic attributes such as st_mtime or tm_year. The enhanced functions include stat(), fstat(), statvfs(), and fstatvfs() in the os module, and localtime(), gmtime(), and strptime() in the time module.

    For example, to obtain a file's size using the old tuples, you'd end up writing something like file_size = os.stat(filename)[stat.ST_SIZE], but now this can be written more clearly as file_size = os.stat(filename).st_size.

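    The original patch for this feature was contributed by Nick Mathewson.

  • Python's profiler has been extensively reworked and various errors in its output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)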

  • The socket module can be compiled to support IPv6; specify the --enable-ipv6 option to Python's configure script. (Contributed by Jun-ichiro "itojun" Hagino.)

  • Two new format characters were added to the struct module for 64-bit integers on platforms that support the C long long type. q is for a signed 64-bit integer, and Q is for an unsigned one. The value is returned in Python's long integer type. (Contributed by Tim Peters.)

  • In the interpreter's interactive mode, there's a new built-in function help() that uses the pydoc module introduced in Python 2.1 to provide interactive help. help(object) displays any available help text about object. help() with no argument puts you in an online help utility, where you can enter the names of functions, classes, or modules to read their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee's pydoc module.)

  • Various bugfixes and performance improvements have been made to the SRE engine underlying the re module. For example, the re.sub() and re.split() functions have been rewritten in C. Another contributed patch speeds up certain Unicode character ranges by a factor of two, and a new finditer() method returns an iterator over all the non-overlapping matches in a given string. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von Löwis.)

  • The smtplib module now supports RFC 2487, "Secure SMTP over TLS", so it's now possible to encrypt the SMTP traffic between a Python program and the mail transport agent being handed a message. smtplib also supports SMTP authentication. (Contributed by Gerhard Häring.)

  • The imaplib module, maintained by Piers Lauder, has support for several new extensions: the NAMESPACE extension defined in RFC 2342, SORT, GETACL and SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)

  • The rfc822 module's parsing of email addresses is now compliant with RFC 2822, an update to RFC 822. (The module's name is not going to be changed to rfc2822.) A new package, email, has also been added for parsing and generating e-mail messages. (Contributed by Barry Warsaw, and arising out of his work on Mailman.)

  • The difflib module now contains a new Differ class for producing human-readable lists of changes (a "delta") between two sequences of lines of text. There are also two generator functions, ndiff() and restore(), which respectively return a delta from two sequences, or one of the original sequences from a delta. (Grunt work contributed by David Goodger, from ndiff.py code by Tim Peters who then did the generatorization.)

  • New constants ascii_letters, ascii_lowercase, and ascii_uppercase were added to the string module. There were several modules in the standard library that used string.letters to mean the ranges A-Za-z, but that assumption is incorrect when locales are in use, because string.letters varies depending on the set of legal characters defined by the current locale. The buggy modules have all been fixed to use ascii_letters instead. (Reported by an unknown person; fixed by Fred L. Drake, Jr.)

  • The mimetypes module now makes it easier to use alternative MIME-type databases by the addition of a MimeTypes class, which takes a list of filenames to be parsed. (Contributed by Fred L. Drake, Jr.)

  • A Timer class was added to the threading module that allows scheduling an activity to happen at some future time. (Contributed by Itamar Shtull-Trauring.)
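
The following sketch exercises four of the additions listed above on a current Python 3 interpreter: named stat() attributes, the struct q/Q codes, re.finditer(), and HMAC. rfc2104_hmac() is a hypothetical helper that spells out the RFC 2104 construction and checks it against the hmac module. Real code should just call hmac.

```python
import hashlib
import hmac
import os
import re
import stat
import struct
import tempfile

# os.stat(): named attribute versus the old tuple index
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'12345')
    os.close(fd)
    info = os.stat(path)
    print(info.st_size, info[stat.ST_SIZE])
finally:
    os.remove(path)

# struct 'q'/'Q': signed and unsigned 64-bit integers
packed = struct.pack('<q', -1)
print(len(packed), struct.unpack('<Q', packed)[0])

# re.finditer(): an iterator over non-overlapping matches
print([m.start() for m in re.finditer('or', 'Store it in the neighboring harbor')])

def rfc2104_hmac(key, msg, hash_factory=hashlib.sha256):
    """HMAC = H((K xor opad) + H((K xor ipad) + msg)), as in RFC 2104."""
    block_size = hash_factory().block_size
    if len(key) > block_size:
        key = hash_factory(key).digest()       # long keys are hashed first
    key = key.ljust(block_size, b'\x00')       # then zero-padded to a block
    inner = hash_factory(bytes(b ^ 0x36 for b in key) + msg).digest()
    return hash_factory(bytes(b ^ 0x5C for b in key) + inner).hexdigest()

print(rfc2104_hmac(b'key', b'message') ==
      hmac.new(b'key', b'message', hashlib.sha256).hexdigest())
```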

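Interpreter Changes and Fixes

Some of the changes only affect people who deal with the Python interpreter at the C level because they're writing Python extension modules, embedding the interpreter, or just hacking on the interpreter itself. If you only write Python code, none of the changes described here will affect you very much.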

  • Profiling and tracing functions can now be implemented in C, which can operate at much higher speeds than Python-based functions and should reduce the overhead of profiling and tracing. This will be of interest to authors of development environments for Python. Two new C functions were added to Python's API, PyEval_SetProfile() and PyEval_SetTrace(). The existing sys.setprofile() and sys.settrace() functions still exist, and have simply been changed to use the new C-level interface. (Contributed by Fred L. Drake, Jr.)

  • Another low-level API, primarily of interest to implementers of Python debuggers and development tools, was added. PyInterpreterState_Head() and PyInterpreterState_Next() let a caller walk through all the existing interpreter objects; PyInterpreterState_ThreadHead() and PyThreadState_Next() allow looping over all the thread states for a given interpreter. (Contributed by David Beazley.)

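  • The C-level interface to the garbage collector has been changed to make it easier to write extension types that support garbage collection and to debug misuses of the functions. Various functions have slightly different semantics, so a bunch of functions had to be renamed. Extensions that use the old API will still compile but will not participate in garbage collection, so updating them for 2.2 should be considered fairly high priority.

    To upgrade an extension module to the new API, perform the following steps:

  • Rename Py_TPFLAGS_GC to Py_TPFLAGS_HAVE_GC.

  • Use PyObject_GC_New() or PyObject_GC_NewVar() to allocate objects, and PyObject_GC_Del() to deallocate them.

  • Rename PyObject_GC_Init() to PyObject_GC_Track() and PyObject_GC_Fini() to PyObject_GC_UnTrack().

  • Remove PyGC_HEAD_SIZE from object size calculations.

  • Remove calls to PyObject_AS_GC() and PyObject_FROM_GC().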

  • A new et format sequence was added to PyArg_ParseTuple(); et takes both a parameter and an encoding name, and converts the parameter to the given encoding if the parameter turns out to be a Unicode string, or leaves it alone if it's an 8-bit string, assuming it to already be in the desired encoding. This differs from the es format character, which assumes that 8-bit strings are in Python's default ASCII encoding and converts them to the specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS support on Windows described in the following section.)

  • A different argument parsing function, PyArg_UnpackTuple(), has been added that's simpler and presumably faster. Instead of specifying a format string, the caller simply gives the minimum and maximum number of arguments expected, and a set of pointers to PyObject* variables that will be filled in with argument values.

  • Two new flags METH_NOARGS and METH_O are available in method definition tables to simplify implementation of methods with no arguments or a single untyped argument. Calling such methods is more efficient than calling a corresponding method that uses METH_VARARGS. Also, the old METH_OLDARGS style of writing C methods is now officially deprecated.

  • Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf() were added to provide cross-platform implementations for the relatively new snprintf() and vsnprintf() C lib APIs. In contrast to the standard sprintf() and vsprintf() functions, the Python versions check the bounds of the buffer used to protect against buffer overruns. (Contributed by M.-A. Lemburg.)

  • The _PyTuple_Resize() function has lost an unused parameter, so now it takes 2 parameters instead of 3. The third argument was never used, and can simply be discarded when porting code from earlier versions to Python 2.2.

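Other Changes and Fixes

As usual there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1 applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82 bugs. These figures are likely to be underestimates.

Some of the more notable changes are:

  • The code for the MacOS port for Python, maintained by Jack Jansen, is now kept in the main Python CVS tree, and many changes have been made to support MacOS X.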

    The most significant change is the ability to build Python as a framework, enabled by supplying the --enable-framework option to the configure script when compiling Python. According to Jack Jansen, "This installs a self-contained Python installation plus the OS X framework "glue" into /Library/Frameworks/Python.framework (or another location of choice). For now there is little immediate added benefit to this (actually, there is the disadvantage that you have to change your PATH to be able to find Python), but it is the basis for creating a full-blown Python application, porting the MacPython IDE, possibly using Python as a standard OSA scripting language and much more."

    Most of the MacPython toolbox modules, which interface to MacOS APIs such as windowing, QuickTime, scripting, etc. have been ported to OS X, but they've been left commented out in setup.py. People who want to experiment with these modules can uncomment them manually.

  • Keyword arguments passed to built-in functions that don't take them now cause a TypeError exception to be raised, with the message "function takes no keyword arguments".

  • Weak references, added in Python 2.1 as an extension module, are now part of the core because they're used in the implementation of new-style classes. The ReferenceError exception has therefore moved from the weakref module to become a built-in exception.

  • A new script, Tools/scripts/cleanfuture.py by Tim Peters, automatically removes obsolete __future__ statements from Python source code.

  • An additional flags argument has been added to the built-in function compile(), so the behaviour of __future__ statements can now be correctly observed in simulated shells, such as those presented by IDLE and other development environments. This is described in PEP 264. (Contributed by Michael Hudson.)

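  • The new license introduced with Python 1.6 wasn't GPL-compatible. This is fixed by some minor textual changes to the 2.2 license, so it's now legal to embed Python inside a GPLed program again. Note that Python itself is not GPLed, but instead is under a license that's essentially equivalent to the BSD license, same as it always was. The license changes were also applied to the Python 2.0.1 and 2.1.1 releases.

  • When presented with a Unicode filename on Windows, Python will now convert it to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is explicitly used by the file APIs, Python's choice of ASCII as the default encoding turns out to be an annoyance. On Unix, the locale's character set is used if locale.nl_langinfo(CODESET) is available. (Windows support was contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix support was added by Martin von Löwis.)

  • Large file support is now enabled on Windows. (Contributed by Tim Peters.)

  • The Tools/scripts/ftpmirror.py script now parses a .netrc file, if you have one. (Contributed by Mike Romberg.)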

  • Some features of the object returned by the xrange() function are now deprecated, and trigger warnings when they're accessed; they'll disappear in Python 2.3. xrange objects tried to pretend they were full sequence types by supporting slicing, sequence multiplication, and the in operator, but these features were rarely used and therefore buggy. The tolist() method and the start, stop, and step attributes are also being deprecated. At the C level, the fourth argument to the PyRange_New() function, repeat, has also been deprecated.

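  • The dictionary implementation received a bunch of patches, mostly to fix potential core dumps if a dictionary contains objects that sneakily change their hash value, or mutate the dictionary they're contained in. For a while python-dev fell into a gentle rhythm of Michael Hudson finding a case that dumped core, Tim Peters fixing the bug, Michael finding another case, and round and round it went.

  • On Windows, Python can now be compiled with Borland C thanks to a number of patches contributed by Stephen Hansen, though the result isn't fully functional yet. (But this is progress...)

  • Another Windows enhancement: Wise Solutions generously offered PythonLabs use of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)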

  • Files ending in .pyw can now be imported on Windows. .pyw is a Windows-only thing, used to indicate that a script needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping up to display the output. This patch makes it possible to import such scripts, in case they're also usable as modules. (Implemented by David Bolen.)

  • On platforms where Python uses the C dlopen() function to load extension modules, it's now possible to set the flags used by dlopen() using the sys.getdlopenflags() and sys.setdlopenflags() functions. (Contributed by Bram Stolk.)

  • The pow() built-in function no longer supports 3 arguments when floating-point numbers are supplied. pow(x, y, z) returns (x**y) % z, but this is never useful for floating-point numbers, and the final result varies unpredictably depending on the platform. A call such as pow(2.0, 8.0, 7.0) will now raise a TypeError exception.
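
The following sketch checks three of the behaviours above on a current CPython 3 interpreter. The exact error messages vary between versions, so the sketch prints whatever message it receives rather than relying on it. Node is a hypothetical class. The weak-reference part relies on CPython freeing an object as soon as its last strong reference is deleted.

```python
import weakref

# Integer three-argument pow() still works: (2**8) % 7
print(pow(2, 8, 7))

# Floating-point arguments combined with a modulus are rejected
try:
    pow(2.0, 8.0, 7.0)
except TypeError as exc:
    print('TypeError:', exc)

# Keyword arguments to a built-in that doesn't take them
try:
    len(obj=[1, 2])
except TypeError as exc:
    print('TypeError:', exc)

# ReferenceError is now a built-in exception, raised by dead weak proxies
class Node(object):
    pass

node = Node()
proxy = weakref.proxy(node)
del node            # in CPython the object is freed right here
try:
    proxy.value
except ReferenceError:
    print('ReferenceError: referent no longer exists')
```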

Acknowledgements

The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg, Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.