What's New in Python 2.2
- Author
A.M. Kuchling
Introduction
This article explains the new features in Python 2.2.2, released on October 14, 2002. Python 2.2.2 is a bugfix release of Python 2.2, originally released on December 21, 2001.
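Python 2.2 can be thought of as the "cleanup release". Some features, such as generators and iterators, are completely new. Most of the changes, however significant and far-reaching, are aimed at cleaning up irregularities and dark corners of the language design.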
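This article doesn't attempt to provide a complete specification of the new features. Instead it provides a convenient overview. For full details, refer to the documentation for Python 2.2, such as the Python Library Reference and the Python Reference Manual. If you want to understand the complete implementation and design rationale for a change, refer to the PEP for that particular feature.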
PEPs 252 and 253: Type and Class Changes
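The largest and most far-reaching changes in Python 2.2 are to Python's model of objects and classes. The changes should be backward compatible, so your code is likely to continue running unchanged. At the same time, they provide some amazing new capabilities. Before beginning this section, the longest and most complicated in this article, I'll provide an overview of the changes and offer some comments.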
A long time ago I wrote a Web page listing flaws in Python's design. One of the
most significant flaws was that it's impossible to subclass Python types
implemented in C. In particular, it's not possible to subclass built-in types,
so you can't just subclass, say, lists in order to add a single useful method to
them. The UserList
module provides a class that supports all of the
methods of lists and that can be subclassed further, but there's lots of C code
that expects a regular Python list and won't accept a UserList
instance.
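Python 2.2 fixes this, and in the process adds some exciting new capabilities. A brief summary:
- You can subclass built-in types such as lists and even integers, and your subclasses should work in every place that requires the original type. This makes object-oriented programming in Python more flexible and powerful.
- It's now possible to define static and class methods, in addition to the instance methods available in previous versions of Python. This gives you more flexibility in organizing a class's behaviour.
- You can automatically call methods when an instance attribute is accessed or set, using a new mechanism called properties. Many uses of __getattr__() can be rewritten to use properties instead, which makes the resulting code simpler and faster. As a small side benefit, attributes can now have docstrings, too.
- You can use __slots__ to limit the legal attributes of an instance to a particular set. This guards against typos and may make more optimizations possible in future versions of Python.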
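Some users have voiced concern about all these changes. Sure, they say, the new features are neat and lend themselves to all sorts of tricks that weren't possible in previous versions of Python. But they also make the language more complicated. Some people have said that they've always recommended Python for its simplicity, and feel that this simplicity is being lost.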
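Personally, I think there's no need to worry. Many of the new features are quite esoteric, and you can write a lot of Python code without ever needing to be aware of them. Writing a simple class is no more difficult than it ever was, so you don't need to bother learning or teaching the new features unless they're actually needed. Some very complicated tasks that were previously only possible from C will now be possible in pure Python. To my mind, that's all for the better.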
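I'm not going to attempt to cover every single corner case and small change that was required to make the new features work. Instead this section will paint only the broad strokes. See the "Related Links" section for further sources of information about Python 2.2's new object model.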
Old and New Classes
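First, you should know that Python 2.2 really has two kinds of classes: classic or old-style classes, and new-style classes. The old-style class model is exactly the same as the class model in earlier versions of Python. All the new features described in this section apply only to new-style classes. This divergence isn't intended to last forever; eventually old-style classes will be dropped, possibly in Python 3.0.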
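So how do you define a new-style class? You do it by subclassing an existing new-style class. Most of Python's built-in types, such as integers, lists, dictionaries, and even files, are new-style classes now. A new-style class named object, the base class for all built-in types, has also been added. If no built-in type is suitable, you can just subclass object: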
class C(object):
    def __init__(self):
        ...
    ...
This means that class statements that don't have any base classes are always classic classes in Python 2.2. (Actually you can also change this by setting a module-level variable named __metaclass__; see PEP 253 for the details. It's easier to just subclass object.)
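The following is a small sketch of the first point in the summary above: a list subclass that adds a single method. The class name Stack and the method peek() are invented for illustration. Because a Stack really is a list, built-in functions and list operations accept it unchanged.

```python
class Stack(list):
    """A list subclass that adds one convenience method (hypothetical example)."""

    def peek(self):
        # Return the most recently appended item without removing it
        return self[-1]


s = Stack([1, 2, 3])
s.append(4)        # inherited list method
top = s.peek()     # the newly added method
total = len(s)     # built-ins treat the subclass like any list
```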
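The type objects for the built-in types are available as built-ins, named using a clever trick. Python has always had built-in functions named int(), float(), and str(). In 2.2, they aren't functions any more, but type objects that behave as factories when called.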
>>> int
<type 'int'>
>>> int('123')
123
To make the set of types complete, new type objects such as dict() and file() have been added. Here's a more interesting example, adding a lock() method to file objects:
class LockableFile(file):
    def lock(self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)
The now-obsolete posixfile module contained a class that emulated all of a file object's methods and also added a lock() method. However, this class couldn't be passed to internal functions that expected a built-in file, something which is possible with our new LockableFile.
Descriptors
In previous versions of Python, there was no consistent way to discover what
attributes and methods were supported by an object. There were some informal
conventions, such as defining __members__
and __methods__
attributes that were lists of names, but often the author of an extension type
or a class wouldn't bother to define them. You could fall back on inspecting
the __dict__
of an object, but when class inheritance or an arbitrary
__getattr__()
hook were in use this could still be inaccurate.
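One of the foundations of the new class model is that an API for describing the attributes of an object using descriptors has been formalized. A descriptor specifies the value of an attribute, stating whether it's a method or a field. The descriptor API makes static methods and class methods possible, as well as more exotic constructs.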
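Attribute descriptors are objects that live inside class objects. They have a few attributes of their own, plus the methods that make up the descriptor protocol:
- __name__ is the attribute's name.
- __doc__ is the attribute's docstring.
- __get__(object) is a method that retrieves the attribute value from object.
- __set__(object, value) sets the attribute on object to value.
- __delete__(object) deletes the attribute from object.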
For example, when you write obj.x, the steps that Python actually performs are:
descriptor = obj.__class__.x
descriptor.__get__(obj)
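Below is a minimal sketch of a data descriptor that implements __get__, __set__ and __delete__. Note one detail that the simplified list above omits: in the actual protocol, __get__ also receives the owner class, and its instance argument is None when the attribute is fetched from the class itself. The names Typed and Point are hypothetical. Values are kept in the instance's __dict__ under the same name. This is safe because a descriptor that defines __set__ takes precedence over the instance dictionary.

```python
class Typed(object):
    """A data descriptor that only accepts values of one type (hypothetical)."""

    def __init__(self, name, kind):
        self.name = name    # key used in each instance's __dict__
        self.kind = kind    # required type for assigned values

    def __get__(self, obj, objtype=None):
        if obj is None:     # looked up on the class, not on an instance
            return self
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.kind):
            raise TypeError("%s must be of type %s" % (self.name, self.kind.__name__))
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        del obj.__dict__[self.name]


class Point(object):
    x = Typed('x', int)     # every obj.x access is routed through Typed
```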
For methods, descriptor.__get__()
returns a temporary object that's
callable, and wraps up the instance and the method to be called on it. This is
also why static methods and class methods are now possible; they have
descriptors that wrap up just the method, or the method and the class. As a
brief explanation of these new kinds of methods, static methods aren't passed
the instance, and therefore resemble regular functions. Class methods are
passed the class of the object, but not the object itself. Static and class
methods are defined like this:
class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)
    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)
The staticmethod() function takes the function f() and returns it wrapped up in a descriptor, so it can be stored in the class object. You might expect there to be special syntax for creating such methods (def static f, defstatic f(), or something like that), but no such syntax has been defined yet; that's been left for future versions of Python.
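Here is a runnable sketch of both kinds of method, wrapped with staticmethod() and classmethod() in the same style as above. The class and method names are made up for illustration.

```python
class Counter(object):
    created = 0

    def __init__(self):
        Counter.created = Counter.created + 1

    def scale(n, factor):
        # static method: no instance or class is passed in
        return n * factor
    scale = staticmethod(scale)

    def report(cls):
        # class method: receives the class it was called on
        return "%s: %d" % (cls.__name__, cls.created)
    report = classmethod(report)


class SubCounter(Counter):
    pass
```

Calling SubCounter.report() passes SubCounter as cls, even though the method is defined on Counter. That is the main practical difference from a static method.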
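Some of the new features, such as __slots__ and properties, are also implemented as new kinds of descriptors, and it's not difficult to write a descriptor class that does something novel. For example, you could write a descriptor class that lets you write Eiffel-style preconditions and postconditions for a method. A class that used this feature might be defined like this: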
from eiffel import eiffelmethod
class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...
    f = eiffelmethod(f, pre_f, post_f)
Note that a person using the new eiffelmethod()
doesn't have to understand
anything about descriptors. This is why I think the new features don't increase
the basic complexity of the language. There will be a few wizards who need to
know about it in order to write eiffelmethod()
or the ZODB or whatever,
but most users will just write code on top of the resulting libraries and ignore
the implementation details.
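The eiffel module in the example above is hypothetical. Below is one possible sketch of an eiffelmethod descriptor, not a definitive design. It assumes that the pre- and postcondition functions take only self and raise an exception when a condition fails. The Account class is invented to exercise it.

```python
class eiffelmethod(object):
    """Hypothetical descriptor that runs pre/postcondition checks around a method."""

    def __init__(self, method, pre=None, post=None):
        self.method = method
        self.pre = pre
        self.post = post

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        descriptor = self
        def checked(*args, **kwargs):
            if descriptor.pre is not None:
                descriptor.pre(obj)      # may raise to reject the call
            result = descriptor.method(obj, *args, **kwargs)
            if descriptor.post is not None:
                descriptor.post(obj)     # may raise if the call broke an invariant
            return result
        return checked


class Account(object):
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance = self.balance - amount
        return self.balance

    def pre_withdraw(self):
        if self.balance <= 0:
            raise ValueError("precondition failed: account is empty")

    def post_withdraw(self):
        if self.balance < 0:
            raise ValueError("postcondition failed: overdrawn")

    withdraw = eiffelmethod(withdraw, pre_withdraw, post_withdraw)
```

Code that calls a.withdraw(3) sees an ordinary bound-method call. All the checking lives in the descriptor.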
Multiple Inheritance: The Diamond Rule
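Multiple inheritance has also been made more useful by changing the rules under which names are resolved. Consider this set of classes (diagram taken from PEP 253 by Guido van Rossum):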
          class A:
            ^ ^  def save(self): ...
           /   \
          /     \
         /       \
        /         \
    class B     class C:
        ^         ^  def save(self): ...
         \       /
          \     /
           \   /
            \ /
          class D
The lookup rule for classic classes is simple but not very smart: the base classes are searched depth-first, going from left to right. A reference to D.save() will search the classes D, B, and then A, where save() would be found and returned. C.save() would never be found at all. This is bad, because if C's save() method is saving some internal state specific to C, not calling it will result in that state never getting saved.
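New-style classes follow a different algorithm. It's a bit more complicated to explain, but it does the right thing in this situation. (Note that Python 2.3 changes this algorithm to one that produces the same results in most cases, but more useful results for really complicated inheritance graphs.)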
1. List all the base classes, following the classic lookup rule, and include a class multiple times if it's visited repeatedly. In the above example, the list of visited classes is [D, B, A, C, A].
2. Scan the list for duplicated classes. If any are found, remove all but one occurrence, leaving the last one in the list. In the above example, the list becomes [D, B, C, A] after dropping duplicates.
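The two steps above can be sketched directly in Python. The function names classic_order() and mro_22() are invented for this illustration. Because A derives from object here, object also appears in the lists, unlike in the diagram. As noted above, Python 2.3 and later compute the method resolution order with a different rule. For this diamond, however, the result matches the interpreter's own __mro__.

```python
def classic_order(cls):
    """Depth-first, left-to-right walk of the bases, keeping repeated visits."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order


def mro_22(cls):
    """Python 2.2 rule: of each duplicated class, keep only the last occurrence."""
    order = classic_order(cls)
    result = []
    for i in range(len(order)):
        if order[i] not in order[i + 1:]:
            result.append(order[i])
    return result


class A(object):
    def save(self):
        return "A.save"

class B(A):
    pass

class C(A):
    def save(self):
        return "C.save"

class D(B, C):
    pass
```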
Following this rule, referring to D.save() will return C.save(), which is the behaviour we're after. This lookup rule is the same as the one followed by Common Lisp. A new built-in function, super(), provides a way to get at a class's superclasses without having to reimplement Python's algorithm. The most commonly used form will be super(class, obj), which returns a bound superclass object (not the actual class object). This form will be used in methods to call a method in the superclass; for example, D's save() method would look like this:
class D(B, C):
    def save(self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
super() can also return unbound superclass objects when called as super(class) or super(class1, class2), but this probably won't often be useful.
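Below is a runnable version of the diamond in which every save() cooperates through super(). The log list passed to save() is an invented stand-in for real state-saving work.

```python
class A(object):
    def save(self, log):
        log.append('A')

class B(A):
    pass                          # inherits save() from A

class C(A):
    def save(self, log):
        log.append('C')           # C's private state would be saved here
        super(C, self).save(log)

class D(B, C):
    def save(self, log):
        log.append('D')           # D's private state would be saved here
        super(D, self).save(log)
```

With the new lookup order, super(D, self).save finds no save() on B and continues to C. C's own super() call then continues to A, so each class's save() runs exactly once, in the order D, C, A.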
Attribute Access
A fair number of sophisticated Python classes define hooks for attribute access using __getattr__(). Most commonly this is done for convenience, to make code more readable by automatically mapping an attribute access such as obj.parent into a method call such as obj.get_parent(). Python 2.2 adds some new ways of controlling attribute access.
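First, __getattr__(attr_name) is still supported by new-style classes, and nothing about it has changed. As before, it will be called when an attempt is made to access obj.foo and no attribute named foo is found in the instance's dictionary.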
New-style classes also support a new method, __getattribute__(attr_name). The difference between the two methods is that __getattribute__() is always called whenever any attribute is accessed, while the old __getattr__() is only called if foo isn't found in the instance's dictionary.
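The difference between the two hooks is easiest to see side by side. Below is a small sketch with two hypothetical classes. The __getattribute__() version delegates the real lookup to object.__getattribute__(), which is the usual way to avoid infinite recursion inside the hook.

```python
class Fallback(object):
    def __init__(self):
        self.present = 'real value'

    def __getattr__(self, name):
        # Only reached when the normal lookup fails
        return 'fallback for ' + name


class Intercept(object):
    def __init__(self):
        self.present = 'real value'

    def __getattribute__(self, name):
        # Called for every attribute access, even for attributes that exist.
        # object.__getattribute__ performs the normal lookup without recursing.
        value = object.__getattribute__(self, name)
        if isinstance(value, str):
            return value.upper()
        return value
```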
However, Python 2.2's support for properties will often be a simpler way to trap attribute references. Writing a __getattr__() method is complicated: to avoid recursion you can't use regular attribute accesses inside it, and instead have to mess around with the contents of __dict__. __getattr__() methods also end up being called by Python when it checks for other methods such as __repr__() or __coerce__(), and so have to be written with this in mind. Finally, calling a function on every attribute access results in a sizable performance loss.
property
is a new built-in type that packages up three functions that
get, set, or delete an attribute, and a docstring. For example, if you want to
define a size
attribute that's computed, but also settable, you could
write:
class C(object):
    def get_size(self):
        result = ... computation ...
        return result
    def set_size(self, size):
        ... compute something based on the size
        and set internal state appropriately ...
    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
That is certainly clearer and easier to write than a pair of __getattr__()/__setattr__() methods that check for the size attribute and handle it specially while retrieving all other attributes from the instance's __dict__. Accesses to size are also the only ones that have to perform the work of calling a function, so references to other attributes run at their usual speed.
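To make the property example above concrete, here is a runnable sketch with a hypothetical Buffer class. The "computation" is simply the length of an internal list, and setting size truncates the list or pads it with None.

```python
class Buffer(object):
    def __init__(self):
        self._slots = []

    def get_size(self):
        return len(self._slots)

    def set_size(self, size):
        # shrink by truncating, or grow by padding with None
        if size < len(self._slots):
            del self._slots[size:]
        else:
            self._slots.extend([None] * (size - len(self._slots)))

    # No deleter is given, so 'del obj.size' raises AttributeError.
    size = property(get_size, set_size, None, "Number of slots in the buffer")
```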
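Finally, you can constrain the list of attributes that can be referenced on an object by using the new __slots__ class attribute. Python objects are usually very dynamic; at any time you can define a new attribute on an instance by simply doing obj.new_attr=1. A new-style class can define a class attribute named __slots__ to limit the legal attributes to a particular set of names. An example will make this clear: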
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
Note how you get an AttributeError on the attempt to assign to an attribute not listed in __slots__.
PEP 234: Iterators
Another significant addition to 2.2 is an iteration interface at both the C and Python levels. Objects can define how they can be looped over by callers.
In Python versions up to 2.1, the usual way to make for item in obj
work is
to define a __getitem__()
method that looks something like this:
def __getitem__(self, index):
    return <next item>
__getitem__()
is more properly used to define an indexing operation on an
object so that you can write obj[5]
to retrieve the sixth element. It's a
bit misleading when you're using this only to support for
loops.
Consider some file-like object that wants to be looped over; the index
parameter is essentially meaningless, as the class probably assumes that a
series of __getitem__()
calls will be made with index incrementing by
one each time. In other words, the presence of the __getitem__()
method
doesn't mean that using file[5]
to randomly access the sixth element will
work, though it really should.
In Python 2.2, iteration can be implemented separately, and __getitem__() methods can be limited to classes that really do support random access. The basic idea of iterators is simple. A new built-in function, iter(obj) or iter(C, sentinel), is used to get an iterator. iter(obj) returns an iterator for the object obj, while iter(C, sentinel) returns an iterator that will invoke the callable object C until it returns sentinel to signal that the iterator is done.
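The sketch below illustrates the two-argument form iter(C, sentinel). It uses a hypothetical helper, make_reader(), that stands in for something like a file's read method: each call returns the next chunk, and '' once the data is exhausted. The helper relies on nested scopes, which are always enabled in 2.2.

```python
def make_reader(chunks):
    """Return a callable that hands out one chunk per call, then '' forever.

    Hypothetical helper standing in for something like a file's read method.
    """
    remaining = list(chunks)
    def read():
        if remaining:
            return remaining.pop(0)
        return ''
    return read


# iter(C, sentinel): call read() repeatedly until it returns the sentinel ''
pieces = list(iter(make_reader(['ab', 'cd', 'ef']), ''))
```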
Python classes can define an __iter__() method, which should create and return a new iterator for the object; if the object is its own iterator, this method can just return self. In particular, iterators will usually be their own iterators. Extension types implemented in C can implement a tp_iter function in order to return an iterator, and extension types that want to behave as iterators can define a tp_iternext function.
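So, after all this, what do iterators actually do? They have one required method, next(), which takes no arguments and returns the next value. When there are no more values to be returned, calling next() should raise the StopIteration exception. Here's a simple example of how iterators work: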
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
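A class can also act as its own iterator. In the sketch below (the class name Countdown is invented for illustration), __iter__() returns self and next() raises StopIteration when the values run out. The line __next__ = next is not part of the 2.2 protocol. It is included only on the assumption that you also want the class to work on Python 3, which renamed the method.

```python
class Countdown(object):
    """Iterator that yields start, start-1, ..., 1 (hypothetical example)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterator
        return self

    def next(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current = self.current - 1
        return value

    __next__ = next   # Python 3 spelling of the same method
```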
In 2.2, Python's for
statement no longer expects a sequence; it
expects something for which iter()
will return an iterator. For backward
compatibility and convenience, an iterator is automatically constructed for
sequences that don't implement __iter__()
or a tp_iter
slot, so
for i in [1,2,3]
will still work. Wherever the Python interpreter loops
over a sequence, it's been changed to use the iterator protocol. This means you
can do things like this:
>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
Iterator support has been added to some of Python's basic types. Calling iter() on a dictionary returns an iterator that loops over its keys:
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
That's just the default behaviour. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator.
In a minor related change, the in operator now works on dictionaries, so key in dict is now equivalent to dict.has_key(key).
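Files also provide an iterator, which calls the readline() method until there are no more lines in the file. This means you can now read each line of a file using code like this: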
for line in file:
    # do something for each line
    ...
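Note that you can only go forward in an iterator; there's no way to get the previous element, reset the iterator, or make a copy of it. An iterator object could provide such additional capabilities, but the iterator protocol only requires a next() method.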
See also
- PEP 234 - Iterators
Written by Ka-Ping Yee and GvR; implemented by the Python Labs crew, mostly by GvR and Tim Peters.
PEP 255: Simple Generators
Generators are another new feature, one that interacts with the introduction of iterators.
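You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.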
Here's the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
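A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function. Python's bytecode compiler detects this and compiles the function specially as a result. Because a new keyword was introduced, generators must be explicitly enabled in a module by including a from __future__ import generators statement near the top of the module's source code. In Python 2.3 this statement will become unnecessary.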
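When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield, the generator's state of execution is suspended and its local variables are preserved. On the next call to the generator's next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)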
Here's a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).
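Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. A return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.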
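Below is a small sketch of a generator that stops early with a bare return and otherwise falls off the end of the function. The function name take_until() is made up for illustration. On Python 2.2 the module would also need from __future__ import generators at the top.

```python
def take_until(items, stop):
    """Yield items until the value stop is seen (hypothetical example)."""
    for item in items:
        if item == stop:
            return      # bare return ends the generator; 'return value' is not allowed
        yield item
    # reaching the end of the function also ends the generator
```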
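You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0 and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively: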
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
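The excerpt above does not show the tree type it walks. Below is a runnable sketch that assumes a minimal, hypothetical Tree node with label, left and right attributes, where an empty subtree is None. On Python 2.2 the module would also need from __future__ import generators at the top.

```python
class Tree(object):
    """Hypothetical binary tree node with the attributes inorder() expects."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder(t):
    # Recursive generator: left subtree, then this node, then right subtree
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x


tree = Tree(4, Tree(2, Tree(1), Tree(3)), Tree(5))
labels = list(inorder(tree))
```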
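Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing $N$ queens on an $N \times N$ chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an $N \times N$ chessboard without visiting any square twice).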
The idea of generators comes from other programming languages, especially Icon (https://www.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find()
function returns the indexes at which the substring
"or" is found: 3, 23, 33. In the if
statement, i
is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.
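Below is a rough Python analogue of the Icon example, written as a sketch. Python does not retry a failed comparison automatically, so the retry becomes an explicit loop over a generator. The function names find() and first_greater() are invented here. The find() generator reports 1-based positions so the numbers match Icon's. On Python 2.2 the module would also need from __future__ import generators at the top.

```python
def find(sub, s):
    """Yield each 1-based position of sub in s, mimicking Icon's find()."""
    i = s.find(sub)
    while i >= 0:
        yield i + 1              # Icon counts positions from 1
        i = s.find(sub, i + 1)


def first_greater(values, limit):
    """Return the first value above limit, like Icon retrying a failed comparison."""
    for v in values:
        if v > limit:
            return v
    return None


sentence = "Store it in the neighboring harbor"
positions = list(find("or", sentence))
answer = first_greater(find("or", sentence), 5)
```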
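Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.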
See also
- PEP 255 - Simple Generators
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
PEP 237: Unifying Long Integers and Integers
In recent versions, the distinction between regular integers, which are 32-bit
values on most machines, and long integers, which can be of arbitrary size, was
becoming an annoyance. For example, on platforms that support files larger than
2**32
bytes, the tell()
method of file objects has to return a long
integer. However, there were various bits of Python that expected plain integers
and would raise an error if a long integer was provided instead. For example, in Python 1.5, only regular integers could be used as a slice index, and 'abc'[1L:] would raise a TypeError exception with the message 'slice index must be int'.
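Python 2.2 will shift values from short to long integers as required. The 'L' suffix is no longer needed to indicate a long integer literal, as now the compiler will choose the appropriate type. (Using the 'L' suffix will be discouraged in future 2.x versions of Python, triggering a warning in Python 2.4, and will probably be dropped in Python 3.0.) Many operations that used to raise an OverflowError will now return a long integer as their result. For example: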
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
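In most cases, integers and long integers will now be treated identically. You can still distinguish them with the type() built-in function, but that's rarely needed.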
See also
- PEP 237 - Unifying Long Integers and Integers
Written by Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van Rossum.
PEP 238: Changing the Division Operator
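The most controversial change in Python 2.2 heralds the start of an effort to fix an old design flaw that's been in Python from the beginning. Currently Python's division operator, /, behaves like C's division operator when presented with two integer arguments: it returns an integer result, rounded down when there would be a fractional part. For example, 3/2 is 1, not 1.5, and (-1)/2 is -1, not -0.5. This means that the results of division can vary unexpectedly depending on the types of the two operands. Because Python is dynamically typed, it can be difficult to determine the possible types of the operands.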
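(The controversy is over whether this is *really* a design flaw, and whether it's worth breaking existing code to fix it. It's caused endless discussions on python-dev, and in July 2001 erupted into a storm of acidly sarcastic postings on comp.lang.python. I won't argue for either side here and will stick to describing what's implemented in 2.2. Read PEP 238 for a summary of arguments and counter-arguments.)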
Because this change might break code, it's being introduced very gradually. Python 2.2 begins the transition, but the switch won't be complete until Python 3.0.
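First, I'll borrow some terminology from PEP 238. "True division" is the division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25, and so forth. "Floor division" is what Python's / operator currently does when given integer operands; the result is the floor of the value returned by true division. "Classic division" is the current mixed behaviour of /: it returns the result of floor division when both operands are integers, and the result of true division when one of the operands is a floating-point number.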
Here are the changes 2.2 introduces:
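- A new operator, //, is the floor division operator. (Yes, we know it looks like C++'s comment symbol.) // *always* performs floor division no matter what the types of its operands are, so 1 // 2 is 0 and 1.0 // 2.0 is also 0.0. // is always available in Python 2.2; you don't need to enable it using a __future__ statement.
- By including a from __future__ import division in a module, the / operator will be changed to return the result of true division, so 1/2 is 0.5. Without the __future__ statement, / still means classic division. The default meaning of / will not change until Python 3.0.
- Classes can define methods called __truediv__() and __floordiv__() to overload the two division operators. At the C level, there are also slots in the PyNumberMethods structure so extension types can define the two operators.
- Python 2.2 supports some command-line arguments for testing whether code will work with changed division semantics. Running python with -Q warn will cause a warning to be issued whenever division is applied to two integers. You can use this to find code that's affected by the change and fix it. By default, Python 2.2 will simply perform classic division without a warning; the warning will be turned on by default in Python 2.3.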
See also
- PEP 238 - Changing the Division Operator
Written by Moshe Zadka and Guido van Rossum. Implemented by Guido van Rossum.
Unicode Changes
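Python's Unicode support has been enhanced a bit in 2.2. Unicode strings are usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by supplying --enable-unicode=ucs4 to the configure script. (It's also possible to specify --disable-unicode to completely disable Unicode support.)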
When built to use UCS-4 (a "wide Python"), the interpreter can natively handle Unicode characters from U+000000 up to, but not including, U+110000, so the range of legal values for the unichr() function is expanded accordingly. Using an interpreter compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still cause unichr() to raise a ValueError exception. This is all described in PEP 261, "Support for 'wide' Unicode characters"; consult it for further details.
Another change is simpler to explain. Since their introduction, Unicode strings have supported an encode() method to convert the string to a selected encoding such as UTF-8 or Latin-1. A symmetric decode([encoding]) method has been added to 8-bit strings (though not to Unicode strings) in 2.2. decode() assumes that the string is in the specified encoding and decodes it, returning whatever is returned by the codec.
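Using this new feature, codecs have been added for tasks not directly related to Unicode. For example, codecs have been added for uu-encoding, MIME's base64 encoding, and compression with the zlib module: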
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*
end
>>> "sheesh".encode('rot-13')
'furrfu'
To convert a class instance to Unicode, a class can define a __unicode__() method, analogous to __str__().
encode(), decode(), and __unicode__() were implemented by Marc-André Lemburg. The changes to support using UCS-4 internally were implemented by Fredrik Lundh and Martin von Löwis.
See also
- PEP 261 - Support for 'wide' Unicode characters
Written by Paul Prescod.
PEP 227: Nested Scopes
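In Python 2.1, statically nested scopes were added as an optional feature, to be enabled by a from __future__ import nested_scopes directive. In 2.2 nested scopes no longer need to be specially enabled, and are now always present. The rest of this section is a copy of the description of nested scopes from my "What's New in Python 2.1" document; if you read it when 2.1 came out, you can skip the rest of this section.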
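The largest change introduced in Python 2.1, and made complete in 2.2, is to Python's scoping rules. In Python 2.0, at any given time there are at most three namespaces used to look up variable names: local, module-level, and the built-in namespace. This often surprised people because it didn't match their intuitive expectations. For example, a nested recursive function definition doesn't work: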
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
The function g()
will always raise a NameError
exception, because
the binding of the name g
isn't in either its local namespace or in the
module-level namespace. This isn't much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the lambda
expression clumsier, and this was a problem in practice.
In code which uses lambda
you can often find local variables being
copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers greatly as a result.
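The most significant change to Python 2.2 is that static scoping has been added to the language to fix this problem. As a first effect, the name=name default argument is now unnecessary in the above example. Put simply, when a given variable name is not assigned a value within a function (by an assignment, or by the def, class, or import statements), references to the variable will be looked up in the local namespace of the enclosing scope. A more detailed explanation of the rules, and a dissection of the implementation, can be found in the PEP.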
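Both earlier examples work as written once nested scopes are in effect, as the following sketch shows. The class Entries and the function count_levels() are hypothetical stand-ins for the elided code above.

```python
def count_levels(n):
    def g(value):
        # g can call itself: the name g is found in the enclosing function's scope
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g(n)


class Entries(object):
    """Hypothetical container with the list_attribute used by find()."""

    def __init__(self, items):
        self.list_attribute = items

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # No name=name default is needed: the lambda sees find()'s 'name'
        return list(filter(lambda x: x == name, self.list_attribute))
```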
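This change may cause some compatibility problems for code where the same variable name is used both at the module level and as a local variable within a function that contains further function definitions. This seems rather unlikely though, since such code would have been pretty confusing to read in the first place.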
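One side effect of the change is that the from module import * and exec statements have been made illegal inside a function scope under certain conditions. The Python reference manual has said all along that from module import * is only legal at the top level of a module, but the CPython interpreter has never enforced this before. As part of the implementation of nested scopes, the compiler that turns Python source into bytecodes has to generate different code to access variables in a containing scope. from module import * and exec make it impossible for the compiler to figure this out, because they add names to the local namespace that are unknowable at compile time. Therefore, if a function contains function definitions or lambda expressions with free variables, the compiler will flag this by raising a SyntaxError exception.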
To make the preceding explanation a bit clearer, here's an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4, which contains the exec statement, is a syntax error, since exec would define a new local variable named x whose value should be accessed by g().
This shouldn't be much of a limitation, since exec is rarely used in most Python code (and when it is used, it's often a sign of a poor design anyway).
See also
- PEP 227 - Statically Nested Scopes
Written and implemented by Jeremy Hylton.
New and Improved Modules
- The xmlrpclib module was contributed to the standard library by Fredrik Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple remote procedure call protocol built on top of HTTP and XML. For example, the following snippet retrieves a list of RSS channels from the O'Reilly Network, and then lists the recent headlines for one channel:
    import xmlrpclib
    s = xmlrpclib.Server(
          'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
    channels = s.meerkat.getChannels()
    # channels is a list of dictionaries, like this:
    # [{'id': 4, 'title': 'Freshmeat Daily News'},
    #  {'id': 190, 'title': '32Bits Online'},
    #  {'id': 4549, 'title': '3DGamers'}, ... ]
    # Get the items for one channel
    items = s.meerkat.getItems( {'channel': 4} )
    # 'items' is another list of dictionaries, like this:
    # [{'link': 'http://freshmeat.net/releases/52719/',
    #   'description': 'A utility which converts HTML to XSL FO.',
    #   'title': 'html2fo 0.3 (Default)'}, ... ]
- The SimpleXMLRPCServer module makes it easy to create straightforward XML-RPC servers. See http://xmlrpc.scripting.com/ for more information about XML-RPC.
- Several functions that originally returned lengthy tuples now return pseudo-sequences that still behave like tuples but also have mnemonic attributes such as st_mtime or tm_year. The enhanced functions include stat(), fstat(), statvfs(), and fstatvfs() in the os module, and localtime(), gmtime(), and strptime() in the time module. For example, to obtain a file's size using the old tuples, you'd end up writing something like file_size = os.stat(filename)[stat.ST_SIZE], but now this can be written more clearly as file_size = os.stat(filename).st_size. The original patch for this feature was contributed by Nick Mathewson.
- The Python profiler has been extensively reworked and various errors in its output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)
- The socket module can be compiled to support IPv6; specify the --enable-ipv6 option to Python's configure script. (Contributed by Jun-ichiro "itojun" Hagino.)
- Two new format characters were added to the struct module for 64-bit integers on platforms that support the C long long type. q is for a signed 64-bit integer, and Q is for an unsigned one. The value is returned in Python's long integer type. (Contributed by Tim Peters.)
- In the interpreter's interactive mode, there's a new built-in function help() that uses the pydoc module introduced in Python 2.1 to provide interactive help. help(object) displays any available help text about object. Calling help() with no argument puts you in an online help utility, where you can enter the names of functions, classes, or modules to read their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee's pydoc module.)
- Various bugfixes and performance improvements have been made to the SRE engine underlying the re module. For example, the re.sub() and re.split() functions have been rewritten in C. Another contributed patch speeds up certain Unicode character ranges by a factor of two. A new finditer() method has also been added that returns an iterator over all the non-overlapping matches in a given string. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von Löwis.)
- The smtplib module now supports RFC 2487, "Secure SMTP over TLS", so it's now possible to encrypt the SMTP traffic between a Python program and the mail transport agent being handed a message. smtplib also supports SMTP authentication. (Contributed by Gerhard Häring.)
- The imaplib module, maintained by Piers Lauder, has support for several new extensions: the NAMESPACE extension defined in RFC 2342, SORT, GETACL and SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)
- The rfc822 module's parsing of email addresses is now compliant with RFC 2822, an update to RFC 822. (The module's name is not going to be changed to rfc2822.) A new package, email, has also been added for parsing and generating e-mail messages. (Contributed by Barry Warsaw, and arising out of his work on Mailman.)
- The difflib module now contains a new Differ class for producing human-readable lists of changes (a "delta") between two sequences of lines of text. There are also two generator functions, ndiff() and restore(), which respectively return a delta from two sequences, or one of the original sequences from a delta. (Grunt work contributed by David Goodger, from ndiff.py code by Tim Peters who then did the generatorization.)
- New constants ascii_letters, ascii_lowercase, and ascii_uppercase were added to the string module. Several modules in the standard library used string.letters to mean the ranges A-Za-z. That assumption is incorrect when locales are in use, because string.letters varies depending on the set of legal characters defined by the current locale. The buggy modules have all been fixed to use ascii_letters instead. (Reported by an unknown person; fixed by Fred L. Drake, Jr.)
- The mimetypes module now makes it easier to use alternative MIME-type databases by the addition of a MimeTypes class, which takes a list of filenames to be parsed. (Contributed by Fred L. Drake, Jr.)
- A Timer class was added to the threading module that allows scheduling an activity to happen at some future time. (Contributed by Itamar Shtull-Trauring.)
Interpreter Changes and Fixes
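Some of the changes only affect people who deal with the Python interpreter at the C level, because they're writing Python extension modules, embedding the interpreter, or just hacking on the interpreter itself. If you only write Python code, none of the changes described here will affect you very much.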
- Profiling and tracing functions can now be implemented in C. These can run at much higher speeds than Python-based functions and should reduce the overhead of profiling and tracing. This will be of interest to authors of development environments for Python. Two new C functions were added to Python's API, PyEval_SetProfile() and PyEval_SetTrace(). The existing sys.setprofile() and sys.settrace() functions still exist, and have simply been changed to use the new C-level interface. (Contributed by Fred L. Drake, Jr.)
- Another low-level API, primarily of interest to implementors of Python debuggers and development tools, was added. PyInterpreterState_Head() and PyInterpreterState_Next() let a caller walk through all the existing interpreter objects; PyInterpreterState_ThreadHead() and PyThreadState_Next() allow looping over all the thread states for a given interpreter. (Contributed by David Beazley.)
- The C-level interface to the garbage collector has been changed to make it easier to write extension types that support garbage collection and to debug misuses of the functions. Various functions have slightly different semantics, so a bunch of functions had to be renamed. Extensions that use the old API will still compile but will not participate in garbage collection, so updating them for 2.2 should be considered a fairly high priority.
  To upgrade an extension module to the new API, perform the following steps:
  1. Rename Py_TPFLAGS_GC to Py_TPFLAGS_HAVE_GC.
  2. Use PyObject_GC_New() or PyObject_GC_NewVar() to allocate objects, and PyObject_GC_Del() to deallocate them.
  3. Rename PyObject_GC_Init() to PyObject_GC_Track() and PyObject_GC_Fini() to PyObject_GC_UnTrack().
  4. Remove PyGC_HEAD_SIZE from object size calculations.
  5. Remove calls to PyObject_AS_GC() and PyObject_FROM_GC().
- A new et format sequence was added to PyArg_ParseTuple(). et takes both a parameter and an encoding name. It converts the parameter to the given encoding if the parameter turns out to be a Unicode string, or leaves it alone if it's an 8-bit string, assuming it to already be in the desired encoding. This differs from the es format character, which assumes that 8-bit strings are in Python's default ASCII encoding and converts them to the specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS support on Windows described in the following section.)
- A different argument parsing function, PyArg_UnpackTuple(), has been added that's simpler and presumably faster. Instead of specifying a format string, the caller simply gives the minimum and maximum number of arguments expected, and a set of pointers to PyObject* variables that will be filled in with argument values.
- Two new flags, METH_NOARGS and METH_O, are available in method definition tables to simplify the implementation of methods with no arguments or a single untyped argument. Calling such methods is more efficient than calling a corresponding method that uses METH_VARARGS. Also, the old METH_OLDARGS style of writing C methods is now officially deprecated.
- Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf(), were added to provide cross-platform implementations for the relatively new snprintf() and vsnprintf() C lib APIs. In contrast to the standard sprintf() and vsprintf() functions, the Python versions check the bounds of the buffer used to protect against buffer overruns. (Contributed by M.-A. Lemburg.)
- The _PyTuple_Resize() function has lost an unused parameter, so now it takes 2 parameters instead of 3. The third argument was never used, and can simply be discarded when porting code from earlier versions to Python 2.2.
Other Changes and Fixes
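As usual, there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1 applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82 bugs. These figures are likely to be underestimates.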
Some of the more notable changes are:
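- The code for the MacOS port of Python, maintained by Jack Jansen, is now kept in the main Python CVS tree, and many changes have been made to support MacOS X.
  - The most significant change is the ability to build Python as a framework, enabled by supplying the --enable-framework option to the configure script when compiling Python. According to Jack Jansen, "This installs a self-contained Python installation plus the OS X framework 'glue' into /Library/Frameworks/Python.framework (or another location of choice). For now there is little immediate added benefit to this (actually, there is the disadvantage that you have to change your PATH to be able to find Python), but it is the basis for creating a full-blown Python application, porting the MacPython IDE, possibly using Python as a standard OSA scripting language and much more."
  - Most of the MacPython toolbox modules, which interface to MacOS APIs such as windowing, QuickTime, scripting, etc., have been ported to OS X, but they've been left commented out in setup.py. People who want to experiment with these modules can uncomment them manually.
- Keyword arguments passed to built-in functions that don't take them now cause a TypeError exception to be raised, with the message "function takes no keyword arguments".
- Weak references, added in Python 2.1 as an extension module, are now part of the core because they're used in the implementation of new-style classes. The ReferenceError exception has therefore moved from the weakref module to become a built-in exception.
- A new script, Tools/scripts/cleanfuture.py by Tim Peters, automatically removes obsolete __future__ statements from Python source code.
- An additional flags argument has been added to the built-in function compile(), so the behaviour of __future__ statements can now be correctly observed in simulated shells, such as those presented by IDLE and other development environments. This is described in PEP 264. (Contributed by Michael Hudson.)
- The new license introduced with Python 1.6 wasn't GPL-compatible. This is fixed by some minor textual changes to the 2.2 license, so it's now legal to embed Python inside a GPLed program again. Note that Python itself is not GPLed, but instead is under a license that's essentially equivalent to the BSD license, same as it always was. The license changes were also applied to the Python 2.0.1 and 2.1.1 releases.
- When presented with a Unicode filename on Windows, Python will now convert it to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is explicitly used by the file APIs, Python's choice of ASCII as the default encoding turns out to be an annoyance. On Unix, the locale's character set is used if locale.nl_langinfo(CODESET) is available. (Windows support was contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix support was added by Martin von Löwis.)
- Large file support is now enabled on Windows. (Contributed by Tim Peters.)
- The Tools/scripts/ftpmirror.py script now parses a .netrc file, if you have one. (Contributed by Mike Romberg.)
- Some features of the object returned by the xrange() function are now deprecated, and trigger warnings when they're accessed; they'll disappear in Python 2.3. xrange objects tried to pretend they were full sequence types by supporting slicing, sequence multiplication, and the in operator, but these features were rarely used and therefore buggy. The tolist() method and the start, stop, and step attributes are also being deprecated. At the C level, the fourth argument to the PyRange_New() function, repeat, has also been deprecated.
- There were a bunch of patches to the dictionary implementation, mostly to fix potential core dumps if a dictionary contains objects that sneakily changed their hash value, or mutated the dictionary they were contained in. For a while python-dev fell into a gentle rhythm: Michael Hudson found a case that dumped core, Tim Peters fixed the bug, Michael found another case, and round and round it went.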
- On Windows, Python can now be compiled with Borland C thanks to a number of patches contributed by Stephen Hansen, though the result isn't fully functional yet. (But this *is* progress...)
- Another Windows enhancement: Wise Solutions generously offered PythonLabs use of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)
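- Files ending in .pyw can now be imported on Windows. .pyw is a Windows-only thing, used to indicate that a script needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping up to display the output. This patch makes it possible to import such scripts, in case they're also usable as modules. (Implemented by David Bolen.)
- On platforms where Python uses the C dlopen() function to load extension modules, it's now possible to set the flags used by dlopen() using the sys.getdlopenflags() and sys.setdlopenflags() functions. (Contributed by Bram Stolk.)
- The pow() built-in function no longer supports 3 arguments when floating-point numbers are supplied. pow(x, y, z) returns (x**y) % z, but this is never useful for floating-point numbers, and the final result varies unpredictably depending on the platform. A call such as pow(2.0, 8.0, 7.0) will now raise a TypeError exception.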
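Below is a minimal illustration of the pow() change: three-argument pow() stays available for integers and is rejected for floating-point arguments.

```python
# Integer arguments: pow(x, y, z) computes (x ** y) % z.
result = pow(2, 8, 7)          # 2 ** 8 is 256, and 256 % 7 is 4

# Floating-point arguments are rejected with TypeError.
try:
    pow(2.0, 8.0, 7.0)
    float_rejected = False
except TypeError:
    float_rejected = True
```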
Acknowledgements
The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg, Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.