What's New in Python 2.3

Author

A.M. Kuchling

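This article explains the new features in Python 2.3. Python 2.3 was released on July 29, 2003.

The main themes for Python 2.3 are polishing some of the features added in 2.2, adding various small but useful enhancements to the core language, and expanding the standard library. The new object model introduced in the previous version has benefited from 18 months of bug fixes and from optimization efforts that have improved the performance of new-style classes. A few new built-in functions have been added, such as sum() and enumerate(). The in operator can now be used for substring searches (for example, "ab" in "abc" returns True).

Some of the many new library features include Boolean, set, heap, and date/time data types, the ability to import modules from ZIP-format archives, metadata support for the long-awaited Python catalog, an updated version of IDLE, and modules for logging messages, wrapping text, parsing CSV files, processing command-line options, using BerkeleyDB databases... the list of new and enhanced modules is lengthy.

This article doesn't attempt to provide a complete specification of the new features, but instead provides a convenient overview. For full details, you should refer to the documentation for Python 2.3, such as the Python Library Reference and the Python Reference Manual. If you want to understand the complete implementation and design rationale, refer to the PEP for a particular new feature.

PEP 218: A Standard Set Datatype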

The new sets module contains an implementation of a set datatype. The Set class is for mutable sets, sets that can have members added and removed. The ImmutableSet class is for sets that can't be modified, and instances of ImmutableSet can therefore be used as dictionary keys. Sets are built on top of dictionaries, so the elements within a set must be hashable.

Here's a simple example:

>>> import sets
>>> S = sets.Set([1,2,3])
>>> S
Set([1, 2, 3])
>>> 1 in S
True
>>> 0 in S
False
>>> S.add(5)
>>> S.remove(3)
>>> S
Set([1, 2, 5])
>>>

The union and intersection of sets can be computed with the union() and intersection() methods; an alternative notation uses the bitwise operators | (union) and & (intersection). Mutable sets also have in-place versions of these methods, union_update() and intersection_update().

>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([4,5,6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
>>>

It's also possible to take the symmetric difference of two sets. This is the set of all elements in the union that aren't in the intersection. Another way of putting it is that the symmetric difference contains all elements that are in exactly one set. Again, there's an alternative notation (^), and an in-place version with the ungainly name symmetric_difference_update().

>>> S1 = sets.Set([1,2,3,4])
>>> S2 = sets.Set([3,4,5,6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
>>> S1 ^ S2
Set([1, 2, 5, 6])
>>>

There are also issubset() and issuperset() methods for checking whether one set is a subset or superset of another:

>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([2,3])
>>> S2.issubset(S1)
True
>>> S1.issubset(S2)
False
>>> S1.issuperset(S2)
True
>>>

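See also

PEP 218 - Adding a Built-In Set Object Type

PEP written by Greg V. Wilson. Implemented by Greg V. Wilson, Alex Martelli, and GvR.

PEP 255: Simple Generators

In Python 2.2, generators were added as an optional feature, to be enabled by a from __future__ import generators directive. In 2.3 generators no longer need to be specially enabled and are now always present; this means that yield is now always a keyword. The rest of this section is a copy of the description of generators from the "What's New in Python 2.2" document; if you read it back when Python 2.2 came out, you can skip the rest of this section.

You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function: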

def generate_ints(N):
    for i in range(N):
        yield i

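A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; this is detected by Python's bytecode compiler, which compiles the function specially as a result.

When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's .next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)

Here's a sample usage of the generate_ints() generator: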

>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "stdin", line 1, in ?
  File "stdin", line 2, in generate_ints
StopIteration

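You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).

Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.

You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively: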

# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

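Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing N queens on an NxN chessboard so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an NxN chessboard without visiting any square twice).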
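To make the comparison with a hand-written class concrete, here is a minimal sketch of a class-based counterpart to generate_ints(). The class name IntCounter is made up for illustration, and the code assumes a current Python 3 interpreter, where the iterator method is spelled __next__() rather than the next() method of Python 2.3.

```python
class IntCounter:
    """Hand-written iterator mimicking generate_ints(n) (hypothetical name)."""

    def __init__(self, n):
        self.n = n
        self.count = 0          # stands in for the generator's local variable i

    def __iter__(self):
        return self

    def __next__(self):         # spelled next() in Python 2.3
        if self.count >= self.n:
            raise StopIteration
        value = self.count
        self.count += 1
        return value


def generate_ints(n):
    # The generator version: the interpreter keeps i alive between calls.
    for i in range(n):
        yield i


from_class = list(IntCounter(3))
from_generator = list(generate_ints(3))
```

Both objects yield the same values; the class simply has to carry by hand the state that the generator keeps in its suspended frame.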

The idea of generators comes from other programming languages, especially Icon (https://www.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like:

sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)

In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.

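Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.

See also

PEP 255 - Simple Generators

Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.

PEP 263: Source Code Encodings

Python source files can now be declared as being in different character set encodings. Encodings are declared by including a specially formatted comment in the first or second line of the source file. For example, a UTF-8 file can be declared with: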

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

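Without such an encoding declaration, the default encoding used is 7-bit ASCII. Executing or importing modules that contain string literals with 8-bit characters and have no encoding declaration will result in a DeprecationWarning being signalled by Python 2.3; in 2.4 this will be a syntax error.

The encoding declaration only affects Unicode string literals, which will be converted to Unicode using the specified encoding. Note that Python identifiers are still restricted to ASCII characters, so you can't have variable names that use characters outside of the usual alphanumerics.

See also

PEP 263 - Defining Python Source Code Encodings

Written by Marc-André Lemburg and Martin von Löwis; implemented by Suzuki Hisao and Martin von Löwis.

PEP 273: Importing Modules from ZIP Archives

The new zipimport module adds support for importing modules from a ZIP-format archive. You don't need to import the module explicitly; it will be automatically imported if a ZIP archive's filename is added to sys.path. For example: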

amk@nyman:~/src/python$ unzip -l /tmp/example.zip
Archive:  /tmp/example.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
     8467  11-26-02 22:30   jwzthreading.py
 --------                   -------
     8467                   1 file
amk@nyman:~/src/python$ ./python
Python 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
>>>

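An entry in sys.path can now be the filename of a ZIP archive. The ZIP archive can contain any kind of files, but only files named *.py, *.pyc, or *.pyo can be imported. If an archive only contains *.py files, Python will not attempt to modify the archive by adding the corresponding *.pyc file, meaning that if a ZIP archive doesn't contain *.pyc files, importing may be rather slow.

A path within the archive can also be specified to only import from a subdirectory; for example, the path /tmp/example.zip/lib/ would only import from the lib/ subdirectory within the archive.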
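The same workflow can be reproduced without any files prepared in advance. The following is a small sketch, assuming a current Python 3 interpreter; the archive and the module name zipdemo_mod are invented for the example and are not part of the original article.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive holding one tiny module (illustrative names).
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "example.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipdemo_mod.py",
                "def greet():\n    return 'hello from the archive'\n")

sys.path.insert(0, archive)        # add the .zip file to the front of the path
importlib.invalidate_caches()      # make sure the new path entry is noticed
zipdemo_mod = importlib.import_module("zipdemo_mod")

message = zipdemo_mod.greet()
origin = zipdemo_mod.__file__      # a path that points inside the archive
```

As in the session above, the module's __file__ attribute names a location inside the ZIP file rather than a regular file on disk.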

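See also

PEP 273 - Import Modules from Zip Archives

Written by James C. Ahlstrom, who also provided an implementation. Python 2.3 follows the specification in PEP 273, but uses an implementation written by Just van Rossum that uses the import hooks described in PEP 302. See section PEP 302: New Import Hooks for a description of the new import hooks.

PEP 277: Unicode file name support for Windows NT

On Windows NT, 2000, and XP, the system stores file names as Unicode strings. Traditionally, Python has represented file names as byte strings, which is inadequate because it renders some file names inaccessible.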

Python now allows using arbitrary Unicode strings (within the limitations of the file system) for all functions that expect file names, most notably the open() built-in function. If a Unicode string is passed to os.listdir(), Python now returns a list of Unicode strings. A new function, os.getcwdu(), returns the current directory as a Unicode string.

Byte strings still work as file names, and on Windows Python will transparently convert them to Unicode using the mbcs encoding.

Other systems also allow Unicode strings as file names but convert them to byte strings before passing them to the system, which can cause a UnicodeError to be raised. Applications can test whether arbitrary Unicode strings are supported as file names by checking os.path.supports_unicode_filenames, a Boolean value.

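Under MacOS, os.listdir() may now return Unicode filenames.

See also

PEP 277 - Unicode file name support for Windows NT

Written by Neil Hodgson; implemented by Neil Hodgson, Martin von Löwis, and Mark Hammond.

PEP 278: Universal Newline Support

The three major operating systems used today are Microsoft Windows, Apple's Macintosh OS, and the various Unix derivatives. A minor irritation of cross-platform work is that these three platforms all use different characters to mark the ends of lines in text files. Unix uses the linefeed (ASCII character 10), MacOS uses the carriage return (ASCII character 13), and Windows uses a two-character sequence of a carriage return plus a newline.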

Python's file objects can now support end of line conventions other than the one followed by the platform on which Python is running. Opening a file with the mode 'U' or 'rU' will open a file for reading in universal newlines mode. All three line ending conventions will be translated to a '\n' in the strings returned by the various file methods such as read() and readline().

Universal newline support is also used when importing modules and when executing a file with the execfile() function. This means that Python modules can be shared between all three operating systems without needing to convert the line-endings.

This feature can be disabled when compiling Python by specifying the --without-universal-newlines switch when running Python's configure script.
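The translation of line endings can be observed with a file that deliberately mixes all three conventions. This is only a sketch and assumes a current Python 3 interpreter, where universal newline translation is the default for files opened in text mode (newline=None), so no 'U' mode flag is passed; the file contents are made up.

```python
import os
import tempfile

# One file containing Unix, old MacOS, and Windows line endings.
raw = b"unix line\nold Mac line\rWindows line\r\nlast line"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# newline=None selects universal newlines: every ending is read as '\n'.
with open(path, "r", newline=None) as f:
    lines = f.read().split("\n")

os.remove(path)
```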

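See also

PEP 278 - Universal Newline Support

Written and implemented by Jack Jansen.

PEP 279: enumerate()

A new built-in function, enumerate(), will make certain loops a bit clearer. enumerate(thing), where thing is either an iterator or a sequence, returns an iterator that will return (0, thing[0]), (1, thing[1]), (2, thing[2]), and so forth.

A common idiom to change every element of a list looks like this: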

for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result

This can be rewritten using enumerate() as:

for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
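As a concrete instance of this idiom, the following sketch doubles every element of a list in place; the data and the doubling step are arbitrary choices made for illustration.

```python
L = [1, 2, 3]
for i, item in enumerate(L):
    # "compute some result based on item": here, simply double it
    L[i] = item * 2

# enumerate() works on any iterable, such as a string
pairs = list(enumerate("abc"))
```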

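See also

PEP 279 - The enumerate() built-in function

Written and implemented by Raymond D. Hettinger.

PEP 282: The logging Package

A standard package for writing logs, logging, has been added to Python 2.3. It provides a powerful and flexible mechanism for generating logging output which can then be filtered and processed in various ways. A configuration file written in a standard format can be used to control the logging behavior of a program. Python includes handlers that will write log records to standard error or to a file or socket, send them to the system log, or even e-mail them to a particular address; of course, it's also possible to write your own handler classes.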

The Logger class is the primary class. Most application code will deal with one or more Logger objects, each one used by a particular subsystem of the application. Each Logger is identified by a name, and names are organized into a hierarchy using . as the component separator. For example, you might have Logger instances named server, server.auth and server.network. The latter two instances are below server in the hierarchy. This means that if you turn up the verbosity for server or direct server messages to a different handler, the changes will also apply to records logged to server.auth and server.network. There's also a root Logger that's the parent of all other loggers.

To simplify usage, the logging package provides some convenience functions that always use the root logger:

import logging

logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')

This produces the following output:

WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down

In the default configuration, informational and debugging messages are suppressed and the output is sent to standard error. You can enable the display of informational and debugging messages by calling the setLevel() method on the root logger.

Notice the warning() call's use of string formatting operators; all of the functions for logging messages take the arguments (msg, arg1, arg2, ...) and log the string resulting from msg % (arg1, arg2, ...).

There's also an exception() function that records the most recent traceback. Any of the other functions will also record the traceback if you specify a true value for the keyword argument exc_info.

def f():
    try:    1/0
    except: logging.exception('Problem recorded')

f()

This produces the following output:

ERROR:root:Problem recorded
Traceback (most recent call last):
  File "t.py", line 6, in f
    1/0
ZeroDivisionError: integer division or modulo by zero

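Slightly more advanced programs will use a logger other than the root logger. The getLogger(name) function is used to get a particular logger, creating it if it doesn't exist yet. getLogger(None) returns the root logger: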

log = logging.getLogger('server')
 ...
log.info('Listening on port %i', port)
 ...
log.critical('Disk full')
 ...

Log records are usually propagated up the hierarchy, so a message logged to server.auth is also seen by server and root, but a Logger can prevent this by setting its propagate attribute to False.

There are more classes provided by the logging package that can be customized. When a Logger instance is told to log a message, it creates a LogRecord instance that is sent to any number of different Handler instances. Loggers and handlers can also have an attached list of filters, and each filter can cause the LogRecord to be ignored or can modify the record before passing it along. When they're finally output, LogRecord instances are converted to text by a Formatter class. All of these classes can be replaced by your own specially-written classes.
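The pieces described above (a logger hierarchy, a custom Handler, a Formatter, and the propagate attribute) can be combined in a few lines. The sketch below assumes a current Python 3 interpreter; the ListHandler class and the logger names demo_server and demo_server.auth are hypothetical and exist only for this illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted records in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

server = logging.getLogger("demo_server")
server.setLevel(logging.INFO)      # debug messages are filtered out
server.propagate = False           # do not pass records on to the root logger
server.addHandler(handler)

# A child logger: it has no handler of its own, so its records travel up
# the hierarchy to demo_server's handler.
auth = logging.getLogger("demo_server.auth")

server.debug("dropped: below the INFO threshold")
server.info("Listening on port %i", 8080)
auth.warning("Login failed for %s", "guest")
```

After these calls handler.lines holds two formatted records, one from each logger; the debug message never reaches the handler.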

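With all of these features the logging package should provide enough flexibility for even the most complicated applications. This is only an incomplete overview of its features, so please see the package's reference documentation for all of the details. Reading PEP 282 will also be helpful.

See also

PEP 282 - A Logging System

Written by Vinay Sajip and Trent Mick; implemented by Vinay Sajip.

PEP 285: A Boolean Type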

A Boolean type was added to Python 2.3. Two new constants were added to the __builtin__ module, True and False. (True and False constants were added to the built-ins in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of 1 and 0 and aren't a different type.)

The type object for this new type is named bool; the constructor for it takes any Python value and converts it to True or False:

>>> bool(1)
True
>>> bool(0)
False
>>> bool([])
False
>>> bool( (1,) )
True

Most of the standard library modules and built-in functions have been changed to return Booleans:

>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False

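Python's Booleans were added with the primary goal of making code clearer. For example, if you're reading a function and encounter the statement return 1, you might wonder whether the 1 represents a Boolean truth value, an index, or a coefficient that multiplies some other quantity. If the statement is return True, however, the meaning of the return value is quite clear.

Python's Booleans were not added for the sake of strict type-checking. A very strict language such as Pascal would also prevent you from performing arithmetic with Booleans, and would require that the expression in an if statement always evaluate to a Boolean result. Python is not this strict and never will be, as PEP 285 explicitly says. This means you can still use any expression in an if statement, even ones that evaluate to a list or tuple or some random object. The Boolean type is a subclass of the int class, so arithmetic using a Boolean still works: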

>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75

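To sum up True and False in a sentence: they're alternative ways to spell the integer values 1 and 0, with the single difference that str() and repr() return the strings 'True' and 'False' instead of '1' and '0'.

See also

PEP 285 - Adding a bool type

Written and implemented by GvR.

PEP 293: Codec Error Handling Callbacks

When encoding a Unicode string into a byte string, unencodable characters may be encountered. So far, Python has allowed specifying the error processing as either "strict" (raising UnicodeError), "ignore" (skipping the character), or "replace" (using a question mark in the output string), with "strict" being the default behavior. It may be desirable to specify alternative processing of such errors, such as inserting an XML character reference or HTML entity reference into the converted string.

Python now has a flexible framework to add different processing strategies. New error handlers can be added with codecs.register_error(), and codecs can then access the error handler with codecs.lookup_error(). The error handler gets the necessary state information such as the string being converted, the position in the string where the error was detected, and the target encoding. The handler can then either raise an exception or return a replacement string.

Two additional error handlers have been implemented using this framework: "backslashreplace" uses Python backslash quoting to represent unencodable characters and "xmlcharrefreplace" emits XML character references.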
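The two built-in handlers and a custom one can be compared side by side. The following sketch assumes a current Python 3 interpreter (so str plays the role of the Unicode string); the handler name demo_star and the function star_replace are invented for illustration.

```python
import codecs

text = "caf\u00e9"      # the final character cannot be encoded as ASCII

xml_bytes = text.encode("ascii", "xmlcharrefreplace")
backslash_bytes = text.encode("ascii", "backslashreplace")


def star_replace(exc):
    """Hypothetical handler: replace each unencodable character with '*'."""
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        # Return the replacement text and the position at which to resume.
        return ("*" * len(bad), exc.end)
    raise exc


codecs.register_error("demo_star", star_replace)
star_bytes = text.encode("ascii", "demo_star")
```

The handler receives the exception object, which carries the string being converted and the offending range, and returns a pair made of the replacement and the index where encoding should continue.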

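See also

PEP 293 - Codec Error Handling Callbacks

Written and implemented by Walter Dörwald.

PEP 301: Package Index and Metadata for Distutils

Support for the long-requested Python catalog makes its first appearance in 2.3.

The heart of the catalog is the new Distutils register command. Running python setup.py register will collect the metadata describing a package, such as its name, version, maintainer, and description, and send it to a central catalog server. The resulting catalog is available from https://pypi.org.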

To make the catalog a bit more useful, a new optional classifiers keyword argument has been added to the Distutils setup() function. A list of Trove-style strings can be supplied to help classify the software.

Here's an example setup.py with classifiers, written to be compatible with older versions of the Distutils:

from distutils import core
kw = {'name': "Quixote",
      'version': "0.5.1",
      'description': "A highly Pythonic Web application framework",
      # ...
      }

if (hasattr(core, 'setup_keywords') and
    'classifiers' in core.setup_keywords):
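    # Assign a list; a trailing comma after the closing bracket would
    # turn the value into a one-element tuple.
    kw['classifiers'] = [
        'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
        'Environment :: No Input/Output (Daemon)',
        'Intended Audience :: Developers',
    ]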

core.setup(**kw)

The full list of classifiers can be obtained by running python setup.py register --list-classifiers.

See also

PEP 301 - Package Index and Metadata for Distutils

Written and implemented by Richard Jones.

PEP 302: New Import Hooks

While it's been possible to write custom import hooks ever since the ihooks module was introduced in Python 1.3, no one has ever been really happy with it because writing new import hooks is difficult and messy. There have been various proposed alternatives such as the imputil and iu modules, but none of them has ever gained much acceptance, and none of them were easily usable from C code.

PEP 302 borrows ideas from its predecessors, especially from Gordon McMillan's iu module. Three new items are added to the sys module:

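  • sys.path_hooks is a list of callable objects; most often they'll be classes. Each callable takes a string containing a path and either returns an importer object that will handle imports from this path or raises an ImportError exception if it can't handle this path.

  • sys.path_importer_cache caches importer objects for each path, so sys.path_hooks will only need to be traversed once for each path.

  • sys.meta_path is a list of importer objects that will be traversed before sys.path is checked. This list is initially empty, but user code can add objects to it. Additional built-in and frozen modules can be imported by an object added to this list.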

Importer objects must have a single method, find_module(fullname, path=None). fullname will be a module or package name, e.g. string or distutils.core. find_module() must return a loader object that has a single method, load_module(fullname), that creates and returns the corresponding module object.

Pseudo-code for Python's new import logic, therefore, looks something like this (simplified a bit; see PEP 302 for the full details):

for mp in sys.meta_path:
    loader = mp.find_module(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            <module> = loader.load_module(fullname)

# Not found!
raise ImportError
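The importer/loader protocol can be exercised in isolation, without touching the interpreter's real import machinery. The sketch below is a toy model of the sys.meta_path loop from the pseudo-code above, assuming a current Python 3 interpreter; DictImporter, DictLoader, toy_import, and the module name demo_answers are all hypothetical names introduced for this example, and a plain list stands in for sys.meta_path.

```python
import types


class DictLoader:
    """Hypothetical loader: builds a module object from a source string."""

    def __init__(self, source):
        self.source = source

    def load_module(self, fullname):
        module = types.ModuleType(fullname)
        exec(self.source, module.__dict__)
        return module


class DictImporter:
    """Hypothetical importer: finds modules in a dict of name -> source."""

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        if fullname in self.sources:
            return DictLoader(self.sources[fullname])
        return None        # signal that this importer cannot help


def toy_import(fullname, meta_path):
    # Mirrors the first loop of the pseudo-code above.
    for importer in meta_path:
        loader = importer.find_module(fullname)
        if loader is not None:
            return loader.load_module(fullname)
    raise ImportError("no importer could find %r" % fullname)


meta_path = [DictImporter({"demo_answers": "ANSWER = 42\n"})]
module = toy_import("demo_answers", meta_path)
```

An importer only answers the question "can you provide this module?", while the loader it returns does the work of creating the module object.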

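See also

PEP 302 - New Import Hooks

Written by Just van Rossum and Paul Moore. Implemented by Just van Rossum.

PEP 305: Comma-separated Files

Comma-separated files are a format frequently used for exporting data from databases and spreadsheets. Python 2.3 adds a parser for comma-separated files.

Comma-separated format is deceptively simple at first glance: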

Costs,150,200,3.95

Read a line and call line.split(','): what could be simpler? But toss in string data that can contain commas, and things get more complicated:

"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"

A big ugly regular expression can parse this, but using the new csv package is much simpler:

import csv

input = open('datafile', 'rb')
reader = csv.reader(input)
for line in reader:
    print line

The reader() function takes a number of different options. The field separator isn't limited to the comma and can be changed to any character, and so can the quoting and line-ending characters.

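Different dialects of comma-separated files can be defined and registered; currently there are two dialects, both used by Microsoft Excel. A separate csv.writer class will generate comma-separated files from a succession of tuples or lists, quoting strings that contain the delimiter.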
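The quoting behavior of the writer is easiest to see by writing the earlier example row to an in-memory buffer and reading it back. This is a minimal sketch assuming a current Python 3 interpreter and the default dialect; the use of io.StringIO in place of a real file is a choice made for the example.

```python
import csv
import io

row = ["Costs", 150, 200, 3.95, "Includes taxes, shipping, and sundry items"]

buffer = io.StringIO()
csv.writer(buffer).writerow(row)
written = buffer.getvalue()     # only the field containing commas is quoted

# Reading the text back splits it into fields again (all as strings).
parsed = next(csv.reader(io.StringIO(written)))
```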

See also

PEP 305 - CSV File API

Written and implemented by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.

PEP 307: Pickle Enhancements

The pickle and cPickle modules received some attention during the 2.3 development cycle. In 2.2, new-style classes could be pickled without difficulty, but they weren't pickled very compactly; PEP 307 quotes a trivial example where a new-style class results in a pickled string three times longer than that for a classic class.

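The solution was to invent a new pickle protocol. The pickle.dumps() function has supported a text-or-binary flag for a long time. In 2.3, this flag is redefined from a Boolean to an integer: 0 is the old text-mode pickle format, 1 is the old binary format, and now 2 is a new 2.3-specific format. A new constant, pickle.HIGHEST_PROTOCOL, can be used to select the fanciest protocol available.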

Unpickling is no longer considered a safe operation. 2.2's pickle provided hooks for trying to prevent unsafe classes from being unpickled (specifically, a __safe_for_unpickling__ attribute), but none of this code was ever audited and therefore it's all been ripped out in 2.3. You should not unpickle untrusted data in any version of Python.

To reduce the pickling overhead for new-style classes, a new interface for customizing pickling was added using three special methods: __getstate__(), __setstate__(), and __getnewargs__(). Consult PEP 307 for the full semantics of these methods.

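As a way to compress pickles yet further, it's now possible to use integer codes instead of long strings to identify pickled classes. The Python Software Foundation will maintain a list of standardized codes; there's also a range of codes for private use. Currently no codes have been specified.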
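Selecting a protocol is a matter of passing the integer to pickle.dumps(). The sketch below assumes a current Python 3 interpreter, in which pickle.HIGHEST_PROTOCOL is a number larger than 2; the record being pickled is made-up sample data, and only data produced locally is unpickled.

```python
import pickle

record = {"name": "Quixote", "version": (0, 5, 1), "tags": ["web", "framework"]}

text_pickle = pickle.dumps(record, 0)       # old text-mode format
binary_pickle = pickle.dumps(record, 1)     # old binary format
proto2_pickle = pickle.dumps(record, 2)     # the format introduced in 2.3
newest_pickle = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol by itself; no flag is needed when reading.
restored = [pickle.loads(p) for p in
            (text_pickle, binary_pickle, proto2_pickle, newest_pickle)]
```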

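See also

PEP 307 - Extensions to the pickle protocol

Written and implemented by Guido van Rossum and Tim Peters.

Extended Slices

Ever since Python 1.4, the slicing syntax has supported an optional third "step" or "stride" argument. For example, these are all legal Python syntax: L[1:10:2], L[:-1:1], L[::-1]. This was added to Python at the request of the developers of Numerical Python, which uses the third argument extensively. However, Python's built-in list, tuple, and string sequence types have never supported this feature, raising a TypeError if you tried it. Michael Hudson contributed a patch to fix this shortcoming.

For example, you can now easily extract the elements of a list that have even indexes: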

>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]

Negative values also work to make a copy of the same list in reverse order:

>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

This also works for tuples, arrays, and strings:

>>> s='abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'

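If you have a mutable sequence such as a list or an array you can assign to or delete an extended slice, but there are some differences between assignment to extended and regular slices. Assignment to a regular slice can be used to change the length of the sequence: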

>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]

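Extended slices aren't this flexible. When assigning to an extended slice, the list on the right-hand side of the statement must contain the same number of items as the slice it is replacing: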

>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0,1,2]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2

Deletion is more straightforward:

>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]

One can also now pass slice objects to the __getitem__() methods of the built-in sequences:

>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]

Or use slice objects directly in subscripts:

>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]

To simplify implementing sequences that support extended slicing, slice objects now have a method indices(length) which, given the length of a sequence, returns a (start, stop, step) tuple that can be passed directly to range(). indices() handles omitted and out-of-bounds indices in a manner consistent with regular slices (and this innocuous phrase hides a welter of confusing details!). The method is intended to be used like this:

class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            return self.calc_item(item)

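From this example you can also see that the built-in slice object is now the type object for the slice type, and is no longer a function. This is consistent with Python 2.2, where int, str, etc., underwent the same change.

Other Language Changes

Here are all of the changes that Python 2.3 makes to the core Python language.

  • The yield statement is now always a keyword, as described in section PEP 255: Simple Generators of this document.

  • A new built-in function enumerate() was added, as described in section PEP 279: enumerate() of this document.

  • Two new constants, True and False, were added along with the built-in bool type, as described in section PEP 285: A Boolean Type of this document.

  • The int() type constructor will now return a long integer instead of raising an OverflowError when a string or floating-point number is too large to fit into an integer. This can lead to the paradoxical result that isinstance(int(expression), int) is false, but that seems unlikely to cause problems in practice.

  • Built-in types now support the extended slicing syntax, as described in section Extended Slices of this document.

  • A new built-in function, sum(iterable, start=0), adds up the numeric items in the iterable object and returns their sum. sum() only accepts numbers, meaning that you can't use it to concatenate a bunch of strings. (Contributed by Alex Martelli.)

  • list.insert(pos, value) used to insert value at the front of the list when pos was negative. The behaviour has now been changed to be consistent with slice indexing, so when pos is -1 the value will be inserted before the last element, and so forth.

  • list.index(value), which searches for value within the list and returns its index, now takes optional start and stop arguments to limit the search to only part of the list.

  • Dictionaries have a new method, pop(key[, default]), that returns the value corresponding to key and removes that key/value pair from the dictionary. If the requested key isn't present in the dictionary, default is returned if it's specified and KeyError is raised if it isn't: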

    >>> d = {1:2}
    >>> d
    {1: 2}
    >>> d.pop(4)
    Traceback (most recent call last):
      File "stdin", line 1, in ?
    KeyError: 4
    >>> d.pop(1)
    2
    >>> d.pop(1)
    Traceback (most recent call last):
      File "stdin", line 1, in ?
    KeyError: 'pop(): dictionary is empty'
    >>> d
    {}
    >>>
    

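    There's also a new class method, dict.fromkeys(iterable, value), that creates a dictionary with keys taken from the supplied iterator iterable and all values set to value, defaulting to None.

    (Patches contributed by Raymond Hettinger.)

    Also, the dict() constructor now accepts keyword arguments to simplify creating small dictionaries: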

    >>> dict(red=1, blue=2, green=3, black=4)
    {'blue': 2, 'black': 4, 'green': 3, 'red': 1}
    

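    (Contributed by Just van Rossum.)

  • The assert statement no longer checks the __debug__ flag, so you can no longer disable assertions by assigning to __debug__. Running Python with the -O switch will still generate code that doesn't execute any assertions.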

  • Most type objects are now callable, so you can use them to create new objects such as functions, classes, and modules. (This means that the new module can be deprecated in a future Python version, because you can now use the type objects available in the types module.) For example, you can create a new module object with the following code:

    >>> import types
    >>> m = types.ModuleType('abc','docstring')
    >>> m
    <module 'abc' (built-in)>
    >>> m.__doc__
    'docstring'
    
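  • A new warning, PendingDeprecationWarning, was added to indicate features which are in the process of being deprecated. The warning will not be printed by default. To check for use of features that will be deprecated in the future, supply -Walways::PendingDeprecationWarning:: on the command line or use warnings.filterwarnings().

  • The process of deprecating string-based exceptions, as in raise "Error occurred", has begun. Raising a string will now trigger PendingDeprecationWarning.

  • Using None as a variable name will now result in a SyntaxWarning warning. In a future version of Python, None may finally become a keyword.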

  • The xreadlines() method of file objects, introduced in Python 2.1, is no longer necessary because files now behave as their own iterator. xreadlines() was originally introduced as a faster way to loop over all the lines in a file, but now you can simply write for line in file_obj. File objects also have a new read-only encoding attribute that gives the encoding used by the file; Unicode strings written to the file will be automatically converted to bytes using the given encoding.

  • The method resolution order used by new-style classes has changed, though you'll only notice the difference if you have a really complicated inheritance hierarchy. Classic classes are unaffected by this change. Python 2.2 originally used a topological sort of a class's ancestors, but 2.3 now uses the C3 algorithm as described in the paper "A Monotonic Superclass Linearization for Dylan". To understand the motivation for this change, read Michele Simionato's article "Python 2.3 Method Resolution Order", or read the thread on python-dev starting with the message at https://mail.python.org/pipermail/python-dev/2002-October/029035.html. Samuele Pedroni first pointed out the problem and also implemented the fix by coding the C3 algorithm.

  • Python runs multithreaded programs by switching between threads after executing N bytecodes. The default value for N has been increased from 10 to 100 bytecodes, speeding up single-threaded applications by reducing the switching overhead. Some multithreaded applications may suffer slower response time, but that's easily fixed by setting the limit back to a lower number using sys.setcheckinterval(N). The limit can be retrieved with the new sys.getcheckinterval() function.

  • One minor but far-reaching change is that the names of extension types defined by the modules included with Python now contain the module and a '.' in front of the type name. For example, in Python 2.2, if you created a socket and printed its __class__, you'd get this output:

    >>> s = socket.socket()
    >>> s.__class__
    <type 'socket'>
    

    In 2.3, you get this:

    >>> s.__class__
    <type '_socket.socket'>
    
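  • One of the noted incompatibilities between old- and new-style classes has been removed: you can now assign to the __name__ and __bases__ attributes of new-style classes. There are some restrictions on what can be assigned to __bases__ along the lines of those relating to assigning to an instance's __class__ attribute.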
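Several of the smaller changes in this list (sum(), negative positions in list.insert(), the start and stop arguments of list.index(), dict.pop(), and dict.fromkeys()) fit into one short sketch. The values are arbitrary sample data, and the code assumes a current Python 3 interpreter, where these calls behave as described above.

```python
total = sum([1, 2, 3], 10)            # the start value 10 is added to the items

items = ["a", "b", "d"]
items.insert(-1, "c")                 # inserted before the last element

letters = ["x", "y", "x", "y"]
second_x = letters.index("x", 1)      # search begins at index 1
later_y = letters.index("y", 2, 4)    # search limited to indexes 2 and 3

d = {"k": 1}
popped = d.pop("k")                   # returns the value and removes the key
fallback = d.pop("missing", "default")

flags = dict.fromkeys(["red", "blue"], 0)
unset = dict.fromkeys("ab")           # values default to None
```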

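String Changes

  • The in operator now works differently for strings. Previously, when evaluating X in Y where X and Y are strings, X could only be a single character. That's now changed; X can be a string of any length, and X in Y will return True if X is a substring of Y. If X is the empty string, the result is always True.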

    >>> 'ab' in 'abcd'
    True
    >>> 'ad' in 'abcd'
    False
    >>> '' in 'abcd'
    True
    

    Note that this doesn't tell you where the substring starts; if you need that information, use the find() string method.

  • The strip(), lstrip(), and rstrip() string methods now have an optional argument for specifying the characters to strip. The default is still to remove all whitespace characters:

    >>> '   abc '.strip()
    'abc'
    >>> '><><abc<><><>'.strip('<>')
    'abc'
    >>> '><><abc<><><>\n'.strip('<>')
    'abc<><><>\n'
    >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
    u'\u4001abc'
    >>>
    

    (Suggested by Simon Brunning and implemented by Walter Dörwald.)

  • The startswith() and endswith() string methods now accept negative numbers for the start and end parameters.

  • Another new string method is zfill(), originally a function in the string module. zfill() pads a numeric string with zeros on the left until it's the specified width. Note that the % operator is still more flexible and powerful than zfill().

    >>> '45'.zfill(4)
    '0045'
    >>> '12345'.zfill(4)
    '12345'
    >>> 'goofy'.zfill(6)
    '0goofy'
    

    (Contributed by Walter Dörwald.)

  • A new type object, basestring, has been added. Both 8-bit strings and Unicode strings inherit from this type, so isinstance(obj, basestring) will return True for either kind of string. It's a completely abstract type, so you can't create basestring instances.

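  • Interned strings are no longer immortal and will now be garbage-collected in the usual way when the only reference to them is from the internal dictionary of interned strings. (Implemented by Oren Tirosh.)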
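The negative start and end arguments accepted by startswith() and endswith() count from the end of the string, just as slice indexes do. A short sketch with a made-up file name, assuming a current Python 3 interpreter:

```python
name = "report_2003.txt"

# end=-4 excludes the last four characters (".txt") from the test
ends_with_year = name.endswith("2003", 0, -4)

# start=-4 restricts the test to the last four characters
has_txt_suffix = name.startswith(".txt", -4)

padded = "7".zfill(3)                 # pad with zeros on the left
stripped = "xxhixx".strip("x")        # strip a chosen character set
```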

Optimizations

  • The creation of new-style class instances has been made much faster; they're now faster than classic classes!

  • The sort() method of list objects has been extensively rewritten by Tim Peters, and the implementation is significantly faster.

  • Multiplication of large long integers is now much faster thanks to an implementation of Karatsuba multiplication, an algorithm that scales better than the O(n*n) required for the grade-school multiplication algorithm. (Original patch by Christopher A. Craig, and significantly reworked by Tim Peters.)

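  • The SET_LINENO opcode is now gone. This may provide a small speed increase, depending on your compiler's idiosyncrasies. See section Other Changes and Fixes for a longer explanation. (Removed by Michael Hudson.)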

  • xrange() objects now have their own iterator, making for i in xrange(n) slightly faster than for i in range(n). (Patch by Raymond Hettinger.)

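  • A number of small rearrangements have been made in various hotspots to improve performance, such as inlining a function or removing some code. (Implemented mostly by GvR, but lots of people have contributed single changes.)

The net result of the 2.3 optimizations is that Python 2.3 runs the pystone benchmark around 25% faster than Python 2.2.

New, Improved, and Deprecated Modules

As usual, Python's standard library received a number of enhancements and bug fixes. Here's a partial list of the most notable changes, sorted alphabetically by module name. Consult the Misc/NEWS file in the source tree for a more complete list of changes, or look through the CVS logs for all the details.

  • The array module now supports arrays of Unicode characters using the 'u' format character. Arrays also now support using the += assignment operator to add another array's contents, and the *= assignment operator to repeat an array. (Contributed by Jason Orendorff.)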

  • The bsddb module has been replaced by version 4.1.6 of the PyBSDDB package, providing a more complete interface to the transactional features of the BerkeleyDB library.

    The old version of the module has been renamed to bsddb185 and is no longer built automatically; you'll have to edit Modules/Setup to enable it. Note that the new bsddb package is intended to be compatible with the old module, so be sure to file bugs if you discover any incompatibilities. When upgrading to Python 2.3, if the new interpreter is compiled with a new version of the underlying BerkeleyDB library, you will almost certainly have to convert your database files to the new version. You can do this fairly easily with the new scripts db2pickle.py and pickle2db.py which you will find in the distribution's Tools/scripts directory. If you've already been using the PyBSDDB package and importing it as bsddb3, you will have to change your import statements to import it as bsddb.

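  • The new bz2 module is an interface to the bz2 data compression library. bz2-compressed data is usually smaller than corresponding zlib-compressed data. (Contributed by Gustavo Niemeyer.)

  • A set of standard date/time types has been added in the new datetime module. See the following section for more details.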

  • The Distutils Extension class now supports an extra constructor argument named depends for listing additional source files that an extension depends on. This lets Distutils recompile the module if any of the dependency files are modified. For example, if sampmodule.c includes the header file sample.h, you would create the Extension object like this:

    ext = Extension("samp",
                    sources=["sampmodule.c"],
                    depends=["sample.h"])
    

    Modifying sample.h would then cause the module to be recompiled. (Contributed by Jeremy Hylton.)

  • Other minor changes to Distutils: it now checks for the CC, CFLAGS, CPP, LDFLAGS, and CPPFLAGS environment variables, using them to override the settings in Python's configuration (contributed by Robert Weber).

  • Previously the doctest module would only search the docstrings of public methods and functions for test cases, but it now also examines private ones as well. The DocTestSuite() function creates a unittest.TestSuite object from a set of doctest tests.

  • The new gc.get_referents(object) function returns a list of all the objects referenced by object.

  • The getopt module gained a new function, gnu_getopt(), that supports the same arguments as the existing getopt() function but uses GNU-style scanning mode. The existing getopt() stops processing options as soon as a non-option argument is encountered, but in GNU-style mode processing continues, meaning that options and arguments can be mixed. For example:

    >>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
    ([('-f', 'filename')], ['output', '-v'])
    >>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
    ([('-f', 'filename'), ('-v', '')], ['output'])
    

    (Contributed by Peter Åstrand.)

  • The grp, pwd, and resource modules now return enhanced tuples:

    >>> import grp
    >>> g = grp.getgrnam('amk')
    >>> g.gr_name, g.gr_gid
    ('amk', 500)
    
  • The gzip module can now handle files exceeding 2 GiB.

  • The new heapq module contains an implementation of a heap queue algorithm. A heap is an array-like data structure that keeps items in a partially sorted order such that, for every index k, heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]. This makes it quick to remove the smallest item, and inserting a new item while maintaining the heap property is O(lg n). (See https://xlinux.nist.gov/dads//HTML/priorityque.html for more information about the priority queue data structure.)

    The heapq module provides heappush() and heappop() functions for adding and removing items while maintaining the heap property on top of some other mutable Python sequence type. Here's an example that uses a Python list:

    >>> import heapq
    >>> heap = []
    >>> for item in [3, 7, 5, 11, 1]:
    ...    heapq.heappush(heap, item)
    ...
    >>> heap
    [1, 3, 5, 11, 7]
    >>> heapq.heappop(heap)
    1
    >>> heapq.heappop(heap)
    3
    >>> heap
    [5, 7, 11]
    

    (Contributed by Kevin O'Connor.)
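
The heap property described for heapq can be put to work in a small heapsort. The following is only a sketch in present-day Python syntax; heapsort() and is_heap() are hypothetical helper names, not part of the module.

```python
import heapq

def is_heap(heap):
    """Check heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k."""
    n = len(heap)
    return all(heap[k] <= heap[child]
               for k in range(n)
               for child in (2 * k + 1, 2 * k + 2)
               if child < n)

def heapsort(items):
    """Push every item onto a heap, then pop them off smallest first."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    return [heapq.heappop(heap) for _ in range(len(heap))]

heap = []
for item in [3, 7, 5, 11, 1]:
    heapq.heappush(heap, item)

print(is_heap(heap))                 # True
print(heapsort([3, 7, 5, 11, 1]))    # [1, 3, 5, 7, 11]
```

Each push and pop costs O(lg n), so sorting n items this way takes O(n lg n) time overall.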

  • The IDLE integrated development environment has been updated using the code from the IDLEfork project (http://idlefork.sourceforge.net). The most notable feature is that the code being developed is now executed in a subprocess, meaning that there's no longer any need for manual reload() operations. IDLE's core code has been incorporated into the standard library as the idlelib package.

  • The imaplib module now supports IMAP over SSL. (Contributed by Piers Lauder and Tino Lange.)

  • The itertools module contains a number of useful functions for use with iterators, inspired by various functions provided by the ML and Haskell languages. For example, itertools.ifilter(predicate, iterator) returns all elements in the iterator for which the function predicate() returns True, and itertools.repeat(obj, N) returns obj N times. There are a number of other functions in the module; see the module's reference documentation for details. (Contributed by Raymond Hettinger.)
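
A few of the itertools functions can be sketched as follows. The example uses present-day Python syntax and restricts itself to repeat(), count(), islice(), takewhile() and chain(); itertools.ifilter() is left out because it is not available under that name in Python 3.

```python
import itertools

# repeat(obj, N) yields the same object N times.
repeated = list(itertools.repeat("spam", 3))

# count() is an endless counter; islice() takes a finite slice of it.
first_five = list(itertools.islice(itertools.count(10), 5))

# takewhile() stops at the first element that fails the predicate.
below_five = list(itertools.takewhile(lambda n: n < 5, itertools.count()))

# chain() walks through several iterables one after another.
chained = list(itertools.chain("ab", [1, 2]))

print(repeated)      # ['spam', 'spam', 'spam']
print(first_five)    # [10, 11, 12, 13, 14]
print(below_five)    # [0, 1, 2, 3, 4]
print(chained)       # ['a', 'b', 1, 2]
```

All of these return lazy iterators, which is why the endless count() can be combined safely with islice() and takewhile().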

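  • Two new functions in the math module, degrees(rads) and radians(degs), convert between radians and degrees. Other functions in the math module, such as math.sin() and math.cos(), have always required input values measured in radians. Also, an optional base argument was added to math.log() to make it easier to compute logarithms for bases other than e and 10. (Contributed by Raymond Hettinger.)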
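
The math additions can be tried out with a short sketch in present-day Python syntax. Results are compared with a small tolerance rather than exactly, because they are floating-point values.

```python
import math

# Convert a quarter turn from radians to degrees, and a half turn back.
quarter_turn_deg = math.degrees(math.pi / 2)
half_turn_rad = math.radians(180.0)

# The optional second argument to log() selects the base.
log2_of_8 = math.log(8, 2)

# sin() still expects radians, so convert degrees first.
sin_30_degrees = math.sin(math.radians(30.0))

print(abs(quarter_turn_deg - 90.0) < 1e-9)
print(abs(half_turn_rad - math.pi) < 1e-9)
print(abs(log2_of_8 - 3.0) < 1e-9)
print(abs(sin_30_degrees - 0.5) < 1e-9)
```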

  • Several new POSIX functions (getpgid(), killpg(), lchown(), loadavg(), major(), makedev(), minor(), and mknod()) were added to the posix module that underlies the os module. (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)

  • In the os module, the *stat() family of functions can now report fractions of a second in a timestamp. Such time stamps are represented as floats, similar to the value returned by time.time().

    During testing, it was found that some applications will break if time stamps are floats. For compatibility, when using the tuple interface of the stat_result, time stamps will be represented as integers. When using named fields (a feature first introduced in Python 2.2), time stamps are still represented as integers, unless os.stat_float_times() is invoked to enable float return values:

    >>> os.stat("/tmp").st_mtime
    1034791200
    >>> os.stat_float_times(True)
    >>> os.stat("/tmp").st_mtime
    1034791200.6335014
    

    In Python 2.4, the default will change to always returning floats.

    Application developers should enable this feature only if all their libraries work properly when confronted with floating point time stamps, or if they use the tuple API. If used, the feature should be activated on an application level instead of trying to enable it on a per-use basis.

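  • The optparse module contains a new parser for command-line arguments that can convert option values to a particular Python type and will automatically generate a usage message. See the following section for more details.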

  • The old and never-documented linuxaudiodev module has been deprecated, and a new version named ossaudiodev has been added. The module was renamed because the OSS sound drivers can be used on platforms other than Linux, and the interface has also been tidied and brought up to date in various ways. (Contributed by Greg Ward and Nicholas FitzRoy-Dale.)

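  • The new platform module contains a number of functions that try to determine various properties of the platform you're running on. There are functions for getting the architecture, CPU type, the Windows OS version, and even the Linux distribution version. (Contributed by Marc-André Lemburg.)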

  • The parser objects provided by the pyexpat module can now optionally buffer character data, resulting in fewer calls to your character data handler and therefore faster performance. Setting the parser object's buffer_text attribute to True will enable buffering.

  • The sample(population, k) function was added to the random module. population is a sequence or xrange object containing the elements of a population, and sample() chooses k elements from the population without replacing chosen elements. k can be any value up to len(population). For example:

    >>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
    >>> random.sample(days, 3)      # Choose 3 elements
    ['St', 'Sn', 'Th']
    >>> random.sample(days, 7)      # Choose 7 elements
    ['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
    >>> random.sample(days, 7)      # Choose 7 again
    ['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
    >>> random.sample(days, 8)      # Can't choose eight
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "random.py", line 414, in sample
          raise ValueError, "sample larger than population"
    ValueError: sample larger than population
    >>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
    [3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
    

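    The random module now uses a new algorithm, the Mersenne Twister, implemented in C. It's faster and more extensively studied than the previous algorithm.

    (All changes contributed by Raymond Hettinger.)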

  • The readline module also gained a number of new functions: get_history_item(), get_current_history_length(), and redisplay().

  • The rexec and Bastion modules have been declared dead, and attempts to import them will fail with a RuntimeError. New-style classes provide new ways to break out of the restricted execution environment provided by rexec, and no one has interest in fixing them or time to do so. If you have applications using rexec, rewrite them to use something else.

    (Sticking with Python 2.2 or 2.1 will not make your applications any safer because there are known bugs in the rexec module in those versions. To repeat: if you're using rexec, stop using it immediately.)

  • The rotor module has been deprecated because the algorithm it uses for encryption is not believed to be secure. If you need encryption, use one of the several AES Python modules that are available separately.

  • The shutil module gained a move(src, dest) function that recursively moves a file or directory to a new location.
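
A rough illustration of shutil.move() follows, written in present-day Python syntax. It works inside a temporary directory so nothing outside it is touched; the directory and file names are made up for the example.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
try:
    # Build a small directory tree to move.
    src = os.path.join(workdir, "project")
    os.mkdir(src)
    with open(os.path.join(src, "notes.txt"), "w") as f:
        f.write("hello\n")

    # Move the whole directory, contents included, to a new location.
    dest = os.path.join(workdir, "archive")
    shutil.move(src, dest)

    moved_ok = os.path.exists(os.path.join(dest, "notes.txt"))
    source_gone = not os.path.exists(src)
finally:
    # Clean up the temporary directory whatever happens.
    shutil.rmtree(workdir)

print(moved_ok, source_gone)
```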

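  • Support for more advanced POSIX signal handling was added to the signal module, but then removed again because it proved impossible to make it work reliably across platforms.

  • The socket module now supports timeouts. You can call the settimeout(t) method on a socket object to set a timeout of t seconds. Subsequent socket operations that take longer than t seconds to complete will abort and raise a socket.timeout exception.

    The original timeout implementation was by Tim O'Malley. Michael Gilfix integrated it into the Python socket module and shepherded it through a lengthy review. After the code was checked in, Guido van Rossum rewrote parts of it. (This is a good example of a collaborative development process in action.)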
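
The timeout behaviour of settimeout() can be observed without any network access. This sketch, in present-day Python syntax, relies on socket.socketpair() to obtain two connected sockets; that function is an assumption of the example and is not mentioned in the article.

```python
import socket

# Two connected sockets; nothing is ever sent on `right`.
left, right = socket.socketpair()
try:
    left.settimeout(0.2)          # give up after 0.2 seconds
    try:
        left.recv(1024)           # blocks, since no data will arrive
        timed_out = False
    except socket.timeout:
        timed_out = True
finally:
    left.close()
    right.close()

print(timed_out)
```

Without the settimeout() call, recv() would block indefinitely here.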

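  • On Windows, the socket module now ships with Secure Sockets Layer (SSL) support.

  • The value of the C PYTHON_API_VERSION macro is now exposed at the Python level as sys.api_version. The current exception can be cleared by calling the new sys.exc_clear() function.

  • The new tarfile module allows reading from and writing to tar-format archive files. (Contributed by Lars Gustäbel.)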
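
A small round trip through tarfile might look like the sketch below, written in present-day Python syntax. The file names are invented for the example and everything happens in a temporary directory.

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
try:
    # Create a file to archive (binary mode keeps the bytes identical
    # on every platform).
    member_path = os.path.join(workdir, "hello.txt")
    with open(member_path, "wb") as f:
        f.write(b"hello, tar\n")

    # Write an uncompressed tar archive containing that file.
    archive_path = os.path.join(workdir, "example.tar")
    archive = tarfile.open(archive_path, "w")
    archive.add(member_path, arcname="hello.txt")
    archive.close()

    # Read the archive back and pull the member's contents out again.
    archive = tarfile.open(archive_path, "r")
    names = archive.getnames()
    content = archive.extractfile("hello.txt").read()
    archive.close()
finally:
    shutil.rmtree(workdir)

print(names)
print(content)
```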

  • The new textwrap module contains functions for wrapping strings containing paragraphs of text. The wrap(text, width) function takes a string and returns a list containing the text split into lines of no more than the chosen width. The fill(text, width) function returns a single string, reformatted to fit into lines no longer than the chosen width. (As you can guess, fill() is built on top of wrap().) For example:

    >>> import textwrap
    >>> paragraph = "Not a whit, we defy augury: ... more text ..."
    >>> textwrap.wrap(paragraph, 60)
    ["Not a whit, we defy augury: there's a special providence in",
     "the fall of a sparrow. If it be now, 'tis not to come; if it",
     ...]
    >>> print textwrap.fill(paragraph, 35)
    Not a whit, we defy augury: there's
    a special providence in the fall of
    a sparrow. If it be now, 'tis not
    to come; if it be not to come, it
    will be now; if it be not now, yet
    it will come: the readiness is all.
    >>>
    

    The module also contains a TextWrapper class that actually implements the text wrapping strategy. Both the TextWrapper class and the wrap() and fill() functions support a number of additional keyword arguments for fine-tuning the formatting; consult the module's documentation for details. (Contributed by Greg Ward.)
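
To give an idea of the keyword arguments, here is a sketch in present-day Python syntax that uses TextWrapper with initial_indent and subsequent_indent to format a bullet-style paragraph. The sample text is made up.

```python
import textwrap

text = ("The readiness is all. " * 4).strip()

# Indent the first line with a marker and align the following lines.
wrapper = textwrap.TextWrapper(width=30,
                               initial_indent="* ",
                               subsequent_indent="  ")
lines = wrapper.wrap(text)

for line in lines:
    print(line)

# fill() gives the same result as joining wrap()'s lines with newlines.
same_as_fill = textwrap.fill(text, 30) == "\n".join(textwrap.wrap(text, 30))
print(same_as_fill)
```

The width limit applies to the whole line, so the indentation counts toward the 30 characters.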

  • The thread and threading modules now have companion modules, dummy_thread and dummy_threading, that provide a do-nothing implementation of the thread module's interface for platforms where threads are not supported. The intention is to simplify thread-aware modules (ones that don't rely on threads to run) by putting the following code at the top:

    try:
        import threading as _threading
    except ImportError:
        import dummy_threading as _threading
    

    In this example, _threading is used as the module name to make it clear that the module being used is not necessarily the actual threading module. Code can call functions and use classes in _threading whether or not threads are supported, avoiding an if statement and making the code slightly clearer. This module will not magically make multithreaded code run without threads; code that waits for another thread to return or to do something will simply hang forever.

  • The time module's strptime() function has long been an annoyance because it uses the platform C library's strptime() implementation, and different platforms sometimes have odd bugs. Brett Cannon contributed a portable implementation that's written in pure Python and should behave identically on all platforms.
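
As a brief sketch of time.strptime() in present-day Python syntax, the call below parses a timestamp using only numeric format codes, which do not depend on the locale. The date chosen is the Python 2.3 release date.

```python
import time

# Parse a timestamp with purely numeric fields.
parsed = time.strptime("2003-07-29 14:30:00", "%Y-%m-%d %H:%M:%S")

print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)   # 2003 7 29
print(parsed.tm_hour, parsed.tm_min, parsed.tm_sec)    # 14 30 0
```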

  • The new timeit module helps measure how long snippets of Python code take to execute. The timeit.py file can be run directly from the command line, or the module's Timer class can be imported and used directly. Here's a short example that figures out whether it's faster to convert an 8-bit string to Unicode by appending an empty Unicode string to it or by using the unicode() function:

    import timeit
    
    timer1 = timeit.Timer('unicode("abc")')
    timer2 = timeit.Timer('"abc" + u""')
    
    # Run three trials
    print timer1.repeat(repeat=3, number=100000)
    print timer2.repeat(repeat=3, number=100000)
    
    # On my laptop this outputs:
    # [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
    # [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
    
  • The Tix module has received various bug fixes and updates for the current version of the Tix package.

  • The Tkinter module now works with a thread-enabled version of Tcl. Tcl's threading model requires that widgets only be accessed from the thread in which they're created; accesses from another thread can cause Tcl to panic. For certain Tcl interfaces, Tkinter will now automatically avoid this when a widget is accessed from a different thread by marshalling a command, passing it to the correct thread, and waiting for the results. Other interfaces can't be handled automatically but Tkinter will now raise an exception on such an access so that you can at least find out about the problem. See https://mail.python.org/pipermail/python-dev/2002-December/031107.html for a more detailed explanation of this change. (Implemented by Martin von Löwis.)

  • Calling Tcl methods through _tkinter no longer returns only strings. Instead, if Tcl returns other objects those objects are converted to their Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj object if no Python equivalent exists. This behavior can be controlled through the wantobjects() method of tkapp objects.

    When using _tkinter through the Tkinter module (as most Tkinter applications will), this feature is always activated. It should not cause compatibility problems, since Tkinter would always convert string results to Python types where possible.

    If any incompatibilities are found, the old behavior can be restored by setting the wantobjects variable in the Tkinter module to false before creating the first tkapp object.

    import Tkinter
    Tkinter.wantobjects = 0
    

    Any breakage caused by this change should be reported as a bug.

  • The UserDict module has a new DictMixin class which defines all dictionary methods for classes that already have a minimum mapping interface. This greatly simplifies writing classes that need to be substitutable for dictionaries, such as the classes in the shelve module.

    Adding the mix-in as a superclass provides the full dictionary interface whenever the class defines __getitem__(), __setitem__(), __delitem__(), and keys(). For example:

    >>> import UserDict
    >>> class SeqDict(UserDict.DictMixin):
    ...     """Dictionary lookalike implemented with lists."""
    ...     def __init__(self):
    ...         self.keylist = []
    ...         self.valuelist = []
    ...     def __getitem__(self, key):
    ...         try:
    ...             i = self.keylist.index(key)
    ...         except ValueError:
    ...             raise KeyError
    ...         return self.valuelist[i]
    ...     def __setitem__(self, key, value):
    ...         try:
    ...             i = self.keylist.index(key)
    ...             self.valuelist[i] = value
    ...         except ValueError:
    ...             self.keylist.append(key)
    ...             self.valuelist.append(value)
    ...     def __delitem__(self, key):
    ...         try:
    ...             i = self.keylist.index(key)
    ...         except ValueError:
    ...             raise KeyError
    ...         self.keylist.pop(i)
    ...         self.valuelist.pop(i)
    ...     def keys(self):
    ...         return list(self.keylist)
    ...
    >>> s = SeqDict()
    >>> dir(s)      # See that other dictionary methods are implemented
    ['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
     '__init__', '__iter__', '__len__', '__module__', '__repr__',
     '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
     'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
     'setdefault', 'update', 'valuelist', 'values']
    

    (Contributed by Raymond Hettinger.)

  • The DOM implementation in xml.dom.minidom can now generate XML output in a particular encoding by providing an optional encoding argument to the toxml() and toprettyxml() methods of DOM nodes.
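
The encoding argument to toxml() can be sketched as follows in present-day Python syntax. One assumption to note: in Python 3 the method returns bytes when an encoding is given, which may differ from the 2.3-era behaviour. The sample document is invented.

```python
import xml.dom.minidom

# A tiny document containing one non-ASCII character (e with acute accent).
document = xml.dom.minidom.parseString("<greeting>caf\u00e9</greeting>")

# Serialize the same document in two different encodings.
as_utf8 = document.toxml(encoding="utf-8")
as_latin1 = document.toxml(encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)
```

The chosen encoding is also recorded in the XML declaration at the start of the output.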

  • The xmlrpclib module now supports an XML-RPC extension for handling nil data values such as Python's None. Nil values are always supported on unmarshalling an XML-RPC response. To generate requests containing None, you must supply a true value for the allow_none parameter when creating a Marshaller instance.

  • The new DocXMLRPCServer module allows writing self-documenting XML-RPC servers. Run it in demo mode (as a program) to see it in action. Pointing the Web browser to the RPC server produces pydoc-style documentation; pointing xmlrpclib to the server allows invoking the actual methods. (Contributed by Brian Quinlan.)

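  • Support for internationalized domain names (RFCs 3454, 3490, 3491, and 3492) has been added. The "idna" encoding can be used to convert between a Unicode domain name and the ASCII-compatible encoding (ACE) of that name:

    >>> u"www.Alliancefrançaise.nu".encode("idna")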
    'www.xn--alliancefranaise-npb.nu'
    

    The socket module has also been extended to transparently convert Unicode hostnames to the ACE version before passing them to the C library. Modules that deal with hostnames, such as httplib and ftplib, also support Unicode host names; httplib also sends HTTP Host headers using the ACE version of the domain name. urllib supports Unicode URLs with non-ASCII host names as long as the path part of the URL is ASCII only.

    To implement this change, the stringprep module, the mkstringprep tool and the punycode encoding have been added.

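Date/Time Type

Date and time types suitable for expressing timestamps were added as the datetime module. The types don't support different calendars or many fancy features, and just stick to the basics of representing time.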

The three primary types are: date, representing a day, month, and year; time, consisting of hour, minute, and second; and datetime, which contains all the attributes of both date and time. There's also a timedelta class representing differences between two points in time, and time zone logic is implemented by classes inheriting from the abstract tzinfo class.

You can create instances of date and time by either supplying keyword arguments to the appropriate constructor, e.g. datetime.date(year=1972, month=10, day=15), or by using one of a number of class methods. For example, the date.today() class method returns the current local date.

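Once created, instances of the date/time classes are all immutable. There are a number of methods for producing formatted strings from objects: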

>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'

The replace() method allows modifying one or more fields of a date or datetime instance, returning a new instance:

>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour = 12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
>>>

Instances can be compared, hashed, and converted to strings (the result is the same as that of isoformat()). date and datetime instances can be subtracted from each other, and added to timedelta instances. The largest missing feature is that there's no standard library support for parsing strings and getting back a date or datetime.
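
The comparison and arithmetic rules can be sketched as follows, in present-day Python syntax. The two dates are taken from this article (the 2.3 release date and the date shown in the earlier examples); the variable names are illustrative.

```python
import datetime

release = datetime.date(year=2003, month=7, day=29)
draft = datetime.date(2002, 12, 30)

# Subtracting two dates gives a timedelta.
gap = release - draft
print(gap.days)                      # 211

# Adding a timedelta to a date gives another date.
one_week_later = release + datetime.timedelta(days=7)
print(one_week_later.isoformat())    # 2003-08-05

# Instances compare chronologically and convert to ISO-format strings.
print(draft < release)                        # True
print(str(release) == release.isoformat())    # True

# Because they are hashable, dates can serve as dictionary keys.
events = {release: "Python 2.3 released"}
print(events[datetime.date(2003, 7, 29)])
```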

For more information, refer to the module's reference documentation. (Contributed by Tim Peters.)

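The optparse Module

The getopt module provides simple parsing of command-line arguments. The new optparse module (originally named Optik) provides more elaborate command-line parsing that follows the Unix conventions, automatically creates the output for --help, and can perform different actions for different options.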

You start by creating an instance of OptionParser and telling it what your program's options are.

import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')

Parsing a command line is then done by calling the parse_args() method.

options, args = op.parse_args(sys.argv[1:])
print options
print args

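This returns an object containing all of the option values, and a list of strings containing the remaining arguments.

Invoking the script with the various arguments now works as you'd expect it to. Note that the length argument is automatically converted to an integer.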

$ ./python opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./python opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
$

The help message is automatically generated for you:

$ ./python opt.py --help
usage: opt.py [options]

options:
  -h, --help            show this help message and exit
  -iINPUT, --input=INPUT
                        set input filename
  -lLENGTH, --length=LENGTH
                        set maximum length of output
$

See the module's documentation for more details.
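
As a self-contained variation on the example above, the sketch below passes an explicit argument list to parse_args() instead of sys.argv, and shows a different action for one option. The -v/--verbose flag and the default values are hypothetical additions for illustration; the code uses present-day Python syntax.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option('-i', '--input',
                  action='store', type='string', dest='input',
                  help='set input filename')
parser.add_option('-l', '--length',
                  action='store', type='int', dest='length', default=80,
                  help='set maximum length of output')
# 'store_true' records only whether the flag was present.
parser.add_option('-v', '--verbose',
                  action='store_true', dest='verbose', default=False,
                  help='print progress messages')

# Parse a fixed list so the result does not depend on the real command line.
options, args = parser.parse_args(['-i', 'data', '--length=4', '-v', 'arg1'])

print(options.input)     # data
print(options.length)    # 4
print(options.verbose)   # True
print(args)              # ['arg1']
```

Option values are read as attributes of the returned object, named after each option's dest.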

Optik was written by Greg Ward, with suggestions from the readers of the Getopt SIG.

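Pymalloc: A Specialized Object Allocator

Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a feature added to Python 2.1. Pymalloc is intended to be faster than the system malloc() and to have less memory overhead for allocation patterns typical of Python programs. The allocator uses C's malloc() function to get large pools of memory and then fulfills smaller memory requests from these pools.

In 2.1 and 2.2, pymalloc was an experimental feature and wasn't enabled by default; you had to explicitly enable it when compiling Python by providing the --with-pymalloc option to the configure script. In 2.3, pymalloc has had further enhancements and is now enabled by default; you'll have to supply --without-pymalloc to disable it.

This change is transparent to code written in Python; however, pymalloc may expose bugs in C extensions. Authors of C extension modules should test their code with pymalloc enabled, because some incorrect code may cause core dumps at runtime.

There's one particularly common error that causes problems. There are a number of memory allocation functions in Python's C API that have previously just been aliases for the C library's malloc() and free(), meaning that if you accidentally called mismatched functions the error wouldn't be noticeable. When the object allocator is enabled, these functions aren't aliases of malloc() and free() any more, and calling the wrong function to free memory may get you a core dump. For example, if memory was allocated using PyObject_Malloc(), it has to be freed using PyObject_Free(), not free(). A few modules included with Python fell afoul of this and had to be fixed; doubtless there are more third-party modules that will have the same problem.

As part of this change, the confusing multiple interfaces for allocating memory have been consolidated down into two API families. Memory allocated with one family must not be manipulated with functions from the other family. There is one family for allocating chunks of memory and another family of functions specifically for allocating Python objects.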

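Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides debugging features to catch memory overwrites and doubled frees in both extension modules and in the interpreter itself. To enable this support, compile a debugging version of the Python interpreter by running configure with the --with-pydebug option.

To aid extension writers, a header file Misc/pymemcompat.h is distributed with the source to Python 2.3 that allows Python extensions to use the 2.3 interfaces to memory allocation while compiling against any version of Python since 1.5.2. You would copy the file from Python's source distribution and bundle it with the source of your extension.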

See also

https://hg.python.org/cpython/file/default/Objects/obmalloc.c

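For the full details of pymalloc's implementation, see the comments at the top of the file Objects/obmalloc.c in the Python source code. The above link points to the file within the python.org source browser.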

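Build and C API Changes

Changes to Python's build process and to the C API include:

  • The cycle detection implementation used by the garbage collection has proven to be stable, so it's now been made mandatory. You can no longer compile Python without it, and the --with-cycle-gc switch to configure has been removed.

  • Python can now optionally be built as a shared library (libpython2.3.so) by supplying --enable-shared when running Python's configure script. (Contributed by Ondrej Palkovsky.)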

  • The DL_EXPORT and DL_IMPORT macros are now deprecated. Initialization functions for Python extension modules should now be declared using the new macro PyMODINIT_FUNC, while the Python core will generally use the PyAPI_FUNC and PyAPI_DATA macros.

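  • The interpreter can be compiled without any docstrings for the built-in functions and modules by supplying --without-doc-strings to the configure script. This makes the Python executable about 10% smaller, but also means that you can't get help for Python's built-ins. (Contributed by Gustavo Niemeyer.)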

  • The PyArg_NoArgs() macro is now deprecated, and code that uses it should be changed. For Python 2.2 and later, the method definition table can specify the METH_NOARGS flag, signalling that there are no arguments, and the argument checking can then be removed. If compatibility with pre-2.2 versions of Python is important, the code could use PyArg_ParseTuple(args, "") instead, but this will be slower than using METH_NOARGS.

  • PyArg_ParseTuple() accepts new format characters for various sizes of unsigned integers: B for unsigned char, H for unsigned short int, I for unsigned int, and K for unsigned long long.

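  • A new function, PyObject_DelItemString(mapping, char *key), was added as shorthand for PyObject_DelItem(mapping, PyString_New(key)).

  • File objects now manage their internal string buffer differently, increasing it exponentially when needed. This results in the benchmark tests in Lib/test/test_bufio.py speeding up considerably (from 57 seconds to 1.7 seconds, according to one measurement).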

  • It's now possible to define class and static methods for a C extension type by setting either the METH_CLASS or METH_STATIC flags in a method's PyMethodDef structure.

  • Python now includes a copy of the Expat XML parser's source code, removing any dependence on a system version or local installation of Expat.

  • If you dynamically allocate type objects in your extension, you should be aware of a change in the rules relating to the __module__ and __name__ attributes. In summary, you will want to ensure the type's dictionary contains a '__module__' key; making the module name the part of the type name leading up to the final period will no longer have the desired effect. For more detail, read the API reference documentation or the source.

Port-Specific Changes

Support for a port to IBM's OS/2 using the EMX runtime environment was merged into the main Python source tree. EMX is a POSIX emulation layer over the OS/2 system APIs. The Python port for EMX tries to support all the POSIX-like capability exposed by the EMX runtime, and mostly succeeds; fork() and fcntl() are restricted by the limitations of the underlying emulation layer. The standard OS/2 port, which uses IBM's Visual Age compiler, also gained support for case-sensitive import semantics as part of the integration of the EMX port into CVS. (Contributed by Andrew MacIntyre.)

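On MacOS, most toolbox modules have been weaklinked to improve backward compatibility. This means that modules will no longer fail to load if a single routine is missing on the current OS version. Instead, calling the missing routine will raise an exception. (Contributed by Jack Jansen.)

The RPM spec files, found in the Misc/RPM/ directory in the Python source distribution, were updated for 2.3. (Contributed by Sean Reifschneider.)

Other new platforms now supported by Python include AtheOS (http://www.atheos.cx/), GNU/Hurd, and OpenVMS.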

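Other Changes and Fixes

As usual, there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 523 patches applied and 514 bugs fixed between Python 2.2 and 2.3. Both figures are likely to be underestimates.

Some of the more notable changes are:

  • If the PYTHONINSPECT environment variable is set, the Python interpreter will enter the interactive prompt after running a Python program, as if Python had been invoked with the -i option. The environment variable can be set before running the Python interpreter, or it can be set by the Python program as part of its execution.

  • The regrtest.py script now provides a way to allow "all resources except foo." A resource name passed to the -u option can now be prefixed with a hyphen ('-') to mean "remove this resource." For example, the option '-uall,-bsddb' could be used to enable the use of all resources except bsddb.

  • The tools used to build the documentation now work under Cygwin as well as Unix.

  • The SET_LINENO opcode has been removed. Back in the mists of time, this opcode was needed to produce line numbers in tracebacks and support trace functions (for, e.g., pdb). Since Python 1.5, the line numbers in tracebacks have been computed using a different mechanism that works with "python -O". For Python 2.3 Michael Hudson implemented a similar scheme to determine when to call the trace function, removing the need for SET_LINENO entirely.

    It would be difficult to detect any resulting difference from Python code, apart from a slight speed up when Python is run without -O.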

    C extensions that access the f_lineno field of frame objects should instead call PyCode_Addr2Line(f->f_code, f->f_lasti). This will have the added effect of making the code work as desired under "python -O" in earlier versions of Python.

    A nifty new feature is that trace functions can now assign to the f_lineno attribute of frame objects, changing the line that will be executed next. A jump command has been added to the pdb debugger taking advantage of this new feature. (Implemented by Richie Hindle.)

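Porting to Python 2.3

This section lists previously described changes that may require changes to your code:

  • yield is now always a keyword; if it's used as a variable name in your code, a different name must be chosen.

  • For strings X and Y, X in Y now works if X is more than one character long.

  • The int() type constructor will now return a long integer instead of raising an OverflowError when a string or floating-point number is too large to fit into an integer.

  • If you have Unicode strings that contain 8-bit characters, you must declare the file's encoding (UTF-8, Latin-1, or whatever) by adding a comment to the top of the file. See section PEP 263: Source Code Encodings for more information.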

  • Calling Tcl methods through _tkinter no longer returns only strings. Instead, if Tcl returns other objects those objects are converted to their Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj object if no Python equivalent exists.

  • 0xffffffff 这样的大型八进制和十六进制字面量现在会触发 FutureWarning。目前它们被存储为 32 位数字,并导致负值,但在 Python 2.4 中,它们将变为正的长整数。

    有几种方法可以修复这个警告。如果你确实需要一个正数,只需在字面量末尾添加一个 L。如果你试图获取一个低位设置的 32 位整数,并且之前使用了类似 ~(1 << 31) 的表达式,最清晰的方法是从所有位都设置开始,然后清除所需的高位。例如,要仅清除最高位(位 31),你可以写 0xffffffffL &~(1L<<31)
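
The "start with all bits set, then clear the top bit" recipe can be checked with a little arithmetic. The sketch below uses present-day Python, where integers are unbounded and the L suffix is no longer accepted, so the literals are written without it.

```python
# Start with all 32 bits set.
all_bits = 0xffffffff

# Bit 31 is the highest bit of a 32-bit value.
top_bit = 1 << 31

# Clearing bit 31 leaves the low 31 bits set.
low_31_bits = all_bits & ~top_bit

print(hex(low_31_bits))    # 0x7fffffff
print(low_31_bits)         # 2147483647
```

The result is positive, which is the value the 2.3-era expression 0xffffffffL &~(1L<<31) is meant to produce.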

  • You can no longer disable assertions by assigning to __debug__.

  • The Distutils setup() function has gained various new keyword arguments such as depends. Old versions of the Distutils will abort if passed unknown keywords. A solution is to check for the presence of the new get_distutil_options() function in your setup.py and only use the new keywords with a version of the Distutils that supports them:

    from distutils import core
    
    kw = {'sources': 'foo.c', ...}
    if hasattr(core, 'get_distutil_options'):
        kw['depends'] = ['foo.h']
    ext = Extension(**kw)
    
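  • Using None as a variable name will now result in a SyntaxWarning warning.

  • Names of extension types defined by the modules included with Python now contain the module and a '.' in front of the type name.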

Acknowledgements

The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside, Andrew Dalke, Scott David Daniels, Fred L. Drake, Jr., David Fraser, Kelly Gerber, Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert, Martin von Löwis, Andrew MacIntyre, Lalo Martins, Chad Netzer, Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler, Just van Rossum.