What's New in Python 2.3

Author:

A.M. Kuchling

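This article explains the new features in Python 2.3. Python 2.3 was released on July 29, 2003.

The main themes for Python 2.3 are polishing some of the features added in 2.2, adding various small but useful enhancements to the core language, and expanding the standard library. The new object model introduced in the previous version has benefited from 18 months of bugfixes and from optimization efforts that have improved the performance of new-style classes. A few new built-in functions have been added, such as sum() and enumerate(). The in operator can now be used for substring searches (e.g. "ab" in "abc" returns True).

Some of the many new library features include Boolean, set, heap, and date/time data types, the ability to import modules from ZIP-format archives, metadata support for the long-awaited Python catalog, an updated version of IDLE, and modules for logging messages, wrapping text, parsing CSV files, processing command-line options, using BerkeleyDB databases... the list of new and enhanced modules is lengthy.

This article doesn't attempt to provide a complete specification of the new features, but instead provides a convenient overview. For full details, you should refer to the documentation for Python 2.3, such as the Python Library Reference and the Python Reference Manual. If you want to understand the complete implementation and design rationale, refer to the PEP for a particular new feature.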

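PEP 218: A Standard Set Datatype

The new sets module contains an implementation of a set datatype. The Set class is for mutable sets, sets that can have members added and removed. The ImmutableSet class is for sets that can't be modified, and instances of ImmutableSet can therefore be used as dictionary keys. Sets are built on top of dictionaries, so the elements within a set must be hashable.

Here's a simple example: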

>>> import sets
>>> S = sets.Set([1,2,3])
>>> S
Set([1, 2, 3])
>>> 1 in S
True
>>> 0 in S
False
>>> S.add(5)
>>> S.remove(3)
>>> S
Set([1, 2, 5])
>>>

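The union and intersection of sets can be computed with the union() and intersection() methods; an alternative notation uses the bitwise operators | (union) and & (intersection). Mutable sets also have in-place versions of these methods, union_update() and intersection_update().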

>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([4,5,6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
>>>

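It's also possible to take the symmetric difference of two sets. This is the set of all elements in the union that aren't in the intersection. Another way of putting it is that the symmetric difference contains all elements that are in exactly one set. Again, there's an alternative notation (^), and an in-place version with the ungainly name symmetric_difference_update().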

>>> S1 = sets.Set([1,2,3,4])
>>> S2 = sets.Set([3,4,5,6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
>>> S1 ^ S2
Set([1, 2, 5, 6])
>>>

There are also issubset() and issuperset() methods for checking whether one set is a subset or superset of another:

>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([2,3])
>>> S2.issubset(S1)
True
>>> S1.issubset(S2)
False
>>> S1.issuperset(S2)
True
>>>

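See also

PEP 218 - Adding a Built-In Set Object Type

PEP written by Greg V. Wilson. Implemented by Greg V. Wilson, Alex Martelli, and GvR.

PEP 255: Simple Generators

In Python 2.2, generators were added as an optional feature, to be enabled by a from __future__ import generators directive. In 2.3 generators no longer need to be specially enabled, and are now always present; this means that yield is now always a keyword. The rest of this section is a copy of the description of generators from the "What's New in Python 2.2" document; if you read it back when Python 2.2 came out, you can skip the rest of this section.

You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function: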

def generate_ints(N):
    for i in range(N):
        yield i

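A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; this is detected by Python's bytecode compiler, which compiles the function specially as a result.

When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's .next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)

Here's a sample usage of the generate_ints() generator: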

>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "stdin", line 1, in ?
  File "stdin", line 2, in generate_ints
StopIteration

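You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).

Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.

You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively: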

# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

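Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing $N$ queens on an $NxN$ chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an $NxN$ chessboard without visiting any square twice).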
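To make the comparison with a hand-written class concrete, here is a minimal sketch of the class-based alternative mentioned earlier, next to the generator it imitates. The class name CountUpTo is hypothetical, and the code is written so that it also runs on a current interpreter, where the iterator method is spelled __next__() rather than next().

```python
class CountUpTo:
    """Hand-written iterator that mimics generate_ints(N)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0      # state that a generator would keep in a local variable

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    next = __next__         # the method name used by the Python 2.3 iterator protocol


def generate_ints(N):
    # The generator version: the loop variable is preserved across yields.
    for i in range(N):
        yield i


class_result = list(CountUpTo(3))
generator_result = list(generate_ints(3))
```

Both objects produce the same values; the class simply has to store by hand the state that the generator keeps automatically.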

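The idea of generators comes from other programming languages, especially Icon (https://www2.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www2.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like: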

sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)

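In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.

See also

PEP 255 - Simple Generators

Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.

PEP 263: Source Code Encodings

Python source files can now be declared as being in different character set encodings. Encodings are declared by including a specially formatted comment in the first or second line of the source file. For example, a UTF-8 file can be declared with: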

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

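Without such an encoding declaration, the default encoding used is 7-bit ASCII. Executing or importing modules that contain string literals with 8-bit characters and have no encoding declaration will result in a DeprecationWarning being signalled by Python 2.3; in 2.4 this will be a syntax error.

The encoding declaration only affects Unicode string literals, which will be converted to Unicode using the specified encoding. Note that Python identifiers are still restricted to ASCII characters, so you can't have variable names that use characters outside of the usual alphanumerics.

See also

PEP 263 - Defining Python Source Code Encodings

Written by Marc-André Lemburg and Martin von Löwis; implemented by Suzuki Hisao and Martin von Löwis.

PEP 273: Importing Modules from ZIP Archives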

The new zipimport module adds support for importing modules from a ZIP-format archive. You don't need to import the module explicitly; it will be automatically imported if a ZIP archive's filename is added to sys.path. For example:

amk@nyman:~/src/python$ unzip -l /tmp/example.zip
Archive:  /tmp/example.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
     8467  11-26-02 22:30   jwzthreading.py
 --------                   -------
     8467                   1 file
amk@nyman:~/src/python$ ./python
Python 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
>>>

An entry in sys.path can now be the filename of a ZIP archive. The ZIP archive can contain any kind of files, but only files named *.py, *.pyc, or *.pyo can be imported. If an archive only contains *.py files, Python will not attempt to modify the archive by adding the corresponding *.pyc file, meaning that if a ZIP archive doesn't contain *.pyc files, importing may be rather slow.

A path within the archive can also be specified to only import from a subdirectory; for example, the path /tmp/example.zip/lib/ would only import from the lib/ subdirectory within the archive.
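The same steps can be reproduced from a script. The sketch below builds a throwaway archive with the zipfile module and imports from it; the module name zipdemo_mod and its contents are made up for illustration, and the code is written so that it also runs on a current interpreter.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive containing one module (hypothetical name: zipdemo_mod).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "example.zip")
zf = zipfile.ZipFile(archive, "w")
zf.writestr("zipdemo_mod.py", "GREETING = 'hello from the archive'\n")
zf.close()

# Adding the archive's filename to sys.path is enough;
# zipimport itself never has to be imported explicitly.
sys.path.insert(0, archive)
import zipdemo_mod

greeting = zipdemo_mod.GREETING
origin = zipdemo_mod.__file__     # a path that points inside the archive
```

Because the archive holds only a .py file, the module is compiled on every import, which is the slowdown described above.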

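See also

PEP 273 - Import Modules from ZIP Archives

Written by James C. Ahlstrom, who also provided an implementation. Python 2.3 follows the specification in PEP 273, but uses an implementation written by Just van Rossum that uses the import hooks described in PEP 302. See section PEP 302: New Import Hooks for a description of the new import hooks.

PEP 277: Unicode file name support for Windows NT

On Windows NT, 2000, and XP, the system stores file names as Unicode strings. Traditionally, Python has represented file names as byte strings, which is inadequate because it renders some file names inaccessible.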

Python now allows using arbitrary Unicode strings (within the limitations of the file system) for all functions that expect file names, most notably the open() built-in function. If a Unicode string is passed to os.listdir(), Python now returns a list of Unicode strings. A new function, os.getcwdu(), returns the current directory as a Unicode string.

Byte strings still work as file names, and on Windows Python will transparently convert them to Unicode using the mbcs encoding.

Other systems also allow Unicode strings as file names but convert them to byte strings before passing them to the system, which can cause a UnicodeError to be raised. Applications can test whether arbitrary Unicode strings are supported as file names by checking os.path.supports_unicode_filenames, a Boolean value.

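Under MacOS, os.listdir() may now return Unicode filenames.

See also

PEP 277 - Unicode file name support for Windows NT

Written by Neil Hodgson; implemented by Neil Hodgson, Martin von Löwis, and Mark Hammond.

PEP 278: Universal Newline Support

The three major operating systems used today are Microsoft Windows, Apple's Macintosh OS, and the various Unix derivatives. A minor irritation of cross-platform work is that these three platforms all use different characters to mark the ends of lines in text files. Unix uses the linefeed (ASCII character 10), MacOS uses the carriage return (ASCII character 13), and Windows uses a two-character sequence of a carriage return plus a newline.

Python's file objects can now support end of line conventions other than the one followed by the platform on which Python is running. Opening a file with the mode 'U' or 'rU' will open a file for reading in universal newlines mode. All three line ending conventions will be translated to a '\n' in the strings returned by the various file methods such as read() and readline().

Universal newline support is also used when importing modules and when executing a file with the execfile() function. This means that Python modules can be shared between all three operating systems without needing to convert the line-endings.

This feature can be disabled when compiling Python by specifying the --without-universal-newlines switch when running Python's configure script.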
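The translation can be observed with a file that mixes all three conventions. This is a sketch under one assumption that goes beyond the text above: it targets a current interpreter, where plain text mode ('r') performs the same universal-newline translation that 2.3 enables with the 'U' flag.

```python
import os
import tempfile

# Write one line in each convention, in binary mode so nothing is translated.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"unix\nmac\rwindows\r\n")

# Assumption: on a current interpreter, text mode translates all three
# line endings to '\n' by default (Python 2.3 would need mode 'U' or 'rU').
with open(path, "r") as f:
    lines = f.readlines()

os.remove(path)
```

Each of the three lines comes back terminated by a single '\n', regardless of how it was stored on disk.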

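See also

PEP 278 - Universal Newline Support

Written and implemented by Jack Jansen.

PEP 279: enumerate()

A new built-in function, enumerate(), will make certain loops a bit clearer. enumerate(thing), where thing is either an iterator or a sequence, returns an iterator that will return (0, thing[0]), (1, thing[1]), (2, thing[2]), and so forth.

A common idiom to change every element of a list looks like this: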

for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result

This can be rewritten using enumerate() as:

for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
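As a concrete instance of the loop above, the following sketch squares every element of a list in place; the squaring step is only a stand-in for "compute some result based on item".

```python
L = [1, 2, 3, 4]
for i, item in enumerate(L):
    result = item * item    # placeholder computation
    L[i] = result

# enumerate() yields (index, item) pairs for any iterable.
pairs = list(enumerate(["a", "b", "c"]))
```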

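See also

PEP 279 - The enumerate() built-in function

Written and implemented by Raymond D. Hettinger.

PEP 282: The logging Package

A standard package for writing logs, logging, has been added to Python 2.3. It provides a powerful and flexible mechanism for generating logging output which can then be filtered and processed in various ways. A configuration file written in a standard format can be used to control the logging behavior of a program. Python includes handlers that will write log records to standard error or to a file or socket, send them to the system log, or even e-mail them to a particular address; of course, it's also possible to write your own handler classes.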

The Logger class is the primary class. Most application code will deal with one or more Logger objects, each one used by a particular subsystem of the application. Each Logger is identified by a name, and names are organized into a hierarchy using . as the component separator. For example, you might have Logger instances named server, server.auth and server.network. The latter two instances are below server in the hierarchy. This means that if you turn up the verbosity for server or direct server messages to a different handler, the changes will also apply to records logged to server.auth and server.network. There's also a root Logger that's the parent of all other loggers.

The logging package contains some convenience functions that always use the root log:

import logging

logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')

This produces the following output:

WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down

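In the default configuration, informational and debugging messages are suppressed and the output is sent to standard error. You can enable the display of informational and debugging messages by calling the setLevel() method on the root logger.

Notice the warning() call's use of string formatting operators; all of the functions for logging messages take the arguments (msg, arg1, arg2, ...) and log the string resulting from msg % (arg1, arg2, ...).

There's also an exception() function that records the most recent traceback. Any of the other functions will also record the traceback if you specify a true value for the keyword argument exc_info: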

def f():
    try:    1/0
    except: logging.exception('Problem recorded')

f()

This produces the following output:

ERROR:root:Problem recorded
Traceback (most recent call last):
  File "t.py", line 6, in f
    1/0
ZeroDivisionError: integer division or modulo by zero

Slightly more advanced programs will use a logger other than the root logger. The getLogger(name) function is used to get a particular log, creating it if it doesn't exist yet. getLogger(None) returns the root logger.

log = logging.getLogger('server')
 ...
log.info('Listening on port %i', port)
 ...
log.critical('Disk full')
 ...

Log records are usually propagated up the hierarchy, so a message logged to server.auth is also seen by server and root, but a Logger can prevent this by setting its propagate attribute to False.
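Propagation is easiest to see with a handler that just collects messages. The sketch below is illustrative only: ListHandler is a hypothetical handler class, and the logger names use a demo_server prefix so they don't collide with any real configuration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores each message in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


server_handler = ListHandler()
server = logging.getLogger("demo_server")
server.setLevel(logging.INFO)        # let informational messages through
server.addHandler(server_handler)

auth = logging.getLogger("demo_server.auth")
network = logging.getLogger("demo_server.network")
network.propagate = False            # records stop here instead of moving up

auth.info("login from %s", "10.0.0.1")   # reaches the demo_server handler
network.info("link up")                   # never reaches it
```

Only the message logged to demo_server.auth ends up in server_handler.messages, because demo_server.network has opted out of propagation.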

There are more classes provided by the logging package that can be customized. When a Logger instance is told to log a message, it creates a LogRecord instance that is sent to any number of different Handler instances. Loggers and handlers can also have an attached list of filters, and each filter can cause the LogRecord to be ignored or can modify the record before passing it along. When they're finally output, LogRecord instances are converted to text by a Formatter class. All of these classes can be replaced by your own specially written classes.

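With all of these features the logging package should provide enough flexibility for even the most complicated applications. This is only an incomplete overview of its features, so please see the package's reference documentation for all of the details. Reading PEP 282 will also be helpful.

See also

PEP 282 - A Logging System

Written by Vinay Sajip and Trent Mick; implemented by Vinay Sajip.

PEP 285: A Boolean Type

A Boolean type was added to Python 2.3. Two new constants were added to the __builtin__ module, True and False. (True and False constants were added to the built-ins in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of 1 and 0 and aren't a different type.)

The type object for this new type is named bool; the constructor for it takes any Python value and converts it to True or False: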

>>> bool(1)
True
>>> bool(0)
False
>>> bool([])
False
>>> bool( (1,) )
True

Most of the standard library modules and built-in functions have been changed to return Booleans:

>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False

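Python's Booleans were added with the primary goal of making code clearer. For example, if you're reading a function and encounter the statement return 1, you might wonder whether the 1 represents a Boolean truth value, an index, or a coefficient that multiplies some other quantity. If the statement is return True, however, the meaning of the return value is quite clear.

Python's Booleans were not added for the sake of strict type-checking. A very strict language such as Pascal would also prevent you performing arithmetic with Booleans, and would require that the expression in an if statement always evaluate to a Boolean result. Python is not this strict and never will be, as PEP 285 explicitly says. This means you can still use any expression in an if statement, even ones that evaluate to a list or tuple or some random object. The Boolean type is a subclass of the int class so that arithmetic using a Boolean still works: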

>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75

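To sum up True and False in a sentence: they're alternative ways to spell the integer values 1 and 0, with the single difference that str() and repr() return the strings 'True' and 'False' instead of '1' and '0'.

See also

PEP 285 - Adding a bool type

Written and implemented by GvR.

PEP 293: Codec Error Handling Callbacks

When encoding a Unicode string into a byte string, unencodable characters may be encountered. So far, Python has allowed specifying the error processing as either "strict" (raising UnicodeError), "ignore" (skipping the character), or "replace" (using a question mark in the output string), with "strict" being the default behavior. It may be desirable to specify alternative processing of such errors, such as inserting an XML character reference or HTML entity reference into the converted string.

Python now has a flexible framework to add different processing strategies. New error handlers can be added with codecs.register_error(), and codecs can then access the error handler with codecs.lookup_error(). The error handler gets the necessary state information such as the string being converted, the position in the string where the error was detected, and the target encoding. The handler can then either raise an exception or return a replacement string.

Two additional error handlers have been implemented using this framework: "backslashreplace" uses Python backslash quoting to represent unencodable characters and "xmlcharrefreplace" emits XML character references.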
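A short sketch may help show how the pieces fit together. The handler name "demo.bracket" and the function bracket_replace are invented for this example; the two built-in handlers are the ones named above. The code is written so that it also runs on a current interpreter, where encoding produces a bytes object.

```python
import codecs


def bracket_replace(exc):
    """Hypothetical handler: substitute '[?]' for each unencodable run."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # Return the replacement text and the position at which to resume encoding.
    return (u"[?]", exc.end)


codecs.register_error("demo.bracket", bracket_replace)

text = u"caf\xe9 \u20ac5"     # contains two characters that ASCII cannot encode

custom = text.encode("ascii", "demo.bracket")
xml = text.encode("ascii", "xmlcharrefreplace")
backslash = text.encode("ascii", "backslashreplace")
```

The exception object passed to the handler carries the state described above: the string being converted (exc.object), the error position (exc.start and exc.end), and the target encoding (exc.encoding).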

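See also

PEP 293 - Codec Error Handling Callbacks

Written and implemented by Walter Dörwald.

PEP 301: Package Index and Metadata for Distutils

Support for the long-requested Python catalog makes its first appearance in 2.3.

The heart of the catalog is the new Distutils register command. Running python setup.py register will collect the metadata describing a package, such as its name, version, maintainer, and description, and send it to a central catalog server. The resulting catalog is available from https://pypi.org.

To make the catalog a bit more useful, a new optional classifiers keyword argument has been added to the Distutils setup() function. A list of Trove-style strings can be supplied to help classify the software.

Here's an example setup.py with classifiers, written to be compatible with older versions of the Distutils: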

from distutils import core
kw = {'name': "Quixote",
      'version': "0.5.1",
      'description': "A highly Pythonic Web application framework",
      # ...
      }

if (hasattr(core, 'setup_keywords') and
    'classifiers' in core.setup_keywords):
    kw['classifiers'] = \
        ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
         'Environment :: No Input/Output (Daemon)',
         'Intended Audience :: Developers']

core.setup(**kw)

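The full list of classifiers can be obtained by running python setup.py register --list-classifiers.

See also

PEP 301 - Package Index and Metadata for Distutils

Written and implemented by Richard Jones.

PEP 302: New Import Hooks

While it's been possible to write custom import hooks ever since the ihooks module was introduced in Python 1.3, no one has ever been really happy with it because writing new import hooks is difficult and messy. There have been various proposed alternatives such as the imputil and iu modules, but none of them has ever gained much acceptance, and none of them were easily usable from C code.

PEP 302 borrows ideas from its predecessors, especially from Gordon McMillan's iu module. Three new items are added to the sys module:

  • sys.path_hooks is a list of callable objects; most often they'll be classes. Each callable takes a string containing a path and either returns an importer object that will handle imports from this path or raises an ImportError exception if it can't handle this path.

  • sys.path_importer_cache caches importer objects for each path, so sys.path_hooks will only need to be traversed once for each path.

  • sys.meta_path is a list of importer objects that will be traversed before sys.path is checked. This list is initially empty, but user code can add objects to it. Additional built-in and frozen modules can be imported by an object added to this list.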

Importer objects must have a single method, find_module(fullname, path=None). fullname will be a module or package name, e.g. string or distutils.core. find_module() must return a loader object that has a single method, load_module(fullname), that creates and returns the corresponding module object.

Pseudo-code for Python's new import logic, therefore, looks something like this (simplified a bit; see PEP 302 for the full details):

for mp in sys.meta_path:
    loader = mp.find_module(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            <module> = loader.load_module(fullname)

# Not found!
raise ImportError

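See also

PEP 302 - New Import Hooks

Written by Just van Rossum and Paul Moore. Implemented by Just van Rossum.

PEP 305: Comma-separated Files

Comma-separated files are a format frequently used for exporting data from databases and spreadsheets. Python 2.3 adds a parser for comma-separated files.

Comma-separated format is deceptively simple at first glance: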

Costs,150,200,3.95

Read a line and call line.split(','): what could be simpler? But toss in string data that can contain commas, and things get more complicated:

"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"

A big ugly regular expression can parse this, but using the new csv package is much simpler:

import csv

input = open('datafile', 'rb')
reader = csv.reader(input)
for line in reader:
    print line

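The reader() function takes a number of different options. The field separator isn't limited to the comma and can be changed to any character, and so can the quoting and line-ending characters.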

Different dialects of comma-separated files can be defined and registered; currently there are two dialects, both used by Microsoft Excel. A separate csv.writer class will generate comma-separated files from a succession of tuples or lists, quoting strings that contain the delimiter.
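For the writing side, here is a small round-trip sketch: it writes the troublesome row from earlier in this section and reads it back. It uses an in-memory io.StringIO buffer instead of a real file purely for convenience, and is written for a current interpreter.

```python
import csv
import io

rows = [["Costs", 150, 200, 3.95, "Includes taxes, shipping, and sundry items"]]

# Write the row; the field containing commas is quoted automatically.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
text = buffer.getvalue()

# Read it back; every field comes back as a string.
reader = csv.reader(io.StringIO(text))
parsed = list(reader)
```

Note that the reader does not restore the numeric types: 150 comes back as the string '150'.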

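See also

PEP 305 - CSV File API

Written and implemented by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.

PEP 307: Pickle Enhancements

The pickle and cPickle modules received some attention during the 2.3 development cycle. In 2.2, new-style classes could be pickled without difficulty, but they weren't pickled very compactly; PEP 307 quotes a trivial example where a new-style class results in a pickled string three times longer than that for a classic class.

The solution was to invent a new pickle protocol. The pickle.dumps() function has supported a text-or-binary flag for a long time. In 2.3, this flag is redefined from a Boolean to an integer: 0 is the old text-mode pickle format, 1 is the old binary format, and now 2 is a new 2.3-specific format. A new constant, pickle.HIGHEST_PROTOCOL, can be used to select the fanciest protocol available.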
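Selecting a protocol is just a matter of passing the integer as the second argument. The sketch below pickles a small dictionary (sample data invented for the example) with each protocol and checks that it round-trips; it makes no claim about the relative sizes of the results.

```python
import pickle

data = {"name": "example", "values": [1, 2, 3]}

text_pickle = pickle.dumps(data, 0)      # old text-mode format
binary_pickle = pickle.dumps(data, 1)    # old binary format
proto2_pickle = pickle.dumps(data, 2)    # format introduced in 2.3
newest_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol on its own, so no flag is needed here.
restored = [pickle.loads(p) for p in
            (text_pickle, binary_pickle, proto2_pickle, newest_pickle)]
```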

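Unpickling is no longer considered a safe operation. 2.2's pickle provided hooks for trying to prevent unsafe classes from being unpickled (specifically, a __safe_for_unpickling__ attribute), but none of this code was ever audited and therefore it's all been ripped out in 2.3. You should not unpickle untrusted data in any version of Python.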

To reduce the pickling overhead for new-style classes, a new interface for customizing pickling was added using three special methods: __getstate__(), __setstate__(), and __getnewargs__(). Consult PEP 307 for the full semantics of these methods.

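As a way to compress pickles yet further, it's now possible to use integer codes instead of long strings to identify pickled classes. The Python Software Foundation will maintain a list of standardized codes; there's also a range of codes for private use. Currently no codes have been specified.

See also

PEP 307 - Extensions to the pickle protocol

Written and implemented by Guido van Rossum and Tim Peters.

Extended Slices

Ever since Python 1.4, the slicing syntax has supported an optional third "step" or "stride" argument. For example, these are all legal Python syntax: L[1:10:2], L[:-1:1], L[::-1]. This was added to Python at the request of the developers of Numerical Python, which uses the third argument extensively. However, Python's built-in list, tuple, and string sequence types have never supported this feature, raising a TypeError if you tried it. Michael Hudson contributed a patch to fix this shortcoming.

For example, you can now easily extract the elements of a list that have even indexes: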

>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]

Negative values also work to make a copy of the same list in reverse order:

>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

This also works for tuples, arrays, and strings:

>>> s='abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'

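If you have a mutable sequence such as a list or an array you can assign to or delete an extended slice, but there are some differences between assignment to extended and regular slices. Assignment to a regular slice can be used to change the length of the sequence: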

>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]

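Extended slices aren't this flexible. When assigning to an extended slice, the list on the right hand side of the statement must contain the same number of items as the slice it is replacing: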

>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0,1,2]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2

Deletion is more straightforward:

>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]

One can also now pass slice objects to the __getitem__() methods of the built-in sequences:

>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]

Or use slice objects directly in subscripts:

>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]

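To simplify implementing sequences that support extended slicing, slice objects now have a method indices(length) which, given the length of a sequence, returns a (start, stop, step) tuple that can be passed directly to range(). indices() handles omitted and out-of-bounds indices in a manner consistent with regular slices (and this innocuous phrase hides a welter of confusing details!). The method is intended to be used like this: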

class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            return self.calc_item(item)

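From this example you can also see that the built-in slice object is now the type object for the slice type, and is no longer a function. This is consistent with Python 2.2, where int, str, etc., underwent the same change.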
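A filled-in version of the FakeSeq outline can make the behaviour of indices() easier to follow. The details elided by "..." in the outline are not known, so the constructor, __len__(), and the list-backed calc_item() below are assumptions made only to obtain something runnable.

```python
class FakeSeq:
    """Assumed completion of the outline: a sequence backed by a plain list."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def calc_item(self, i):
        # Stand-in for whatever computation the real class would perform.
        return self.items[i]

    def __getitem__(self, item):
        if isinstance(item, slice):
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            return self.calc_item(item)


seq = FakeSeq(range(10))

# Omitted bounds are filled in according to the sign of the step.
normalized = slice(None, None, -2).indices(10)

evens = seq[::2].items
clipped = seq[2:100].items      # the out-of-range stop is clipped to the length
last = seq[-1]
```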

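Other Language Changes

Here are all of the changes that Python 2.3 makes to the core Python language.

  • The yield statement is now always a keyword, as described in section PEP 255: Simple Generators of this document.

  • A new built-in function enumerate() was added, as described in section PEP 279: enumerate() of this document.

  • Two new constants, True and False, were added along with the built-in bool type, as described in section PEP 285: A Boolean Type of this document.

  • The int() type constructor will now return a long integer instead of raising an OverflowError when a string or floating-point number is too large to fit into an integer. This can lead to the paradoxical result that isinstance(int(expression), int) is false, but that seems unlikely to cause problems in practice.

  • Built-in types now support the extended slicing syntax, as described in section Extended Slices of this document.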

  • A new built-in function, sum(iterable, start=0), adds up the numeric items in the iterable object and returns their sum. sum() only accepts numbers, meaning that you can't use it to concatenate a bunch of strings. (Contributed by Alex Martelli.)

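  • list.insert(pos, value) used to insert value at the front of the list when pos was negative. The behaviour has now been changed to be consistent with slice indexing, so when pos is -1 the value will be inserted before the last element, and so forth.

  • list.index(value), which searches for value within the list and returns its index, now takes optional start and stop arguments to limit the search to only part of the list.

  • Dictionaries have a new method, pop(key[, default]), that returns the value corresponding to key and removes that key/value pair from the dictionary. If the requested key isn't present in the dictionary, default is returned if it's specified and KeyError raised if it isn't.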

    >>> d = {1:2}
    >>> d
    {1: 2}
    >>> d.pop(4)
    Traceback (most recent call last):
      File "stdin", line 1, in ?
    KeyError: 4
    >>> d.pop(1)
    2
    >>> d.pop(1)
    Traceback (most recent call last):
      File "stdin", line 1, in ?
    KeyError: 'pop(): dictionary is empty'
    >>> d
    {}
    >>>
    

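    There's also a new class method, dict.fromkeys(iterable, value), that creates a dictionary with keys taken from the supplied iterator iterable and all values set to value, defaulting to None.

    (Patches contributed by Raymond Hettinger.)

    Also, the dict() constructor now accepts keyword arguments to simplify creating small dictionaries: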

    >>> dict(red=1, blue=2, green=3, black=4)
    {'blue': 2, 'black': 4, 'green': 3, 'red': 1}
    

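    (Contributed by Just van Rossum.)

  • The assert statement no longer checks the __debug__ flag, so you can no longer disable assertions by assigning to __debug__. Running Python with the -O switch will still generate code that doesn't execute any assertions.

  • Most type objects are now callable, so you can use them to create new objects such as functions, classes, and modules. (This means that the new module can be deprecated in a future Python version, because you can now use the type objects available in the types module.) For example, you can create a new module object with the following code: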

    >>> import types
    >>> m = types.ModuleType('abc','docstring')
    >>> m
    <module 'abc' (built-in)>
    >>> m.__doc__
    'docstring'
    
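  • A new warning, PendingDeprecationWarning, was added to indicate features which are in the process of being deprecated. The warning will not be printed by default. To check for use of features that will be deprecated in the future, supply -Walways::PendingDeprecationWarning:: on the command line or use warnings.filterwarnings().

  • The process of deprecating string-based exceptions, as in raise "Error occurred", has begun. Raising a string will now trigger PendingDeprecationWarning.

  • Using None as a variable name will now result in a SyntaxWarning warning. In a future version of Python, None may finally become a keyword.

  • The xreadlines() method of file objects, introduced in Python 2.1, is no longer necessary because files now behave as their own iterator. xreadlines() was originally introduced as a faster way to loop over all the lines in a file, but now you can simply write for line in file_obj. File objects also have a new read-only encoding attribute that gives the encoding used by the file; Unicode strings written to the file will be automatically converted to bytes using the given encoding.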

  • The method resolution order used by new-style classes has changed, though you'll only notice the difference if you have a really complicated inheritance hierarchy. Classic classes are unaffected by this change. Python 2.2 originally used a topological sort of a class's ancestors, but 2.3 now uses the C3 algorithm as described in the paper "A Monotonic Superclass Linearization for Dylan". To understand the motivation for this change, read Michele Simionato's article "Python 2.3 Method Resolution Order", or read the thread on python-dev starting with the message at https://mail.python.org/pipermail/python-dev/2002-October/029035.html. Samuele Pedroni first pointed out the problem and also implemented the fix by coding the C3 algorithm.

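  • When Python runs multithreaded programs, it switches threads after executing N bytecodes. The default value for N has been increased from 10 bytecodes to 100 bytecodes, speeding up single-threaded applications by reducing the switching overhead. Some multithreaded applications may suffer slower response time, but that's easily fixed by setting the limit back to a lower number using sys.setcheckinterval(N). The limit can be retrieved with the new sys.getcheckinterval() function.

  • One minor but far-reaching change is that the names of extension types defined by the modules included with Python now contain the module and a '.' in front of the type name. For example, in Python 2.2, if you created a socket and printed its __class__, you'd get this output: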

    >>> s = socket.socket()
    >>> s.__class__
    <type 'socket'>
    

    In 2.3, you get this:

    >>> s.__class__
    <type '_socket.socket'>
    
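  • One of the noted incompatibilities between old- and new-style classes has been removed: you can now assign to the __name__ and __bases__ attributes of new-style classes. There are some restrictions on what can be assigned to __bases__ along the lines of those relating to assigning to an instance's __class__ attribute.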
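Several of the smaller changes in the list above (sum(), negative positions in list.insert(), the start and stop arguments of list.index(), dict.pop() with a default, dict.fromkeys(), and keyword arguments to dict()) can be tried together in a few lines. The values are arbitrary sample data.

```python
total = sum([1, 2, 3], 10)            # the second argument is the start value

letters = ["a", "b", "c"]
letters.insert(-1, "x")               # inserted before the last element

values = [5, 7, 5, 9, 5]
second_five = values.index(5, 1)      # search from index 1 onward
in_window = values.index(9, 2, 4)     # search only values[2:4]

d = dict(red=1, blue=2)               # keyword arguments become keys
popped = d.pop("red")                 # returns the value and removes the key
fallback = d.pop("missing", None)     # the default avoids a KeyError

flags = dict.fromkeys(["a", "b"], 0)  # every key maps to the same value
```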

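String Changes

  • The in operator now works differently for strings. Previously, when evaluating X in Y where X and Y are strings, X could only be a single character. That's now changed; X can be a string of any length, and X in Y will return True if X is a substring of Y. If X is the empty string, the result is always True.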

    >>> 'ab' in 'abcd'
    True
    >>> 'ad' in 'abcd'
    False
    >>> '' in 'abcd'
    True
    

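    Note that this doesn't tell you where the substring starts; if you need that information, use the find() string method.

  • The strip(), lstrip(), and rstrip() string methods now have an optional argument for specifying the characters to strip. The default is still to remove all whitespace characters: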

    >>> '   abc '.strip()
    'abc'
    >>> '><><abc<><><>'.strip('<>')
    'abc'
    >>> '><><abc<><><>\n'.strip('<>')
    'abc<><><>\n'
    >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
    u'\u4001abc'
    >>>
    

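    (Suggested by Simon Brunning and implemented by Walter Dörwald.)

  • The startswith() and endswith() string methods now accept negative numbers for the start and end parameters.

  • Another new string method is zfill(), originally a function in the string module. zfill() pads a numeric string with zeros on the left until it's the specified width. Note that the % operator is still more flexible and powerful than zfill().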

    >>> '45'.zfill(4)
    '0045'
    >>> '12345'.zfill(4)
    '12345'
    >>> 'goofy'.zfill(6)
    '0goofy'
    

    (Contributed by Walter Dörwald.)

  • A new type object, basestring, has been added. Both 8-bit strings and Unicode strings inherit from this type, so isinstance(obj, basestring) will return True for either kind of string. It's a completely abstract type, so you can't create basestring instances.

  • Interned strings are no longer immortal and will now be garbage-collected in the usual way when the only reference to them is from the internal dictionary of interned strings. (Implemented by Oren Tirosh.)

Optimizations

  • The creation of new-style class instances has been made much faster; they're now faster than classic classes!

  • The sort() method of list objects has been extensively rewritten by Tim Peters, and the implementation is significantly faster.

  • Multiplication of large long integers is now much faster thanks to an implementation of Karatsuba multiplication, an algorithm that scales better than the O(n**2) required for the grade-school multiplication algorithm. (Original patch by Christopher A. Craig, and significantly reworked by Tim Peters.)

  • The SET_LINENO opcode is now gone. This may provide a small speed increase, depending on your compiler's idiosyncrasies. See section Other Changes and Fixes for a longer explanation. (Removed by Michael Hudson.)

  • xrange() objects now have their own iterator, making for i in xrange(n) slightly faster than for i in range(n). (Patch by Raymond Hettinger.)

  • A number of small rearrangements have been made in various hotspots to improve performance, such as inlining a function or removing some code. (Implemented mostly by GvR, but lots of people have contributed single changes.)

The net result of the 2.3 optimizations is that Python 2.3 runs the pystone benchmark around 25% faster than Python 2.2.

New, Improved, and Deprecated Modules

As usual, Python's standard library received a number of enhancements and bug fixes. Here's a partial list of the most notable changes, sorted alphabetically by module name. Consult the Misc/NEWS file in the source tree for a more complete list of changes, or look through the CVS logs for all the details.

  • The array module now supports arrays of Unicode characters using the 'u' format character. Arrays also now support using the += assignment operator to add another array's contents, and the *= assignment operator to repeat an array. (Contributed by Jason Orendorff.)

  • The bsddb module has been replaced by version 4.1.6 of the PyBSDDB package, providing a more complete interface to the transactional features of the BerkeleyDB library.

    The old version of the module has been renamed to bsddb185 and is no longer built automatically; you'll have to edit Modules/Setup to enable it. Note that the new bsddb package is intended to be compatible with the old module, so be sure to file bugs if you discover any incompatibilities. When upgrading to Python 2.3, if the new interpreter is compiled with a new version of the underlying BerkeleyDB library, you will almost certainly have to convert your database files to the new version. You can do this fairly easily with the new scripts db2pickle.py and pickle2db.py which you will find in the distribution's Tools/scripts directory. If you've already been using the PyBSDDB package and importing it as bsddb3, you will have to change your import statements to import it as bsddb.

  • The new bz2 module is an interface to the bz2 data compression library. bz2-compressed data is usually smaller than corresponding zlib-compressed data. (Contributed by Gustavo Niemeyer.)

  • A set of standard date/time types has been added in the new datetime module. See the following section for more details.

  • The Distutils Extension class now supports an extra constructor argument named depends for listing additional source files that an extension depends on. This lets Distutils recompile the module if any of the dependency files are modified. For example, if sampmodule.c includes the header file sample.h, you would create the Extension object like this:

    ext = Extension("samp",
                    sources=["sampmodule.c"],
                    depends=["sample.h"])
    

    Modifying sample.h would then cause the module to be recompiled. (Contributed by Jeremy Hylton.)

  • Other minor changes to Distutils: it now checks for the CC, CFLAGS, CPP, LDFLAGS, and CPPFLAGS environment variables, using them to override the settings in Python's configuration (contributed by Robert Weber).

  • Previously the doctest module would only search the docstrings of public methods and functions for test cases, but it now also examines private ones as well. The DocTestSuite() function creates a unittest.TestSuite object from a set of doctest tests.

  • The new gc.get_referents(object) function returns a list of all the objects referenced by object.

  • The getopt module gained a new function, gnu_getopt(), that supports the same arguments as the existing getopt() function but uses GNU-style scanning mode. The existing getopt() stops processing options as soon as a non-option argument is encountered, but in GNU-style mode processing continues, meaning that options and arguments can be mixed. For example:

    >>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
    ([('-f', 'filename')], ['output', '-v'])
    >>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
    ([('-f', 'filename'), ('-v', '')], ['output'])
    

    (Contributed by Peter Åstrand.)

  • The grp, pwd, and resource modules now return enhanced tuples:

    >>> import grp
    >>> g = grp.getgrnam('amk')
    >>> g.gr_name, g.gr_gid
    ('amk', 500)
    
  • The gzip module can now handle files exceeding 2 GiB.

  • The new heapq module contains an implementation of a heap queue algorithm. A heap is an array-like data structure that keeps items in a partially sorted order such that, for every index k, heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]. This makes it quick to remove the smallest item, and inserting a new item while maintaining the heap property is O(log n). (See https://xlinux.nist.gov/dads//HTML/priorityque.html for more information about the priority queue data structure.)

    The heapq module provides heappush() and heappop() functions for adding and removing items while maintaining the heap property on top of some other mutable Python sequence type. Here's an example that uses a Python list:

    >>> import heapq
    >>> heap = []
    >>> for item in [3, 7, 5, 11, 1]:
    ...    heapq.heappush(heap, item)
    ...
    >>> heap
    [1, 3, 5, 11, 7]
    >>> heapq.heappop(heap)
    1
    >>> heapq.heappop(heap)
    3
    >>> heap
    [5, 7, 11]
    

    (Contributed by Kevin O'Connor.)

  • The IDLE integrated development environment has been updated using the code from the IDLEfork project (https://idlefork.sourceforge.net). The most notable feature is that the code being developed is now executed in a subprocess, meaning that there's no longer any need for manual reload() operations. IDLE's core code has been incorporated into the standard library as the idlelib package.

  • The imaplib module now supports IMAP over SSL. (Contributed by Piers Lauder and Tino Lange.)

  • The itertools module contains a number of useful functions for use with iterators, inspired by various functions provided by the ML and Haskell languages. For example, itertools.ifilter(predicate, iterator) returns all elements in the iterator for which the function predicate() returns True, and itertools.repeat(obj, N) returns obj N times. There are a number of other functions in the module; see the package's reference documentation for details. (Contributed by Raymond Hettinger.)

  • Two new functions in the math module, degrees(rads) and radians(degs), convert between radians and degrees. Other functions in the math module such as math.sin() and math.cos() have always required input values measured in radians. Also, an optional base argument was added to math.log() to make it easier to compute logarithms for bases other than e and 10. (Contributed by Raymond Hettinger.)

  • Several new POSIX functions (getpgid(), killpg(), lchown(), loadavg(), major(), makedev(), minor(), and mknod()) were added to the posix module that underlies the os module. (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)

  • In the os module, the *stat() family of functions can now report fractions of a second in a timestamp. Such time stamps are represented as floats, similar to the value returned by time.time().

    During testing, it was found that some applications will break if time stamps are floats. For compatibility, when using the tuple interface of the stat_result, time stamps will be represented as integers. When using named fields (a feature first introduced in Python 2.2), time stamps are still represented as integers, unless os.stat_float_times() is invoked to enable float return values:

    >>> os.stat("/tmp").st_mtime
    1034791200
    >>> os.stat_float_times(True)
    >>> os.stat("/tmp").st_mtime
    1034791200.6335014
    

    In Python 2.4, the default will change to always returning floats.

    Application developers should enable this feature only if all their libraries work properly when confronted with floating point time stamps, or if they use the tuple API. If used, the feature should be activated on an application level instead of trying to enable it on a per-use basis.

  • The optparse module contains a new parser for command-line arguments that can convert option values to a particular Python type and will automatically generate a usage message. See the following section for more details.

  • The old and never-documented linuxaudiodev module has been deprecated, and a new version named ossaudiodev has been added. The module was renamed because the OSS sound drivers can be used on platforms other than Linux, and the interface has also been tidied and brought up to date in various ways. (Contributed by Greg Ward and Nicholas FitzRoy-Dale.)

  • The new platform module contains a number of functions that try to determine various properties of the platform you're running on. There are functions for getting the architecture, CPU type, the Windows OS version, and even the Linux distribution version. (Contributed by Marc-André Lemburg.)

  • The parser objects provided by the pyexpat module can now optionally buffer character data, resulting in fewer calls to your character data handler and therefore faster performance. Setting the parser object's buffer_text attribute to True will enable buffering.

  • The sample(population, k) function was added to the random module. population is a sequence or xrange object containing the elements of a population, and sample() chooses k elements from the population without replacing chosen elements. k can be any value up to len(population). For example:

    >>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
    >>> random.sample(days, 3)      # Choose 3 elements
    ['St', 'Sn', 'Th']
    >>> random.sample(days, 7)      # Choose 7 elements
    ['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
    >>> random.sample(days, 7)      # Choose 7 again
    ['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
    >>> random.sample(days, 8)      # Can't choose eight
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "random.py", line 414, in sample
          raise ValueError, "sample larger than population"
    ValueError: sample larger than population
    >>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
    [3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
    

    The random module now uses a new algorithm, the Mersenne Twister, implemented in C. It's faster and more extensively studied than the previous algorithm.

    (All changes contributed by Raymond Hettinger.)

  • The readline module also gained a number of new functions: get_history_item(), get_current_history_length(), and redisplay().

  • The rexec and Bastion modules have been declared dead, and attempts to import them will fail with a RuntimeError. New-style classes provide new ways to break out of the restricted execution environment provided by rexec, and no one has interest in fixing them or time to do so. If you have applications using rexec, rewrite them to use something else.

    (Sticking with Python 2.2 or 2.1 will not make your applications any safer because there are known bugs in the rexec module in those versions. To repeat: if you're using rexec, stop using it immediately.)

  • The rotor module has been deprecated because the algorithm it uses for encryption is not believed to be secure. If you need encryption, use one of the several AES Python modules that are available separately.

  • The shutil module gained a move(src, dest) function that recursively moves a file or directory to a new location.

  • Support for more advanced POSIX signal handling was added to the signal module but then removed again as it proved impossible to make it work reliably across platforms.

  • The socket module now supports timeouts. You can call the settimeout(t) method on a socket object to set a timeout of t seconds. Subsequent socket operations that take longer than t seconds to complete will abort and raise a socket.timeout exception.

    The original timeout implementation was by Tim O'Malley. Michael Gilfix integrated it into the Python socket module and shepherded it through a lengthy review. After the code was checked in, Guido van Rossum rewrote parts of it. (This is a good example of a collaborative development process in action.)

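  • On Windows, the socket module now ships with Secure Sockets Layer (SSL) support.

  • The value of the C PYTHON_API_VERSION macro is now exposed at the Python level as sys.api_version. The current exception can be cleared by calling the new sys.exc_clear() function.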

  • The new tarfile module allows reading from and writing to tar-format archive files. (Contributed by Lars Gustäbel.)

  • The new textwrap module contains functions for wrapping strings containing paragraphs of text. The wrap(text, width) function takes a string and returns a list containing the text split into lines of no more than the chosen width. The fill(text, width) function returns a single string, reformatted to fit into lines no longer than the chosen width. (As you can guess, fill() is built on top of wrap().) For example:

    >>> import textwrap
    >>> paragraph = "Not a whit, we defy augury: ... more text ..."
    >>> textwrap.wrap(paragraph, 60)
    ["Not a whit, we defy augury: there's a special providence in",
     "the fall of a sparrow. If it be now, 'tis not to come; if it",
     ...]
    >>> print textwrap.fill(paragraph, 35)
    Not a whit, we defy augury: there's
    a special providence in the fall of
    a sparrow. If it be now, 'tis not
    to come; if it be not to come, it
    will be now; if it be not now, yet
    it will come: the readiness is all.
    >>>
    

    The module also contains a TextWrapper class that actually implements the text wrapping strategy. Both the TextWrapper class and the wrap() and fill() functions support a number of additional keyword arguments for fine-tuning the formatting; consult the module's documentation for details. (Contributed by Greg Ward.)

  • The thread and threading modules now have companion modules, dummy_thread and dummy_threading, that provide a do-nothing implementation of the thread module's interface for platforms where threads are not supported. The intention is to simplify thread-aware modules (ones that don't rely on threads to run) by putting the following code at the top:

    try:
        import threading as _threading
    except ImportError:
        import dummy_threading as _threading
    

    In this example, _threading is used as the module name to make it clear that the module being used is not necessarily the actual threading module. Code can call functions and use classes in _threading whether or not threads are supported, avoiding an if statement and making the code slightly clearer. This module will not magically make multithreaded code run without threads; code that waits for another thread to return or to do something will simply hang forever.

  • The time module's strptime() function has long been an annoyance because it uses the platform C library's strptime() implementation, and different platforms sometimes have odd bugs. Brett Cannon contributed a portable implementation that's written in pure Python and should behave identically on all platforms.

  • The new timeit module helps measure how long snippets of Python code take to execute. The timeit.py file can be run directly from the command line, or the module's Timer class can be imported and used directly. Here's a short example that figures out whether it's faster to convert an 8-bit string to Unicode by appending an empty Unicode string to it or by using the unicode() function:

    import timeit
    
    timer1 = timeit.Timer('unicode("abc")')
    timer2 = timeit.Timer('"abc" + u""')
    
    # Run three trials
    print timer1.repeat(repeat=3, number=100000)
    print timer2.repeat(repeat=3, number=100000)
    
    # On my laptop this outputs:
    # [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
    # [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
    
  • The Tix module has received various bug fixes and updates for the current version of the Tix package.

  • The Tkinter module now works with a thread-enabled version of Tcl. Tcl's threading model requires that widgets only be accessed from the thread in which they're created; accesses from another thread can cause Tcl to panic. For certain Tcl interfaces, Tkinter will now automatically avoid this when a widget is accessed from a different thread by marshalling a command, passing it to the correct thread, and waiting for the results. Other interfaces can't be handled automatically but Tkinter will now raise an exception on such an access so that you can at least find out about the problem. See https://mail.python.org/pipermail/python-dev/2002-December/031107.html for a more detailed explanation of this change. (Implemented by Martin von Löwis.)

  • Calling Tcl methods through _tkinter no longer returns only strings. Instead, if Tcl returns other objects those objects are converted to their Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj object if no Python equivalent exists. This behavior can be controlled through the wantobjects() method of tkapp objects.

    When using _tkinter through the Tkinter module (as most Tkinter applications will), this feature is always activated. It should not cause compatibility problems, since Tkinter would always convert string results to Python types where possible.

    If any incompatibilities are found, the old behavior can be restored by setting the wantobjects variable in the Tkinter module to false before creating the first tkapp object:

    import Tkinter
    Tkinter.wantobjects = 0
    

    Any breakage caused by this change should be reported as a bug.

  • The UserDict module has a new DictMixin class which defines all dictionary methods for classes that already have a minimum mapping interface. This greatly simplifies writing classes that need to be substitutable for dictionaries, such as the classes in the shelve module.

    Adding the mix-in as a superclass provides the full dictionary interface whenever the class defines __getitem__(), __setitem__(), __delitem__(), and keys(). For example:

    >>> import UserDict
    >>> class SeqDict(UserDict.DictMixin):
    ...     """Dictionary lookalike implemented with lists."""
    ...     def __init__(self):
    ...         self.keylist = []
    ...         self.valuelist = []
    ...     def __getitem__(self, key):
    ...         try:
    ...             i = self.keylist.index(key)
    ...         except ValueError:
    ...             raise KeyError
    ...         return self.valuelist[i]
    ...     def __setitem__(self, key, value):
    ...         try:
    ...             i = self.keylist.index(key)
    ...             self.valuelist[i] = value
    ...         except ValueError:
    ...             self.keylist.append(key)
    ...             self.valuelist.append(value)
    ...     def __delitem__(self, key):
    ...         try:
    ...             i = self.keylist.index(key)
    ...         except ValueError:
    ...             raise KeyError
    ...         self.keylist.pop(i)
    ...         self.valuelist.pop(i)
    ...     def keys(self):
    ...         return list(self.keylist)
    ...
    >>> s = SeqDict()
    >>> dir(s)      # See that other dictionary methods are implemented
    ['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
     '__init__', '__iter__', '__len__', '__module__', '__repr__',
     '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
     'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
     'setdefault', 'update', 'valuelist', 'values']
    

    (Contributed by Raymond Hettinger.)

  • The DOM implementation in xml.dom.minidom can now generate XML output in a particular encoding by providing an optional encoding argument to the toxml() and toprettyxml() methods of DOM nodes.

  • The xmlrpclib module now supports an XML-RPC extension for handling nil data values such as Python's None. Nil values are always supported on unmarshalling an XML-RPC response. To generate requests containing None, you must supply a true value for the allow_none parameter when creating a Marshaller instance.

  • The new DocXMLRPCServer module allows writing self-documenting XML-RPC servers. Run it in demo mode (as a program) to see it in action. Pointing the web browser to the RPC server produces pydoc-style documentation; pointing xmlrpclib to the server allows invoking the actual methods. (Contributed by Brian Quinlan.)

  • Support for internationalized domain names (RFCs 3454, 3490, 3491, and 3492) has been added. The "idna" encoding can be used to convert between a Unicode domain name and the ASCII-compatible encoding (ACE) of that name.

    >>> u"www.Alliancefrançaise.nu".encode("idna")
    'www.xn--alliancefranaise-npb.nu'
    

    The socket module has also been extended to transparently convert Unicode hostnames to the ACE version before passing them to the C library. Modules that deal with hostnames (such as httplib and ftplib) also support Unicode host names; httplib also sends HTTP Host headers using the ACE version of the domain name. urllib supports Unicode URLs with non-ASCII host names as long as the path part of the URL is ASCII only.

    To implement this change, the stringprep module, the mkstringprep tool and the punycode encoding have been added.
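As a rough illustration of the "idna" codec described in the last item above, the sketch below round-trips a host name. It is written for a current Python 3 interpreter, where ordinary string literals are Unicode and encode() returns a bytes object; in 2.3 you would start from a u"..." literal and get an 8-bit string back. The helper name to_ace is made up for this example.

```python
def to_ace(hostname):
    """Return the ASCII-compatible encoding (ACE) of a Unicode host name."""
    return hostname.encode("idna")

# "\u00e7" is the c-cedilla in "française", written as an escape so the
# example does not depend on the source file's encoding.
name = "www.alliancefran\u00e7aise.nu"
ace = to_ace(name)

# Labels containing non-ASCII characters are rewritten with an "xn--"
# prefix; pure-ASCII labels such as "www" and "nu" pass through unchanged.
print(ace)

# Decoding with the same codec recovers the Unicode form.
print(ace.decode("idna") == name)
```

The round trip is exact here only because the input is already lowercase; the codec normalizes case on the way in.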

Date/Time Type

Date and time types suitable for expressing timestamps were added as the datetime module. The types don't support different calendars or many fancy features, and just stick to the basics of representing time.

The three primary types are: date, representing a day, month, and year; time, consisting of hour, minute, and second; and datetime, which contains all the attributes of both date and time. There's also a timedelta class representing differences between two points in time, and time zone logic is implemented by classes inheriting from the abstract tzinfo class.

You can create instances of date and time by either supplying keyword arguments to the appropriate constructor, e.g. datetime.date(year=1972, month=10, day=15), or by using one of a number of class methods. For example, the today() class method returns the current local date.
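The construction styles just mentioned might look like the following sketch. It uses print() calls so that it runs on a current Python 3 interpreter (2.3 itself used the print statement); the variable names are only illustrative.

```python
import datetime

# Keyword arguments to the constructor, as in the text above.
birthday = datetime.date(year=1972, month=10, day=15)

# Positional arguments work too; time fields left out default to zero.
lunch = datetime.time(12, 30)

# Class methods act as alternative constructors.
today = datetime.date.today()                          # current local date
combined = datetime.datetime.combine(birthday, lunch)  # date + time

print(birthday)     # 1972-10-15
print(lunch)        # 12:30:00
print(combined)     # 1972-10-15 12:30:00
```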

Once created, instances of the date/time classes are all immutable. There are a number of methods for producing formatted strings from objects:

>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'

The replace() method allows modifying one or more fields of a date or datetime instance, returning a new instance:

>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour = 12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
>>>

Instances can be compared, hashed, and converted to strings (the result is the same as that of isoformat()). date and datetime instances can be subtracted from each other, and added to timedelta instances. The largest missing feature is that there's no standard library support for parsing strings and getting back a date or datetime.
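A small sketch of the arithmetic described above, again written so that it runs on a current Python 3 interpreter. The two timestamps are arbitrary values chosen for the example, and the dictionary labels are placeholders.

```python
import datetime

start = datetime.datetime(2002, 12, 30, 21, 27, 3)
end = datetime.datetime(2003, 7, 29, 9, 0, 0)

# Subtracting two datetime instances yields a timedelta.
elapsed = end - start
print(elapsed.days)          # whole days between the two moments

# Adding a timedelta to a datetime yields a new datetime.
one_week = datetime.timedelta(days=7)
print(start + one_week)      # 2003-01-06 21:27:03

# Instances are ordered and hashable, so they can serve as dictionary keys.
print(start < end)           # True
labels = {start.date(): "start", end.date(): "end"}
print(labels[datetime.date(2003, 7, 29)])
```

Because the instances are immutable, start itself is unchanged by the addition; start + one_week is a separate object.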

For more information, refer to the module's reference documentation. (Contributed by Tim Peters.)

The optparse Module

The getopt module provides simple parsing of command-line arguments. The new optparse module (originally named Optik) provides more elaborate command-line parsing that follows the Unix conventions, automatically creates the output for --help, and can perform different actions for different options.

You start by creating an instance of OptionParser and telling it what your program's options are.

import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')

Parsing a command line is then done by calling the parse_args() method.

options, args = op.parse_args(sys.argv[1:])
print options
print args

This returns an object containing all of the option values, and a list of strings containing the remaining arguments.

Invoking the script with the various arguments now works as you'd expect it to. Note that the length argument is automatically converted to an integer.

$ ./python opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./python opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
$

The help message is automatically generated for you:

$ ./python opt.py --help
usage: opt.py [options]

options:
  -h, --help            show this help message and exit
  -iINPUT, --input=INPUT
                        set input filename
  -lLENGTH, --length=LENGTH
                        set maximum length of output
$

See the module's documentation for more details.

Optik was written by Greg Ward, with suggestions from the readers of the Getopt SIG.
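As a further sketch of the "different actions for different options" idea from the start of this section, the example below mixes a store_true flag with a typed store option. The --verbose option and the default values are invented for illustration, and an explicit argument list is passed to parse_args() so the result does not depend on how the script is invoked. It uses print() calls so that it runs on a current Python 3 interpreter.

```python
from optparse import OptionParser

op = OptionParser()
# 'store_true' records that a flag was seen, without consuming a value.
op.add_option('-v', '--verbose',
              action='store_true', dest='verbose', default=False,
              help='print extra progress messages')
# 'store' with type='int' converts the option's value to an integer.
op.add_option('-l', '--length',
              action='store', type='int', dest='length', default=80,
              help='set maximum length of output')

# Pass an explicit argument list rather than sys.argv[1:].
options, args = op.parse_args(['-v', '--length=4', 'arg1', 'arg2'])

print(options.verbose)   # True
print(options.length)    # 4
print(args)              # ['arg1', 'arg2']
```

Option values are read as attributes of the returned object, named after each option's dest.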

Pymalloc: A Specialized Object Allocator

Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a feature added to Python 2.1. Pymalloc is intended to be faster than the system malloc() and to have less memory overhead for allocation patterns typical of Python programs. The allocator uses C's malloc() function to get large pools of memory and then fulfills smaller memory requests from these pools.

In 2.1 and 2.2, pymalloc was an experimental feature and wasn't enabled by default; you had to explicitly enable it when compiling Python by providing the --with-pymalloc option to the configure script. In 2.3, pymalloc has had further enhancements and is now enabled by default; you'll have to supply --without-pymalloc to disable it.

This change is transparent to code written in Python; however, pymalloc may expose bugs in C extensions. Authors of C extension modules should test their code with pymalloc enabled, because some incorrect code may cause core dumps at runtime.

There's one particularly common error that causes problems. There are a number of memory allocation functions in Python's C API that have previously just been aliases for the C library's malloc() and free(), meaning that if you accidentally called mismatched functions the error wouldn't be noticeable. When the object allocator is enabled, these functions aren't aliases of malloc() and free() any more, and calling the wrong function to free memory may get you a core dump. For example, if memory was allocated using PyObject_Malloc(), it has to be freed using PyObject_Free(), not free(). A few modules included with Python fell afoul of this and had to be fixed; doubtless there are more third-party modules that will have the same problem.

As part of this change, the confusing multiple interfaces for allocating memory have been consolidated down into two API families. Memory allocated with one family must not be manipulated with functions from the other family. There is one family for allocating chunks of memory and another family of functions specifically for allocating Python objects.

Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides debugging features to catch memory overwrites and doubled frees in both extension modules and in the interpreter itself. To enable this support, compile a debugging version of the Python interpreter by running configure with --with-pydebug.

To aid extension writers, a header file Misc/pymemcompat.h is distributed with the source to Python 2.3 that allows Python extensions to use the 2.3 interfaces to memory allocation while compiling against any version of Python since 1.5.2. You would copy the file from Python's source distribution and bundle it with the source of your extension.

See also

https://hg.python.org/cpython/file/default/Objects/obmalloc.c

For the full details of the pymalloc implementation, see the comments at the top of the file Objects/obmalloc.c in the Python source code. The above link points to the file within the python.org source code browser.

Build and C API Changes

Changes to Python's build process and to the C API include:

  • The cycle detection implementation used by the garbage collection has proven to be stable, so it's now been made mandatory. You can no longer compile Python without it, and the --with-cycle-gc switch to configure has been removed.

  • Python can now optionally be built as a shared library (libpython2.3.so) by supplying --enable-shared when running Python's configure script. (Contributed by Ondrej Palkovsky.)

  • The DL_EXPORT and DL_IMPORT macros are now deprecated. Initialization functions for Python extension modules should now be declared using the new macro PyMODINIT_FUNC, while the Python core will generally use the PyAPI_FUNC and PyAPI_DATA macros.

  • The interpreter can be compiled without any docstrings for the built-in functions and modules by supplying --without-doc-strings to the configure script. This makes the Python executable about 10% smaller, but will also mean that you can't get help for Python's built-ins. (Contributed by Gustavo Niemeyer.)

  • The PyArg_NoArgs() macro is now deprecated, and code that uses it should be changed. For Python 2.2 and later, the method definition table can specify the METH_NOARGS flag, signalling that there are no arguments, and the argument checking can then be removed. If compatibility with pre-2.2 versions of Python is important, the code could use PyArg_ParseTuple(args, "") instead, but this will be slower than using METH_NOARGS.

  • PyArg_ParseTuple() accepts new format characters for various sizes of unsigned integers: B for unsigned char, H for unsigned short int, I for unsigned int, and K for unsigned long long.

  • A new function, PyObject_DelItemString(mapping, char *key), was added as shorthand for PyObject_DelItem(mapping, PyString_FromString(key)).

  • File objects now manage their internal string buffer differently, increasing it exponentially when needed. This results in the benchmark tests in Lib/test/test_bufio.py speeding up considerably (from 57 seconds to 1.7 seconds, according to one measurement).

  • It's now possible to define class and static methods for a C extension type by setting either the METH_CLASS or METH_STATIC flags in a method's PyMethodDef structure.

  • Python now includes a copy of the Expat XML parser's source code, removing any dependence on a system version or local installation of Expat.

  • If you dynamically allocate type objects in your extension, you should be aware of a change in the rules relating to the __module__ and __name__ attributes. In summary, you will want to ensure the type's dictionary contains a '__module__' key; making the module name the part of the type name leading up to the final period will no longer have the desired effect. For more detail, read the API reference documentation or the source.

Port-Specific Changes

Support for a port to IBM's OS/2 using the EMX runtime environment was merged into the main Python source tree. EMX is a POSIX emulation layer over the OS/2 system APIs. The Python port for EMX tries to support all the POSIX-like capability exposed by the EMX runtime, and mostly succeeds; fork() and fcntl() are restricted by the limitations of the underlying emulation layer. The standard OS/2 port, which uses IBM's Visual Age compiler, also gained support for case-sensitive import semantics as part of the integration of the EMX port into CVS. (Contributed by Andrew MacIntyre.)

On MacOS, most toolbox modules have been weaklinked to improve backward compatibility. This means that modules will no longer fail to load if a single routine is missing on the current OS version. Instead, calling the missing routine will raise an exception. (Contributed by Jack Jansen.)

The RPM spec files, found in the Misc/RPM/ directory in the Python source distribution, were updated for 2.3. (Contributed by Sean Reifschneider.)

Other new platforms now supported by Python include AtheOS (http://www.atheos.cx/), GNU/Hurd, and OpenVMS.

Other Changes and Fixes

As usual, there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 523 patches applied and 514 bugs fixed between Python 2.2 and 2.3. Both figures are likely to be underestimates.

Some of the more notable changes:

  • If the PYTHONINSPECT environment variable is set, the Python interpreter will enter the interactive prompt after running a Python program, as if Python had been invoked with the -i option. The environment variable can be set before running the Python interpreter, or it can be set by the Python program as part of its execution.

  • The regrtest.py script now provides a way to allow "all resources except foo." A resource name passed to the -u option can now be prefixed with a hyphen ('-') to mean "remove this resource." For example, the option '-uall,-bsddb' could be used to enable the use of all resources except bsddb.

  • The tools used to build the documentation now work under Cygwin as well as Unix.

  • The SET_LINENO opcode has been removed. Back in the mists of time, this opcode was needed to produce line numbers in tracebacks and support trace functions (for, e.g., pdb). Since Python 1.5, the line numbers in tracebacks have been computed using a different mechanism that works with "python -O". For Python 2.3 Michael Hudson implemented a similar scheme to determine when to call the trace function, removing the need for SET_LINENO entirely.

    It would be difficult to detect any resulting difference from Python code, apart from a slight speed up when Python is run without -O.

    C extensions that access the f_lineno field of frame objects should instead call PyCode_Addr2Line(f->f_code, f->f_lasti). This will have the added effect of making the code work as desired under "python -O" in earlier versions of Python.

    A nifty new feature is that trace functions can now assign to the f_lineno attribute of frame objects, changing the line that will be executed next. A jump command has been added to the pdb debugger taking advantage of this new feature. (Implemented by Richie Hindle.)

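Porting to Python 2.3

This section lists previously described changes that may require changes to your code:

  • yield is now always a keyword; if it's used as a variable name in your code, a different name must be chosen.

  • For strings X and Y, X in Y now works if X is more than one character long.

  • The int() type constructor will now return a long integer instead of raising an OverflowError when a string or floating-point number is too large to fit into an integer.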

  • If you have Unicode strings that contain 8-bit characters, you must declare the file's encoding (UTF-8, Latin-1, or whatever) by adding a comment to the top of the file. See section PEP 263: Source Code Encodings for more information.

  • Calling Tcl methods through _tkinter no longer returns only strings. Instead, if Tcl returns other objects those objects are converted to their Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj object if no Python equivalent exists.

  • Large octal and hex literals such as 0xffffffff now trigger a FutureWarning. Currently they're stored as 32-bit numbers and result in a negative value, but in Python 2.4 they'll become positive long integers.

    There are a few ways to fix this warning. If you really need a positive number, just add an L to the end of the literal. If you're trying to get a 32-bit integer with low bits set and have previously used an expression such as ~(1 << 31), it's probably clearest to start with all bits set and clear the desired upper bits. For example, to clear just the top bit (bit 31), you could write 0xffffffffL &~(1L<<31).

  • You can no longer disable assertions by assigning to __debug__.

  • The Distutils setup() function has gained various new keyword arguments such as depends. Old versions of the Distutils will abort if passed unknown keywords. A solution is to check for the presence of the new get_distutil_options() function in your setup.py and only use the new keywords with a version of the Distutils that supports them:

    from distutils import core

    # Extension expects its sources as a list of file names.
    kw = {'sources': ['foo.c'], ...}
    # Only pass 'depends' when this Distutils version understands it.
    if hasattr(core, 'get_distutil_options'):
        kw['depends'] = ['foo.h']
    ext = core.Extension(**kw)

  • Using None as a variable name will now result in a SyntaxWarning.

  • Names of extension types defined by the modules included with Python now contain the module and a '.' in front of the type name.
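To make the bit-twiddling advice for large hex literals in the list above concrete, here is the "start with all bits set and clear the upper bits" approach worked through. This sketch assumes an interpreter where such literals are already positive, that is, Python 2.4 or later, including Python 3, where the L suffix no longer exists and is therefore omitted. The names ALL_BITS and low_31 are illustrative.

```python
ALL_BITS = 0xffffffff          # all 32 bits set, stored as a positive integer

# Clear just the top bit (bit 31), leaving the low 31 bits set.
low_31 = ALL_BITS & ~(1 << 31)

print(hex(low_31))             # 0x7fffffff
print(low_31 == 2**31 - 1)     # True
```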

Acknowledgements

The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside, Andrew Dalke, Scott David Daniels, Fred L. Drake, Jr., David Fraser, Kelly Gerber, Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert, Martin von Löwis, Andrew MacIntyre, Lalo Martins, Chad Netzer, Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler, Just van Rossum.