What's New In Python 3.1
************************

Author:
   Raymond Hettinger

This article explains the new features in Python 3.1, compared to 3.0.


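PEP 372: Ordered Dictionaries
=============================

Regular Python dictionaries iterate over key/value pairs in arbitrary
order.  Over the years, a number of authors have written alternative
implementations that remember the order in which the keys were
originally inserted.  Based on the experience from those
implementations, a new "collections.OrderedDict" class has been
introduced.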

The OrderedDict API is substantially the same as regular dictionaries
but will iterate over keys and values in a guaranteed order depending
on when a key was first inserted.  If a new entry overwrites an
existing entry, the original insertion position is left unchanged.
Deleting an entry and reinserting it will move it to the end.
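
The ordering rules above can be checked with a short sketch. The key
names are illustrative only:

```python
from collections import OrderedDict

d = OrderedDict()
d["first"] = 1
d["second"] = 2
d["third"] = 3

# Overwriting an existing key keeps its original position.
d["first"] = 100
order_after_overwrite = list(d)

# Deleting a key and reinserting it moves it to the end.
del d["second"]
d["second"] = 2
order_after_reinsert = list(d)

print(order_after_overwrite)  # ['first', 'second', 'third']
print(order_after_reinsert)   # ['first', 'third', 'second']
```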

The standard library now supports use of ordered dictionaries in
several modules.  The "configparser" module uses them by default.
This lets configuration files be read, modified, and then written back
in their original order.  The *_asdict()* method for
"collections.namedtuple()" now returns an ordered dictionary with the
values appearing in the same order as the underlying tuple indices.
The "json" module now provides an *object_pairs_hook* that allows
OrderedDicts to be built by the decoder.  Support was also added for
third-party tools like PyYAML.
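
As a minimal sketch of the decoder hook, the following passes
"OrderedDict" as the *object_pairs_hook* so that keys keep the order
they have in the JSON text.  The sample document is made up for
illustration:

```python
import json
from collections import OrderedDict

# Hypothetical JSON document; the keys are deliberately unsorted.
text = '{"zebra": 1, "apple": 2, "mango": 3}'

# The decoder hands the hook a list of (key, value) pairs in document
# order, which OrderedDict preserves.
decoded = json.loads(text, object_pairs_hook=OrderedDict)
keys_in_document_order = list(decoded)

print(keys_in_document_order)  # ['zebra', 'apple', 'mango']
```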

See also:

  **PEP 372** - Ordered Dictionaries
     PEP written by Armin Ronacher and Raymond Hettinger; implemented
     by Raymond Hettinger.


PEP 378: Format Specifier for Thousands Separator
=================================================

The built-in "format()" function and the "str.format()" method use a
mini-language that now includes a simple, non-locale aware way to
format a number with a thousands separator.  That provides a way to
humanize a program's output, improving its professional appearance and
readability:

   >>> format(1234567, ',d')
   '1,234,567'
   >>> format(1234567.89, ',.2f')
   '1,234,567.89'
   >>> format(12345.6 + 8901234.12j, ',f')
   '12,345.600000+8,901,234.120000j'
   >>> format(Decimal('1234567.89'), ',f')
   '1,234,567.89'

The supported types are "int", "float", "complex" and
"decimal.Decimal".

Discussions are underway about how to specify alternative separators
like dots, spaces, apostrophes, or underscores.  Locale-aware
applications should use the existing *n* format specifier which
already has some support for thousands separators.
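
Until alternative separators are specified, one possible workaround is
to format with the comma and then substitute another character.  The
sketch below assumes integer input, and "with_separator" is a
hypothetical helper, not part of the standard library:

```python
def with_separator(value, sep):
    """Hypothetical helper: group an int by thousands, then swap ',' for sep."""
    return format(value, ",d").replace(",", sep)


# The same ',' option also works inside str.format() fields.
grouped = "{:,}".format(1234567)

spaced = with_separator(1234567, " ")
underscored = with_separator(1234567, "_")

print(grouped)      # 1,234,567
print(spaced)       # 1 234 567
print(underscored)  # 1_234_567
```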

See also:

  **PEP 378** - Format Specifier for Thousands Separator
     PEP written by Raymond Hettinger and implemented by Eric Smith
     and Mark Dickinson.


Other Language Changes
======================

Some smaller changes made to the core Python language are:

* Directories and zip archives containing a "__main__.py" file can now
  be executed directly by passing their name to the interpreter. The
  directory/zipfile is automatically inserted as the first entry in
  sys.path.  (Suggestion and initial patch by Andy Chu; revised patch
  by Phillip J. Eby and Nick Coghlan; bpo-1739468.)

* The "int()" type gained a "bit_length" method that returns the
  number of bits necessary to represent its argument in binary:

     >>> n = 37
     >>> bin(37)
     '0b100101'
     >>> n.bit_length()
     6
     >>> n = 2**123-1
     >>> n.bit_length()
     123
     >>> (n+1).bit_length()
     124

  (Contributed by Fredrik Johansson, Victor Stinner, Raymond
  Hettinger, and Mark Dickinson; bpo-3439.)

* The fields in "format()" strings can now be automatically numbered:

     >>> 'Sir {} of {}'.format('Gallahad', 'Camelot')
     'Sir Gallahad of Camelot'

  Formerly, the string would have required numbered fields such as:
  "'Sir {0} of {1}'".

  (Contributed by Eric Smith; bpo-5237.)

* The "string.maketrans()" function is deprecated and is replaced by
  new static methods, "bytes.maketrans()" and "bytearray.maketrans()".
  This change solves the confusion around which types were supported
  by the "string" module. Now, "str", "bytes", and "bytearray" each
  have their own **maketrans** and **translate** methods with
  intermediate translation tables of the appropriate type.

  (Contributed by Georg Brandl; bpo-5675.)

* The syntax of the "with" statement now allows multiple context
  managers in a single statement:

     >>> with open('mylog.txt') as infile, open('a.out', 'w') as outfile:
     ...     for line in infile:
     ...         if '<critical>' in line:
     ...             outfile.write(line)

  With the new syntax, the "contextlib.nested()" function is no longer
  needed and is now deprecated.

  (Contributed by Georg Brandl and Mattias Brändström; appspot issue
  53094.)

* "round(x, n)" now returns an integer if *x* is an integer.
  Previously it returned a float:

     >>> round(1123, -2)
     1100

  (Contributed by Mark Dickinson; bpo-4707.)

* Python now uses David Gay's algorithm for finding the shortest
  floating point representation that doesn't change its value.  This
  should help mitigate some of the confusion surrounding binary
  floating point numbers.

  The significance is easily seen with a number like "1.1" which does
  not have an exact equivalent in binary floating point.  Since there
  is no exact equivalent, an expression like "float('1.1')" evaluates
  to the nearest representable value which is "0x1.199999999999ap+0"
  in hex or "1.100000000000000088817841970012523233890533447265625" in
  decimal. That nearest value was and still is used in subsequent
  floating point calculations.

  What is new is how the number gets displayed.  Formerly, Python used
  a simple approach.  The value of "repr(1.1)" was computed as
  "format(1.1, '.17g')" which evaluated to "'1.1000000000000001'". The
  advantage of using 17 digits was that it relied on IEEE-754
  guarantees to assure that "eval(repr(1.1))" would round-trip exactly
  to its original value.  The disadvantage is that many people found
  the output to be confusing (mistaking intrinsic limitations of
  binary floating point representation as being a problem with Python
  itself).

  The new algorithm for "repr(1.1)" is smarter and returns "'1.1'".
  Effectively, it searches all equivalent string representations (ones
  that get stored with the same underlying float value) and returns
  the shortest representation.

  The new algorithm tends to emit cleaner representations when
  possible, but it does not change the underlying values.  So, it is
  still the case that "1.1 + 2.2 != 3.3" even though the
  representations may suggest otherwise.

  The new algorithm depends on certain features in the underlying
  floating point implementation.  If the required features are not
  found, the old algorithm will continue to be used.  Also, the text
  pickle protocols assure cross-platform portability by using the old
  algorithm.

  (Contributed by Eric Smith and Mark Dickinson; bpo-1580.)
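
The floating point display change described above can be explored with
a short sketch.  It assumes a platform where the new shortest-repr
algorithm is in use; the variable names are illustrative:

```python
x = float("1.1")

short = repr(x)                  # shortest string that round-trips
seventeen = format(x, ".17g")    # the former 17-digit display
hex_form = x.hex()               # exact stored value, in hex

# Both strings convert back to the very same stored float.
same_value = float(short) == x == float(seventeen)

# The underlying arithmetic is unchanged by the new display.
sum_differs = (1.1 + 2.2 != 3.3)
sum_repr = repr(1.1 + 2.2)

print(short)      # 1.1
print(seventeen)  # 1.1000000000000001
print(hex_form)   # 0x1.199999999999ap+0
print(sum_repr)   # 3.3000000000000003
```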


New, Improved, and Deprecated Modules
=====================================

* Added a "collections.Counter" class to support convenient counting
  of unique items in a sequence or iterable:

     >>> Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
     Counter({'blue': 3, 'red': 2, 'green': 1})

  (Contributed by Raymond Hettinger; bpo-1696199.)

* Added a new module, "tkinter.ttk" for access to the Tk themed widget
  set. The basic idea of ttk is to separate, to the extent possible,
  the code implementing a widget's behavior from the code implementing
  its appearance.

  (Contributed by Guilherme Polo; bpo-2983.)

* The "gzip.GzipFile" and "bz2.BZ2File" classes now support the
  context management protocol:

     >>> # Automatically close file after writing
     >>> with gzip.GzipFile(filename, "wb") as f:
     ...     f.write(b"xxx")

  (Contributed by Antoine Pitrou.)

* The "decimal" module now supports creating a decimal object from a
  binary "float".  The conversion is exact but can sometimes be
  surprising:

     >>> Decimal.from_float(1.1)
     Decimal('1.100000000000000088817841970012523233890533447265625')

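  The long decimal result shows the actual binary fraction being
  stored for *1.1*.  The fraction has many digits because *1.1* cannot
  be exactly represented in binary.

  (Contributed by Raymond Hettinger and Mark Dickinson.)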

* The "itertools" module grew two new functions.  The
  "itertools.combinations_with_replacement()" function is one of four
  for generating combinatorics including permutations and Cartesian
  products.  The "itertools.compress()" function mimics its namesake
  from APL.  Also, the existing "itertools.count()" function now has
  an optional *step* argument and can accept any type of counting
  sequence including "fractions.Fraction" and "decimal.Decimal":

     >>> [p+q for p,q in combinations_with_replacement('LOVE', 2)]
     ['LL', 'LO', 'LV', 'LE', 'OO', 'OV', 'OE', 'VV', 'VE', 'EE']

     >>> list(compress(data=range(10), selectors=[0,0,1,1,0,1,0,1,0,0]))
     [2, 3, 5, 7]

     >>> c = count(start=Fraction(1,2), step=Fraction(1,6))
     >>> [next(c), next(c), next(c), next(c)]
     [Fraction(1, 2), Fraction(2, 3), Fraction(5, 6), Fraction(1, 1)]

  (Contributed by Raymond Hettinger.)

* "collections.namedtuple()" now supports a keyword argument *rename*
  which lets invalid fieldnames be automatically converted to
  positional names in the form _0, _1, etc.  This is useful when the
  field names are being created by an external source such as a CSV
  header, SQL field list, or user input:

     >>> query = input()
     SELECT region, dept, count(*) FROM main GROUP BY region, dept

     >>> cursor.execute(query)
     >>> query_fields = [desc[0] for desc in cursor.description]
     >>> UserQuery = namedtuple('UserQuery', query_fields, rename=True)
     >>> pprint.pprint([UserQuery(*row) for row in cursor])
     [UserQuery(region='South', dept='Shipping', _2=185),
      UserQuery(region='North', dept='Accounting', _2=37),
      UserQuery(region='West', dept='Sales', _2=419)]

  (Contributed by Raymond Hettinger; bpo-1818.)

* The "re.sub()", "re.subn()" and "re.split()" functions now accept a
  flags parameter.

  (Contributed by Gregory Smith.)

* The "logging" module now implements a simple "logging.NullHandler"
  class for applications that are not using logging but are calling
  library code that does.  Setting up a null handler will suppress
  spurious warnings such as "No handlers could be found for logger
  foo":

     >>> h = logging.NullHandler()
     >>> logging.getLogger("foo").addHandler(h)

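  (Contributed by Vinay Sajip; bpo-4384.)

* The "runpy" module, which supports the "-m" command line switch,
  now supports the execution of packages by looking for and executing
  a "__main__" submodule when a package name is supplied.

  (Contributed by Andi Vajda; bpo-4195.)

* The "pdb" module can now access and display source code loaded via
  "zipimport" (or any other conformant **PEP 302** loader).

  (Contributed by Alexander Belopolsky; bpo-4201.)

* "functools.partial" objects can now be pickled.

  (Suggested by Antoine Pitrou and Jesse Noller.  Implemented by Jack
  Diederich; bpo-5228.)

* Added "pydoc" help topics for symbols so that "help('@')" works as
  expected in the interactive environment.

  (Contributed by David Laban; bpo-4739.)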

* The "unittest" module now supports skipping individual tests or
  classes of tests. And it supports marking a test as an expected
  failure, a test that is known to be broken, but shouldn't be counted
  as a failure on a TestResult:

     class TestGizmo(unittest.TestCase):

         @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows")
         def test_gizmo_on_windows(self):
             ...

         @unittest.expectedFailure
         def test_gizmo_without_required_library(self):
             ...

  Also, tests for exceptions have been extended to work with context
  managers using the "with" statement:

     def test_division_by_zero(self):
         with self.assertRaises(ZeroDivisionError):
             x / 0

  In addition, several new assertion methods were added including
  "assertSetEqual()", "assertDictEqual()",
  "assertDictContainsSubset()", "assertListEqual()",
  "assertTupleEqual()", "assertSequenceEqual()",
  "assertRaisesRegexp()", "assertIsNone()", and "assertIsNotNone()".

  (Contributed by Benjamin Peterson and Antoine Pitrou.)

* The "io" module has three new constants for the "seek()" method:
  "SEEK_SET", "SEEK_CUR", and "SEEK_END".

* The "sys.version_info" tuple is now a named tuple:

     >>> sys.version_info
     sys.version_info(major=3, minor=1, micro=0, releaselevel='alpha', serial=2)

  (Contributed by Ross Light; bpo-4285.)

* The "nntplib" and "imaplib" modules now support IPv6.

  (Contributed by Derek Morr; bpo-1655 and bpo-1664.)

* The "pickle" module has been adapted for better interoperability
  with Python 2.x when used with protocol 2 or lower.  The
  reorganization of the standard library changed the formal reference
  for many objects.  For example, "__builtin__.set" in Python 2 is
  called "builtins.set" in Python 3. This change confounded efforts to
  share data between different versions of Python.  But now when
  protocol 2 or lower is selected, the pickler will automatically use
  the old Python 2 names for both loading and dumping. This remapping
  is turned on by default but can be disabled with the *fix_imports*
  option:

     >>> s = {1, 2, 3}
     >>> pickle.dumps(s, protocol=0)
     b'c__builtin__\nset\np0\n((lp1\nL1L\naL2L\naL3L\natp2\nRp3\n.'
     >>> pickle.dumps(s, protocol=0, fix_imports=False)
     b'cbuiltins\nset\np0\n((lp1\nL1L\naL2L\naL3L\natp2\nRp3\n.'

  An unfortunate but unavoidable side-effect of this change is that
  protocol 2 pickles produced by Python 3.1 won't be readable with
  Python 3.0. The latest pickle protocol, protocol 3, should be used
  when migrating data between Python 3.x implementations, as it
  doesn't attempt to remain compatible with Python 2.x.

  (Contributed by Alexandre Vassalotti and Antoine Pitrou; bpo-6137.)

* A new module, "importlib" was added.  It provides a complete,
  portable, pure Python reference implementation of the "import"
  statement and its counterpart, the "__import__()" function.  It
  represents a substantial step forward in documenting and defining
  the actions that take place during imports.

  (Contributed by Brett Cannon.)
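
As a small illustration of the "importlib" module mentioned above, the
sketch below imports modules by name at run time with
"importlib.import_module()".  The module names chosen are arbitrary
standard library modules:

```python
import importlib
import json
import os.path

# import_module() is the programmatic counterpart of "import json".
json_module = importlib.import_module("json")

# For a dotted name, the submodule itself is returned rather than the
# top-level package.
path_module = importlib.import_module("os.path")

print(json_module is json)     # True
print(path_module is os.path)  # True
```

This is handy when the module name is only known at run time, for
example when it comes from a configuration file.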


Optimizations
=============

Major performance enhancements have been added:

* The new I/O library (as defined in **PEP 3116**) was mostly written
  in Python and quickly proved to be a problematic bottleneck in
  Python 3.0. In Python 3.1, the I/O library has been entirely
  rewritten in C and is 2 to 20 times faster depending on the task at
  hand. The pure Python version is still available for experimentation
  purposes through the "_pyio" module.

  (Contributed by Amaury Forgeot d'Arc and Antoine Pitrou.)

* Added a heuristic so that tuples and dicts containing only
  untrackable objects are not tracked by the garbage collector. This
  can reduce the size of collections and therefore the garbage
  collection overhead on long-running programs, depending on their
  particular use of datatypes.

  (Contributed by Antoine Pitrou; bpo-4688.)

* When the "--with-computed-gotos" configure option is enabled on
  compilers that support it (notably gcc, SunPro, and icc), the
  bytecode evaluation loop is compiled with a new dispatch mechanism
  that gives speedups of up to 20%, depending on the system, the
  compiler, and the benchmark.

  (Contributed by Antoine Pitrou along with a number of other
  participants; bpo-4753.)

* The decoding of UTF-8, UTF-16 and LATIN-1 is now two to four times
  faster.

  (Contributed by Antoine Pitrou and Amaury Forgeot d'Arc; bpo-4868.)

* The "json" module now has a C extension to substantially improve its
  performance.  In addition, the API was modified so that json works
  only with "str", not with "bytes".  That change makes the module
  closely match the JSON specification which is defined in terms of
  Unicode.

  (Contributed by Bob Ippolito and converted to Py3.1 by Antoine
  Pitrou and Benjamin Peterson; bpo-4136.)

* Unpickling now interns the attribute names of pickled objects.  This
  saves memory and allows pickles to be smaller.

  (Contributed by Jake McGuire and Antoine Pitrou; bpo-5084.)
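
To illustrate the I/O rewrite described above, the following sketch
runs the same operation through the C implementation exposed as "io"
and the pure Python one in "_pyio".  It assumes CPython, where "_pyio"
is shipped; the payload is made up, and no timing is attempted here:

```python
import io
import _pyio  # pure Python implementation; a CPython detail

payload = b"spam and eggs\n" * 3

c_buffer = io.BytesIO(payload)
py_buffer = _pyio.BytesIO(payload)

# Both implementations are meant to behave identically.
c_lines = c_buffer.readlines()
py_lines = py_buffer.readlines()

# The defining module shows which implementation each class comes from.
implementations = (io.BytesIO.__module__, _pyio.BytesIO.__module__)

print(c_lines == py_lines)  # True
print(implementations)      # ('_io', '_pyio')
```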


IDLE
====

* IDLE's format menu now provides an option to strip trailing
  whitespace from a source file.

  (Contributed by Roger D. Serwy; bpo-5150.)


Build and C API Changes
=======================

Changes to Python's build process and to the C API include:

* Integers are now stored internally either in base "2**15" or in base
  "2**30", the base being determined at build time.  Previously, they
  were always stored in base "2**15".  Using base "2**30" gives
  significant performance improvements on 64-bit machines, but
  benchmark results on 32-bit machines have been mixed.  Therefore,
  the default is to use base "2**30" on 64-bit machines and base
  "2**15" on 32-bit machines; on Unix, there's a new configure option
  "--enable-big-digits" that can be used to override this default.

  Apart from the performance improvements this change should be
  invisible to end users, with one exception: for testing and
  debugging purposes there's a new "sys.int_info" that provides
  information about the internal format, giving the number of bits per
  digit and the size in bytes of the C type used to store each digit:

     >>> import sys
     >>> sys.int_info
     sys.int_info(bits_per_digit=30, sizeof_digit=4)

  (Contributed by Mark Dickinson; bpo-4258.)

* The "PyLong_AsUnsignedLongLong()" function now handles a negative
  *pylong* by raising "OverflowError" instead of "TypeError".

  (Contributed by Mark Dickinson and Lisandro Dalcrin; bpo-5175.)

* Deprecated "PyNumber_Int()".  Use "PyNumber_Long()" instead.

  (Contributed by Mark Dickinson; bpo-4910.)

* Added a new "PyOS_string_to_double()" function to replace the
  deprecated functions "PyOS_ascii_strtod()" and "PyOS_ascii_atof()".

  (Contributed by Mark Dickinson; bpo-5914.)

* Added "PyCapsule" as a replacement for the "PyCObject" API. The
  principal difference is that the new type has a well defined
  interface for passing type-safety information and a less
  complicated signature for calling a destructor.  The old type had a
  problematic API and is now deprecated.

  (Contributed by Larry Hastings; bpo-5630.)
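
The "sys.int_info" structure mentioned above can be used to reason
about the internal integer layout from Python code.  In the sketch
below, "internal_digits" is a hypothetical helper that only estimates
the digit count from the bit length; it does not inspect the actual C
structure:

```python
import sys

bits = sys.int_info.bits_per_digit       # 15 or 30, fixed at build time
digit_bytes = sys.int_info.sizeof_digit  # size in bytes of the C digit type


def internal_digits(n):
    """Hypothetical helper: estimate the base-2**bits digits abs(n) needs."""
    return -(-abs(n).bit_length() // bits)  # ceiling division


digits_small = internal_digits(1)
digits_big = internal_digits(2**123 - 1)

print(bits, digit_bytes)
print(digits_small, digits_big)
```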


Porting to Python 3.1
=====================

This section lists previously described changes and other bugfixes
that may require changes to your code:

* The new floating point string representations can break existing
  doctests. For example:

     def e():
         '''Compute the base of natural logarithms.

         >>> e()
         2.7182818284590451

         '''
         return sum(1/math.factorial(x) for x in reversed(range(30)))

     doctest.testmod()

     **********************************************************************
     Failed example:
         e()
     Expected:
         2.7182818284590451
     Got:
         2.718281828459045
     **********************************************************************

* The automatic name remapping in the pickle module for protocol 2 or
  lower can make Python 3.1 pickles unreadable in Python 3.0.  One
  solution is to use protocol 3.  Another solution is to set the
  *fix_imports* option to "False". See the discussion above for more
  details.
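
The two pickle workarounds listed above can be sketched as follows.
The variable names are illustrative; the checks only look for the
module name embedded in each pickle:

```python
import pickle

s = {1, 2, 3}

# Default for protocol 2 or lower: Python 2 module names are written.
legacy = pickle.dumps(s, protocol=2)

# Option 1: keep protocol 2 but turn the name remapping off.
native = pickle.dumps(s, protocol=2, fix_imports=False)

# Option 2: use protocol 3, which never remaps names.
proto3 = pickle.dumps(s, protocol=3)

uses_py2_name = b"__builtin__" in legacy
uses_py3_name = b"builtins" in native and b"builtins" in proto3

# All three pickles load back to the same set on this interpreter.
restored = [pickle.loads(p) for p in (legacy, native, proto3)]

print(uses_py2_name, uses_py3_name)  # True True
```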
