11. Brief Tour of the Standard Library - Part II
************************************************

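This second tour covers more advanced modules that support
professional programming needs.  These modules rarely occur in small
scripts.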


11.1. Output Formatting
=======================

The "reprlib" module provides a version of "repr()" customized for
abbreviated displays of large or deeply nested containers:

   >>> import reprlib
   >>> reprlib.repr(set('supercalifragilisticexpialidocious'))
   "{'a', 'c', 'd', 'e', 'f', 'g', ...}"

"pprint" 模块提供了更加复杂的打印控制，其输出的内置对象和用户自定义对
象能够被解释器直接读取。当输出结果过长而需要折行时，“美化输出机制”会添
加换行符和缩进，以更清楚地展示数据结构:

   >>> import pprint
   >>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
   ...     'yellow'], 'blue']]]
   ...
   >>> pprint.pprint(t, width=30)
   [[[['black', 'cyan'],
      'white',
      ['green', 'red']],
     [['magenta', 'yellow'],
      'blue']]]
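
The claim that the output can be read back can be checked with
"pprint.pformat()", which returns the formatted text instead of
printing it.  A small sketch, using "ast.literal_eval()" as a safe
stand-in for the interpreter:

```python
import ast
import pprint

t = [[[['black', 'cyan'], 'white', ['green', 'red']],
      [['magenta', 'yellow'], 'blue']]]

text = pprint.pformat(t, width=30)   # same layout as pprint.pprint()
rebuilt = ast.literal_eval(text)     # parse the literal back into objects
print(rebuilt == t)
```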

"textwrap" 模块能够格式化文本段落，以适应给定的屏幕宽度:

   >>> import textwrap
   >>> doc = """The wrap() method is just like fill() except that it returns
   ... a list of strings instead of one big string with newlines to separate
   ... the wrapped lines."""
   ...
   >>> print(textwrap.fill(doc, width=40))
   The wrap() method is just like fill()
   except that it returns a list of strings
   instead of one big string with newlines
   to separate the wrapped lines.
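
The relationship between "wrap()" and "fill()" described in the sample
text can be sketched as follows; the paragraph is the same one, written
here as a single string:

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to "
       "separate the wrapped lines.")

lines = textwrap.wrap(doc, width=40)   # a list of lines, without newlines
joined = "\n".join(lines)              # joining them gives what fill() returns
print(len(lines), "lines")
print(joined == textwrap.fill(doc, width=40))
```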

"locale" 模块处理与特定地域文化相关的数据格式。locale 模块的 format 函
数包含一个 grouping 属性，可直接将数字格式化为带有组分隔符的样式:

   >>> import locale
   >>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
   'English_United States.1252'
   >>> conv = locale.localeconv()          # get a mapping of conventions
   >>> x = 1234567.8
   >>> locale.format_string("%d", x, grouping=True)
   '1,234,567'
   >>> locale.format_string("%s%.*f", (conv['currency_symbol'],
   ...                      conv['frac_digits'], x), grouping=True)
   '$1,234,567.80'
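
Locale names such as the one above are platform specific, so
"setlocale()" may raise "locale.Error" on other systems.  When fixed,
locale-independent grouping is acceptable, the "," option of the format
specification mini-language is one possible alternative.  A sketch,
with the currency symbol hard-coded as an assumption rather than looked
up:

```python
x = 1234567.8

grouped_int = format(int(x), ',d')       # thousands separators, no locale
grouped_money = '${:,.2f}'.format(x)     # '$' is hard-coded, not looked up
print(grouped_int)
print(grouped_money)
```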


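11.2. Templating
================

The "string" module includes a versatile "Template" class with a
simplified syntax suitable for editing by end-users.  This allows
users to customize their applications without having to alter the
application.

The format uses placeholder names formed by "$" with valid Python
identifiers (alphanumeric characters and underscores).  Surrounding
the placeholder with braces allows it to be followed by more
alphanumeric letters with no intervening spaces.  Writing "$$" creates
a single escaped "$":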

   >>> from string import Template
   >>> t = Template('${village}folk send $$10 to $cause.')
   >>> t.substitute(village='Nottingham', cause='the ditch fund')
   'Nottinghamfolk send $10 to the ditch fund.'

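The "substitute()" method raises a "KeyError" when a placeholder is
not supplied in a dictionary or a keyword argument.  For mail-merge
style applications, user supplied data may be incomplete and the
"safe_substitute()" method may be more appropriate: it will leave
placeholders unchanged if data is missing: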

   >>> t = Template('Return the $item to $owner.')
   >>> d = dict(item='unladen swallow')
   >>> t.substitute(d)
   Traceback (most recent call last):
     ...
   KeyError: 'owner'
   >>> t.safe_substitute(d)
   'Return the unladen swallow to $owner.'

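Template subclasses can specify a custom delimiter.  For example, a
batch renaming utility for a photo browser may elect to use percent
signs for placeholders such as the current date, image sequence
number, or file format: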

   >>> import time, os.path
   >>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
   >>> class BatchRename(Template):
   ...     delimiter = '%'
   >>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
   Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

   >>> t = BatchRename(fmt)
   >>> date = time.strftime('%d%b%y')
   >>> for i, filename in enumerate(photofiles):
   ...     base, ext = os.path.splitext(filename)
   ...     newname = t.substitute(d=date, n=i, f=ext)
   ...     print('{0} --> {1}'.format(filename, newname))

   img_1074.jpg --> Ashley_0.jpg
   img_1076.jpg --> Ashley_1.jpg
   img_1077.jpg --> Ashley_2.jpg

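Another application for templating is separating program logic from
the details of multiple output formats.  This makes it possible to
substitute custom templates for XML files, plain text reports, and
HTML web reports.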
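
A minimal sketch of that separation, assuming a made-up record and two
hypothetical templates.  Note that "Template" does no escaping, so real
HTML output would need the values escaped first:

```python
from string import Template

# One record, two presentation templates; the program logic is shared.
record = {'name': 'Widget', 'qty': 3}
templates = {
    'text': Template('$name: $qty units'),
    'html': Template('<tr><td>$name</td><td>$qty</td></tr>'),
}

rendered = {kind: tpl.substitute(record) for kind, tpl in templates.items()}
for kind, output in rendered.items():
    print(kind, '->', output)
```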


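11.3. Working with Binary Data Record Layouts
=============================================

The "struct" module provides "pack()" and "unpack()" functions for
working with variable length binary record formats.  The following
example shows how to loop through header information in a ZIP file
without using the "zipfile" module.  Pack codes ""H"" and ""I""
represent two and four byte unsigned numbers respectively.  The ""<""
indicates that they are standard size and in little-endian byte order: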

   import struct

   with open('myfile.zip', 'rb') as f:
       data = f.read()

   start = 0
   for i in range(3):                      # show the first 3 file headers
       start += 14
       fields = struct.unpack('<IIIHH', data[start:start+16])
       crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

       start += 16
       filename = data[start:start+filenamesize]
       start += filenamesize
       extra = data[start:start+extra_size]
       print(filename, hex(crc32), comp_size, uncomp_size)

       start += extra_size + comp_size     # skip to the next header
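
To see why the loop reads 16 bytes per header, the same format string
can be tried on its own.  A small round-trip sketch; the field values
are made up for illustration:

```python
import struct

HEADER = '<IIIHH'                  # same layout as the fields above
size = struct.calcsize(HEADER)     # three 4-byte and two 2-byte fields

# Made-up field values, used only to demonstrate the round trip.
packed = struct.pack(HEADER, 0xDEADBEEF, 120, 300, 8, 0)
fields = struct.unpack(HEADER, packed)
print(size, len(packed), fields)
```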


11.4. Multi-threading
=====================

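Threading is a technique for decoupling tasks which are not
sequentially dependent.  Threads can be used to improve the
responsiveness of applications that accept user input while other
tasks run in the background.  A related use case is running I/O in
parallel with computations in another thread.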

The following code shows how the high level "threading" module can run
tasks in background while the main program continues to run:

   import threading, zipfile

   class AsyncZip(threading.Thread):
       def __init__(self, infile, outfile):
           threading.Thread.__init__(self)
           self.infile = infile
           self.outfile = outfile

       def run(self):
           f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
           f.write(self.infile)
           f.close()
           print('Finished background zip of:', self.infile)

   background = AsyncZip('mydata.txt', 'myarchive.zip')
   background.start()
   print('The main program continues to run in foreground.')

   background.join()    # Wait for the background task to finish
   print('Main program waited until background was done.')

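The principal challenge of multi-threaded applications is coordinating
threads that share data or other resources.  To that end, the
"threading" module provides a number of synchronization primitives
including locks, events, condition variables, and semaphores.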
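
As an illustration of the simplest of these primitives, the sketch
below guards a shared counter with a "threading.Lock".  The counter,
the helper function, and the thread count are assumptions made for the
example:

```python
import threading

totals = {'count': 0}               # shared state
totals_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter, taking the lock for each update."""
    for _ in range(times):
        with totals_lock:           # only one thread updates at a time
            totals['count'] += 1

workers = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(totals['count'])
```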

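While those tools are powerful, minor design errors can result in
problems that are difficult to reproduce.  So, the preferred approach
to task coordination is to concentrate all access to a resource in a
single thread and then use the "queue" module to feed that thread with
requests from other threads.  Applications using "Queue" objects for
inter-thread communication and coordination are easier to design, more
readable, and more reliable.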
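
A minimal sketch of that pattern, assuming a single worker thread that
owns a results list and a "None" sentinel to signal shutdown; both
choices are illustrative rather than prescribed by the "queue" module:

```python
import queue
import threading

requests = queue.Queue()
results = []                  # touched only by the worker thread

def worker():
    """Own the shared resource and serve requests one at a time."""
    while True:
        item = requests.get()
        if item is None:      # sentinel: no more work
            requests.task_done()
            break
        results.append(item * item)
        requests.task_done()

owner = threading.Thread(target=worker)
owner.start()
for n in range(5):
    requests.put(n)           # any thread may submit requests
requests.put(None)
requests.join()               # block until every request is processed
owner.join()
print(results)
```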


11.5. Logging
=============

The "logging" module offers a full featured and flexible logging
system.  At its simplest, log messages are sent to a file or to
"sys.stderr":

   import logging
   logging.debug('Debugging information')
   logging.info('Informational message')
   logging.warning('Warning:config file %s not found', 'server.conf')
   logging.error('Error occurred')
   logging.critical('Critical error -- shutting down')

This produces the following output:

   WARNING:root:Warning:config file server.conf not found
   ERROR:root:Error occurred
   CRITICAL:root:Critical error -- shutting down

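By default, informational and debugging messages are suppressed and
the output is sent to standard error.  Other output options include
routing messages through email, datagrams, sockets, or to an HTTP
Server.  New filters can select different routing based on message
priority: "DEBUG", "INFO", "WARNING", "ERROR", and "CRITICAL".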

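The logging system can be configured directly from Python or can be
loaded from a user editable configuration file for customized logging
without altering the application.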
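
Configuring the system directly from Python might look like the sketch
below.  The logger name "demo" is hypothetical, and an in-memory buffer
stands in for a log file so that the result can be inspected:

```python
import io
import logging

buffer = io.StringIO()                    # stands in for a log file
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

log = logging.getLogger('demo')           # hypothetical logger name
log.setLevel(logging.INFO)                # let INFO through, drop DEBUG
log.addHandler(handler)
log.propagate = False                     # keep output out of the root logger

log.debug('Debugging information')
log.info('Informational message')
log.warning('config file %s not found', 'server.conf')
print(buffer.getvalue(), end='')
```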


11.6. Weak References
=====================

Python does automatic memory management (reference counting for most
objects and *garbage collection* to eliminate cycles).  The memory is
freed shortly after the last reference to it has been eliminated.

This approach works fine for most applications but occasionally there
is a need to track objects only as long as they are being used by
something else. Unfortunately, just tracking them creates a reference
that makes them permanent. The "weakref" module provides tools for
tracking objects without creating a reference.  When the object is no
longer needed, it is automatically removed from a weakref table and a
callback is triggered for weakref objects.  Typical applications
include caching objects that are expensive to create:

   >>> import weakref, gc
   >>> class A:
   ...     def __init__(self, value):
   ...         self.value = value
   ...     def __repr__(self):
   ...         return str(self.value)
   ...
   >>> a = A(10)                   # create a reference
   >>> d = weakref.WeakValueDictionary()
   >>> d['primary'] = a            # does not create a reference
   >>> d['primary']                # fetch the object if it is still alive
   10
   >>> del a                       # remove the one reference
   >>> gc.collect()                # run garbage collection right away
   0
   >>> d['primary']                # entry was automatically removed
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
       d['primary']                # entry was automatically removed
     File "C:/python36/lib/weakref.py", line 46, in __getitem__
       o = self.data[key]()
   KeyError: 'primary'
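
The callback mentioned above can be observed with "weakref.ref()".  A
small sketch; the "Resource" class and the "events" list are
placeholders invented for the example:

```python
import gc
import weakref

class Resource:
    """Placeholder for an object that is expensive to create."""

events = []
obj = Resource()
ref = weakref.ref(obj, lambda r: events.append('finalized'))

alive_before = ref() is obj      # the weak reference resolves while obj lives
del obj                          # drop the only strong reference
gc.collect()                     # not needed on CPython, harmless elsewhere
alive_after = ref() is not None
print(alive_before, alive_after, events)
```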


11.7. Tools for Working with Lists
==================================

Many data structure needs can be met with the built-in list type.
However, sometimes there is a need for alternative implementations
with different performance trade-offs.

The "array" module provides an "array()" object that is like a list
that stores only homogeneous data and stores it more compactly.  The
following example shows an array of numbers stored as two byte
unsigned binary numbers (typecode ""H"") rather than the usual 16
bytes per entry for regular lists of Python int objects:

   >>> from array import array
   >>> a = array('H', [4000, 10, 700, 22222])
   >>> sum(a)
   26932
   >>> a[1:3]
   array('H', [10, 700])
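
The storage difference can be inspected directly.  The sketch below
only reports sizes; the exact numbers depend on the platform and the
Python build, so none are assumed here:

```python
import sys
from array import array

values = [4000, 10, 700, 22222]
a = array('H', values)

payload_bytes = a.itemsize * len(a)   # raw storage used by the items
print(a.itemsize, payload_bytes)      # itemsize is platform dependent
print(sys.getsizeof(values[0]))       # size of one int object, for scale
```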

The "collections" module provides a "deque()" object that is like a
list with faster appends and pops from the left side but slower
lookups in the middle. These objects are well suited for implementing
queues and breadth first tree searches:

   >>> from collections import deque
   >>> d = deque(["task1", "task2", "task3"])
   >>> d.append("task4")
   >>> print("Handling", d.popleft())
   Handling task1

   unsearched = deque([starting_node])
   def breadth_first_search(unsearched):
       node = unsearched.popleft()
       for m in gen_moves(node):
           if is_goal(m):
               return m
           unsearched.append(m)
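
The search fragment above relies on helpers ("gen_moves", "is_goal")
that are not defined here.  A self-contained variant, assuming the
search space is a small dictionary-based graph invented for the
example, could look like this:

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    unsearched = deque([[start]])       # queue of partial paths
    seen = {start}
    while unsearched:
        path = unsearched.popleft()     # oldest path first
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                unsearched.append(path + [neighbor])
    return None

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d', 'e'],
         'd': ['f'], 'e': ['f'], 'f': []}
path = breadth_first_search(graph, 'a', 'f')
print(path)
```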

In addition to alternative list implementations, the library also
offers other tools such as the "bisect" module with functions for
manipulating sorted lists:

   >>> import bisect
   >>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
   >>> bisect.insort(scores, (300, 'ruby'))
   >>> scores
   [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
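
Besides insertion, "bisect" can look up the slot a value falls into.  A
sketch that maps scores to letter grades; the breakpoints and grade
letters are illustrative assumptions:

```python
import bisect

def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    """Map a numeric score to a letter by locating its slot."""
    return grades[bisect.bisect(breakpoints, score)]

letters = [grade(s) for s in (33, 99, 77, 70, 89, 90, 100)]
print(letters)
```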

The "heapq" module provides functions for implementing heaps based on
regular lists.  The lowest valued entry is always kept at position
zero.  This is useful for applications which repeatedly access the
smallest element but do not want to run a full list sort:

   >>> from heapq import heapify, heappop, heappush
   >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   >>> heapify(data)                      # rearrange the list into heap order
   >>> heappush(data, -5)                 # add a new entry
   >>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
   [-5, 0, 1]
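
Because tuples compare element by element, the same functions can
serve as a simple priority queue.  A sketch with made-up tasks, where
a lower number means higher priority:

```python
from heapq import heappop, heappush

tasks = []                                  # a plain list used as a heap
heappush(tasks, (2, 'write code'))
heappush(tasks, (1, 'write spec'))
heappush(tasks, (3, 'run tests'))

order = [heappop(tasks)[1] for _ in range(3)]   # lowest number first
print(order)
```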


11.8. Decimal Floating-Point Arithmetic
=======================================

The "decimal" module offers a "Decimal" datatype for decimal floating
point arithmetic.  Compared to the built-in "float" implementation of
binary floating point, the class is especially helpful for

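* financial applications and other uses which require exact decimal
  representation,

* control over precision,

* control over rounding to meet legal or regulatory requirements,

* tracking of significant decimal places, or

* applications where the user expects the results to match
  calculations done by hand.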

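For example, calculating a 5% tax on a 70 cent phone charge gives
different results in decimal floating point and binary floating point.
The difference becomes significant if the results are rounded to the
nearest cent: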

   >>> from decimal import *
   >>> round(Decimal('0.70') * Decimal('1.05'), 2)
   Decimal('0.74')
   >>> round(.70 * 1.05, 2)
   0.73

Before rounding, the "Decimal" product is "Decimal('0.7350')".  It
keeps a trailing zero, automatically inferring four place significance
from multiplicands with two place significance.  Decimal reproduces
mathematics as done by hand and avoids issues that can arise when
binary floating point cannot exactly represent decimal quantities.
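
The control over rounding listed earlier can be sketched with
"quantize()".  The amount below is a made-up value chosen to sit
exactly halfway between two cents, where the rounding modes differ:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

amount = Decimal('0.125')              # made-up value, exactly halfway
cent = Decimal('0.01')

half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)
half_even = amount.quantize(cent, rounding=ROUND_HALF_EVEN)
print(half_up, half_even)
```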

Exact representation enables the "Decimal" class to perform modulo
calculations and equality tests that are unsuitable for binary
floating point:

   >>> Decimal('1.00') % Decimal('.10')
   Decimal('0.00')
   >>> 1.00 % 0.10
   0.09999999999999995

   >>> sum([Decimal('0.1')]*10) == Decimal('1.0')
   True
   >>> sum([0.1]*10) == 1.0
   False

"decimal" 模块的算法提供了尽可能的精度:

   >>> getcontext().prec = 36
   >>> Decimal(1) / Decimal(7)
   Decimal('0.142857142857142857142857142857142857')
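
Setting "getcontext().prec" changes the precision for the rest of the
current thread.  Where a temporary change is preferred,
"localcontext()" is one option; a short sketch, with six digits chosen
arbitrarily:

```python
from decimal import Decimal, getcontext, localcontext

before = getcontext().prec
with localcontext() as ctx:
    ctx.prec = 6                       # applies only inside this block
    short = Decimal(1) / Decimal(7)
after = getcontext().prec
print(short, before == after)
```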
