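10. Brief Tour of the Standard Library
10.1. Operating System Interface
The os module provides many functions for interacting with the operating system: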
>>> import os
>>> os.getcwd() # Return the current working directory
'C:\\Python26'
>>> os.chdir('/server/accesslogs') # Change current working directory
>>> os.system('mkdir today') # Run the command mkdir in the system shell
0
Be sure to use the import os style instead of from os import *. This keeps os.open() from shadowing the built-in open() function, which operates very differently.
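The shadowing can be checked directly. The following is a minimal sketch that runs the star-import inside a throwaway namespace dictionary (`ns`, a name chosen here for illustration) so the current session is left untouched:

```python
import os

# Simulate "from os import *" inside an isolated namespace dictionary.
ns = {}
exec("from os import *", ns)

# The star-import binds the name "open" to os.open, a low-level call that
# returns an integer file descriptor, not the built-in open() that
# returns a file object.
print(ns["open"] is os.open)  # True
```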
The built-in dir() and help() functions are useful as interactive aids for working with large modules like os:
>>> import os
>>> dir(os)
<returns a list of all module functions>
>>> help(os)
<returns an extensive manual page created from the module's docstrings>
For daily file and directory management tasks, the shutil module provides a higher-level interface that is easier to use:
>>> import shutil
>>> shutil.copyfile('data.db', 'archive.db')
>>> shutil.move('/build/executables', 'installdir')
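The calls above assume that data.db and /build/executables already exist. A self-contained sketch follows; it creates its own files inside a temporary directory (the names `workdir`, `copied_ok` and `moved_ok` are illustrative) and removes everything afterwards:

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so the sketch touches no real files.
workdir = tempfile.mkdtemp()
try:
    src = os.path.join(workdir, "data.db")
    with open(src, "w") as f:
        f.write("example contents\n")

    # copyfile copies the file contents only (no permission bits or timestamps).
    backup = os.path.join(workdir, "archive.db")
    shutil.copyfile(src, backup)
    with open(backup) as f:
        copied_ok = f.read() == "example contents\n"

    # move renames on the same filesystem; across filesystems it copies, then deletes.
    build_dir = os.path.join(workdir, "build")
    os.mkdir(build_dir)
    install_dir = os.path.join(workdir, "installdir")
    shutil.move(build_dir, install_dir)
    moved_ok = os.path.isdir(install_dir) and not os.path.exists(build_dir)
finally:
    # rmtree deletes the whole temporary tree, files included.
    shutil.rmtree(workdir)
```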
10.2. File Wildcards
The glob module provides a function for making file lists from directory wildcard searches:
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
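When the pattern includes a directory, glob returns full paths, and the order of the results depends on the filesystem. This sketch (file names chosen to echo the example above) sorts the matches for stable output:

```python
import glob
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
try:
    # Create a few empty files; only the .py ones should match.
    for name in ("primes.py", "quote.py", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()

    # Sort because glob does not guarantee any particular order.
    matches = sorted(glob.glob(os.path.join(workdir, "*.py")))
    names = [os.path.basename(p) for p in matches]
finally:
    shutil.rmtree(workdir)

print(names)  # ['primes.py', 'quote.py']
```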
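10.3. Command Line Arguments
Common utility scripts often need to process command line arguments. These arguments are stored as a list in the sys module's argv attribute. For instance, the following output results from running python demo.py one two three at the command line: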
>>> import sys
>>> print sys.argv
['demo.py', 'one', 'two', 'three']
The getopt module processes sys.argv using the conventions of the Unix getopt() function. More powerful and flexible command line processing is provided by the argparse module.
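A minimal getopt sketch follows. The argument list and the -v/--verbose and -o/--output options are hypothetical; a real script would pass sys.argv[1:] instead:

```python
import getopt

# A hypothetical command line; in a real script use sys.argv[1:].
argv = ["-v", "--output", "report.txt", "one", "two"]

# "v" is a flag, "o:" takes a value, "output=" is a long option taking a value.
opts, args = getopt.getopt(argv, "vo:", ["verbose", "output="])

print(opts)  # [('-v', ''), ('--output', 'report.txt')]
print(args)  # ['one', 'two']
```

Parsing stops at the first non-option argument, so the remaining positional arguments are returned in args.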
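10.4. Error Output Redirection and Program Termination
The sys module also has attributes for stdin, stdout, and stderr. The latter is useful for emitting warnings and error messages so they remain visible even when stdout has been redirected: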
>>> sys.stderr.write('Warning, log file not found starting a new one\n')
Warning, log file not found starting a new one
The most direct way to terminate a script is to use sys.exit().
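sys.exit() works by raising the SystemExit exception, and its argument becomes the process exit status. This sketch, built around a hypothetical main() that fails with status 2, shows that a caller can intercept the exception:

```python
import sys

def main():
    # Hypothetical failure path: report on stderr, then exit with status 2.
    sys.stderr.write("error: missing input file\n")
    sys.exit(2)

# Because sys.exit() raises SystemExit, finally blocks still run
# and a caller can catch it.
try:
    main()
except SystemExit as exc:
    status = exc.code

print(status)  # 2
```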
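10.5. String Pattern Matching
The re module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions: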
>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
When only simple capabilities are needed, string methods are preferred because they are easier to read and debug:
>>> 'tea for too'.replace('too', 'two')
'tea for two'
10.6. Mathematics
The math module gives access to the underlying C library functions for floating point math:
>>> import math
>>> math.cos(math.pi / 4.0)
0.70710678118654757
>>> math.log(1024, 2)
10.0
The random module provides tools for making random selections:
>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(xrange(100), 10) # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random() # random float
0.17970987693706186
>>> random.randrange(6) # random integer chosen from range(6)
4
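For tests and repeatable experiments, a random.Random instance created with a fixed seed produces the same sequence every time. This sketch uses the arbitrary seed 12345; separate instances keep their own state and leave the module-level generator untouched:

```python
import random

# Two generators seeded identically produce identical sequences.
rng_a = random.Random(12345)
rng_b = random.Random(12345)

first = [rng_a.randrange(6) for _ in range(5)]
second = [rng_b.randrange(6) for _ in range(5)]
print(first == second)  # True

# sample() draws without replacement, so no value repeats.
picks = rng_a.sample(range(100), 10)
print(len(set(picks)))  # 10
```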
10.7. Internet Access
There are a number of modules for accessing the internet and processing internet protocols. Two of the simplest are urllib2 for retrieving data from URLs and smtplib for sending mail:
>>> import urllib2
>>> for line in urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
...     if 'EST' in line or 'EDT' in line:  # look for Eastern Time
...         print line
...
<BR>Nov. 25, 09:43:32 PM EST
>>> import smtplib
>>> server = smtplib.SMTP('localhost')
>>> server.sendmail('soothsayer@example.org', 'jcaesar@example.org',
... """To: jcaesar@example.org
... From: soothsayer@example.org
...
... Beware the Ides of March.
... """)
>>> server.quit()
(Note that the second example needs a mail server running on localhost.)
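10.8. Dates and Times
The datetime module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient member extraction for output formatting and manipulation. The module also supports objects that are timezone aware.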
>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'
>>> # dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368
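Subtracting two dates yields a timedelta, and adding a timedelta to a date moves it forward with month and year rollover handled automatically. This sketch fixes "today" at the date used above so the output is reproducible; the 30-day offset is arbitrary:

```python
from datetime import date, timedelta

today = date(2003, 12, 2)
birthday = date(1964, 7, 31)

age = today - birthday          # date - date gives a timedelta
print(age.days)                 # 14368

# date + timedelta gives a new date, rolling over into the next year here.
due = today + timedelta(days=30)
print(due.isoformat())          # 2004-01-01
print(due.weekday())            # 3 (Thursday; Monday is 0)
```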
10.9. Data Compression
Common data archiving and compression formats are directly supported by modules including: zlib, gzip, bz2, zipfile and tarfile.
>>> import zlib
>>> s = 'witch which has which witches wrist watch'
>>> len(s)
41
>>> t = zlib.compress(s)
>>> len(t)
37
>>> zlib.decompress(t)
'witch which has which witches wrist watch'
>>> zlib.crc32(s)
226805979
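The session above passes a str to zlib.compress(), which works on Python 2; on Python 3 the compression modules operate on bytes, so this sketch uses a b"..." literal. It also round-trips the same data through a .gz file with gzip, assuming Python 2.7 or later (where gzip file objects support the with statement). The file name sample.gz is illustrative:

```python
import gzip
import os
import shutil
import tempfile
import zlib

data = b"witch which has which witches wrist watch"

# zlib: in-memory compress/decompress round trip.
packed = zlib.compress(data)
unpacked = zlib.decompress(packed)

# gzip: write the same bytes to a .gz file and read them back.
workdir = tempfile.mkdtemp()
try:
    path = os.path.join(workdir, "sample.gz")
    with gzip.open(path, "wb") as f:
        f.write(data)
    with gzip.open(path, "rb") as f:
        restored = f.read()
finally:
    shutil.rmtree(workdir)

print(unpacked == data and restored == data)  # True
```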
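10.10. Performance Measurement
Some Python users develop a deep interest in knowing the relative performance of different approaches to the same problem. Python provides a measurement tool that answers those questions immediately.
For example, it may be tempting to use the tuple packing and unpacking feature instead of the traditional approach to swapping arguments. The timeit module quickly demonstrates a modest performance advantage: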
>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791
In contrast to timeit's fine level of granularity, the profile and pstats modules provide tools for identifying time-critical sections in larger blocks of code.
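A minimal profiling sketch follows. It uses cProfile, the C implementation that offers the same interface as profile, and the functions slow_sum() and work() are hypothetical stand-ins for real code. pstats then sorts the collected data by cumulative time and prints the five most expensive entries:

```python
import cProfile
import pstats

def slow_sum(n):
    # Deliberately naive loop so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i
    return total

def work():
    return [slow_sum(10000) for _ in range(20)]

# Run work() under the profiler and keep its return value.
profiler = cProfile.Profile()
result = profiler.runcall(work)

# Load the collected data, sort by cumulative time, print the top five rows.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```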
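10.11. Quality Control
One approach for developing high quality software is to write tests for each function as it is developed and to run those tests frequently during the development process.
The doctest module provides a tool for scanning a module and validating tests embedded in a program's docstrings. Test construction is as simple as cutting and pasting a typical call along with its results into the docstring. This improves the documentation by providing the user with an example, and it allows the doctest module to make sure the code remains true to the documentation: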
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print average([20, 30, 70])
    40.0
    """
    return sum(values, 0.0) / len(values)

import doctest
doctest.testmod()   # automatically validate the embedded tests
The unittest module is not as effortless as the doctest module, but it allows a more comprehensive set of tests to be maintained in a separate file:
import unittest

# average() is the function defined in the doctest example above.
class TestStatisticalFunctions(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        with self.assertRaises(ZeroDivisionError):
            average([])
        with self.assertRaises(TypeError):
            average(20, 30, 70)

unittest.main() # Calling from the command line invokes all tests
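unittest.main() calls sys.exit() when it finishes, which is inconvenient inside an interactive session. As a sketch, the same test case can be loaded and run explicitly with a TestLoader and a TextTestRunner. The average() function is repeated so the block runs on its own, and assertRaises is used in its callable form rather than as a context manager:

```python
import sys
import unittest

def average(values):
    """Computes the arithmetic mean of a list of numbers."""
    return sum(values, 0.0) / len(values)

class TestStatisticalFunctions(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        self.assertRaises(ZeroDivisionError, average, [])
        self.assertRaises(TypeError, average, 20, 30, 70)

# Collect the test methods and run them without exiting the interpreter.
suite = unittest.TestLoader().loadTestsFromTestCase(TestStatisticalFunctions)
result = unittest.TextTestRunner(stream=sys.stderr, verbosity=2).run(suite)
print(result.wasSuccessful())  # True
```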
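10.12. Batteries Included
Python has a “batteries included” philosophy. This is best seen through the sophisticated and robust capabilities of its larger packages. For example: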
- The xmlrpclib and SimpleXMLRPCServer modules make implementing remote procedure calls into an almost trivial task. Despite the modules' names, no direct knowledge or handling of XML is needed.
- The email package is a library for managing email messages, including MIME and other RFC 2822-based message documents. Unlike smtplib and poplib, which actually send and receive messages, the email package has a complete toolset for building or decoding complex message structures (including attachments) and for implementing internet encoding and header protocols.
- The xml.dom and xml.sax packages provide robust support for parsing XML, a popular data interchange format. Likewise, the csv module supports direct reads and writes in a common database format. Together, these modules and packages greatly simplify data interchange between Python applications and other tools.
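A short sketch touching three of the packages named above. The XML snippet, CSV rows and message headers are made-up sample data. Expected outputs in the comments are as Python 3 displays them (Python 2 would show the XML text values as unicode strings):

```python
import csv
from email.mime.text import MIMEText
from xml.dom import minidom

# xml.dom: parse a small XML document held in a string.
doc = minidom.parseString(
    "<inventory><item>apple</item><item>pear</item></inventory>")
items = [node.firstChild.data for node in doc.getElementsByTagName("item")]
print(items)  # ['apple', 'pear']

# csv: csv.reader accepts any iterable of lines, here a plain list.
rows = list(csv.reader(["name,qty", "apple,3", "pear,5"]))
print(rows)  # [['name', 'qty'], ['apple', '3'], ['pear', '5']]

# email: build a message object instead of formatting headers by hand.
msg = MIMEText("Beware the Ides of March.")
msg["To"] = "jcaesar@example.org"
msg["From"] = "soothsayer@example.org"
msg["Subject"] = "Warning"
print(msg["To"])  # jcaesar@example.org
```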