5. 数据结构

本章节将详细介绍一些您已经了解的内容,并添加了一些新内容。

5.1. 列表的更多特性

列表数据类型还有很多的方法。这里是列表对象方法的清单:

list.append(x)

Add an item to the end of the list; equivalent to a[len(a):] = [x].

list.extend(L)

Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.

list.insert(i, x)

在给定的位置插入一个元素。第一个参数是要插入的元素的索引,所以 a.insert(0, x) 插入列表头部, a.insert(len(a), x) 等同于 a.append(x)

list.remove(x)

Remove the first item from the list whose value is x. It is an error if there is no such item.

list.pop([i])

删除列表中给定位置的元素并返回它。如果没有给定位置,a.pop() 将会删除并返回列表中的最后一个元素。( 方法签名中 i 两边的方括号表示这个参数是可选的,而不是要你输入方括号。你会在 Python 参考库中经常看到这种表示方法)。

list.index(x)

Return the index in the list of the first item whose value is x. It is an error if there is no such item.

list.count(x)

返回元素 x 在列表中出现的次数。

list.sort(cmp=None, key=None, reverse=False)

对列表中的元素进行排序(参数可用于自定义排序,解释请参见 sorted())。

list.reverse()

Reverse the elements of the list, in place.

多数列表方法示例:

>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print a.count(333), a.count(66.25), a.count('x')
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
>>> a.remove(333)
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
>>> a.pop()
1234.5
>>> a
[-1, 1, 66.25, 333, 333]

You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.

5.1.1. 列表作为栈使用

列表方法使得列表作为堆栈非常容易,最后一个插入,最先取出(“后进先出”)。要添加一个元素到堆栈的顶端,使用 append() 。要从堆栈顶部取出一个元素,使用 pop() ,不用指定索引。例如

>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]

5.1.2. 列表作为队列使用

列表也可以用作队列,其中先添加的元素被最先取出 (“先进先出”);然而列表用作这个目的相当低效。因为在列表的末尾添加和弹出元素非常快,但是在列表的开头插入或弹出元素却很慢 (因为所有的其他元素都必须移动一位)。

若要实现一个队列,可使用 collections.deque,它被设计成可以快速地从两端添加或弹出元素。例如

>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])

5.1.3. Functional Programming Tools

There are three built-in functions that are very useful when used with lists: filter(), map(), and reduce().

filter(function, sequence) returns a sequence consisting of those items from the sequence for which function(item) is true. If sequence is a str, unicode or tuple, the result will be of the same type; otherwise, it is always a list. For example, to compute a sequence of numbers divisible by 3 or 5:

>>> def f(x): return x % 3 == 0 or x % 5 == 0
...
>>> filter(f, range(2, 25))
[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24]

map(function, sequence) calls function(item) for each of the sequence’s items and returns a list of the return values. For example, to compute some cubes:

>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

More than one sequence may be passed; the function must then have as many arguments as there are sequences and is called with the corresponding item from each sequence (or None if some sequence is shorter than another). For example:

>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> map(add, seq, seq)
[0, 2, 4, 6, 8, 10, 12, 14]

reduce(function, sequence) returns a single value constructed by calling the binary function function on the first two items of the sequence, then on the result and the next item, and so on. For example, to compute the sum of the numbers 1 through 10:

>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55

If there’s only one item in the sequence, its value is returned; if the sequence is empty, an exception is raised.

A third argument can be passed to indicate the starting value. In this case the starting value is returned for an empty sequence, and the function is first applied to the starting value and the first sequence item, then to the result and the next item, and so on. For example,

>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0

Don’t use this example’s definition of sum(): since summing numbers is such a common need, a built-in function sum(sequence) is already provided, and works exactly like this.

5.1.4. 列表推导式

列表推导式提供了一个更简单的创建列表的方法。常见的用法是把某种操作应用于序列或可迭代对象的每个元素上,然后使用其结果来创建列表,或者通过满足某些特定条件元素来创建子序列。

例如,假设我们想创建一个平方列表,像这样

>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can obtain the same result with:

squares = [x**2 for x in range(10)]

This is also equivalent to squares = map(lambda x: x**2, range(10)), but it’s more concise and readable.

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:

>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

and it’s equivalent to:

>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

注意在上面两个代码片段中, forif 的顺序是相同的。

如果表达式是一个元组(例如上面的 (x, y)),那么就必须加上括号

>>> vec = [-4, -2, 0, 2, 4]
>>> # create a new list with the values doubled
>>> [x*2 for x in vec]
[-8, -4, 0, 4, 8]
>>> # filter the list to exclude negative numbers
>>> [x for x in vec if x >= 0]
[0, 2, 4]
>>> # apply a function to all the elements
>>> [abs(x) for x in vec]
[4, 2, 0, 2, 4]
>>> # call a method on each element
>>> freshfruit = ['  banana', '  loganberry ', 'passion fruit  ']
>>> [weapon.strip() for weapon in freshfruit]
['banana', 'loganberry', 'passion fruit']
>>> # create a list of 2-tuples like (number, square)
>>> [(x, x**2) for x in range(6)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
>>> # the tuple must be parenthesized, otherwise an error is raised
>>> [x, x**2 for x in range(6)]
  File "<stdin>", line 1, in <module>
    [x, x**2 for x in range(6)]
               ^
SyntaxError: invalid syntax
>>> # flatten a list using a listcomp with two 'for'
>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
>>> [num for elem in vec for num in elem]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

列表推导式可以使用复杂的表达式和嵌套函数

>>> from math import pi
>>> [str(round(pi, i)) for i in range(1, 6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']

5.1.4.1. 嵌套的列表推导式

列表推导式中的初始表达式可以是任何表达式,包括另一个列表推导式。

考虑下面这个 3x4的矩阵,它由3个长度为4的列表组成

>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]

下面的列表推导式将交换其行和列

>>> [[row[i] for row in matrix] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

如上节所示,嵌套的列表推导式是基于跟随其后的 for 进行求值的,所以这个例子等价于:

>>> transposed = []
>>> for i in range(4):
...     transposed.append([row[i] for row in matrix])
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

反过来说,也等价于

>>> transposed = []
>>> for i in range(4):
...     # the following 3 lines implement the nested listcomp
...     transposed_row = []
...     for row in matrix:
...         transposed_row.append(row[i])
...     transposed.append(transposed_row)
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

实际应用中,你应该会更喜欢使用内置函数去组成复杂的流程语句。 zip() 函数将会很好地处理这种情况

>>> zip(*matrix)
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

关于本行中星号的详细说明,参见 解包参数列表

5.2. The del statement

There is a way to remove an item from a list given its index instead of its value: the del statement. This differs from the pop() method which returns a value. The del statement can also be used to remove slices from a list or clear the entire list (which we did earlier by assignment of an empty list to the slice). For example:

>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]

del 也可以删除整个变量

>>> del a

此后再引用 a 时会报错(直到另一个值被赋给它)。我们会在后面了解到 del 的其他用法。

5.3. 元组和序列

我们看到列表和字符串有很多共同特性,例如索引和切片操作。他们是 序列 数据类型(参见 Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange)中的两种。随着 Python 语言的发展,其他的序列类型也会被加入其中。这里介绍另一种标准序列类型: 元组

一个元组由几个被逗号隔开的值组成,例如

>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
>>> # Tuples are immutable:
... t[0] = 88888
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> # but they can contain mutable objects:
... v = ([1, 2, 3], [3, 2, 1])
>>> v
([1, 2, 3], [3, 2, 1])

如你所见,元组在输出时总是被圆括号包围的,以便正确表示嵌套元组。输入时圆括号可有可无,不过经常会是必须的(如果这个元组是一个更大的表达式的一部分)。给元组中的一个单独的元素赋值是不允许的,当然你可以创建包含可变对象的元组,例如列表。

虽然元组可能看起来与列表很像,但它们通常是在不同的场景被使用,并且有着不同的用途。元组是 immutable ,其序列通常包含不同种类的元素,并且通过解包(这一节下面会解释)或者索引来访问(如果是 namedtuples 的话甚至还可以通过属性访问)。列表是 mutable ,并且列表中的元素一般是同种类型的,并且通过迭代访问。

一个特殊的问题是构造包含0个或1个元素的元组:为了适应这种情况,语法有一些额外的改变。空元组可以直接被一对空圆括号创建,含有一个元素的元组可以通过在这个元素后添加一个逗号来构建(圆括号里只有一个值的话不够明确)。丑陋,但是有效。例如

>>> empty = ()
>>> singleton = 'hello',    # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)

语句 t = 12345, 54321, 'hello!'元组打包 的一个例子:值 12345, 54321'hello!' 被打包进元组。其逆操作也是允许的

>>> x, y, z = t

This is called, appropriately enough, sequence unpacking and works for any sequence on the right-hand side. Sequence unpacking requires the list of variables on the left to have the same number of elements as the length of the sequence. Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.

5.4. 集合

Python也包含有 集合 类型。集合是由不重复元素组成的无序的集。它的基本用法包括成员检测和消除重复元素。集合对象也支持像 联合,交集,差集,对称差分等数学运算。

花括号或 set() 函数可以用来创建集合。注意:要创建一个空集合你只能用 set() 而不能用 {},因为后者是创建一个空字典,这种数据结构我们会在下一节进行讨论。

以下是一些简单的示例

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruit = set(basket)               # create a set without duplicates
>>> fruit
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruit                 # fast membership testing
True
>>> 'crabgrass' in fruit
False

>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a                                  # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> a - b                              # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                              # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                              # letters in both a and b
set(['a', 'c'])
>>> a ^ b                              # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

类似于 列表推导式,集合也支持推导式形式

>>> a = {x for x in 'abracadabra' if x not in 'abc'}
>>> a
set(['r', 'd'])

5.5. 字典

另一个非常有用的 Python 內置数据类型是 字典 (参见 映射类型 — dict)。字典在其他语言里可能会被叫做 联合内存联合数组。与以连续整数为索引的序列不同,字典是以 关键字 为索引的,关键字可以是任意不可变类型,通常是字符串或数字。如果一个元组只包含字符串、数字或元组,那么这个元组也可以用作关键字。但如果元组直接或间接地包含了可变对象,那么它就不能用作关键字。列表不能用作关键字,因为列表可以通过索引、切片或 append()extend() 之类的方法来改变。

It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

字典主要的操作是使用关键字存储和解析值。也可以用 del 来删除一个键值对。如果你使用了一个已经存在的关键字来存储值,那么之前与这个关键字关联的值就会被遗忘。用一个不存在的键来取值则会报错。

The keys() method of a dictionary object returns a list of all the keys used in the dictionary, in arbitrary order (if you want it sorted, just apply the sorted() function to it). To check whether a single key is in the dictionary, use the in keyword.

以下是使用字典的一些简单示例

>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> 'guido' in tel
True

dict() 构造函数可以直接从键值对序列里创建字典。

>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'jack': 4098, 'guido': 4127}

此外,字典推导式可以从任意的键值表达式中创建字典

>>> {x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}

当关键字是简单字符串时,有时直接通过关键字参数来指定键值对更方便

>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}

5.6. 循环的技巧

当在序列中循环时,用 enumerate() 函数可以将索引位置和其对应的值同时取出

>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print i, v
...
0 tic
1 tac
2 toe

当同时在两个或更多序列中循环时,可以用 zip() 函数将其内元素一一匹配。

>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print 'What is your {0}?  It is {1}.'.format(q, a)
...
What is your name?  It is lancelot.
What is your quest?  It is the holy grail.
What is your favorite color?  It is blue.

如果要逆向循环一个序列,可以先正向定位序列,然后调用 reversed() 函数

>>> for i in reversed(xrange(1,10,2)):
...     print i
...
9
7
5
3
1

如果要按某个指定顺序循环一个序列,可以用 sorted() 函数,它可以在不改动原序列的基础上返回一个新的排好序的序列

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> for f in sorted(set(basket)):
...     print f
...
apple
banana
orange
pear

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the iteritems() method.

>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.iteritems():
...     print k, v
...
gallahad the pure
robin the brave

有时可能会想在循环时修改列表内容,一般来说改为创建一个新列表是比较简单且安全的

>>> import math
>>> raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]
>>> filtered_data = []
>>> for value in raw_data:
...     if not math.isnan(value):
...         filtered_data.append(value)
...
>>> filtered_data
[56.2, 51.7, 55.3, 52.5, 47.8]

5.7. 深入条件控制

whileif 条件句中可以使用任意操作,而不仅仅是比较操作。

比较操作符 innot in 校验一个值是否在(或不在)一个序列里。操作符 isis not 比较两个对象是不是同一个对象,这只对像列表这样的可变对象比较重要。所有的比较操作符都有相同的优先级,且这个优先级比数值运算符低。

比较操作可以传递。例如 a < b == c 会校验是否 a 小于 b 并且 b 等于 c

比较操作可以通过布尔运算符 andor 来组合,并且比较操作(或其他任何布尔运算)的结果都可以用 not 来取反。这些操作符的优先级低于比较操作符;在它们之中,not 优先级最高, or 优先级最低,因此 A and not B or C 等价于 (A and (not B)) or C。和之前一样,你也可以在这种式子里使用圆括号。

布尔运算符 andor 也被称为 短路 运算符:它们的参数从左至右解析,一旦可以确定结果解析就会停止。例如,如果 AC 为真而 B 为假,那么 A and B and C 不会解析 C。当用作普通值而非布尔值时,短路操作符的返回值通常是最后一个变量。

也可以把比较操作或者逻辑表达式的结果赋值给一个变量,例如

>>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
>>> non_null = string1 or string2 or string3
>>> non_null
'Trondheim'

注意 Python 与 C 不同,赋值操作不能发生在表达式内部。C程序员可能会对此抱怨,但它避免了一类C程序中常见的错误:想在表达式中写 == 时却写成了 =

5.8. 比较序列和其他类型

Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Lexicographical ordering for strings uses the ASCII ordering for individual characters. Some examples of comparisons between sequences of the same type:

(1, 2, 3)              < (1, 2, 4)
[1, 2, 3]              < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4)           < (1, 2, 4)
(1, 2)                 < (1, 2, -1)
(1, 2, 3)             == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab'))   < (1, 2, ('abc', 'a'), 4)

Note that comparing objects of different types is legal. The outcome is deterministic but arbitrary: the types are ordered by their name. Thus, a list is always smaller than a string, a string is always smaller than a tuple, etc. 1 Mixed numeric types are compared according to their numeric value, so 0 equals 0.0, etc.

备注

1

The rules for comparing objects of different types should not be relied upon; they may change in a future version of the language.