5. 資料結構

這個章節將會更深入的介紹一些你已經學過的東西的細節上,並且加入一些你還沒有接觸過的部分。

5.1. 進一步了解 List(串列)

List(串列)這個資料型態,具有更多操作的方法。下面條列了所有關於 list 的物件方法:

list.append(x)

Add an item to the end of the list; equivalent to a[len(a):] = [x].

list.extend(L)

Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.

list.insert(i, x)

將一個項目插入至 list 中給定的位置。第一個引數為插入處前元素的索引值,所以 a.insert(0, x) 會插入為 list 首位,而 a.insert(len(a), x) 則相當於 a.append(x)

list.remove(x)

Remove the first item from the list whose value is x. It is an error if there is no such item.

list.pop([i])

移除 list 中給定位置的項目,並回傳它。如果沒有指定位置, a.pop() 將會移除 list 中最後的項目並回傳它。(在 i 周圍的方括號代表這個參數是選用的,並不代表你應該在該位置輸入方括號。你將會常常在 Python 函式庫參考指南中看見這個表示法)

list.index(x)

Return the index in the list of the first item whose value is x. It is an error if there is no such item.

list.count(x)

回傳數值為 x 在 list 中所出現的次數。

list.sort(cmp=None, key=None, reverse=False)

將 list 中的項目排序。(有參數可以使用來進行客製化的排序,請參考 sorted() 部分的解釋)

list.reverse()

Reverse the elements of the list, in place.

以下是一個使用到許多 list 物件方法的例子:

>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print a.count(333), a.count(66.25), a.count('x')
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
>>> a.remove(333)
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
>>> a.pop()
1234.5
>>> a
[-1, 1, 66.25, 333, 333]

You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.

5.1.1. 將 List 作為 Stack(堆疊)使用

List 的操作方法使得它非常簡單可以用來實作 stack(堆疊)。Stack 為一個遵守最後加入元素最先被取回(後進先出,」last-in, first-out」)規則的資料結構。你可以使用方法 append() 將一個項目放到堆疊的頂層。而使用方法 pop() 且不給定索引值去取得堆疊最上面的項目。舉例而言:

>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]

5.1.2. 將 List 作為 Queue(佇列)使用

我們也可以將 list 當作 queue(佇列)使用,即最先加入元素最先被取回(先進先出,」first-in, first-out」)的資料結構。然而,list 在這種使用方式下效率較差。使用 appendpop 來加入和取出尾端的元素較快,而使用 insertpop 來插入和取出頭端的元素較慢(因為其他元素都需要挪動一格)。

如果要實作 queue,請使用 collections.deque ,其被設計成能快速的從頭尾兩端加入和取出。例如:

>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])

5.1.3. Functional Programming Tools

There are three built-in functions that are very useful when used with lists: filter(), map(), and reduce().

filter(function, sequence) returns a sequence consisting of those items from the sequence for which function(item) is true. If sequence is a str, unicode or tuple, the result will be of the same type; otherwise, it is always a list. For example, to compute a sequence of numbers divisible by 3 or 5:

>>> def f(x): return x % 3 == 0 or x % 5 == 0
...
>>> filter(f, range(2, 25))
[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24]

map(function, sequence) calls function(item) for each of the sequence’s items and returns a list of the return values. For example, to compute some cubes:

>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

More than one sequence may be passed; the function must then have as many arguments as there are sequences and is called with the corresponding item from each sequence (or None if some sequence is shorter than another). For example:

>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> map(add, seq, seq)
[0, 2, 4, 6, 8, 10, 12, 14]

reduce(function, sequence) returns a single value constructed by calling the binary function function on the first two items of the sequence, then on the result and the next item, and so on. For example, to compute the sum of the numbers 1 through 10:

>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55

If there’s only one item in the sequence, its value is returned; if the sequence is empty, an exception is raised.

A third argument can be passed to indicate the starting value. In this case the starting value is returned for an empty sequence, and the function is first applied to the starting value and the first sequence item, then to the result and the next item, and so on. For example,

>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0

Don’t use this example’s definition of sum(): since summing numbers is such a common need, a built-in function sum(sequence) is already provided, and works exactly like this.

5.1.4. List Comprehensions(串列綜合運算)

List Comprehension(串列綜合運算)讓你可以用簡潔的方法創建 list。常見的應用是基於一個 list 或 iterable(可列舉物件),將每一個元素經過某個運算的結果串接起來成為一個新的 list 。或是創建一個 list 的子序列,其每一個元素皆滿足一個特定的條件。

舉例來說,假設我們要創建一個「平方的 list」:

>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can obtain the same result with:

squares = [x**2 for x in range(10)]

This is also equivalent to squares = map(lambda x: x**2, range(10)), but it’s more concise and readable.

一個 list comprehension 的組成,是包含著一個 expression(運算式)和一個 for 語句,再接著零個或多個 forif 語句的一對方括號。結果會是一個新的串列,內容是在接著的 forif 語句的環境下,執行前面 expression 的結果。例如,這個 list comprehension 是由兩個串列中互相不同的元素組合所組成:

>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

and it’s equivalent to:

>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

注意 forif 在這兩段程式裡的順序是相同的。

如果 expression 是一個 tuple(例如上面例子中的 (x, y)),它必須加上小括弧:

>>> vec = [-4, -2, 0, 2, 4]
>>> # create a new list with the values doubled
>>> [x*2 for x in vec]
[-8, -4, 0, 4, 8]
>>> # filter the list to exclude negative numbers
>>> [x for x in vec if x >= 0]
[0, 2, 4]
>>> # apply a function to all the elements
>>> [abs(x) for x in vec]
[4, 2, 0, 2, 4]
>>> # call a method on each element
>>> freshfruit = ['  banana', '  loganberry ', 'passion fruit  ']
>>> [weapon.strip() for weapon in freshfruit]
['banana', 'loganberry', 'passion fruit']
>>> # create a list of 2-tuples like (number, square)
>>> [(x, x**2) for x in range(6)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
>>> # the tuple must be parenthesized, otherwise an error is raised
>>> [x, x**2 for x in range(6)]
  File "<stdin>", line 1, in <module>
    [x, x**2 for x in range(6)]
               ^
SyntaxError: invalid syntax
>>> # flatten a list using a listcomp with two 'for'
>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
>>> [num for elem in vec for num in elem]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

List comprehensions 可以含有複雜的 expression 和巢狀的函式呼叫:

>>> from math import pi
>>> [str(round(pi, i)) for i in range(1, 6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']

5.1.4.1. 巢狀的 List Comprehensions

最初放在 list comprehesion 中的 expression 可以是任何形式的 expression,包括再寫一個 list comprehension。

考慮以下表示 3x4 矩陣的範例,使用 list 包含 3 個長度為 4 的 list :

>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]

以下的 list comprehesion 會將矩陣的行與列作轉置:

>>> [[row[i] for row in matrix] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

如同我們在上一節看到的,此巢狀的 list comprehension 為一個 list comprehension在 for 之前先被計算,接著再作一次 list comprehension,所以,這個例子和下者相同:

>>> transposed = []
>>> for i in range(4):
...     transposed.append([row[i] for row in matrix])
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

因此,也和下者相同:

>>> transposed = []
>>> for i in range(4):
...     # the following 3 lines implement the nested listcomp
...     transposed_row = []
...     for row in matrix:
...         transposed_row.append(row[i])
...     transposed.append(transposed_row)
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

在實際運用上,我們傾向於使用內建函式 (built-in functions) 而不是複雜的流程控制陳述式。在這個例子中,使用 zip() 函式會非常有效率:

>>> zip(*matrix)
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

關於星號的更多細節,請參考 Unpacking Argument Lists

5.2. del 陳述式

有一個方法可以藉由索引而不是值來刪除 list 中的項: del 陳述式。這和 pop() 方法傳回一個值不同, del 陳述式可以用來刪除 list 中的片段或者清空整個 list(我們之前藉由指派一個空的 list 給想刪除的片段來完成這件事)。例如:

>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]

del 也可以用來刪除變數:

>>> del a

刪除之後,對 a 的參照將會造成錯誤(至少在另一個值又被指派到它之前)。我們將在後面看到更多關於 del 的其他用法。

5.3. Tuples 和序列 (Sequences)

我們看到 lists 和 strings 有許多共同的特性,像是索引操作 (indexing) 以及切片操作 (slicing) 。他們是 序列 資料結構中的兩個例子(請參考 Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange )。由於 Python 是個持續發展中的語言,未來可能還會有其他的序列資料結構加入。接著要介紹是下一個標準序列資料結構: tuple

一個 tuple 是由若干個值藉由逗號區隔而組成,例如:

>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
>>> # Tuples are immutable:
... t[0] = 88888
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> # but they can contain mutable objects:
... v = ([1, 2, 3], [3, 2, 1])
>>> v
([1, 2, 3], [3, 2, 1])

如同我們看到的,被輸出的 tuple 總是以括號包著,如此巢狀的 tuple 才能被正確的直譯 (interpret);他們可以是以被括號包著或不被包著的方式當作輸入,雖然括號的使用常常是有其必要的(譬如此 tuple 是一個較大的陳述式的一部分)。指派東西給 tuple 中個別的項是不行的,但是可以在 tuple 中放入含有可變項的物件,譬如 list 。

雖然 tuple 和 list 看起來很類似,但是他們通常用在不同的情況與不同目的。 tuple 是 immutable (不可變的),通常儲存異質的序列元素,並可經由拆解(unpacking) (請參考本節後段)或索引 (indexing) 來存取(或者在使用 namedtuples 的時候藉由屬性 (attribute) 來存取)。 list 是 mutable (可變的),其元素通常是同質的且可藉由迭代整個串列來存取。

一個特別的議題是,關於創建一個含有 0 個或 1 個項目的 tuple:語法上會採納一些奇怪的用法。空的 tuple 藉由一對空括號來創建;含有一個項目的 tuple 經由一個值加上一個逗點來創建(不需要用括號把一個單一的值包住)。醜,但有效率。例如:

>>> empty = ()
>>> singleton = 'hello',    # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)

陳述式 t = 12345, 54321, 'hello!' 就是一個 tuple packing 的例子: 1234554321'hello!' 一起被放進 tuple 裡。反向操作也可以:

>>> x, y, z = t

This is called, appropriately enough, sequence unpacking and works for any sequence on the right-hand side. Sequence unpacking requires the list of variables on the left to have the same number of elements as the length of the sequence. Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.

5.4. 集合 (Sets)

Python 也包含了一種用在集合 (sets) 的資料結構。一個 set 是一組無序且沒有重複的元素。基本的使用方式包括了成員測試和消除重複項。 Set 物件也支援聯集,交集,差集和互斥等數學操作。

大括號或 set() 函式都可以用來創建 set 。注意:要創建一個空的 set ,我們必須使用 set() 而不是 {} ;後者會創建一個空的 dictionary ,一種我們將在下一節討論的資料結構。

這裡是一個簡單的演示:

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruit = set(basket)               # create a set without duplicates
>>> fruit
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruit                 # fast membership testing
True
>>> 'crabgrass' in fruit
False

>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a                                  # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> a - b                              # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                              # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                              # letters in both a and b
set(['a', 'c'])
>>> a ^ b                              # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

list comprehensions 類似,也有 set comprehensions (集合綜合運算):

>>> a = {x for x in 'abracadabra' if x not in 'abc'}
>>> a
set(['r', 'd'])

5.5. 字典(Dictionary)

下一個常用的 python 內建資料結構為 dictionary (請參考 Mapping Types — dict )。 Dictionary 有時被稱為 「關聯記憶體」 (associative memories) 或 「關聯矩陣」 (associative arrays) 。不像序列是由一個範圍內的數字當作索引, dictionary 是由 key (鍵)來當索引, key 可以是任何不可變的型態;字串和數字都可以當作 key 。 Tuple 也可以當作 key 如果他們只含有字串、數字或 tuple;若一個 tuple 直接或間接地含有任何可變的物件,它就不能當作 key 。我們無法使用 list 當作 key ,因為 list 可以經由索引操作、切片操作或是方法像是 append()extend() 來修改。

It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.

Dict 主要的操作為藉由鍵來儲存一個值並且可藉由該鍵來取出該值。也可以使用 del 來刪除鍵值對。如果我們使用用過的鍵來儲存,該鍵所對應的較舊的值會被覆蓋。使用不存在的鍵來取出值會造成錯誤。

The keys() method of a dictionary object returns a list of all the keys used in the dictionary, in arbitrary order (if you want it sorted, just apply the sorted() function to it). To check whether a single key is in the dictionary, use the in keyword.

這是個使用一個字典的簡單範例:

>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> 'guido' in tel
True

函式 dict() 可直接透過一串鍵值對序列來創建 dict:

>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'jack': 4098, 'guido': 4127}

此外, dict comprehensions 也可以透過鍵與值的陳述式來創建 dict :

>>> {x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}

當鍵是簡單的字串時,使用關鍵字引數 (keyword arguments) 有時會較為簡潔:

>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}

5.6. 迴圈技巧

當對序列作迴圈時,位置索引及其對應的值可以藉由使用 enumerate() 函式來同時取得:

>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print i, v
...
0 tic
1 tac
2 toe

要同時對兩個以上的序列作迴圈,可以將其以成對的方式放入 zip() 函式:

>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print 'What is your {0}?  It is {1}.'.format(q, a)
...
What is your name?  It is lancelot.
What is your quest?  It is the holy grail.
What is your favorite color?  It is blue.

要對序列作反向的迴圈,首先先寫出正向的序列,在對其使用 reversed() 函式:

>>> for i in reversed(xrange(1,10,2)):
...     print i
...
9
7
5
3
1

要以迴圈對序列作排序,使用 sorted() 函式會得到一個新的經排序過的 list ,但不會改變原本的序列:

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> for f in sorted(set(basket)):
...     print f
...
apple
banana
orange
pear

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the iteritems() method.

>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.iteritems():
...     print k, v
...
gallahad the pure
robin the brave

有時我們會想要以迴圈來改變的一個 list,但是,通常創建一個新的 list 會更簡單且安全:

>>> import math
>>> raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]
>>> filtered_data = []
>>> for value in raw_data:
...     if not math.isnan(value):
...         filtered_data.append(value)
...
>>> filtered_data
[56.2, 51.7, 55.3, 52.5, 47.8]

5.7. 更多條件式主題

使用在 whileif 內的陳述式可以包含任何運算子,而不是只有比較運算子 (comparisons) 。

比較運算子 innot in 檢查一個數值是否存在(不存在)於一個序列中。運算子 isnot is 比較兩個物件是否真的是相同的物件;這對可變物件例如 list 來說很重要。所有的比較運算子優先度都相同且都低於數值運算子。

比較運算是可以串連在一起的。例如, a < b == c 就是在測試 a 是否小於 bb 是否等於 c

比較運算可以結合布林運算子 andor ,且一個比較運算的結果(或任何其他布林表達式)可以加上 not 來否定。這些運算子的優先度都比比較運算子還低,其中, not 的優先度最高, or 的優先度最低,因此 A and not B or C 等同於 (A and (not B)) or C 。一如往常,括號可以用來表示任何想要的組合。

布林運算子 andor 也被稱為短路 (short-circuit) 運算子:會將其引數從左至右進行運算,當結果出現時即結束運算。例如,若 AC 為真但 B 為假,則 A and B and C 的運算並不會執行到 C 。當運算結果被當成一般數值而非布林值時,短路運算子的回傳值為最後被運算的引數。

將一個比較運算或其他布林表達式的結果指派給一個變數是可以的。例如:

>>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
>>> non_null = string1 or string2 or string3
>>> non_null
'Trondheim'

注意, Python 不像 C 語言,在表達式裡進行指派是不行的。 C 語言的程式設計師可能會抱怨這件事,但這樣做避免了在 C 語言裡常見的一種問題:想要打 == 卻在表達式裡輸入 =

5.8. 序列和其他資料結構之比較

Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Lexicographical ordering for strings uses the ASCII ordering for individual characters. Some examples of comparisons between sequences of the same type:

(1, 2, 3)              < (1, 2, 4)
[1, 2, 3]              < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4)           < (1, 2, 4)
(1, 2)                 < (1, 2, -1)
(1, 2, 3)             == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab'))   < (1, 2, ('abc', 'a'), 4)

Note that comparing objects of different types is legal. The outcome is deterministic but arbitrary: the types are ordered by their name. Thus, a list is always smaller than a string, a string is always smaller than a tuple, etc. 1 Mixed numeric types are compared according to their numeric value, so 0 equals 0.0, etc.

註解

1

The rules for comparing objects of different types should not be relied upon; they may change in a future version of the language.