3. An Informal Introduction to Python

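In the following examples, input and output are distinguished by the presence or absence of prompts (>>> and ...): to repeat an example, you must type everything after the prompt when the prompt appears; lines that do not begin with a prompt are output from the interpreter. Note that a secondary prompt (...) on a line by itself in an example means you must type a blank line; this is used to end a multi-line command.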

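Many of the examples in this manual, even those entered at the interactive prompt, include comments. Comments in Python start with the hash character, #, and extend to the end of the line. A comment may appear at the start of a line or following whitespace or code, but not within a string literal. A hash character within a string literal is just a hash character. Since comments only clarify code and are not interpreted by Python, they may be omitted when typing in examples.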

Some examples:

# this is the first comment
spam = 1  # and this is the second comment
          # ... and now a third!
text = "# This is not a comment because it's inside quotes."

3.1. Using Python as a Calculator

Let's try some simple Python commands. Start the interpreter and wait for the primary prompt, >>>. (It shouldn't take long.)

3.1.1. Numbers

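The interpreter acts as a simple calculator: you can type an expression at it and it will write the value. Expression syntax is straightforward: the operators +, -, * and / work just like in most other languages (for example, Pascal or C); parentheses (()) can be used for grouping. For example: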

>>> 2 + 2
4
>>> 50 - 5*6
20
>>> (50 - 5.0*6) / 4
5.0
>>> 8 / 5.0
1.6

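The integer numbers (e.g. 2, 4, 20) have type int; the ones with a fractional part (e.g. 5.0, 1.6) have type float. We will see more about numeric types later in the tutorial.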

The return type of a division (/) operation depends on its operands. If both operands are of type int, floor division is performed and an int is returned. If either operand is a float, classic division is performed and a float is returned. The // operator is also provided for doing floor division no matter what the operands are. The remainder can be calculated with the % operator:

>>> 17 / 3  # int / int -> int
5
>>> 17 / 3.0  # int / float -> float
5.666666666666667
>>> 17 // 3.0  # explicit floor division discards the fractional part
5.0
>>> 17 % 3  # the % operator returns the remainder of the division
2
>>> 5 * 3 + 2  # result * divisor + remainder
17
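The relationship shown in the last line of the example (result * divisor + remainder) can be checked mechanically. The following is a minimal sketch, not part of the original examples; it relies only on //, % and the built-in divmod(), so it does not depend on how / treats two integers. The variable names are illustrative.

```python
# Floor division and remainder always satisfy:
#     dividend == quotient * divisor + remainder
dividend, divisor = 17, 3

quotient = dividend // divisor    # explicit floor division -> 5
remainder = dividend % divisor    # remainder of the division -> 2

# divmod() returns the quotient and the remainder together as a tuple.
pair = divmod(dividend, divisor)  # (5, 2)

# Rebuild the dividend from the two parts.
reconstructed = quotient * divisor + remainder
print(reconstructed)              # 17
```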

With Python, it is possible to use the ** operator to calculate powers [1]:

>>> 5 ** 2  # 5 squared
25
>>> 2 ** 7  # 2 to the power of 7
128
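Operator precedence matters when a power is combined with a unary minus, as footnote 1 points out. A small sketch of the two readings, with illustrative variable names:

```python
# ** binds more tightly than unary minus, so the first
# expression is parsed as -(3 ** 2).
negated_square = -3 ** 2       # -9
squared_negative = (-3) ** 2   # 9

print(negated_square)
print(squared_negative)
```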

The equal sign (=) is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt:

>>> width = 20
>>> height = 5 * 9
>>> width * height
900

If a variable is not "defined" (assigned a value), trying to use it will give you an error:

>>> n  # try to access an undefined variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined

There is full support for floating point; operators with mixed type operands convert the integer operand to floating point:

>>> 3 * 3.75 / 1.5
7.5
>>> 7.0 / 2
3.5

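In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example: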

>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> round(_, 2)
113.06

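This variable should be treated as read-only by the user. Don't explicitly assign a value to it: you would create an independent local variable with the same name, masking the built-in variable with its magic behavior.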

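In addition to int and float, Python supports other types of numbers, such as Decimal and Fraction. Python also has built-in support for complex numbers, and uses the j or J suffix to indicate the imaginary part (e.g. 3+5j).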
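As a brief illustration of these additional numeric types, the sketch below uses Decimal from the standard decimal module and Fraction from the standard fractions module. The particular values are arbitrary choices for demonstration.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal values built from strings keep their exact decimal digits.
total = Decimal('0.10') + Decimal('0.20')            # Decimal('0.30')

# Fraction keeps an exact numerator/denominator pair.
third_plus_sixth = Fraction(1, 3) + Fraction(1, 6)   # Fraction(1, 2)

# Complex literals use the j (or J) suffix for the imaginary part.
z = 3 + 5j
real_part, imag_part = z.real, z.imag                # 3.0 and 5.0 (floats)

print(total)             # 0.30
print(third_plus_sixth)  # 1/2
print(z)                 # (3+5j)
```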

3.1.2. Strings

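Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result [2]. \ can be used to escape quotes: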

>>> 'spam eggs'  # single quotes
'spam eggs'
>>> 'doesn\'t'  # use \' to escape the single quote...
"doesn't"
>>> "doesn't"  # ...or use double quotes instead
"doesn't"
>>> '"Yes," they said.'
'"Yes," they said.'
>>> "\"Yes,\" they said."
'"Yes," they said.'
>>> '"Isn\'t," they said.'
'"Isn\'t," they said.'

In the interactive interpreter, the output string is enclosed in quotes and special characters are escaped with backslashes. While this might sometimes look different from the input (the enclosing quotes could change), the two strings are equivalent. The string is enclosed in double quotes if the string contains a single quote and no double quotes, otherwise it is enclosed in single quotes. The print statement produces a more readable output, by omitting the enclosing quotes and by printing escaped and special characters:

>>> '"Isn\'t," they said.'
'"Isn\'t," they said.'
>>> print '"Isn\'t," they said.'
"Isn't," they said.
>>> s = 'First line.\nSecond line.'  # \n means newline
>>> s  # without print, \n is included in the output
'First line.\nSecond line.'
>>> print s  # with print, \n produces a new line
First line.
Second line.

If you don't want characters prefaced by \ to be interpreted as special characters, you can use raw strings by adding an r before the first quote:

>>> print 'C:\some\name'  # here \n means newline!
C:\some
ame
>>> print r'C:\some\name'  # note the r before the quote
C:\some\name
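The difference between an ordinary and a raw string literal can also be seen by comparing lengths. This is a minimal sketch with made-up variable names; it shows that the r prefix changes only how the literal is read, not the type of the resulting string.

```python
cooked = 'a\nb'   # \n is one newline character: 3 characters in total
raw = r'a\nb'     # backslash and n stay separate: 4 characters in total

# A raw literal is equivalent to doubling every backslash by hand.
same_as_escaped = (raw == 'a\\nb')   # True

print(len(cooked))   # 3
print(len(raw))      # 4
```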

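String literals can span multiple lines. One way is using triple-quotes: """...""" or '''...'''. End of lines are automatically included in the string, but it's possible to prevent this by adding a \ at the end of the line. The following example: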

print """\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""

produces the following output (note that the initial newline is not included):

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

Strings can be concatenated (glued together) with the + operator, and repeated with *:

>>> # 3 times 'un', followed by 'ium'
>>> 3 * 'un' + 'ium'
'unununium'

Two or more string literals (i.e. the ones enclosed between quotes) next to each other are automatically concatenated:

>>> 'Py' 'thon'
'Python'

This feature is particularly useful when you want to break long strings:

>>> text = ('Put several strings within parentheses '
...         'to have them joined together.')
>>> text
'Put several strings within parentheses to have them joined together.'

This only works with two literals though, not with variables or expressions:

>>> prefix = 'Py'
>>> prefix 'thon'  # can't concatenate a variable and a string literal
  ...
SyntaxError: invalid syntax
>>> ('un' * 3) 'ium'
  ...
SyntaxError: invalid syntax

If you want to concatenate variables or a variable and a literal, use +:

>>> prefix + 'thon'
'Python'

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

>>> word = 'Python'
>>> word[0]  # character in position 0
'P'
>>> word[5]  # character in position 5
'n'

Indices may also be negative numbers, to start counting from the right:

>>> word[-1]  # last character
'n'
>>> word[-2]  # second-last character
'o'
>>> word[-6]
'P'

Note that since -0 is the same as 0, negative indices start from -1.

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

>>> word[0:2]  # characters from position 0 (included) to 2 (excluded)
'Py'
>>> word[2:5]  # characters from position 2 (included) to 5 (excluded)
'tho'

Note how the start is always included, and the end always excluded. This makes sure that s[:i] + s[i:] is always equal to s:

>>> word[:2] + word[2:]
'Python'
>>> word[:4] + word[4:]
'Python'

Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced:

>>> word[:2]   # characters from the beginning to position 2 (excluded)
'Py'
>>> word[4:]   # characters from position 4 (included) to the end
'on'
>>> word[-2:]  # characters from the second-last (included) to the end
'on'

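One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example: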

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

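The first row of numbers gives the position of the indices 0...6 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.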

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

Attempting to use an index that is too large will result in an error:

>>> word[42]  # the word only has 6 characters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

However, out of range slice indexes are handled gracefully when used for slicing:

>>> word[4:42]
'on'
>>> word[42:]
''
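The slicing rules above (start included, end excluded, out of range indices tolerated) can be verified over a whole range of split points at once. The following is a sketch for illustration only; the range of split points is an arbitrary choice that deliberately reaches beyond both ends of the string.

```python
word = 'Python'

# s[:i] + s[i:] rebuilds s for every split point, including
# negative ones and ones far outside the string.
split_points = range(-10, 11)
always_equal = all(word[:i] + word[i:] == word for i in split_points)

# For non-negative, in-bounds indices the slice length is j - i.
i, j = 1, 3
piece = word[i:j]           # 'yt'

# Out of range slice bounds are clipped instead of raising an error.
tail = word[4:42]           # 'on'
nothing = word[42:]         # ''

print(always_equal)         # True
print(len(piece) == j - i)  # True
```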

Python strings cannot be changed: they are immutable. Therefore, assigning to an indexed position in the string results in an error:

>>> word[0] = 'J'
  ...
TypeError: 'str' object does not support item assignment
>>> word[2:] = 'py'
  ...
TypeError: 'str' object does not support item assignment

If you need a different string, you should create a new one:

>>> 'J' + word[1:]
'Jython'
>>> word[:2] + 'py'
'Pypy'
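The "build a new string" pattern can be wrapped in a small helper. The function replace_at below is hypothetical (it is not a built-in and not part of the tutorial), and it assumes a non-negative index within the string; function definitions themselves are covered later in the tutorial.

```python
def replace_at(text, index, replacement):
    """Return a new string with the character at index replaced.

    Hypothetical helper for illustration; assumes 0 <= index < len(text).
    """
    return text[:index] + replacement + text[index + 1:]

word = 'Python'
changed = replace_at(word, 0, 'J')   # 'Jython'; word itself is untouched

# Direct item assignment is rejected because strings are immutable.
try:
    word[0] = 'J'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(changed)
print(word)
```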

The built-in function len() returns the length of a string:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

See also

Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange

Strings, and the Unicode strings described in the next section, are examples of sequence types, and support the common operations supported by such types.

String Methods

Both strings and Unicode strings support a large number of methods for basic transformations and searching.

Format String Syntax

Information about string formatting with str.format().

String Formatting Operations

The old formatting operations invoked when strings and Unicode strings are the left operand of the % operator are described in more detail here.

3.1.3. Unicode Strings

Starting with Python 2.0 a new data type for storing text data is available to the programmer: the Unicode object. It can be used to store and manipulate Unicode data (see http://www.unicode.org/) and integrates well with the existing string objects, providing auto-conversions where necessary.

Unicode has the advantage of providing one ordinal for every character in every script used in modern and ancient texts. Previously, there were only 256 possible ordinals for script characters. Texts were typically bound to a code page which mapped the ordinals to script characters. This led to much confusion, especially with respect to internationalization (usually written as i18n: 'i' + 18 characters + 'n') of software. Unicode solves these problems by defining one code page for all scripts.

Creating Unicode strings in Python is just as simple as creating normal strings:

>>> u'Hello World !'
u'Hello World !'

The small 'u' in front of the quote indicates that a Unicode string is supposed to be created. If you want to include special characters in the string, you can do so by using the Python Unicode-Escape encoding. The following example shows how:

>>> u'Hello\u0020World !'
u'Hello World !'

The escape sequence \u0020 indicates to insert the Unicode character with the ordinal value 0x0020 (the space character) at the given position.

Other characters are interpreted by using their respective ordinal values directly as Unicode ordinals. If you have literal strings in the standard Latin-1 encoding that is used in many Western countries, you will find it convenient that the lower 256 characters of Unicode are the same as the 256 characters of Latin-1.

For experts, there is also a raw mode just like the one for normal strings. You have to prefix the opening quote with 'ur' to have Python use the Raw-Unicode-Escape encoding. It will only apply the above \uXXXX conversion if there is an uneven number of backslashes in front of the small 'u'.

>>> ur'Hello\u0020World !'
u'Hello World !'
>>> ur'Hello\\u0020World !'
u'Hello\\\\u0020World !'

The raw mode is most useful when you have to enter lots of backslashes, as can be necessary in regular expressions.

Apart from these standard encodings, Python provides a whole set of other ways of creating Unicode strings on the basis of a known encoding.

The built-in function unicode() provides access to all registered Unicode codecs (COders and DECoders). Some of the more well known encodings which these codecs can convert are Latin-1, ASCII, UTF-8, and UTF-16. The latter two are variable-length encodings that store each Unicode character in one or more bytes. The default encoding is normally set to ASCII, which passes through characters in the range 0 to 127 and rejects any other characters with an error. When a Unicode string is printed, written to a file, or converted with str(), conversion takes place using this default encoding.

>>> u"abc"
u'abc'
>>> str(u"abc")
'abc'
>>> u"äöü"
u'\xe4\xf6\xfc'
>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

To convert a Unicode string into an 8-bit string using a specific encoding, Unicode objects provide an encode() method that takes one argument, the name of the encoding. Lowercase names for encodings are preferred.

>>> u"äöü".encode('utf-8')
'\xc3\xa4\xc3\xb6\xc3\xbc'

If you have data in a specific encoding and want to produce a corresponding Unicode string from it, you can use the unicode() function with the encoding name as the second argument.

>>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
u'\xe4\xf6\xfc'
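Encoding and decoding are inverse operations, which a round trip makes visible. The sketch below is an illustration under two assumptions: the non-ASCII characters are written as escapes so the snippet does not depend on the source file's encoding, and the decode() method of the 8-bit string is used in place of calling unicode() with an encoding name.

```python
text = u'\xe4\xf6\xfc'              # the three characters a-, o- and u-umlaut

encoded = text.encode('utf-8')      # 8-bit string, two bytes per character
decoded = encoded.decode('utf-8')   # back to the original Unicode string

# Encoding to ASCII fails because the ordinals are outside range(128).
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(text))         # 3
print(len(encoded))      # 6
print(decoded == text)   # True
print(ascii_ok)          # False
```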

3.1.4. Lists

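Python knows a number of compound data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.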

>>> squares = [1, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]

Like strings (and all other built-in sequence types), lists can be indexed and sliced:

>>> squares[0]  # indexing returns the item
1
>>> squares[-1]
25
>>> squares[-3:]  # slicing returns a new list
[9, 16, 25]

All slice operations return a new list containing the requested elements. This means that the following slice returns a new (shallow) copy of the list:

>>> squares[:]
[1, 4, 9, 16, 25]
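What "shallow" means becomes clearer with a list that contains other lists. In this sketch (the names nested and shallow are illustrative), the outer list is copied but the inner lists are shared between the original and the copy.

```python
squares = [1, 4, 9, 16, 25]
copy_of_squares = squares[:]   # new list object with the same items
copy_of_squares[0] = 100       # changes only the copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, same inner list objects
shallow[0].append(99)          # visible through nested[0] as well

print(squares)   # [1, 4, 9, 16, 25]
print(nested)    # [[1, 2, 99], [3, 4]]
```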

Lists also support operations like concatenation:

>>> squares + [36, 49, 64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Unlike strings, which are immutable, lists are a mutable type, i.e. it is possible to change their content:

>>> cubes = [1, 8, 27, 65, 125]  # something's wrong here
>>> 4 ** 3  # the cube of 4 is 64, not 65!
64
>>> cubes[3] = 64  # replace the wrong value
>>> cubes
[1, 8, 27, 64, 125]

You can also add new items at the end of the list, by using the append() method (we will see more about methods later):

>>> cubes.append(216)  # add the cube of 6
>>> cubes.append(7 ** 3)  # and the cube of 7
>>> cubes
[1, 8, 27, 64, 125, 216, 343]

Assignment to slices is also possible, and this can even change the size of the list or clear it entirely:

>>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> # replace some values
>>> letters[2:5] = ['C', 'D', 'E']
>>> letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']
>>> # now remove them
>>> letters[2:5] = []
>>> letters
['a', 'b', 'f', 'g']
>>> # clear the list by replacing all the elements with an empty list
>>> letters[:] = []
>>> letters
[]
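Slice assignment modifies the existing list object rather than creating a new one, which matters when a list is reachable under more than one name. A minimal sketch, with alias as an illustrative second name for the same list:

```python
letters = ['a', 'b', 'f', 'g']
alias = letters                   # second name for the same list object

letters[2:2] = ['c', 'd', 'e']    # empty slice: inserts without removing
after_insert = letters[:]         # snapshot of the list at this point

letters[:] = []                   # clears the list in place
alias_is_empty = (alias == [])    # True: alias refers to the same object

letters = ['x']                   # rebinding the name, by contrast...
alias_unchanged = (alias == [])   # ...leaves the old object as it was

print(after_insert)   # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```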

The built-in function len() also applies to lists:

>>> letters = ['a', 'b', 'c', 'd']
>>> len(letters)
4

It is possible to nest lists (create lists containing other lists), for example:

>>> a = ['a', 'b', 'c']
>>> n = [1, 2, 3]
>>> x = [a, n]
>>> x
[['a', 'b', 'c'], [1, 2, 3]]
>>> x[0]
['a', 'b', 'c']
>>> x[0][1]
'b'

3.2. First Steps Towards Programming

Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub-sequence of the Fibonacci series as follows:

>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print b
...     a, b = b, a+b
...
1
1
2
3
5
8

This example introduces several new features.

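  • The first line contains a multiple assignment: the variables a and b simultaneously get the new values 0 and 1. On the last line this is used again, demonstrating that the expressions on the right-hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right.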

  • The while loop executes as long as the condition (here: b < 10) remains true. In Python, like in C, any non-zero integer value is true; zero is false. The condition may also be a string or list value, in fact any sequence; anything with a non-zero length is true, empty sequences are false. The test used in the example is a simple comparison. The standard comparison operators are written the same as in C: < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).

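  • The body of the loop is indented: indentation is Python's way of grouping statements. At the interactive prompt, you have to type a tab or space(s) for each indented line. In practice you will prepare more complicated input for Python with a text editor; most text editors have an auto-indent facility. When a compound statement is entered interactively, it must be followed by a blank line to indicate completion (since the parser cannot guess when you have typed the last line). Note that each line within a basic block must be indented by the same amount.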

  • The print statement writes the value of the expression(s) it is given. It differs from just writing the expression you want to write (as we did earlier in the calculator examples) in the way it handles multiple expressions and strings. Strings are printed without quotes, and a space is inserted between items, so you can format things nicely, like this:

    >>> i = 256*256
    >>> print 'The value of i is', i
    The value of i is 65536
    

    A trailing comma avoids the newline after the output:

    >>> a, b = 0, 1
    >>> while b < 1000:
    ...     print b,
    ...     a, b = b, a+b
    ...
    1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
    

    Note that the interpreter inserts a newline before it prints the next prompt if the last line was not completed.
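The features listed above (multiple assignment, while conditions, truth values of sequences) can be combined in a short script. This is a sketch rather than part of the tutorial's examples: fib_below is a hypothetical helper name, function definitions are introduced later in the tutorial, and the results are collected in lists instead of being printed one by one.

```python
def fib_below(limit):
    """Return the Fibonacci numbers smaller than limit as a list."""
    result = []
    a, b = 0, 1
    while b < limit:
        result.append(b)
        # Both right-hand expressions are evaluated before either
        # name is rebound, so no temporary variable is needed.
        a, b = b, a + b
    return result

series = fib_below(10)     # [1, 1, 2, 3, 5, 8]

# The same evaluation order makes swapping two variables a one-liner.
x, y = 1, 2
x, y = y, x                # x is now 2, y is now 1

# Any sequence can serve as a condition: empty means false.
pending = ['a', 'b', 'c']
processed = []
while pending:
    processed.append(pending.pop())   # pop() removes the last item

print(series)
print(processed)           # ['c', 'b', 'a']
```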

Footnotes

[1] Since ** has higher precedence than -, -3**2 will be interpreted as -(3**2) and thus result in -9. To avoid this and get 9, you can use (-3)**2.

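[2] Unlike other languages, special characters such as \n have the same meaning with both single ('...') and double ("...") quotes. The only difference between the two is that within single quotes you don't need to escape " (but you have to escape \') and vice versa.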