3. 파이썬의 간략한 소개¶

다음에 나올 예에서, 입력과 출력은 프롬프트(>>> 와 …)의 존재 여부로 구분됩니다: 예제를 실행하기 위해서는 프롬프트가 나올 때 프롬프트 뒤에 오는 모든 것들을 입력해야 합니다; 프롬프트로 시작하지 않는 줄들은 인터프리터가 출력하는 것들입니다. 예에서 보조 프롬프트 외에 아무것도 없는 줄은 빈 줄을 입력해야 한다는 뜻임에 주의하세요; 여러 줄로 구성된 명령을 끝내는 방법입니다.

이 설명서에 나오는 많은 예는 (대화형 프롬프트에서 입력되는 것들조차도) 주석을 포함하고 있습니다. 파이썬에서 주석은 해시 문자, #, 로 시작하고 줄의 끝까지 이어집니다. 주석은 줄의 처음에서 시작할 수도 있고, 공백이나 코드 뒤에 나올 수도 있습니다. 하지만 문자열 리터럴 안에는 들어갈 수 없습니다. 문자열 리터럴 안에 등장하는 해시 문자는 주석이 아니라 해시 문자일 뿐입니다. 주석은 코드의 의미를 정확히 전달하기 위한 것이고, 파이썬이 해석하지 않는 만큼, 예를 입력할 때는 생략해도 됩니다.

몇 가지 예를 듭니다:

# this is the first comment
spam = 1  # and this is the second comment
          # ... and now a third!
text = "# This is not a comment because it's inside quotes."

3.1. 파이썬을 계산기로 사용하기¶

몇 가지 간단한 파이썬 명령을 사용해봅시다. 인터프리터를 실행하고 기본 프롬프트, >>>, 를 기다리세요. (얼마 걸리지 않아야 합니다.)

3.1.1. 숫자¶

인터프리터는 간단한 계산기로 기능합니다: 표현식을 입력하면 값을 출력합니다. 표현식 문법은 간단합니다. +, -, *, / 연산자들은 대부분의 다른 언어들 (예를 들어, 파스칼이나 C)처럼 동작합니다; 괄호 (()) 는 묶는 데 사용합니다. 예를 들어:

>>> 2 + 2
4
>>> 50 - 5*6
20
>>> (50 - 5.0*6) / 4
5.0
>>> 8 / 5.0
1.6

정수 (예를 들어 2, 4, 20)는 int 형입니다. 소수부가 있는 것들 (예를 들어 5.0, 1.6)은 float 형입니다. 이 자습서 뒤에서 숫자 형들에 관해 더 자세히 살펴볼 예정입니다.

The return type of a division (/) operation depends on its operands. If both operands are of type int, floor division is performed and an int is returned. If either operand is a float, classic division is performed and a float is returned. The // operator is also provided for doing floor division no matter what the operands are. The remainder can be calculated with the % operator:

>>> 17 / 3  # int / int -> int
5
>>> 17 / 3.0  # int / float -> float
5.666666666666667
>>> 17 // 3.0  # explicit floor division discards the fractional part
5.0
>>> 17 % 3  # the % operator returns the remainder of the division
2
>>> 5 * 3 + 2  # result * divisor + remainder
17

파이썬에서는 거듭제곱을 계산할 때 ** 연산자를 사용합니다 1:

>>> 5 ** 2  # 5 squared
25
>>> 2 ** 7  # 2 to the power of 7
128

변수에 값을 대입할 때는 등호(=)를 사용합니다. 이 경우 다음 대화형 프롬프트 전에 표시되는 출력은 없습니다:

>>> width = 20
>>> height = 5 * 9
>>> width * height
900

변수가 《정의되어》 있지 않을 때 (값을 대입하지 않았을 때) 사용하려고 시도하는 것은 에러를 일으킵니다:

>>> n  # try to access an undefined variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined

실수를 본격적으로 지원합니다; 서로 다른 형의 피연산자를 갖는 연산자는 정수 피연산자를 실수로 변환합니다:

>>> 3 * 3.75 / 1.5
7.5
>>> 7.0 / 2
3.5

대화형 모드에서는, 마지막에 인쇄된 표현식은 변수 _ 에 대입됩니다. 이것은 파이썬을 탁상용 계산기로 사용할 때, 계산을 이어 가기가 좀 더 쉬워짐을 의미합니다. 예를 들어:

>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> round(_, 2)
113.06

이 변수는 사용자로서는 읽기만 가능한 것처럼 취급되어야 합니다. 값을 직접 대입하지 마세요 — 만약 그렇게 한다면 같은 이름의 지역 변수를 새로 만드는 것이 되는데, 내장 변수의 마술 같은 동작을 차단하는 결과를 낳습니다.

int 와 float 에 더해, 파이썬은 Decimal 이나 Fraction 등의 다른 형의 숫자들도 지원합니다. 파이썬은 복소수 에 대한 지원도 내장하고 있는데, 허수부를 가리키는데 j 나 J 접미사를 사용합니다 (예를 들어 3+5j).

3.1.2. 문자열¶

숫자와는 별개로, 파이썬은 문자열도 다룰 수 있는데 여러 가지 방법으로 표현됩니다. 작은따옴표('...') 나 큰따옴표("...")로 둘러쌀 수 있는데 모두 같은 결과를 줍니다 2. 따옴표를 이스케이핑 할 때는 \ 를 사용할 수 있습니다:

>>> 'spam eggs'  # single quotes
'spam eggs'
>>> 'doesn\'t'  # use \' to escape the single quote...
"doesn't"
>>> "doesn't"  # ...or use double quotes instead
"doesn't"
>>> '"Yes," they said.'
'"Yes," they said.'
>>> "\"Yes,\" they said."
'"Yes," they said.'
>>> '"Isn\'t," they said.'
'"Isn\'t," they said.'

In the interactive interpreter, the output string is enclosed in quotes and special characters are escaped with backslashes. While this might sometimes look different from the input (the enclosing quotes could change), the two strings are equivalent. The string is enclosed in double quotes if the string contains a single quote and no double quotes, otherwise it is enclosed in single quotes. The print statement produces a more readable output, by omitting the enclosing quotes and by printing escaped and special characters:

>>> '"Isn\'t," they said.'
'"Isn\'t," they said.'
>>> print '"Isn\'t," they said.'
"Isn't," they said.
>>> s = 'First line.\nSecond line.'  # \n means newline
>>> s  # without print, \n is included in the output
'First line.\nSecond line.'
>>> print s  # with print, \n produces a new line
First line.
Second line.

\ 뒤에 나오는 문자가 특수 문자로 취급되게 하고 싶지 않다면, 첫 따옴표 앞에 r 을 붙여서 날 문자열 (raw string) 을 만들 수 있습니다:

>>> print 'C:\some\name'  # here \n means newline!
C:\some
ame
>>> print r'C:\some\name'  # note the r before the quote
C:\some\name

문자열 리터럴은 여러 줄로 확장될 수 있습니다. 한 가지 방법은 삼중 따옴표를 사용하는 것입니다: """...""" 또는 '''...'''. 줄 넘김 문자는 자동으로 문자열에 포함됩니다. 하지만 줄 끝에 \ 를 붙여 이를 방지할 수도 있습니다. 다음 예:

print """\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""

는 이런 결과를 출력합니다 (첫 번째 개행문자가 포함되지 않는 것에 주목하세요):

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

문자열은 + 연산자로 이어붙이고, * 연산자로 반복시킬 수 있습니다:

>>> # 3 times 'un', followed by 'ium'
>>> 3 * 'un' + 'ium'
'unununium'

두 개 이상의 문자열 리터럴 (즉, 따옴표로 둘러싸인 것들) 가 연속해서 나타나면 자동으로 이어 붙여집니다.

>>> 'Py' 'thon'
'Python'

이 기능은 긴 문자열을 쪼개고자 할 때 특별히 쓸모 있습니다:

>>> text = ('Put several strings within parentheses '
...         'to have them joined together.')
>>> text
'Put several strings within parentheses to have them joined together.'

이것은 오직 두 개의 리터럴에만 적용될 뿐 변수나 표현식에는 해당하지 않습니다:

>>> prefix = 'Py'
>>> prefix 'thon'  # can't concatenate a variable and a string literal
  ...
SyntaxError: invalid syntax
>>> ('un' * 3) 'ium'
  ...
SyntaxError: invalid syntax

변수들끼리 혹은 변수와 문자열 리터럴을 이어붙이려면 + 를 사용해야 합니다

>>> prefix + 'thon'
'Python'

문자열은 인덱스 (서브 스크립트) 될 수 있습니다. 첫 번째 문자가 인덱스 0에 대응됩니다. 문자를 위한 별도의 형은 없습니다; 단순히 길이가 1인 문자열입니다:

>>> word = 'Python'
>>> word[0]  # character in position 0
'P'
>>> word[5]  # character in position 5
'n'

인덱스는 음수가 될 수도 있는데, 끝에서부터 셉니다:

>>> word[-1]  # last character
'n'
>>> word[-2]  # second-last character
'o'
>>> word[-6]
'P'

-0은 0과 같으므로, 음의 인덱스는 -1에서 시작한다는 것에 주목하세요.

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

>>> word[0:2]  # characters from position 0 (included) to 2 (excluded)
'Py'
>>> word[2:5]  # characters from position 2 (included) to 5 (excluded)
'tho'

시작 위치의 문자는 항상 포함되는 반면, 종료 위치의 문자는 항상 포함되지 않는 것에 주의하세요. 이 때문에 s[:i] + s[i:] 는 항상 s 와 같아집니다

>>> word[:2] + word[2:]
'Python'
>>> word[:4] + word[4:]
'Python'

슬라이스 인덱스는 편리한 기본값을 갖고 있습니다; 첫 번째 인덱스를 생략하면 기본값 0 이 사용되고, 두 번째 인덱스가 생략되면 기본값으로 슬라이싱 되는 문자열의 길이가 사용됩니다.

>>> word[:2]   # character from the beginning to position 2 (excluded)
'Py'
>>> word[4:]   # characters from position 4 (included) to the end
'on'
>>> word[-2:]  # characters from the second-last (included) to the end
'on'

슬라이스가 동작하는 방법을 기억하는 한 가지 방법은 인덱스가 문자들 사이의 위치를 가리킨다고 생각하는 것입니다. 첫 번째 문자의 왼쪽 경계가 0입니다. n 개의 문자들로 구성된 문자열의 오른쪽 끝 경계는 인덱스 n 이 됩니다, 예를 들어:

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

첫 번째 숫자 행은 인덱스 0…6 의 위치를 보여주고; 두 번째 행은 대응하는 음의 인덱스들을 보여줍니다. i 에서 j 범위의 슬라이스는 i 와 j 로 번호 붙여진 경계 사이의 문자들로 구성됩니다.

음이 아닌 인덱스들의 경우, 두 인덱스 모두 범위 내에 있다면 슬라이스의 길이는 인덱스 간의 차입니다. 예를 들어 word[1:3] 의 길이는 2입니다.

너무 큰 값을 인덱스로 사용하는 것은 에러입니다:

>>> word[42]  # the word only has 6 characters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

하지만, 범위를 벗어나는 슬라이스 인덱스는 슬라이싱할 때 부드럽게 처리됩니다:

>>> word[4:42]
'on'
>>> word[42:]
''

파이썬 문자열은 변경할 수 없다 — 불변 이라고 합니다. 그래서 문자열의 인덱스로 참조한 위치에 대입하려고 하면 에러를 일으킵니다:

>>> word[0] = 'J'
  ...
TypeError: 'str' object does not support item assignment
>>> word[2:] = 'py'
  ...
TypeError: 'str' object does not support item assignment

다른 문자열이 필요하면, 새로 만들어야 합니다:

>>> 'J' + word[1:]
'Jython'
>>> word[:2] + 'py'
'Pypy'

내장 함수 len() 은 문자열의 길이를 돌려줍니다:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange: Strings, and the Unicode strings described in the next section, are examples of sequence types, and support the common operations supported by such types.
문자열 메서드: Both strings and Unicode strings support a large number of methods for basic transformations and searching.
Format String Syntax: str.format() 으로 문자열을 포맷하는 방법에 대한 정보.
String Formatting Operations: The old formatting operations invoked when strings and Unicode strings are the left operand of the % operator are described in more detail here.

3.1.3. Unicode Strings¶

Starting with Python 2.0 a new data type for storing text data is available to the programmer: the Unicode object. It can be used to store and manipulate Unicode data (see http://www.unicode.org/) and integrates well with the existing string objects, providing auto-conversions where necessary.

Unicode has the advantage of providing one ordinal for every character in every script used in modern and ancient texts. Previously, there were only 256 possible ordinals for script characters. Texts were typically bound to a code page which mapped the ordinals to script characters. This lead to very much confusion especially with respect to internationalization (usually written as i18n — 'i' + 18 characters + 'n') of software. Unicode solves these problems by defining one code page for all scripts.

Creating Unicode strings in Python is just as simple as creating normal strings:

>>> u'Hello World !'
u'Hello World !'

The small 'u' in front of the quote indicates that a Unicode string is supposed to be created. If you want to include special characters in the string, you can do so by using the Python Unicode-Escape encoding. The following example shows how:

>>> u'Hello\u0020World !'
u'Hello World !'

The escape sequence \u0020 indicates to insert the Unicode character with the ordinal value 0x0020 (the space character) at the given position.

Other characters are interpreted by using their respective ordinal values directly as Unicode ordinals. If you have literal strings in the standard Latin-1 encoding that is used in many Western countries, you will find it convenient that the lower 256 characters of Unicode are the same as the 256 characters of Latin-1.

For experts, there is also a raw mode just like the one for normal strings. You have to prefix the opening quote with 〈ur〉 to have Python use the Raw-Unicode-Escape encoding. It will only apply the above \uXXXX conversion if there is an uneven number of backslashes in front of the small 〈u〉.

>>> ur'Hello\u0020World !'
u'Hello World !'
>>> ur'Hello\\u0020World !'
u'Hello\\\\u0020World !'

The raw mode is most useful when you have to enter lots of backslashes, as can be necessary in regular expressions.

Apart from these standard encodings, Python provides a whole set of other ways of creating Unicode strings on the basis of a known encoding.

The built-in function unicode() provides access to all registered Unicode codecs (COders and DECoders). Some of the more well known encodings which these codecs can convert are Latin-1, ASCII, UTF-8, and UTF-16. The latter two are variable-length encodings that store each Unicode character in one or more bytes. The default encoding is normally set to ASCII, which passes through characters in the range 0 to 127 and rejects any other characters with an error. When a Unicode string is printed, written to a file, or converted with str(), conversion takes place using this default encoding.

>>> u"abc"
u'abc'
>>> str(u"abc")
'abc'
>>> u"äöü"
u'\xe4\xf6\xfc'
>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

To convert a Unicode string into an 8-bit string using a specific encoding, Unicode objects provide an encode() method that takes one argument, the name of the encoding. Lowercase names for encodings are preferred.

>>> u"äöü".encode('utf-8')
'\xc3\xa4\xc3\xb6\xc3\xbc'

If you have data in a specific encoding and want to produce a corresponding Unicode string from it, you can use the unicode() function with the encoding name as the second argument.

>>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
u'\xe4\xf6\xfc'

3.1.4. 리스트¶

파이썬은 다른 값들을 덩어리로 묶는데 사용되는 여러 가지 컴파운드 (compound) 자료 형을 알고 있습니다. 가장 융통성이 있는 것은 리스트 인데, 꺾쇠괄호 사이에 쉼표로 구분된 값(항목)들의 목록으로 표현될 수 있습니다. 리스트는 서로 다른 형의 항목들을 포함할 수 있지만, 항목들이 모두 같은 형인 경우가 많습니다.

>>> squares = [1, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]

문자열(그리고, 다른 모든 내장 시퀀스 형들)처럼 리스트는 인덱싱하고 슬라이싱할 수 있습니다:

>>> squares[0]  # indexing returns the item
1
>>> squares[-1]
25
>>> squares[-3:]  # slicing returns a new list
[9, 16, 25]

모든 슬라이스 연산은 요청한 항목들을 포함하는 새 리스트를 돌려줍니다. 이는 다음과 같은 슬라이스가 리스트의 새로운 (얕은) 복사본을 돌려준다는 뜻입니다:

>>> squares[:]
[1, 4, 9, 16, 25]

Lists also supports operations like concatenation:

>>> squares + [36, 49, 64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

불변 인 문자열과는 달리, 리스트는 가변 입니다. 즉 내용을 변경할 수 있습니다:

>>> cubes = [1, 8, 27, 65, 125]  # something's wrong here
>>> 4 ** 3  # the cube of 4 is 64, not 65!
64
>>> cubes[3] = 64  # replace the wrong value
>>> cubes
[1, 8, 27, 64, 125]

append() 메서드 (method) (나중에 메서드에 대해 더 자세히 알아볼 것입니다) 를 사용하면 리스트의 끝에 새 항목을 추가할 수 있습니다:

>>> cubes.append(216)  # add the cube of 6
>>> cubes.append(7 ** 3)  # and the cube of 7
>>> cubes
[1, 8, 27, 64, 125, 216, 343]

슬라이스에 대입하는 것도 가능한데, 리스트의 길이를 변경할 수 있고, 모든 항목을 삭제할 수조차 있습니다:

>>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> # replace some values
>>> letters[2:5] = ['C', 'D', 'E']
>>> letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']
>>> # now remove them
>>> letters[2:5] = []
>>> letters
['a', 'b', 'f', 'g']
>>> # clear the list by replacing all the elements with an empty list
>>> letters[:] = []
>>> letters
[]

내장 함수 len() 은 리스트에도 적용됩니다:

>>> letters = ['a', 'b', 'c', 'd']
>>> len(letters)
4

리스트를 중첩할 수도 있습니다. (다른 리스트를 포함하는 리스트를 만듭니다). 예를 들어:

>>> a = ['a', 'b', 'c']
>>> n = [1, 2, 3]
>>> x = [a, n]
>>> x
[['a', 'b', 'c'], [1, 2, 3]]
>>> x[0]
['a', 'b', 'c']
>>> x[0][1]
'b'

3.2. 프로그래밍으로의 첫걸음¶

물론, 2 에 2를 더하는 것보다는 더 복잡한 방법으로 파이썬을 사용할 수 있습니다. 예를 들어, 다음처럼 피보나치 (Fibonacci) 수열의 앞부분을 계산할 수 있습니다:

>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print b
...     a, b = b, a+b
...
1
1
2
3
5
8

이 예는 몇 가지 새로운 기능을 소개하고 있습니다.

첫 줄은 다중 대입 을 포함하고 있습니다: 변수 a 와 b 에 동시에 값 0과1이 대입됩니다. 마지막 줄에서 다시 사용되는데, 대입이 어느 하나라도 이루어지기 전에 우변의 표현식들이 모두 계산됩니다. 우변의 표현식은 왼쪽부터 오른쪽으로 가면서 순서대로 계산됩니다.
while 루프는 조건(여기서는: b < 10)이 참인 동안 실행됩니다. C와 마찬가지로 파이썬에서 0 이 아닌 모든 정수는 참이고, 0은 거짓입니다. 조건은 문자열이나 리스트 (사실 모든 종류의 시퀀스)가 될 수도 있는데 길이가 0 이 아닌 것은 모두 참이고, 빈 시퀀스는 거짓입니다. 이 예에서 사용한 검사는 간단한 비교입니다. 표준 비교 연산자는 C와 같은 방식으로 표현됩니다: < (작다), > (크다), == (같다), <= (작거나 같다), >= (크거나 같다), != (다르다).
루프의 바디 (body) 는 들여쓰기 됩니다. 들여쓰기는 파이썬에서 문장을 덩어리로 묶는 방법입니다. 대화형 프롬프트에서 각각 들여 쓰는 줄에서 탭(tab)이나 공백(space)을 입력해야 합니다. 실제적으로는 텍스트 편집기를 사용해서 좀 더 복잡한 파이썬 코드를 준비하게 됩니다; 웬만한 텍스트 편집기들은 자동 들여쓰기 기능을 제공합니다. 복합문을 대화형으로 입력할 때는 끝을 알리기 위해 빈 줄을 입력해야 합니다. (해석기가 언제 마지막 줄을 입력할지 짐작할 수 없기 때문입니다.) 같은 블록에 포함되는 모든 줄은 같은 양만큼 들여쓰기 되어야 함에 주의하세요.
The print statement writes the value of the expression(s) it is given. It differs from just writing the expression you want to write (as we did earlier in the calculator examples) in the way it handles multiple expressions and strings. Strings are printed without quotes, and a space is inserted between items, so you can format things nicely, like this:
```
>>> i = 256*256
>>> print 'The value of i is', i
The value of i is 65536
```
A trailing comma avoids the newline after the output:
```
>>> a, b = 0, 1
>>> while b < 1000:
...     print b,
...     a, b = b, a+b
...
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
```
Note that the interpreter inserts a newline before it prints the next prompt if the last line was not completed.

각주

1: ** 가 - 보다 우선순위가 높으므로, -3**2 는 -(3**2)``로 해석되어서 결과는 ``-9``가 됩니다. ``9``를 얻고 싶으면 ``(-3)**2 를 사용할 수 있습니다.
2: 다른 언어들과는 달리, `` `` 과 같은 특수 문자들은 작은따옴표('...')와 큰따옴표("...")에서 같은 의미가 있습니다. 둘 간의 유일한 차이는 작은따옴표 안에서 " 를 이스케이핑할 필요가 없고 (하지만 \' 는 이스케이핑 시켜야 합니다), 그 역도 성립한다는 것입니다.