7.1. string — 常见的字符串操作

源代码: Lib/string.py


The string module contains a number of useful constants and classes, as well as some deprecated legacy functions that are also available as methods on strings. In addition, Python’s built-in string classes support the sequence type methods described in the Sequence Types — str, unicode, list, tuple, bytearray, buffer, xrange section, and also the string-specific methods described in the 字符串的方法 section. To output formatted strings use template strings or the % operator described in the String Formatting Operations section. Also, see the re module for string functions based on regular expressions.

7.1.1. 字符串常量

此模块中定义的常量为:

string.ascii_letters

下文所述 ascii_lowercaseascii_uppercase 常量的拼连。 该值不依赖于语言区域。

string.ascii_lowercase

小写字母 'abcdefghijklmnopqrstuvwxyz'。 该值不依赖于语言区域,不会发生改变。

string.ascii_uppercase

大写字母 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'。 该值不依赖于语言区域,不会发生改变。

string.digits

字符串 '0123456789'

string.hexdigits

字符串 '0123456789abcdefABCDEF'

string.letters

The concatenation of the strings lowercase and uppercase described below. The specific value is locale-dependent, and will be updated when locale.setlocale() is called.

string.lowercase

A string containing all the characters that are considered lowercase letters. On most systems this is the string 'abcdefghijklmnopqrstuvwxyz'. The specific value is locale-dependent, and will be updated when locale.setlocale() is called.

string.octdigits

字符串 '01234567'

string.punctuation

由在 C 语言区域中被视为标点符号的 ASCII 字符组成的字符串。

string.printable

String of characters which are considered printable. This is a combination of digits, letters, punctuation, and whitespace.

string.uppercase

A string containing all the characters that are considered uppercase letters. On most systems this is the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. The specific value is locale-dependent, and will be updated when locale.setlocale() is called.

string.whitespace

A string containing all characters that are considered whitespace. On most systems this includes the characters space, tab, linefeed, return, formfeed, and vertical tab.

7.1.2. 自定义字符串格式化

2.6 新版功能.

The built-in str and unicode classes provide the ability to do complex variable substitutions and value formatting via the str.format() method described in PEP 3101. The Formatter class in the string module allows you to create and customize your own string formatting behaviors using the same implementation as the built-in format() method.

class string.Formatter

Formatter 类包含下列公有方法:

format(format_string, *args, **kwargs)

首要的 API 方法。 它接受一个格式字符串和任意一组位置和关键字参数。 它只是一个调用 vformat() 的包装器。

vformat(format_string, args, kwargs)

此函数执行实际的格式化操作。 它被公开为一个单独的函数,用于需要传入一个预定义字母作为参数,而不是使用 *args**kwargs 语法将字典解包为多个单独参数并重打包的情况。 vformat() 完成将格式字符串分解为字符数据和替换字段的工作。 它会调用下文所述的几种不同方法。

此外,Formatter 还定义了一些旨在被子类替换的方法:

parse(format_string)

循环遍历 format_string 并返回一个由可迭代对象组成的元组 (literal_text, field_name, format_spec, conversion)。 它会被 vformat() 用来将字符串分解为文本字面值或替换字段。

元组中的值在概念上表示一段字面文本加上一个替换字段。 如果没有字面文本(如果连续出现两个替换字段就会发生这种情况),则 literal_text 将是一个长度为零的字符串。 如果没有替换字段,则 field_name, format_specconversion 的值将为 None

get_field(field_name, args, kwargs)

给定 field_name 作为 parse() (见上文) 的返回值,将其转换为要格式化的对象。 返回一个元组 (obj, used_key)。 默认版本接受在 PEP 3101 所定义形式的字符串,例如 “0[name]” 或 “label.title”。 argskwargs 与传给 vformat() 的一样。 返回值 used_keyget_value()key 形参具有相同的含义。

get_value(key, args, kwargs)

提取给定的字段值。 key 参数将为整数或字符串。 如果是整数,它表示 args 中位置参数的索引;如果是字符串,它表示 kwargs 中的关键字参数名。

args 形参会被设为 vformat() 的位置参数列表,而 kwargs 形参会被设为由关键字参数组成的字典。

对于复合字段名称,仅会为字段名称的第一个组件调用这些函数;后续组件会通过普通属性和索引操作来进行处理。

因此举例来说,字段表达式 ‘0.name’ 将导致调用 get_value() 时附带 key 参数值 0。 在 get_value() 通过调用内置的 getattr() 函数返回后将会查找 name 属性。

如果索引或关键字引用了一个不存在的项,则将引发 IndexErrorKeyError

check_unused_args(used_args, args, kwargs)

在必要时实现对未使用参数进行检测。 此函数的参数是是格式字符串中实际引用的所有参数键的集合(整数表示位置参数,字符串表示名称参数),以及被传给 vformat 的 argskwargs 的引用。 未使用参数的集合可以根据这些形参计算出来。 如果检测失败则 check_unused_args() 应会引发一个异常。

format_field(value, format_spec)

format_field() 会简单地调用内置全局函数 format()。 提供该方法是为了让子类能够重载它。

convert_field(value, conversion)

使用给定的转换类型(来自 parse() 方法所返回的元组)来转换(由 get_field() 所返回的)值。 默认版本支持 ‘s’ (str), ‘r’ (repr) 和 ‘a’ (ascii) 等转换类型。

7.1.3. 格式字符串语法

The str.format() method and the Formatter class share the same syntax for format strings (although in the case of Formatter, subclasses can define their own format string syntax).

格式字符串包含有以花括号 {} 括起来的“替换字段”。 不在花括号之内的内容被视为字面文本,会不加修改地复制到输出中。 如果你需要在字面文本中包含花括号字符,可以通过重复来转义: {{ and }}

替换字段的语法如下:

replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | integer]
attribute_name    ::=  identifier
element_index     ::=  integer | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s"
format_spec       ::=  <described in the next section>

用不太正式的术语来描述,替换字段开头可以用一个 field_name 指定要对值进行格式化并取代替换字符被插入到输出结果的对象。 field_name 之后有可选的 conversion 字段,它是一个感叹号 '!' 加一个 format_spec,并以一个冒号 ':' 打头。 这些指明了替换值的非默认格式。

另请参阅 格式规格迷你语言 一节。

field_name 本身以一个数字或关键字 arg_name 打头。 如果为数字,则它指向一个位置参数,而如果为关键字,则它指向一个命名关键字参数。 如果格式字符串中的数字 arg_names 为 0, 1, 2, … 的序列,它们可以全部省略(而非部分省略),数字 0, 1, 2, … 将会按顺序自动插入。 由于 arg_name 不使用引号分隔,因此无法在格式字符串中指定任意的字典键 (例如字符串 '10'':-]')。 arg_name 之后可以带上任意数量的索引或属性表达式。 '.name' 形式的表达式会使用 getattr() 选择命名属性,而 '[index]' 形式的表达式会使用 __getitem__() 执行索引查找。

在 2.7 版更改: The positional argument specifiers can be omitted for str.format() and unicode.format(), so '{} {}' is equivalent to '{0} {1}', u'{} {}' is equivalent to u'{0} {1}'.

一些简单的格式字符串示例

"First, thou shalt count to {0}"  # References first positional argument
"Bring me a {}"                   # Implicitly references the first positional argument
"From {} to {}"                   # Same as "From {0} to {1}"
"My quest is {name}"              # References keyword argument 'name'
"Weight in tons {0.weight}"       # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}"   # First element of keyword argument 'players'.

使用 conversion 字段在格式化之前进行类型强制转换。 通常,格式化值的工作由值本身的 __format__() 方法来完成。 但是,在某些情况下最好强制将类型格式化为一个字符串,覆盖其本身的格式化定义。 通过在调用 __format__() 之前将值转换为字符串,可以绕过正常的格式化逻辑。

Two conversion flags are currently supported: '!s' which calls str() on the value, and '!r' which calls repr().

几个例子:

"Harold's a clever {0!s}"        # Calls str() on the argument first
"Bring out the holy {name!r}"    # Calls repr() on the argument first

format_spec 字段包含值应如何呈现的规格描述,例如字段宽度、对齐、填充、小数精度等细节信息。 每种值类型可以定义自己的“格式化迷你语言”或对 format_spec 的解读方式。

大多数内置类型都支持同样的格式化迷你语言,具体描述见下一节。

format_spec 字段还可以在其内部包含嵌套的替换字段。 这些嵌套的替换字段可能包括字段名称、转换旗标和格式规格描述,但是不再允许更深层的嵌套。 format_spec 内部的替换字段会在解读 format_spec 字符串之前先被解读。 这将允许动态地指定特定值的格式。

请参阅 格式示例 一节查看相关示例。

7.1.3.1. 格式规格迷你语言

“Format specifications” are used within replacement fields contained within a format string to define how individual values are presented (see 格式字符串语法). They can also be passed directly to the built-in format() function. Each formattable type may define how the format specification is to be interpreted.

大多数内置类型都为格式规格实现了下列选项,不过某些格式化选项只被数值类型所支持。

A general convention is that an empty format string ("") produces the same result as if you had called str() on the value. A non-empty format string typically modifies the result.

标准格式说明符 的一般形式如下:

format_spec ::=  [[fill]align][sign][#][0][width][,][.precision][type]
fill        ::=  <any character>
align       ::=  "<" | ">" | "=" | "^"
sign        ::=  "+" | "-" | " "
width       ::=  integer
precision   ::=  integer
type        ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

If a valid align value is specified, it can be preceded by a fill character that can be any character and defaults to a space if omitted. It is not possible to use a literal curly brace (“{” or “}”) as the fill character when using the str.format() method. However, it is possible to insert a curly brace with a nested replacement field. This limitation doesn’t affect the format() function.

各种对齐选项的含义如下:

选项

含义

'<'

强制字段在可用空间内左对齐(这是大多数对象的默认值)。

'>'

强制字段在可用空间内右对齐(这是数字的默认值)。

'='

强制将填充放置在符号(如果有)之后但在数字之前。这用于以“+000000120”形式打印字段。此对齐选项仅对数字类型有效。当’0’紧接在字段宽度之前时,它成为默认值。

'^'

强制字段在可用空间内居中。

请注意,除非定义了最小字段宽度,否则字段宽度将始终与填充它的数据大小相同,因此在这种情况下,对齐选项没有意义。

sign 选项仅对数字类型有效,可以是以下之一:

选项

含义

'+'

表示标志应该用于正数和负数。

'-'

表示标志应仅用于负数(这是默认行为)。

space

表示应在正数上使用前导空格,在负数上使用减号。

The '#' option is only valid for integers, and only for binary, octal, or hexadecimal output. If present, it specifies that the output will be prefixed by '0b', '0o', or '0x', respectively.

',' 选项表示使用逗号作为千位分隔符。 对于感应区域设置的分隔符,请改用 'n' 整数表示类型。

在 2.7 版更改: 添加了 ',' 选项 (另请参阅 PEP 378)。

width is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.

当未显式给出对齐方式时,在 width 字段前加一个零 ('0') 字段将为数字类型启用感知正负号的零填充。 这相当于设置 fill 字符为 '0'alignment 类型为 '='

precision 是一个十进制数字,表示对于以 'f' and 'F' 格式化的浮点数值要在小数点后显示多少个数位,或者对于以 'g''G' 格式化的浮点数值要在小数点前后共显示多少个数位。 对于非数字类型,该字段表示最大字段大小 —— 换句话说就是要使用多少个来自字段内容的字符。 对于整数值则不允许使用 precision

最后,type 确定了数据应如何呈现。

可用的字符串表示类型是:

类型

含义

's'

字符串格式。这是字符串的默认类型,可以省略。

None

's' 一样。

可用的整数表示类型是:

类型

含义

'b'

二进制格式。 输出以 2 为基数的数字。

'c'

字符。在打印之前将整数转换为相应的unicode字符。

'd'

十进制整数。 输出以 10 为基数的数字。

'o'

八进制格式。 输出以 8 为基数的数字。

'x'

Hex format. Outputs the number in base 16, using lower- case letters for the digits above 9.

'X'

Hex format. Outputs the number in base 16, using upper- case letters for the digits above 9.

'n'

数字。 这与 'd' 相似,不同之处在于它会使用当前区域设置来插入适当的数字分隔字符。

None

'd' 相同。

在上述的表示类型之外,整数还可以通过下列的浮点表示类型来格式化 (除了 'n'None)。 当这样做时,会在格式化之前使用 float() 将整数转换为浮点数。

浮点数和小数值可用的表示类型有:

类型

含义

'e'

指数表示。 以使用字母 ‘e’ 来标示指数的科学计数法打印数字。 默认的精度为 6

'E'

指数表示。 与 'e' 相似,不同之处在于它使用大写字母 ‘E’ 作为分隔字符。

'f'

定点表示。 将数字显示为一个定点数。 默认的精确度为 6

'F'

Fixed point notation. Same as 'f'.

'g'

常规格式。 对于给定的精度 p >= 1,这会将数值舍入到 p 位有效数字,再将结果以定点格式或科学计数法进行格式化,具体取决于其值的大小。

The precise rules are as follows: suppose that the result formatted with presentation type 'e' and precision p-1 would have exponent exp. Then if -4 <= exp < p, the number is formatted with presentation type 'f' and precision p-1-exp. Otherwise, the number is formatted with presentation type 'e' and precision p-1. In both cases insignificant trailing zeros are removed from the significand, and the decimal point is also removed if there are no remaining digits following it.

正负无穷,正负零和 nan 会分别被格式化为 inf, -inf, 0, -0nan,无论精度如何设定。

精度 0 会被视为等同于精度 1。 默认精度为 6

'G'

常规格式。 类似于 'g',不同之处在于当数值非常大时会切换为 'E'。 无穷与 NaN 也会表示为大写形式。

'n'

数字。 这与 'g' 相似,不同之处在于它会使用当前区域设置来插入适当的数字分隔字符。

'%'

百分比。 将数字乘以 100 并显示为定点 ('f') 格式,后面带一个百分号。

None

The same as 'g'.

7.1.3.2. 格式示例

本节包含 str.format() 语法的示例以及与旧式 % 格式化的比较。

该语法在大多数情况下与旧式的 % 格式化类似,只是增加了 {}: 来取代 %。 例如,,'%03.2f' 可以被改写为 '{:03.2f}'

新的格式语法还支持新增的不同选项,将在以下示例中说明。

按位置访问参数:

>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{}, {}, {}'.format('a', 'b', 'c')  # 2.7+ only
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc')      # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad')   # arguments' indices can be repeated
'abracadabra'

按名称访问参数:

>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'
>>> coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
>>> 'Coordinates: {latitude}, {longitude}'.format(**coord)
'Coordinates: 37.24N, -115.81W'

访问参数的属性:

>>> c = 3-5j
>>> ('The complex number {0} is formed from the real part {0.real} '
...  'and the imaginary part {0.imag}.').format(c)
'The complex number (3-5j) is formed from the real part 3.0 and the imaginary part -5.0.'
>>> class Point(object):
...     def __init__(self, x, y):
...         self.x, self.y = x, y
...     def __str__(self):
...         return 'Point({self.x}, {self.y})'.format(self=self)
...
>>> str(Point(4, 2))
'Point(4, 2)'

访问参数的项:

>>> coord = (3, 5)
>>> 'X: {0[0]};  Y: {0[1]}'.format(coord)
'X: 3;  Y: 5'

替代 %s%r:

>>> "repr() shows quotes: {!r}; str() doesn't: {!s}".format('test1', 'test2')
"repr() shows quotes: 'test1'; str() doesn't: test2"

对齐文本以及指定宽度:

>>> '{:<30}'.format('left aligned')
'left aligned                  '
>>> '{:>30}'.format('right aligned')
'                 right aligned'
>>> '{:^30}'.format('centered')
'           centered           '
>>> '{:*^30}'.format('centered')  # use '*' as a fill char
'***********centered***********'

替代 %+f, %-f% f 以及指定正负号:

>>> '{:+f}; {:+f}'.format(3.14, -3.14)  # show it always
'+3.140000; -3.140000'
>>> '{: f}; {: f}'.format(3.14, -3.14)  # show a space for positive numbers
' 3.140000; -3.140000'
>>> '{:-f}; {:-f}'.format(3.14, -3.14)  # show only the minus -- same as '{:f}; {:f}'
'3.140000; -3.140000'

替代 %x%o 以及转换基于不同进位制的值:

>>> # format also supports binary numbers
>>> "int: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}".format(42)
'int: 42;  hex: 2a;  oct: 52;  bin: 101010'
>>> # with 0x, 0o, or 0b as prefix:
>>> "int: {0:d};  hex: {0:#x};  oct: {0:#o};  bin: {0:#b}".format(42)
'int: 42;  hex: 0x2a;  oct: 0o52;  bin: 0b101010'

使用逗号作为千位分隔符:

>>> '{:,}'.format(1234567890)
'1,234,567,890'

表示为百分数:

>>> points = 19.5
>>> total = 22
>>> 'Correct answers: {:.2%}'.format(points/total)
'Correct answers: 88.64%'

使用特定类型的专属格式化:

>>> import datetime
>>> d = datetime.datetime(2010, 7, 4, 12, 15, 58)
>>> '{:%Y-%m-%d %H:%M:%S}'.format(d)
'2010-07-04 12:15:58'

嵌套参数以及更复杂的示例:

>>> for align, text in zip('<^>', ['left', 'center', 'right']):
...     '{0:{fill}{align}16}'.format(text, fill=align, align=align)
...
'left<<<<<<<<<<<<'
'^^^^^center^^^^^'
'>>>>>>>>>>>right'
>>>
>>> octets = [192, 168, 0, 1]
>>> '{:02X}{:02X}{:02X}{:02X}'.format(*octets)
'C0A80001'
>>> int(_, 16)
3232235521
>>>
>>> width = 5
>>> for num in range(5,12):
...     for base in 'dXob':
...         print '{0:{width}{base}}'.format(num, base=base, width=width),
...     print
...
    5     5     5   101
    6     6     6   110
    7     7     7   111
    8     8    10  1000
    9     9    11  1001
   10     A    12  1010
   11     B    13  1011

7.1.4. 模板字符串

2.4 新版功能.

Templates provide simpler string substitutions as described in PEP 292. Instead of the normal %-based substitutions, Templates support $-based substitutions, using the following rules:

  • $$ 为转义符号;它会被替换为单个的 $

  • $identifier names a substitution placeholder matching a mapping key of "identifier". By default, "identifier" must spell a Python identifier. The first non-identifier character after the $ character terminates this placeholder specification.

  • ${identifier} 等价于 $identifier。 当占位符之后紧跟着有效的但又不是占位符一部分的标识符字符时需要使用,例如 "${noun}ification"

在字符串的其他位置出现 $ 将导致引发 ValueError

string 模块提供了实现这些规则的 Template 类。 Template 有下列方法:

class string.Template(template)

该构造器接受一个参数作为模板字符串。

substitute(mapping[, **kws])

Performs the template substitution, returning a new string. mapping is any dictionary-like object with keys that match the placeholders in the template. Alternatively, you can provide keyword arguments, where the keywords are the placeholders. When both mapping and kws are given and there are duplicates, the placeholders from kws take precedence.

safe_substitute(mapping[, **kws])

Like substitute(), except that if placeholders are missing from mapping and kws, instead of raising a KeyError exception, the original placeholder will appear in the resulting string intact. Also, unlike with substitute(), any other appearances of the $ will simply return $ instead of raising ValueError.

此方法被认为“安全”,因为虽然仍有可能发生其他异常,但它总是尝试返回可用的字符串而不是引发一个异常。 从另一方面来说,safe_substitute() 也可能根本算不上安全,因为它将静默地忽略错误格式的模板,例如包含多余的分隔符、不成对的花括号或不是合法 Python 标识符的占位符等等。

Template 的实例还提供一个公有数据属性:

template

这是作为构造器的 template 参数被传入的对象。 一般来说,你不应该修改它,但并不强制要求只读访问。

以下是一个如何使用模版的示例:

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
...
ValueError: Invalid placeholder in string: line 1, col 11
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
...
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

进阶用法:你可以派生 Template 的子类来自定义占位符语法、分隔符,或用于解析模板字符串的整个正则表达式。 为此目的,你可以重载这些类属性:

  • delimiter – This is the literal string describing a placeholder introducing delimiter. The default value is $. Note that this should not be a regular expression, as the implementation will call re.escape() on this string as needed.

  • idpattern – This is the regular expression describing the pattern for non-braced placeholders (the braces will be added automatically as appropriate). The default value is the regular expression [_a-z][_a-z0-9]*.

作为另一种选项,你可以通过重载类属性 pattern 来提供整个正则表达式模式。 如果你这样做,该值必须为一个具有四个命名捕获组的正则表达式对象。 这些捕获组对应于上面已经给出的规则,以及无效占位符的规则:

  • escaped – 这个组匹配转义序列,在默认模式中即 $$

  • named – 这个组匹配不带花括号的占位符名称;它不应当包含捕获组中的分隔符。

  • braced – 这个组匹配带有花括号的占位符名称;它不应当包含捕获组中的分隔符或者花括号。

  • invalid – 这个组匹配任何其他分隔符模式(通常为单个分隔符),并且它应当出现在正则表达式的末尾。

7.1.5. String functions

The following functions are available to operate on string and Unicode objects. They are not available as string methods.

string.capwords(s[, sep])

使用 str.split() 将参数拆分为单词,使用 str.capitalize() 将单词转为大写形式,使用 str.join() 将大写的单词进行拼接。 如果可选的第二个参数 sep 被省略或为 None,则连续的空白字符会被替换为单个空格符并且开头和末尾的空白字符会被移除,否则 sep 会被用来拆分和拼接单词。

string.maketrans(from, to)

Return a translation table suitable for passing to translate(), that will map each character in from into the character at the same position in to; from and to must have the same length.

注解

Don’t use strings derived from lowercase and uppercase as arguments; in some locales, these don’t have the same length. For case conversions, always use str.lower() and str.upper().

7.1.6. Deprecated string functions

The following list of functions are also defined as methods of string and Unicode objects; see section 字符串的方法 for more information on those. You should consider these functions as deprecated, although they will not be removed until Python 3. The functions defined in this module are:

string.atof(s)

2.0 版后已移除: Use the float() built-in function.

Convert a string to a floating point number. The string must have the standard syntax for a floating point literal in Python, optionally preceded by a sign (+ or -). Note that this behaves identical to the built-in function float() when passed a string.

注解

When passing in a string, values for NaN and Infinity may be returned, depending on the underlying C library. The specific set of strings accepted which cause these values to be returned depends entirely on the C library and is known to vary.

string.atoi(s[, base])

2.0 版后已移除: Use the int() built-in function.

Convert string s to an integer in the given base. The string must consist of one or more digits, optionally preceded by a sign (+ or -). The base defaults to 10. If it is 0, a default base is chosen depending on the leading characters of the string (after stripping the sign): 0x or 0X means 16, 0 means 8, anything else means 10. If base is 16, a leading 0x or 0X is always accepted, though not required. This behaves identically to the built-in function int() when passed a string. (Also note: for a more flexible interpretation of numeric literals, use the built-in function eval().)

string.atol(s[, base])

2.0 版后已移除: Use the long() built-in function.

Convert string s to a long integer in the given base. The string must consist of one or more digits, optionally preceded by a sign (+ or -). The base argument has the same meaning as for atoi(). A trailing l or L is not allowed, except if the base is 0. Note that when invoked without base or with base set to 10, this behaves identical to the built-in function long() when passed a string.

string.capitalize(word)

Return a copy of word with only its first character capitalized.

string.expandtabs(s[, tabsize])

Expand tabs in a string replacing them by one or more spaces, depending on the current column and the given tab size. The column number is reset to zero after each newline occurring in the string. This doesn’t understand other non-printing characters or escape sequences. The tab size defaults to 8.

string.find(s, sub[, start[, end]])

Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.

string.rfind(s, sub[, start[, end]])

Like find() but find the highest index.

string.index(s, sub[, start[, end]])

Like find() but raise ValueError when the substring is not found.

string.rindex(s, sub[, start[, end]])

Like rfind() but raise ValueError when the substring is not found.

string.count(s, sub[, start[, end]])

Return the number of (non-overlapping) occurrences of substring sub in string s[start:end]. Defaults for start and end and interpretation of negative values are the same as for slices.

string.lower(s)

Return a copy of s, but with upper case letters converted to lower case.

string.split(s[, sep[, maxsplit]])

Return a list of the words of the string s. If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed). If the second argument sep is present and not None, it specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string. If maxsplit is given, at most maxsplit number of splits occur, and the remainder of the string is returned as the final element of the list (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).

The behavior of split on an empty string depends on the value of sep. If sep is not specified, or specified as None, the result will be an empty list. If sep is specified as any string, the result will be a list containing one element which is an empty string.

string.rsplit(s[, sep[, maxsplit]])

Return a list of the words of the string s, scanning s from the end. To all intents and purposes, the resulting list of words is the same as returned by split(), except when the optional third argument maxsplit is explicitly specified and nonzero. If maxsplit is given, at most maxsplit number of splits – the rightmost ones – occur, and the remainder of the string is returned as the first element of the list (thus, the list will have at most maxsplit+1 elements).

2.4 新版功能.

string.splitfields(s[, sep[, maxsplit]])

This function behaves identically to split(). (In the past, split() was only used with one argument, while splitfields() was only used with two arguments.)

string.join(words[, sep])

Concatenate a list or tuple of words with intervening occurrences of sep. The default value for sep is a single space character. It is always true that string.join(string.split(s, sep), sep) equals s.

string.joinfields(words[, sep])

This function behaves identically to join(). (In the past, join() was only used with one argument, while joinfields() was only used with two arguments.) Note that there is no joinfields() method on string objects; use the join() method instead.

string.lstrip(s[, chars])

Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning of the string this method is called on.

在 2.2.3 版更改: The chars parameter was added. The chars parameter cannot be passed in earlier 2.2 versions.

string.rstrip(s[, chars])

Return a copy of the string with trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the end of the string this method is called on.

在 2.2.3 版更改: The chars parameter was added. The chars parameter cannot be passed in earlier 2.2 versions.

string.strip(s[, chars])

Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.

在 2.2.3 版更改: The chars parameter was added. The chars parameter cannot be passed in earlier 2.2 versions.

string.swapcase(s)

Return a copy of s, but with lower case letters converted to upper case and vice versa.

string.translate(s, table[, deletechars])

Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.

string.upper(s)

Return a copy of s, but with lower case letters converted to upper case.

string.ljust(s, width[, fillchar])
string.rjust(s, width[, fillchar])
string.center(s, width[, fillchar])

These functions respectively left-justify, right-justify and center a string in a field of given width. They return a string that is at least width characters wide, created by padding the string s with the character fillchar (default is a space) until the given width on the right, left or both sides. The string is never truncated.

string.zfill(s, width)

Pad a numeric string s on the left with zero digits until the given width is reached. Strings starting with a sign are handled correctly.

string.replace(s, old, new[, maxreplace])

Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.