15.2. io – Core tools for working with streams

New in version 2.6.

The io module provides the Python interfaces to stream handling. Under Python 2.x, this is proposed as an alternative to the built-in file object, but in Python 3.x it is the default interface to access files and streams.

Note

Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of “bytes” in this document refer to the str type (of which bytes is an alias), and all uses of “text” refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.

At the top of the I/O hierarchy is the abstract base class IOBase. It defines the basic interface to a stream. Note, however, that there is no separation between reading and writing to streams; implementations are allowed to raise an IOError if they do not support a given operation.

Extending IOBase is RawIOBase which deals simply with the reading and writing of raw bytes to a stream. FileIO subclasses RawIOBase to provide an interface to files in the machine’s file system.

BufferedIOBase deals with buffering on a raw byte stream (RawIOBase). Its subclasses BufferedReader, BufferedWriter, and BufferedRWPair buffer streams that are, respectively, readable, writable, and both readable and writable. BufferedRandom provides a buffered interface to random access streams. BytesIO is a simple stream of in-memory bytes.

Another IOBase subclass, TextIOBase, deals with streams whose bytes represent text, and handles encoding and decoding from and to unicode strings. TextIOWrapper, which extends it, is a buffered text interface to a buffered raw stream (BufferedIOBase). Finally, StringIO is an in-memory stream for unicode text.

Argument names are not part of the specification, and only the arguments of open() are intended to be used as keyword arguments.

15.2.1. Module Interface

io.DEFAULT_BUFFER_SIZE

An int containing the default buffer size used by the module’s buffered I/O classes. open() uses the file’s blksize (as obtained by os.stat()) if possible.

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)

Open file and return a corresponding stream. If the file cannot be opened, an IOError is raised.

file is either a string giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless closefd is set to False.)

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. Other common values are 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). In text mode, if encoding is not specified the encoding used is platform dependent. (For reading and writing raw bytes use binary mode and leave encoding unspecified.) The available modes are:

Character   Meaning
'r'         open for reading (default)
'w'         open for writing, truncating the file first
'a'         open for writing, appending to the end of the file if it exists
'b'         binary mode
't'         text mode (default)
'+'         open a disk file for updating (reading and writing)
'U'         universal newlines mode (for backwards compatibility; should not be used in new code)

The default mode is 'rt' (open for reading text). For binary random access, the mode 'w+b' opens and truncates the file to 0 bytes, while 'r+b' opens the file without truncation.

Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
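As a rough illustration of the binary/text distinction described above, the following sketch writes a few bytes to a temporary file and reads them back in both modes. The helper name read_both_ways and the choice of UTF-8 are assumptions made for this example only.

```python
import io
import os
import tempfile


def read_both_ways(payload, encoding="utf-8"):
    """Write payload (bytes) to a temporary file, then read it back
    once in binary mode and once in text mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.open(path, "wb") as f:
            f.write(payload)
        # Binary mode: the contents come back as bytes, undecoded.
        with io.open(path, "rb") as f:
            raw = f.read()
        # Text mode: the bytes are decoded with the given encoding.
        with io.open(path, "r", encoding=encoding) as f:
            text = f.read()
    finally:
        os.remove(path)
    return raw, text


raw, text = read_both_ways(u"caf\u00e9".encode("utf-8"))
```

Here raw holds five bytes while text holds four characters, because the accented letter occupies two bytes in UTF-8.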

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

  • Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.

  • “Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used. See the codecs module for the list of supported encodings.

errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode. Pass 'strict' to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding errors can lead to data loss.) 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data. When writing, 'xmlcharrefreplace' (replace with the appropriate XML character reference) or 'backslashreplace' (replace with backslashed escape sequences) can be used. Any other error handling name that has been registered with codecs.register_error() is also valid.
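The error handlers named above can be compared side by side. The sketch below is only an illustration: it uses a TextIOWrapper over an in-memory BytesIO as a stand-in for a file opened with open(), since TextIOWrapper takes the same errors argument. The helper names decode_with and encode_with, and the sample data, are made up for this example.

```python
import io


def decode_with(errors, data=b"abc\xffdef"):
    """Decode data as ASCII using the given error handler."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", errors=errors)
    return wrapper.read()


def encode_with(errors, text=u"price: \u20ac"):
    """Encode text as ASCII using the given error handler."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", errors=errors, newline="\n")
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


replaced = decode_with("replace")   # malformed byte becomes a replacement marker
ignored = decode_with("ignore")     # malformed byte is silently dropped

try:
    decode_with("strict")
    strict_failed = False
except ValueError:                  # UnicodeDecodeError is a ValueError subclass
    strict_failed = True

xml_refs = encode_with("xmlcharrefreplace")
backslashed = encode_with("backslashreplace")
```

When decoding, the replacement marker is the Unicode replacement character U+FFFD; the 'ignore' result shows the data loss mentioned above.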

newline controls how universal newlines works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.

If closefd is False and a file descriptor rather than a filename was given, the underlying file descriptor will be kept open when the file is closed. If a filename is given closefd has no effect and must be True (the default).

The type of file object returned by the open() function depends on the mode. When open() is used to open a file in a text mode ('w', 'r', 'wt', 'rt', etc.), it returns a subclass of TextIOBase (specifically TextIOWrapper). When used to open a file in a binary mode with buffering, the returned class is a subclass of BufferedIOBase. The exact class varies: in read binary mode, it returns a BufferedReader; in write binary and append binary modes, it returns a BufferedWriter, and in read/write mode, it returns a BufferedRandom. When buffering is disabled, the raw stream, a subclass of RawIOBase, FileIO, is returned.
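The mapping from mode to returned class described above can be checked directly. This is a small sketch, assuming a throwaway temporary file; the function name stream_types and the labels in the dictionary are illustrative only.

```python
import io
import os
import tempfile


def stream_types():
    """Open one temporary file in several modes and record the class
    name of the object that open() returns for each."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    cases = [
        ("text", "r", -1),
        ("read binary", "rb", -1),
        ("write binary", "wb", -1),
        ("read/write binary", "r+b", -1),
        ("unbuffered binary", "rb", 0),   # buffering=0 disables buffering
    ]
    kinds = {}
    try:
        for label, mode, buffering in cases:
            with io.open(path, mode, buffering=buffering) as f:
                kinds[label] = type(f).__name__
    finally:
        os.remove(path)
    return kinds


kinds = stream_types()
```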

It is also possible to use a unicode or bytes string as a file for both reading and writing. For unicode strings StringIO can be used like a file opened in text mode, and for bytes a BytesIO can be used like a file opened in a binary mode.

exception io.BlockingIOError

Error raised when blocking would occur on a non-blocking stream. It inherits IOError.

In addition to those of IOError, BlockingIOError has one attribute:

characters_written

An integer containing the number of characters written to the stream before it blocked.

exception io.UnsupportedOperation

An exception inheriting IOError and ValueError that is raised when an unsupported operation is called on a stream.

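15.2.2. I/O Base Classes

class io.IOBase

The abstract base class for all I/O classes, acting on streams of bytes. There is no public constructor.

This class provides empty abstract implementations for many methods that derived classes can override selectively; the default implementations represent a file that cannot be read, written or seeked.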

Even though IOBase does not declare read(), readinto(), or write() because their signatures will vary, implementations and clients should consider those methods part of the interface. Also, implementations may raise an IOError when operations they do not support are called.

The basic type used for binary data read from or written to a file is bytes (also known as str). Method arguments may also be bytearray or memoryview of arrays of bytes. In some cases, such as readinto(), a writable object such as bytearray is required. Text I/O classes work with unicode data.

Changed in version 2.7: Implementations should support memoryview arguments.

Note that calling any method (even inquiries) on a closed stream is undefined. Implementations may raise IOError in this case.

IOBase (and its subclasses) support the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding unicode strings). See readline() below.
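To make the difference between binary and text iteration concrete, here is a minimal sketch using in-memory streams. The sample data is arbitrary, and the text stream relies on the default newline handling (universal newlines) of TextIOWrapper.

```python
import io

data = b"one\ntwo\r\nthree"

# Binary stream: lines are bytes and only b'\n' terminates a line,
# so the '\r' before the second '\n' is kept as ordinary data.
binary_lines = list(io.BytesIO(data))

# Text stream with default newline handling: '\r\n' is recognized as
# a line ending and translated to '\n'.
text_lines = list(io.TextIOWrapper(io.BytesIO(data), encoding="ascii"))
```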

IOBase is also a context manager and therefore supports the with statement. In this example, file is closed after the with statement’s suite is finished—even if an exception occurs:

with io.open('spam.txt', 'w') as file:
    file.write(u'Spam and eggs!')

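IOBase provides these data attributes and methods:

close()

Flush and close this stream. This method has no effect if the file is already closed. Once the file is closed, any operation on the file (e.g. reading or writing) will raise a ValueError.

As a convenience, it is allowed to call this method more than once; only the first call, however, will have an effect.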

closed

True if the stream is closed.

fileno()

Return the underlying file descriptor (an integer) of the stream if it exists. An IOError is raised if the IO object does not use a file descriptor.

flush()

Flush the write buffers of the stream if applicable. This does nothing for read-only and non-blocking streams.

isatty()

Return True if the stream is interactive (i.e., connected to a terminal/tty device).

readable()

Return True if the stream can be read from. If False, read() will raise IOError.

readline(limit=-1)

Read and return one line from the stream. If limit is specified, at most limit bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

readlines(hint=-1)

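Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

Note that it is already possible to iterate over file objects using for line in file: ... without calling file.readlines().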

seek(offset, whence=SEEK_SET)

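Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. The default value for whence is SEEK_SET. Values for whence are:

  • SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive

  • SEEK_CUR or 1 – current stream position; offset may be negative

  • SEEK_END or 2 – end of the stream; offset is usually negative

Return the new absolute position.

New in version 2.7: The SEEK_* constants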
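The three whence values can be tried out on an in-memory binary stream. This is a small sketch with arbitrary sample data; BytesIO is used only because it needs no file on disk.

```python
import io

stream = io.BytesIO(b"0123456789")

pos_set = stream.seek(3, io.SEEK_SET)    # absolute: 3 bytes from the start
first = stream.read(2)                   # reads b"34", position is now 5

pos_cur = stream.seek(-1, io.SEEK_CUR)   # relative: one byte back from 5
pos_end = stream.seek(-2, io.SEEK_END)   # relative to the end: 10 - 2
tail = stream.read()                     # everything up to EOF
final_pos = stream.tell()
```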

seekable()

Return True if the stream supports random access. If False, seek(), tell() and truncate() will raise IOError.

tell()

Return the current stream position.

truncate(size=None)

Resize the stream to the given size in bytes (or the current position if size is not specified). The current stream position isn’t changed. This resizing can extend or reduce the current file size. In case of extension, the contents of the new file area depend on the platform (on most systems, additional bytes are zero-filled, on Windows they’re undetermined). The new file size is returned.
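A short sketch of shrinking a stream with truncate(), using an in-memory BytesIO as a stand-in for a file. It only demonstrates reduction; extension is platform dependent as noted above and is not shown.

```python
import io

stream = io.BytesIO(b"0123456789")
stream.seek(8)

new_size = stream.truncate(4)    # keep only the first four bytes
position_after = stream.tell()   # the stream position is left where it was
contents = stream.getvalue()
```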

writable()

Return True if the stream supports writing. If False, write() and truncate() will raise IOError.

writelines(lines)

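Write a list of lines to the stream. Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.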

__del__()

Prepare for object destruction. IOBase provides a default implementation of this method that calls the instance’s close() method.

class io.RawIOBase

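Base class for raw binary I/O. It inherits IOBase. There is no public constructor.

Raw binary I/O typically provides low-level access to an underlying OS device or API, and does not try to encapsulate it in high-level primitives (this is left to Buffered I/O and Text I/O, described later in this page).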

In addition to the attributes and methods from IOBase, RawIOBase provides the following methods:

read(n=-1)

Read up to n bytes from the object and return them. As a convenience, if n is unspecified or -1, readall() is called. Otherwise, only one system call is ever made. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes.

If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, None is returned.

readall()

Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.

readinto(b)

Read up to len(b) bytes into b, and return the number of bytes read. The object b should be a pre-allocated, writable array of bytes, either bytearray or memoryview. If the object is in non-blocking mode and no bytes are available, None is returned.

write(b)

Write b to the underlying raw stream, and return the number of bytes written. The object b should be an array of bytes, either bytes, bytearray, or memoryview. The return value can be less than len(b), depending on specifics of the underlying raw stream, and especially if it is in non-blocking mode. None is returned if the raw stream is set not to block and no single byte could be readily written to it. The caller may release or mutate b after this method returns, so the implementation should only access b during the method call.

class io.BufferedIOBase

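Base class for binary streams that support some kind of buffering. It inherits IOBase. There is no public constructor.

The main difference from RawIOBase is that the methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call.

In addition, those methods can raise BlockingIOError if the underlying raw stream is in non-blocking mode and cannot take or give enough data; unlike their RawIOBase counterparts, they will never return None.

Besides, the read() method does not have a default implementation that defers to readinto().

A typical BufferedIOBase implementation should not inherit from a RawIOBase implementation, but wrap one, like BufferedWriter and BufferedReader do.

BufferedIOBase provides or overrides these methods and attributes in addition to those from IOBase: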

raw

The underlying raw stream (a RawIOBase instance) that BufferedIOBase deals with. This is not part of the BufferedIOBase API and may not exist on some implementations.

detach()

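Separate the underlying raw stream from the buffer and return it.

After the raw stream has been detached, the buffer is in an unusable state.

Some buffers, like BytesIO, do not have the concept of a single raw stream to return from this method. They raise UnsupportedOperation.

New in version 2.7.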

read(n=-1)

Read and return up to n bytes. If the argument is omitted, None, or negative, data is read and returned until EOF is reached. An empty bytes object is returned if the stream is already at EOF.

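If the argument is positive, and the underlying raw stream is not interactive, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first). But for interactive raw streams, at most one raw read will be issued, and a short result does not imply that EOF is imminent.

A BlockingIOError is raised if the underlying raw stream is in non-blocking mode and has no data available at the moment.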

read1(n=-1)

Read and return up to n bytes, with at most one call to the underlying raw stream’s read() method. This can be useful if you are implementing your own buffering on top of a BufferedIOBase object.

readinto(b)

Read up to len(b) bytes into b, and return the number of bytes read. The object b should be a pre-allocated, writable array of bytes, either bytearray or memoryview.

Like read(), multiple reads may be issued to the underlying raw stream, unless the latter is ‘interactive’.

A BlockingIOError is raised if the underlying raw stream is in non-blocking mode and has no data available at the moment.
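A brief sketch of readinto() with a pre-allocated bytearray, using BytesIO as a convenient BufferedIOBase implementation. The buffer size of five bytes is an arbitrary choice for the example.

```python
import io

stream = io.BytesIO(b"abcdefgh")
buf = bytearray(5)               # pre-allocated, writable buffer

n1 = stream.readinto(buf)        # fills the whole buffer
first = bytes(buf[:n1])

n2 = stream.readinto(buf)        # only three bytes remain in the stream
second = bytes(buf[:n2])         # bytes beyond n2 are stale data from before

n3 = stream.readinto(buf)        # at EOF nothing is read
```

Only the first n bytes of the buffer are meaningful after each call, which is why the return value has to be checked.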

write(b)

Write b, and return the number of bytes written (always equal to len(b), since if the write fails an IOError will be raised). The object b should be an array of bytes, either bytes, bytearray, or memoryview. Depending on the actual implementation, these bytes may be readily written to the underlying stream, or held in a buffer for performance and latency reasons.

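When in non-blocking mode, a BlockingIOError is raised if the data needed to be written to the raw stream but it couldn’t accept all the data without blocking.

The caller may release or mutate b after this method returns, so the implementation should only access b during the method call.

15.2.3. Raw File I/O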

class io.FileIO(name, mode='r', closefd=True)

FileIO represents an OS-level file containing bytes data. It implements the RawIOBase interface (and therefore the IOBase interface, too).

The name can be one of two things:

  • a string representing the path to the file which will be opened;

  • an integer representing the number of an existing OS-level file descriptor to which the resulting FileIO object will give access.

The mode can be 'r', 'w' or 'a' for reading (default), writing, or appending. The file will be created if it doesn’t exist when opened for writing or appending; it will be truncated when opened for writing. Add a '+' to the mode to allow simultaneous reading and writing.

The read() (when called with a positive argument), readinto() and write() methods on this class will only make one system call.
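The following sketch exercises FileIO directly on a temporary file, first in 'w' mode and then in 'r+' mode. The file is created with tempfile purely for the sake of the example, and the variable names are illustrative.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # 'w' truncates (or creates) the file and allows writing only.
    with io.FileIO(path, "w") as raw:
        written = raw.write(b"raw bytes")

    # 'r+' opens the existing file for both reading and writing.
    with io.FileIO(path, "r+") as raw:
        head = raw.read(3)           # unbuffered read of up to 3 bytes
        raw.seek(0)
        raw.write(b"RAW")            # overwrite the first three bytes in place
        raw.seek(0)
        everything = raw.readall()
finally:
    os.remove(path)
```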

In addition to the attributes and methods from IOBase and RawIOBase, FileIO provides the following data attributes and methods:

mode

The mode as given in the constructor.

name

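The file name. This is the file descriptor of the file when no name is given in the constructor.

15.2.4. Buffered Streams

Buffered I/O streams provide a higher-level interface to an I/O device than raw I/O does.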

class io.BytesIO([initial_bytes])

A stream implementation using an in-memory bytes buffer. It inherits BufferedIOBase.

The optional argument initial_bytes is a bytes object that contains initial data.

BytesIO provides or overrides these methods in addition to those from BufferedIOBase and IOBase:

getvalue()

Return bytes containing the entire contents of the buffer.
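A minimal sketch of BytesIO and getvalue(). One detail worth noting is that a BytesIO created with initial data starts at position 0, so the example seeks to the end before appending; the sample data is arbitrary.

```python
import io

buf = io.BytesIO(b"abc")

# The stream starts at position 0; move to the end before appending,
# otherwise the write would overwrite the initial data.
buf.seek(0, io.SEEK_END)
buf.write(b"def")

snapshot = buf.getvalue()    # whole buffer, regardless of the position
position = buf.tell()        # getvalue() did not move the position

buf.seek(0)
head = buf.read(2)
```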

read1()

In BytesIO, this is the same as read().

class io.BufferedReader(raw, buffer_size=DEFAULT_BUFFER_SIZE)

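A buffer providing higher-level access to a readable, sequential RawIOBase object. It inherits BufferedIOBase. When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.

The constructor creates a BufferedReader for the given readable raw stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE is used.

BufferedReader provides or overrides these methods in addition to those from BufferedIOBase and IOBase:

peek([n])

Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.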
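A small sketch of peek(). For convenience it wraps a BytesIO rather than a true raw stream, which is an assumption of this example. Because peek() may return more bytes than requested, the code only looks at the prefix it needs.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"hello world"))

peeked = reader.peek(5)               # may be longer than 5 bytes
position_after_peek = reader.tell()   # peeking does not advance the position

data = reader.read(5)                 # the same bytes are still there to read
```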

read([n])

Read and return n bytes. If n is not given or is negative, read until EOF, or until the read call would block in non-blocking mode.

read1(n)

Read and return up to n bytes with only one call on the raw stream. If at least one byte is buffered, only buffered bytes are returned. Otherwise, one raw stream read call is made.

class io.BufferedWriter(raw, buffer_size=DEFAULT_BUFFER_SIZE)

A buffer providing higher-level access to a writable, sequential RawIOBase object. It inherits BufferedIOBase. When writing to this object, data is normally held in an internal buffer. The buffer will be written out to the underlying RawIOBase object under various conditions, including:

  • when the buffer gets too small for all pending data;

  • when flush() is called;

  • when a seek() is requested (for BufferedRandom objects);

  • when the BufferedWriter object is closed or destroyed.
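The effect of flush() on a BufferedWriter can be observed with an in-memory target. In this sketch a BytesIO plays the role of the raw stream, which is an assumption made to keep the example self-contained.

```python
import io

raw = io.BytesIO()                 # stands in for the underlying raw stream
writer = io.BufferedWriter(raw)

writer.write(b"12345")
before_flush = raw.getvalue()      # small write is still held in the buffer

writer.flush()
after_flush = raw.getvalue()       # flush() pushed the data to the raw stream
```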

The constructor creates a BufferedWriter for the given writable raw stream. If buffer_size is not given, it defaults to DEFAULT_BUFFER_SIZE.

A third argument, max_buffer_size, is supported, but unused and deprecated.

BufferedWriter provides or overrides these methods in addition to those from BufferedIOBase and IOBase:

flush()

Force bytes held in the buffer into the raw stream. A BlockingIOError should be raised if the raw stream blocks.

write(b)

Write b, and return the number of bytes written. The object b should be an array of bytes, either bytes, bytearray, or memoryview. When in non-blocking mode, a BlockingIOError is raised if the buffer needs to be written out but the raw stream blocks.

class io.BufferedRandom(raw, buffer_size=DEFAULT_BUFFER_SIZE)

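A buffered interface to random access streams. It inherits BufferedReader and BufferedWriter, and further supports seek() and tell() functionality.

The constructor creates a reader and writer for a seekable raw stream, given in the first argument. If buffer_size is omitted, it defaults to DEFAULT_BUFFER_SIZE.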

A third argument, max_buffer_size, is supported, but unused and deprecated.

BufferedRandom is capable of anything BufferedReader or BufferedWriter can do.

class io.BufferedRWPair(reader, writer, buffer_size=DEFAULT_BUFFER_SIZE)

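A buffered I/O object combining two unidirectional RawIOBase objects – one readable, the other writable – into a single bidirectional endpoint. It inherits BufferedIOBase.

reader and writer are RawIOBase objects that are readable and writable respectively. If buffer_size is omitted, it defaults to DEFAULT_BUFFER_SIZE.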

A fourth argument, max_buffer_size, is supported, but unused and deprecated.

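BufferedRWPair implements all of BufferedIOBase’s methods except for detach(), which raises UnsupportedOperation.

Warning

BufferedRWPair does not attempt to synchronize accesses to its underlying raw streams. You should not pass it the same object as reader and writer; use BufferedRandom instead.

15.2.5. Text I/O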

class io.TextIOBase

Base class for text streams. This class provides a unicode character and line based interface to stream I/O. There is no readinto() method because Python’s unicode strings are immutable. It inherits IOBase. There is no public constructor.

TextIOBase provides or overrides these data attributes and methods in addition to those from IOBase:

encoding

The name of the encoding used to decode the stream’s bytes into strings, and to encode strings into bytes.

errors

The error setting of the decoder or encoder.

newlines

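A string, a tuple of strings, or None, indicating the newlines translated so far. Depending on the implementation and the initial constructor flags, this may not be available.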

buffer

The underlying binary buffer (a BufferedIOBase instance) that TextIOBase deals with. This is not part of the TextIOBase API and may not exist on some implementations.

detach()

Separate the underlying binary buffer from the TextIOBase and return it.

After the underlying buffer has been detached, the TextIOBase is in an unusable state.

Some TextIOBase implementations, like StringIO, may not have the concept of an underlying buffer and calling this method will raise UnsupportedOperation.

New in version 2.7.

read(n=-1)

Read and return at most n characters from the stream as a single unicode. If n is negative or None, reads until EOF.

readline(limit=-1)

Read until newline or EOF and return a single unicode. If the stream is already at EOF, an empty string is returned.

If limit is specified, at most limit characters will be read.

seek(offset, whence=SEEK_SET)

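Change the stream position to the given offset. Behaviour depends on the whence parameter. The default value for whence is SEEK_SET.

  • SEEK_SET or 0: seek from the start of the stream (the default); offset must either be a number returned by TextIOBase.tell(), or zero. Any other offset value produces undefined behaviour.

  • SEEK_CUR or 1: “seek” to the current position; offset must be zero, which is a no-operation (all other values are unsupported).

  • SEEK_END or 2: seek to the end of the stream; offset must be zero (all other values are unsupported).

Return the new absolute position as an opaque number.

New in version 2.7: The SEEK_* constants.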
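Since text offsets are opaque, the safe pattern is to pass seek() only values obtained from tell() (or zero). A minimal sketch, using a TextIOWrapper over in-memory UTF-8 data chosen for this example:

```python
import io

encoded = u"\u00e9t\u00e9\nhiver\n".encode("utf-8")
text = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8")

first_line = text.readline()
cookie = text.tell()             # opaque position; do not do arithmetic on it

second_line = text.readline()

text.seek(cookie)                # return to the remembered position
again = text.readline()

text.seek(0, io.SEEK_END)        # offset must be zero with SEEK_END
at_end = text.read()
```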

tell()

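Return the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage.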

write(s)

Write the unicode string s to the stream and return the number of characters written.

class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False)

A buffered text stream over a BufferedIOBase binary stream. It inherits TextIOBase.

encoding gives the name of the encoding that the stream will be decoded or encoded with. It defaults to locale.getpreferredencoding().

errors is an optional string that specifies how encoding and decoding errors are to be handled. Pass 'strict' to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding errors can lead to data loss.) 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data. When writing, 'xmlcharrefreplace' (replace with the appropriate XML character reference) or 'backslashreplace' (replace with backslashed escape sequences) can be used. Any other error handling name that has been registered with codecs.register_error() is also valid.

newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
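The input and output rules above can be compared with a couple of small helpers. This is only a sketch: read_lines and written_bytes are names invented for the example, and a BytesIO stands in for the underlying binary stream.

```python
import io


def read_lines(data, newline):
    """Return the lines seen when reading data with the given newline setting."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)
    return wrapper.readlines()


def written_bytes(text, newline):
    """Return the bytes produced when writing text with the given newline setting."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


data = b"a\nb\r\nc\rd"

translated = read_lines(data, None)    # universal newlines, translated to '\n'
untranslated = read_lines(data, "")    # universal newlines, endings kept as-is
only_lf = read_lines(data, "\n")       # only '\n' terminates a line

crlf_out = written_bytes(u"x\ny\n", "\r\n")   # '\n' is written as '\r\n'
plain_out = written_bytes(u"x\ny\n", "")      # no translation on output
```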

If line_buffering is True, flush() is implied when a call to write contains a newline character or a carriage return.
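A short sketch of line buffering, with a BytesIO standing in for the binary stream so the flushed bytes can be inspected. newline='\n' is passed so that the output does not depend on the platform line separator; both choices are assumptions of this example.

```python
import io

raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="ascii", newline="\n", line_buffering=True)

text.write(u"partial")
before_newline = raw.getvalue()   # nothing has reached the binary stream yet

text.write(u" line\n")
after_newline = raw.getvalue()    # the newline triggered an implicit flush()
```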

TextIOWrapper provides one attribute in addition to those of TextIOBase and its parents:

line_buffering

Whether line buffering is enabled.

class io.StringIO(initial_value=u'', newline=u'\n')

An in-memory stream for unicode text. It inherits TextIOWrapper.

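The initial value of the buffer can be set by providing initial_value. If newline translation is enabled, newlines will be encoded as if by write(). The stream is positioned at the start of the buffer.

The newline argument works like that of TextIOWrapper. The default is to consider only \n characters as ends of lines and to do no newline translation. If newline is set to None, newlines are written as \n on all platforms, but universal newlines decoding is still performed when reading.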

StringIO provides this method in addition to those from TextIOWrapper and its parents:

getvalue()

Return a unicode containing the entire contents of the buffer at any time before the StringIO object’s close() method is called. Newlines are decoded as if by read(), although the stream position is not changed.

Example usage:

import io

output = io.StringIO()
output.write(u'First line.\n')
output.write(u'Second line.\n')

# Retrieve file contents -- this will be
# u'First line.\nSecond line.\n'
contents = output.getvalue()

# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()

class io.IncrementalNewlineDecoder

A helper codec that decodes newlines for universal newlines mode. It inherits codecs.IncrementalDecoder.

15.2.6. Advanced topics

Here we will discuss several advanced topics pertaining to the concrete I/O implementations described above.

15.2.6.1. Performance

15.2.6.1.1. Binary I/O

By reading and writing only large chunks of data even when the user asks for a single byte, buffered I/O is designed to hide any inefficiency in calling and executing the operating system’s unbuffered I/O routines. The gain will vary very much depending on the OS and the kind of I/O which is performed (for example, on some contemporary OSes such as Linux, unbuffered disk I/O can be as fast as buffered I/O). The bottom line, however, is that buffered I/O will offer you predictable performance regardless of the platform and the backing device. Therefore, it is almost always preferable to use buffered I/O rather than unbuffered I/O.

15.2.6.1.2. Text I/O

Text I/O over a binary storage (such as a file) is significantly slower than binary I/O over the same storage, because it implies conversions from unicode to binary data using a character codec. This can become noticeable if you handle huge amounts of text data (for example very large log files). Also, TextIOWrapper.tell() and TextIOWrapper.seek() are both quite slow due to the reconstruction algorithm used.

StringIO, however, is a native in-memory unicode container and will exhibit similar speed to BytesIO.

15.2.6.2. Multi-threading

FileIO objects are thread-safe to the extent that the operating system calls (such as read(2) under Unix) they are wrapping are thread-safe too.

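Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) protect their internal structures using a lock; it is therefore safe to call them from multiple threads at once.

TextIOWrapper objects are not thread-safe.

15.2.6.3. Reentrancy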

Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) are not reentrant. While reentrant calls will not happen in normal situations, they can arise if you are doing I/O in a signal handler. If a thread tries to re-enter a buffered object that it is already accessing, a RuntimeError is raised.

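The above implicitly extends to text files, since the open() function will wrap a buffered object inside a TextIOWrapper. This includes standard streams and therefore affects the built-in function print() as well.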