io - Core tools for working with streams¶
Source code: Lib/io.py
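Overview¶
The io module provides Python's main facilities for dealing with various types of I/O. There are three main types of I/O: text I/O, binary I/O and raw I/O. These are generic categories, and various backing stores can be used for each of them. A concrete object belonging to any of these categories is called a file object. Other common terms are stream and file-like object.
Independent of its category, each concrete stream object also has various capabilities: it can be read-only, write-only, or read-write. It can also allow arbitrary random access (seeking forwards or backwards to any location), or only sequential access (for example in the case of a socket or pipe).
All streams are careful about the type of data you give to them. For example, giving a str object to the write() method of a binary stream raises a TypeError. So does giving a bytes object to the write() method of a text stream.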
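The type checking described above can be observed with in-memory streams. This is a minimal sketch; the helper name try_write is an assumption made for illustration and is not part of the io module.

```python
import io

binary_stream = io.BytesIO()
text_stream = io.StringIO()

def try_write(stream, data):
    """Return the exception type raised by stream.write(data), or None."""
    try:
        stream.write(data)
    except TypeError as exc:
        return type(exc)
    return None

# str into a binary stream and bytes into a text stream are both rejected.
str_into_binary = try_write(binary_stream, "text")
bytes_into_text = try_write(text_stream, b"bytes")

# Matching types are accepted.
bytes_into_binary = try_write(binary_stream, b"ok")
```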
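Text I/O¶
Text I/O expects and produces str objects. This means that whenever the backing store is natively made of bytes (such as in the case of a file), encoding and decoding of data is performed transparently, along with optional translation of platform-specific newline characters.
The easiest way to create a text stream is with open(), optionally specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
The text stream API is described in detail in the documentation of TextIOBase.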
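Binary I/O¶
Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects. No encoding, decoding, or newline translation is performed. This category of streams can be used for all kinds of non-text data, and also when manual control over the handling of text data is desired.
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
In-memory binary streams are also available as BytesIO objects:
f = io.BytesIO(b"some initial binary data: \x00\x01")
The binary stream API is described in detail in the documentation of BufferedIOBase.
Other library modules may provide additional ways to create text or binary streams. See socket.socket.makefile() for example.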
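Raw I/O¶
Raw I/O (also called unbuffered I/O) is generally used as a low-level building block for binary and text streams; it is rarely useful to directly manipulate a raw stream from user code. Nevertheless, you can create a raw stream by opening a file in binary mode with buffering disabled:
f = open("myfile.jpg", "rb", buffering=0)
The raw stream API is described in detail in the documentation of RawIOBase.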
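To make the difference between the raw and buffered layers concrete, the following sketch opens the same file both ways and records the resulting object types. The temporary file is created only for illustration.

```python
import io
import os
import tempfile

# Create a throwaway file to open (illustrative only).
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, "wb", buffering=0) as raw:
        raw_type = type(raw)            # no buffering layer: a FileIO object
        written = raw.write(b"abc")     # returns the number of bytes written
    with open(path, "rb") as buffered:
        buffered_type = type(buffered)  # a BufferedReader ...
        inner_type = type(buffered.raw) # ... wrapping a FileIO object
finally:
    os.remove(path)
```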
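Text Encoding¶
The default encoding of TextIOWrapper and open() is locale-specific (locale.getencoding()).
However, many developers forget to specify the encoding when opening text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, etc.) since most Unix platforms use a UTF-8 locale by default. This causes bugs because the locale encoding is not UTF-8 for most Windows users. For example:
# May not work on Windows when the file contains non-ASCII characters.
with open("README.md") as f:
    long_description = f.read()
Accordingly, it is highly recommended that you specify the encoding explicitly when opening text files. If you want to use UTF-8, pass encoding="utf-8". To use the current locale encoding, encoding="locale" is supported since Python 3.10.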
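As a small illustration of the recommendation above, this sketch writes and reads non-ASCII text with an explicit encoding, so the result does not depend on the locale. The sample text and the temporary file are assumptions made for the example.

```python
import os
import tempfile

text = "naïve café"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Explicit encoding on both sides: the same bytes on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
    with open(path, "rb") as f:
        raw_bytes = f.read()
finally:
    os.remove(path)
```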
See also
- Python UTF-8 Mode
Python UTF-8 Mode can be used to change the default encoding from the locale-specific encoding to UTF-8.
- PEP 686
Python 3.15 will make Python UTF-8 Mode the default.
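Opt-in EncodingWarning¶
Added in version 3.10: See PEP 597 for more details.
To find where the default locale encoding is used, you can enable the -X warn_default_encoding command line option or set the PYTHONWARNDEFAULTENCODING environment variable. An EncodingWarning is then emitted whenever the default encoding is used.
If you are providing an API that uses open() or TextIOWrapper and passes encoding=None as a parameter, you can use text_encoding() so that callers of the API emit an EncodingWarning if they don't pass an encoding. However, please consider using UTF-8 by default (i.e. encoding="utf-8") for new APIs.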
High-level Module Interface¶
- io.DEFAULT_BUFFER_SIZE¶
An int containing the default buffer size used by the module's buffered I/O classes. open() uses the file's blksize (as obtained by os.stat()) if possible.
- io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)¶
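This is an alias for the builtin open() function.
This function raises an auditing event open with arguments path, mode and flags. The mode and flags arguments may have been modified or inferred from the original call.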
- io.open_code(path)¶
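Opens the provided file with mode 'rb'. This function should be used when the intent is to treat the contents as executable code.
path should be a str and an absolute path.
The behavior of this function may be overridden by an earlier call to PyFile_SetOpenCodeHook(). However, assuming that path is a str and an absolute path, open_code(path) should always behave the same as open(path, 'rb'). Overriding the behavior is intended for additional validation or preprocessing of the file.
Added in version 3.8.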
- io.text_encoding(encoding, stacklevel=2, /)¶
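This is a helper function for callables that use open() or TextIOWrapper and have an encoding=None parameter.
This function returns encoding if it is not None. Otherwise, it returns "locale" or "utf-8" depending on UTF-8 Mode.
This function emits an EncodingWarning if sys.flags.warn_default_encoding is true and encoding is None. stacklevel specifies where the warning is emitted. For example:
def read_text(path, encoding=None):
    encoding = io.text_encoding(encoding)  # stacklevel=2
    with open(path, encoding=encoding) as f:
        return f.read()
In this example, an EncodingWarning is emitted for the caller of read_text().
See Text Encoding for more information.
Added in version 3.10.
Changed in version 3.11: text_encoding() returns "utf-8" when UTF-8 mode is enabled and encoding is None.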
- exception io.BlockingIOError¶
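This is a compatibility alias for the builtin BlockingIOError exception.
- exception io.UnsupportedOperation¶
An exception inheriting OSError and ValueError that is raised when an unsupported operation is called on a stream.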
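Because UnsupportedOperation derives from both OSError and ValueError, it can be caught under either name. A minimal sketch, using an in-memory stream that has no OS-level file descriptor:

```python
import io

is_os_error = issubclass(io.UnsupportedOperation, OSError)
is_value_error = issubclass(io.UnsupportedOperation, ValueError)

# BytesIO lives purely in memory, so asking for a file descriptor is unsupported.
stream = io.BytesIO(b"data")
try:
    stream.fileno()
    caught = None
except io.UnsupportedOperation as exc:
    caught = exc
```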
See also
sys
contains the standard IO streams: sys.stdin, sys.stdout, and sys.stderr.
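Class hierarchy¶
The implementation of I/O streams is organized as a hierarchy of classes. First come the abstract base classes (ABCs), which are used to specify the various categories of streams, then the concrete classes providing the standard stream implementations.
Note
The abstract base classes also provide default implementations of some methods in order to help the implementation of concrete stream classes. For example, BufferedIOBase provides unoptimized implementations of readinto() and readline().
At the top of the I/O hierarchy is the abstract base class IOBase. It defines the basic interface to a stream. Note, however, that there is no separation between reading and writing to streams; implementations are allowed to raise UnsupportedOperation if they do not support a given operation.
The RawIOBase ABC extends IOBase. It deals with the reading and writing of bytes to a stream. FileIO subclasses RawIOBase to provide an interface to files in the machine's file system.
The BufferedIOBase ABC extends IOBase. It deals with buffering on a raw binary stream (RawIOBase). Its subclasses, BufferedWriter, BufferedReader, and BufferedRWPair buffer raw binary streams that are writable, readable, and both readable and writable, respectively. BufferedRandom provides a buffered interface to seekable streams. Another BufferedIOBase subclass, BytesIO, is a stream of in-memory bytes.
The TextIOBase ABC extends IOBase. It deals with streams whose bytes represent text, and handles encoding and decoding to and from strings. TextIOWrapper, which extends TextIOBase, is a buffered text interface to a buffered raw stream (BufferedIOBase). Finally, StringIO is an in-memory stream for text.
Argument names are not part of the specification, and only the arguments of open() are intended to be used as keyword arguments.
The following table summarizes the ABCs provided by the io module: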
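ABC | Inherits | Stub Methods | Mixin Methods and Properties |
---|---|---|---|
IOBase | | fileno, seek, and truncate | close, closed, __enter__, __exit__, flush, isatty, __iter__, __next__, readable, readline, readlines, seekable, tell, writable, and writelines |
RawIOBase | IOBase | readinto and write | Inherited IOBase methods, read, and readall |
BufferedIOBase | IOBase | detach, read, read1, and write | Inherited IOBase methods, readinto, and readinto1 |
TextIOBase | IOBase | detach, read, readline, and write | Inherited IOBase methods, encoding, errors, and newlines |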
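The layering described in this section can be inspected on an ordinary file object. The sketch below opens a temporary file (created only for illustration) in text mode and walks down through the buffer and raw attributes; the exact concrete types reflect CPython's behavior for a "w+" text-mode file.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w+", encoding="utf-8") as text_stream:
        buffered = text_stream.buffer   # binary layer under the text layer
        raw = buffered.raw              # raw layer under the buffer
        layers = (type(text_stream), type(buffered), type(raw))
        abc_checks = (
            isinstance(text_stream, io.TextIOBase),
            isinstance(buffered, io.BufferedIOBase),
            isinstance(raw, io.RawIOBase),
        )
finally:
    os.remove(path)

# The in-memory classes slot into the same hierarchy.
in_memory_checks = (
    issubclass(io.BytesIO, io.BufferedIOBase),
    issubclass(io.StringIO, io.TextIOBase),
)
```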
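I/O Base Classes¶
- class io.IOBase¶
The abstract base class for all I/O classes.
This class provides empty abstract implementations for many methods that derived classes can override selectively; the default implementations represent a file that cannot be read, written or seeked.
Even though IOBase does not declare read() or write() because their signatures will vary, implementations and clients should consider those methods part of the interface. Also, implementations may raise a ValueError (or UnsupportedOperation) when operations they do not support are called.
The basic type used for binary data read from or written to a file is bytes. Other bytes-like objects are accepted as method arguments too. Text I/O classes work with str data.
Note that calling any method (even inquiries) on a closed stream is undefined. Implementations may raise ValueError in this case.
IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings). See readline() below.
IOBase is also a context manager and therefore supports the with statement. In this example, file is closed after the with statement's suite is finished, even if an exception occurs:
with open('spam.txt', 'w') as file:
    file.write('Spam and eggs!')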
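The iterator and context manager protocols apply to in-memory streams as well, which makes them easy to try out. A minimal sketch:

```python
import io

# Iterating yields lines: bytes for binary streams, str for text streams.
binary_lines = list(io.BytesIO(b"first\nsecond\nthird"))
text_lines = list(io.StringIO("first\nsecond\nthird"))

# Leaving the with block closes the stream.
with io.StringIO("spam") as stream:
    closed_inside = stream.closed
closed_outside = stream.closed
```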
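IOBase provides these data attributes and methods:
- close()¶
Flush and close this stream. This method has no effect if the file is already closed. Once the file is closed, any operation on the file (e.g. reading or writing) will raise a ValueError.
As a convenience, it is allowed to call this method more than once; only the first call, however, will have an effect.
- closed¶
True if the stream is closed.
- flush()¶
Flush the write buffers of the stream if applicable. This does nothing for read-only and non-blocking streams.
- isatty()¶
Return True if the stream is interactive (i.e., connected to a terminal/tty device).
- readline(size=-1, /)¶
Read and return one line from the stream. If size is specified, at most size bytes will be read.
The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.
- readlines(hint=-1, /)¶
Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.
hint values of 0 or less, as well as None, are treated as no hint.
Note that it's already possible to iterate on file objects using for line in file: ... without calling file.readlines().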
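- seek(offset, whence=os.SEEK_SET, /)¶
Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:
os.SEEK_SET or 0 -- start of the stream (the default); offset should be zero or positive
os.SEEK_CUR or 1 -- current stream position; offset may be negative
os.SEEK_END or 2 -- end of the stream; offset is usually negative
Added in version 3.1: The SEEK_* constants.
Added in version 3.3: Some operating systems could support additional values, like os.SEEK_HOLE or os.SEEK_DATA. The valid values for a file could depend on it being open in text or binary mode.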
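The three standard whence values can be tried on a BytesIO object, where positions are plain byte offsets. A minimal sketch:

```python
import io
import os

stream = io.BytesIO(b"0123456789")

from_start = stream.seek(3, os.SEEK_SET)    # absolute position 3
from_current = stream.seek(2, os.SEEK_CUR)  # 3 + 2 = 5
from_end = stream.seek(-4, os.SEEK_END)     # 10 - 4 = 6
tail = stream.read()                        # everything from position 6 on
position_after_read = stream.tell()
```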
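- seekable()¶
Return True if the stream supports random access. If False, seek(), tell() and truncate() will raise OSError.
- tell()¶
Return the current stream position.
- truncate(size=None, /)¶
Resize the stream to the given size in bytes (or the current position if size is not specified). The current stream position isn't changed. This resizing can extend or reduce the current file size. In case of extension, the contents of the new file area depend on the platform (on most systems, additional bytes are zero-filled). The new file size is returned.
Changed in version 3.5: Windows will now zero-fill files when extending.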
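The following sketch shows the shrinking case on an in-memory binary stream, including the fact that the stream position is left untouched. Extension is deliberately not shown, since its result depends on the platform and the stream type.

```python
import io

stream = io.BytesIO(b"abcdef")
stream.seek(5)
new_size = stream.truncate(3)         # shrink to 3 bytes
position = stream.tell()              # the position is not moved
contents = stream.getvalue()

stream.seek(2)
size_at_position = stream.truncate()  # no size given: cut at the current position
contents_after = stream.getvalue()
```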
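- writable()¶
Return True if the stream supports writing. If False, write() and truncate() will raise OSError.
- writelines(lines, /)¶
Write a list of lines to the stream. Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.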
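Since writelines() adds no separators, the caller has to supply them. A short sketch of both outcomes:

```python
import io

without_separators = io.StringIO()
without_separators.writelines(["alpha", "beta"])

with_separators = io.StringIO()
with_separators.writelines(line + "\n" for line in ["alpha", "beta"])

joined = without_separators.getvalue()
separated = with_separators.getvalue()
```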
- class io.RawIOBase¶
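Base class for raw binary streams. It inherits from IOBase.
Raw binary streams typically provide low-level access to an underlying OS device or API, and do not try to encapsulate it in high-level primitives (this functionality is done at a higher level in buffered binary streams and text streams, described later in this page).
RawIOBase provides these methods in addition to those from IOBase:
- read(size=-1, /)¶
Read up to size bytes from the object and return them. As a convenience, if size is unspecified or -1, all bytes until EOF are returned. Otherwise, only one system call is ever made. Fewer than size bytes may be returned if the operating system call returns fewer than size bytes.
If 0 bytes are returned, and size was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, None is returned.
The default implementation defers to readall() and readinto().
- readall()¶
Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.
- readinto(b, /)¶
Read bytes into a pre-allocated, writable bytes-like object b, and return the number of bytes read. For example, b might be a bytearray. If the object is in non-blocking mode and no bytes are available, None is returned.
- write(b, /)¶
Write the given bytes-like object, b, to the underlying raw stream, and return the number of bytes written. This can be less than the length of b in bytes, depending on specifics of the underlying raw stream, and especially if it is in non-blocking mode. None is returned if the raw stream is set not to block and no single byte could be readily written to it. The caller may release or mutate b after this method returns, so the implementation should only access b during the method call.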
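Because the default read() and readall() defer to readinto(), a raw stream subclass only needs to supply readinto() to become readable. The class below is a hypothetical example (BytesSource and its chunk parameter are not part of the io module); it deliberately returns short reads to mimic an OS call that delivers fewer bytes than requested.

```python
import io

class BytesSource(io.RawIOBase):
    """Hypothetical raw stream serving bytes from memory in small pieces."""

    def __init__(self, data, chunk=4):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk = chunk  # cap per call, imitating a short OS-level read

    def readable(self):
        return True

    def readinto(self, b):
        # Copy at most `chunk` bytes into the caller's buffer.
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n             # 0 signals EOF

source = BytesSource(b"hello raw world")
first = source.read(10)      # a single readinto() call, so a short result
rest = source.readall()      # keeps calling until EOF
```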
- class io.BufferedIOBase¶
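Base class for binary streams that support some kind of buffering. It inherits from IOBase.
The main difference with RawIOBase is that the methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call.
In addition, those methods can raise BlockingIOError if the underlying raw stream is in non-blocking mode and cannot take or give enough data; unlike their RawIOBase counterparts, they will never return None.
Besides, the read() method does not have a default implementation that defers to readinto().
A typical BufferedIOBase implementation should not inherit from a RawIOBase implementation, but wrap one, like BufferedWriter and BufferedReader do.
BufferedIOBase provides or overrides these data attributes and methods in addition to those from IOBase:
- raw¶
The underlying raw stream (a RawIOBase instance) that BufferedIOBase deals with. This is not part of the BufferedIOBase API and may not exist on some implementations.
- detach()¶
Separate the underlying raw stream from the buffer and return it.
After the raw stream has been detached, the buffer is in an unusable state.
Some buffers, like BytesIO, do not have the concept of a single raw stream to return from this method. They raise UnsupportedOperation.
Added in version 3.1.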
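A short sketch of detach() follows. For convenience it wraps a BytesIO object as a stand-in for a raw stream, which BufferedReader accepts in practice even though BytesIO is not a RawIOBase subclass.

```python
import io

inner = io.BytesIO(b"payload")        # stand-in for a raw stream
buffered = io.BufferedReader(inner)
detached = buffered.detach()          # hands back the wrapped stream

try:
    buffered.read()
    error_after_detach = None
except ValueError as exc:             # the buffer is unusable once detached
    error_after_detach = exc

try:
    io.BytesIO().detach()             # BytesIO has no raw stream to return
    bytesio_error = None
except io.UnsupportedOperation as exc:
    bytesio_error = exc
```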
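- read(size=-1, /)¶
Read and return up to size bytes. If the argument is omitted, None, or negative, data is read and returned until EOF is reached. An empty bytes object is returned if the stream is already at EOF.
If the argument is positive, and the underlying raw stream is not interactive, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first). But for interactive raw streams, at most one raw read will be issued, and a short result does not imply that EOF is imminent.
A BlockingIOError is raised if the underlying raw stream is in non-blocking mode and has no data available at the moment.
- read1(size=-1, /)¶
Read and return up to size bytes, with at most one call to the underlying raw stream's read() (or readinto()) method. This can be useful if you are implementing your own buffering on top of a BufferedIOBase object.
If size is -1 (the default), an arbitrary number of bytes are returned (more than zero unless EOF is reached).
- readinto(b, /)¶
Read bytes into a pre-allocated, writable bytes-like object b and return the number of bytes read. For example, b might be a bytearray.
Like read(), multiple reads may be issued to the underlying raw stream, unless the latter is interactive.
A BlockingIOError is raised if the underlying raw stream is in non-blocking mode and has no data available at the moment.
- readinto1(b, /)¶
Read bytes into a pre-allocated, writable bytes-like object b, using at most one call to the underlying raw stream's read() (or readinto()) method. Return the number of bytes read.
A BlockingIOError is raised if the underlying raw stream is in non-blocking mode and has no data available at the moment.
Added in version 3.5.
- write(b, /)¶
Write the given bytes-like object, b, and return the number of bytes written (always equal to the length of b in bytes, since if the write fails an OSError will be raised). Depending on the actual implementation, these bytes may be readily written to the underlying stream, or held in a buffer for performance and latency reasons.
When in non-blocking mode, a BlockingIOError is raised if the data needed to be written to the raw stream but it couldn't accept all the data without blocking.
The caller may release or mutate b after this method returns, so the implementation should only access b during the method call.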
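The difference between read() and read1() shows up when the raw stream delivers data in small pieces. The sketch below uses a hypothetical raw stream class (SlowSource, not part of the io module) that returns at most four bytes per call, wrapped in a BufferedReader; the results shown in the comments reflect CPython's behavior.

```python
import io

class SlowSource(io.RawIOBase):
    """Hypothetical raw stream that returns at most 4 bytes per call."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        n = min(len(b), 4, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n

buffered = io.BufferedReader(SlowSource(b"hello raw world"))
one_call = buffered.read1(10)    # at most one raw read: a short result
many_calls = buffered.read(10)   # repeats raw reads until 10 bytes or EOF
remainder = buffered.read()
```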
Raw File I/O¶
- class io.FileIO(name, mode='r', closefd=True, opener=None)¶
A raw binary stream representing an OS-level file containing bytes data. It inherits from RawIOBase.
The name can be one of two things:
a character string or bytes object representing the path to the file which will be opened. In this case closefd must be True (the default), otherwise an error will be raised.
an integer representing the number of an existing OS-level file descriptor to which the resulting FileIO object will give access. When the FileIO object is closed this fd will be closed as well, unless closefd is set to False.
The mode can be 'r', 'w', 'x' or 'a' for reading (default), writing, exclusive creation or appending. The file will be created if it doesn't exist when opened for writing or appending; it will be truncated when opened for writing. FileExistsError will be raised if it already exists when opened for creating. Opening a file for creating implies writing, so this mode behaves in a similar way to 'w'. Add a '+' to the mode to allow simultaneous reading and writing.
The read() (when called with a positive argument), readinto() and write() methods on this class will only make one system call.
A custom opener can be used by passing a callable as opener. The underlying file descriptor for the file object is then obtained by calling opener with (name, flags). opener must return an open file descriptor (passing os.open as opener results in functionality similar to passing None).
The newly created file is non-inheritable.
See the open() built-in function for examples on using the opener parameter.
Changed in version 3.3: The opener parameter was added. The 'x' mode was added.
Changed in version 3.4: The file is now non-inheritable.
FileIO provides these data attributes in addition to those from RawIOBase and IOBase:
- mode¶
The mode as given in the constructor.
- name¶
The file name. This is the file descriptor of the file when no name is given in the constructor.
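The file-descriptor form of the constructor, together with closefd=False, can be sketched as follows. The descriptor comes from tempfile.mkstemp() purely for illustration.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()     # an already-open OS-level descriptor
try:
    os.write(fd, b"raw bytes")
    os.lseek(fd, 0, os.SEEK_SET)

    # closefd=False: closing the FileIO object leaves the descriptor open.
    raw = io.FileIO(fd, "r", closefd=False)
    name_is_fd = (raw.name == fd) # no path was given, so name is the fd
    data = raw.read(3)
    raw.close()

    os.fstat(fd)                  # would raise OSError if fd had been closed
    fd_survived = True
finally:
    os.close(fd)
    os.remove(path)
```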
Buffered Streams¶
Buffered I/O streams provide a higher-level interface to an I/O device than raw I/O does.
- class io.BytesIO(initial_bytes=b'')¶
A binary stream using an in-memory bytes buffer. It inherits from BufferedIOBase. The buffer is discarded when the close() method is called.
The optional argument initial_bytes is a bytes-like object that contains initial data.
BytesIO provides or overrides these methods in addition to those from BufferedIOBase and IOBase:
- getbuffer()¶
Return a readable and writable view over the contents of the buffer without copying them. Also, mutating the view will transparently update the contents of the buffer:
>>> b = io.BytesIO(b"abcdef")
>>> view = b.getbuffer()
>>> view[2:4] = b"56"
>>> b.getvalue()
b'ab56ef'
Note
As long as the view exists, the BytesIO object cannot be resized or closed.
Added in version 3.2.
- read1(size=-1, /)¶
In BytesIO, this is the same as read().
Changed in version 3.7: The size argument is now optional.
- readinto1(b, /)¶
In BytesIO, this is the same as readinto().
Added in version 3.5.
- class io.BufferedReader(raw, buffer_size=DEFAULT_BUFFER_SIZE)¶
A buffered binary stream providing higher-level access to a readable, non seekable RawIOBase raw binary stream. It inherits from BufferedIOBase.
When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.
The constructor creates a BufferedReader for the given readable raw stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE is used.
BufferedReader provides or overrides these methods in addition to those from BufferedIOBase and IOBase:
- peek(size=0, /)¶
Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.
- read(size=-1, /)¶
Read and return size bytes or, if size is not given or negative, read until EOF or until the read call would block in non-blocking mode.
- read1(size=-1, /)¶
Read and return up to size bytes with only one call on the raw stream. If at least one byte is buffered, only buffered bytes are returned. Otherwise, one raw stream read call is made.
Changed in version 3.7: The size argument is now optional.
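A common use of peek() is to inspect a header without consuming it. Because peek() may return more or fewer bytes than requested, the sketch below slices the result; a BytesIO object stands in for the raw stream.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"MAGIC payload"))

header = reader.peek(5)[:5]         # slice: peek may return more than 5 bytes
position_after_peek = reader.tell() # peeking does not advance the position
consumed = reader.read(5)           # the same bytes are still there to read
```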
- class io.BufferedWriter(raw, buffer_size=DEFAULT_BUFFER_SIZE)¶
A buffered binary stream providing higher-level access to a writeable, non seekable RawIOBase raw binary stream. It inherits from BufferedIOBase.
When writing to this object, data is normally placed into an internal buffer. The buffer will be written out to the underlying RawIOBase object under various conditions, including:
when the buffer gets too small for all pending data;
when flush() is called;
when a seek() is requested (for BufferedRandom objects);
when the BufferedWriter object is closed or destroyed.
The constructor creates a BufferedWriter for the given writeable raw stream. If the buffer_size is not given, it defaults to DEFAULT_BUFFER_SIZE.
BufferedWriter provides or overrides these methods in addition to those from BufferedIOBase and IOBase:
- flush()¶
Force bytes held in the buffer into the raw stream. A BlockingIOError should be raised if the raw stream blocks.
- write(b, /)¶
Write the bytes-like object, b, and return the number of bytes written. When in non-blocking mode, a BlockingIOError is raised if the buffer needs to be written out but the raw stream blocks.
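The effect of the internal buffer can be seen by checking the underlying stream before and after flush(). In this sketch a BytesIO object stands in for the raw stream, and the small buffer_size is an arbitrary choice for the example.

```python
import io

sink = io.BytesIO()                       # stand-in for a raw stream
writer = io.BufferedWriter(sink, buffer_size=16)

count = writer.write(b"small")            # fits in the buffer, so it is held there
before_flush = sink.getvalue()
writer.flush()                            # forces the buffered bytes out
after_flush = sink.getvalue()
```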
- class io.BufferedRandom(raw, buffer_size=DEFAULT_BUFFER_SIZE)¶
A buffered binary stream providing higher-level access to a seekable RawIOBase raw binary stream. It inherits from BufferedReader and BufferedWriter.
The constructor creates a reader and writer for a seekable raw stream, given in the first argument. If the buffer_size is omitted it defaults to DEFAULT_BUFFER_SIZE.
BufferedRandom is capable of anything BufferedReader or BufferedWriter can do. In addition, seek() and tell() are guaranteed to be implemented.
- class io.BufferedRWPair(reader, writer, buffer_size=DEFAULT_BUFFER_SIZE, /)¶
A buffered binary stream providing higher-level access to two non seekable RawIOBase raw binary streams, one readable, the other writeable. It inherits from BufferedIOBase.
reader and writer are RawIOBase objects that are readable and writeable respectively. If the buffer_size is omitted it defaults to DEFAULT_BUFFER_SIZE.
BufferedRWPair implements all of BufferedIOBase's methods except for detach(), which raises UnsupportedOperation.
Warning
BufferedRWPair does not attempt to synchronize accesses to its underlying raw streams. You should not pass it the same object as reader and writer; use BufferedRandom instead.
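A minimal sketch of a reader/writer pair, using two separate BytesIO objects as stand-ins for the two raw streams (for example, the two directions of a pipe):

```python
import io

incoming = io.BytesIO(b"request")   # readable side
outgoing = io.BytesIO()             # writeable side
pair = io.BufferedRWPair(incoming, outgoing)

received = pair.read()              # served from the reader
pair.write(b"response")             # buffered for the writer
pair.flush()
sent = outgoing.getvalue()
```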
Text I/O¶
- class io.TextIOBase¶
Base class for text streams. This class provides a character and line based interface to stream I/O. It inherits from IOBase.
TextIOBase provides or overrides these data attributes and methods in addition to those from IOBase:
- encoding¶
The name of the encoding used to decode the stream's bytes into strings, and to encode strings into bytes.
- errors¶
The error setting of the decoder or encoder.
- newlines¶
A string, a tuple of strings, or None, indicating the newlines translated so far. Depending on the implementation and the initial constructor flags, this may not be available.
- buffer¶
The underlying binary buffer (a BufferedIOBase instance) that TextIOBase deals with. This is not part of the TextIOBase API and may not exist in some implementations.
- detach()¶
Separate the underlying binary buffer from the TextIOBase and return it.
After the underlying buffer has been detached, the TextIOBase is in an unusable state.
Some TextIOBase implementations, like StringIO, may not have the concept of an underlying buffer and calling this method will raise UnsupportedOperation.
Added in version 3.1.
- read(size=-1, /)¶
Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.
- readline(size=-1, /)¶
Read until newline or EOF and return a single str. If the stream is already at EOF, an empty string is returned.
If size is specified, at most size characters will be read.
- seek(offset, whence=SEEK_SET, /)¶
Change the stream position to the given offset. Behaviour depends on the whence parameter. The default value for whence is SEEK_SET.
SEEK_SET or 0: seek from the start of the stream (the default); offset must either be a number returned by TextIOBase.tell(), or zero. Any other offset value produces undefined behaviour.
SEEK_CUR or 1: "seek" to the current position; offset must be zero, which is a no-operation (all other values are unsupported).
SEEK_END or 2: seek to the end of the stream; offset must be zero (all other values are unsupported).
Return the new absolute position as an opaque number.
Added in version 3.1: The SEEK_* constants.
- tell()¶
Return the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage.
- write(s, /)¶
Write the string s to the stream and return the number of characters written.
- class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)¶
A buffered text stream providing higher-level access to a BufferedIOBase buffered binary stream. It inherits from TextIOBase.
encoding gives the name of the encoding that the stream will be decoded or encoded with. It defaults to locale.getencoding(). encoding="locale" can be used to specify the current locale's encoding explicitly. See Text Encoding for more information.
errors is an optional string that specifies how encoding and decoding errors are to be handled. Pass 'strict' to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding errors can lead to data loss.) 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data. 'backslashreplace' causes malformed data to be replaced by a backslashed escape sequence. When writing, 'xmlcharrefreplace' (replace with the appropriate XML character reference) or 'namereplace' (replace with \N{...} escape sequences) can be used. Any other error handling name that has been registered with codecs.register_error() is also valid.
newline controls how line endings are handled. It can be None, '', '\n', '\r', or '\r\n'. It works as follows:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If newline is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If newline has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
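The newline rules above can be checked with in-memory buffers. This sketch covers reading with newline=None and newline='', and writing with newline='\r\n'; the sample data is arbitrary.

```python
import io

data = b"one\r\ntwo\rthree\n"

# newline=None: universal newlines, every line ending is returned as "\n".
translated = io.TextIOWrapper(
    io.BytesIO(data), encoding="ascii", newline=None
).read()

# newline="": lines are still split on all three endings, but kept as-is.
untranslated = io.TextIOWrapper(
    io.BytesIO(data), encoding="ascii", newline=""
).readlines()

# Writing with newline="\r\n": each "\n" written becomes "\r\n".
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="ascii", newline="\r\n")
writer.write("a\nb\n")
writer.flush()
written = sink.getvalue()
```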
If line_buffering is True, flush() is implied when a call to write contains a newline character or a carriage return.
If write_through is True, calls to write() are guaranteed not to be buffered: any data written on the TextIOWrapper object is immediately handed to its underlying binary buffer.
Changed in version 3.3: The write_through argument has been added.
Changed in version 3.3: The default encoding is now locale.getpreferredencoding(False) instead of locale.getpreferredencoding(). Don't temporarily change the locale encoding using locale.setlocale(); use the current locale encoding instead of the user preferred encoding.
Changed in version 3.10: The encoding argument now supports the "locale" dummy encoding name.
TextIOWrapper provides these data attributes and methods in addition to those from TextIOBase and IOBase:
- line_buffering¶
Whether line buffering is enabled.
- write_through¶
Whether writes are passed immediately to the underlying binary buffer.
Added in version 3.7.
- reconfigure(*, encoding=None, errors=None, newline=None, line_buffering=None, write_through=None)¶
Reconfigure this text stream using new settings for encoding, errors, newline, line_buffering and write_through.
Parameters not specified keep current settings, except errors='strict' is used when encoding is specified but errors is not specified.
It is not possible to change the encoding or newline if some data has already been read from the stream. On the other hand, changing encoding after write is possible.
This method does an implicit stream flush before setting the new parameters.
Added in version 3.7.
Changed in version 3.11: The method supports the encoding="locale" option.
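The sketch below changes the encoding of a write-only text stream part-way through, and also shows the errors setting falling back to 'strict' when only encoding is passed. The BytesIO sink and sample text are assumptions made for the example.

```python
import io

sink = io.BytesIO()
stream = io.TextIOWrapper(sink, encoding="ascii", errors="replace")

stream.write("café ")                  # "é" is not ASCII, so it becomes "?"
stream.reconfigure(encoding="utf-8")   # implicit flush; errors is not given
stream.write("café")
stream.flush()

result = sink.getvalue()
errors_after = stream.errors
encoding_after = stream.encoding
```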
- seek(cookie, whence=os.SEEK_SET, /)¶
Set the stream position. Return the new stream position as an int.
Four operations are supported, given by the following argument combinations:
seek(0, SEEK_SET): Rewind to the start of the stream.
seek(cookie, SEEK_SET): Restore a previous position; cookie must be a number returned by tell().
seek(0, SEEK_END): Fast-forward to the end of the stream.
seek(0, SEEK_CUR): Leave the current stream position unchanged.
Any other argument combinations are invalid, and may raise exceptions.
See also
os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END.
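Since the value returned by tell() is an opaque cookie rather than a character count, the safe pattern is to save it and pass it back to seek() unchanged. A minimal sketch with multi-byte UTF-8 text over an in-memory buffer:

```python
import io

stream = io.TextIOWrapper(
    io.BytesIO("αβγδε".encode("utf-8")), encoding="utf-8"
)

head = stream.read(2)
cookie = stream.tell()        # opaque: do not do arithmetic on it
tail = stream.read()

stream.seek(cookie)           # restore the saved position
tail_again = stream.read()

stream.seek(0)                # rewind to the start
everything = stream.read()
```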
- class io.StringIO(initial_value='', newline='\n')¶
A text stream using an in-memory text buffer. It inherits from TextIOBase.
The text buffer is discarded when the close() method is called.
The initial value of the buffer can be set by providing initial_value. If newline translation is enabled, newlines will be encoded as if by write(). The stream is positioned at the start of the buffer which emulates opening an existing file in a w+ mode, making it ready for an immediate write from the beginning or for a write that would overwrite the initial value. To emulate opening a file in an a+ mode ready for appending, use f.seek(0, io.SEEK_END) to reposition the stream at the end of the buffer.
The newline argument works like that of TextIOWrapper, except that when writing output to the stream, if newline is None, newlines are written as \n on all platforms.
StringIO provides this method in addition to those from TextIOBase and IOBase:
- getvalue()¶
Return a str containing the entire contents of the buffer. Newlines are decoded as if by read(), although the stream position is not changed.
Example usage:
import io

output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)

# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()

# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
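The effect of the initial stream position, and the seek() call suggested for emulating append mode, can be sketched as follows:

```python
import io

# Position starts at 0, so writing overwrites the initial value.
overwrite = io.StringIO("initial")
overwrite.write("NEW")
overwritten = overwrite.getvalue()

# Moving to the end first behaves like appending.
append = io.StringIO("initial")
append.seek(0, io.SEEK_END)
append.write(" more")
appended = append.getvalue()
```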
- class io.IncrementalNewlineDecoder¶
A helper codec that decodes newlines for universal newlines mode. It inherits from codecs.IncrementalDecoder.
Performance¶
This section discusses the performance of the provided concrete I/O implementations.
Binary I/O¶
By reading and writing only large chunks of data even when the user asks for a single byte, buffered I/O hides any inefficiency in calling and executing the operating system's unbuffered I/O routines. The gain depends on the OS and the kind of I/O which is performed. For example, on some modern OSes such as Linux, unbuffered disk I/O can be as fast as buffered I/O. The bottom line, however, is that buffered I/O offers predictable performance regardless of the platform and the backing device. Therefore, it is almost always preferable to use buffered I/O rather than unbuffered I/O for binary data.
Text I/O¶
Text I/O over a binary storage (such as a file) is significantly slower than binary I/O over the same storage, because it requires conversions between unicode and binary data using a character codec. This can become noticeable when handling huge amounts of text data like large log files. Also, tell() and seek() are both quite slow due to the reconstruction algorithm used.
StringIO, however, is a native in-memory unicode container and will exhibit similar speed to BytesIO.
Multi-threading¶
FileIO objects are thread-safe to the extent that the operating system calls (such as read(2) under Unix) they wrap are thread-safe too.
Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) protect their internal structures using a lock; it is therefore safe to call them from multiple threads at once.
TextIOWrapper objects are not thread-safe.
Reentrancy¶
Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) are not reentrant. While reentrant calls will not happen in normal situations, they can arise from doing I/O in a signal handler. If a thread tries to re-enter a buffered object which it is already accessing, a RuntimeError is raised. Note this doesn't prohibit a different thread from entering the buffered object.
The above implicitly extends to text files, since the open() function will wrap a buffered object inside a TextIOWrapper. This includes standard streams and therefore affects the built-in print() function as well.