"codecs" --- Codec registry and base classes
********************************************

**Source code:** Lib/codecs.py

======================================================================

This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry, which manages the codec and error handling lookup process.
Most standard codecs are *text encodings*, which encode text to bytes,
but there are also codecs provided that encode text to text, and bytes
to bytes. Custom codecs may encode and decode between arbitrary types,
but some module features are restricted to use specifically with *text
encodings*, or with codecs that encode to "bytes".

The module defines the following functions for encoding and decoding
with any codec:

codecs.encode(obj, encoding='utf-8', errors='strict')

   Encodes *obj* using the codec registered for *encoding*.

   *Errors* may be given to set the desired error handling scheme. The
   default error handler is "'strict'" meaning that encoding errors
   raise "ValueError" (or a more codec specific subclass, such as
   "UnicodeEncodeError"). Refer to Codec Base Classes for more
   information on codec error handling.

codecs.decode(obj, encoding='utf-8', errors='strict')

   Decodes *obj* using the codec registered for *encoding*.

   *Errors* may be given to set the desired error handling scheme. The
   default error handler is "'strict'" meaning that decoding errors
   raise "ValueError" (or a more codec specific subclass, such as
   "UnicodeDecodeError"). Refer to Codec Base Classes for more
   information on codec error handling.
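
For instance, a minimal round trip through these two functions, using the UTF-8 codec and one lenient error handler, might look like this:

```python
import codecs

# Encode a string to bytes, then decode the bytes back to a string.
data = codecs.encode("café", encoding="utf-8", errors="strict")
text = codecs.decode(data, encoding="utf-8", errors="strict")

# A non-strict handler substitutes instead of raising on malformed input.
lossy = codecs.decode(b"caf\xe9", encoding="utf-8", errors="replace")
```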

The full details for each codec can also be looked up directly:

codecs.lookup(encoding)

   Looks up the codec info in the Python codec registry and returns a
   "CodecInfo" object as defined below.

   Encodings are first looked up in the registry's cache. If not
   found, the list of registered search functions is scanned. If no
   "CodecInfo" object is found, a "LookupError" is raised. Otherwise,
   the "CodecInfo" object is stored in the cache and returned to the
   caller.

class codecs.CodecInfo(encode, decode, streamreader=None, streamwriter=None, incrementalencoder=None, incrementaldecoder=None, name=None)

   Codec details when looking up the codec registry. The constructor
   arguments are stored in attributes of the same name:

   name

      The name of the encoding.

   encode
   decode

      The stateless encoding and decoding functions. These must be
      functions or methods which have the same interface as the
      "encode()" and "decode()" methods of Codec instances (see Codec
      Interface). The functions or methods are expected to work in a
      stateless mode.

   incrementalencoder
   incrementaldecoder

      Incremental encoder and decoder classes or factory functions.
      These have to provide the interface defined by the base classes
      "IncrementalEncoder" and "IncrementalDecoder", respectively.
      Incremental codecs can maintain state.

   streamwriter
   streamreader

      Stream writer and reader classes or factory functions. These
      have to provide the interface defined by the base classes
      "StreamWriter" and "StreamReader", respectively. Stream codecs
      can maintain state.
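
A short sketch of looking up a codec and using the attributes described above:

```python
import codecs

info = codecs.lookup("utf-8")

# The stateless functions return a (result, length consumed) pair,
# per the Codec interface described later in this document.
encoded, consumed = info.encode("hi")
decoded, _ = info.decode(encoded)
```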

To simplify access to the various codec components, the module
provides these additional functions which use "lookup()" for the codec
lookup:

codecs.getencoder(encoding)

   Look up the codec for the given encoding and return its encoder
   function.

   Raises a "LookupError" in case the encoding cannot be found.

codecs.getdecoder(encoding)

   Look up the codec for the given encoding and return its decoder
   function.

   Raises a "LookupError" in case the encoding cannot be found.

codecs.getincrementalencoder(encoding)

   Look up the codec for the given encoding and return its incremental
   encoder class or factory function.

   Raises a "LookupError" in case the encoding cannot be found or the
   codec doesn't support an incremental encoder.

codecs.getincrementaldecoder(encoding)

   Look up the codec for the given encoding and return its incremental
   decoder class or factory function.

   Raises a "LookupError" in case the encoding cannot be found or the
   codec doesn't support an incremental decoder.

codecs.getreader(encoding)

   Look up the codec for the given encoding and return its
   "StreamReader" class or factory function.

   Raises a "LookupError" in case the encoding cannot be found.

codecs.getwriter(encoding)

   Look up the codec for the given encoding and return its
   "StreamWriter" class or factory function.

   Raises a "LookupError" in case the encoding cannot be found.
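
These helpers all resolve to attributes of the "CodecInfo" object returned by "lookup()"; a quick sketch:

```python
import codecs

encode = codecs.getencoder("ascii")              # stateless encoder function
Decoder = codecs.getincrementaldecoder("ascii")  # incremental decoder class

out, consumed = encode("abc")          # stateless: (bytes, length consumed)
text = Decoder().decode(b"abc", final=True)
```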

Custom codecs are made available by registering a suitable codec
search function:

codecs.register(search_function)

   Register a codec search function. Search functions are expected to
   take one argument, being the encoding name in all lower case
   letters, and return a "CodecInfo" object. In case a search function
   cannot find a given encoding, it should return "None".

   Note: Search function registration is not currently reversible,
     which may cause problems in some cases, such as unit testing or
     module reloading.
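
As a sketch, a search function can expose an existing codec under a new, hypothetical name ("myutf8" below) by returning its "CodecInfo":

```python
import codecs

def search(encoding_name):
    # The registry passes the requested name in lower case.
    if encoding_name == "myutf8":
        return codecs.lookup("utf-8")  # reuse an existing CodecInfo
    return None  # unknown name: let other search functions try

codecs.register(search)
data = "hello".encode("myutf8")  # now resolves through our search function
```

Note that, as the preceding paragraph warns, this registration cannot be undone afterwards.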

While the builtin "open()" and the associated "io" module are the
recommended approach for working with encoded text files, this module
provides additional utility functions and classes that allow the use
of a wider range of codecs when working with binary files:

codecs.open(filename, mode='r', encoding=None, errors='strict', buffering=-1)

   Open an encoded file using the given *mode* and return an instance
   of "StreamReaderWriter", providing transparent encoding/decoding.
   The default file mode is "'r'", meaning to open the file in read
   mode.

   Note: Underlying encoded files are always opened in binary mode.
     No automatic conversion of "'\n'" is done on reading and writing.
     The *mode* argument may be any binary mode acceptable to the
     built-in "open()" function; the "'b'" is automatically added.

   *encoding* specifies the encoding which is to be used for the file.
   Any encoding that encodes to and decodes from bytes is allowed, and
   the data types supported by the file methods depend on the codec
   used.

   *errors* may be given to define the error handling. It defaults to
   "'strict'" which causes a "ValueError" to be raised in case an
   encoding error occurs.

   *buffering* has the same meaning as for the built-in "open()"
   function. It defaults to -1 which means that the default buffer
   size will be used.

   Changed in version 3.9: The "'U'" mode has been removed.
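
A minimal sketch of writing and reading an encoded file this way (the file name is arbitrary):

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# The stream transparently encodes on write and decodes on read.
with codecs.open(path, "w", encoding="utf-16") as f:
    f.write("héllo")
with codecs.open(path, "r", encoding="utf-16") as f:
    content = f.read()
```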

codecs.EncodedFile(file, data_encoding, file_encoding=None, errors='strict')

   Return a "StreamRecoder" instance, a wrapped version of *file*
   which provides transparent transcoding. The original file is closed
   when the wrapped version is closed.

   Data written to the wrapped file is decoded according to the given
   *data_encoding* and then written to the original file as bytes
   using *file_encoding*. Bytes read from the original file are
   decoded according to *file_encoding*, and the result is encoded
   using *data_encoding*.

   If *file_encoding* is not given, it defaults to *data_encoding*.

   *errors* may be given to define the error handling. It defaults to
   "'strict'", which causes "ValueError" to be raised in case an
   encoding error occurs.
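
For example, wrapping an in-memory stream so that the caller exchanges UTF-8 bytes while the underlying file stores Latin-1 (a sketch; remember that closing the wrapper also closes the original file):

```python
import codecs
import io

raw = io.BytesIO()
wrapped = codecs.EncodedFile(raw, data_encoding="utf-8",
                             file_encoding="latin-1")

# UTF-8 bytes in; decoded, then re-encoded as Latin-1 for storage.
wrapped.write("é".encode("utf-8"))
stored = raw.getvalue()
```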

codecs.iterencode(iterator, encoding, errors='strict', **kwargs)

   Uses an incremental encoder to iteratively encode the input
   provided by *iterator*. This function is a *generator*. The
   *errors* argument (as well as any other keyword argument) is passed
   through to the incremental encoder.

   This function requires that the codec accept text "str" objects to
   encode. Therefore it does not support bytes-to-bytes encoders such
   as "base64_codec".

codecs.iterdecode(iterator, encoding, errors='strict', **kwargs)

   Uses an incremental decoder to iteratively decode the input
   provided by *iterator*. This function is a *generator*. The
   *errors* argument (as well as any other keyword argument) is passed
   through to the incremental decoder.

   This function requires that the codec accept "bytes" objects to
   decode. Therefore it does not support text-to-text encoders such as
   "rot_13", although "rot_13" may be used equivalently with
   "iterencode()".
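
A sketch of both generators on small chunked inputs:

```python
import codecs

# Each input chunk is fed to an incremental encoder/decoder in turn.
encoded = b"".join(codecs.iterencode(["sp", "am"], "utf-8"))
decoded = "".join(codecs.iterdecode([b"sp", b"am"], "utf-8"))
```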

The module also provides the following constants which are useful for
reading and writing to platform dependent files:

codecs.BOM
codecs.BOM_BE
codecs.BOM_LE
codecs.BOM_UTF8
codecs.BOM_UTF16
codecs.BOM_UTF16_BE
codecs.BOM_UTF16_LE
codecs.BOM_UTF32
codecs.BOM_UTF32_BE
codecs.BOM_UTF32_LE

   These constants define various byte sequences, being Unicode byte
   order marks (BOMs) for several encodings. They are used in UTF-16
   and UTF-32 data streams to indicate the byte order used, and in
   UTF-8 as a Unicode signature. "BOM_UTF16" is either "BOM_UTF16_BE"
   or "BOM_UTF16_LE" depending on the platform's native byte order,
   "BOM" is an alias for "BOM_UTF16", "BOM_LE" for "BOM_UTF16_LE" and
   "BOM_BE" for "BOM_UTF16_BE". The others represent the BOM in UTF-8
   and UTF-32 encodings.
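
For example, a rough sketch of sniffing the byte order of UTF-16 data from its BOM prefix (the "utf-16" codec performs this detection automatically):

```python
import codecs

data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

if data.startswith(codecs.BOM_UTF16_LE):
    text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
elif data.startswith(codecs.BOM_UTF16_BE):
    text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
else:
    text = None  # no BOM: the byte order must be known out of band
```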


Codec Base Classes
==================

The "codecs" module defines a set of base classes which define the
interfaces for working with codec objects, and can also be used as the
basis for custom codec implementations.

Each codec has to define four interfaces to make it usable as codec in
Python: stateless encoder, stateless decoder, stream reader and stream
writer. The stream reader and writers typically reuse the stateless
encoder/decoder to implement the file protocols. Codec authors also
need to define how the codec will handle encoding and decoding errors.


Error Handlers
--------------

To simplify and standardize error handling, codecs may implement
different error handling schemes by accepting the *errors* string
argument. The following string values are defined and implemented by
all standard Python codecs:

+---------------------------+-------------------------------------------------+
| Value                     | Meaning                                         |
|===========================|=================================================|
| "'strict'"                | Raise "UnicodeError" (or a subclass); this is   |
|                           | the default. Implemented in "strict_errors()".  |
+---------------------------+-------------------------------------------------+
| "'ignore'"                | Ignore the malformed data and continue without  |
|                           | further notice. Implemented in                  |
|                           | "ignore_errors()".                              |
+---------------------------+-------------------------------------------------+

The following error handlers are only applicable to *text encodings*:

+---------------------------+-------------------------------------------------+
| Value                     | Meaning                                         |
|===========================|=================================================|
| "'replace'"               | Replace with a replacement marker. On encoding, |
|                           | the built-in codecs use "'?'"; on decoding they |
|                           | use the official "U+FFFD" REPLACEMENT           |
|                           | CHARACTER. Implemented in "replace_errors()".   |
+---------------------------+-------------------------------------------------+
| "'xmlcharrefreplace'"     | Replace with the appropriate XML character      |
|                           | reference (only for encoding). Implemented in   |
|                           | "xmlcharrefreplace_errors()".                   |
+---------------------------+-------------------------------------------------+
| "'backslashreplace'"      | Replace with backslashed escape sequences.      |
|                           | Implemented in "backslashreplace_errors()".     |
+---------------------------+-------------------------------------------------+
| "'namereplace'"           | Replace with "\N{...}" escape sequences (only   |
|                           | for encoding). Implemented in                   |
|                           | "namereplace_errors()".                         |
+---------------------------+-------------------------------------------------+
| "'surrogateescape'"       | On decoding, replace each byte with an          |
|                           | individual surrogate code ranging from          |
|                           | "U+DC80" to "U+DCFF". This code will then be    |
|                           | turned back into the same byte when the         |
|                           | "'surrogateescape'" error handler is used when  |
|                           | encoding the data. (See **PEP 383** for more.)  |
+---------------------------+-------------------------------------------------+

In addition, the following error handler is specific to the given
codecs:

+---------------------+--------------------------+---------------------------------------------+
| Value               | Codecs                   | Meaning                                     |
|=====================|==========================|=============================================|
| "'surrogatepass'"   | utf-8, utf-16, utf-32,   | Allow encoding and decoding surrogate code  |
|                     | utf-16-be, utf-16-le,    | points, which these codecs otherwise treat  |
|                     | utf-32-be, utf-32-le     | as an error.                                |
+---------------------+--------------------------+---------------------------------------------+

Added in version 3.1: The "'surrogateescape'" and "'surrogatepass'"
error handlers.

Changed in version 3.4: The "'surrogatepass'" error handler now works
with utf-16* and utf-32* codecs.

Added in version 3.5: The "'namereplace'" error handler.

Changed in version 3.5: The "'backslashreplace'" error handler now
works with decoding and translating.
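
The handlers above can be compared on a small example:

```python
# Encoding 'é' (U+00E9) to ASCII with different handlers:
replaced = "résumé".encode("ascii", errors="replace")       # '?' marker
xml = "résumé".encode("ascii", errors="xmlcharrefreplace")  # &#233;

# Decoding an invalid UTF-8 byte:
ignored = b"caf\xe9".decode("utf-8", errors="ignore")
escaped = b"caf\xe9".decode("utf-8", errors="surrogateescape")
# 'surrogateescape' round-trips: re-encoding restores the original byte.
round_trip = escaped.encode("utf-8", errors="surrogateescape")
```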

The set of allowed values can be extended by registering a new named
error handler:

codecs.register_error(name, error_handler)

   Register the error handling function *error_handler* under the name
   *name*. The *error_handler* argument will be called during encoding
   and decoding in case of an error, when *name* is specified as the
   errors parameter.

   For encoding, *error_handler* will be called with a
   "UnicodeEncodeError" instance, which contains information about the
   location of the error. The error handler must either raise this or
   a different exception, or return a tuple with a replacement for the
   unencodable part of the input and a position where encoding should
   continue. The replacement may be either "str" or "bytes". If the
   replacement is bytes, the encoder will simply copy them into the
   output buffer. If the replacement is a string, the encoder will
   encode the replacement. Encoding continues on the original input at
   the specified position. Negative position values will be treated as
   being relative to the end of the input string. If the resulting
   position is out of bound an "IndexError" will be raised.

   Decoding and translating works similarly, except
   "UnicodeDecodeError" or "UnicodeTranslateError" will be passed to
   the handler and that the replacement from the error handler will be
   put into the output directly.
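
A sketch of a custom encoding error handler (the name "tildereplace" is made up for this example) that substitutes "'~'" for each unencodable character:

```python
import codecs

def tilde_replace(exc):
    if isinstance(exc, UnicodeEncodeError):
        # Return (replacement, position at which to resume encoding).
        return ("~" * (exc.end - exc.start), exc.end)
    raise exc

codecs.register_error("tildereplace", tilde_replace)
result = "naïve".encode("ascii", errors="tildereplace")
```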

Previously registered error handlers (including the standard error
handlers) can be looked up by name:

codecs.lookup_error(name)

   Return the error handler previously registered under the name
   *name*.

   Raises a "LookupError" in case the handler cannot be found.

The following standard error handlers are also made available as
module level functions:

codecs.strict_errors(exception)

   Implements the "'strict'" error handling: each encoding or decoding
   error raises a "UnicodeError".

codecs.replace_errors(exception)

   Implements the "'replace'" error handling (for *text encodings*
   only): substitutes "'?'" for encoding errors (to be encoded by the
   codec), and "'\ufffd'" (the Unicode replacement character) for
   decoding errors.

codecs.ignore_errors(exception)

   Implements the "'ignore'" error handling: malformed data is ignored
   and encoding or decoding is continued without further notice.

codecs.xmlcharrefreplace_errors(exception)

   Implements the "'xmlcharrefreplace'" error handling (for encoding
   with *text encodings* only): the unencodable character is replaced
   by an appropriate XML character reference.

codecs.backslashreplace_errors(exception)

   Implements the "'backslashreplace'" error handling (for *text
   encodings* only): malformed data is replaced by a backslashed
   escape sequence.

codecs.namereplace_errors(exception)

   Implements the "'namereplace'" error handling (for encoding with
   *text encodings* only): the unencodable character is replaced by a
   "\N{...}" escape sequence.

   Added in version 3.5.


Stateless Encoding and Decoding
-------------------------------

The base "Codec" class defines these methods which also define the
function interfaces of the stateless encoder and decoder:

Codec.encode(input[, errors])

   Encodes the object *input* and returns a tuple (output object,
   length consumed). For instance, *text encoding* converts a string
   object to a bytes object using a particular character set encoding
   (e.g., "cp1252" or "iso-8859-1").

   The *errors* argument defines the error handling to apply. It
   defaults to "'strict'" handling.

   The method may not store state in the "Codec" instance. Use
   "StreamWriter" for codecs which have to keep state in order to make
   encoding efficient.

   The encoder must be able to handle zero length input and return an
   empty object of the output object type in this situation.

Codec.decode(input[, errors])

   Decodes the object *input* and returns a tuple (output object,
   length consumed). For instance, for a *text encoding*, decoding
   converts a bytes object encoded using a particular character set
   encoding to a string object.

   For text encodings and bytes-to-bytes codecs, *input* must be a
   bytes object or one which provides the read-only buffer interface
   -- for example, buffer objects and memory mapped files.

   The *errors* argument defines the error handling to apply. It
   defaults to "'strict'" handling.

   The method may not store state in the "Codec" instance. Use
   "StreamReader" for codecs which have to keep state in order to make
   decoding efficient.

   The decoder must be able to handle zero length input and return an
   empty object of the output object type in this situation.


Incremental Encoding and Decoding
---------------------------------

The "IncrementalEncoder" and "IncrementalDecoder" classes provide the
basic interface for incremental encoding and decoding.
Encoding/decoding the input isn't done with one call to the stateless
encoder/decoder function, but with multiple calls to the
"encode()"/"decode()" method of the incremental encoder/decoder. The
incremental encoder/decoder keeps track of the encoding/decoding
process during method calls.

The joined output of calls to the "encode()"/"decode()" method is the
same as if all the single inputs were joined into one, and this input
was encoded/decoded with the stateless encoder/decoder.


IncrementalEncoder Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~

The "IncrementalEncoder" class is used for encoding an input in
multiple steps. It defines the following methods which every
incremental encoder must define in order to be compatible with the
Python codec registry.

class codecs.IncrementalEncoder(errors='strict')

   Constructor for an "IncrementalEncoder" instance.

   All incremental encoders must provide this constructor interface.
   They are free to add additional keyword arguments, but only the
   ones defined here are used by the Python codec registry.

   The "IncrementalEncoder" may implement different error handling
   schemes by providing the *errors* keyword argument. See Error
   Handlers for possible values.

   The *errors* argument will be assigned to an attribute of the same
   name. Assigning to this attribute makes it possible to switch
   between different error handling strategies during the lifetime of
   the "IncrementalEncoder" object.

   encode(object[, final])

      Encodes *object* (taking the current state of the encoder into
      account) and returns the resulting encoded object. If this is
      the last call to "encode()" *final* must be true (the default is
      false).

   reset()

      Reset the encoder to the initial state. The output is discarded:
      call ".encode(object, final=True)", passing an empty byte or
      text string if necessary, to reset the encoder and to get the
      output.

   getstate()

      Return the current state of the encoder which must be an
      integer. The implementation should make sure that "0" is the
      most common state. (States that are more complicated than
      integers can be converted into an integer by marshaling/pickling
      the state and encoding the bytes of the resulting string into an
      integer.)

   setstate(state)

      Set the state of the encoder to *state*. *state* must be an
      encoder state returned by "getstate()".


IncrementalDecoder Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~

The "IncrementalDecoder" class is used for decoding an input in
multiple steps. It defines the following methods which every
incremental decoder must define in order to be compatible with the
Python codec registry.

class codecs.IncrementalDecoder(errors='strict')

   Constructor for an "IncrementalDecoder" instance.

   All incremental decoders must provide this constructor interface.
   They are free to add additional keyword arguments, but only the
   ones defined here are used by the Python codec registry.

   The "IncrementalDecoder" may implement different error handling
   schemes by providing the *errors* keyword argument. See Error
   Handlers for possible values.

   The *errors* argument will be assigned to an attribute of the same
   name. Assigning to this attribute makes it possible to switch
   between different error handling strategies during the lifetime of
   the "IncrementalDecoder" object.

   decode(object[, final])

      Decodes *object* (taking the current state of the decoder into
      account) and returns the resulting decoded object. If this is
      the last call to "decode()" *final* must be true (the default is
      false). If *final* is true the decoder must decode the input
      completely and must flush all buffers. If this isn't possible
      (e.g. because of incomplete byte sequences at the end of the
      input) it must initiate error handling just like in the
      stateless case (which might raise an exception).

   reset()

      Reset the decoder to the initial state.

   getstate()

      Return the current state of the decoder. This must be a tuple
      with two items, the first must be the buffer containing the
      still undecoded input. The second must be an integer and can be
      additional state info. (The implementation should make sure that
      "0" is the most common additional state info.) If this
      additional state info is "0" it must be possible to set the
      decoder to the state which has no input buffered and "0" as the
      additional state info, so that feeding the previously buffered
      input to the decoder returns it to the previous state without
      producing any output. (Additional state info that is more
      complicated than integers can be converted into an integer by
      marshaling/pickling the info and encoding the bytes of the
      resulting string into an integer.)

   setstate(state)

      Set the state of the decoder to *state*. *state* must be a
      decoder state returned by "getstate()".
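
Incremental decoding matters most when a multi-byte sequence is split across chunks; a sketch with UTF-8:

```python
import codecs

dec = codecs.getincrementaldecoder("utf-8")()

# The trailing lead byte b'\xc3' is buffered, not treated as an error.
text = dec.decode(b"caf\xc3")
text += dec.decode(b"\xa9", final=True)  # completes U+00E9
```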


Stream Encoding and Decoding
----------------------------

The "StreamWriter" and "StreamReader" classes provide generic working
interfaces which can be used to implement new encoding submodules very
easily. See "encodings.utf_8" for an example of how this is done.


StreamWriter Objects
~~~~~~~~~~~~~~~~~~~~

The "StreamWriter" class is a subclass of "Codec" and defines the
following methods which every stream writer must define in order to be
compatible with the Python codec registry.

class codecs.StreamWriter(stream, errors='strict')

   Constructor for a "StreamWriter" instance.

   All stream writers must provide this constructor interface. They
   are free to add additional keyword arguments, but only the ones
   defined here are used by the Python codec registry.

   The *stream* argument must be a file-like object open for writing
   text or binary data, as appropriate for the specific codec.

   The "StreamWriter" may implement different error handling schemes
   by providing the *errors* keyword argument. See Error Handlers for
   the standard error handlers the underlying stream codec may
   support.

   The *errors* argument will be assigned to an attribute of the same
   name. Assigning to this attribute makes it possible to switch
   between different error handling strategies during the lifetime of
   the "StreamWriter" object.

   write(object)

      Writes the object's contents encoded to the stream.

   writelines(list)

      Writes the concatenated list of strings to the stream (possibly
      by reusing the "write()" method). The standard bytes-to-bytes
      codecs do not support this method.

   reset()

      Flushes and resets the codec buffers used for keeping state.

      Calling this method should ensure that the data on the output is
      put into a clean state that allows appending of new fresh data
      without having to rescan the whole stream to recover state.

In addition to the above methods, the "StreamWriter" must also inherit
all other methods and attributes from the underlying stream.


StreamReader Objects
~~~~~~~~~~~~~~~~~~~~

The "StreamReader" class is a subclass of "Codec" and defines the
following methods which every stream reader must define in order to be
compatible with the Python codec registry.

class codecs.StreamReader(stream, errors='strict')

   Constructor for a "StreamReader" instance.

   All stream readers must provide this constructor interface. They
   are free to add additional keyword arguments, but only the ones
   defined here are used by the Python codec registry.

   The *stream* argument must be a file-like object open for reading
   text or binary data, as appropriate for the specific codec.

   The "StreamReader" may implement different error handling schemes
   by providing the *errors* keyword argument. See Error Handlers for
   the standard error handlers the underlying stream codec may
   support.

   The *errors* argument will be assigned to an attribute of the same
   name. Assigning to this attribute makes it possible to switch
   between different error handling strategies during the lifetime of
   the "StreamReader" object.

   The set of allowed values for the *errors* argument can be extended
   with "register_error()".

   read([size[, chars[, firstline]]])

      Decodes data from the stream and returns the resulting object.

      The *chars* argument indicates the number of decoded code points
      or bytes to return. The "read()" method will never return more
      data than requested, but it might return less, if there is not
      enough available.

      The *size* argument indicates the approximate maximum number of
      encoded bytes or code points to read for decoding. The decoder
      can modify this setting as appropriate. The default value -1
      indicates to read and decode as much as possible. This argument
      is intended to prevent having to decode huge files in one step.

      The *firstline* flag indicates that it would be sufficient to
      only return the first line, if there are decoding errors on
      later lines.

      The method should use a greedy read strategy meaning that it
      should read as much data as is allowed within the definition of
      the encoding and the given size, e.g.  if optional encoding
      endings or state markers are available on the stream, these
      should be read too.

   readline([size[, keepends]])

      Read one line from the input stream and return the decoded data.

      *size*, if given, is passed as size argument to the stream's
      "read()" method.

      If *keepends* is false line-endings will be stripped from the
      lines returned.

   readlines([sizehint[, keepends]])

      Read all lines available on the input stream and return them as
      a list of lines.

      Line-endings are implemented using the codec's "decode()" method
      and are included in the list entries if *keepends* is true.

      *sizehint*, if given, is passed as the *size* argument to the
      stream's "read()" method.

   reset()

      Resets the codec buffers used for keeping state.

      Note that no stream repositioning should take place. This method
      is primarily intended to be able to recover from decoding
      errors.

In addition to the above methods, the "StreamReader" must also inherit
all other methods and attributes from the underlying stream.


StreamReaderWriter Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~

The "StreamReaderWriter" is a convenience class that allows wrapping
streams which work in both read and write modes.

The design is such that one can use the factory functions returned by
the "lookup()" function to construct the instance.

class codecs.StreamReaderWriter(stream, Reader, Writer, errors='strict')

   Creates a "StreamReaderWriter" instance. *stream* must be a file-
   like object. *Reader* and *Writer* must be factory functions or
   classes providing the "StreamReader" and "StreamWriter" interface
   resp. Error handling is done in the same way as defined for the
   stream readers and writers.

"StreamReaderWriter" instances define the combined interfaces of
"StreamReader" and "StreamWriter" classes. They inherit all other
methods and attributes from the underlying stream.


StreamRecoder Objects
~~~~~~~~~~~~~~~~~~~~~

The "StreamRecoder" translates data from one encoding to another,
which is sometimes useful when dealing with different encoding
environments.

The design is such that one can use the factory functions returned by
the "lookup()" function to construct the instance.

class codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors='strict')

   Creates a "StreamRecoder" instance which implements a two-way
   conversion: *encode* and *decode* work on the frontend — the data
   visible to code calling "read()" and "write()", while *Reader* and
   *Writer* work on the backend — the data in *stream*.

   You can use these objects to do transparent transcodings, e.g.,
   from Latin-1 to UTF-8 and back.

   The *stream* argument must be a file-like object.

   The *encode* and *decode* arguments must adhere to the "Codec"
   interface. *Reader* and *Writer* must be factory functions or
   classes providing objects of the "StreamReader" and "StreamWriter"
   interface respectively.

   Error handling is done in the same way as defined for the stream
   readers and writers.

"StreamRecoder" instances define the combined interfaces of
"StreamReader" and "StreamWriter" classes. They inherit all other
methods and attributes from the underlying stream.


Encodings and Unicode
=====================

Strings are stored internally as sequences of code points in the
range "0x0"--"0x10FFFF". (See **PEP 393** for more details about the
implementation.) Once a string object is used outside of CPU and
memory, endianness and how these arrays are stored as bytes become an
issue. As with other codecs, serialising a string into a sequence of
bytes is known as *encoding*, and recreating the string from the
sequence of bytes is known as *decoding*. A variety of different text
serialisation codecs exist, which are collectively referred to as
*text encodings*.

The simplest text encoding (called "'latin-1'" or "'iso-8859-1'") maps
the code points 0--255 to the bytes "0x0"--"0xff", which means that a
string object that contains code points above "U+00FF" can't be
encoded with this codec. Doing so will raise a "UnicodeEncodeError"
that looks like the following (although the details of the error
message may differ): "UnicodeEncodeError: 'latin-1' codec can't encode
character '\u1234' in position 3: ordinal not in range(256)".

There's another group of encodings (the so called charmap encodings)
that choose a different subset of all Unicode code points and how
these code points are mapped to the bytes "0x0"--"0xff". To see how
this is done simply open e.g. "encodings/cp1252.py" (which is an
encoding that is used primarily on Windows). There's a string constant
with 256 characters that shows you which character is mapped to which
byte value.

All of these encodings can only encode 256 of the 1114112 code points
defined in Unicode. A simple and straightforward way that can store
each Unicode code point, is to store each code point as four
consecutive bytes. There are two possibilities: store the bytes in big
endian or in little endian order. These two encodings are called
"UTF-32-BE" and "UTF-32-LE" respectively. Their disadvantage is that
if e.g. you use "UTF-32-BE" on a little endian machine you will always
have to swap bytes on encoding and decoding. "UTF-32" avoids this
problem: bytes will always be in natural endianness. When these bytes
are read by a CPU with a different endianness, then bytes have to be
swapped though. To be able to detect the endianness of a "UTF-16" or
"UTF-32" byte sequence, there's the so called BOM ("Byte Order Mark").
This is the Unicode character "U+FEFF". This character can be
prepended to every "UTF-16" or "UTF-32" byte sequence. The byte
swapped version of this character ("0xFFFE") is an illegal character
that may not appear in a Unicode text. So when the first character in
an "UTF-16" or "UTF-32" byte sequence appears to be a "U+FFFE" the
bytes have to be swapped on decoding. Unfortunately the character
"U+FEFF" had a second purpose as a "ZERO WIDTH NO-BREAK SPACE": a
character that has no width and doesn't allow a word to be split. It
can e.g. be used to give hints to a ligature algorithm. With Unicode
4.0 using "U+FEFF" as a "ZERO WIDTH NO-BREAK SPACE" has been
deprecated (with "U+2060" ("WORD JOINER") assuming this role).
Nevertheless Unicode software still must be able to handle "U+FEFF" in
both roles: as a BOM it's a device to determine the storage layout of
the encoded bytes, and vanishes once the byte sequence has been
decoded into a string; as a "ZERO WIDTH NO-BREAK SPACE" it's a normal
character that will be decoded like any other.
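
The BOM handling described above can be observed with the "codecs" module's BOM constants; a brief sketch:

```python
import codecs

# The BOM is U+FEFF; its byte-swapped form would read as 0xFFFE.
assert codecs.BOM_UTF16_BE == b"\xfe\xff"
assert codecs.BOM_UTF16_LE == b"\xff\xfe"

# A leading BOM tells the generic "utf-16" decoder which byte order
# follows; the BOM is consumed and never appears in the decoded string.
assert (codecs.BOM_UTF16_BE + "A".encode("utf-16-be")).decode("utf-16") == "A"
assert (codecs.BOM_UTF16_LE + "A".encode("utf-16-le")).decode("utf-16") == "A"
```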

There's another encoding that is able to encode the full range of
Unicode characters: UTF-8. UTF-8 is an 8-bit encoding, which means
there are no issues with byte order in UTF-8. Each byte in a UTF-8
byte sequence consists of two parts: marker bits (the most significant
bits) and payload bits. The marker bits are a sequence of zero to four
"1" bits followed by a "0" bit. Unicode characters are encoded like
this (with x being payload bits, which when concatenated give the
Unicode character):

+-------------------------------------+------------------------------------------------+
| Range                               | Encoding                                       |
|=====================================|================================================|
| "U-00000000" ... "U-0000007F"       | 0xxxxxxx                                       |
+-------------------------------------+------------------------------------------------+
| "U-00000080" ... "U-000007FF"       | 110xxxxx 10xxxxxx                              |
+-------------------------------------+------------------------------------------------+
| "U-00000800" ... "U-0000FFFF"       | 1110xxxx 10xxxxxx 10xxxxxx                     |
+-------------------------------------+------------------------------------------------+
| "U-00010000" ... "U-0010FFFF"       | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx            |
+-------------------------------------+------------------------------------------------+

The least significant bit of the Unicode character is the rightmost x
bit.
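
The table can be verified against Python's own encoder; a sketch ("utf8_manual" is an illustrative helper, not part of the module):

```python
def utf8_manual(cp):
    """Encode a single code point following the marker/payload table."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if cp < 0x10000:
        return bytes([0xE0 | cp >> 12,
                      0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
    return bytes([0xF0 | cp >> 18, 0x80 | (cp >> 12) & 0x3F,
                  0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])

# One example per row of the table above.
for ch in "A\u00e9\u20ac\U0001F600":
    assert utf8_manual(ord(ch)) == ch.encode("utf-8")
```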

As UTF-8 is an 8-bit encoding, no BOM is required and any "U+FEFF"
character in the decoded string (even if it's the first character) is
treated as a "ZERO WIDTH NO-BREAK SPACE".

Without external information it's impossible to reliably determine
which encoding was used for encoding a string. Each charmap encoding
can decode any random byte sequence. However that's not possible with
UTF-8, as UTF-8 byte sequences have a structure that doesn't allow
arbitrary byte sequences. To increase the reliability with which a
UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8
(which Python calls "utf-8-sig") for its Notepad program: before
any of the Unicode characters is written to the file, a UTF-8 encoded
BOM (which looks like this as a byte sequence: "0xef", "0xbb", "0xbf")
is written. As it's rather improbable that any charmap encoded file
starts with these byte values (which would e.g. map to

      LATIN SMALL LETTER I WITH DIAERESIS
      RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
      INVERTED QUESTION MARK

in iso-8859-1), this increases the probability that a "utf-8-sig"
encoding can be correctly guessed from the byte sequence. So here the
BOM is not used to determine the byte order used for generating the
byte sequence, but as a signature that helps in guessing the encoding.
On encoding the utf-8-sig codec will write "0xef", "0xbb", "0xbf" as
the first three bytes to the file. On decoding "utf-8-sig" will skip
those three bytes if they appear as the first three bytes in the file.
In UTF-8, the use of the BOM is discouraged and should generally be
avoided.
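
Both directions of this behaviour can be checked directly; a minimal sketch:

```python
data = "hello".encode("utf-8-sig")
assert data.startswith(b"\xef\xbb\xbf")      # encoder writes the BOM first
assert data.decode("utf-8-sig") == "hello"   # decoder skips the BOM

# The BOM is optional on decoding: plain UTF-8 input also works.
assert "hello".encode("utf-8").decode("utf-8-sig") == "hello"
```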


Standard Encodings
==================

Python comes with a number of codecs built-in, either implemented as C
functions or with dictionaries as mapping tables. The following table
lists the codecs by name, together with a few common aliases, and the
languages for which the encoding is likely used. Neither the list of
aliases nor the list of languages is meant to be exhaustive. Notice
that spelling alternatives that only differ in case or use a hyphen
instead of an underscore are also valid aliases; therefore, e.g.
"'utf-8'" is a valid alias for the "'utf_8'" codec.

**CPython implementation detail:** Some common encodings can bypass
the codecs lookup machinery to improve performance. These optimization
opportunities are only recognized by CPython for a limited set of
(case insensitive) aliases: utf-8, utf8, latin-1, latin1, iso-8859-1,
iso8859-1, mbcs (Windows only), ascii, us-ascii, utf-16, utf16,
utf-32, utf32, and the same using underscores instead of dashes. Using
alternative aliases for these encodings may result in slower
execution.

Changed in version 3.6: Optimization opportunity recognized for
us-ascii.

Many of the character sets support the same languages. They vary in
individual characters (e.g. whether the EURO SIGN is supported or
not), and in the assignment of characters to code positions. For the
European languages in particular, the following variants typically
exist:

* an ISO 8859 codeset

* a Microsoft Windows code page, which is typically derived from an
  8859 codeset, but replaces control characters with additional
  graphic characters

* an IBM EBCDIC code page

* an IBM PC code page, which is ASCII compatible
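
The difference between an ISO 8859 codeset and the Windows code page derived from it shows up in the "0x80"--"0x9f" range; a brief sketch:

```python
# iso-8859-1 keeps 0x80-0x9f as (rarely useful) C1 control characters,
# while cp1252 reassigns most of them to graphic characters.
assert b"\x80".decode("iso-8859-1") == "\x80"    # control character U+0080
assert b"\x80".decode("cp1252") == "\u20ac"      # EURO SIGN
```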

+-------------------+----------------------------------+----------------------------------+
| Codec             | Aliases                          | Languages                        |
|===================|==================================|==================================|
| ascii             | 646, us-ascii                    | English                          |
+-------------------+----------------------------------+----------------------------------+
| big5              | big5-tw, csbig5                  | Traditional Chinese              |
+-------------------+----------------------------------+----------------------------------+
| big5hkscs         | big5-hkscs, hkscs                | Traditional Chinese              |
+-------------------+----------------------------------+----------------------------------+
| cp037             | IBM037, IBM039                   | English                          |
+-------------------+----------------------------------+----------------------------------+
| cp273             | 273, IBM273, csIBM273            | German. New in version 3.4.      |
+-------------------+----------------------------------+----------------------------------+
| cp424             | EBCDIC-CP-HE, IBM424             | Hebrew                           |
+-------------------+----------------------------------+----------------------------------+
| cp437             | 437, IBM437                      | English                          |
+-------------------+----------------------------------+----------------------------------+
| cp500             | EBCDIC-CP-BE, EBCDIC-CP-CH,      | Western Europe                   |
|                   | IBM500                           |                                  |
+-------------------+----------------------------------+----------------------------------+
| cp720             |                                  | Arabic                           |
+-------------------+----------------------------------+----------------------------------+
| cp737             |                                  | Greek                            |
+-------------------+----------------------------------+----------------------------------+
| cp775             | IBM775                           | Baltic languages                 |
+-------------------+----------------------------------+----------------------------------+
| cp850             | 850, IBM850                      | Western Europe                   |
+-------------------+----------------------------------+----------------------------------+
| cp852             | 852, IBM852                      | Central and Eastern Europe       |
+-------------------+----------------------------------+----------------------------------+
| cp855             | 855, IBM855                      | Bulgarian, Byelorussian,         |
|                   |                                  | Macedonian, Russian, Serbian     |
+-------------------+----------------------------------+----------------------------------+
| cp856             |                                  | Hebrew                           |
+-------------------+----------------------------------+----------------------------------+
| cp857             | 857, IBM857                      | Turkish                          |
+-------------------+----------------------------------+----------------------------------+
| cp858             | 858, IBM858                      | Western Europe                   |
+-------------------+----------------------------------+----------------------------------+
| cp860             | 860, IBM860                      | Portuguese                       |
+-------------------+----------------------------------+----------------------------------+
| cp861             | 861, CP-IS, IBM861               | Icelandic                        |
+-------------------+----------------------------------+----------------------------------+
| cp862             | 862, IBM862                      | Hebrew                           |
+-------------------+----------------------------------+----------------------------------+
| cp863             | 863, IBM863                      | Canadian                         |
+-------------------+----------------------------------+----------------------------------+
| cp864             | IBM864                           | Arabic                           |
+-------------------+----------------------------------+----------------------------------+
| cp865             | 865, IBM865                      | Danish, Norwegian                |
+-------------------+----------------------------------+----------------------------------+
| cp866             | 866, IBM866                      | Russian                          |
+-------------------+----------------------------------+----------------------------------+
| cp869             | 869, CP-GR, IBM869               | Greek                            |
+-------------------+----------------------------------+----------------------------------+
| cp874             |                                  | Thai                             |
+-------------------+----------------------------------+----------------------------------+
| cp875             |                                  | Greek                            |
+-------------------+----------------------------------+----------------------------------+
| cp932             | 932, ms932, mskanji, ms-kanji    | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| cp949             | 949, ms949, uhc                  | Korean                           |
+-------------------+----------------------------------+----------------------------------+
| cp950             | 950, ms950                       | Traditional Chinese              |
+-------------------+----------------------------------+----------------------------------+
| cp1006            |                                  | Urdu                             |
+-------------------+----------------------------------+----------------------------------+
| cp1026            | ibm1026                          | Turkish                          |
+-------------------+----------------------------------+----------------------------------+
| cp1125            | 1125, ibm1125, cp866u, ruscii    | Ukrainian. New in version 3.4.   |
+-------------------+----------------------------------+----------------------------------+
| cp1140            | ibm1140                          | Western Europe                   |
+-------------------+----------------------------------+----------------------------------+
| cp1250            | windows-1250                     | Central and Eastern Europe       |
+-------------------+----------------------------------+----------------------------------+
| cp1251            | windows-1251                     | Bulgarian, Byelorussian,         |
|                   |                                  | Macedonian, Russian, Serbian     |
+-------------------+----------------------------------+----------------------------------+
| cp1252            | windows-1252                     | Western Europe                   |
+-------------------+----------------------------------+----------------------------------+
| cp1253            | windows-1253                     | Greek                            |
+-------------------+----------------------------------+----------------------------------+
| cp1254            | windows-1254                     | Turkish                          |
+-------------------+----------------------------------+----------------------------------+
| cp1255            | windows-1255                     | Hebrew                           |
+-------------------+----------------------------------+----------------------------------+
| cp1256            | windows-1256                     | Arabic                           |
+-------------------+----------------------------------+----------------------------------+
| cp1257            | windows-1257                     | Baltic languages                 |
+-------------------+----------------------------------+----------------------------------+
| cp1258            | windows-1258                     | Vietnamese                       |
+-------------------+----------------------------------+----------------------------------+
| euc_jp            | eucjp, ujis, u-jis               | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| euc_jis_2004      | jisx0213, eucjis2004             | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| euc_jisx0213      | eucjisx0213                      | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| euc_kr            | euckr, korean, ksc5601,          | Korean                           |
|                   | ks_c-5601, ks_c-5601-1987,       |                                  |
|                   | ksx1001, ks_x-1001               |                                  |
+-------------------+----------------------------------+----------------------------------+
| gb2312            | chinese, csiso58gb231280, euc-   | Simplified Chinese               |
|                   | cn, euccn, eucgb2312-cn,         |                                  |
|                   | gb2312-1980, gb2312-80, iso-     |                                  |
|                   | ir-58                            |                                  |
+-------------------+----------------------------------+----------------------------------+
| gbk               | 936, cp936, ms936                | Unified Chinese                  |
+-------------------+----------------------------------+----------------------------------+
| gb18030           | gb18030-2000                     | Unified Chinese                  |
+-------------------+----------------------------------+----------------------------------+
| hz                | hzgb, hz-gb, hz-gb-2312          | Simplified Chinese               |
+-------------------+----------------------------------+----------------------------------+
| iso2022_jp        | csiso2022jp, iso2022jp,          | Japanese                         |
|                   | iso-2022-jp                      |                                  |
+-------------------+----------------------------------+----------------------------------+
| iso2022_jp_1      | iso2022jp-1, iso-2022-jp-1       | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| iso2022_jp_2      | iso2022jp-2, iso-2022-jp-2       | Japanese, Korean, Simplified     |
|                   |                                  | Chinese, Western Europe, Greek   |
+-------------------+----------------------------------+----------------------------------+
| iso2022_jp_2004   | iso2022jp-2004, iso-2022-jp-2004 | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| iso2022_jp_3      | iso2022jp-3, iso-2022-jp-3       | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| iso2022_jp_ext    | iso2022jp-ext, iso-2022-jp-ext   | Japanese                         |
+-------------------+----------------------------------+----------------------------------+
| iso2022_kr        | csiso2022kr, iso2022kr,          | Korean                           |
|                   | iso-2022-kr                      |                                  |
+-------------------+----------------------------------+----------------------------------+
| latin_1           | iso-8859-1, iso8859-1, 8859,     | Western Europe                   |
|                   | cp819, latin, latin1, L1         |                                  |
+-------------------+----------------------------------+----------------------------------+
| iso8859_2         | iso-8859-2, latin2, L2           | Central and Eastern Europe       |
+-------------------+----------------------------------+----------------------------------+
| iso8859_3         | iso-8859-3, latin3, L3           | Esperanto, Maltese               |
+-------------------+----------------------------------+----------------------------------+
| iso8859_4         | iso-8859-4, latin4, L4           | Baltic languages                 |
+-------------------+----------------------------------+----------------------------------+
| iso8859_5         | iso-8859-5, cyrillic             | Bulgarian, Byelorussian,         |
|                   |                                  | Macedonian, Russian, Serbian     |
+-------------------+----------------------------------+----------------------------------+
| iso8859_6         | iso-8859-6, arabic               | Arabic                           |
+-------------------+----------------------------------+----------------------------------+
| iso8859_7         | iso-8859-7, greek, greek8        | Greek                            |
+-------------------+----------------------------------+----------------------------------+
| iso8859_8         | iso-8859-8, hebrew               | Hebrew                           |
+-------------------+----------------------------------+----------------------------------+
| iso8859_9         | iso-8859-9, latin5, L5           | Turkish                          |
+-------------------+----------------------------------+----------------------------------+
| iso8859_10        | iso-8859-10, latin6, L6          | Nordic languages                 |
+-------------------+----------------------------------+----------------------------------+
| iso8859_11        | iso-8859-11, thai                | Thai languages                   |
+-------------------+----------------------------------+----------------------------------+
| iso8859_13        | iso-8859-13, latin7, L7          | Baltic languages                 |
+-------------------+----------------------------------+----------------------------------+
| iso8859_14        | iso-8859-14, latin8, L8          | Celtic languages                 |
+-------------------+----------------------------------+----------------------------------+
| iso8859_15        | iso-8859-15, latin9, L9          | Western Europe                   |
+-------------------+----------------------------------+----------------------------------+
| iso8859_16        | iso-8859-16, latin10, L10        | South-Eastern Europe             |
+-------------------+----------------------------------+----------------------------------+
| johab             | cp1361, ms1361                   | Korean                           |
+-------------------+----------------------------------+----------------------------------+
| koi8_r            |                                  | Russian                          |
+-------------------+----------------------------------+----------------------------------+
| koi8_t            |                                  | Tajik. New in version 3.5.       |
+-------------------+----------------------------------+----------------------------------+
| koi8_u            |                                  | Ukrainian                        |
+-------------------+----------------------------------+----------------------------------+
| kz1048            | kz_1048, strk1048_2002, rk1048   | Kazakh. New in version 3.5.      |
+-------------------+----------------------------------+----------------------------------+
| mac_cyrillic      | maccyrillic                      | Bulgarian, Byelorussian,         |
|                   |                                  | Macedonian, Russian, Serbian     |
+-------------------+----------------------------------+----------------------------------+
| mac_greek         | macgreek                         | Greek                            |
+-------------------+----------------------------------+----------------------------------+
| mac_iceland       | maciceland                       | Icelandic                        |
+-------------------+----------------------------------+----------------------------------+
| mac_latin2        | maclatin2, maccentraleurope,     | Central and Eastern Europe       |
|                   | mac_centeuro                     |                                  |
+-------------------+----------------------------------+----------------------------------+
| mac_roman         | macroman, macintosh              | Western Europe                   |
+-------------------+----------------------------------+----------------------------------+
| mac_turkish       | macturkish                       | Turkish                          |
+-------------------+----------------------------------+----------------------------------+
| ptcp154           | csptcp154, pt154, cp154,         | Kazakh                           |
|                   | cyrillic-asian                   |                                  |
+-------------------+----------------------------------+----------------------------------+
| shift_jis         | csshiftjis, shiftjis, sjis,      | Japanese                         |
|                   | s_jis                            |                                  |
+-------------------+----------------------------------+----------------------------------+
| shift_jis_2004    | shiftjis2004, sjis_2004,         | Japanese                         |
|                   | sjis2004                         |                                  |
+-------------------+----------------------------------+----------------------------------+
| shift_jisx0213    | shiftjisx0213, sjisx0213,        | Japanese                         |
|                   | s_jisx0213                       |                                  |
+-------------------+----------------------------------+----------------------------------+
| utf_32            | U32, utf32                       | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_32_be         | UTF-32BE                         | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_32_le         | UTF-32LE                         | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_16            | U16, utf16                       | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_16_be         | UTF-16BE                         | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_16_le         | UTF-16LE                         | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_7             | U7, unicode-1-1-utf-7            | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_8             | U8, UTF, utf8, cp65001           | all languages                    |
+-------------------+----------------------------------+----------------------------------+
| utf_8_sig         |                                  | all languages                    |
+-------------------+----------------------------------+----------------------------------+

Changed in version 3.4: The utf-16* and utf-32* encoders no longer allow surrogate
code points ("U+D800"--"U+DFFF") to be encoded. The utf-32* decoders
no longer decode byte sequences that correspond to surrogate code
points.
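
The restriction can be observed directly; a minimal sketch (the "surrogatepass" error handler deliberately bypasses it):

```python
# Lone surrogates are rejected by the utf-16*/utf-32* encoders.
try:
    "\ud800".encode("utf-16")
    rejected = False
except UnicodeEncodeError:
    rejected = True
assert rejected

# The surrogatepass error handler restores the permissive behaviour.
assert "\ud800".encode("utf-16-le", "surrogatepass") == b"\x00\xd8"
```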

Changed in version 3.8: "cp65001" is now an alias to "utf_8".


Python Specific Encodings
=========================

A number of predefined codecs are specific to Python, so their codec
names have no meaning outside Python. These are listed in the tables
below based on the expected input and output types (note that while
text encodings are the most common use case for codecs, the underlying
codec infrastructure supports arbitrary data transforms, not just text
encodings). For asymmetric codecs, the stated meaning describes the
encoding direction.


Text Encodings
--------------

The following codecs provide "str" to "bytes" encoding and *bytes-like
object* to "str" decoding, similar to the Unicode text encodings.

+----------------------+-----------+-----------------------------+
| Codec                | Aliases   | Meaning                     |
|======================|===========|=============================|
| idna                 |           | Implement **RFC 3490**, see |
|                      |           | also "encodings.idna". Only |
|                      |           | "errors='strict'" is        |
|                      |           | supported.                  |
+----------------------+-----------+-----------------------------+
| mbcs                 | ansi,     | Windows only: Encode the    |
|                      | dbcs      | operand according to the    |
|                      |           | ANSI codepage (CP_ACP).     |
+----------------------+-----------+-----------------------------+
| oem                  |           | Windows only: Encode the    |
|                      |           | operand according to the    |
|                      |           | OEM codepage (CP_OEMCP).    |
|                      |           | New in version 3.6.         |
+----------------------+-----------+-----------------------------+
| palmos               |           | Encoding of PalmOS 3.5.     |
+----------------------+-----------+-----------------------------+
| punycode             |           | Implement **RFC 3492**.     |
|                      |           | Stateful codecs are not     |
|                      |           | supported.                  |
+----------------------+-----------+-----------------------------+
| raw_unicode_escape   |           | Latin-1 encoding with       |
|                      |           | "\uXXXX" and "\UXXXXXXXX"   |
|                      |           | for other code points.      |
|                      |           | Existing backslashes are    |
|                      |           | not escaped in any way. It  |
|                      |           | is used in the Python       |
|                      |           | pickle protocol.            |
+----------------------+-----------+-----------------------------+
| undefined            |           | Raise an exception for all  |
|                      |           | conversions, even empty     |
|                      |           | strings. The error handler  |
|                      |           | is ignored.                 |
+----------------------+-----------+-----------------------------+
| unicode_escape       |           | Encoding suitable as the    |
|                      |           | contents of a Unicode       |
|                      |           | literal in ASCII-encoded    |
|                      |           | Python source code, except  |
|                      |           | that quotes are not         |
|                      |           | escaped. Decode from        |
|                      |           | Latin-1 source code. Beware |
|                      |           | that Python source code     |
|                      |           | actually uses UTF-8 by      |
|                      |           | default.                    |
+----------------------+-----------+-----------------------------+

Changed in version 3.8: "unicode_internal" codec is removed.
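
The two escape codecs in the table differ mainly in how they treat the Latin-1 range; a brief sketch:

```python
# unicode_escape emits backslash escapes suitable for a Python literal.
assert "a\n\u00e9".encode("unicode_escape") == b"a\\n\\xe9"
assert b"\\u20ac".decode("unicode_escape") == "\u20ac"

# raw_unicode_escape passes Latin-1 characters through unchanged and
# only escapes code points above U+00FF.
assert "\u00e9".encode("raw_unicode_escape") == b"\xe9"
assert "\u20ac".encode("raw_unicode_escape") == b"\\u20ac"
```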


Binary Transforms
-----------------

The following codecs provide binary transforms: *bytes-like object*
to "bytes" mappings. They are not supported by "bytes.decode()" (which
only produces "str" output).

+------------------------+--------------------+--------------------------------+--------------------------------+
| Codec                  | Aliases            | Meaning                        | Encoder / decoder              |
|========================|====================|================================|================================|
| base64_codec [1]       | base64, base_64    | Convert the operand to         | "base64.encodebytes()" /       |
|                        |                    | multiline MIME base64 (the     | "base64.decodebytes()"         |
|                        |                    | result always includes a       |                                |
|                        |                    | trailing "'\n'").  Changed in  |                                |
|                        |                    | version 3.4: accepts any       |                                |
|                        |                    | *bytes-like object* as input   |                                |
|                        |                    | for encoding and decoding      |                                |
+------------------------+--------------------+--------------------------------+--------------------------------+
| bz2_codec              | bz2                | Compress the operand using     | "bz2.compress()" /             |
|                        |                    | bz2.                           | "bz2.decompress()"             |
+------------------------+--------------------+--------------------------------+--------------------------------+
| hex_codec              | hex                | Convert the operand to         | "binascii.b2a_hex()" /         |
|                        |                    | hexadecimal representation,    | "binascii.a2b_hex()"           |
|                        |                    | with two digits per byte.      |                                |
+------------------------+--------------------+--------------------------------+--------------------------------+
| quopri_codec           | quopri,            | Convert the operand to MIME    | "quopri.encode()" with         |
|                        | quotedprintable,   | quoted printable.              | "quotetabs=True" /             |
|                        | quoted_printable   |                                | "quopri.decode()"              |
+------------------------+--------------------+--------------------------------+--------------------------------+
| uu_codec               | uu                 | Convert the operand using      | "uu.encode()" / "uu.decode()"  |
|                        |                    | uuencode.                      |                                |
+------------------------+--------------------+--------------------------------+--------------------------------+
| zlib_codec             | zip, zlib          | Compress the operand using     | "zlib.compress()" /            |
|                        |                    | gzip.                          | "zlib.decompress()"            |
+------------------------+--------------------+--------------------------------+--------------------------------+

[1] In addition to *bytes-like objects*, "'base64_codec'" also
    accepts ASCII-only instances of "str" for decoding

New in version 3.2: Restoration of the binary transforms.

Changed in version 3.4: Restoration of the aliases for the binary
transforms.
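
Because "bytes.decode()" cannot reach them, these transforms go through "codecs.encode()" and "codecs.decode()"; a brief sketch:

```python
import codecs

assert codecs.encode(b"hello", "hex_codec") == b"68656c6c6f"
assert codecs.decode(b"68656c6c6f", "hex") == b"hello"

# base64_codec always appends the trailing newline noted in the table.
assert codecs.encode(b"data", "base64_codec") == b"ZGF0YQ==\n"

# zlib_codec round-trips arbitrary binary input.
payload = bytes(range(256)) * 4
assert codecs.decode(codecs.encode(payload, "zlib_codec"), "zlib_codec") == payload
```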


Text Transforms
---------------

The following codecs provide text transforms: a "str" to "str"
mapping. It is not supported by "str.encode()" (which only produces
"bytes" output).

+----------------------+-----------+-----------------------------+
| Codec                | Aliases   | Meaning                     |
|======================|===========|=============================|
| rot_13               | rot13     | Return the Caesar-cypher    |
|                      |           | encryption of the operand.  |
+----------------------+-----------+-----------------------------+

New in version 3.2: Restoration of the "rot_13" text transform.

Changed in version 3.4: Restoration of the "rot13" alias.
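
As a "str"-to-"str" transform, "rot_13" must be invoked through "codecs.encode()"; a brief sketch:

```python
import codecs

assert codecs.encode("Hello, World!", "rot_13") == "Uryyb, Jbeyq!"
# rot_13 is its own inverse: applying it twice returns the original.
assert codecs.encode(codecs.encode("Hello", "rot13"), "rot13") == "Hello"
```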


"encodings.idna" --- Internationalized Domain Names in Applications
===================================================================

This module implements **RFC 3490** (Internationalized Domain Names in
Applications) and **RFC 3492** (Nameprep: A Stringprep Profile for
Internationalized Domain Names (IDN)). It builds upon the "punycode"
encoding and "stringprep".

These RFCs together define a protocol to support non-ASCII characters
in domain names. A domain name containing non-ASCII characters (such
as "www.Alliancefrançaise.nu") is converted into an ASCII-compatible
encoding (ACE, such as "www.xn--alliancefranaise-npb.nu"). The ACE
form of the domain name is then used in all places where arbitrary
characters are not allowed by the protocol, such as DNS queries, HTTP
*Host* fields, and so on. This conversion is carried out in the
application, if possible invisibly to the user: the application should
transparently convert Unicode domain labels to IDNA on the wire, and
convert ACE labels back to Unicode before presenting them to the user.

Python supports this conversion in several ways: the "idna" codec
performs conversion between Unicode and ACE, separating an input
string into labels based on the separator characters defined in
**section 3.1 of RFC 3490** and converting each label to ACE as
required, and conversely separating an input byte string into labels
based on the "." separator and converting any ACE labels found into
Unicode. Furthermore, the "socket" module transparently converts
Unicode host names to ACE, so that applications need not be concerned
about converting host names themselves when they pass them to the
socket module. On top of that, modules that have host names as
function parameters, such as "http.client" and "ftplib", accept
Unicode host names ("http.client" then also transparently sends an
IDNA hostname in the *Host* field if it sends that field at all).

When receiving host names from the wire (such as in reverse name
lookup), no automatic conversion to Unicode is performed: applications
wishing to present such host names to the user should decode them to
Unicode.

The module "encodings.idna" also implements the nameprep procedure,
which performs certain normalizations on host names, to achieve case-
insensitivity of international domain names, and to unify similar
characters. The nameprep functions can be used directly if desired.

encodings.idna.nameprep(label)

   Return the nameprepped version of *label*. The implementation
   currently assumes query strings, so "AllowUnassigned" is true.

encodings.idna.ToASCII(label)

   Convert a label to ASCII, as specified in **RFC 3490**.
   "UseSTD3ASCIIRules" is assumed to be false.

encodings.idna.ToUnicode(label)

   Convert a label to Unicode, as specified in **RFC 3490**.
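
The three helpers operate on a single label, not a full dotted name; a
hedged sketch (the exact ACE bytes shown in the comment are an
illustrative assumption):

```python
import encodings.idna

# nameprep() case-folds and normalizes one label.
label = encodings.idna.nameprep("MÜNCHEN")
print(label)  # münchen

# ToASCII()/ToUnicode() convert one label to and from its ACE form.
ace = encodings.idna.ToASCII(label)  # e.g. b'xn--mnchen-3ya'
assert encodings.idna.ToUnicode(ace) == "münchen"
```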


"encodings.mbcs" --- Windows ANSI codepage
==========================================

This module implements the ANSI codepage (CP_ACP).

Availability: Windows only.

Changed in version 3.3: Support any error handler.

Changed in version 3.2: Before 3.2, the *errors* argument was ignored;
"'replace'" was always used to encode, and "'ignore'" to decode.


"encodings.utf_8_sig" --- UTF-8 codec with BOM signature
========================================================

This module implements a variant of the UTF-8 codec. On encoding, a
UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For
the stateful encoder this is only done once (on the first write to the
byte stream). On decoding, an optional UTF-8 encoded BOM at the start
of the data will be skipped.
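
The behaviour can be seen directly with "str.encode()" and
"bytes.decode()":

```python
# Encoding with "utf_8_sig" prepends the UTF-8 encoded BOM.
data = "héllo".encode("utf_8_sig")
assert data.startswith(b"\xef\xbb\xbf")

# Decoding skips the BOM if present...
assert data.decode("utf_8_sig") == "héllo"

# ...and the BOM is optional: plain UTF-8 input decodes unchanged.
assert "héllo".encode("utf-8").decode("utf_8_sig") == "héllo"
```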
