"urllib.request" --- 用來開啟 URLs 的可擴充函式庫
*************************************************

**原始碼：**Lib/urllib/request.py

======================================================================

"urllib.request" module（模組）定義了一些函式與 class（類別）用以開啟
URLs（大部分是 HTTP），並處理各式複雜情況如：basic 驗證與 digest 驗證
、重新導向、cookies。

也參考: 有關於更高階的 HTTP 用戶端介面，推薦使用 Requests 套件。

Warning:

  On macOS it is unsafe to use this module in programs using
  "os.fork()" because the "getproxies()" implementation for macOS uses
  a higher-level system API. Set the environment variable "no_proxy"
  to "*" to avoid this problem (e.g. "os.environ["no_proxy"] = "*"").

Availability: not WASI.

This module does not work or is not available on WebAssembly
platforms. See WebAssembly platforms for more information.

The "urllib.request" module defines the following functions:

urllib.request.urlopen(url, data=None, [timeout, ]*, context=None)

   Open *url*, which can be either a string containing a valid,
   properly encoded URL, or a "Request" object.

   *data* must be an object specifying additional data to be sent to
   the server, or "None" if no such data is needed. See "Request" for
   details.

   The urllib.request module uses HTTP/1.1 and includes a
   "Connection:close" header in its HTTP requests.

   The optional *timeout* parameter specifies a timeout in seconds
   for blocking operations like the connection attempt (if not
   specified, the global default timeout setting will be used). This
   actually only works for HTTP, HTTPS and FTP connections.

   If *context* is specified, it must be an "ssl.SSLContext" instance
   describing the various SSL options. See "HTTPSConnection" for more
   details.

   This function always returns an object which can work as a
   *context manager* and has the properties *url*, *headers*, and
   *status*. See "urllib.response.addinfourl" for more detail on
   these properties.

   For HTTP and HTTPS URLs, this function returns a
   "http.client.HTTPResponse" object slightly modified. In addition
   to the three new properties above, the msg attribute contains the
   same information as the "reason" attribute --- the reason phrase
   returned by the server --- instead of the response headers as it
   is specified in the documentation for "HTTPResponse".

   For FTP, file, and data URLs, this function returns a
   "urllib.response.addinfourl" object.

   Raises "URLError" on protocol errors.

   Note that "None" may be returned if no handler handles the request
   (though the default installed global "OpenerDirector" uses
   "UnknownHandler" to ensure this never happens).

   In addition, if proxy settings are detected (for example, when a
   "*_proxy" environment variable like "http_proxy" is set),
   "ProxyHandler" is default installed and makes sure the requests
   are handled through the proxy.

   The legacy "urllib.urlopen" function from Python 2.6 and earlier
   has been discontinued; "urllib.request.urlopen()" corresponds to
   the old "urllib2.urlopen". Proxy handling, which was done by
   passing a dictionary parameter to "urllib.urlopen", can be
   obtained by using "ProxyHandler" objects.

   The default opener raises an auditing event "urllib.Request" with
   arguments "fullurl", "data", "headers", "method" taken from the
   request object.

   Changed in version 3.2: *cafile* and *capath* were added. HTTPS
   virtual hosts are now supported if "ssl.HAS_SNI" is true. *data*
   can be an iterable object.

   Changed in version 3.3: *cadefault* was added.

   Changed in version 3.4.3: *context* was added.

   Changed in version 3.10: HTTPS connections now send an ALPN
   extension with protocol indicator "http/1.1" when no *context* is
   given. Custom *context* should set ALPN protocols with
   "set_alpn_protocols()".

   Changed in version 3.13: Removed the *cafile*, *capath* and
   *cadefault* parameters: use the *context* parameter instead.
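
The call pattern above can be sketched without any network access by
using a "data:" URL, which the default opener also handles; an HTTP
URL would be used the same way. This is a minimal sketch, not taken
from the original documentation:

```python
from urllib.request import urlopen

# A "data:" URL exercises the same interface as HTTP without a
# network; urlopen() returns an object usable as a context manager.
with urlopen("data:text/plain,Hello%20World", timeout=10) as response:
    body = response.read()        # bytes
    print(body.decode("ascii"))   # -> Hello World
```

For HTTP and HTTPS URLs, the same "with" block yields an
"http.client.HTTPResponse" with *url*, *headers*, and *status*.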

urllib.request.install_opener(opener)

   安裝一個 "OpenerDirector" 實例作為預設的全域 opener。僅在當你想要讓
   urlopen 使用該 opener 時安裝一個 opener，否則的話應直接呼叫
   "OpenerDirector.open()" 而非 "urlopen()"。程式碼不會檢查 class 是否
   真的為 "OpenerDirector"，而是任何具有正確介面的 class 都能適用。

urllib.request.build_opener([handler, ...])

   回傳一個 "OpenerDirector" 實例，以給定的順序把 handlers 串接起來。
   *handler*s 可以是 "BaseHandler" 的實例，亦或是 "BaseHandler" 的
   subclasses（這個情況下必須有不帶參數的建構函式能夠被呼叫）。以下
   classes 的實例順位會在 *handler*s 之前，除非 *handler*s 已經包含它
   們，是它們的實例，或是它們的 subclasses："ProxyHandler"（如果代理服
   務設定被偵測到）、"UnknownHandler"、"HTTPHandler"、
   "HTTPDefaultErrorHandler"、"HTTPRedirectHandler"、"FTPHandler"、
   "FileHandler"、"HTTPErrorProcessor"。

   如果 Python 安裝時已帶有 SSL 支援（如果 "ssl" module 能夠被 import
   ），則 "HTTPSHandler" 也在上述 class 之中。

   一個 "BaseHandler" 的 subclass 可能透過改變其 "handler_order" 屬性
   來調整它在 handlers list 中的位置。
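
As a sketch of how "build_opener()" and "install_opener()" fit
together, the hypothetical "LoggingHandler" below is passed as a
class (so it must be constructible without arguments), and the
default handlers are prepended automatically:

```python
import urllib.request

class LoggingHandler(urllib.request.BaseHandler):
    """Hypothetical handler that records every pre-processed HTTP
    request URL."""
    seen = []

    def http_request(self, req):
        LoggingHandler.seen.append(req.full_url)
        return req

# build_opener() instantiates the class and prepends the defaults
# (HTTPHandler, HTTPRedirectHandler, and so on).
opener = urllib.request.build_opener(LoggingHandler)

# Optionally make it the global opener used by urlopen():
urllib.request.install_opener(opener)
```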

urllib.request.pathname2url(path, *, add_scheme=False)

   Convert the given local path to a "file:" URL. This function uses
   "quote()" function to encode the path.

   If *add_scheme* is false (the default), the return value omits the
   "file:" scheme prefix. Set *add_scheme* to true to return a
   complete URL.

   This example shows the function being used on Windows:

      >>> from urllib.request import pathname2url
      >>> path = 'C:\\Program Files'
      >>> pathname2url(path, add_scheme=True)
      'file:///C:/Program%20Files'

   Changed in version 3.14: Windows drive letters are no longer
   converted to uppercase, and ":" characters not following a drive
   letter no longer cause an "OSError" exception to be raised on
   Windows.

   Changed in version 3.14: Paths beginning with a slash are
   converted to URLs with authority sections. For example, the path
   "/etc/hosts" is converted to the URL "///etc/hosts".

   Changed in version 3.14: The *add_scheme* parameter was added.

urllib.request.url2pathname(url, *, require_scheme=False, resolve_host=False)

   Convert the given "file:" URL to a local path. This function uses
   "unquote()" to decode the URL.

   If *require_scheme* is false (the default), the given value should
   omit a "file:" scheme prefix. If *require_scheme* is set to true,
   the given value should include the prefix; a "URLError" is raised
   if it doesn't.

   The URL authority is discarded if it is empty, "localhost", or the
   local hostname. Otherwise, if *resolve_host* is set to true, the
   authority is resolved using "socket.gethostbyname()" and discarded
   if it matches a local IP address (as per **RFC 8089 §3**). If the
   authority is still unhandled, then on Windows a UNC path is
   returned, and on other platforms a "URLError" is raised.

   This example shows the function being used on Windows:

      >>> from urllib.request import url2pathname
      >>> url = 'file:///C:/Program%20Files'
      >>> url2pathname(url, require_scheme=True)
      'C:\\Program Files'

   Changed in version 3.14: Windows drive letters are no longer
   converted to uppercase, and ":" characters not following a drive
   letter no longer cause an "OSError" exception to be raised on
   Windows.

   Changed in version 3.14: The URL authority is discarded if it
   matches the local hostname. Otherwise, if the authority isn't
   empty or "localhost", then on Windows a UNC path is returned (as
   before), and on other platforms a "URLError" is raised.

   Changed in version 3.14: The URL query and fragment components are
   discarded if present.

   Changed in version 3.14: The *require_scheme* and *resolve_host*
   parameters were added.

urllib.request.getproxies()

   This helper function returns a dictionary of proxy server URL
   mappings. On all operating systems, it first scans the environment
   for variables named "<scheme>_proxy", in a case insensitive way;
   if none are found, it looks for proxy settings in the System
   Configuration on macOS or the Windows Systems Registry on Windows.
   If both lowercase and uppercase environment variables exist (and
   disagree), lowercase is preferred.

   Note:

     If the environment variable "REQUEST_METHOD" is set, which
     usually indicates your script is running in a CGI environment,
     the environment variable "HTTP_PROXY" (uppercase "_PROXY") will
     be ignored. This is because that variable can be injected via
     the "Proxy:" HTTP header. If you need to use an HTTP proxy in a
     CGI environment, either use "ProxyHandler" explicitly, or make
     sure the variable name is lowercase (or at least the "_proxy"
     suffix is).

The following classes are provided:

class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid, properly encoded URL.

   *data* must be an object specifying additional data to send to the
   server, or "None" if no such data is needed. Currently HTTP
   requests are the only ones that use *data*. The supported object
   types include bytes, file-like objects, and iterables of
   bytes-like objects. If neither a "Content-Length" nor a
   "Transfer-Encoding" header field has been provided, "HTTPHandler"
   will set these headers according to the type of *data*.
   "Content-Length" will be used to send bytes objects, while
   "Transfer-Encoding: chunked" as specified in **RFC 7230**, Section
   3.3.1 will be used to send files and other iterables.

   For an HTTP POST request method, *data* should be a buffer in the
   standard *application/x-www-form-urlencoded* format. The
   "urllib.parse.urlencode()" function takes a mapping or sequence of
   2-tuples and returns an ASCII string in this format. It should be
   encoded to bytes before being used as the *data* parameter.

   *headers* should be a dictionary, and will be treated as if
   "add_header()" was called with each key and value as arguments.
   This is often used to "spoof" the "User-Agent" header value, which
   is used by a browser to identify itself --- some HTTP servers only
   allow requests coming from common browsers as opposed to scripts.
   For example, Mozilla Firefox may identify itself as ""Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"", while
   "urllib"'s default user agent string is ""Python-urllib/2.6"" (on
   Python 2.6). All header keys are sent in camel case.

   An appropriate "Content-Type" header should be included if the
   *data* argument is present. If this header has not been provided
   and *data* is not "None", "Content-Type: application/x-www-form-
   urlencoded" will be added as a default.

   The next two arguments are only of interest for correct handling
   of third-party HTTP cookies:

   *origin_req_host* should be the request-host of the origin
   transaction, as defined by **RFC 2965**. It defaults to
   "http.cookiejar.request_host(self)". This is the host name or IP
   address of the original request that was initiated by the user.
   For example, if the request is for an image in an HTML document,
   this should be the request-host of the request for the page
   containing the image.

   *unverifiable* should indicate whether the request is
   unverifiable, as defined by **RFC 2965**. It defaults to "False".
   An unverifiable request is one whose URL the user did not have the
   option to approve. For example, if the request is for an image in
   an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

   *method* should be a string that indicates the HTTP request method
   to use (e.g. "'HEAD'"). If provided, its value is stored in the
   "method" attribute and is used by "get_method()". The default is
   "'GET'" if *data* is "None" or "'POST'" otherwise. Subclasses may
   indicate a different default method by setting the "method"
   attribute in the class itself.

   Note:

     The request will not work as expected if the data object is
     unable to deliver its content more than once (e.g. a file or an
     iterable that can produce the content only once) and the request
     is retried for HTTP redirects or authentication. The *data* is
     sent to the HTTP server right after the headers. There is no
     support for a 100-continue expectation in the library.

   Changed in version 3.3: The "Request.method" argument was added to
   the Request class.

   Changed in version 3.4: A default "Request.method" may be
   indicated at the class level.

   Changed in version 3.6: Do not raise an error if the
   "Content-Length" has not been provided and *data* is neither
   "None" nor a bytes object. Fall back to chunked transfer encoding
   instead.
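
Putting the constructor arguments together, a POST request can be
sketched as below; the URL and header value are placeholders, not
part of the original documentation:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Encode the form fields, then convert the ASCII string to bytes
# before passing it as *data*.
data = urlencode({"q": "python", "page": "1"}).encode("ascii")

req = Request(
    "https://example.com/search",          # placeholder URL
    data=data,
    headers={"User-Agent": "demo-client/1.0"},
)

print(req.get_method())   # -> POST, because *data* is not None
```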

class urllib.request.OpenerDirector

   The "OpenerDirector" class opens URLs via "BaseHandler"s chained
   together. It manages the chaining of handlers, and recovery from
   errors.

class urllib.request.BaseHandler

   This is the base class for all registered handlers --- and handles
   only the simple mechanics of registration.

class urllib.request.HTTPDefaultErrorHandler

   A class which defines a default handler for HTTP error responses;
   all responses are turned into "HTTPError" exceptions.

class urllib.request.HTTPRedirectHandler

   A class to handle redirections.

class urllib.request.HTTPCookieProcessor(cookiejar=None)

   A class to handle HTTP Cookies.

class urllib.request.ProxyHandler(proxies=None)

   Cause requests to go through a proxy. If *proxies* is given, it
   must be a dictionary mapping protocol names to URLs of proxies. The
   default is to read the list of proxies from the environment
   variables "<protocol>_proxy".  If no proxy environment variables
   are set, then in a Windows environment proxy settings are obtained
   from the registry's Internet Settings section, and in a macOS
   environment proxy information is retrieved from the System
   Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.

   The "no_proxy" environment variable can be used to specify hosts
   which shouldn't be reached via proxy; if set, it should be a comma-
   separated list of hostname suffixes, optionally with ":port"
   appended, for example "cern.ch,ncsa.uiuc.edu,some.host:8080".

   Note:

     "HTTP_PROXY" will be ignored if a variable "REQUEST_METHOD" is
     set; see the documentation on "getproxies()".

class urllib.request.HTTPPasswordMgr

   Keep a database of  "(realm, uri) -> (user, password)" mappings.

class urllib.request.HTTPPasswordMgrWithDefaultRealm

   Keep a database of  "(realm, uri) -> (user, password)" mappings. A
   realm of "None" is considered a catch-all realm, which is searched
   if no other realm fits.

class urllib.request.HTTPPasswordMgrWithPriorAuth

   A variant of "HTTPPasswordMgrWithDefaultRealm" that also has a
   database of "uri -> is_authenticated" mappings.  Can be used by a
   BasicAuth handler to determine when to send authentication
   credentials immediately instead of waiting for a "401" response
   first.

   Added in version 3.5.

class urllib.request.AbstractBasicAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to
   the remote host and to a proxy. *password_mgr*, if given, should
   be something that is compatible with "HTTPPasswordMgr"; refer to
   section HTTPPasswordMgr Objects for information on the interface
   that must be supported.  If *password_mgr* also provides
   "is_authenticated" and "update_authenticated" methods (see section
   HTTPPasswordMgrWithPriorAuth Objects), then the handler will use
   the "is_authenticated" result for a given URI to determine whether
   or not to send authentication credentials with the request.  If
   "is_authenticated" returns "True" for the URI, credentials are
   sent.  If "is_authenticated" is "False", credentials are not sent,
   and then if a "401" response is received the request is re-sent
   with the authentication credentials.  If authentication succeeds,
   "update_authenticated" is called to set "is_authenticated" "True"
   for the URI, so that subsequent requests to the URI or any of its
   super-URIs will automatically include the authentication
   credentials.

   Added in version 3.5: Added "is_authenticated" support.

class urllib.request.HTTPBasicAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if
   given, should be something that is compatible with
   "HTTPPasswordMgr"; refer to section HTTPPasswordMgr 物件 for
   information on the interface that must be supported.
   HTTPBasicAuthHandler will raise a "ValueError" when presented with
   a wrong Authentication scheme.
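
A typical Basic-auth setup wires a password manager into the handler
and the handler into an opener. The URL and credentials below are
hypothetical:

```python
import urllib.request

# Hypothetical URL and credentials; realm None acts as a catch-all.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://example.com/", "alice", "s3cret")

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# opener.open("https://example.com/api") would now retry a 401
# response with Basic credentials for any URI under example.com/.
```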

class urllib.request.ProxyBasicAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given,
   should be something that is compatible with "HTTPPasswordMgr";
   refer to section HTTPPasswordMgr Objects for information on the
   interface that must be supported.

class urllib.request.AbstractDigestAuthHandler(password_mgr=None)

   This is a mixin class that helps with HTTP authentication, both to
   the remote host and to a proxy. *password_mgr*, if given, should be
   something that is compatible with "HTTPPasswordMgr"; refer to
   section HTTPPasswordMgr Objects for information on the interface
   that
   must be supported.

   Changed in version 3.14: Added support for the HTTP digest
   authentication algorithm "SHA-256".

class urllib.request.HTTPDigestAuthHandler(password_mgr=None)

   Handle authentication with the remote host. *password_mgr*, if
   given, should be something that is compatible with
   "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for
   information on the interface that must be supported. When both the
   Digest Authentication Handler and the Basic Authentication Handler
   are added, Digest Authentication is always tried first. If Digest
   Authentication again returns a 40x response, it is sent to the
   Basic Authentication handler to handle.  This handler method will
   raise a "ValueError" when presented with an authentication scheme
   other than Digest or Basic.

   Changed in version 3.3: Raise "ValueError" on unsupported
   Authentication Schemes.

class urllib.request.ProxyDigestAuthHandler(password_mgr=None)

   Handle authentication with the proxy. *password_mgr*, if given,
   should be something that is compatible with "HTTPPasswordMgr";
   refer to section HTTPPasswordMgr Objects for information on the
   interface that must be supported.

class urllib.request.HTTPHandler

   A class to handle opening of HTTP URLs.

class urllib.request.HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

   A class to handle opening of HTTPS URLs.  *context* and
   *check_hostname* have the same meaning as in
   "http.client.HTTPSConnection".

   Changed in version 3.2: *context* and *check_hostname* were added.

class urllib.request.FileHandler

   Open local files.

class urllib.request.DataHandler

   Open data URLs.

   Added in version 3.4.

class urllib.request.FTPHandler

   Open FTP URLs.

class urllib.request.CacheFTPHandler

   Open FTP URLs, keeping a cache of open FTP connections to minimize
   delays.

class urllib.request.UnknownHandler

   A catch-all class to handle unknown URLs.

class urllib.request.HTTPErrorProcessor

   Process HTTP error responses.


Request Objects
===============

The following methods describe "Request"'s public interface, and so
all may be overridden in subclasses.  It also defines several public
attributes that can be used by clients to inspect the parsed request.

Request.full_url

   The original URL passed to the constructor.

   Changed in version 3.4.

   Request.full_url is a property with setter, getter and a deleter.
   Getting "full_url" returns the original request URL with the
   fragment, if it was present.

Request.type

   The URI scheme.

Request.host

   The URI authority, typically a host, but may also contain a port
   separated by a colon.

Request.origin_req_host

   The original host for the request, without port.

Request.selector

   The URI path.  If the "Request" uses a proxy, then selector will be
   the full URL that is passed to the proxy.

Request.data

   The entity body for the request, or "None" if not specified.

   在 3.4 版的變更: Changing value of "Request.data" now deletes
   "Content-Length" header if it was previously set or calculated.

Request.unverifiable

   boolean, indicates whether the request is unverifiable as defined
   by **RFC 2965**.

Request.method

   The HTTP request method to use.  By default its value is "None",
   which means that "get_method()" will do its normal computation of
   the method to be used.  Its value can be set (thus overriding the
   default computation in "get_method()") either by providing a
   default value by setting it at the class level in a "Request"
   subclass, or by passing a value in to the "Request" constructor via
   the *method* argument.

   Added in version 3.3.

   Changed in version 3.4: A default value can now be set in
   subclasses; previously it could only be set via the constructor
   argument.

Request.get_method()

   Return a string indicating the HTTP request method.  If
   "Request.method" is not "None", return its value, otherwise return
   "'GET'" if "Request.data" is "None", or "'POST'" if it's not. This
   is only meaningful for HTTP requests.

   Changed in version 3.3: get_method now looks at the value of
   "Request.method".

Request.add_header(key, val)

   Add another header to the request.  Headers are currently ignored
   by all handlers except HTTP handlers, where they are added to the
   list of headers sent to the server.  Note that there cannot be more
   than one header with the same name, and later calls will overwrite
   previous calls in case the *key* collides. Currently, this is no
   loss of HTTP functionality, since all headers which have meaning
   when used more than once have a (header-specific) way of gaining
   the same functionality using only one header.  Note that headers
   added using this method are also added to redirected requests.
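
The overwrite-on-collision behaviour described above can be sketched
as follows (the URL and header name are placeholders); note that
header names are normalized, so the lookup key is capitalized:

```python
from urllib.request import Request

req = Request("http://example.com/")    # placeholder URL
req.add_header("X-Demo", "first")
req.add_header("X-Demo", "second")      # same key: overwrites

print(req.get_header("X-demo"))   # -> second
print(req.header_items())         # -> [('X-demo', 'second')]
```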

Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

Request.has_header(header)

   Return whether the instance has the named header (checks both
   regular and unredirected).

Request.remove_header(header)

   Remove named header from the request instance (both from regular
   and unredirected headers).

   Added in version 3.4.

Request.get_full_url()

   Return the URL given in the constructor.

   Changed in version 3.4.

   Returns "Request.full_url".

Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and
   *type* will replace those of the instance, and the instance's
   selector will be the original URL given in the constructor.

Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present,
   return the default value.

Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request
   headers.

Changed in version 3.4: The request methods add_data, has_data,
get_data, get_type, get_host, get_selector, get_origin_req_host and
is_unverifiable that were deprecated since 3.3 have been removed.


OpenerDirector Objects
======================

"OpenerDirector" objects have the following methods:

OpenerDirector.add_handler(handler)

   *handler* should be an instance of "BaseHandler".  The following
   methods are searched, and added to the possible chains (note that
   HTTP errors are a special case).  Note that, in the following,
   *protocol* should be replaced with the actual protocol to handle,
   for example "http_response()" would be the HTTP protocol response
   handler.  Also *type* should be replaced with the actual HTTP code,
   for example "http_error_404()" would handle HTTP 404 errors.

   * "<protocol>_open()" --- signal that the handler knows how to open
     *protocol* URLs.

     更多資訊請見 "BaseHandler.<protocol>_open()"。

   * "http_error_<type>()" --- signal that the handler knows how to
     handle HTTP errors with HTTP error code *type*.

     更多資訊請見 "BaseHandler.http_error_<nnn>()"。

   * "<protocol>_error()" --- signal that the handler knows how to
     handle errors from (non-"http") *protocol*.

   * "<protocol>_request()" --- signal that the handler knows how to
     pre-process *protocol* requests.

     更多資訊請見 "BaseHandler.<protocol>_request()"。

   * "<protocol>_response()" --- signal that the handler knows how to
     post-process *protocol* responses.

     更多資訊請見 "BaseHandler.<protocol>_response()"。

OpenerDirector.open(url, data=None[, timeout])

   Open the given *url* (which can be a request object or a string),
   optionally passing the given *data*. Arguments, return values and
   exceptions raised are the same as those of "urlopen()" (which
   simply calls the "open()" method on the currently installed global
   "OpenerDirector").  The optional *timeout* parameter specifies a
   timeout in seconds for blocking operations like the connection
   attempt (if not specified, the global default timeout setting will
   be used). The timeout feature actually works only for HTTP, HTTPS
   and FTP connections.

OpenerDirector.error(proto, *args)

   Handle an error of the given protocol.  This will call the
   registered error handlers for the given protocol with the given
   arguments (which are protocol specific).  The HTTP protocol is a
   special case which uses the HTTP response code to determine the
   specific error handler; refer to the "http_error_<type>()" methods
   of the handler classes.

   Return values and exceptions raised are the same as those of
   "urlopen()".

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is
determined by sorting the handler instances.

1. Every handler with a method named like "<protocol>_request()" has
   that method called to pre-process the request.

2. Handlers with a method named like "<protocol>_open()" are called to
   handle the request. This stage ends when a handler either returns a
   non-"None" value (ie. a response), or raises an exception (usually
   "URLError").  Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   "default_open()".  If all such methods return "None", the algorithm
   is repeated for methods named like "<protocol>_open()".  If all
   such methods return "None", the algorithm is repeated for methods
   named "unknown_open()".

   Note that the implementation of these methods may involve calls of
   the parent "OpenerDirector" instance's "open()" and "error()"
   methods.

3. Every handler with a method named like "<protocol>_response()" has
   that method called to post-process the response.


BaseHandler Objects
===================

"BaseHandler" objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes.
These are intended for direct use:

BaseHandler.add_parent(director)

   Add a director as parent.

BaseHandler.close()

   Remove any parents.

The following attribute and methods should only be used by classes
derived from "BaseHandler".

Note:

  The convention has been adopted that subclasses defining
  "<protocol>_request()" or "<protocol>_response()" methods are named
  "*Processor"; all others are named "*Handler".

BaseHandler.parent

   A valid "OpenerDirector", which can be used to open using a
   different protocol, or handle errors.

BaseHandler.default_open(req)

   This method is *not* defined in "BaseHandler", but subclasses
   should define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   "OpenerDirector".  It should return a file-like object as described
   in the return value of the "open()" method of "OpenerDirector", or
   "None". It should raise "URLError", unless a truly exceptional
   thing happens (for example, "MemoryError" should not be mapped to
   "URLError").

   This method will be called before any protocol-specific open
   method.

BaseHandler.<protocol>_open(req)

   This method is *not* defined in "BaseHandler", but subclasses
   should define it if they want to handle URLs with the given
   protocol.

   This method, if defined, will be called by the parent
   "OpenerDirector". Return values should be the same as for
   "default_open()".

BaseHandler.unknown_open(req)

   This method is *not* defined in "BaseHandler", but subclasses
   should define it if they want to catch all URLs with no specific
   registered handler to open it.

   This method, if implemented, will be called by the "parent"
   "OpenerDirector".  Return values should be the same as for
   "default_open()".

BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in "BaseHandler", but subclasses
   should override it if they intend to provide a catch-all for
   otherwise unhandled HTTP errors.  It will be called automatically
   by the  "OpenerDirector" getting the error, and should not normally
   be called in other circumstances.

   "OpenerDirector" 會以五個位置引數呼叫此方法：

   1. 一個 "Request" 物件

   2. 一個類檔案物件，包含 HTTP 錯誤主體，

   3. HTTP 錯誤的三位數代號字串，

   4. HTTP 錯誤代號的使用者可見解釋字串，以及

   5. HTTP 錯誤的標頭，為一個對映物件。

   Return values and exceptions raised should be the same as those of
   "urlopen()".

BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code.  This method is also
   not defined in "BaseHandler", but will be called, if it exists, on
   an instance of a subclass, when an HTTP error with code *nnn*
   occurs.

   Subclasses should override this method to handle specific HTTP
   errors.

   Arguments, return values and exceptions raised should be the same
   as for "http_error_default()".

BaseHandler.<protocol>_request(req)

   This method is *not* defined in "BaseHandler", but subclasses
   should define it if they want to pre-process requests of the given
   protocol.

   This method, if defined, will be called by the parent
   "OpenerDirector". *req* will be a "Request" object. The return
   value should be a "Request" object.

BaseHandler.<protocol>_response(req, response)

   This method is *not* defined in "BaseHandler", but subclasses
   should define it if they want to post-process responses of the
   given protocol.

   This method, if defined, will be called by the parent
   "OpenerDirector". *req* will be a "Request" object. *response* will
   be an object implementing the same interface as the return value of
   "urlopen()".  The return value should implement the same interface
   as the return value of "urlopen()".


HTTPRedirectHandler Objects
===========================

Note:

  Some HTTP redirections require action from this module's client
  code.  If this is the case, "HTTPError" is raised.  See **RFC
  2616** for details of the precise meanings of the various
  redirection codes.  An "HTTPError" exception is raised as a
  security consideration if the HTTPRedirectHandler is presented with
  a redirected URL which is not an HTTP, HTTPS or FTP URL.

HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a "Request" or "None" in response to a redirect. This is
   called by the default implementations of the "http_error_30*()"
   methods when a redirection is received from the server.  If a
   redirection should take place, return a new "Request" to allow
   "http_error_30*()" to perform the redirect to *newurl*.  Otherwise,
   raise "HTTPError" if no other handler should try to handle this
   URL, or return "None" if you can't but another handler might.

   Note:

     The default implementation of this method does not strictly
     follow **RFC 2616**, which says that 301 and 302 responses to
     "POST" requests must not be automatically redirected without
     confirmation by the user.  In reality, browsers do allow
     automatic redirection of these responses, changing the POST to a
     "GET", and the default implementation reproduces this behavior.

HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the "Location:" or "URI:" URL.  This method is called
   by the parent "OpenerDirector" when getting an HTTP 'moved
   permanently' response.

HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as "http_error_301()", but called for the 'found'
   response.

HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as "http_error_301()", but called for the 'see other'
   response.

HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as "http_error_301()", but called for the 'temporary
   redirect' response. It does not allow changing the request method
   from "POST" to "GET".

HTTPRedirectHandler.http_error_308(req, fp, code, msg, hdrs)

   The same as "http_error_301()", but called for the 'permanent
   redirect' response. It does not allow changing the request method
   from "POST" to "GET".

   Added in version 3.11.


HTTPCookieProcessor Objects
===========================

"HTTPCookieProcessor" instances have one attribute:

HTTPCookieProcessor.cookiejar

   The "http.cookiejar.CookieJar" in which cookies are stored.
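
A minimal sketch of wiring a cookie jar into an opener; after a real
"opener.open(...)" call (not shown), cookies from the responses
accumulate in the jar:

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The processor exposes the jar it was given.
print(processor.cookiejar is jar)   # -> True
```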


ProxyHandler Objects
====================

ProxyHandler.<protocol>_open(request)

   The "ProxyHandler" will have a method "<protocol>_open()" for every
   *protocol* which has a proxy in the *proxies* dictionary given in
   the constructor.  The method will modify requests to go through the
   proxy, by calling "request.set_proxy()", and call the next handler
   in the chain to actually execute the protocol.


HTTPPasswordMgr Objects
=======================

These methods are available on "HTTPPasswordMgr" and
"HTTPPasswordMgrWithDefaultRealm" objects.

HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*,
   *user* and *passwd* must be strings. This causes "(user, passwd)"
   to be used as authentication tokens when authentication for *realm*
   and a super-URI of any of the given URIs is given.

HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any.  This method
   will return "(None, None)" if there is no matching user/password.

   For "HTTPPasswordMgrWithDefaultRealm" objects, the realm "None"
   will be searched if the given *realm* has no matching
   user/password.
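
The realm and super-URI matching rules can be sketched with
hypothetical values (realm name, URI, and credentials are all
placeholders):

```python
from urllib.request import HTTPPasswordMgr

mgr = HTTPPasswordMgr()
# Hypothetical realm, URI and credentials.
mgr.add_password("Staging", "https://example.com/", "bob", "hunter2")

# A URI beneath the registered one matches for the same realm...
print(mgr.find_user_password("Staging", "https://example.com/admin/"))
# -> ('bob', 'hunter2')

# ...but a different realm does not.
print(mgr.find_user_password("Other", "https://example.com/admin/"))
# -> (None, None)
```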


HTTPPasswordMgrWithPriorAuth Objects
====================================

This password manager extends "HTTPPasswordMgrWithDefaultRealm" to
support tracking URIs for which authentication credentials should
always be sent.

HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, passwd, is_authenticated=False)

   *realm*, *uri*, *user*, *passwd* are as for
   "HTTPPasswordMgr.add_password()".  *is_authenticated* sets the
   initial value of the "is_authenticated" flag for the given URI or
   list of URIs. If *is_authenticated* is specified as "True", *realm*
   is ignored.

HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for "HTTPPasswordMgrWithDefaultRealm" objects

HTTPPasswordMgrWithPriorAuth.update_authenticated(self, uri, is_authenticated=False)

   Update the "is_authenticated" flag for the given *uri* or list of
   URIs.

HTTPPasswordMgrWithPriorAuth.is_authenticated(self, authuri)

   Returns the current state of the "is_authenticated" flag for the
   given URI.


AbstractBasicAuthHandler Objects
================================

AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair,
   and re-trying the request.  *authreq* should be the name of the
   header where the information about the realm is included in the
   request, *host* specifies the URL and path to authenticate for,
   *req* should be the (failed) "Request" object, and *headers* should
   be the error headers.

   *host* is either an authority (e.g. ""python.org"") or a URL
   containing an authority component (e.g. ""http://python.org/""). In
   either case, the authority must not contain a userinfo component
   (so, ""python.org"" and ""python.org:80"" are fine,
   ""joe:password@python.org"" is not).


HTTPBasicAuthHandler Objects
============================

HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


ProxyBasicAuthHandler Objects
=============================

ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


AbstractDigestAuthHandler Objects
=================================

AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* should be the
   host to authenticate to, *req* should be the (failed) "Request"
   object, and *headers* should be the error headers.


HTTPDigestAuthHandler Objects
=============================

HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


ProxyDigestAuthHandler Objects
==============================

ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


HTTPHandler Objects
===================

HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   "req.data".


HTTPSHandler Objects
====================

HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending
   on "req.data".


FileHandler Objects
===================

FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name
   is "'localhost'".

   Changed in version 3.2: This method is applicable only for local
   hostnames.  When a remote hostname is given, a "URLError" is
   raised.
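A self-contained sketch using a temporary local file; "pathname2url()" builds the path portion of the "file://" URL:

```python
import os
import tempfile
import urllib.request

# Create a throwaway local file to open through FileHandler.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('hello from a local file')

# A host-less file:// URL is opened locally.
url = 'file://' + urllib.request.pathname2url(path)
with urllib.request.urlopen(url) as f:
    print(f.read().decode())

os.unlink(path)
```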


DataHandler Objects
===================

DataHandler.data_open(req)

   Read a data URL. This kind of URL contains the content encoded in
   the URL itself. The data URL syntax is specified in **RFC 2397**.
   This implementation ignores white space in base64-encoded data
   URLs, so the URL may be wrapped in whatever source file it comes
   from. But even though some browsers don't mind a missing padding
   at the end of a base64-encoded data URL, this implementation will
   raise a "ValueError" in that case.
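Both behaviours can be seen in a small sketch: the payload travels inside the URL itself, and stripping the base64 padding makes this implementation raise "ValueError":

```python
import urllib.request

# 'SGVsbG8sIHdvcmxkIQ==' is base64 for b'Hello, world!'.
with urllib.request.urlopen('data:text/plain;base64,SGVsbG8sIHdvcmxkIQ==') as f:
    body = f.read()
print(body.decode('ascii'))

# Without the '==' padding the URL is rejected, even though some
# browsers would accept it.
try:
    urllib.request.urlopen('data:text/plain;base64,SGVsbG8sIHdvcmxkIQ')
except ValueError:
    print('missing padding rejected')
```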


FTPHandler Objects
==================

FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with
   empty username and password.


CacheFTPHandler Objects
=======================

"CacheFTPHandler" objects are "FTPHandler" objects with the following
additional methods:

CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.

CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.
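A sketch of wiring a tuned "CacheFTPHandler" into an opener; the FTP URL in the comment is hypothetical and not fetched:

```python
import urllib.request

ftp_handler = urllib.request.CacheFTPHandler()
ftp_handler.setTimeout(30)    # idle cached connections close after 30 s
ftp_handler.setMaxConns(5)    # keep at most 5 connections cached
opener = urllib.request.build_opener(ftp_handler)
# opener.open('ftp://ftp.example.com/pub/README')
```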


UnknownHandler Objects
======================

UnknownHandler.unknown_open()

   Raise a "URLError" exception.


HTTPErrorProcessor Objects
==========================

HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For responses with a 2xx (success) status code, the response object
   is returned immediately.

   For other status codes, this simply passes the job on to the
   "http_error_<type>()" handler methods, via
   "OpenerDirector.error()". Eventually, "HTTPDefaultErrorHandler"
   will raise an "HTTPError" if no other handler handles the error.

HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as "http_response()".
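Because the hand-off to the error handlers happens here, a common pattern is to subclass "HTTPErrorProcessor" so that error responses are returned to the caller instead of raising "HTTPError". A sketch (the URL in the comment is hypothetical):

```python
import urllib.request

class PassThroughErrorProcessor(urllib.request.HTTPErrorProcessor):
    """Return every response unchanged, so 4xx/5xx do not raise."""
    def http_response(self, request, response):
        return response
    https_response = http_response

# build_opener() replaces the default HTTPErrorProcessor with ours.
opener = urllib.request.build_opener(PassThroughErrorProcessor)
# resp = opener.open('http://www.example.com/missing')
# resp.status would then be 404 with no exception raised.
```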


Examples
========

In addition to the examples below, more examples are given in HOWTO
Fetch Internet Resources Using The urllib Package.

This example gets the python.org main page and displays the first 300
bytes of it:

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
   b'<!doctype html>\n<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->\n<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->\n<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">

Note that urlopen returns a bytes object.  This is because there is no
way for urlopen to automatically determine the encoding of the byte
stream it receives from the HTTP server. In general, a program will
decode the returned bytes object to string once it determines or
guesses the appropriate encoding.

The following HTML spec document,
https://html.spec.whatwg.org/#charset, lists the various ways in which
an HTML or an XML document could have specified its encoding
information.

For additional information, see the W3C document:
https://www.w3.org/International/questions/qa-html-encoding-declarations.

As the python.org website uses *utf-8* encoding as specified in its
meta tag, we will use the same for decoding the bytes object:

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
   <!doctype html>
   <!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
   <!-

It is also possible to achieve the same result without using the
*context manager* approach:

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> try:
   ...     print(f.read(100).decode('utf-8'))
   ... finally:
   ...     f.close()
   ...
   <!doctype html>
   <!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
   <!--

In the following example, we are sending a data-stream to the stdin of
a CGI and reading the data it returns to us. Note that this example
will only work when the Python installation supports SSL.

   >>> import urllib.request
   >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data=b'This data is passed to stdin of the CGI')
   >>> with urllib.request.urlopen(req) as f:
   ...     print(f.read().decode('utf-8'))
   ...
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is:

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a "PUT" request using "Request":

   import urllib.request
   DATA = b'some data'
   req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
   with urllib.request.urlopen(req) as f:
       pass
   print(f.status)
   print(f.reason)

Use of Basic HTTP Authentication:

   import urllib.request
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib.request.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib.request.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib.request.install_opener(opener)
   with urllib.request.urlopen('http://www.example.com/login.html') as f:
       print(f.read().decode('utf-8'))

"build_opener()" provides many handlers by default, including a
"ProxyHandler".  By default, "ProxyHandler" uses the environment
variables named "<scheme>_proxy", where "<scheme>" is the URL scheme
involved.  For example, the "http_proxy" environment variable is read
to obtain the HTTP proxy's URL.

This example replaces the default "ProxyHandler" with one that uses
programmatically supplied proxy URLs, and adds proxy authorization
support with "ProxyBasicAuthHandler".

   proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   with opener.open('http://www.example.com/login.html') as f:
      print(f.read().decode('utf-8'))

Adding HTTP headers:

使用 "Request" 建構函式的 *headers* 引數，或：

   import urllib.request
   req = urllib.request.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   with urllib.request.urlopen(req) as f:
       print(f.read().decode('utf-8'))

"OpenerDirector" automatically adds a *User-Agent* header to every
"Request".  To change this:

   import urllib.request
   opener = urllib.request.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   with opener.open('http://www.example.com/') as f:
      print(f.read().decode('utf-8'))

Also, remember that a few standard headers (*Content-Length*,
*Content-Type* and *Host*) are added when the "Request" is passed to
"urlopen()" (or "OpenerDirector.open()").

Here is an example session that uses the "GET" method to retrieve a
URL containing parameters:

   >>> import urllib.request
   >>> import urllib.parse
   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   >>> with urllib.request.urlopen(url) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses the "POST" method instead. Note that params
output from urlencode is encoded to bytes before it is sent to urlopen
as data:

   >>> import urllib.request
   >>> import urllib.parse
   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> data = data.encode('ascii')
   >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   ...     print(f.read().decode('utf-8'))
   ...

The following example uses an explicitly specified HTTP proxy,
overriding environment settings:

   >>> import urllib.request
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
   >>> with opener.open("http://www.python.org") as f:
   ...     f.read().decode('utf-8')
   ...

The following example uses no proxies at all, overriding environment
settings:

   >>> import urllib.request
   >>> opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
   >>> with opener.open("http://www.python.org/") as f:
   ...     f.read().decode('utf-8')
   ...


Legacy interface
================

The following functions and classes are ported from the Python 2
module "urllib" (as opposed to "urllib2").  They might become
deprecated at some point in the future.

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

   Copy a network object denoted by a URL to a local file. If the URL
   points to a local file, the object will not be copied unless
   filename is supplied. Return a tuple "(filename, headers)" where
   *filename* is the local file name under which the object can be
   found, and *headers* is whatever the "info()" method of the object
   returned by "urlopen()" returned (for a remote object). Exceptions
   are the same as for "urlopen()".

   The second argument, if present, specifies the file location to
   copy to (if absent, the location will be a tempfile with a
   generated name). The third argument, if present, is a callable that
   will be called once on establishment of the network connection and
   once after each block read thereafter.  The callable will be passed
   three arguments: a count of blocks transferred so far, a block size
   in bytes, and the total size of the file.  The third argument may
   be "-1" on older FTP servers which do not return a file size in
   response to a retrieval request.
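The *reporthook* contract can be sketched as follows; "percent_done" is a helper invented for this example, and the URL in the comment is hypothetical:

```python
import urllib.request

def percent_done(block_count, block_size, total_size):
    """Progress as 0-100, or -1 when the server reported no size."""
    if total_size <= 0:                 # e.g. -1 from an older FTP server
        return -1
    return min(100, block_count * block_size * 100 // total_size)

def reporthook(block_count, block_size, total_size):
    pct = percent_done(block_count, block_size, total_size)
    if pct >= 0:
        print(f'\rdownloaded {pct}%', end='')

# urllib.request.urlretrieve('http://example.com/archive.zip',
#                            'archive.zip', reporthook=reporthook)
```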

   The following example illustrates the most common usage scenario:

      >>> import urllib.request
      >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
      >>> html = open(local_filename)
      >>> html.close()

   If the *url* uses the "http:" scheme identifier, the optional
   *data* argument may be given to specify a "POST" request (normally
   the request type is "GET").  The *data* argument must be a bytes
   object in standard *application/x-www-form-urlencoded* format; see
   the "urllib.parse.urlencode()" function.

   "urlretrieve()" will raise "ContentTooShortError" when it detects
   that the amount of data available  was less than the expected
   amount (which is the size reported by a  *Content-Length* header).
   This can occur, for example, when the  download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more
   data  to read, urlretrieve reads more data, but if less data is
   available,  it raises the exception.

   You can still retrieve the downloaded data in this case; it is
   stored in the "content" attribute of the exception instance.

   If no *Content-Length* header was supplied, urlretrieve cannot
   check the size of the data it has downloaded, and just returns it.
   In this case you just have to assume that the download was
   successful.
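A sketch of recovering the partial payload; "fetch_with_partial" is an invented helper, and the URL is supplied by the caller rather than fetched here:

```python
import urllib.error
import urllib.request

def fetch_with_partial(url, filename=None):
    """Return (filename, None) on success, or (None, partial_bytes)
    when the download is cut short."""
    try:
        name, headers = urllib.request.urlretrieve(url, filename)
        return name, None
    except urllib.error.ContentTooShortError as exc:
        # Bytes received before the connection was cut off:
        return None, exc.content
```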

urllib.request.urlcleanup()

   Cleans up temporary files that may have been left behind by
   previous calls to "urlretrieve()".


"urllib.request" 限制
=====================

* Currently, only the following protocols are supported: HTTP
  (versions 0.9 and 1.0), FTP, local files, and data URLs.

  Changed in version 3.4: Added support for data URLs.

* The caching feature of "urlretrieve()" has been disabled until
  someone finds the time to hack proper processing of Expiration time
  headers.

* There should be a function to query whether a particular URL is in
  the cache.

* For backward compatibility, if a URL appears to point to a local
  file but the file can't be opened, the URL is re-interpreted using
  the FTP protocol.  This can sometimes cause confusing error
  messages.

* The "urlopen()" and "urlretrieve()" functions can cause arbitrarily
  long delays while waiting for a network connection to be set up.
  This means that it is difficult to build an interactive web client
  using these functions without using threads.

* The data returned by "urlopen()" or "urlretrieve()" is the raw data
  returned by the server.  This may be binary data (such as an image),
  plain text or (for example) HTML.  The HTTP protocol provides type
  information in the reply header, which can be inspected by looking
  at the *Content-Type* header.  If the returned data is HTML, you can
  use the module "html.parser" to parse it.

* The code handling the FTP protocol cannot differentiate between a
  file and a directory.  This can lead to unexpected behavior when
  attempting to read a URL that points to a file that is not
  accessible.  If the URL ends in a "/", it is assumed to refer to a
  directory and will be handled accordingly.  But if an attempt to
  read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is
  treated as a directory in order to handle the case when a directory
  is specified by a URL but the trailing "/" has been left off.  This
  can cause misleading results when you try to fetch a file whose read
  permissions make it inaccessible; the FTP code will try to read it,
  fail with a 550 error, and then perform a directory listing for the
  unreadable file. If fine-grained control is needed, consider using
  the "ftplib" module.


"urllib.response" --- Response classes used by urllib
*****************************************************

The "urllib.response" module defines functions and classes which
define a minimal file-like interface, including "read()" and
"readline()". Functions defined by this module are used internally by
the "urllib.request" module. The typical response object is a
"urllib.response.addinfourl" instance:

class urllib.response.addinfourl

   url

      URL of the resource retrieved, commonly used to determine if a
      redirect was followed.

   headers

      Returns the headers of the response in the form of an
      "EmailMessage" instance.

   status

      Added in version 3.9.

      Status code returned by the server.

   geturl()

      Deprecated since version 3.9: Deprecated in favor of "url".

   info()

      Deprecated since version 3.9: Deprecated in favor of "headers".

   code

      Deprecated since version 3.9: Deprecated in favor of "status".

   getcode()

      Deprecated since version 3.9: Deprecated in favor of "status".
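These attributes can be exercised without network access by opening a "data:" URL (a sketch; note that a "data:" response carries no server status):

```python
import urllib.request

with urllib.request.urlopen('data:text/plain;charset=ascii,hello') as resp:
    print(resp.url)                          # the URL that was opened
    print(resp.headers.get_content_type())   # text/plain
    print(resp.read().decode('ascii'))       # hello
```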
