`urllib.parse` --- Parse URLs into components

Source code: Lib/urllib/parse.py

This module defines a standard interface to break Uniform Resource Locator (URL) strings up into their components (addressing scheme, network location, path, etc.), to combine the components back into a URL string, and to convert a "relative URL" to an absolute URL given a "base URL".

The module has been designed to match the Internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, itms-services, mailto, mms, news, nntp, prospero, rsync, rtsp, rtsps, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss

CPython implementation detail: The inclusion of the itms-services URL scheme can prevent an app from passing Apple's App Store review process for the macOS and iOS App Stores. Handling for the itms-services scheme is always removed on iOS; on macOS, it may be removed if CPython has been built with the --with-app-store-compliance option.

The urllib.parse module defines functions that fall into two broad categories: URL parsing and URL quoting. These are covered in detail in the following sections.

This module's functions use the deprecated term netloc (or net_loc), which was introduced in RFC 1808. However, this term has been obsoleted by RFC 3986, which introduced the term authority as its replacement. The use of netloc is continued for backward compatibility.

URL Parsing

The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string.

urllib.parse.urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)

Parse a URL into five components, returning a 5-item named tuple SplitResult or SplitResultBytes. This corresponds to the general structure of a URL: scheme://netloc/path?query#fragment. Each tuple item is a string, possibly empty, or None if missing_as_none is true. Components that are not defined are represented as an empty string (by default) or as None if missing_as_none is true. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present.

Additionally, the netloc property is broken down into these additional attributes added to the returned object: username, password, hostname, and port.

Percent-encoded sequences are not decoded.

例えば:

>>> from urllib.parse import urlsplit
>>> urlsplit("scheme://netloc/path?query#fragment")
SplitResult(scheme='scheme', netloc='netloc', path='/path',
            query='query', fragment='fragment')
>>> o = urlsplit("http://docs.python.org:80/3/library/urllib.parse.html?"
...              "highlight=params#url-parsing")
>>> o
SplitResult(scheme='http', netloc='docs.python.org:80',
            path='/3/library/urllib.parse.html',
            query='highlight=params', fragment='url-parsing')
>>> o.scheme
'http'
>>> o.netloc
'docs.python.org:80'
>>> o.hostname
'docs.python.org'
>>> o.port
80
>>> o._replace(fragment="").geturl()
'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
>>> urlsplit("http://docs.python.org?")
SplitResult(scheme='http', netloc='docs.python.org', path='',
            query='', fragment='')
>>> urlsplit("http://docs.python.org?", missing_as_none=True)
SplitResult(scheme='http', netloc='docs.python.org', path='',
            query='', fragment=None)
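The doctest above does not show the userinfo attributes, nor that percent-encoded sequences are left as they are. A minimal sketch, using an arbitrary made-up URL (the credentials and host are placeholders), checks both; unquote() is called only to show that decoding is a separate, explicit step.

```python
from urllib.parse import unquote, urlsplit

# Arbitrary example: userinfo, mixed-case host, explicit port, %20 escape in the path.
parts = urlsplit("https://alice:s3cret@Example.COM:8443/a%20b?x=1")

print(parts.netloc)         # alice:s3cret@Example.COM:8443  (netloc is kept verbatim)
print(parts.username)       # alice
print(parts.password)       # s3cret
print(parts.hostname)       # example.com  (hostname is lower-cased)
print(parts.port)           # 8443  (an int, not a str)
print(parts.path)           # /a%20b  (percent-encoding is not decoded)
print(unquote(parts.path))  # /a b
```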

Following the syntax specifications in RFC 1808, urlsplit() recognizes a netloc only if it is properly introduced by '//'. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

>>> from urllib.parse import urlsplit
>>> urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            query='', fragment='')
>>> urlsplit('www.cwi.nl/%7Eguido/Python.html')
SplitResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
            query='', fragment='')
>>> urlsplit('help/Python.html')
SplitResult(scheme='', netloc='', path='help/Python.html',
            query='', fragment='')
>>> urlsplit('help/Python.html', missing_as_none=True)
SplitResult(scheme=None, netloc=None, path='help/Python.html',
            query=None, fragment=None)

The scheme argument gives the default addressing scheme, to be used only if the URL does not specify one. It should be the same type (text or bytes) as urlstring or None, except that an empty string '' is always allowed and is automatically converted to b'' if appropriate.

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path or query component, and fragment is set to None or the empty string (depending on the value of missing_as_none) in the return value.
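The scheme and allow_fragments arguments can be checked with a short sketch (the example.com URLs are placeholders):

```python
from urllib.parse import urlsplit

# scheme is only a default: it fills in a missing scheme ...
print(urlsplit("//example.com/p", scheme="https").scheme)       # https
# ... and is ignored when the URL already has one.
print(urlsplit("http://example.com/p", scheme="https").scheme)  # http

# With bytes input, the empty-string default comes back as b''.
print(urlsplit(b"//example.com/p").scheme)  # b''

# allow_fragments=False: '#' is no longer a delimiter, so it stays in the
# path (or in the query) and fragment is empty.
no_frag = urlsplit("http://example.com/p#top", allow_fragments=False)
print(no_frag.path, repr(no_frag.fragment))  # /p#top ''
in_query = urlsplit("http://example.com/p?q=1#top", allow_fragments=False)
print(in_query.query)  # q=1#top
```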

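The return value is a named tuple, which means that its items can be accessed by index or as named attributes, which are:

Attribute	Index	Value	Value if not present
`scheme`	0	URL scheme specifier	scheme parameter or empty string [1]
`netloc`	1	Network location part	`None` or empty string [1]
`path`	2	Hierarchical path	empty string
`query`	3	Query component	`None` or empty string [1]
`fragment`	4	Fragment identifier	`None` or empty string [1]
`username`		User name	`None`
`password`		Password	`None`
`hostname`		Host name (lower case)	`None`
`port`		Port number as integer, if present	`None`

Reading the port attribute will raise a ValueError if an invalid port is specified in the URL. See section Structured Parse Results for more information on the result object.

Unmatched square brackets in the netloc attribute will raise a ValueError.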

Characters in the netloc attribute that decompose under NFKC normalization (as used by the IDNA encoding) into any of /, ?, #, @, or : will raise a ValueError. If the URL is decomposed before parsing, no error will be raised.
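The ValueError cases described above can be reproduced with a small sketch. The helper first_error() is hypothetical and only for illustration; the exact error messages differ between Python versions, so they are printed rather than relied on. Note that an out-of-range port is accepted by urlsplit() itself and is only reported when .port is read.

```python
from urllib.parse import urlsplit


def first_error(url):
    """Hypothetical helper: parse url, read .port, return the ValueError message or None."""
    try:
        urlsplit(url).port  # reading .port is what validates the port number
    except ValueError as exc:
        return str(exc)
    return None


# Splitting alone succeeds even though the port is out of range.
print(urlsplit("http://example.com:99999/").netloc)  # example.com:99999

print(first_error("http://example.com:99999/"))            # error: port outside 0-65535
print(first_error("http://[::1/"))                         # error: unmatched '[' in netloc
print(first_error("https://example.com\uff03.evil.net/"))  # error: U+FF03 is '#' under NFKC
print(first_error("http://example.com:8080/"))             # None
```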

Following some of the WHATWG spec that updates RFC 3986, leading C0 control and space characters are stripped from the URL. \n, \r and tab \t characters are removed from the URL at any position.
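A quick sketch of the stripping behaviour, with a made-up URL; tab and newline removal applies from Python 3.10, and the leading-space line assumes Python 3.12 or later.

```python
from urllib.parse import urlsplit

# Tab, CR and LF are deleted wherever they appear, so this still parses cleanly.
parts = urlsplit("http://exa\tmple.com/pa\nth\r")
print(parts.netloc, parts.path)  # example.com /path

# Leading C0 control and space characters are stripped (Python 3.12+).
print(urlsplit("  http://example.com/").scheme)  # http
```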

As is the case with all named tuples, the subclass has a few additional methods and attributes that are particularly useful. One such method is _replace(). The _replace() method will return a new SplitResult object replacing specified fields with new values.

>>> from urllib.parse import urlsplit
>>> u = urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
>>> u
SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            query='', fragment='')
>>> u._replace(scheme='http')
SplitResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            query='', fragment='')

Warning

urlsplit() does not perform validation. See URL parsing security for details.

Changed in version 3.2: Added IPv6 URL parsing capabilities.

Changed in version 3.3: The fragment is now parsed for all URL schemes (unless allow_fragments is false), in accordance with RFC 3986. Previously, an allowlist of schemes that support fragments existed.

Changed in version 3.6: Out-of-range port numbers now raise ValueError, instead of returning None.

Changed in version 3.8: Characters that affect netloc parsing under NFKC normalization will now raise ValueError.

Changed in version 3.10: ASCII newline and tab characters are stripped from the URL.

Changed in version 3.12: Leading WHATWG C0 control and space characters are stripped from the URL.

Changed in version 3.15: Added the missing_as_none parameter.

urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.

The optional argument keep_blank_values is a flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.

The optional argument strict_parsing is a flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.

The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.

The optional argument max_num_fields is the maximum number of fields to read. If set, a ValueError is raised if more than max_num_fields fields are read.

The optional argument separator is the symbol to use for separating the query arguments. It defaults to &.

Use the urllib.parse.urlencode() function (with the doseq parameter set to True) to convert such dictionaries into query strings.
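A short sketch of parse_qs() and its optional arguments, using an arbitrary query string; the last line converts the result back with urlencode() and doseq=True.

```python
from urllib.parse import parse_qs, urlencode

qs = "a=1&a=2&b=&c=3"

print(parse_qs(qs))                          # {'a': ['1', '2'], 'c': ['3']}
print(parse_qs(qs, keep_blank_values=True))  # {'a': ['1', '2'], 'b': [''], 'c': ['3']}

# Only '&' separates fields by default; ';' needs separator=';'.
print(parse_qs("x=1;y=2"))                 # {'x': ['1;y=2']}
print(parse_qs("x=1;y=2", separator=";"))  # {'x': ['1'], 'y': ['2']}

# max_num_fields rejects input with too many fields (here 4 > 2).
try:
    parse_qs(qs, max_num_fields=2)
except ValueError as exc:
    print("rejected:", exc)

# Back to a query string: doseq=True expands each value list.
print(urlencode(parse_qs(qs), doseq=True))  # a=1&a=2&c=3
```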

Changed in version 3.2: Added encoding and errors parameters.

Changed in version 3.8: Added max_num_fields parameter.

Changed in version 3.10: Added separator parameter with the default value of &. Python versions earlier than Python 3.10 allowed using both ; and & as query parameter separator. This has been changed to allow only a single separator key, with & as the default separator.

Deprecated since version 3.14: Accepting objects with false values (like 0 and []) other than empty strings, empty bytes-like objects and None is now deprecated.

urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.

The optional argument keep_blank_values is a flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.

The optional argument strict_parsing is a flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.

The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.

The optional argument max_num_fields is the maximum number of fields to read. If set, a ValueError is raised if more than max_num_fields fields are read.

The optional argument separator is the symbol to use for separating the query arguments. It defaults to &.

Use the urllib.parse.urlencode() function to convert such lists of pairs into query strings.
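Unlike parse_qs(), parse_qsl() keeps the original order and any repeated names. A sketch with a made-up query string in which the field flag has no '=':

```python
from urllib.parse import parse_qsl, urlencode

pairs = parse_qsl("b=2&a=1&b=3&flag")
print(pairs)  # [('b', '2'), ('a', '1'), ('b', '3')]  ('flag' is dropped)

print(parse_qsl("b=2&flag", keep_blank_values=True))  # [('b', '2'), ('flag', '')]

try:
    parse_qsl("b=2&flag", strict_parsing=True)  # 'flag' has no '='
except ValueError as exc:
    print("rejected:", exc)

print(urlencode(pairs))  # b=2&a=1&b=3
```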

バージョン 3.2 で変更: encoding および errors パラメータが追加されました。

バージョン 3.8 で変更: max_num_fields パラメータが追加されました。

バージョン 3.10 で変更: Added separator parameter with the default value of &. Python versions earlier than Python 3.10 allowed using both ; and & as query parameter separator. This has been changed to allow only a single separator key, with & as the default separator.

urllib.parse.urlunsplit(parts)

urllib.parse.urlunsplit(parts, *, keep_empty)

Construct a URL from a tuple as returned by urlsplit(). The parts argument can be any five-item iterable.

This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent).

If keep_empty is true, empty strings are kept in the result (for example, a ? for an empty query); only None components are omitted. This allows rebuilding a URL that was parsed with missing_as_none=True. By default, keep_empty is true if parts is the result of a urlsplit() call with missing_as_none=True.
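A sketch of the default behaviour (keep_empty not given), with placeholder URLs: a redundant '?' disappears on the round trip, and a plain list is accepted as the five parts. The components are joined as given; no quoting is applied.

```python
from urllib.parse import urlsplit, urlunsplit

# The empty query's '?' is dropped when the URL is rebuilt.
print(urlunsplit(urlsplit("http://example.com/p?")))  # http://example.com/p

# Any five-item iterable works.
print(urlunsplit(["https", "example.com", "/search", "q=python", "results"]))
# https://example.com/search?q=python#results
```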

バージョン 3.15 で変更: Added the keep_empty parameter.

urllib.parse.urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)

This is similar to urlsplit(), but additionally splits the path component into path and params. This function returns a 6-item named tuple ParseResult or ParseResultBytes. Its items are the same as for the urlsplit() result, except that params is inserted at index 3, between path and query.

This function is based on obsoleted RFC 1738 and RFC 1808, which listed params as the main URL component. The more recent URL syntax allows parameters to be applied to each segment of the path portion of the URL (see RFC 3986). urlsplit() should generally be used instead of urlparse(). A separate function is needed to separate the path segments and parameters.
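The difference between urlparse() and urlsplit() only shows up when the path contains a ';'. A sketch with placeholder URLs (http is one of the schemes for which urlparse() splits off params):

```python
from urllib.parse import urlparse, urlsplit

url = "http://example.com/docs;v=2?x=1"
print(urlparse(url).path, urlparse(url).params)  # /docs v=2
print(urlsplit(url).path)                        # /docs;v=2

# Only parameters on the last path segment are split off by urlparse().
nested = urlparse("http://example.com/a;v=1/b")
print(nested.path, repr(nested.params))  # /a;v=1/b ''
```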

urllib.parse.urlunparse(parts)

urllib.parse.urlunparse(parts, *, keep_empty)

Combine the elements of a tuple as returned by urlparse() into a complete URL as a string. The parts argument can be any six-item iterable.

This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent).

If keep_empty is true, empty strings are kept in the result (for example, a ? for an empty query); only None components are omitted. This allows rebuilding a URL that was parsed with missing_as_none=True. By default, keep_empty is true if parts is the result of a urlparse() call with missing_as_none=True.
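A sketch of urlunparse() with a hand-built six-item tuple and with a parsed URL (all values are placeholders); non-empty params are re-attached to the path with ';'.

```python
from urllib.parse import urlparse, urlunparse

parts = ("https", "example.com", "/path", "v=1", "q=2", "frag")
print(urlunparse(parts))  # https://example.com/path;v=1?q=2#frag

# Rebuilding a parsed URL gives the same string when it had no redundant delimiters.
url = "http://example.com/docs;v=2?x=1#top"
print(urlunparse(urlparse(url)) == url)  # True
```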

バージョン 3.15 で変更: Added the keep_empty parameter.

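urllib.parse.urljoin(base, url, allow_fragments=True)

Construct a full ("absolute") URL by combining a "base URL" (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL. For example: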

>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'

The allow_fragments argument has the same meaning and default as for urlsplit().

Note

If url is an absolute URL (that is, it starts with // or scheme://), the url's hostname and/or scheme will be present in the result. For example:

>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
...         '//www.python.org/%7Eguido')
'http://www.python.org/%7Eguido'

If you do not want that behavior, preprocess the url with urlsplit() and urlunsplit(), removing possible scheme and netloc parts.
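A sketch of that preprocessing. join_path_only() is a hypothetical helper, not part of urllib.parse: it drops the scheme and netloc of url before joining, and adds two guards of its own (an assumption beyond what the text above requires) so that the rebuilt relative reference cannot be re-read as having a netloc or a scheme. It narrows, but does not replace, the validation advised in the URL parsing security section.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_path_only(base: str, url: str) -> str:
    """Hypothetical helper: resolve url against base, ignoring url's scheme and netloc."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("//"):
        # With the netloc removed, '//x' would be re-parsed as netloc 'x'.
        path = "/" + path.lstrip("/")
    elif ":" in path.split("/", 1)[0]:
        # A first segment containing ':' would be re-parsed as a scheme
        # (RFC 3986, section 4.2), so make the reference explicitly relative.
        path = "./" + path
    relative = urlunsplit(("", "", path, parts.query, parts.fragment))
    return urljoin(base, relative)


base = "http://www.cwi.nl/%7Eguido/Python.html"
print(join_path_only(base, "//www.python.org/%7Eguido"))  # http://www.cwi.nl/%7Eguido
print(join_path_only(base, "FAQ.html"))                   # http://www.cwi.nl/%7Eguido/FAQ.html
print(join_path_only("https://website.com/users/", "https://evil.example/x"))
# https://website.com/x
```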

Warning

Because an absolute URL may be passed as the url parameter, it is generally not secure to use urljoin with an attacker-controlled url. For example, in urljoin("https://website.com/users/", username), if username can contain an absolute URL, the result of urljoin will be that absolute URL.

Changed in version 3.5: Behavior updated to match the semantics defined in RFC 3986.

urllib.parse.urldefrag(url, *, missing_as_none=False)

If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string (by default) or None if missing_as_none is true.

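The return value is a named tuple; its items can be accessed by index or as named attributes:

Attribute	Index	Value	Value if not present
`url`	0	URL with no fragment	empty string
`fragment`	1	Fragment identifier	`None` or empty string [3]

See section Structured Parse Results for more information on the result object.

Changed in version 3.2: Result is a structured object rather than a simple 2-tuple.

Changed in version 3.15: Added the missing_as_none parameter.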
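A sketch of urldefrag() with placeholder URLs and the default missing_as_none=False; the query is kept, only the fragment is split off, and bytes input gives bytes output.

```python
from urllib.parse import urldefrag

result = urldefrag("http://example.com/page?x=1#section-2")
print(result.url)       # http://example.com/page?x=1
print(result.fragment)  # section-2
url, fragment = result  # a 2-item tuple, so it can be unpacked

print(urldefrag("http://example.com/page"))
# DefragResult(url='http://example.com/page', fragment='')
print(urldefrag(b"http://example.com/page#top").fragment)  # b'top'
```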

urllib.parse.unwrap(url)

Extract the url from a wrapped URL (that is, a string formatted as <URL:scheme://host/path>, <scheme://host/path>, URL:scheme://host/path or scheme://host/path). If url is not a wrapped URL, it is returned without changes.
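The four accepted forms, sketched with a placeholder URL; each one unwraps to the same string.

```python
from urllib.parse import unwrap

wrapped_forms = [
    "<URL:http://example.com/>",
    "<http://example.com/>",
    "URL:http://example.com/",
    "http://example.com/",  # not wrapped: returned unchanged
]
for wrapped in wrapped_forms:
    print(unwrap(wrapped))  # http://example.com/
```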

URL parsing security

The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

Instead of raising an exception on unusual input, they may instead return some component parts as empty strings or None (depending on the value of the missing_as_none argument). Or components may contain more than they perhaps should.

We recommend that users of these APIs code defensively wherever the values may be used in a security-sensitive context. Do some verification within your code before trusting a returned component part. Does that scheme make sense? Is that a sensible path? Is there anything strange about that hostname? etc.

What constitutes a URL is not universally well defined. Different applications have different needs and desired constraints. For instance, the living WHATWG spec describes what user-facing web clients such as a web browser require, while RFC 3986 is more general. These functions incorporate some aspects of both, but cannot be claimed to be compliant with either. The APIs, and existing user code with expectations about specific behaviors, predate both standards, which makes us very cautious about changing API behavior.
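One way to turn the questions above into code is an explicit allowlist check. This is a minimal sketch under a hypothetical policy (only https, only the two listed hosts, no userinfo, default port); is_trusted_url(), ALLOWED_SCHEMES and ALLOWED_HOSTS are illustrative names, not part of urllib.parse, and a real policy will differ.

```python
from urllib.parse import urlsplit

# Hypothetical policy for this sketch only.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_trusted_url(url: str) -> bool:
    """Return True only if url passes every check of the hypothetical policy above."""
    try:
        parts = urlsplit(url)
        port = parts.port  # reading .port validates it and may raise ValueError
    except ValueError:
        return False  # e.g. bad port, unmatched '[', NFKC netloc characters
    if parts.scheme not in ALLOWED_SCHEMES:  # scheme is already lower-cased
        return False
    if parts.hostname not in ALLOWED_HOSTS:  # hostname is lower-cased, or None
        return False
    if parts.username is not None or parts.password is not None:
        return False  # reject any userinfo, e.g. 'user@example.com'
    if port not in (None, 443):
        return False
    return True


print(is_trusted_url("https://example.com/docs"))       # True
print(is_trusted_url("https://example.com@evil.net/"))  # False (hostname is evil.net)
```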

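Parsing ASCII Encoded Bytes

The URL parsing functions were originally designed to operate on character strings only. In practice, it is useful to be able to manipulate properly quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the URL parsing functions in this module all operate on bytes and bytearray objects in addition to str objects.

If str data is passed in, the result will also contain only str data. If bytes or bytearray data is passed in, the result will contain only bytes data.

Attempting to mix str data with bytes or bytearray in a single function call will result in a TypeError being raised, while attempting to pass in non-ASCII byte values will trigger UnicodeDecodeError.

To support easier conversion of result objects between str and bytes, all return values from URL parsing functions provide either an encode() method (when the result contains str data) or a decode() method (when the result contains bytes data). The signatures of these methods match those of the corresponding str and bytes methods (except that the default encoding is 'ascii' rather than 'utf-8'). Each produces a value of a corresponding type that contains either bytes data (for encode() methods) or str data (for decode() methods).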
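The rules above in a short sketch with placeholder URLs. Decoding the non-ASCII bytes as UTF-8 in the last line is an assumption about where the data came from, not something the module does for you.

```python
from urllib.parse import urljoin, urlsplit

parts = urlsplit(b"http://example.com/a?b=1")  # bytes in, bytes out
print(parts.scheme, parts.path)  # b'http' b'/a'

text = parts.decode()           # str counterpart, decoded as ASCII by default
print(text.path)                # /a
print(text.encode() == parts)   # True

try:
    urljoin("http://example.com/", b"a")  # mixing str and bytes
except TypeError as exc:
    print("TypeError:", exc)

try:
    urlsplit(b"http://example.com/caf\xc3\xa9")  # non-ASCII bytes
except UnicodeDecodeError:
    print("UnicodeDecodeError")

# Decode such input yourself first (UTF-8 is assumed here).
print(urlsplit(b"http://example.com/caf\xc3\xa9".decode("utf-8")).path)  # /café
```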

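Applications that need to operate on potentially improperly quoted URLs that may contain non-ASCII data will need to do their own decoding from bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing functions. The URL quoting functions use their own rules when producing or consuming byte sequences, as detailed in the documentation of the individual URL quoting functions.

Changed in version 3.2: URL parsing functions now accept ASCII encoded byte sequences.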

Structured Parse Results

The result objects from the urlsplit(), urlparse() and urldefrag() functions are subclasses of the tuple type. These subclasses add the attributes listed in the documentation for those functions, the encoding and decoding support described in the previous section, as well as an additional method:

urllib.parse.SplitResult.geturl()

Return the re-combined version of the original URL as a string. This may differ from the original URL in that the scheme may be normalized to lower case and empty components may be dropped. Specifically, empty parameters, queries, and fragment identifiers will be removed unless the URL was parsed with missing_as_none=True.

For urldefrag() results, only empty fragment identifiers will be removed. For urlsplit() and urlparse() results, all noted changes will be made to the URL returned by this method.

The result of this method remains unchanged if passed back through the original parsing function:

>>> from urllib.parse import urlsplit
>>> url = 'HTTP://www.Python.org/doc/#'
>>> r1 = urlsplit(url)
>>> r1.geturl()
'http://www.Python.org/doc/'
>>> r2 = urlsplit(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'
>>> r3 = urlsplit(url, missing_as_none=True)
>>> r3.geturl()
'http://www.Python.org/doc/#'

The following classes provide the implementations of the structured parse results when operating on str objects:

class urllib.parse.DefragResult(url, fragment)

Concrete class for urldefrag() results containing str data. The encode() method returns a DefragResultBytes instance.

Added in version 3.2.

class urllib.parse.ParseResult(scheme, netloc, path, params, query, fragment)

Concrete class for urlparse() results containing str data. The encode() method returns a ParseResultBytes instance.

class urllib.parse.SplitResult(scheme, netloc, path, query, fragment)

Concrete class for urlsplit() results containing str data. The encode() method returns a SplitResultBytes instance.

The following classes provide the implementations of the parse results when operating on bytes or bytearray objects:

class urllib.parse.DefragResultBytes(url, fragment)

Concrete class for urldefrag() results containing bytes data. The decode() method returns a DefragResult instance.

Added in version 3.2.

class urllib.parse.ParseResultBytes(scheme, netloc, path, params, query, fragment)

Concrete class for urlparse() results containing bytes data. The decode() method returns a ParseResult instance.

Added in version 3.2.

class urllib.parse.SplitResultBytes(scheme, netloc, path, query, fragment)

Concrete class for urlsplit() results containing bytes data. The decode() method returns a SplitResult instance.

Added in version 3.2.
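Because these are tuple subclasses, they unpack and compare like tuples and can also be constructed directly. A sketch with placeholder values:

```python
from urllib.parse import SplitResult, SplitResultBytes, urlsplit

parts = urlsplit("https://example.com/p?q=1")
scheme, netloc, path, query, fragment = parts  # plain tuple unpacking
print(isinstance(parts, tuple), isinstance(parts, SplitResult))  # True True

# Built by hand, then turned back into a URL string.
built = SplitResult(scheme="https", netloc="example.com", path="/p", query="q=1", fragment="")
print(built.geturl())  # https://example.com/p?q=1
print(built == parts)  # True

# encode() gives the bytes counterpart.
print(isinstance(built.encode(), SplitResultBytes))  # True
```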

URL Quoting

The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text. They also support reversing these operations to recreate the original data from the contents of a URL component if that task isn't already covered by the URL parsing functions above.

urllib.parse.quote(string, safe='/', encoding=None, errors=None)

Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL. The optional safe parameter specifies additional ASCII characters that should not be quoted; its default value is '/'.

string may be either a str or a bytes object.

Changed in version 3.7: Moved from RFC 2396 to RFC 3986 for quoting URL strings. "~" is now included in the set of unreserved characters.

The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode() method. Although these parameters default to None in the function signature, when processing str inputs, encoding effectively defaults to 'utf-8' and errors to 'strict', meaning unsupported characters raise a UnicodeEncodeError. encoding and errors must not be supplied if string is a bytes, or a TypeError is raised.

Note that quote(string, safe, encoding, errors) is equivalent to quote_from_bytes(string.encode(encoding, errors), safe).

Example: quote('/El Niño/') yields '/El%20Ni%C3%B1o/'.
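A few more quote() calls, sketched with made-up strings, covering safe, encoding and bytes input:

```python
from urllib.parse import quote

print(quote("/El Niño/"))              # /El%20Ni%C3%B1o/  ('/' is safe by default)
print(quote("/El Niño/", safe=""))     # %2FEl%20Ni%C3%B1o%2F
print(quote("~user/a&b"))              # ~user/a%26b  ('~' is unreserved since 3.7)
print(quote("é", encoding="latin-1"))  # %E9
print(quote(b"a b"))                   # a%20b

try:
    quote(b"a b", encoding="utf-8")  # encoding/errors are not allowed for bytes
except TypeError as exc:
    print("TypeError:", exc)
```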

urllib.parse.quote_plus(string, safe='', encoding=None, errors=None)

Like quote(), but also replace spaces with plus signs, as required for quoting HTML form values when building up a query string to go into a URL. Plus signs in the original string are escaped unless they are included in safe. Unlike quote(), safe does not default to '/'.

Example: quote_plus('/El Niño/') yields '%2FEl+Ni%C3%B1o%2F'.
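A side-by-side sketch of quote_plus() and quote() on made-up strings, showing the handling of spaces, literal '+' and '/':

```python
from urllib.parse import quote, quote_plus

print(quote_plus("a b+c"))            # a+b%2Bc  (space -> '+', literal '+' escaped)
print(quote("a b+c"))                 # a%20b%2Bc
print(quote_plus("a b/c"))            # a+b%2Fc  (safe defaults to '')
print(quote_plus("a b/c", safe="/"))  # a+b/c
```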

urllib.parse.quote_from_bytes(bytes, safe='/')

Like quote(), but accepts a bytes object rather than a str, and does not perform string-to-bytes encoding.

Example: quote_from_bytes(b'a&\xef') yields 'a%26%EF'.

urllib.parse.unquote(string, encoding='utf-8', errors='replace')

Replace %xx escapes with their single-character equivalent. The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.

string may be either a str or a bytes object.

encoding defaults to 'utf-8'. errors defaults to 'replace', meaning invalid sequences are replaced by a placeholder character.

Example: unquote('/El%20Ni%C3%B1o/') yields '/El Niño/'.

Changed in version 3.9: string parameter supports bytes and str objects (previously only str).
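A sketch of unquote() with made-up inputs, including the effect of the default errors='replace' on a byte that is not valid UTF-8:

```python
from urllib.parse import unquote

print(unquote("/El%20Ni%C3%B1o/"))          # /El Niño/
print(unquote(b"/El%20Ni%C3%B1o/"))         # /El Niño/  (bytes accepted since 3.9)
print(repr(unquote("%E9")))                 # '\ufffd'  (invalid UTF-8 replaced)
print(unquote("%E9", encoding="latin-1"))   # é
print(unquote("a+b"))                       # a+b  ('+' is left alone)
```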

urllib.parse.unquote_plus(string, encoding='utf-8', errors='replace')

Like unquote(), but also replace plus signs with spaces, as required for unquoting HTML form values.

string must be a str.

Example: unquote_plus('/El+Ni%C3%B1o/') yields '/El Niño/'.
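A sketch with made-up form data: '+' becomes a space while '%2B' becomes a literal '+', the same decoding parse_qsl() applies to field values. Passing bytes fails, since string must be a str.

```python
from urllib.parse import parse_qsl, unquote_plus

print(unquote_plus("El+Ni%C3%B1o+%2B+1"))  # El Niño + 1
print(parse_qsl("q=a+b%2Bc"))              # [('q', 'a b+c')]

try:
    unquote_plus(b"a+b")
except TypeError as exc:
    print("TypeError:", exc)
```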

urllib.parse.unquote_to_bytes(string)

Replace %xx escapes with their single-octet equivalent, and return a bytes object.

string may be either a str or a bytes object.

If it is a str, unescaped non-ASCII characters in string are encoded into UTF-8 bytes.

Example: unquote_to_bytes('a%26%EF') yields b'a&\xef'.
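A sketch of unquote_to_bytes() with made-up inputs; unlike unquote(), nothing is decoded to text, and an invalid escape such as %zz is kept as-is.

```python
from urllib.parse import unquote_to_bytes

print(unquote_to_bytes("a%26%EF"))  # b'a&\xef'
print(unquote_to_bytes("café"))     # b'caf\xc3\xa9'  (unescaped non-ASCII -> UTF-8)
print(unquote_to_bytes(b"%zz%41"))  # b'%zzA'
```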

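urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)

Convert a mapping object or a sequence of two-element tuples, which may contain str or bytes objects, to a percent-encoded ASCII text string. If the resulting string is to be used as data for a POST operation with the urlopen() function, it must be encoded to bytes; otherwise a TypeError is raised.

The resulting string is a series of key=value pairs separated by '&' characters, where both key and value are quoted using the quote_via function. By default, quote_plus() is used to quote the values, which means spaces are quoted as a '+' character and '/' characters are encoded as %2F, which follows the standard for GET requests (application/x-www-form-urlencoded). An alternate function that can be passed as quote_via is quote(), which encodes spaces as %20 instead. Because urlencode() passes its own safe argument (default '') on to quote_via, quote() still encodes '/' unless safe='/' is given. For maximum control of what is quoted, use quote and specify a value for safe.

When a sequence of two-element tuples is used as the query argument, the first element of each tuple is a key and the second is a value. The value element in itself can be a sequence and in that case, if the optional parameter doseq evaluates to True, individual key=value pairs separated by '&' are generated for each element of the value sequence for the key. The order of parameters in the encoded string will match the order of parameter tuples in the sequence.

The safe, encoding, and errors parameters are passed down to quote_via (the encoding and errors parameters are only passed when a query element is a str).

To reverse this encoding process, parse_qs() and parse_qsl() are provided in this module to parse query strings into Python data structures.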
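A sketch of the main urlencode() options with made-up data; the POST body line only shows the str-to-bytes step and does not send anything.

```python
from urllib.parse import parse_qs, quote, urlencode

print(urlencode({"q": "a b/c", "n": 1}))          # q=a+b%2Fc&n=1  (non-str values go through str())
print(urlencode([("b", 2), ("a", 1), ("b", 3)]))  # b=2&a=1&b=3  (tuple order is kept)

print(urlencode({"tag": ["x", "y"]}, doseq=True))  # tag=x&tag=y
print(urlencode({"tag": ["x", "y"]}))              # tag=%5B%27x%27%2C+%27y%27%5D  (str() of the list)

# quote_via=quote: spaces become %20; '/' is kept only if safe includes it.
print(urlencode({"p": "/a b"}, quote_via=quote))            # p=%2Fa%20b
print(urlencode({"p": "/a b"}, quote_via=quote, safe="/"))  # p=/a%20b

# For a POST body, encode the resulting str to bytes.
body = urlencode({"q": "python"}).encode("ascii")
print(body)  # b'q=python'

# parse_qs() reverses the doseq encoding.
print(parse_qs(urlencode({"tag": ["x", "y"]}, doseq=True)))  # {'tag': ['x', 'y']}
```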

Refer to urllib examples to find out how the urllib.parse.urlencode() method can be used for generating the query string of a URL or data for a POST request.

Changed in version 3.2: query supports bytes and string objects.

Changed in version 3.5: Added the quote_via parameter.

Deprecated since version 3.14: Accepting objects with false values (like 0 and []) other than empty strings, empty bytes-like objects and None is now deprecated.

See also

WHATWG - URL Living standard: Working Group for the URL Standard that defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.
RFC 3986 - Uniform Resource Identifiers: This is the current standard (STD66). Any changes to the urllib.parse module should conform to this. Certain deviations could be observed, which are mostly for backward compatibility purposes and for certain de-facto parsing requirements as commonly observed in major browsers.
RFC 2732 - Format for Literal IPv6 Addresses in URL's.: This specifies the parsing requirements of IPv6 URLs.
RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax: Document describing the generic syntactic requirements for both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs).
RFC 2368 - The mailto URL scheme.: Parsing requirements for mailto URL schemes.
RFC 1808 - Relative Uniform Resource Locators: This Request For Comments includes the rules for joining an absolute and a relative URL, including a fair number of "Abnormal Examples" which govern the treatment of border cases.
RFC 1738 - Uniform Resource Locators (URL): This specifies the formal syntax and semantics of absolute URLs.
