CPython implementation detail: The inclusion of the itms-services URL scheme can prevent an app from
passing Apple's App Store review process for the macOS and iOS App Stores.
Handling for the itms-services scheme is always removed on iOS; on
macOS, it may be removed if CPython has been built with the
--with-app-store-compliance option.
This module's functions use the deprecated term netloc (or net_loc),
which was introduced in RFC 1808. However, this term has been obsoleted by
RFC 3986, which introduced the term authority as its replacement.
The use of netloc is continued for backward compatibility.
Parse a URL into six components, returning a 6-item named tuple. This
corresponds to the general structure of a URL:
scheme://netloc/path;parameters?query#fragment.
Each tuple item is a string, possibly empty, or None if
missing_as_none is true.
Components that are not defined are represented by an empty string (by default) or
None if missing_as_none is true.
The components are not broken up
into smaller parts (for example, the network location is a single string), and %
escapes are not expanded. The delimiters as shown above are not part of the
result, except for a leading slash in the path component, which is retained if
present. For example:
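A minimal sketch of such a parse with the default arguments. The URL is an arbitrary illustration chosen so that all six components are populated:

```python
from urllib.parse import urlparse

# Arbitrary example URL; every component is present.
url = "http://www.example.com:80/a/b;type=x?q=1#frag"
parts = urlparse(url)

print(parts.scheme)    # 'http'
print(parts.netloc)    # 'www.example.com:80' (kept as a single string)
print(parts.path)      # '/a/b' (the leading slash is retained)
print(parts.params)    # 'type=x'
print(parts.query)     # 'q=1'
print(parts.fragment)  # 'frag'

# Convenience attributes derived from netloc.
print(parts.hostname)  # 'www.example.com'
print(parts.port)      # 80
```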
The scheme argument gives the default addressing scheme, to be
used only if the URL does not specify one. It should be the same type
(text or bytes) as urlstring or None, except that '' is
always allowed, and is automatically converted to b'' if appropriate.
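As a rough illustration of the default scheme, the sketch below parses a scheme-less URL, a URL that already carries a scheme, and a bytes URL with the '' default. The host name is a placeholder:

```python
from urllib.parse import urlparse

# No scheme in the URL, so the default is used.
no_scheme = urlparse("//www.example.com/index.html", scheme="https")

# The URL's own scheme wins over the default.
has_scheme = urlparse("http://www.example.com/index.html", scheme="https")

# '' is accepted alongside a bytes URL and comes back as b''.
as_bytes = urlparse(b"//www.example.com/index.html", scheme="")

print(no_scheme.scheme)   # 'https'
print(has_scheme.scheme)  # 'http'
print(as_bytes.scheme)    # b''
```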
If the allow_fragments argument is false, fragment identifiers are not
recognized. Instead, they are parsed as part of the path, parameters
or query component, and fragment is set to None or the empty
string (depending on the value of missing_as_none) in the return value.
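A short sketch of the difference, using a placeholder URL and the default (string) representation of missing components:

```python
from urllib.parse import urlparse

url = "http://example.com/docs?page=2#intro"

default = urlparse(url)
no_frag = urlparse(url, allow_fragments=False)

print(default.query, default.fragment)  # page=2 intro
print(repr(no_frag.query))              # 'page=2#intro' (fragment text stays in the query)
print(repr(no_frag.fragment))           # ''
```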
Characters in the netloc attribute that decompose under NFKC
normalization (as used by the IDNA encoding) into any of /, ?,
#, @, or : will raise a ValueError. If the URL is
decomposed before parsing, no error will be raised.
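The sketch below illustrates this check with U+2100 (ACCOUNT OF), which decomposes to "a/c" under NFKC. The host names are placeholders, and the example assumes a Python version that includes this check (3.8 or later):

```python
import unicodedata
from urllib.parse import urlparse

# U+2100 in the netloc would turn into "a/c" after NFKC normalization.
tricky = "http://example.com\u2100evil.test/"

try:
    urlparse(tricky)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Normalizing first avoids the error, but the "/" now ends the netloc.
normalized = unicodedata.normalize("NFKC", tricky)
parts = urlparse(normalized)
print(parts.netloc)  # 'example.coma'
print(parts.path)    # '/cevil.test/'
```

Note that normalizing first changes what the URL means, so it is not a way to make the original input safe.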
As is the case with all named tuples, the subclass has a few additional methods
and attributes that are particularly useful. One such method is _replace().
The _replace() method will return a new ParseResult object replacing specified
fields with new values.
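For instance, a sketch that swaps the scheme of a parsed placeholder URL and leaves the original result untouched:

```python
from urllib.parse import urlparse

original = urlparse("http://example.com/path?q=1")

# _replace() returns a new ParseResult; the original is unchanged.
secure = original._replace(scheme="https")

print(secure.geturl())    # 'https://example.com/path?q=1'
print(original.geturl())  # 'http://example.com/path?q=1'
```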
Changed in version 3.3: The fragment is now parsed for all URL schemes (unless allow_fragments is
false), in accordance with RFC 3986. Previously, an allowlist of
schemes that support fragments existed.
Changed in version 3.6: Out-of-range port numbers now raise ValueError, instead of
returning None.
Changed in version 3.8: Characters that affect netloc parsing under NFKC normalization will
now raise ValueError.
Changed in version 3.15.0a5 (unreleased): Added the missing_as_none parameter.
The optional argument max_num_fields is the maximum number of fields to
read. If set, a ValueError is raised if more than
max_num_fields fields are read.
The optional argument separator is the symbol to use for separating the
query arguments. It defaults to &.
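A minimal sketch of these two arguments, assuming the paragraphs above describe urllib.parse.parse_qs (the function heading is not present in this text). The query strings are made up for illustration:

```python
from urllib.parse import parse_qs

query = "a=1&b=2&a=3"

# Default separator "&"; repeated keys collect into one list.
parsed = parse_qs(query)
print(parsed)  # {'a': ['1', '3'], 'b': ['2']}

# A different single separator can be requested explicitly.
semi = parse_qs("a=1;b=2", separator=";")
print(semi)  # {'a': ['1'], 'b': ['2']}

# Three fields exceed a limit of two, so ValueError is raised.
try:
    parse_qs(query, max_num_fields=2)
    too_many = False
except ValueError:
    too_many = True
print(too_many)  # True
```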
Changed in version 3.10: Added the separator parameter with the default value of &. Python
versions earlier than Python 3.10 allowed using both ; and & as
query parameter separators. This has been changed to allow only a single
separator, with & as the default.
Deprecated since version 3.14: Accepting objects with false values (like 0 and []), other than empty
strings, bytes-like objects and None, is now deprecated.
The optional argument max_num_fields is the maximum number of fields to
read. If set, a ValueError is raised if more than
max_num_fields fields are read.
The optional argument separator is the symbol to use for separating the
query arguments. It defaults to &.
Changed in version 3.10: Added the separator parameter with the default value of &. Python
versions earlier than Python 3.10 allowed using both ; and & as
query parameter separators. This has been changed to allow only a single
separator, with & as the default.
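To illustrate the single-separator behavior, here is a sketch that assumes this passage belongs to urllib.parse.parse_qsl (the function heading is missing here) and a Python version of 3.10 or later:

```python
from urllib.parse import parse_qsl

# With the default "&" separator, ";" is ordinary data.
default_sep = parse_qsl("a=1;b=2")
print(default_sep)  # [('a', '1;b=2')]

# Passing separator=";" splits on semicolons instead.
semi_sep = parse_qsl("a=1;b=2", separator=";")
print(semi_sep)  # [('a', '1'), ('b', '2')]

# Pairs are returned in order, with repeated keys preserved.
ordered = parse_qsl("a=1&b=2&a=3", max_num_fields=3)
print(ordered)  # [('a', '1'), ('b', '2'), ('a', '3')]
```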
Construct a URL from a tuple as returned by urlparse(). The parts
argument can be any six-item iterable.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ? with an empty query; the RFC states that these are equivalent).
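A small sketch of both points, using default arguments and placeholder URLs: building a URL from a six-item list, and a round trip that drops an unnecessary ? delimiter:

```python
from urllib.parse import urlparse, urlunparse

# Any six-item iterable works; here a plain list.
built = urlunparse(["https", "example.com", "/search", "", "q=url", "top"])
print(built)  # 'https://example.com/search?q=url#top'

# The trailing "?" carries an empty query and is not reproduced.
roundtrip = urlunparse(urlparse("http://example.com/path?"))
print(roundtrip)  # 'http://example.com/path'
```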
If keep_empty is true, empty strings are kept in the result (for example,
a ? for an empty query); only None components are omitted.
This allows rebuilding a URL that was parsed with the option
missing_as_none=True.
By default, keep_empty is true if parts is the result of the
urlparse() call with missing_as_none=True.
Changed in version 3.15.0a5 (unreleased): Added the keep_empty parameter.
Characters in the netloc attribute that decompose under NFKC
normalization (as used by the IDNA encoding) into any of /, ?,
#, @, or : will raise a ValueError. If the URL is
decomposed before parsing, no error will be raised.
Following some of the WHATWG spec that updates RFC 3986, leading C0
control and space characters are stripped from the URL. \n,
\r and tab \t characters are removed from the URL at any position.
Combine the elements of a tuple as returned by urlsplit() into a
complete URL as a string. The parts argument can be any five-item
iterable.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ? with an empty query; the RFC states that these are equivalent).
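The five-item counterpart can be sketched the same way, again with default arguments and placeholder URLs:

```python
from urllib.parse import urlsplit, urlunsplit

# Five items: scheme, netloc, path, query, fragment (no params field).
built = urlunsplit(("https", "example.com", "/a/b", "x=1", "top"))
print(built)  # 'https://example.com/a/b?x=1#top'

# An empty fragment delimiter is dropped on the way back.
roundtrip = urlunsplit(urlsplit("http://example.com/path#"))
print(roundtrip)  # 'http://example.com/path'
```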
If keep_empty is true, empty strings are kept in the result (for example,
a ? for an empty query); only None components are omitted.
This allows rebuilding a URL that was parsed with the option
missing_as_none=True.
By default, keep_empty is true if parts is the result of the
urlsplit() call with missing_as_none=True.
Changed in version 3.15.0a5 (unreleased): Added the keep_empty parameter.
Because an absolute URL may be passed as the url parameter, it is
generally not secure to use urljoin with an attacker-controlled
url. For example, in
urljoin("https://website.com/users/", username), if username can
contain an absolute URL, the result of urljoin will be the absolute
URL.
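The sketch below shows the problem and one possible guard. The helper join_under_base is hypothetical (it is not part of urllib.parse), and its check is an illustrative assumption rather than a complete defense:

```python
from urllib.parse import urljoin, urlsplit

BASE = "https://website.com/users/"

# An absolute or network-path "username" replaces the base entirely.
print(urljoin(BASE, "alice"))                   # https://website.com/users/alice
print(urljoin(BASE, "https://evil.example/x"))  # https://evil.example/x
print(urljoin(BASE, "//evil.example/x"))        # https://evil.example/x


def join_under_base(base, untrusted):
    """Hypothetical helper: join, then reject results that leave the base."""
    joined = urljoin(base, untrusted)
    b, j = urlsplit(base), urlsplit(joined)
    same_origin = (j.scheme, j.netloc) == (b.scheme, b.netloc)
    if not same_origin or not j.path.startswith(b.path):
        raise ValueError(f"refusing to join {untrusted!r} onto {base!r}")
    return joined


print(join_under_base(BASE, "alice"))  # https://website.com/users/alice
```

Checking the joined result, rather than inspecting the untrusted input, also catches relative inputs such as "../admin" that stay on the same host but escape the base path.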
If url contains a fragment identifier, return a modified version of url
with no fragment identifier, and the fragment identifier as a separate
string. If there is no fragment identifier in url, return url unmodified
and an empty string (by default) or None if missing_as_none is true.
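A brief sketch, assuming this paragraph describes urllib.parse.urldefrag and using the default string representation for a missing fragment:

```python
from urllib.parse import urldefrag

# The result unpacks into the URL without its fragment, and the fragment.
url, fragment = urldefrag("http://example.com/page?x=1#section-2")
print(url)       # 'http://example.com/page?x=1'
print(fragment)  # 'section-2'

# Without a fragment, the URL comes back unmodified with an empty string.
plain_url, plain_fragment = urldefrag("http://example.com/page")
print(plain_url)             # 'http://example.com/page'
print(repr(plain_fragment))  # ''
```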
Extract the url from a wrapped URL (that is, a string formatted as
<URL:scheme://host/path>, <scheme://host/path>, URL:scheme://host/path
or scheme://host/path). If url is not a wrapped URL, it is returned
without changes.
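The four accepted forms can be sketched as follows, assuming the function described is urllib.parse.unwrap:

```python
from urllib.parse import unwrap

wrapped_forms = [
    "<URL:scheme://host/path>",
    "<scheme://host/path>",
    "URL:scheme://host/path",
    "scheme://host/path",
]

# Every form reduces to the same bare URL.
unwrapped = [unwrap(form) for form in wrapped_forms]
print(unwrapped)
```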
The urlsplit() and urlparse() APIs do not perform validation of
inputs. They may not raise errors on inputs that other applications consider
invalid. They may also succeed on some inputs that might not be considered
URLs elsewhere. Their purpose is for practical functionality rather than
purity.
Rather than raising an exception on unusual input, they may return some
component parts as empty strings or None (depending on the value of the
missing_as_none argument).
Or components may contain more than perhaps they should.
We recommend that users of these APIs code defensively wherever the returned
values may be used with security implications. Do some verification within your
code before trusting a returned component part. Does that scheme make
sense? Is that a sensible path? Is there anything strange about that
hostname? etc.
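One way such checks might look is sketched below. The helper is_trusted_url and both allowlists are hypothetical, and a real application would choose its own policy:

```python
from urllib.parse import urlsplit

# Hypothetical policy for illustration only.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_trusted_url(url):
    """Return True only if the scheme, host and port all look acceptable."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased, without userinfo or port
        parts.port             # raises ValueError if the port is invalid
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and host in ALLOWED_HOSTS


print(is_trusted_url("https://example.com/a"))           # True
print(is_trusted_url("javascript:alert(1)"))             # False
print(is_trusted_url("https://example.com@evil.test/"))  # False
print(is_trusted_url("https://example.com:99999/"))      # False
```

Comparing the parsed hostname against an allowlist, instead of searching the raw string, avoids being misled by userinfo such as example.com@evil.test.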
What constitutes a URL is not universally well defined. Different applications
have different needs and desired constraints. For instance, the living WHATWG
spec describes what user-facing web clients such as a web browser require,
while RFC 3986 is more general. These functions incorporate some aspects of
both, but cannot be claimed compliant with either. The APIs and existing user
code with expectations on specific behaviors predate both standards, leading us
to be very cautious about making API behavior changes.
Return the re-combined version of the original URL as a string. This may
differ from the original URL in that the scheme may be normalized to lower
case and empty components may be dropped. Specifically, empty parameters,
queries, and fragment identifiers will be removed unless the URL was parsed
with missing_as_none=True.
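A short sketch of both effects with the default parsing options (the second URL is a placeholder):

```python
from urllib.parse import urlsplit

# The scheme is lowercased and the empty fragment delimiter is dropped.
first = urlsplit("HTTP://www.Python.org/doc/#").geturl()
print(first)  # 'http://www.Python.org/doc/'

# An empty query is dropped as well.
second = urlsplit("http://example.com/path?").geturl()
print(second)  # 'http://example.com/path'
```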
Like quote(), but also replace spaces with plus signs, as required for
quoting HTML form values when building up a query string to go into a URL.
Plus signs in the original string are escaped unless they are included in
safe. Unlike quote(), the safe parameter does not default to '/'.
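The differences can be sketched side by side, assuming this paragraph describes urllib.parse.quote_plus. The input string is arbitrary:

```python
from urllib.parse import quote, quote_plus

text = "a b/c+d"

# quote(): space becomes %20 and "/" is safe by default.
quoted = quote(text)
print(quoted)  # 'a%20b/c%2Bd'

# quote_plus(): space becomes "+", while "/" and "+" are both escaped.
plus_quoted = quote_plus(text)
print(plus_quoted)  # 'a+b%2Fc%2Bd'

# Characters listed in safe are left alone.
plus_safe = quote_plus(text, safe="/+")
print(plus_safe)  # 'a+b/c+d'
```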