The urllib.parse module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.
Parse a URL into five components, returning a 5-item named tuple SplitResult or SplitResultBytes.
This corresponds to the general structure of a URL:
scheme://netloc/path?query#fragment.
Each tuple item is a string, possibly empty.
Undefined components are represented as an empty string (by default) or
as None if missing_as_none is true.
The components are not broken up
into smaller parts (for example, the network location is a single string), and %
escapes are not expanded. The delimiters as shown above are not part of the
result, except for a leading slash in the path component, which is retained if
present. For example:
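A minimal sketch of the behaviour described above, using only the default arguments. The URL is an arbitrary illustration, not one taken from this documentation.

```python
from urllib.parse import urlsplit

# Arbitrary example URL chosen for illustration.
parts = urlsplit("http://www.example.com:80/%7Eguido/Python.html?lang=en#intro")

print(parts.scheme)    # http
print(parts.netloc)    # www.example.com:80 (kept as a single string)
print(parts.path)      # /%7Eguido/Python.html (leading slash kept, % escape not expanded)
print(parts.query)     # lang=en (the '?' delimiter is not included)
print(parts.fragment)  # intro (the '#' delimiter is not included)
```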
Following the syntax specifications in RFC 1808, urlsplit() recognizes
a netloc only if it is properly introduced by '//'. Otherwise the
input is presumed to be a relative URL and thus to start with
a path component.
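The effect of the '//' rule can be sketched with two inputs that differ only in the leading slashes. The host name below is a placeholder.

```python
from urllib.parse import urlsplit

# With '//' the host is recognized as the network location.
with_slashes = urlsplit("//www.example.com/path/page.html")
# Without '//' the whole string is treated as a relative path.
without_slashes = urlsplit("www.example.com/path/page.html")

print(with_slashes.netloc, with_slashes.path)
print(repr(without_slashes.netloc), without_slashes.path)
```

In the second call the host name ends up at the start of the path, which is a common surprise when a URL is written without a scheme.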
The scheme argument gives the default addressing scheme, to be
used only if the URL does not specify one. It should be the same type
(text or bytes) as urlstring or None, except that '' is
always allowed, and is automatically converted to b'' if appropriate.
If the allow_fragments argument is false, fragment identifiers are not
recognized. Instead, they are parsed as part of the path
or query component, and fragment is set to None or the empty
string (depending on the value of missing_as_none) in the return value.
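As a rough illustration of allow_fragments with the default missing_as_none behaviour, the '#' and what follows stay attached to whichever component precedes them. The URLs are made up for the example.

```python
from urllib.parse import urlsplit

# Default: the fragment is split off.
default = urlsplit("http://www.example.com/doc?x=1#part")

# allow_fragments=False: '#part' stays in the query (or in the path if there is no query).
in_query = urlsplit("http://www.example.com/doc?x=1#part", allow_fragments=False)
in_path = urlsplit("http://www.example.com/doc#part", allow_fragments=False)

print(default.query, default.fragment)
print(in_query.query, repr(in_query.fragment))
print(in_path.path, repr(in_path.fragment))
```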
The return value is a named tuple, which means that its items can be accessed by index or as named attributes, which are:
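A short sketch of both access styles. Besides the five tuple fields, the result also exposes the derived attributes username, password, hostname and port, which are used below. The URL and credentials are placeholders.

```python
from urllib.parse import urlsplit

# Placeholder URL with user info and an explicit port.
parts = urlsplit("https://user:secret@www.example.com:8443/index.html")

# Access by index and by name refers to the same item.
print(parts[1] == parts.netloc)

# Derived attributes are computed from netloc; they are not tuple items.
print(parts.username, parts.password)
print(parts.hostname, parts.port)
print(len(parts))
```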
Following some of the WHATWG spec that updates RFC 3986, leading C0
control and space characters are stripped from the URL. \n,
\r and tab \t characters are removed from the URL at any position.
As is the case with all named tuples, the subclass has a few additional methods
and attributes that are particularly useful. One such method is _replace().
The _replace() method will return a new SplitResult object
replacing specified fields with new values.
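For instance, _replace() could be used to switch the scheme of a parsed URL. This is a sketch with a placeholder URL; the original result is left unchanged because tuples are immutable.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.example.com/docs/?page=2")

# Build a new SplitResult that differs only in its scheme.
secure = parts._replace(scheme="https")

print(secure.geturl())
print(parts.scheme)  # the original object still has the old scheme
```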
Construct a URL from a tuple as returned by urlsplit(). The parts
argument can be any five-item iterable.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ? with an empty query; the RFC states that these are equivalent).
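A brief sketch of both points, using default arguments only: a plain five-item tuple is accepted, and a round trip may drop a redundant delimiter. The URLs are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

# Any five-item iterable works, not only a SplitResult.
built = urlunsplit(("https", "www.example.com", "/search", "q=python", "top"))
print(built)

# The trailing '?' introduces an empty query, so it is not reproduced.
round_trip = urlunsplit(urlsplit("https://www.example.com/search?"))
print(round_trip)
```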
If keep_empty is true, empty strings are kept in the result (for example,
a ? for an empty query); only None components are omitted.
This allows rebuilding a URL that was parsed with option
missing_as_none=True.
By default, keep_empty is true if parts is the result of the
urlsplit() call with missing_as_none=True.
Changed in version 3.15.0a5 (unreleased): Added the keep_empty parameter.
This is similar to urlsplit(), but additionally splits the path
component into path and params.
This function returns a 6-item named tuple ParseResult
or ParseResultBytes.
Its items are the same as for the urlsplit() result, except that
params is inserted at index 3, between path and query.
This function is based on the obsolete RFC 1738 and RFC 1808, which
listed params as a main URL component.
The more recent URL syntax allows parameters to be applied to each segment
of the path portion of the URL (see RFC 3986).
urlsplit() should generally be used instead of urlparse().
A separate function is needed to separate the path segments and parameters.
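The difference can be sketched with a URL that carries parameters on more than one path segment. The helper segment_params() is hypothetical, written only for this illustration; it is not part of urllib.parse.

```python
from urllib.parse import urlparse, urlsplit

url = "http://www.example.com/a;x=1/b;y=2?q=1"

parsed = urlparse(url)
split = urlsplit(url)

# urlparse() only splits off the parameters of the last segment.
print(parsed.path, parsed.params)
# urlsplit() leaves the path untouched.
print(split.path)


def segment_params(path):
    """Hypothetical helper: pair each path segment with its ';' parameters."""
    return [tuple(seg.partition(";")[::2]) for seg in path.split("/") if seg]


print(segment_params(split.path))
```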
Combine the elements of a tuple as returned by urlparse() into a
complete URL as a string. The parts argument can be any six-item
iterable.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ? with an empty query; the RFC states that these are equivalent).
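A small sketch with default arguments, showing where params lands in the rebuilt URL. The component values are placeholders.

```python
from urllib.parse import urlparse, urlunparse

# Six items: scheme, netloc, path, params, query, fragment.
built = urlunparse(("http", "www.example.com", "/report", "v=2", "year=2024", "summary"))
print(built)

# Parsing the result gives the same six components back.
print(tuple(urlparse(built)))
```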
If keep_empty is true, empty strings are kept in the result (for example,
a ? for an empty query); only None components are omitted.
This allows rebuilding a URL that was parsed with option
missing_as_none=True.
By default, keep_empty is true if parts is the result of the
urlparse() call with missing_as_none=True.
Changed in version 3.15.0a5 (unreleased): Added the keep_empty parameter.
If url contains a fragment identifier, return a modified version of url
with no fragment identifier, and the fragment identifier as a separate
string. If there is no fragment identifier in url, return url unmodified
and an empty string (by default) or None if missing_as_none is true.
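A minimal sketch of urldefrag() with its default arguments. The result can be unpacked like a two-item tuple or read through its url and fragment attributes. The URLs are illustrative.

```python
from urllib.parse import urldefrag

# URL with a fragment identifier.
url, fragment = urldefrag("http://www.example.com/guide.html#install")
print(url, fragment)

# URL without a fragment: returned unmodified, with an empty fragment.
plain = urldefrag("http://www.example.com/guide.html")
print(plain.url, repr(plain.fragment))
```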
Instead of raising an exception on unusual input, urlsplit() and urlparse()
may return some component parts as empty strings or None (depending on the
value of the missing_as_none argument).
Components may also contain more than they perhaps should.
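A sketch of this lenient behaviour, which suggests that callers should check the components they rely on. The inputs are made up, and the require_host() helper is a hypothetical example of such a check, not a recommended validator.

```python
from urllib.parse import urlsplit

# Neither call raises, even though neither input is a usable absolute URL.
text = urlsplit("just some text")
no_host = urlsplit("http:///no-host")

print(tuple(text))     # everything lands in the path
print(tuple(no_host))  # scheme and path are set, netloc is empty


def require_host(url):
    """Hypothetical check: reject URLs that lack a scheme or a host."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL with a host: {url!r}")
    return parts
```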
The result objects from the urlsplit(), urlparse() and
urldefrag() functions are subclasses of the tuple type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:
Return the re-combined version of the original URL as a string. This may
differ from the original URL in that the scheme may be normalized to lower
case and empty components may be dropped. Specifically, empty parameters,
queries, and fragment identifiers will be removed unless the URL was parsed
with missing_as_none=True.
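A short sketch of geturl() under the default parsing options, showing the lower-cased scheme and the dropped empty fragment. The host name's case is preserved.

```python
from urllib.parse import urlsplit

original = "HTTP://www.Python.org/doc/#"
result = urlsplit(original)

# The scheme is normalized and the empty fragment is dropped.
print(result.geturl())

# Parsing the normalized URL again is stable.
print(urlsplit(result.geturl()).geturl())
```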