uritools: RFC 3986 compliant replacement for urlparse
This module defines RFC 3986 compliant replacements for the most
commonly used functions of the Python 2.7 Standard Library urlparse
and Python 3 urllib.parse modules.
>>> from uritools import urisplit, uriunsplit, urijoin, uridefrag
>>> parts = urisplit('foo://user@example.com:8042/over/there?name=ferret#nose')
>>> parts
SplitResult(scheme='foo', authority='user@example.com:8042', path='/over/there', query='name=ferret', fragment='nose')
>>> parts.scheme
'foo'
>>> parts.authority
'user@example.com:8042'
>>> parts.userinfo
'user'
>>> parts.host
'example.com'
>>> parts.port
'8042'
>>> uriunsplit(parts[:3] + ('name=swallow&type=African', 'beak'))
'foo://user@example.com:8042/over/there?name=swallow&type=African#beak'
>>> urijoin('http://www.cwi.nl/~guido/Python.html', 'FAQ.html')
'http://www.cwi.nl/~guido/FAQ.html'
>>> uridefrag('http://pythonhosted.org/uritools/index.html#constants')
DefragResult(uri='http://pythonhosted.org/uritools/index.html', fragment='constants')
For various reasons, the Python 2 urlparse module is not compliant
with current Internet standards, does not include Unicode support,
and is generally unusable with proprietary URI schemes. Python 3’s
urllib.parse improves on Unicode support, but the other issues still
remain. As stated in Lib/urllib/parse.py:
RFC 3986 is considered the current standard and any future changes
to urlparse module should conform with it. The urlparse module is
currently not entirely compliant with this RFC due to defacto
scenarios for parsing, and for backward compatibility purposes,
some parsing quirks from older RFCs are retained.
This module aims to provide fully RFC 3986 compliant replacements for
the most commonly used functions found in urlparse and urllib.parse,
plus additional functions for conveniently composing URIs from their
individual components.
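To make the notion of RFC 3986 compliant splitting and recomposition more concrete, the sketch below uses the regular expression from RFC 3986, Appendix B, and the recomposition steps from section 5.3. It relies only on the standard library. The names rfc3986_split and rfc3986_unsplit are hypothetical and chosen for illustration; this is not how uritools itself is implemented.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


def rfc3986_split(uri):
    """Hypothetical helper: split a URI reference into five components.

    Components that are absent are returned as None, so an empty
    query ("?") can be told apart from a missing one.
    """
    return _URI_RE.match(uri).group(2, 4, 5, 7, 9)


def rfc3986_unsplit(parts):
    """Hypothetical helper: recompose components (RFC 3986, section 5.3)."""
    scheme, authority, path, query, fragment = parts
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


parts = rfc3986_split("foo://user@example.com:8042/over/there?name=ferret#nose")
print(parts)
print(rfc3986_unsplit(parts))
```

Returning None for an absent component, rather than an empty string, is what lets a URI such as http://example.com/? survive a split and recompose round trip unchanged.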
URI Decomposition
URI Composition
URI Encoding
Character Constants
uritools.GEN_DELIMS
    A string containing all general delimiting characters specified in RFC 3986.

uritools.RESERVED
    A string containing all reserved characters specified in RFC 3986.

uritools.SUB_DELIMS
    A string containing all subcomponent delimiting characters specified in RFC 3986.

uritools.UNRESERVED
    A string containing all unreserved characters specified in RFC 3986.
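As an illustration of what these character classes contain, the following sketch rebuilds them locally from the grammar in RFC 3986, sections 2.2 and 2.3, using only the standard library. The constants here are local stand-ins that happen to share the names above; the order of characters inside the actual uritools strings may differ.

```python
import string
from urllib.parse import quote

# RFC 3986, section 2.2: gen-delims and sub-delims together form "reserved".
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS

# RFC 3986, section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = string.ascii_letters + string.digits + "-._~"

# Reserved and unreserved characters never overlap.
print(set(RESERVED).isdisjoint(UNRESERVED))

# Percent-encode everything except unreserved characters,
# using the standard library's quote() for the encoding step.
print(quote("a b/c", safe=UNRESERVED))
```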
Structured Parse Results
The result objects from the uridefrag() and urisplit() functions are
instances of subclasses of collections.namedtuple. These objects
contain the attributes described in the function documentation, as
well as some additional convenience methods.
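The general pattern of a namedtuple subclass that carries extra convenience methods can be sketched as follows. ExampleDefragResult and has_fragment are hypothetical names invented for this illustration; they are not part of the uritools API.

```python
from collections import namedtuple


class ExampleDefragResult(namedtuple("ExampleDefragResult", "uri fragment")):
    """Hypothetical result type: a plain tuple plus a helper method."""

    __slots__ = ()  # keep instances as lightweight as ordinary tuples

    def has_fragment(self):
        # A fragment of None means the URI had no "#" part at all.
        return self.fragment is not None


result = ExampleDefragResult("http://example.com/index.html", "constants")

# Fields are available by name ...
print(result.uri, result.fragment)

# ... by position, like any tuple ...
uri, fragment = result

# ... and through the added convenience method.
print(result.has_fragment())
```

Because such objects remain tuples, they can still be sliced, unpacked, and concatenated, as the uriunsplit(parts[:3] + ...) call in the introductory example relies on.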