uritools — RFC 3986 compliant replacement for urlparse

This module defines RFC 3986 compliant replacements for the most commonly used functions of the Python 2.7 Standard Library urlparse and Python 3 urllib.parse modules.

>>> from uritools import urisplit, uriunsplit, urijoin, uridefrag
>>> parts = urisplit('foo://user@example.com:8042/over/there?name=ferret#nose')
>>> parts
SplitResult(scheme='foo', authority='user@example.com:8042', path='/over/there', query='name=ferret', fragment='nose')
>>> parts.scheme
'foo'
>>> parts.authority
'user@example.com:8042'
>>> parts.userinfo
'user'
>>> parts.host
'example.com'
>>> parts.port
'8042'
>>> uriunsplit(parts[:3] + ('name=swallow&type=African', 'beak'))
'foo://user@example.com:8042/over/there?name=swallow&type=African#beak'
>>> urijoin('http://www.cwi.nl/~guido/Python.html', 'FAQ.html')
'http://www.cwi.nl/~guido/FAQ.html'
>>> uridefrag('http://pythonhosted.org/uritools/index.html#constants')
DefragResult(uri='http://pythonhosted.org/uritools/index.html', fragment='constants')

For various reasons, the Python 2 urlparse module is not compliant with current Internet standards, does not include Unicode support, and is generally unusable with proprietary URI schemes. Python 3’s urllib.parse improves Unicode support, but the other issues remain. As stated in Lib/urllib/parse.py:

RFC 3986 is considered the current standard and any future changes
to urlparse module should conform with it.  The urlparse module is
currently not entirely compliant with this RFC due to defacto
scenarios for parsing, and for backward compatibility purposes,
some parsing quirks from older RFCs are retained.

This module aims to provide fully RFC 3986 compliant replacements for the most commonly used functions found in urlparse and urllib.parse, plus additional functions for conveniently composing URIs from their individual components.
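
To illustrate what RFC 3986 compliant splitting and recomposition involve, here is a minimal sketch built on the regular expression from RFC 3986, Appendix B, and the recomposition steps from Section 5.3. It uses only the standard library; the names split_uri and unsplit_uri are hypothetical and are not part of uritools, whose actual implementation may differ.

```python
import re

# Regular expression given in RFC 3986, Appendix B.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)


def split_uri(uri):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Components that are absent from the URI are returned as None;
    the path is always a (possibly empty) string.
    """
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def unsplit_uri(scheme, authority, path, query, fragment):
    """Hypothetical helper: recompose components as in RFC 3986, Section 5.3."""
    result = ''
    if scheme is not None:
        result += scheme + ':'
    if authority is not None:
        result += '//' + authority
    result += path
    if query is not None:
        result += '?' + query
    if fragment is not None:
        result += '#' + fragment
    return result


uri = 'foo://user@example.com:8042/over/there?name=ferret#nose'
parts = split_uri(uri)
roundtrip = unsplit_uri(*parts)

# A scheme without an authority component splits just as well.
urn_parts = split_uri('urn:example:animal:ferret:nose')
```

Because absent components are kept as None rather than collapsed to empty strings, a URI such as 'http://example.com/?' keeps its empty query through a split and recompose cycle.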

See also

RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
The current Internet standard (STD66) defining URI syntax, to which any changes to uritools should conform. If deviations are observed, the module’s implementation should be changed, even if this means breaking backward compatibility.

URI Decomposition

URI Composition

URI Encoding

Character Constants

uritools.GEN_DELIMS

A string containing all general delimiting characters specified in RFC 3986.

uritools.RESERVED

A string containing all reserved characters specified in RFC 3986.

uritools.SUB_DELIMS

A string containing all subcomponent delimiting characters specified in RFC 3986.

uritools.UNRESERVED

A string containing all unreserved characters specified in RFC 3986.
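
As a rough illustration of what these constants contain, the character classes from RFC 3986, Section 2.2 and 2.3, can be written out with the standard library alone. The *_SKETCH names and the is_unreserved helper below are hypothetical; the actual uritools constants may order their characters differently.

```python
import string

# gen-delims and sub-delims as listed in RFC 3986, Section 2.2.
GEN_DELIMS_SKETCH = ":/?#[]@"
SUB_DELIMS_SKETCH = "!$&'()*+,;="

# reserved = gen-delims / sub-delims
RESERVED_SKETCH = GEN_DELIMS_SKETCH + SUB_DELIMS_SKETCH

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" (RFC 3986, Section 2.3)
UNRESERVED_SKETCH = string.ascii_letters + string.digits + "-._~"


def is_unreserved(char):
    """Hypothetical helper: True if char never needs percent-encoding."""
    return char in UNRESERVED_SKETCH
```

The reserved and unreserved sets are disjoint, so a single membership test is enough to tell whether a character may appear unencoded in any component.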

Structured Parse Results

The result objects from the uridefrag() and urisplit() functions are instances of subclasses of named tuple classes created with collections.namedtuple(). These objects contain the attributes described in the function documentation, as well as some additional convenience methods.
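
The general pattern of a named tuple subclass that adds convenience methods can be sketched as follows. ExampleDefragResult, fragment_or and example_defrag are hypothetical names chosen for illustration; they are not the uritools classes or methods, and the splitting logic is deliberately simplistic.

```python
import collections


class ExampleDefragResult(
    collections.namedtuple('ExampleDefragResult', 'uri fragment')
):
    """Hypothetical result type: a named tuple plus a convenience method."""

    __slots__ = ()  # keep instances as lightweight as plain tuples

    def fragment_or(self, default=None):
        """Return the fragment, or default if the URI had none."""
        return default if self.fragment is None else self.fragment


def example_defrag(uristring):
    """Hypothetical helper: split off the fragment at the first '#'."""
    uri, sep, fragment = uristring.partition('#')
    return ExampleDefragResult(uri, fragment if sep else None)


result = example_defrag('http://pythonhosted.org/uritools/index.html#constants')
```

Such an object still behaves like an ordinary tuple, so it can be indexed, unpacked and compared, while attribute access and helper methods make calling code easier to read.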