Functions
def	url_normalize (url, charset='utf-8')

def	testcase1 (expected, value)

def	testcase2 (original, normalized)

Variables
string	__license__ = "Python"

float	__version__ = 1.1

int	MAX_ALLOWED_LABEL_LENGTH = 63

	suite = unittest.TestSuite()

list	tests1

dictionary	tests2

Function Documentation

◆ testcase1()

def app.url_normalize.testcase1	(	expected,
		value
	)

Definition at line 193 of file url_normalize.py.

   def testcase1(expected, value):
 
     class test(unittest.TestCase):
 
       def runTest(self):
         assert (url_normalize(value) == value) == expected, (expected, value, url_normalize(value))
     return test()
 

Here is the call graph for this function:

◆ testcase2()

def app.url_normalize.testcase2	(	original,
		normalized
	)

Definition at line 276 of file url_normalize.py.

   def testcase2(original, normalized):
 
     class test(unittest.TestCase):
 
       def runTest(self):
         assert url_normalize(original) == normalized, (original, normalized, url_normalize(original))
     return test()
 

Here is the call graph for this function:

◆ url_normalize()

def app.url_normalize.url_normalize	(	url,
		charset = `'utf-8'`
	)

Sometimes you get an URL by a user that just isn't a real
URL because it contains unsafe characters like ' ' and so on.  This
function can fix some of the problems in a similar way browsers
handle data entered by the user:

>>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

:param charset: The target charset for the URL if the url was
        given as unicode string.

Definition at line 40 of file url_normalize.py.

 def url_normalize(url, charset='utf-8'):
   """
   Sometimes you get an URL by a user that just isn't a real
   URL because it contains unsafe characters like ' ' and so on.  This
   function can fix some of the problems in a similar way browsers
   handle data entered by the user:
 
   >>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
   'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'
 
   :param charset: The target charset for the URL if the url was
           given as unicode string.
   """
 
   def _clean(string):
     string = unicode(string, 'utf-8', 'replace')
     return unicodedata.normalize('NFC', string).encode('utf-8')
 
   default_port = {
     'ftp': 21,
     'telnet': 23,
     'http': 80,
     'gopher': 70,
     'news': 119,
     'nntp': 119,
     'prospero': 191,
     'https': 443,
     'snews': 563,
     'snntp': 563,
   }
   if isinstance(url, unicode):
     url = url.encode(charset, 'ignore')
 
   # if there is no scheme use http as default scheme
   if url[0] not in ['/', '-'] and ':' not in url[:7]:
     url = 'http://' + url
 
   # shebang urls support
   url = url.replace('#!', '?_escaped_fragment_=')
 
   # splitting url to useful parts
   scheme, auth, path, query, fragment = urlparse.urlsplit(url.strip())
   (userinfo, host, port) = re.search('([^@]*@)?([^:]*):?(.*)', auth).groups()
 
   # Always provide the URI scheme in lowercase characters.
   scheme = scheme.lower()
 
   # Always provide the host, if any, in lowercase characters.
   host = host.lower()
   if host and host[-1] == '.':
     host = host[:-1]
 
   if (len(host) <= MAX_ALLOWED_LABEL_LENGTH):
     # take care about IDN domains
     host = host.decode(charset).encode('idna')  # IDN -> ACE
 
   # Only perform percent-encoding where it is essential.
   # Always use uppercase A-through-F characters when percent-encoding.
   # All portions of the URI must be utf-8 encoded NFC from Unicode strings
   path = quote(_clean(path), "~:/?#[]@!$&'()*+,;=%")
   fragment = quote(_clean(fragment), "~")
 
   # note care must be taken to only encode & and = characters as values
   query = "&".join(["=".join([quote(_clean(t), "~:/?#[]@!$'()*+,;=%") for t in q.split("=", 1)]) for q in query.split("&")])
 
   # Prevent dot-segments appearing in non-relative URI paths.
   if scheme in ["", "http", "https", "ftp", "file"]:
     output = []
     for part in path.split('/'):
       if part == "":
         if not output:
           output.append(part)
       elif part == ".":
         pass
       elif part == "..":
         if len(output) > 1:
           output.pop()
       else:
         output.append(part)
     if part in ["", ".", ".."]:
       output.append("")
     path = '/'.join(output)
 
   # For schemes that define a default authority, use an empty authority if
   # the default is desired.
   if userinfo in ["@", ":@"]:
     userinfo = ""
 
   # For schemes that define an empty path to be equivalent to a path of "/",
   # use "/".
   if path == "" and scheme in ["http", "https", "ftp", "file"]:
     path = "/"
 
   # For schemes that define a port, use an empty port if the default is
   # desired
   if port and scheme in default_port.keys():
     if port.isdigit():
       port = str(int(port))
       if int(port) == default_port[scheme]:
         port = ''
 
   # Put it all back together again
   auth = (userinfo or "") + host
   if port:
     auth += ":" + port
   if url.endswith("#") and query == "" and fragment == "":
     path += "#"
   return urlparse.urlunsplit((scheme, auth, path, query, ''))
 

Here is the call graph for this function:

Here is the caller graph for this function:

Variable Documentation

◆ license

string app.url_normalize.__license__ = "Python"

private

Definition at line 29 of file url_normalize.py.

◆ version

float app.url_normalize.__version__ = 1.1

private

Definition at line 30 of file url_normalize.py.

◆ MAX_ALLOWED_LABEL_LENGTH

int app.url_normalize.MAX_ALLOWED_LABEL_LENGTH = 63

Definition at line 38 of file url_normalize.py.

◆ suite

app.url_normalize.suite = unittest.TestSuite()

Definition at line 151 of file url_normalize.py.

◆ tests1

list app.url_normalize.tests1

Definition at line 154 of file url_normalize.py.

◆ tests2

dictionary app.url_normalize.tests2

Definition at line 205 of file url_normalize.py.

Functions

Variables

Function Documentation

◆ testcase1()

◆ testcase2()

◆ url_normalize()

Variable Documentation

◆ __license__

◆ __version__

◆ MAX_ALLOWED_LABEL_LENGTH

◆ suite

◆ tests1

◆ tests2

◆ license

◆ version