Sending HTTP Requests Using Python’s urllib

urllib is a Python 3 library for making HTTP requests. It is part of the Python Standard Library.

The urllib library has gone through a couple of iterations, starting life as a Python 2 library. Documentation can therefore be tricky to find, as it’s not always clear which version of urllib the documentation is referring to.

This is further compounded by the existence of urllib3, which is totally unrelated to the built-in urllib library. You can learn more about the origins of urllib, and a comparison between urllib and the popular requests library, in this article.

This article will cover the basics, and show you how to use urllib in your application.

Importing urllib

The first step to using urllib is to import it into your application.

You can do this by adding the following import statement.

from urllib import error, parse
from urllib.request import Request, urlopen

This imports the Request class and the urlopen function needed to actually send a request, along with two helper modules: parse, for encoding request data, and error, for handling failed requests.

As urllib is part of the Python standard library, you shouldn’t need to install any additional packages.

The rest of the code examples in this article will assume you’re using the import statements listed above.

Initialising the Request

The simplest way to create a request is to use urllib.request.urlopen. Pass in a string containing the URL to access. The response can be read using read().

with urlopen(url) as response:
    html = response.read()

For more advanced queries, you can create a Request object, and pass this to urlopen instead. This allows a more customised query.

For example, using Request allows headers and body content to be set. Note that Request uses the data attribute to hold data to send in the request body.

req = Request(url, headers=headers, data=body)
with urlopen(req) as response:
    # Parse response
    content = response.read()

urllib not only supports HTTP URLs, but can also connect using a variety of different protocols, such as FTP.
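As a small illustration of a non-HTTP scheme, the sketch below reads a local file through a file: URL, so it runs without any network access. The temporary file and its contents are made up for this example.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Create a throwaway file to read back (hypothetical content for this example).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.txt"
    path.write_bytes(b"hello from a file URL")

    # Path.as_uri() builds a file:// URL from an absolute path.
    file_url = path.as_uri()

    # urlopen chooses how to fetch the resource based on the URL scheme.
    with urlopen(file_url) as response:
        content = response.read()

print(content)
```

The calling code is the same as for an HTTP URL; only the scheme at the start of the URL changes.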

Setting Headers

There are two ways to add headers to a urllib Request object.

The first is to pass a dictionary containing the required headers to the Request constructor.

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Authorization': 'Basic'
}

req = Request(url, headers=headers)

This is useful if you’re adding a large number of headers.

The alternative is to use the add_header() method, which takes the name of the header, followed by the value. This method alters an existing Request object.

req = Request(url)
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
req.add_header('Authorization', 'Basic')

This can be repeated as many times as required to add all of your headers. You might want to use this method when conditionally adding headers.

Note that add_header() takes the header name and value as two separate arguments, rather than a dictionary. Adding a header with a name that is already set overwrites the earlier value.
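The two approaches can be combined. The sketch below sets one header in the constructor and adds another conditionally, then inspects the result without sending anything. The URL, the use_auth flag and the credentials value are placeholders assumed for illustration.

```python
from urllib.request import Request

# Placeholder URL and flag, assumed for illustration; no request is sent.
url = "https://example.com/api"
use_auth = True

# Headers passed to the constructor as a dictionary.
req = Request(url, headers={"Content-Type": "application/x-www-form-urlencoded"})

# Conditionally add a header to the existing Request object.
if use_auth:
    # Placeholder value, not real credentials.
    req.add_header("Authorization", "Basic placeholder-credentials")

# Request stores header names via str.capitalize(), so look them up
# in that form: "Content-Type" is stored as "Content-type".
content_type = req.get_header("Content-type")
has_auth = req.has_header("Authorization")
all_headers = dict(req.header_items())
print(all_headers)
```

One thing to watch for is the capitalisation of stored names: get_header("Content-Type") would return None here, even though the header is set.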

Setting the Request Body

When sending a POST request, you’ll probably want to add data to the request body. This can be done in the Request object constructor, similar to setting the headers.

This time, though, you can’t just pass in a dictionary. The data needs to be encoded correctly, using a combination of parse.urlencode() and str.encode().

First, pass your dictionary to parse.urlencode().

body = parse.urlencode({
    'colour': 'brown',
    'size': 9,
    'material': 'leather'
})

parse.urlencode() converts the dictionary to a string containing key=value pairs.

The resulting string then needs to be converted to bytes. Calling encode() with no arguments uses UTF-8.

body = body.encode()

This can then be passed to the Request constructor. urllib refers to this attribute as data.

req = Request(url, data=body)

When the data attribute is set to a value other than None (which is the default), the request type is automatically changed to POST.
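To see what each step produces, the sketch below runs the same encoding steps on a value containing a space and inspects the resulting Request. The URL and field values are placeholders, and nothing is sent over the network.

```python
from urllib import parse
from urllib.request import Request

# Placeholder URL, assumed for illustration; no request is sent.
url = "https://example.com/shoes"

fields = {"colour": "dark brown", "size": 9}

# urlencode() joins key=value pairs with "&" and escapes unsafe characters
# (by default a space becomes "+").
encoded = parse.urlencode(fields)
print(encoded)  # colour=dark+brown&size=9

# Request expects bytes for the body, so encode the string (UTF-8 by default).
body = encoded.encode()

req = Request(url, data=body)
print(req.get_method())  # POST
```

Passing the unencoded string as data would not fail when the Request is created, but urlopen would raise a TypeError when sending it, so it’s worth encoding the body as soon as it’s built.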

Setting the Method

As specified above, the method will default to GET if the data attribute is set to None, and POST otherwise.

It’s also possible to set it manually in the Request constructor, by adding a value for the method attribute. For example, the following will create a PUT request.

req = Request(url, data=body, method='PUT')

urllib doesn’t restrict the method to a fixed list, so any HTTP method your server understands, such as PATCH or DELETE, can be used.
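The defaulting rules can be checked with get_method(), which reports the method a Request will use. This is a minimal sketch with a placeholder URL and body; no request is sent.

```python
from urllib.request import Request

# Placeholder URL and body, assumed for illustration.
url = "https://example.com/shoes/1"
body = b"colour=brown"

get_req = Request(url)                            # no data: defaults to GET
post_req = Request(url, data=body)                # data set: defaults to POST
put_req = Request(url, data=body, method="PUT")   # explicit method wins
delete_req = Request(url, method="DELETE")        # explicit method, no body

methods = [r.get_method() for r in (get_req, post_req, put_req, delete_req)]
print(methods)  # ['GET', 'POST', 'PUT', 'DELETE']
```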

Parsing the Response

The simplest way to read the response from a urllib request is to use read().

req = Request(url, headers=headers, data=body)
with urlopen(req) as response:
    response_string = response.read()

The response value is returned as a bytes object, not a string. Call decode() on it if you need a str.
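Since read() gives back bytes, text responses usually need decoding. The sketch below uses a data: URL, which embeds the payload in the URL itself, purely so that the example runs without a real server. It takes the character set from the response headers and falls back to UTF-8, which is an assumption that may not suit every server.

```python
from urllib.request import urlopen

# A data: URL stands in for a real endpoint here (assumption for illustration).
url = "data:text/plain;charset=utf-8,hello%20world"

with urlopen(url) as response:
    raw = response.read()
    # Use the charset declared in the Content-Type header, if any.
    charset = response.headers.get_content_charset() or "utf-8"

text = raw.decode(charset)
print(type(raw).__name__, type(text).__name__)  # bytes str
```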

If you’re expecting a JSON response, you’ll need to use json.loads to decode it. Simply pass the result from response.read() to json.loads(). Be sure to add an import statement for loads to your code.

from json import loads

req = Request(url, headers=headers, data=body)
with urlopen(req) as response:
    result = loads(response.read())
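A version of this that can be run offline is sketched below. A data: URL carrying a small JSON document stands in for a real API endpoint; the payload is made up for the example. loads() accepts the bytes from read() directly.

```python
from json import dumps, loads
from urllib import parse
from urllib.request import urlopen

# Hypothetical payload, embedded in a data: URL in place of a real server.
payload = {"colour": "brown", "size": 9}
url = "data:application/json," + parse.quote(dumps(payload))

with urlopen(url) as response:
    # loads() accepts bytes as well as str.
    result = loads(response.read())

print(result["colour"])  # brown
```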

Handling Errors

HTTPError is raised when a request returns an error response, such as a 404 or 500 status. The HTTPError contains the response code (code), a reason string (reason), and the full response headers (headers).

The example below checks the error reason attribute, and prints it.

from urllib import error

try:
    req = Request(url, headers=headers, data=body)
    with urlopen(req) as response:
        html = response.read()
except error.HTTPError as e:
    print(e.reason)

You should always catch this error, so that failed requests are handled properly.
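Going slightly beyond the example above, requests can also fail before any HTTP response arrives, in which case urllib raises URLError rather than HTTPError. HTTPError is a subclass of URLError, so it must be caught first. The fetch() helper below is a hypothetical wrapper written for this article, not part of urllib, and the timeout of 10 seconds is an arbitrary choice.

```python
from urllib import error
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Return an (outcome, detail) tuple instead of raising on request errors."""
    try:
        with urlopen(Request(url), timeout=timeout) as response:
            return "ok", response.read()
    except error.HTTPError as e:
        # The server replied, but with an error status.
        return "http_error", f"{e.code} {e.reason}"
    except error.URLError as e:
        # No HTTP response at all, e.g. a failed connection
        # or a URL scheme that urllib has no handler for.
        return "url_error", str(e.reason)


# Neither call touches the network: the first uses a data: URL,
# the second an intentionally unsupported scheme.
print(fetch("data:text/plain,hi"))
print(fetch("nosuchscheme://example"))
```

Returning a tuple is just one way to report the outcome; logging the error and re-raising it may be more appropriate in a larger application.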