Asynchronous Programming with Twisted

  1. Introduction
  2. The Problem that Deferreds Solve
  3. Deferreds - a signal that data is yet to come
  4. See also

Introduction

This document is an introduction to the asynchronous programming model and to Twisted's Deferred abstraction, which represents a 'promised' result and can pass the eventual result to handler functions.

Many computing tasks take some time to complete, for one of two reasons:

  1. it is computationally intensive (for example factorising large numbers) and requires a certain amount of CPU time to calculate the answer; or
  2. it is not computationally intensive but has to wait for data to be available to produce a result.

It is the second class of problem — non-computationally intensive tasks that involve an appreciable delay — that Deferreds are designed to help solve. Functions that wait on hard drive access, database access, and network access all fall into this class, although the time delay varies.

Waiting for answers

A fundamental feature of network programming is that of waiting for data. Imagine you have a function which sends an email summarising some information. This function needs to connect to a remote server, wait for the remote server to reply, check that the remote server can process the email, wait for the reply, send the email, wait for the confirmation, and then disconnect.

Any one of these steps may take a long period of time. Your program might use the simplest of all possible models, in which it actually sits and waits for data to be sent and received, but in this case it has some very obvious and basic limitations: it can't send many emails at once; and in fact it can't do anything else while it is sending an email.

Hence, all but the simplest network programs avoid this model. You can use one of several different models to allow your program to keep doing whatever tasks it has on hand while it is waiting for something to happen before a particular task can continue.

Not waiting on data

There are many ways to write network programs. The main ones are:

  1. handle each connection in a separate operating system process, in which case the operating system will take care of letting other processes run while one is waiting;
  2. handle each connection in a separate thread [1], in which case the threading framework takes care of letting other threads run while one is waiting; or
  3. use non-blocking system calls to handle all connections in one thread.
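
As a point of comparison, the second model can be sketched with the standard threading module. This is only an illustration: handle_connection is a hypothetical handler, and time.sleep stands in for waiting on a remote peer.

```python
import threading
import time

def handle_connection(name, delay, results, lock):
    """Hypothetical handler: time.sleep simulates waiting on a peer."""
    time.sleep(delay)
    with lock:
        results.append(name)

def serve_in_threads(names, delay=0.05):
    """Start one thread per connection and wait for all of them to finish."""
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=handle_connection,
                         args=(name, delay, results, lock))
        for name in names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)

handled = serve_in_threads(["conn-1", "conn-2", "conn-3"])
```

While one thread sleeps, the others are free to run, so the waits overlap rather than add up. The lock is needed because all threads share the results list.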

Non-blocking calls

The normal model when using the Twisted framework is the third model: non-blocking calls.

When dealing with many connections in one thread, the scheduling is the responsibility of the application, not the operating system, and is usually implemented by calling a registered function when each connection is ready for reading or writing -- commonly known as asynchronous, event-driven or callback-based programming.
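
The idea of "calling a registered function when a connection is ready" can be sketched without Twisted, using the selectors module from the Python 3 standard library. This is not how Twisted's reactor is implemented; run_once and on_readable are hypothetical names, and a local socket pair stands in for a network connection.

```python
import selectors
import socket

def run_once(sel, timeout=1.0):
    """Dispatch one batch of ready events to their registered callbacks."""
    for key, _mask in sel.select(timeout):
        callback = key.data
        callback(key.fileobj)

received = []

def on_readable(sock):
    # Called only once the socket has data, so recv() returns at once.
    received.append(sock.recv(1024))

sel = selectors.DefaultSelector()
left, right = socket.socketpair()
right.setblocking(False)

# Register the function to call when 'right' becomes readable.
sel.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"hello")
run_once(sel)

sel.unregister(right)
sel.close()
left.close()
right.close()
```

A real program would call run_once in a loop with many sockets registered. That loop is the application-level scheduling described above.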

In this model, the earlier email sending function would work something like this:

  1. it calls a connection function to connect to the remote server;
  2. the connection function returns immediately, with the implication that the email sending library will be notified when the connection has been made; and
  3. once the connection is made, the connect mechanism notifies the email sending function that the connection is ready.

What advantage does the above sequence have over our original blocking sequence? The advantage is that while the email sending function can't do the next part of its job until the connection is open, the rest of the program can do other tasks, like begin the opening sequence for other email connections. Hence, the entire program is not waiting for the connection.
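
The three steps can be sketched with a toy event queue. Everything here is hypothetical: connect, send_email and the pending queue are illustrative stand-ins, not Twisted APIs, and no real network connection is made.

```python
from collections import deque

pending = deque()   # notifications waiting to be delivered (toy event queue)
log = []

def connect(server, on_connected):
    """Hypothetical non-blocking connect: schedule the notification, return at once."""
    pending.append(lambda: on_connected("connection to %s" % server))

def send_email(server, message):
    def connected(connection):
        # Step 3: the connect mechanism reports that the connection is ready.
        log.append("sending %r over %s" % (message, connection))
    # Step 1: ask for a connection. Step 2: connect() returns immediately.
    connect(server, connected)
    log.append("requested connection to %s" % server)

send_email("mail-a.example", "report 1")
send_email("mail-b.example", "report 2")

# The main loop delivers the notifications after both requests were issued.
while pending:
    pending.popleft()()
```

Both connection requests are issued before either notification is delivered, which is the advantage described above: the second email does not wait for the first connection.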

The Problem that Deferreds Solve

Deferreds are designed to enable Twisted programs to wait for data without hanging until that data arrives.

The basic idea behind Deferreds, and other solutions to this problem, is to keep the CPU as active as possible. If one task is waiting on data, rather than have the CPU (and the program!) idle waiting for that data (a process normally called "blocking"), the program performs other operations in the meantime, and waits for some signal that data is ready to be processed before returning to that task.

In Twisted, a function signals to the calling function that it is waiting by returning a Deferred. When the data is available, the program activates the callbacks on that Deferred to process the data.

Deferreds - a signal that data is yet to come

In our email sending example above, a parent function calls a function to connect to the remote server. Asynchrony requires that this connection function return without waiting for the result so that the parent function can do other things. So how does the parent function or its controlling program know that the connection doesn't exist yet, and how does it use the connection once it does exist?

Twisted has an object that signals this situation. When the connection function returns, it signals that the operation is incomplete by returning a twisted.internet.defer.Deferred object.

The Deferred has two purposes. The first is that it says "I am a signal that the result of whatever you wanted me to do is still pending." The second is that you can ask the Deferred to run things when the data does arrive.

Callbacks

The way you tell a Deferred what to do with the data once it arrives is by adding a callback — asking the Deferred to call a function once the data arrives.

One Twisted library function that returns a Deferred is twisted.web.client.getPage. In this example, we call getPage, which returns a Deferred, and we attach a callback to handle the contents of the page once the data is available:

from twisted.web.client import getPage

from twisted.internet import reactor

def printContents(contents):
    '''
    This is the 'callback' function, added to the Deferred and called by
    it when the promised data is available
    '''

    print("The Deferred has called printContents with the following contents:")
    print(contents)

    # Stop the Twisted event handling system -- this is usually handled
    # in higher level ways
    reactor.stop()

# call getPage, which returns immediately with a Deferred, promising to
# pass the page contents onto our callbacks when the contents are available
deferred = getPage('http://twistedmatrix.com/')

# add a callback to the deferred -- request that it run printContents when
# the page content has been downloaded
deferred.addCallback(printContents)

# Begin the Twisted event handling system to manage the process -- again this
# isn't the usual way to do this
reactor.run()

A very common use of Deferreds is to attach two callbacks. The result of the first callback is passed to the second callback:

from twisted.web.client import getPage

from twisted.internet import reactor

def lowerCaseContents(contents):
    '''
    This is a 'callback' function, added to the Deferred and called by
    it when the promised data is available. It converts all the data to
    lower case
    '''

    return contents.lower()

def printContents(contents):
    '''
    This is a 'callback' function, added to the Deferred after lowerCaseContents
    and called by it with the results of lowerCaseContents
    '''

    print(contents)
    reactor.stop()

deferred = getPage('http://twistedmatrix.com/')

# add two callbacks to the deferred -- request that it run lowerCaseContents
# when the page content has been downloaded, and then run printContents with
# the result of lowerCaseContents
deferred.addCallback(lowerCaseContents)
deferred.addCallback(printContents)

reactor.run()

Error handling: errbacks

Just as an asynchronous function returns before its result is available, it may also return before it is possible to detect errors: failed connections, erroneous data, protocol errors, and so on. Just as you can add callbacks to a Deferred which it calls when the data you are expecting is available, you can add error handlers ('errbacks') to a Deferred for it to call when an error occurs and it cannot obtain the data:

from twisted.web.client import getPage

from twisted.internet import reactor

def errorHandler(error):
    '''
    This is an 'errback' function, added to the Deferred which will call
    it in the event of an error
    '''

    # this isn't a very effective handling of the error, we just print it out:
    print("An error has occurred: <%s>" % str(error))
    # and then we stop the entire process:
    reactor.stop()

def printContents(contents):
    '''
    This is a 'callback' function, added to the Deferred and called by it with
    the page content
    '''

    print(contents)
    reactor.stop()

# We request a page which doesn't exist in order to demonstrate the
# error chain
deferred = getPage('http://twistedmatrix.com/does-not-exist')

# add the callback to the Deferred to handle the page content
deferred.addCallback(printContents)

# add the errback to the Deferred to handle any errors
deferred.addErrback(errorHandler)

reactor.run()

See also

  1. Using Deferreds, a more complete guide to using Deferreds, including Deferred chaining.
  2. Generating Deferreds, a guide to creating Deferreds and firing their callback chains.

Footnotes

  1. There are variations on this method, such as a limited-size pool of threads servicing all connections, which are essentially just optimizations of the same idea.

Version: 2.0.1