Asynchronous Programming with GTask

Christian Hergert <chris@dronelabs.com>
Last Updated: Wednesday November 12, 2008



You can find the API documentation for GTask at http://docs.dronelabs.com/gtask/api/.

GTask is a young project working towards providing an asynchronous toolkit for GObject and its associated language bindings. This document provides an introduction to asynchronous programming using GTask in its current form. Where appropriate, I will try to point out clearly what will or may change as development continues.

Experienced programmers may see similarities to other concurrency frameworks, as they heavily influenced the design of GTask. I took this route primarily so that documentation on concurrent programming from other projects would continue to apply.

Introduction

Asynchronous programming is on the rise. It is important to have a good framework for writing this paradigm-changing software, or code can get out of hand quite quickly. GTask is a mini-framework to simplify asynchronous and concurrent programming with GObject. Now let me Tarantino for a moment and explain why there has been such a shift to asynchronous programming.

Whether you are writing a small script to manage your MP3s or a distributed map-reduce to parse your log files, chances are you will run into similar problems in software design. Computing is not instantaneous. Sometimes we are CPU bound while waiting for a batch of processing to complete. Sometimes we are waiting for data to arrive from an external resource. Either way, the problem generalizes to the same shape: we make a request, and at some point in the future it will be done.

GTask works to simplify this problem by abstracting it into a concept called a Task. A task has a specific set of processing to perform or perhaps data to receive. Any of the steps may take an unknown period of time, so a series of callbacks and errbacks is run after the task has completed. Don't worry if you have no idea what those are. They will be explained in more depth in Callbacks and Errbacks.

In fact, this type of programming is not new. It has been done for years. You might also know it as event-driven or callback-based programming. It is my sincere hope that, even if GTask isn't for you, you find the absolute joy that comes with mastering asynchronous programming. Once mastered, it is almost impossible to give up.

I would like to thank the developers of Python Twisted for providing what is quite possibly the best implementation of an async framework in existence and a plethora of networking protocols to go with it. Twisted became my drug of choice rather quickly and I miss it in every other language I use. Hence, GTask.

Tasks

If GTask were a race, the Task object would be the starting line. It encapsulates a work item and provides common functionality such as task cancellation, processing chains, and execution dependencies.

What's in a Task?

GTask is a GObject subclass, meaning you may inherit from it and override features as you choose. Let's start off by creating a new instance of a GTask and then cover the details that go into making that happen. The examples provided throughout the document are in C; however, you will find a few supplements at the end demonstrating the language bindings.

You will need to include the <gtask/gtask.h> header in your source files as follows.

  #include <gtask/gtask.h>

For our first example we will create a task which encapsulates a blocking file read from disk. While this isn't ideal in true asynchronous programming, it does demonstrate how you can use GTask to push long-running blocking or CPU-intensive calls to a thread pool for execution.

 1 static GValue*
 2 read_contents (GTask *self, gpointer user_data)
 3 {
 4   const char *filename = user_data;
 5   // ...
 6   return NULL;
 7 }
 8
 9 int
10 main (int argc, char *argv[])
11 {
12   GTask *task = g_task_new (read_contents, "/etc/passwd", NULL);
13   // ...
14   return 0;
15 }

You will notice we didn't really do anything here; the purpose was mostly to show you line 12. Here we create a new instance of the GTask class. We have provided a function matching the signature of GTaskFunc, which will be called when the task is executed on a worker thread. Our second argument, "/etc/passwd", is the user data which is passed to read_contents(). The third parameter, which we did not use, is a callback invoked when the task loses all references and is about to be cleaned up. You will typically use this to lower the reference count on the second parameter if needed.

Let's look at the signature of GTaskFunc.

  GValue* (*GTaskFunc) (GTask *task, gpointer user_data);

The first argument to the delegate is the task which is being invoked. You can use it at runtime to affect the task. For example, you can mark the task as errored using g_task_set_error(). The second parameter, as explained above, is the user_data argument supplied when creating the task.

With a task created, we can start digging into the real power of GTask, the Processing Chain.

Processing Chains

The processing chain in GTask is a series of Callbacks and Errbacks that occur after the execution of a task. This allows for post-analysis, further processing, and error handling. All of this is centered around manipulating the data generated during the GTaskFunc.

For example, if our GTaskFunc in the first example was to return the string of data read from the file, it would be common for callbacks to do further processing on the content. Perhaps cutting out some data, or adding additional. Likewise, if there was an error during processing, errbacks could be called to resolve the error.

Callbacks and Errbacks

Before we look into how Callbacks and Errbacks are executed, let's take a look at what each is intended to do.

A callback is intended to manipulate the data further after the task has completed.

An errback is intended to resolve an error that happened further up in the processing chain.

In the following image, you will see how callbacks and errbacks are part of a system called Task Handlers. Handlers are not exposed in the public API, but they are useful for illustrating control flow in the processing chain.

In the following diagram you can see how each handler has a Callback and an Errback. A handler is not required to have both, but it allows for branch control during post processing. Handlers are executed sequentially after the task. If the task currently has an error, the next available errback will be executed. If the task is not errored, the next available callback will be executed. This continues through the chain until all handlers are executed.

When I wrote that the handlers are executed sequentially, I was lying to a degree. While they are guaranteed to be executed one after another, some time may pass between handlers. This is because we provide a mechanism to defer your callback or errback to be executed in a separate thread, such as a GUI thread. In this case, the processing chain pauses until the callback or errback completes. We will go into this deeper in Main Dispatch.
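Setting threads aside, the branching rule described above can be sketched in plain C. This is only a model of the control flow, not GTask code: ChainState, Handler, run_chain() and the example handlers are hypothetical names made up for illustration, and an int stands in for the task's result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not GTask types. */
typedef struct {
  int value;  /* the result flowing through the chain */
  int error;  /* 0 means no error is pending */
} ChainState;

typedef void (*HandlerFunc) (ChainState *state);

typedef struct {
  HandlerFunc callback;  /* used when no error is pending; may be NULL */
  HandlerFunc errback;   /* used when an error is pending; may be NULL */
} Handler;

/* Visit the handlers in order. Each handler contributes at most one call:
 * its errback if an error is pending, otherwise its callback. A handler
 * that lacks the needed function is simply skipped. */
static void
run_chain (ChainState *state, const Handler *handlers, size_t n_handlers)
{
  assert (state != NULL);

  for (size_t i = 0; i < n_handlers; i++)
    {
      HandlerFunc fn = state->error ? handlers[i].errback
                                    : handlers[i].callback;
      if (fn != NULL)
        fn (state);
    }
}

/* Example handlers. */
static void double_value (ChainState *s) { s->value *= 2; }
static void fail_if_big  (ChainState *s) { if (s->value > 10) s->error = 1; }
static void recover      (ChainState *s) { s->value = 0; s->error = 0; }
static void add_one      (ChainState *s) { s->value += 1; }
```

With the chain { double_value }, { fail_if_big }, { double_value, recover }, { add_one }, a starting value of 6 trips the error in the second handler, so the third handler runs recover instead of double_value, and the fourth handler runs its callback again because the error was cleared.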

Returning Tasks from Handlers

What happens if you need to block the task until a secondary task has completed? You may return a new task from a GTaskFunc, GTaskCallback, or GTaskErrback. Doing so will pause any further execution of the processing chain until the secondary task has completed. The result of the secondary task will become the result of the original task when execution of the processing chain continues. Likewise, if the secondary task completes with an error, that error will become the error for the original task.
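To make the adoption rule concrete, here is a small single-threaded model in plain C. It is a sketch under simplifying assumptions, not GTask's implementation: ModelTask, StepFunc and model_task_run() are hypothetical names, callbacks and errbacks are collapsed into plain steps, and "pausing" is modeled by running the secondary task to completion before continuing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; these are not part of the GTask API. */
typedef struct ModelTask ModelTask;

/* A step returns a secondary task to wait on, or NULL to continue. */
typedef ModelTask *(*StepFunc) (ModelTask *task);

struct ModelTask {
  int       result;
  int       error;    /* 0 means no error */
  StepFunc *steps;
  size_t    n_steps;
};

static void
model_task_run (ModelTask *task)
{
  assert (task != NULL);

  for (size_t i = 0; i < task->n_steps; i++)
    {
      ModelTask *secondary = task->steps[i] (task);

      if (secondary != NULL)
        {
          /* The chain is paused here until the secondary task is done.
           * Its result and error then replace those of the original. */
          model_task_run (secondary);
          task->result = secondary->result;
          task->error = secondary->error;
        }
    }
}

/* Example: the first step hands off to a secondary task that yields 42. */
static ModelTask *set_42 (ModelTask *t) { t->result = 42; return NULL; }
static StepFunc   secondary_steps[] = { set_42 };
static ModelTask  secondary_task = { 0, 0, secondary_steps, 1 };

static ModelTask *spawn_secondary (ModelTask *t) { (void) t; return &secondary_task; }
static ModelTask *add_one (ModelTask *t) { t->result += 1; return NULL; }
```

Running a task whose steps are spawn_secondary followed by add_one discards the task's own starting result: the secondary task's 42 is adopted first, and add_one then operates on that value.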

Closures

GTask internally uses GClosure to manage execution of delegates. This simplifies both the language bindings and the internal code. The power of doing so can be seen in the Python bindings, where delegate methods are free to take an arbitrary number of parameters. Programmers in C are currently not this lucky. If anyone reading this has ideas on integrating variable arguments into the C closures, patches are welcome.

In addition to using delegates directly, methods are provided to pass GClosure instances. See the documentation on the following methods for more information.

Main Dispatch

The Main Dispatch feature of GTask provides a mechanism for callbacks and errbacks to be executed on the main thread. This saves programmers from having to worry about holding the GDK thread lock for GTK+. This functionality is enabled by default. To disable it, set the "main-dispatch" property to FALSE on your GTaskScheduler.

  g_object_set (scheduler, "main-dispatch", FALSE, NULL);

A common use of the main dispatch feature is to generate new content for your GUI in a GTask and update the GUI from the callback. If there is potential for an error, the errback is a great place to update an error dialog for the GUI.

Task Dependencies

Tasks may have dependencies that prevent premature execution. When you add a dependency to a task, the scheduler is not allowed to execute the task until that dependency has been met. This is a great way to create a one-to-many processing notification where many tasks are dependent on a single shared task completing.

For documentation on using dependencies, see the API reference for the following methods.

In the near future, helper methods utilizing dependencies will be added. Currently, I'm considering the following helper methods.

  GTask* g_task_any_of (GTask *task1, ...);

This method could be used to execute a task when any of the dependent tasks complete.

  GTask* g_task_all_of (GTask *task1, ...);

This method could be used to execute a task when all of the dependent tasks have completed.

  GTask* g_task_n_of (int n, GTask *task1, ...);

This method could be used to execute a task when a given number of dependent tasks have completed.

Asynchronous Tasks

As I mentioned in the beginning, simply putting many blocking calls on a thread pool is not the answer. It simply hides the problem for a short period of time. As you scale, you will often run into new, more challenging debugging problems.

Before I start, I should mention that I'm not entirely happy with this portion of GTask yet, and am looking for insight on how we can make it more friendly and desirable to use.

When a task uses asynchronous methods in its GTaskFunc, we cannot know that the task has actually completed when the function returns on the worker thread. This is because the callback for the asynchronous call that the delegate made most likely hasn't run yet. Therefore, a task can be declared as asynchronous using g_task_set_async(). An asynchronous task carries the burden of notifying the task when its execution has completed.

This is done as follows.

  g_task_set_state (mytask, G_TASK_CALLBACKS);

This moves the task's state into the callbacks phase, which is synonymous with the post-processing chain.

Task Scheduler

The task scheduler manages how tasks get executed during runtime. It is currently monolithic in nature, requiring much to be reimplemented if you choose to subclass it. I do hope to change this in the near future. With that said, there is nothing preventing you from writing your own scheduling mechanism for the problem domain at hand. This section will be fairly light until the revamp has been completed.

To those considering implementing a scheduler: I suggest that if tasks, callbacks, or errbacks return a task, you immediately execute the task to prevent a sort of bumper-to-bumper effect in the scheduler.

Another pain point, for which I'm not sure I currently have a solution, is the scheduling of tasks. Twisted, for example, does not have any concept of task scheduling. It's built into a single main loop called the reactor. While this is nice and simple, I do feel it gives up some of the control that GTask currently provides through its support for multiple schedulers.

To schedule a task using the default scheduler, we use the following.

  GTaskScheduler *sched = g_task_scheduler_get_default ();
  g_task_scheduler_schedule (sched, mytask);

Custom Schedulers

To implement your own scheduler, you will probably want to override two functions in the vtable: the GTaskSchedulerClass::init method and the GTaskScheduler::schedule method.

The default implementation creates a thread pool in the init method and pushes a work item onto the thread pool in the schedule method. You should really check out the source for the details.
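For readers unfamiliar with vtable overrides, the shape of such a scheduler can be sketched in plain C. This is not the GTaskScheduler class structure: Scheduler, SchedulerClass and the inline_* functions are hypothetical, and instead of a thread pool the example runs each work item immediately on the calling thread, which can be handy in tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a scheduler with an overridable vtable. */
typedef void (*WorkFunc) (void *user_data);

typedef struct Scheduler Scheduler;

typedef struct {
  void (*init)     (Scheduler *self);
  void (*schedule) (Scheduler *self, WorkFunc work, void *user_data);
} SchedulerClass;

struct Scheduler {
  const SchedulerClass *klass;
  int                   n_scheduled;
};

/* An "inline" scheduler: no thread pool, work runs where it is queued. */
static void
inline_init (Scheduler *self)
{
  self->n_scheduled = 0;
}

static void
inline_schedule (Scheduler *self, WorkFunc work, void *user_data)
{
  self->n_scheduled++;
  work (user_data);
}

static const SchedulerClass inline_scheduler_class = {
  inline_init,
  inline_schedule
};

/* Generic entry points that dispatch through the vtable. */
static void
scheduler_init (Scheduler *self, const SchedulerClass *klass)
{
  assert (self != NULL && klass != NULL);

  self->klass = klass;
  klass->init (self);
}

static void
scheduler_schedule (Scheduler *self, WorkFunc work, void *user_data)
{
  assert (self != NULL && work != NULL);

  self->klass->schedule (self, work, user_data);
}

/* Example work item: increments the int that user_data points to. */
static void
increment (void *user_data)
{
  ++*(int *) user_data;
}
```

A thread-pool scheduler would keep the same two entry points and change only what init and schedule do behind them.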

Tips and Tricks

Avoiding shared state

Many times the slowdown in applications comes from negotiating locks and the cost of memory barriers. The processing chain provides an excellent way to mitigate this problem because your state is passed from handler to handler and can be safe from other threads if you so choose.

GTask currently uses a few GMutexes, and a few race conditions have not been fixed. As development continues, I hope to remove the locks in favor of lockless algorithms, which goes hand in hand with fixing the remaining race conditions. I chose to define the API first before really pushing for code perfection.

Roadmap

There are so many things I'd like to do with this library. Here is my short list.

Remove Locks

Scheduler Revamp

Auto marshaling of values

Examples

Python RSS Viewer

This example provides a simple WebKit view that lets you enter the URL of an RSS or Atom feed and generates a simple HTML overview of its contents.

This example can be found here.

Vala RSS Viewer

This example is nearly identical to the Python version, except that it is implemented in Vala.

This example can be found here.

Copyright

Copyright © 2008 Christian Hergert

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. You may obtain a copy of the GNU Free Documentation License from the Free Software Foundation by visiting their Web site or by writing to:

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA