Wolfgang Bangerth, May 2003
Since the report on multithreading was written in 2000, we have put in place a new implementation of the threading scheme (the first release to contain it is 4.0). The new scheme can do everything the old one could, so the report is in a sense still valid, but it describes a syntax that is no longer used. We will here briefly describe this syntax as well as some considerations that guided us while implementing it. For general questions on multithreading, what programs that use it should look like, and for pitfalls to watch out for, please still refer to the report mentioned above.
POSIX and other thread libraries only allow functions as thread entry
points that satisfy the signature

void * (*) (void *)

and starting threads involves a clumsy syntax. Thread entry points
with another signature need to be "wrapped", i.e. their arguments need
to be stored in a structure, and we need a function with the above
signature that can be used to "unpack" the arguments and call the
desired function. This basically forces us to have one such structure
and entry function for each function signature that we want to start a
thread with.
The first incarnations of the threading scheme in deal.II already went a long way towards making this simpler, by hiding the thread entry points and the packing and unpacking behind a layer of carefully crafted templates. They allowed you to call (almost) any function with arbitrary argument lists on a new thread, except that functions that returned values were not allowed. Implementing such a template scheme is not simple, since, besides being simple to use, it has to take care of the lifetimes of objects that need to be synchronised across threads, and in particular since templates do not allow for functions with arbitrary numbers of arguments - they need to be repeated for every number of arguments, which makes the implementation tedious. Nevertheless, the old scheme was very much usable.
However, the old scheme had a number of shortcomings; most notably, functions returning anything other than void could not be called on a new thread. We want to be able to call everything on a new thread that can also be called on the present one.

Regarding the last point, note that any other function is called by

f(arg1, arg2);
obj.f(arg1, arg2);

Ideally, the following syntax for starting any function on a new
thread would be nice:

spawn f(arg1, arg2);
spawn obj.f(arg1,arg2);

This syntax is not possible in C++, but the following syntax is,
making it relatively clear what the intent of the statement is:

spawn (f)(arg1, arg2);
spawn (obj, &Class::f)(arg1,arg2);

This is the syntax we will want to achieve (except for the fact that the
spawn function is in a namespace Threads, just like
all other entities described here).
This text will discuss the details that are needed to implement this syntax, as well as the following points:

- spawn() needs to be overloaded so as to take unbound functions as well as member functions, whether virtual or static. Of course, every call needs to be type safe, i.e. the exact same conversions of arguments need to be performed as in a usual call (except for two additional copies that are necessary).

- spawn() needs to return a value that allows us to identify, and join, a thread. The syntax for this will be

    Thread<> t = spawn(f)(arg1, arg2);
    t.join ();

  If we don't save the return value of spawn(), as in the examples above, then we have just created a detached thread.

- If f() returns a value, say, an integer, then we want to be able to retrieve it once the thread has finished:

    Thread<int> t = spawn (f)(1., 1.);
    t.join ();
    int i = t.return_value ();

  This requires some care when functions return references, but some template magic will save us. Another special case are functions that return void.

- We want to be able to put several threads into a ThreadGroup object, and wait for them collectively, rather than one-by-one.

Basically, the syntax above is all you need to know. It is as simple as that. The rest of this text, in comparison, is of a very technical nature. I took most of it from a technical discussion I had with the author of the threading scheme in boost, William Kempf. It describes the way the threading scheme is implemented, the meaning of the various classes, etc. It probably doesn't give you much insight into how to use it, but should explain in reasonable detail how it works. For more examples of use, take a look at a number of the example programs in deal.II, or at some places in the library itself.
This paper is divided into the following parts: the externally visible classes, the details of how arguments are passed to a new thread, the tool classes used in the implementation, and a discussion of open problems. Classes that are only used internally are in a namespace internal; those to be used are in a namespace Threads. The implementation uses Boost's shared_ptr. Some parts of the implementation parallel the boost::function library, but they are small and tailored to the particular purpose at hand; in particular, they make heavy use of the boost::tuple library. We note that the code has in some places already evolved a little bit beyond the state of this paper, but the main ideas are all still to be found.
Each thread that has been created is described by exactly one object of type thread_description<RT>, where RT here and in the sequel will always denote the return type of the function being called on a new thread. The thread_description class is split into an operating system dependent base class and an independent derived class. The base class is responsible for abstracting the OS interface to the functions creating, joining, killing, and signalling threads. For POSIX threads, this class looks as follows:
struct thread_description_base {
  private:
    pthread_t pt;
    mutable volatile bool was_joined;
    mutable boost::mutex join_mutex;
    mutable boost::condition join_condition;

  public:
    thread_description_base () : was_joined (false) {};
    virtual ~thread_description_base () { /* ... */ };

    void create (void * (*p) (void *), void *d) {
      pthread_create (&pt, 0, p, d);
    };

    void join () const {
      if (was_joined)
        return;
      boost::mutex::scoped_lock lock(join_mutex);
      if (!was_joined)
        pthread_join (pt, 0);
      was_joined = true;
    };
};
join() can be called more than once and uses Schmidt's thread-safe double-checked locking pattern for speed. There could be additional functions kill() or send_signal(), but these are not presently implemented.
In the destructor, we need to make sure that a thread is joined at least once in its lifetime, or if not that it is being detached (otherwise, we create the thread equivalent of a zombie process, which will lead to a resource leak in the operating system). This is a little tricky, since the destructor might be called while the thread is still running; comments in the code explain how we work around this.
The thread_description<RT>
class is derived from this base
class:
template <typename RT>
struct thread_description : public thread_description_base
{
  return_value<RT> ret_val;
};
Its only purpose is to provide a place of storage for the return
value of the function being called on the new thread. Since functions
might return references or just nothing at all, the return_value
template is used. It is described below in the section on Tool
Classes. The return value will be set on exit of the function being
called.
As mentioned, there is exactly one thread_description<RT>
object per created thread. It is accessed using boost::shared_ptr
objects, and references are held from each Thread<RT>
object
for this thread as
well as from a wrapper function on the new thread. The object is thus
deleted, when all Thread<RT>
objects for this thread have gone out of
scope (or point to different threads) and the thread itself has
finished; this is the appropriate time.
On the calling thread, we basically use the Thread<RT>
class, ThreadGroup<RT>
class, and spawn
function. The Thread<RT>
class has the following
implementation:
template <typename RT = void>
class Thread {
  public:
    Thread () {};
    Thread (const boost::shared_ptr<thread_description<RT> > &td)
      : thread_description (td) {};

    void join () const { thread_description->join (); };

    RT return_value () {
      join ();
      return thread_description->ret_val.get();
    };

    bool operator == (const Thread &t) const {
      return thread_description == t.thread_description;
    };

  private:
    boost::shared_ptr<thread_description<RT> > thread_description;
};
Copy constructor and operator= are generated automatically by the compiler. Note that asking for the return_value automatically waits for the thread to finish, and that for this it is helpful that we can call join() more than once on the thread description object. The return_value() function also makes use of the fact that if RT=void, then the return construct is still valid. Furthermore, since this is the most common case, the template argument of the Thread class has a default of void.
The ThreadGroup
class is a container distributing calls to its
member functions to all its elements. Elements are added using
operator+=
, and they are stored using a
std::vector
. (A std::set
would be more appropriate,
but then we would have to have operator<
for
Thread<RT>
objects.) It has the same default value for the
template argument:
template <typename RT = void>
class ThreadGroup
{
  public:
    ThreadGroup & operator += (const Thread<RT> &t) {
      threads.push_back (t);
      return *this;
    };

    void join_all () const {
      for (typename std::vector<Thread<RT> >::const_iterator
             t=threads.begin(); t!=threads.end(); ++t)
        t->join ();
    };

  private:
    std::vector<Thread<RT> > threads;
};
Since objects of type Thread<RT>
are freely copyable, there
is no need
to provide an index operator for ThreadGroup
; if you need to index
its elements (for example to get at the return value), use
std::vector<Thread<RT> >
.
Finally, there are overloads of the spawn
template, for unbound
functions, as well as const
and non-const
member
functions. We only show them for unary member functions:
template <typename RT, typename C, typename Arg1>
mem_fun_encapsulator<RT,C,boost::tuple<Arg1> >
spawn (C &c, RT (C::*fun_ptr)(Arg1)) {
  return mem_fun_encapsulator<RT, C, boost::tuple<Arg1> > (c,fun_ptr);
}

template <typename RT, typename C, typename Arg1>
mem_fun_encapsulator<RT,const C,boost::tuple<Arg1> >
spawn (const C &c, RT (C::*fun_ptr)(Arg1) const) {
  return mem_fun_encapsulator<RT, const C, boost::tuple<Arg1> > (c,fun_ptr);
}
Note that we need two overloaded versions, for const
and
non-const
member functions. Both create an intermediate object (in the
internal
namespace) that will accept arguments in place of the function being
called on the new thread, make sure a new thread is created, copy the
arguments to the new thread's stack, and only then return. The exact
mechanism is described in the next section.
In the implementation, we have to repeat the functions above for binary, ternary, ... member functions, and also for unbound functions. One would really like to have something similar for objects other than pointers to (member-)functions, i.e. for function objects that provide an operator(). However, this doesn't seem to be possible if operator() returns something other than void or takes arguments. This would need some kind of typeof-operator, which is not standard C++. See the discussion in the Open Problems section.
In this section, we describe the gory details of copying arguments
from the stack of the old thread to the stack of the new one. These
details are not necessary to use the spawn()
functions,
so are probably boring and may be skipped.
The basic idea is the following: spawn()
returns an object and provides
it with the address of the function to be called, and in the case of a
member function with the address of an object. mem_fun_encapsulator
looks like this:
template <typename RT, typename C, typename ArgList,
          int length = boost::tuples::length<ArgList>::value>
class mem_fun_encapsulator;

template <typename RT, typename C, typename ArgList>
class mem_fun_encapsulator<RT,C,ArgList,1> {
    typedef typename mem_fun_ptr<RT,C,ArgList>::type MemFunPtr;

  public:
    mem_fun_encapsulator (C &c, MemFunPtr mem_fun_ptr)
      : c (c), mem_fun_ptr(mem_fun_ptr) {};

    Thread<RT>
    operator() (typename boost::tuples::element<0,ArgList>::type arg1) {
      return mem_fun_wrapper<RT,C,ArgList> (mem_fun_ptr, c,
                                            boost::tie(arg1)).fire_up ();
    };

  private:
    C &c;
    MemFunPtr mem_fun_ptr;
};
(Note how the default value specification of the last template argument automatically redirects uses with three template parameters to the correct four-parameter specialization, even though the general template is never used.)
The constructor stores the two addresses. If one calls

spawn(obj, &C::f) (42);

the next thing that is invoked is the operator() of this class. It takes the argument(s), creates a temporary with the two addresses and a reference to the argument (that is what boost::tie does), and calls fire_up() on this temporary. fire_up() has all the information, and does the work. Note that we will not pass references to the individual arguments, but bind them all together with boost::tie, so that we need not have different versions of the mem_fun_wrapper class for different numbers of arguments. (However, we need a separate partial specialization of the mem_fun_encapsulator class for each number of function arguments.) The tie_args template is used to make a version of the ArgList type with all reference types; it is described below.
The next question, of course, is what mem_fun_wrapper looks like. Let us first consider the base class that it has in common with fun_wrapper, the wrapping class for non-member functions:
template <typename RT, typename EntryPointClass>
struct wrapper_base {
    Thread<RT> fire_up () {
      thread_descriptor
        = DescriptionPointer(new thread_description<RT>());

      boost::mutex::scoped_lock lock (mutex);
      thread_descriptor->create (&EntryPointClass::entry_point,
                                 (void *)this);
      condition.wait (lock);

      return thread_descriptor;
    }

  protected:
    typedef boost::shared_ptr<thread_description<RT> >
      DescriptionPointer;

    DescriptionPointer thread_descriptor;

    mutable boost::mutex mutex;
    mutable boost::condition condition;
};
fire_up() is the only real function; it creates a thread description object and calls its create() function, passing the address of the thread starting point and a pointer to the present object. The starting point is EntryPointClass::entry_point, where EntryPointClass is the name of a class that implements this thread starting function and is passed as a template argument to wrapper_base. Before it starts the new thread, fire_up() acquires a mutex, and afterwards waits until a condition is signalled before it finishes by using the thread description object to generate a Thread<RT> object.
The magic happens in the derived class:
template <typename RT, class C, typename ArgList>
struct mem_fun_wrapper
  : public wrapper_base<RT, mem_fun_wrapper<RT,C,ArgList> >
{
    typedef typename mem_fun_ptr<RT,C,ArgList>::type MemFunPtr;
    typedef typename tie_args<ArgList>::type ArgReferences;

    mem_fun_wrapper (MemFunPtr mem_fun_ptr,
                     C &c,
                     const ArgReferences &args)
      : c (c),
        mem_fun_ptr (mem_fun_ptr),
        args (args) {};

  private:
    mem_fun_wrapper ();
    mem_fun_wrapper (const mem_fun_wrapper &);

    C &c;
    MemFunPtr mem_fun_ptr;
    ArgReferences args;

    static void * entry_point (void *arg)
    {
      const wrapper_base<RT, mem_fun_wrapper> *w
        = reinterpret_cast<const wrapper_base<RT, mem_fun_wrapper>*> (arg);
      const mem_fun_wrapper *wrapper
        = static_cast<const mem_fun_wrapper*> (w);
      MemFunPtr mem_fun_ptr = wrapper->mem_fun_ptr;
      C &c = wrapper->c;
      ArgList args = wrapper->args;
      boost::shared_ptr<thread_description<RT> >
        thread_descriptor = wrapper->thread_descriptor;

      {
        boost::mutex::scoped_lock lock (wrapper->mutex);
        wrapper->condition.notify_one ();
      }

      call (mem_fun_ptr, c, args, thread_descriptor->ret_val);
      return 0;
    };
};
Note in particular, how this class passes itself as second template parameter
to the base class, enabling the latter to call the
mem_fun_wrapper::entry_point
function as entry point to the new
thread. When the fire_up function in the base
class is called, it creates a new thread that starts inside this
function, and the argument given to it is the address of the
wrapper_base
object. The first thing the entry_point function does is
to cast this address back to the real object's type (it knows the real
type of the object, since the address of this function has been handed
down through the template magic), then copies the address of
the object to work with and the address of the member function to be
called from the stack of the old thread to the stack of this new
thread. It then also copies the arguments, which so far have been held
only as references, but copies them by value. Next, it gets the
address of the return thread descriptor, and with it the address of
the return value (the shared_ptr
will also make sure that the object
lives long enough). The part in braces signals the condition to the
old thread, which hangs in the fire_up
function: the arguments have
been copied, and the old thread can go on, eventually also destroying
objects that have been copied by value. Finally, it calls the
requested function with the proper arguments through a generic
interface (described in the section on tools) and sets the return
value of the thread.
In the implementation above, some tool classes have been used. These are briefly described here.
The return_value<T> class template
This class stores a value of type T
if T
is not a
reference or void
. It offers get()
and
set()
functions that get and set the value. If T
is a
reference type, then set()
is obviously not possible since
references cannot be rebound after construction time. The class therefore
stores a pointer, and set()
sets the pointer to the object the
reference references. get()
then returns the reference again. If
T
is void
, then the class is empty and there is only
a get()
function that returns
void
.
template <typename RT> struct return_value
{
  private:
    RT value;
  public:
    RT get () const { return value; }
    void set (RT v) { value = v; }
};

template <typename RT> struct return_value<RT &>
{
  private:
    RT * value;
  public:
    RT & get () const { return *value; }
    void set (RT & v) { value = &v; }
};

template <> struct return_value<void> {
  static void get () {};
};
The call function templates
The call
function templates take a function pointer, an argument list
tuple, and the address of the return value object, and call the
function with these arguments. Since we have to unpack the argument
list, we have to dispatch to different functions, depending on the
number of arguments, in the usual way:
template <int> struct int2type {};

template <typename RT, typename PFun, typename ArgList>
static void call (PFun fun_ptr,
                  ArgList &arg_list,
                  return_value<RT> &ret_val)
{
  Caller<RT>::do_call (fun_ptr, arg_list, ret_val,
                       int2type<boost::tuples::length<ArgList>::value>());
};
The Caller
class has the following member functions:
template <typename RT> struct Caller
{
  template <typename PFun, typename ArgList>
  static void do_call (PFun fun_ptr,
                       ArgList &arg_list,
                       return_value<RT> &ret_val,
                       const int2type<1> &)
  { ret_val.set ((*fun_ptr) (arg_list.template get<0>())); };

  // likewise for int2type<0>, int2type<2>, ...
};
There is a specialization Caller<void>
that does not set a return
value, and for each call and do_call
function there is a second
function for member function pointers that takes an object as
additional argument.
mem_fun_ptr
In order to form a pointer to member function for both cases of const
and non-const
member functions, we need a simple tool:
template <typename RT, class C, typename ArgList,
          int length = boost::tuples::length<ArgList>::value>
struct mem_fun_ptr_helper;

template <typename RT, class C, typename ArgList>
struct mem_fun_ptr_helper<RT, C, ArgList, 1>
{
  typedef RT (C::*type) (typename boost::tuples::element<0,ArgList>::type);
};

template <typename RT, class C, typename ArgList>
struct mem_fun_ptr_helper<RT, const C, ArgList, 1>
{
  typedef RT (C::*type) (typename boost::tuples::element<0,ArgList>::type) const;
};

template <typename RT, class C, typename ArgList>
struct mem_fun_ptr
{
  typedef typename mem_fun_ptr_helper<RT,C,ArgList>::type type;
};
Note that if the second template argument is a const C, then we mark the member function const. The two templates for mem_fun_ptr_helper have to be repeated for every number of arguments that we have in mind. Note also that the specification of the default argument in the declaration of the general template of mem_fun_ptr_helper saves us from recomputing it in mem_fun_ptr.
add_reference for tuples
The following classes add references to the elements of a tuple, thus
providing the type equivalent of the return value of the boost::tie
functions. There are probably ways inside boost's tuples library to do
this, but I couldn't locate them.
template <int N, typename Tuple>
struct add_reference_to_Nth
{
  typedef typename boost::tuples::element<N,Tuple>::type ArgType;
  typedef typename boost::add_reference<ArgType>::type type;
};

template <typename Tuple, int = boost::tuples::length<Tuple>::value>
struct tie_args_helper;

template <typename Tuple>
struct tie_args_helper<Tuple,1>
{
  typedef
    boost::tuple<typename add_reference_to_Nth<0,Tuple>::type>
    type;
};

template <typename Tuple>
struct tie_args
{
  typedef typename tie_args_helper<Tuple>::type type;
};
The tie_args_helper
class is repeated for every number of elements we
want to use.
The only unsolved semantic problem I am aware of at present is the following: if we have a function

void f(const int &i);

then this function can be called as

f(1);

i.e. the compiler creates a temporary and passes its address to f(). When invoking f() on a new thread, however, as in

spawn (f)(1);

then it is only guaranteed that the call to spawn() does not return before the new thread is started and has copied the arguments to f(). However, the argument is only the reference to the temporary, not its value. f() will thus likely observe corrupted values for its argument. On the other hand, copying the value is no option either, of course. Since to the author's best knowledge the language does not provide means to avoid taking the address of a temporary, there is presently no way to avoid this problem. Suggestions for healing it are very welcome.
operator()

Above, we have not defined an overload of spawn for functor-like objects, even though that would be desirable. One way to do so would be

template <typename C>
mem_fun_encapsulator<void,C,boost::tuple<> >
spawn (C &c) {
  return spawn (c, &C::operator());
}

This only works if operator() satisfies the signature

struct C { void operator() (); };
We could add another overload if operator() is const. However, what one would like is an overload for more general signatures. Unfortunately, this requires that we can infer type and number of arguments and return type of operator() at the time we declare the return type of the above overload of spawn(). I have not found a way to infer this information just by using the template parameter C -- it just seems not possible. What would work, if it were supported by compilers, is a kind of typeof-operator:
template <typename C>
typeof(spawn(c,&C::operator())) // **
spawn (C &c) {
  return spawn (c, &C::operator());
}
When seeing the declaration, the compiler would automatically check
which version of the overloaded spawn()
function it would call, and
correspondingly take the return type. gcc does support the typeof
keyword, but even present CVS snapshots generate an internal compiler
error on this construct.
The scheme using mutexes and condition variables to synchronise calling and called thread seems expensive. A simpler approach would be to replace it by letting the creating thread generate an object on the heap that holds copies of the arguments (instead of references, as presently), spawn the new thread, and just go on without any synchronisation.

The called thread would then not have to copy the arguments onto its local stack and signal to the calling thread. It would only have to delete the memory after the call to the user-supplied function returns. Apart from replacing ArgReferences by ArgList in some places, the scheme would basically just replace *_encapsulator::operator(), fire_up, and thread_entry_point:
Thread<RT>
operator() (typename boost::tuples::element<0,ArgList>::type arg1) {
  return (new mem_fun_wrapper<RT,C,ArgList> (mem_fun_ptr, c,
                                             boost::tie(arg1)))->fire_up ();
};

Thread<RT> fire_up () {
  thread_descriptor
    = DescriptionPointer(new detail::thread_description<RT>());
  thread_descriptor->create (entry_point, (void *)this);
  // no synchronisation here
  return thread_descriptor;
}

static void * entry_point (void *arg) {
  wrapper_base<RT, fun_wrapper> *w
    = reinterpret_cast<wrapper_base<RT, fun_wrapper>*> (arg);
  fun_wrapper *wrapper = static_cast<fun_wrapper*> (w);

  // no copying here; no synchronisation necessary
  detail::call (wrapper->fun_ptr, wrapper->args,
                wrapper->thread_descriptor->ret_val);
  // delete memory
  delete wrapper;
  return 0;
}
The perceived simplicity of doing without mutexes and condition variables might be deceptive, however, since memory allocation and deallocation requires locking and unlocking mutexes as well, and is generally not a cheap operation.

However, the main problem is that I get spurious segmentation faults with this on my Linux box. These always happen inside the memory allocation and deallocation functions of the C++ and C language support libraries. I believe that these are not bugs in the application, but in the language runtime; however, my motivation to debug multithreading problems in the libc is very limited. For reference, valgrind 1.94 does not show accesses to uninitialized or already freed memory portions, even for runs that eventually crash later on.
Here are some additional suggestions for discussion:

If f() is a function returning an integer, then the following is legal:

double d = f(arg1, arg2);

The question, then, would be: do we want to allow conversions between Thread<double> and Thread<int> objects? And do we want to allow a conversion from Thread<T> to Thread<void> (i.e.: casting away the return value)? Since one can still assign the return value of the thread to a double, as in

double d = thread.return_value();

the only real merit in allowing conversions is in putting threads with different return value types into a ThreadGroup:

double f1 ();
int    f2 ();

ThreadGroup<double> tg;
tg += spawn(f1)();
tg += spawn(f2)();   // convert Thread<int> to Thread<double>
tg.join_all ();

Being able to do this is probably only syntactic sugar, except for the case where we are not interested in the return values of all threads, i.e. the conversion Thread<T> -> Thread<void> seems like the only one that is really worth it.
I have made some initial experiments with implementing general
conversions. The main problem is that we need to allow conversion
chains:
Thread<double> t1 = spawn (f)(arg1, arg2);
Thread<int>    t2 = t1;
Thread<double> t3 = t2;
If f()
returns 1.5, then t3.return_value()
needs to
return 1.0. I believe that such conversions could be implemented, by adding the
types in the chain into a boost::tuple
of growing length, and writing
a function that converts a value of the first type of this tuple to
the second, to the third, ..., to the last type in the tuple. However,
a plethora of internal compiler errors has scared me off doing more
experiments in this direction.
When you have a class hierarchy like

struct B { void f(); };
struct D : public B {};

then calling

spawn (D(), &B::f);

fails for gcc (but succeeds with Intel's icc). Presumably, gcc is right: template arguments must match exactly, and D() is of type D, while &B::f leads to a class type of B. There is no function template for spawn for which this call can match without a derived-to-base conversion. We could now change the template

template <typename RT, typename C, typename Arg1>
mem_fun_encapsulator<RT,C,boost::tuple<Arg1> >
spawn (C &c, RT (C::*fun_ptr)(Arg1)) {
  return mem_fun_encapsulator<RT, C, boost::tuple<Arg1> > (c,fun_ptr);
}

into

template <typename RT, typename A, typename C, typename Arg1>
mem_fun_encapsulator<RT,C,boost::tuple<Arg1> >
spawn (A &a, RT (C::*fun_ptr)(Arg1)) {
  return mem_fun_encapsulator<RT, C, boost::tuple<Arg1> > (a,fun_ptr);
}

i.e. introduce another template parameter A for the type of the object. Since the arguments of the constructor to the mem_fun_encapsulator object are known, the compiler would perform a derived-to-base conversion for object a if necessary. I don't know whether this is desirable, in particular since other conversions could also happen here that one would not want (in the extreme case generating a temporary).
When one writes

spawn (this, &X::f)

one gets an error that "'this' is not convertible to type X&". One has to write

spawn (*this, &X::f)

instead. It would be simple to have another set of overloads of spawn() that accepts a pointer instead of a reference, and simply forwards to the existing function. This is just for the lazy people, probably, but it is a common case.
When a function on a new thread throws an exception, it only propagates up to one of the two entry_point() functions, then vanishes into the run-time system and kills the program. Ideally, we would have a way to pass it over to the main thread. This, however, would need some support from the language. Basically, we would need two operations: first, a way to catch an exception of unknown type, so that the entry_point function can catch it and stash it somewhere, just like we do for the return value; and second, the Thread::join() function must be able to raise this stored exception if there was one, again without knowing its type.