A short description of the new threading scheme

Wolfgang Bangerth, May 2003

Since the report on multithreading was written in 2000, we have put in place a new implementation of the threading scheme (the first release to contain it is 4.0). The new scheme can do everything you could do before, so the report is in a sense still valid, but it describes a syntax that is no longer used. Here we briefly describe the new syntax, as well as some considerations that guided us while implementing it. For general questions on multithreading, what programs that use it should look like, and pitfalls to watch out for, please still refer to the report mentioned above.

1. Rationale and Introduction

POSIX and other thread libraries only allow functions as thread entry points that satisfy the signature

  void *  (*)  (void *)
and starting threads involves a clumsy syntax. Thread entry points with any other signature need to be "wrapped": their arguments need to be stored in a structure, and we need a function with the above signature that "unpacks" the arguments and calls the desired function. This basically forces us to have one such structure and one such entry function for each function signature with which we want to start a thread.
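As a minimal sketch of this packing and unpacking (all names are made up for the example, and for brevity we call the entry point directly instead of handing it to pthread_create):

```cpp
#include <cassert>

// The function we actually want to run on a new thread:
int add (int a, int b) { return a + b; }

// Its arguments and return value, packed into a structure:
struct add_args {
  int a, b;
  int result;
};

// An entry point matching the POSIX signature void * (*) (void *)
// that unpacks the structure and calls the desired function:
void * add_entry_point (void *arg) {
  add_args *args = static_cast<add_args*>(arg);
  args->result = add (args->a, args->b);
  return 0;
}
```

A call to pthread_create (&thread, 0, &add_entry_point, &packed_args) would then run add() on a new thread; one such structure and entry function is needed per function signature.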

The first incarnations of the threading scheme in deal.II already went a long way towards making this simpler, by hiding the thread entry points and the packing and unpacking behind a layer of carefully crafted templates. They allowed you to call (almost) any function with an arbitrary argument list on a new thread, except that functions returning values were not allowed. Implementing such a template scheme is not simple: besides being easy to use, it has to take care of the lifetimes of objects that need to be synchronised across threads; in particular, since templates do not allow for functions with arbitrary numbers of arguments, everything needs to be repeated for every number of arguments, which makes the implementation tedious. Nevertheless, the old scheme was very much usable.

However, the old scheme had a number of shortcomings:

Regarding the last point, note that an ordinary function is called as

    f(arg1, arg2);
    obj.f(arg1, arg2);
Ideally, the following syntax for starting any function on a new thread would be nice:
    spawn f(arg1, arg2);
    spawn obj.f(arg1,arg2);
This syntax is not possible in C++, but the following syntax is, making it relatively clear what the intent of the statement is:
    spawn (f)(arg1, arg2);
    spawn (obj, &Class::f)(arg1,arg2);
This is the syntax we will want to achieve (except for the fact that the spawn function is in a namespace Threads, just like all other entities described here).
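A sketch of this two-step syntax in today's C++ may help fix ideas; variadic templates and std::thread did not exist when this scheme was written, the names spawn and encapsulator are illustrative only, and Thread<RT> is approximated by std::future<RT>:

```cpp
#include <cassert>
#include <future>
#include <thread>

// spawn(f) returns an intermediate object; its operator() then starts
// the new thread, giving the two-step syntax spawn(f)(arg1, arg2).
template <typename RT, typename... Args>
struct encapsulator {
  RT (*fun_ptr) (Args...);

  std::future<RT> operator() (Args... args) {
    std::packaged_task<RT (Args...)> task (fun_ptr);
    std::future<RT> result = task.get_future ();
    std::thread (std::move (task), args...).detach ();
    return result;
  }
};

template <typename RT, typename... Args>
encapsulator<RT, Args...> spawn (RT (*fun_ptr)(Args...)) {
  return encapsulator<RT, Args...> {fun_ptr};
}

int sum (int a, int b) { return a + b; }
```

With this, spawn(sum)(19, 23) starts sum() on a new thread and returns a handle from which the result can later be fetched.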

This text will discuss the details that are needed to implement this syntax, as well as the following points:

Basically, the syntax above is all you need to know. It is as simple as that. The rest of this text, in comparison, is of a rather technical nature. I took most of it from a technical discussion I had with the author of the threading scheme in Boost, William Kempf. It describes how the threading scheme is implemented, the meaning of the various classes, and so on. It probably doesn't give you much insight into how to use the scheme, but it should explain in reasonable detail how it works. For more examples of use, take a look at a number of the example programs in deal.II, or at some places in the library itself.

This paper is divided into the following parts:

  1. This introduction
  2. Entities (functions, classes) that are used by both threads and that describe the newly created thread
  3. Entities that are used on the calling thread
  4. Entities that are used to create a thread
  5. Tool classes
  6. Open problems
  7. Further suggestions
We will present the main parts of the code in the text. The implementation is in the library; all entities that are not meant to be used by the user are placed into a namespace internal, those to be used are in a namespace Threads. The implementation uses Boost's shared_ptr. Some parts of the implementation parallel the boost::function library, but they are small and tailored to the particular purpose at hand; in particular, they make heavy use of the boost::tuple library. We note that the code has in some places already evolved a little beyond the state of this paper, but the main ideas can all still be found.

2. Entities that describe threads

Each thread that has been created is described by exactly one object of type thread_description<RT>, where RT here and in the sequel will always denote the return type of the function being called on a new thread. The thread_description class is split into an operating system dependent base class, and an independent derived class. The base class is responsible for abstracting the OS interface to the functions creating, joining, killing, and signalling threads. For POSIX threads, this class looks as follows:

    struct thread_description_base {
      private:
        pthread_t                 pt;
        mutable volatile bool     was_joined;
        mutable boost::mutex      join_mutex;
        mutable boost::condition  join_condition;
  
      public:
        thread_description_base () : was_joined (false) {};
        virtual ~thread_description_base () { /* ... */ };
          
        void create (void * (*p) (void *), void *d) {
          pthread_create (&pt, 0, p, d);
        };

        void join () const {
          if (was_joined)
            return;
          boost::mutex::scoped_lock lock(join_mutex);
          if (!was_joined)
              pthread_join (pt, 0);
          was_joined = true;
        };  
    };

join() can be called more than once and uses Schmidt's thread-safe double-checking pattern for speed. There could be additional functions kill() or send_signal(), but these are not presently implemented.
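The double-checked join guard can be seen in isolation in the following sketch, which uses std::mutex in place of boost::mutex; do_join() stands in for pthread_join, and join_count is instrumentation for the example only. (In today's C++, the unlocked read of was_joined would have to be a std::atomic<bool> to be formally race-free.)

```cpp
#include <cassert>
#include <mutex>

class join_once {
    bool       was_joined;
    std::mutex join_mutex;
    int        join_count;

    void do_join () { ++join_count; }   // stand-in for pthread_join

  public:
    join_once () : was_joined (false), join_count (0) {}

    void join () {
      if (was_joined)                   // fast path: no lock taken
        return;
      std::lock_guard<std::mutex> lock (join_mutex);
      if (!was_joined)                  // re-check under the lock
        do_join ();
      was_joined = true;
    }

    int joins () const { return join_count; }
};
```

However many times join() is called, the underlying join happens exactly once, and the fast path avoids the mutex on every call after the first.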

In the destructor, we need to make sure that a thread is joined at least once in its lifetime, or if not that it is being detached (otherwise, we create the thread equivalent of a zombie process, which will lead to a resource leak in the operating system). This is a little tricky, since the destructor might be called while the thread is still running; comments in the code explain how we work around this.

The thread_description<RT> class is derived from this base class:

    template <typename RT>
    struct thread_description : public thread_description_base
    {
        return_value<RT> ret_val;
    };

Its only purpose is to provide a place of storage for the return value of the function being called on the new thread. Since functions might return references or just nothing at all, the return_value template is used. It is described below in the section on Tool Classes. The return value will be set on exit of the function being called.

As mentioned, there is exactly one thread_description<RT> object per created thread. It is accessed using boost::shared_ptr objects, with references held by each Thread<RT> object for this thread as well as by a wrapper function on the new thread. The object is thus deleted when all Thread<RT> objects for this thread have gone out of scope (or point to different threads) and the thread itself has finished; this is the appropriate time.

3. Entities that are used on the calling thread

On the calling thread, we basically use the Thread<RT> class, ThreadGroup<RT> class, and spawn function. The Thread<RT> class has the following implementation:

    template <typename RT = void>
    class Thread {
      public:
        Thread () {};
        Thread (const boost::shared_ptr<thread_description<RT> > &td)
          : thread_description (td) {};    
        
        void join () const { thread_description->join (); };
    
        RT return_value () {
          join ();
          return thread_description->ret_val.get();
        };
    
        bool operator == (const Thread &t) {
          return thread_description == t.thread_description;
        };
    
      private:
        boost::shared_ptr<thread_description<RT> > thread_description;
    };

Copy constructor and operator= are generated automatically by the compiler. Note that asking for the return_value automatically waits for the thread to finish, and that for this it is helpful that we can call join() more than once on the thread description object. The return_value() function also makes use of the fact that if RT=void, then the return construct is still valid. Furthermore, since this is the most common case, the template argument of the thread class has a default of void.
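The "return a void expression" trick used by return_value() can be seen in isolation in the following sketch (names made up for the example):

```cpp
#include <cassert>

// Returning the result of a void function from another function with the
// same return type is legal C++; this is what lets the RT=void case use
// the same "return ...;" construct as the general template.
template <typename RT>
RT forward_call (RT (*f) ()) {
  return f ();        // valid even when RT = void
}

int  answer () { return 42; }
void noop   () {}
```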

The ThreadGroup class is a container distributing calls to its member functions to all its elements. Elements are added using operator+=, and they are stored using a std::vector. (A std::set would be more appropriate, but then we would have to have operator< for Thread<RT> objects.) It has the same default value for the template argument:

    template <typename RT = void>
    class ThreadGroup 
    {
      public:
        ThreadGroup & operator += (const Thread<RT> &t) {
          threads.push_back (t);
          return *this;
        };
    
        void join_all () const {
          for (typename std::vector<Thread<RT> >::const_iterator
                 t=threads.begin(); t!=threads.end(); ++t)
            t->join ();
        };
        
      private:
        std::vector<Thread<RT> > threads;
    };

Since objects of type Thread<RT> are freely copyable, there is no need to provide an index operator for ThreadGroup; if you need to index its elements (for example to get at the return value), use std::vector<Thread<RT> >.
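The ThreadGroup pattern can be exercised in isolation with a stub thread handle that counts join() calls instead of wrapping a real OS thread (all names are made up for the example):

```cpp
#include <cassert>
#include <vector>

struct StubThread {
    int *join_counter;
    void join () const { ++*join_counter; }
};

// A container distributing calls to its member functions to all elements,
// as described above for ThreadGroup:
template <typename T>
class Group {
  public:
    Group & operator += (const T &t) {
      threads.push_back (t);
      return *this;
    }

    void join_all () const {
      for (typename std::vector<T>::const_iterator t = threads.begin ();
           t != threads.end (); ++t)
        t->join ();
    }

  private:
    std::vector<T> threads;
};
```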

Finally, there are overloads of the spawn template, for unbound functions, as well as const and non-const member functions. We only show them for unary member functions:

    template <typename RT, typename C, typename Arg1>
    mem_fun_encapsulator<RT,C,boost::tuple<Arg1> >
    spawn (C &c, RT (C::*fun_ptr)(Arg1)) {
      return mem_fun_encapsulator<RT, C, boost::tuple<Arg1> > (c,fun_ptr);
    }
    
    template <typename RT, typename C, typename Arg1>
    mem_fun_encapsulator<RT,const C,boost::tuple<Arg1> >
    spawn (const C &c, RT (C::*fun_ptr)(Arg1) const) {
      return mem_fun_encapsulator<RT, const C, boost::tuple<Arg1> > (c,fun_ptr);
    }

Note that we need two overloaded versions, for const and non-const member functions. Both create an intermediate object (in the internal namespace) that will accept arguments in place of the function being called on the new thread, make sure a new thread is created, copy the arguments to the new thread's stack, and only then return. The exact mechanism is described in the next section.

In the implementation, we have to repeat the functions above for binary, ternary, ... member functions, and also for unbound (non-member) functions. One would really like to have something similar for function objects, i.e. objects of classes other than pointers to (member-)functions that provide an operator(). However, this does not seem possible if operator() returns something other than void or takes arguments: it would require some kind of typeof operator, which is not standard C++. See the discussion in the Open Problems section.

4. Entities that are used to create a thread

In this section, we describe the gory details of copying arguments from the stack of the old thread to the stack of the new one. These details are not necessary in order to use the spawn() functions, so this section may be skipped.

The basic idea is the following: spawn() returns an object, providing it with the address of the function to be called and, in the case of a member function, the address of an object. mem_fun_encapsulator looks like this:

    template <typename RT, typename C, typename ArgList,
              int length = boost::tuples::length<ArgList>::value>
    class mem_fun_encapsulator;

    template <typename RT, typename C, typename ArgList>
    class mem_fun_encapsulator<RT,C,ArgList,1> {
        typedef typename mem_fun_ptr<RT,C,ArgList>::type MemFunPtr;      
  
      public:
        mem_fun_encapsulator (C &c, MemFunPtr mem_fun_ptr)
            : c (c), mem_fun_ptr(mem_fun_ptr) {};
  
        Thread<RT> 
        operator() (typename boost::tuples::element<0,ArgList>::type arg1) {
            return mem_fun_wrapper<RT,C,ArgList> (mem_fun_ptr, c,
                                                  boost::tie(arg1)).fire_up ();
        };
      
      private:
        C         &c;
        MemFunPtr  mem_fun_ptr;
    };

(Note how the default value specification of the last template argument automatically redirects uses with three template parameters to the correct four-parameter specialization, even though the general template is never used.)

The constructor stores the two addresses. If one calls

    spawn(obj, &C::f) (42);
the next thing invoked is the operator() of this class. It takes the argument(s), creates a temporary with the two addresses and references to the arguments (that is what boost::tie does), and calls fire_up() on this temporary. fire_up() has all the information and does the work. Note that we do not pass references to the individual arguments, but bind them all together with boost::tie, so that we need not have different versions of the mem_fun_wrapper class for different numbers of arguments. (However, we do need a separate partial specialization of the mem_fun_encapsulator class for each number of function arguments.) The tie_args template is used to make a version of the ArgList type in which all element types are reference types; it is described below.
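The tie-then-copy step can be illustrated with std::tie as the analogue of boost::tie: the calling side packs references, and the new thread's side copies the referenced values by value (copy_to_new_stack is a made-up name for the example):

```cpp
#include <cassert>
#include <tuple>

// Construct a tuple of values from a tuple of references,
// i.e. copy the referenced arguments "onto the new stack":
template <typename... Args>
std::tuple<Args...> copy_to_new_stack (const std::tuple<Args&...> &refs) {
  return std::tuple<Args...> (refs);   // element-wise copy of the values
}
```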

The next question, of course, is what mem_fun_wrapper looks like. Let us first consider the base class that it shares with fun_wrapper, the wrapper class for non-member functions:

    template <typename RT, typename EntryPointClass>
    struct wrapper_base {
        Thread<RT> fire_up () {
          thread_descriptor
            = DescriptionPointer(new typename thread_description<RT>());
  
          boost::mutex::scoped_lock lock (mutex);        
          thread_descriptor->create (&EntryPointClass::entry_point,
                                        (void *)this);
          condition.wait (lock);
  
          return thread_descriptor;
        }
  
      protected:
        typedef boost::shared_ptr<thread_description<RT> >
        DescriptionPointer;
        
        DescriptionPointer thread_descriptor;
  
        mutable boost::mutex     mutex;    
        mutable boost::condition condition;
    };

fire_up() is the only real function; it creates a thread description object and calls its create() function with a pointer to the present object and the address of the thread starting point, EntryPointClass::entry_point, where EntryPointClass is the name of a class that implements this starting function and is passed as a template argument to wrapper_base. Before it starts the new thread, fire_up() acquires a mutex; afterwards it waits until a condition is signalled before it finishes by using the thread description object to generate a Thread<RT> object.

The magic happens in the derived class:

    template <typename RT, class C, typename ArgList>
    struct mem_fun_wrapper
       : public wrapper_base<RT, mem_fun_wrapper<RT,C,ArgList> > 
    {
        typedef typename mem_fun_ptr<RT,C,ArgList>::type MemFunPtr;      
        typedef typename tie_args<ArgList>::type ArgReferences;
        mem_fun_wrapper (MemFunPtr            mem_fun_ptr,
                         C                   &c,
                         const ArgReferences &args)
                        : c (c),
                          mem_fun_ptr (mem_fun_ptr),
                          args (args)  {};
      private:
        mem_fun_wrapper ();
        mem_fun_wrapper (const mem_fun_wrapper &);
        
        C            &c;
        MemFunPtr     mem_fun_ptr;
        ArgReferences args;
        
        static void * entry_point (void *arg)
          {
            const wrapper_base<RT> *w
              = reinterpret_cast<const wrapper_base<RT>*> (arg);
            const mem_fun_wrapper *wrapper
              = static_cast<const mem_fun_wrapper*> (w);
            MemFunPtr mem_fun_ptr = wrapper->mem_fun_ptr;
            C        &c           = wrapper->c;
            ArgList   args        = wrapper->args;
  
            boost::shared_ptr<thread_description<RT> >
              thread_descriptor  = wrapper->thread_descriptor;
            
            {
              boost::mutex::scoped_lock lock (wrapper->mutex);
              wrapper->condition.notify_one ();
            }
            
            call (mem_fun_ptr, c, args, thread_descriptor->ret_val);
            
            return 0;
          };
    };

Note in particular how this class passes itself as the second template parameter to the base class, enabling the latter to call the mem_fun_wrapper::entry_point function as the entry point of the new thread. When the fire_up() function in the base class is called, it creates a new thread that starts inside this function, and the argument given to it is the address of the wrapper_base object.

The first thing the entry_point function does is cast this address back to the real object's type (it knows the real type, since the address of this function has been handed down through the template machinery). It then copies the address of the object to work with and the address of the member function to be called from the stack of the old thread to the stack of the new one. It also copies the arguments, which so far have been held only as references, this time by value. Next, it gets the address of the thread description object, and with it the address of the return value (the shared_ptr also makes sure that the object lives long enough).

The part in braces signals the condition to the old thread, which is waiting in the fire_up() function: the arguments have been copied, and the old thread may go on, eventually also destroying objects that have been copied by value here. Finally, entry_point calls the requested function with the proper arguments through a generic interface (described in the section on tools) and sets the return value of the thread.
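The fire_up()/entry_point() handshake can be sketched with std::thread and std::condition_variable in place of the POSIX/Boost primitives; the creating thread blocks until the new thread has copied the argument, after which the creator's stack variable may safely disappear. (We wait on a predicate, which also guards against spurious wakeups; run_with_handshake is a made-up name for the example.)

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

int run_with_handshake (int arg) {
  std::mutex              mutex;
  std::condition_variable condition;
  bool args_copied = false;
  int  copied_arg  = 0;

  std::unique_lock<std::mutex> lock (mutex);   // taken before thread creation
  std::thread t ([&] {
    {
      std::lock_guard<std::mutex> lg (mutex);
      copied_arg  = arg;          // copy from the creator's stack
      args_copied = true;
      condition.notify_one ();    // let the creator continue
    }
    // ...the new thread would now call the user-supplied function...
  });
  condition.wait (lock, [&] { return args_copied; });
  // from here on, arg could go out of scope without harm
  t.join ();
  return copied_arg;
}
```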

5. Tool classes

In the implementation above, some tool classes have been used. These are briefly described here.

a) The return_value<T> class template

This class stores a value of type T if T is not a reference or void. It offers get() and set() functions that get and set the value. If T is a reference type, then set() is obviously not possible since references cannot be rebound after construction time. The class therefore stores a pointer, and set() sets the pointer to the object the reference references. get() then returns the reference again. If T is void, then the class is empty and there is only a get() function that returns void.

    template <typename RT> struct return_value 
    {
      private:
        RT value;
      public:
        RT get () const { return value; }
        void set (RT v) { value = v; }
    };

    template <typename RT> struct return_value<RT &> 
    {
      private:
        RT * value;
      public:
        RT & get () const { return *value; }
        void set (RT & v) { value = &v; }
    };

    template <> struct return_value<void> {
        static void get () {};
    };
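The three cases can be exercised as follows (the templates are restated compactly so the example is self-contained):

```cpp
#include <cassert>

// Compact restatement of the three return_value cases shown above:
template <typename RT> struct return_value {
    RT value;
    RT get () const  { return value; }
    void set (RT v)  { value = v; }
};

template <typename RT> struct return_value<RT &> {
    RT *value;
    RT & get () const { return *value; }
    void set (RT &v)  { value = &v; }
};

template <> struct return_value<void> {
    static void get () {}
};
```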

b) The call function templates

The call function templates take a function pointer, an argument list tuple, and the address of the return value object, and call the function with these arguments. Since we have to unpack the argument list, we have to dispatch to different functions, depending on the number of arguments, in the usual way:

    template <int> struct int2type {};

    template <typename RT, typename PFun, typename ArgList>
    static void call (PFun     fun_ptr,
                      ArgList &arg_list,
                      return_value<RT> &ret_val)
    {
      Caller<RT>::do_call (fun_ptr, arg_list, ret_val,
                           int2type<boost::tuples::length<ArgList>::value>());
    };

The Caller class has the following member functions:

    template <typename RT> struct Caller 
    {
        template <typename PFun, typename ArgList>
        static void do_call (PFun     fun_ptr,
                             ArgList &arg_list,
                             return_value<RT> &ret_val,
                             const int2type<1> &)
        {  ret_val.set ((*fun_ptr) (arg_list.template get<0>()));  };

        // likewise for int2type<0>, int2type<2>, ...
    };

There is a specialization Caller<void> that does not set a return value, and for each call and do_call function there is a second function for member function pointers that takes an object as additional argument.
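For comparison, today's C++ can do this dispatch generically: std::index_sequence unpacks a tuple of arguments in one go, so no int2type overload per arity is needed (this facility did not exist when the text was written; all names below are illustrative):

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Unpack the tuple elements into a function call:
template <typename RT, typename F, typename Tuple, std::size_t... I>
RT call_impl (F f, Tuple &args, std::index_sequence<I...>) {
  return f (std::get<I> (args)...);
}

template <typename RT, typename F, typename Tuple>
RT call (F f, Tuple &args) {
  return call_impl<RT> (f, args,
                        std::make_index_sequence<std::tuple_size<Tuple>::value> ());
}

int mul (int a, int b) { return a * b; }
```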

c) mem_fun_ptr

In order to form a pointer to member function for both cases of const and non-const member functions, we need a simple tool:

    template <typename RT, class C, typename ArgList,
              int length = boost::tuples::length<ArgList>::value>
    struct mem_fun_ptr_helper;

    template <typename RT, class C, typename ArgList>
    struct mem_fun_ptr_helper<RT, C, ArgList, 1>
    {
        typedef RT (C::*type) (typename boost::tuples::element<0,ArgList>::type);
    };

    template <typename RT, class C, typename ArgList>
    struct mem_fun_ptr_helper<RT, const C, ArgList, 1>
    {
        typedef RT (C::*type) (typename boost::tuples::element<0,ArgList>::type) const;
    };

    template <typename RT, class C, typename ArgList>
    struct mem_fun_ptr
    {
        typedef typename mem_fun_ptr_helper<RT,C,ArgList>::type type;
    };

Note that if the second template argument is a const C, then we mark the member function const. The two templates for mem_fun_ptr_helper have to be repeated for every number of arguments that we have in mind. Note also that the specification of the default argument in the declaration of the general template of mem_fun_ptr_helper saves us from recomputing it in mem_fun_ptr.
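The const trick can be exercised in isolation for a single argument: a const class type as the second template argument selects the const-qualified pointer type (the class X is made up for the example):

```cpp
#include <cassert>

template <typename RT, class C, typename Arg>
struct mem_fun_ptr {
    typedef RT (C::*type) (Arg);
};

// A const class type selects the const member function pointer:
template <typename RT, class C, typename Arg>
struct mem_fun_ptr<RT, const C, Arg> {
    typedef RT (C::*type) (Arg) const;
};

struct X {
    int twice  (int i)       { return 2 * i; }
    int thrice (int i) const { return 3 * i; }
};
```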

d) add_reference for tuples

The following classes add references to the elements of a tuple, thus providing the type equivalent of the return value of the boost::tie functions. There are probably ways inside Boost's tuple library to do this, but I could not locate them.

    template <int N, typename Tuple>
    struct add_reference_to_Nth
    {
        typedef typename boost::tuples::element<N,Tuple>::type ArgType;
        typedef typename boost::add_reference<ArgType>::type type;
    };

    template <typename Tuple, int = boost::tuples::length<Tuple>::value>
    struct tie_args_helper;

    template <typename Tuple>
    struct tie_args_helper<Tuple,1>
    {
        typedef 
        boost::tuple<typename add_reference_to_Nth<0,Tuple>::type>
        type;
    };

    template <typename Tuple>
    struct tie_args 
    {
        typedef typename tie_args_helper<Tuple>::type type;
    };

The tie_args_helper class is repeated for every number of elements we want to use.

6. Open Problems

a) A variable lifetime problem

The only unsolved semantic problem I am aware of at present is the following: if we have a function

    void f(const int &i);
then this function can be called as
    f(1);
i.e. the compiler creates a temporary and passes its address to f(). When invoking f() on a new thread, however, as in
    spawn (f)(1);
then it is only guaranteed that the call to spawn() does not return before the new thread has started and has copied the arguments to f(). However, what is copied is only the reference to the temporary, not its value. f() will thus likely observe corrupted values for its argument. On the other hand, always copying the value is not an option either, of course. Since, to the author's best knowledge, the language does not provide means to avoid taking the address of a temporary, there is presently no way to avoid this problem. Suggestions for healing it are very welcome.

b) Forwarding of operator()

Above, we have not defined an overload of spawn for functor-like objects, even though that would be desirable. One way to do so would be

    template <typename C>
    mem_fun_encapsulator<void,C,boost::tuple<> >
    spawn (C &c) {
      return spawn (c, &C::operator());
    }
This only works if operator() satisfies the signature
    struct C {    void operator() ();  };

We could add another overload for the case of operator() being const. However, what one would really like is an overload for more general signatures. Unfortunately, this requires that we can infer the number and types of the arguments and the return type of operator() at the time we declare the return type of the above overload of spawn(). I have not found a way to infer this information just by using the template parameter C; it simply does not seem possible. What would work, if compilers supported it, is a kind of typeof operator:

    template <typename C>
    typeof(spawn(c,&C::operator()))          // **
    spawn (C &c) {
      return spawn (c, &C::operator());
    }

When seeing this declaration, the compiler would automatically check which version of the overloaded spawn() function the expression would call, and take the corresponding return type. gcc does support the typeof keyword, but even current CVS snapshots generate an internal compiler error on this construct.
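For reference, later C++ standards gained exactly this facility under the name decltype. The following sketch shows the forwarding overload the text asks for; to stay self-contained it calls the function directly instead of starting a thread, and task, spawn, and Doubler are made-up names:

```cpp
#include <cassert>

// A stand-in for Thread<RT>:
template <typename RT> struct task { RT result; };

// A spawn() that calls directly instead of threading:
template <typename RT, typename C, typename Arg>
task<RT> spawn (C &c, RT (C::*fun_ptr) (Arg), Arg arg) {
  return task<RT> { (c.*fun_ptr) (arg) };
}

// The wished-for overload: decltype plays the role of typeof.
template <typename C, typename Arg>
auto spawn (C &c, Arg arg) -> decltype (spawn (c, &C::operator(), arg)) {
  return spawn (c, &C::operator(), arg);
}

struct Doubler {
    int operator() (int i) { return 2 * i; }
};
```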

c) Using a memory based scheme rather than condition variables

The scheme of using mutexes and condition variables to synchronise the calling and the called thread seems expensive. A simpler approach would be to replace it by having the creating thread generate an object on the heap that holds copies of the arguments (instead of references, as presently), spawn the new thread, and just go on without any synchronisation.

The called thread would then not have to copy the arguments onto its local stack and signal to the calling thread; it would only have to delete the memory after the call to the user-supplied function returns. Apart from replacing ArgReferences by ArgList in some places, the scheme would basically just replace *_encapsulator::operator(), fire_up, and entry_point:

      thread<RT>
      operator() (typename boost::tuples::element<0,ArgList>::type arg1) {
        return (new mem_fun_wrapper<RT,C,ArgList> (mem_fun_ptr, c,
                                                   boost::tie(arg1)))->fire_up ();
      };

      thread<RT> fire_up () {
        thread_descriptor
          = DescriptionPointer(new typename detail::thread_description<RT>());

        thread_descriptor->create (entry_point, (void *)this);
        // no synchronisation here
        return thread_descriptor;
      }

      static void * entry_point (void *arg) {
        wrapper_base<RT> *w       = reinterpret_cast<wrapper_base<RT>*> (arg);
        fun_wrapper      *wrapper = static_cast<fun_wrapper*> (w);
        // no copying here; no synchronisation necessary
        detail::call (wrapper->fun_ptr, wrapper->args,
                      wrapper->thread_descriptor->ret_val);
        // delete memory
        delete wrapper;
        return 0;
      }

The perceived simplicity of doing without mutexes and condition variables might be deceptive, however, since memory allocation and deallocation require locking and unlocking mutexes as well, and are generally not cheap operations.

However, the main problem is that I get spurious segmentation faults with this on my Linux box. These always happen inside the memory allocation and deallocation functions in the C++ and C language support libraries. I believe that these are not bugs in the application, but in the language runtime. However, my motivation to debug multithreading problems in the libc is very limited; for reference, valgrind 1.94 does not show accesses to uninitialized or already freed memory portions, even for runs that eventually crash later on.

7. Alternative Suggestions

Here are some additional suggestions for discussion:

a) Conversions between return values

If f() is a function returning an integer, then the following is legal:

    double d = f(arg1, arg2);
The question, then, would be: do we want to allow conversions between Thread<double> and Thread<int> objects? And do we want to allow a conversion from Thread<T> to Thread<void> (i.e.: casting away the return value)?

Since one can still assign the return value of the thread to a double,

    double d = thread.return_value();
the only real merit in allowing conversions is in putting threads with different return value types into a ThreadGroup:
    double f1 ();
    int    f2 ();
 
    ThreadGroup<double> tg;
    tg += spawn(f1)();
    tg += spawn(f2)();    // convert Thread<int> to Thread<double>
    tg.join_all ();

Being able to do this is probably only syntactic sugar, except for the case where we are not interested in the return values of all threads, i.e. the conversion Thread<T> -> Thread<void> seems like the only one that is really worth it.

I have made some initial experiments with implementing general conversions. The main problem is that we need to allow conversion chains:

    thread<double> t1 = spawn (f)(arg1, arg2);
    thread<int>    t2 = t1;
    thread<double> t3 = t2;

If f() returns 1.5, then t3.return_value() needs to return 1.0. I believe that such conversions could be implemented, by adding the types in the chain into a boost::tuple of growing length, and writing a function that converts a value of the first type of this tuple to the second, to the third, ..., to the last type in the tuple. However, a plethora of internal compiler errors has scared me off doing more experiments in this direction.
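The value chain itself is plain C++ arithmetic; spelled out without any thread machinery (convert_chain is a made-up name), this shows why t3 would have to yield 1.0:

```cpp
#include <cassert>

// Each Thread<T> in the chain would apply one conversion to the
// stored return value:
double convert_chain (double initial) {
  double t1 = initial;                  // thread<double> t1 = spawn (f)(...)
  int    t2 = static_cast<int> (t1);    // thread<int>    t2 = t1
  double t3 = t2;                       // thread<double> t3 = t2
  return t3;
}
```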

b) Conversions between class types I

When you have a class hierarchy like

    struct B { void f(); };
    struct D : public B {};
then calling
    spawn (D(), &B::f);
fails with gcc (but succeeds with Intel's icc). Presumably, gcc is right: template arguments must match exactly, and D() is of type D, while &B::f leads to a class type of B. There is no spawn function template that this call can match without a derived-to-base conversion. We could now change the template
    template <typename RT, typename C, typename Arg1>
    mem_fun_encapsulator<RT,C,boost::tuple<Arg1> >
    spawn (C &c, RT (C::*fun_ptr)(Arg1)) {
      return mem_fun_encapsulator<RT, C, boost::tuple<Arg1> > (c,fun_ptr);
    }
into
    template <typename RT, typename A, typename C, typename Arg1>
    mem_fun_encapsulator<RT,C,boost::tuple<Arg1> >
    spawn (A &a, RT (C::*fun_ptr)(Arg1)) {
      return mem_fun_encapsulator<RT, C, boost::tuple<Arg1> > (a,fun_ptr);
    }
i.e. introduce another template parameter A for the type of the object. Since the argument types of the constructor of the mem_fun_encapsulator object are known, the compiler would perform a derived-to-base conversion on the object a if necessary. I don't know whether this is desirable, in particular since other conversions that one would not want could also happen here (in the extreme case generating a temporary).
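The effect of the extra template parameter can be seen in this sketch, which calls the member function directly instead of threading (spawn_and_call and the classes B and D are made up for the example; note that binding A& requires an lvalue, so we use a named object rather than the temporary D() from the text):

```cpp
#include <cassert>

struct B { int f (int i) { return i + 1; } };
struct D : public B {};

// With a separate parameter A for the object's type, the derived-to-base
// conversion happens where A is bound to C:
template <typename RT, typename A, typename C, typename Arg>
RT spawn_and_call (A &a, RT (C::*fun_ptr) (Arg), Arg arg) {
  C &c = a;                     // derived-to-base conversion happens here
  return (c.*fun_ptr) (arg);
}
```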

c) Conversions between class types II

When one writes

    spawn (this, &X::f)
one gets an error saying that "'this' is not convertible to type X&"; one has to write *this instead. It would be simple to have another set of overloads of spawn() that accept a pointer instead of a reference and simply forward to the existing functions. This is probably just for the lazy, but it is a common case.

d) Catching exceptions

When a function on a new thread throws an exception, it only propagates up to one of the two entry_point() functions, then vanishes into the run-time system and kills the program. Ideally, we would have a way to pass it over to the main thread. This, however, would need some support from the language. Basically, we would need two operations:

  • clone an exception without knowing its type; we could then in the entry_point function catch it and stack it somewhere, just like we do for the return value
  • back on the main thread, the Thread::join() function must raise this stored exception if there was one, again without knowing its type.
Given how exceptions are usually implemented, the machinery for these operations is probably present, but it is not exported to the user through the run-time environment. Thus, an implementation of these ideas will have to wait for changes in the language specification.


Wolfgang Bangerth, 2003