Comments on comments to Herb Sutter's updated GotW #6b solution (part 2)

Previously^[1], while reading Herb Sutter’s updated Guru of the Week 6b^[2] article, I wondered how one might – in C++11 – enforce a concurrent usage pattern in which an object can be modified after creation only by the creating thread until all modifications are done, at which point the object becomes immutable and concurrently accessible. Concurrent access before an object becomes immutable is considered an error, as are attempts to modify an object that is immutable.

Part 1 ended with a scheme in which erroneous updates in the mutable phase are detected by operations verifying that they are called in the context of the creator thread, by comparing the creator thread id – an instance member – with the calling thread’s id. I speculated that entering the immutable state could be indicated by setting the thread id member to the ‘no executing thread’ value of std::thread::id, after which all non-mutating operations (spelt const in C++) may be called concurrently by any thread, and all calls to mutating operations would fail.

Before continuing I shall mention that I realise there are obvious, simpler, ways to arrange code to support this type of usage. That is not the point; the point is to see if such a usage pattern can be implemented in such a way as to report misuse and be convenient and efficient to boot while taking some of the new C++11 features out for a spin and seeing where we end up!

Let’s continue by taking a detailed look at changing objects from mutable to immutable. This transition has two consequences:

The object cannot be modified at all other than to be destroyed. This could be termed freezing the object or similar.
All threads may access the state of the object, thus all modifications need to be made visible to any reader threads. We might say this is publishing the object.

On some hardware platforms we may be able to reliably achieve the first effect in the manner previously described – that is by just setting the updating thread id member to ‘no executing thread’, but this ought to be an atomic update. C++11’s std::atomic type template does not support arbitrary class types, only trivially copyable ones; as std::thread::id is required to be trivially copyable, update_id could be declared as a std::atomic<std::thread::id> and an object frozen something like so:

    void the_type::freeze()
    {
      validate_call_context();
      update_id.store(std::thread::id{},std::memory_order_relaxed);
    }

The reasoning is thus: the only thread that can access the object initially is the creator thread – all other threads will fail call context validation. After calling freeze even the creator thread would fail call context validation. Other threads will either see the original creator thread’s id during call context validation or the updated ‘no executing thread’ value, neither of which will allow them to update the object.

Unfortunately no thread can even get read access to a frozen object as they will also in general be call context validated (the exception by the way would be for data that is initialised in the object’s constructor and never modified thereafter). But after freezing non-mutating – or const – operations should be allowed. Providing an overloaded const qualified implementation of validate_call_context that allows access if the update_id is ‘no executing thread’ would achieve this.

This scheme will not fully publish an object’s state to other threads. To do this there needs to be inter-thread memory access synchronisation. Specifically, the creator-thread has to release all memory writes it has made and all other threads will have to ensure they acquire these released writes before reading their values. Of course, each reading thread should do this as efficiently as possible – preferably only once.

Because in this case we have sets of pairwise synchronisation requirements between the creator thread and each reader thread, acquire-release ordering can be used. The obvious choice here would be to use a std::atomic<bool> flag that the creator thread store-releases to in the publishing operation and each reader thread load-acquires from:

    void the_type::publish()
    {
      validate_call_context();
      published.store(true, std::memory_order_release);
    }

Here the published instance member is of type std::atomic<bool>, initialised to false. This allows the two validate_call_context overloads to simply check published to see if the object has been published:

    void the_type::validate_call_context()
    {
      if ( published.load(std::memory_order_acquire)
        || std::this_thread::get_id()!=update_id
         )
        {
          throw std::runtime_error
                { "Illegal usage : Concurrent access or attempt at "
                  "mutating operation on a published immutable "
                  "object."
                };
        }
    }

    void the_type::validate_call_context() const
    {
      if (!published.load(std::memory_order_acquire)
        && std::this_thread::get_id()!=update_id
         )
        {
          throw std::runtime_error
                { "Illegal usage : Concurrent access to "
                  "an unpublished object."
                };
        }
    }

Note the difference in the checks for the const and non-const overloads.
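To see the two overloads working together, here is a minimal self-contained sketch. The counter class, its members and the demo_publish function are hypothetical names invented for illustration; only the shape of publish and of the two validate_call_context overloads is taken from the fragments above.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical client type, for illustration only.
class counter
{
public:
  void increment()               // mutating: creator thread only, before publish
  {
    validate_call_context();     // resolves to the non-const overload
    ++value_;
  }

  int value() const              // non-mutating: creator before publish, anyone after
  {
    validate_call_context();     // resolves to the const overload
    return value_;
  }

  void publish()
  {
    validate_call_context();
    published_.store(true, std::memory_order_release);
  }

private:
  void validate_call_context()
  {
    if (published_.load(std::memory_order_acquire)
        || std::this_thread::get_id() != update_id_)
      throw std::runtime_error{"mutating call on a published object or from the wrong thread"};
  }

  void validate_call_context() const
  {
    if (!published_.load(std::memory_order_acquire)
        && std::this_thread::get_id() != update_id_)
      throw std::runtime_error{"concurrent access to an unpublished object"};
  }

  const std::thread::id update_id_{std::this_thread::get_id()};
  std::atomic<bool>     published_{false};
  int                   value_{0};
};

// Returns true if every misuse was rejected and every legal use succeeded.
bool demo_publish()
{
  counter c;
  c.increment();                              // fine: creator thread, not yet published

  bool early_read_rejected = false;
  std::thread early{[&] {
    try { (void)c.value(); }                  // another thread, before publish
    catch (const std::runtime_error&) { early_read_rejected = true; }
  }};
  early.join();

  c.publish();

  int seen = -1;
  std::thread reader{[&] { seen = c.value(); }};   // another thread, after publish
  reader.join();
  assert(seen == 1);

  try { c.increment(); }                      // mutation after publish must fail
  catch (const std::runtime_error&) { return early_read_rejected; }
  return false;
}
```

Calling demo_publish() from main should return true: the early read and the late increment both throw, while the read after publish sees the value written by the creator thread.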

Placing all the required scaffolding into a single class would allow ‘client’ classes to provide the required support more conveniently. As it stands only publish needs to be accessed outside the type’s implementation so instances of such a support class could be included by composition as an instance member, with publish implemented as a forwarding operation to this member. Another possibility would be as a mix-in base class included by (private) inheritance.
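The composition option might be sketched as follows. The names publish_guard, settings and demo_guard are assumptions made up for this example, not anything from GotW #6b. Because the guard is a member, it is const inside the client's const operations and non-const elsewhere, so the const / non-const overload trick carries over unchanged.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical support class holding all the scaffolding.
class publish_guard
{
public:
  void publish()
  {
    validate_call_context();
    published_.store(true, std::memory_order_release);
  }

  void validate_call_context()           // for mutating operations
  {
    if (published_.load(std::memory_order_acquire)
        || std::this_thread::get_id() != update_id_)
      throw std::runtime_error{"mutating call on a published object or from the wrong thread"};
  }

  void validate_call_context() const     // for non-mutating operations
  {
    if (!published_.load(std::memory_order_acquire)
        && std::this_thread::get_id() != update_id_)
      throw std::runtime_error{"concurrent access to an unpublished object"};
  }

private:
  const std::thread::id update_id_{std::this_thread::get_id()};
  std::atomic<bool>     published_{false};
};

// Hypothetical client class using the guard by composition.
class settings
{
public:
  void set_level(int level) { guard_.validate_call_context(); level_ = level; }
  int  level() const        { guard_.validate_call_context(); return level_; }
  void publish()            { guard_.publish(); }   // simple forwarding operation

private:
  publish_guard guard_;
  int           level_{0};
};

// Returns true if a mutation after publish is rejected.
bool demo_guard()
{
  settings s;
  s.set_level(3);
  s.publish();
  assert(s.level() == 3);                // reads are still allowed after publishing
  try { s.set_level(4); }
  catch (const std::runtime_error&) { return true; }
  return false;
}
```

The mix-in alternative would look much the same, with settings deriving privately from publish_guard and exposing publish through a using-declaration.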

There are still questions to resolve. Most pressing is how the reader threads know when they can access an object. With the scheme as discussed so far each reader thread would have to try a non-mutating operation repeatedly until it did not throw an exception – which underlines that it would definitely be a Good Idea™ to define a specific exception type in any real implementation.
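As a rough sketch of what that polling would look like with a dedicated exception type: the names usage_error, unpublished_access, message, read_when_published and demo_polling below are all hypothetical, and the busy retry loop is shown only to illustrate the problem, not as a recommendation.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical exception types so callers can tell misuse apart from other errors.
class usage_error : public std::logic_error
{
public:
  explicit usage_error(const char* what) : std::logic_error(what) {}
};

class unpublished_access : public usage_error
{
public:
  unpublished_access() : usage_error("concurrent access to an unpublished object") {}
};

// Hypothetical client type.
class message
{
public:
  void set_text(std::string text) { validate_call_context(); text_ = std::move(text); }
  std::string text() const        { validate_call_context(); return text_; }
  void publish()
  {
    validate_call_context();
    published_.store(true, std::memory_order_release);
  }

private:
  void validate_call_context()
  {
    if (published_.load(std::memory_order_acquire)
        || std::this_thread::get_id() != update_id_)
      throw usage_error("mutating call on a published object or from the wrong thread");
  }

  void validate_call_context() const
  {
    if (!published_.load(std::memory_order_acquire)
        && std::this_thread::get_id() != update_id_)
      throw unpublished_access{};
  }

  const std::thread::id update_id_{std::this_thread::get_id()};
  std::atomic<bool>     published_{false};
  std::string           text_;
};

// Reader side: retry the non-mutating call until it stops throwing.
std::string read_when_published(const message& m)
{
  for (;;)
  {
    try { return m.text(); }
    catch (const unpublished_access&) { std::this_thread::yield(); }
  }
}

std::string demo_polling()
{
  message m;
  std::string seen;
  std::thread reader{[&] { seen = read_when_published(m); }};
  m.set_text("ready");                   // creator thread is still free to modify
  m.publish();                           // reader's next attempt will succeed
  reader.join();
  assert(!seen.empty());
  return seen;
}
```

Catching only unpublished_access means any other failure still propagates out of the retry loop, which is the main benefit of a specific type over std::runtime_error here.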

Next, the scheme is intrusive – each operation has to remember to do something to validate the call context such as calling validate_call_context. Not only that but each operation has to atomically fetch data with potential memory synchronisation overheads on each call.

In theory at least the memory synchronisation overheads could be reduced for those processors where such overheads are high – namely those with a weakly ordered memory model – by using std::memory_order_consume, which is intended to rely on data dependency ordering, in place of std::memory_order_acquire^[3]. As in this case all updates occur in the context of a single object, they would be dependent on the object’s this pointer. Thus we can replace the published flag with a std::atomic<T*>, where T is the type of our object, initialised to nullptr and store-released to a value of the object’s this pointer in publish. The use of std::memory_order_acquire would be replaced by std::memory_order_consume in both validate_call_context overloads. Additionally, all references to the object would also have to be initially loaded via a call to published.load(std::memory_order_consume) – indicating that some refactoring of the code might be in order.

Note that I said ‘in theory’ in the preceding paragraph. This is because the current specification of C++11 makes it difficult for compiler writers to create an efficient, data-dependency-ordered implementation of std::memory_order_consume for weakly ordered CPUs, and it appears that all implementations to date take the lazy option of implementing std::memory_order_consume as std::memory_order_acquire^[3]^[4].
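With that caveat in mind, the pointer-based variant might look something like the sketch below. The config class, readable_self and demo_consume are hypothetical names, and routing every read through the pointer returned by the consume load is just one possible way to do the refactoring mentioned above, so treat it as an assumption rather than a worked-out design.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical client type using a published pointer instead of a bool flag.
class config
{
public:
  void set_limit(int limit) { validate_call_context(); limit_ = limit; }

  int limit() const
  {
    // Read through the pointer obtained from the consume load so that the
    // read of limit_ is data-dependent on that load.
    return readable_self()->limit_;
  }

  void publish()
  {
    validate_call_context();
    published_.store(this, std::memory_order_release);
  }

private:
  void validate_call_context()           // mutating operations
  {
    if (published_.load(std::memory_order_consume) != nullptr
        || std::this_thread::get_id() != update_id_)
      throw std::runtime_error{"mutating call on a published object or from the wrong thread"};
  }

  // Non-mutating operations: returns the pointer to read through, or throws.
  const config* readable_self() const
  {
    const config* p = published_.load(std::memory_order_consume);
    if (p != nullptr)
      return p;                          // published: p carries the dependency
    if (std::this_thread::get_id() != update_id_)
      throw std::runtime_error{"concurrent access to an unpublished object"};
    return this;                         // creator thread, before publication
  }

  const std::thread::id      update_id_{std::this_thread::get_id()};
  std::atomic<const config*> published_{nullptr};
  int                        limit_{0};
};

int demo_consume()
{
  config c;
  c.set_limit(42);
  c.publish();

  bool threw = false;
  try { c.set_limit(7); }                // mutation after publish is rejected
  catch (const std::runtime_error&) { threw = true; }
  assert(threw);

  int seen = 0;
  std::thread reader{[&] { seen = c.limit(); }};
  reader.join();
  return seen;
}
```

On implementations that treat consume as acquire this behaves exactly like the std::atomic<bool> version; any saving would only appear where the data-dependency ordering is actually exploited.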

The final problem that springs to mind is the question of knowing when it is safe to delete an object.

Then there is the question of what effect relaxing some of the constraints would have: allowing the transfer of update-status to another thread as mentioned towards the end of part 1 for example. So it seems the scheme is workable but there is definitely much room for improvement. I feel I may have to write up some further instalments at some point…


References