Thursday, July 10, 2014

C++ design brilliance: exception-safe code without try/catch

This is kind of a little thing to appreciate about the design of C++. If you're already familiar with basic C++ exception-safety stuff, you can skip my spiel and go to the last paragraph.

Problem Statement

Have you ever seen code that looks like this? Perhaps in a school assignment, on an online forum, or in a production code base?
// global mutex variable
std::mutex mut;

void foo()
{
    mut.lock();

    bar();

    mut.unlock();
}
At first glance this code may seem innocent, but it is actually a time bomb waiting to go off. Consider the case where bar() is defined as follows:
void bar()
{
    throw std::logic_error("ya blew it!");
}
In this case, the exception will be thrown out of bar(), which will end execution of foo() early, which will incorrectly leave the mutex locked upon leaving foo().

Next question: Does the following code handle this error correctly?
try
{
    foo();
}
catch (const std::exception& e)
{
    std::cout << "Caught exception: " << e.what() << std::endl;
}
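Unfortunately, the answer is no. Even if you catch the exception, it's too late: The mutex is still locked, and the next call to foo() from the same thread will try to re-lock mut. Strictly speaking, locking a std::mutex that the calling thread already owns is undefined behaviour; in practice, the program usually hangs forever.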

We could put a try/catch block inside foo(), but this is a bad idea for two reasons:
  1. It's ugly to have to indent a whole function every time you want to lock a mutex.
  2. Companies often have an internal C++ police that disallows the use of exceptions. Any code containing try/catch will not pass code review.

The Solution
A nice little fact about C++ helps us here: Destructors of local objects are called when you return from a function normally, and also when a function is ended by an exception. Thus, we can define the following helper class:
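class lock_guard
{
    std::mutex& m_;
public:
    explicit lock_guard(std::mutex& m) : m_(m)
    {
        m_.lock();
    }
    ~lock_guard()
    {
        m_.unlock();
    }

    // Copying a guard would unlock the mutex twice, so forbid it.
    lock_guard(const lock_guard&) = delete;
    lock_guard& operator=(const lock_guard&) = delete;
};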
Using this helper class (which is actually provided by the standard library, see std::lock_guard), we can guarantee that our mutex will be properly unlocked no matter how the function execution ends. (This is similar to the finally block in languages like Java.) This feature allows us to write the correct version of foo():
void foo()
{
    lock_guard lock(mut);

    bar();
}
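The same guarantee can be observed with std::lock_guard from the standard library. The sketch below is only an illustration: counting_lock is a hypothetical stand-in for a mutex (it just counts calls, so no threads are needed), and the try/catch is there purely to observe the result, not as part of the technique.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a mutex: it only counts calls, so the effect
// of the guard can be observed without involving threads.
struct counting_lock
{
    int locks = 0;
    int unlocks = 0;
    void lock()   { ++locks; }
    void unlock() { ++unlocks; }
};

void bar()
{
    throw std::logic_error("ya blew it!");
}

// Same shape as the corrected foo(), but using std::lock_guard, which
// accepts any type that provides lock() and unlock().
void foo(counting_lock& m)
{
    std::lock_guard<counting_lock> lock(m);

    bar();
}

void check_unlocked_after_throw()
{
    counting_lock m;
    try
    {
        foo(m);
    }
    catch (const std::logic_error&)
    {
        // Only here so that we can inspect the counters afterwards.
    }
    assert(m.locks == 1);
    assert(m.unlocks == 1); // the guard's destructor ran during unwinding
}
```

Calling check_unlocked_after_throw() passes silently, because every lock() was matched by an unlock() even though bar() threw.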
Now, whether bar() throws or doesn't, our mutex is always unlocked. Great! So, where's the subtle brilliance in this?

The Brilliance
The brilliance in this is that we managed to make code exception-safe without using any try/catch statements. This shows that it's possible to write exception-safe code that still gets through your company's exception-hating C++ police. Once your company finally agrees to allow use of exceptions, you won't need to refactor any code because it will already be exception-safe!

My question to readers: Is this exception-safety mechanism general enough to handle any case that would ordinarily require try/catch? How far can we go without offending the company's C++ police?

Friday, December 13, 2013

Using a Lippincott Function for Centralized Exception Handling

The lippincott function is a way to wrap the handling of many different exceptions into a single reusable function.

Consider this use case:
  • A C++ library (say, libfoo) has many distinct exception types.
  • You want to wrap the C++ library with a C API.
  • You want error codes instead of exceptions for the C API.

A simple approach is to write every C API function as follows:
typedef enum foo_Result {
    FOO_OK,
    FOO_ERROR1,
    FOO_ERROR2,
    FOO_UNKNOWN
} foo_Result;

foo_Result foo_dothing()
{
    try
    {
        // Can throw MyException1 or MyException2
        foo::DoThing();
    }
    catch (const MyException1&)
    {
        return FOO_ERROR1;
    }
    catch (const MyException2&)
    {
        return FOO_ERROR2;
    }
    catch (...)
    {
        return FOO_UNKNOWN;
    }

    return FOO_OK;
}
There are some maintenance problems with the above approach:
  • DoThing() might later throw more exception types.
  • You must repeat the exception handling code for every API function.

Jon Kalb suggests the following refactoring method, which he named after Lisa Lippincott, who taught him the technique.
foo_Result lippincott()
{
    try
    {
        throw;
    }
    catch (const MyException1&)
    {
        return FOO_ERROR1;
    }
    catch (const MyException2&)
    {
        return FOO_ERROR2;
    }
    catch (...)
    {
        return FOO_UNKNOWN;
    }
}

foo_Result foo_dothing()
{
    try
    {
        foo::DoThing();
        return FOO_OK;
    }
    catch (...)
    {
        return lippincott();
    }
}
"throw;", when inside a catch block, simply rethrows the currently caught exception. In this case, the "throw;" is not directly placed in a catch block within the lippincott() function, but it is transitively (safely) called from within the catch block of foo_dothing().

There are some important preconditions to calling lippincott():
  • You cannot call the lippincott function from outside a catch block.
    • "throw;" with no exception currently being handled calls std::terminate().
  • lippincott() must be noexcept. No exceptions should escape it.
    • The exception will leak out of the C API otherwise.

If we want extra safety, we can implement safeguards for the aforementioned preconditions:

To safely handle a violation of the first precondition, we can check that std::current_exception() is not null before executing the "throw;"

To prevent an exception from being thrown out of lippincott(), we can wrap the whole body of the function in a try/catch.

Here is the "extra safe"/paranoid version of lippincott():
foo_Result lippincott()
try
{
    try
    {
        if (std::exception_ptr eptr = std::current_exception())
        {
            std::rethrow_exception(eptr);
        }
        else
        {
            return FOO_UNKNOWN;
        }
    }
    catch (const MyException1&)
    {
        return FOO_ERROR1;
    }
    catch (const MyException2&)
    {
        return FOO_ERROR2;
    }
    catch (...)
    {
        return FOO_UNKNOWN;
    }
}
catch (...)
{
    return FOO_UNKNOWN;
}
The C API can now be written entirely in the style of foo_dothing(), which will centralize the error code conversion through the lippincott function.
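If repeating the try/catch in every API function still feels like too much, one possible refinement is to move it into a small helper template. This is only a sketch: guarded() and the three foo_* functions are hypothetical names, and the exception types and enum are minimal stand-ins so that the example compiles on its own.

```cpp
#include <cassert>

// Stand-ins for the library's exception types.
struct MyException1 { };
struct MyException2 { };

typedef enum foo_Result {
    FOO_OK,
    FOO_ERROR1,
    FOO_ERROR2,
    FOO_UNKNOWN
} foo_Result;

// Must only be called while an exception is being handled.
foo_Result lippincott() noexcept
{
    try
    {
        throw;
    }
    catch (const MyException1&)
    {
        return FOO_ERROR1;
    }
    catch (const MyException2&)
    {
        return FOO_ERROR2;
    }
    catch (...)
    {
        return FOO_UNKNOWN;
    }
}

// Hypothetical helper: runs body() and funnels any exception through
// lippincott(), so each API function shrinks to a single statement.
template <typename F>
foo_Result guarded(F&& body) noexcept
{
    try
    {
        body();
        return FOO_OK;
    }
    catch (...)
    {
        return lippincott();
    }
}

// Hypothetical C API functions written in this style.
extern "C" foo_Result foo_succeeds()      { return guarded([] { }); }
extern "C" foo_Result foo_fails_first()   { return guarded([] { throw MyException1(); }); }
extern "C" foo_Result foo_fails_second()  { return guarded([] { throw MyException2(); }); }
extern "C" foo_Result foo_fails_unknown() { return guarded([] { throw 42; }); }

void check_guarded()
{
    assert(foo_succeeds() == FOO_OK);
    assert(foo_fails_first() == FOO_ERROR1);
    assert(foo_fails_second() == FOO_ERROR2);
    assert(foo_fails_unknown() == FOO_UNKNOWN);
}
```

The catch block now lives in exactly one place, so the precondition that lippincott() is only called while an exception is being handled is satisfied by construction.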

Another interesting idea is to use the lippincott function to convert exceptions into string representations for debugging. This area is especially in need of the extra try/catch surrounding the function, since allocating dynamic strings can fail.
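A minimal sketch of that idea follows. The function name describe_current_exception() and the message formats are made up for illustration. The outer function-try-block is the "extra try/catch": if building a message throws (for example std::bad_alloc), we fall back to an empty string, which in practice does not allocate.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: describes the exception currently being handled.
std::string describe_current_exception() noexcept
try
{
    try
    {
        if (std::exception_ptr eptr = std::current_exception())
        {
            std::rethrow_exception(eptr);
        }
        return "no active exception";
    }
    catch (const std::exception& e)
    {
        // This concatenation may allocate, and therefore may throw.
        return std::string("std::exception: ") + e.what();
    }
    catch (...)
    {
        return "unknown exception";
    }
}
catch (...)
{
    // Building the message itself failed: return an empty string.
    return std::string();
}

void check_describe()
{
    std::string text;

    try { throw std::runtime_error("boom"); }
    catch (...) { text = describe_current_exception(); }
    assert(text == "std::exception: boom");

    try { throw 42; }
    catch (...) { text = describe_current_exception(); }
    assert(text == "unknown exception");

    // Outside of any catch block there is nothing to describe.
    assert(describe_current_exception() == "no active exception");
}
```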

Full working example: http://ideone.com/m2ZfHN

The term "lippincott function" was popularized by Jon Kalb in his exception-safe coding talks.
You can find his explanation here: youtube link

Wednesday, December 4, 2013

Correcting the transitivity of const in C++

C++ allows you to make an instance of a class const. This normally prevents making changes to any member variables and also prevents the user from calling any member functions which are not also marked const.

There is, however, a quirk which you must be aware of. Consider the following code:
struct A {
    int* x;
    A(): x{new int} {}
   ~A() { delete x; }
};

int main()
{
    const A a;
    *a.x = 3;
}
This code compiles and runs with no problems, even though it might appear to be writing over a read-only piece of data.
The type of a.x reveals the reason why this is allowed:
int * const
This denotes a "Constant pointer to int", which is different from a "Pointer to a constant int" or a "Constant pointer to a constant int".
In this case of "Constant pointer to int", overwriting the pointer is not allowed, but overwriting the int it points to is possible.
This may be desired behaviour in some cases, but in others it is not.
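The three flavours can be put side by side in a small sketch. The variable names are only illustrative, and A is reduced to a plain aggregate here so the example stays short.

```cpp
#include <cassert>
#include <type_traits>

struct A
{
    int* x;
};

void check_shallow_const()
{
    int value = 0;

    int* const       const_ptr_to_int       = &value; // pointer fixed, int writable
    const int*       ptr_to_const_int       = &value; // pointer reseatable, int read-only
    const int* const const_ptr_to_const_int = &value; // neither can change

    *const_ptr_to_int = 1;           // OK: the int itself is not const
    // const_ptr_to_int = nullptr;   // error: the pointer is const
    // *ptr_to_const_int = 2;        // error: the int is read-only through this pointer
    assert(*ptr_to_const_int == 1);
    assert(*const_ptr_to_const_int == 1);

    // Through a const A, the member behaves like the first flavour.
    const A a = { &value };
    static_assert(std::is_same<decltype((a.x)), int* const&>::value,
                  "a.x is a constant pointer to a mutable int");
    *a.x = 3;
    assert(value == 3);
}
```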

This differs from the D Programming Language, where both const and immutable are transitive by default (although in a slightly different but cool way.)

One simple way to prevent the possibly undesired behaviour is to use access modifiers and getters to prevent direct access from the user:
struct A {
    A(): x_{new int} {}
   ~A() { delete x_; }

          int& x()       { return *x_; }
    const int& x() const { return *x_; }
private:
    int* x_;
};
Now it is impossible* to access a mutable reference to *x_ through a const A.
However, it is still possible to write to *x_ from within const member functions of A. This makes it possible for const member functions to have side-effects on the class which are unexpected by the user.
[*]: again, nothing is impossible in C++.

C++11's smart pointers also have the property of not being transitively const.
Note the signatures of these std::unique_ptr member functions:
pointer std::unique_ptr::get() const;

typename std::add_lvalue_reference<T>::type
         operator*() const;  

pointer operator->() const;
These methods all return non-const pointers and references, even if the method is called on a const std::unique_ptr instance.
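A short check makes this concrete. It is only a sketch and the function name is made up; the static_asserts simply restate the signatures above.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

void check_unique_ptr_is_shallow_const()
{
    const std::unique_ptr<int> p(new int(1));

    // Even on a const std::unique_ptr, the accessors hand out mutable access.
    static_assert(std::is_same<decltype(*p), int&>::value,
                  "operator*() const returns int&");
    static_assert(std::is_same<decltype(p.get()), int*>::value,
                  "get() const returns int*");

    *p = 2;             // compiles: only the pointer itself is const
    assert(*p == 2);
    // p.reset();       // error: reset() is not a const member function
}
```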

We could keep enforcing the transitive const relationship by writing the const and non-const getters for every publicly exposed member, but there exists a more general way to solve this problem: We can write a smart pointer with built-in transitivity for const.

std::unique_ptr is not far from the mark, so let's work on top of it:
template<
    class T,
    class Deleter = std::default_delete<T>
> class transitive_ptr : public std::unique_ptr<T,Deleter>
{
public:
    // inherit typedefs for the sake of completeness
    typedef
        typename std::unique_ptr<T,Deleter>::pointer
        pointer;
    typedef
        typename std::unique_ptr<T,Deleter>::element_type
        element_type;
    typedef
        typename std::unique_ptr<T,Deleter>::deleter_type
        deleter_type;

    // extra typedef
    typedef
        const typename std::remove_pointer<pointer>::type*
        const_pointer;

    // inherit std::unique_ptr's constructors
    using std::unique_ptr<T,Deleter>::unique_ptr;
 
    // add transitively const version of get()
    pointer get() {
        return std::unique_ptr<T,Deleter>::get();
    }
    const_pointer get() const {
        return std::unique_ptr<T,Deleter>::get();
    }

    // add transitively const version of operator*()
    typename std::add_lvalue_reference<T>::type
    operator*() {
        return *get();
    }
    typename std::add_lvalue_reference<const T>::type
    operator*() const {
        return *get();
    }
 
    // add transitively const version of operator->()
    pointer operator->() {
        return get();
    }
    const_pointer operator->() const {
        return get();
    }
};
Attempting to write through a const transitive_ptr will now fail to compile, and our class declaration is also much more concise because we were able to convey the rules for using x within its type:
struct A {
    transitive_ptr<int> x;
    A(): x{new int} {}
};

int main() {
    const A a;
    *a.x = 3;
}
Compiler output:
error: assignment of read-only location 
‘a.A::x.transitive_ptr<T, Deleter>::operator*
<int, std::default_delete<int> >()’
*a.x = 3;
     ^
This is another useful smart pointer to add to our smart pointer tool box, getting us one step closer to always correctly enforcing the rule of zero.
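One caveat of inheriting publicly from std::unique_ptr: a reference to the base class still exposes the original, non-transitive accessors. A possible alternative is to hold the std::unique_ptr as a member instead. The following is a deliberately small sketch of that variant; deep_const_ptr and Widget are hypothetical names, and custom deleters and most of the std::unique_ptr interface are omitted.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical composition-based variant: there is no base class to
// convert to, so the const overloads below cannot be bypassed that way.
template <class T>
class deep_const_ptr
{
    std::unique_ptr<T> ptr_;

public:
    explicit deep_const_ptr(T* raw = nullptr) : ptr_(raw) { }

    T*       get()       { return ptr_.get(); }
    const T* get() const { return ptr_.get(); }

    T&       operator*()       { return *ptr_; }
    const T& operator*() const { return *ptr_; }

    T*       operator->()       { return ptr_.get(); }
    const T* operator->() const { return ptr_.get(); }
};

struct Widget
{
    deep_const_ptr<int> x{new int(0)};
};

// A const pointer only yields const access to the pointee.
static_assert(
    std::is_same<decltype(*std::declval<const deep_const_ptr<int>&>()),
                 const int&>::value,
    "const access is read-only");
static_assert(
    std::is_same<decltype(*std::declval<deep_const_ptr<int>&>()),
                 int&>::value,
    "non-const access is writable");

void check_deep_const()
{
    Widget w;
    *w.x = 3;                 // OK: w is not const

    const Widget& cw = w;
    assert(*cw.x == 3);       // reading through const is fine
    // *cw.x = 4;             // error: assignment of read-only location
}
```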

Full working example: http://ideone.com/0RUr3V

Friday, November 29, 2013

D's scope statement in C++

This article is at Version 2 as of November 30th, 2013.
Online discussion of this article may be outdated.


The D Programming Language brings a unique feature: The scope statement. In this C++ secret, I will explain how to implement it in C++.

Disclaimer


Some parts here get tricky, and I don't claim to be a C++ god. If something looks strange to you, then something could really be wrong. Do some research and please tell me what you think!

Introduction


For The Innocents among you who don't know about D's scope feature, please read Dlang's official documentation for a good explanation.

In summary, the scope statement allows you to specify blocks of code to be run at the end of their scope, and conditionally executed based on whether the scope was exited by an exception or not.

In D, the usage is as follows (from dlang.org):
void abc()
{
    Mutex m = new Mutex;

    lock(m); // lock the mutex
    scope(exit) unlock(m); // unlock on leaving the scope

    foo(); // do processing
}
In normal everyday C++, we use RAII to achieve the same effect:
struct lock_guard {
    std::mutex& m_;
    lock_guard(std::mutex& m) : m_(m) { m_.lock(); }
    ~lock_guard() { m_.unlock(); }
};

void abc()
{
    std::mutex m;
    lock_guard guard { m }; // Locks the mutex.
    foo();    
    // ~lock_guard() unlocks the mutex upon leaving the scope.
}
The beauty of this pattern is that ~lock_guard() is guaranteed* to be called upon unwinding the stack, no matter if foo() throws an exception or not. This is an important part of writing exception-safe code.

[*] Nothing is guaranteed in C++: If a thrown exception is never caught, it is implementation-defined whether the stack is unwound at all before std::terminate() is called, so destructors may not run. Also, if a destructor lets an exception escape while another exception is in flight, std::terminate() is called.

The custom lock_guard class is not necessary: std::lock_guard is part of the C++ standard library.

This might look all good so far, but the problem arises when, for example, we want to use a C library which does not offer convenient C++ RAII resource handles like std::lock_guard. This forces exception-conscious C++ users to write tons of little RAII resource handling classes. On the other hand, if C++ had a scope statement, these boilerplate RAII bindings would not be necessary!
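To give a sense of the boilerplate in question, here is a sketch of what wrapping a single C resource typically involves. The c_handle API below is entirely hypothetical (it stands in for any open/close pair from a C library), and std::unique_ptr with a custom deleter is just one common way to write the wrapper.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical C-style API, standing in for a real C library.
struct c_handle { int id; };
static int open_handles = 0;

c_handle* c_handle_open()
{
    ++open_handles;
    return new c_handle{42};
}

void c_handle_close(c_handle* h)
{
    --open_handles;
    delete h;
}

// The per-resource boilerplate: a deleter type and a typedef.
struct c_handle_closer
{
    void operator()(c_handle* h) const { c_handle_close(h); }
};
typedef std::unique_ptr<c_handle, c_handle_closer> c_handle_ptr;

void use_handle_then_throw()
{
    c_handle_ptr h(c_handle_open());
    throw std::runtime_error("work failed");
    // h's deleter calls c_handle_close() while the stack unwinds.
}

void check_handle_released()
{
    try
    {
        use_handle_then_throw();
    }
    catch (const std::runtime_error&)
    {
    }
    assert(open_handles == 0);
}
```

It works, but every new kind of resource needs its own deleter and typedef, which is exactly the repetition a scope statement would avoid.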

The Goal


Enough talk, here is my promise to you. This C++ Secret allows the following code:
void abc()
{
    // We are manipulating a non-RAII C resource.
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&m); // lock mutex
    scope (exit) {
        // unlock on leaving the scope
        pthread_mutex_unlock(&m);
    };
    foo(); // Do dangerous work which may throw an exception
}
Come hell or high water, this code will not leak the mutex. It will always be unlocked upon exiting the scope of that function, whether through normal code flow or through an exception being raised in foo().

Scope Guard


Everything begins with the scope_guard class, which will store a function object to call upon cleanup, a policy for how that cleanup should be used, and a policy for installing the scope block.
template<typename F, typename CleanupPolicy>
class scope_guard : CleanupPolicy, CleanupPolicy::installer
{
    using CleanupPolicy::cleanup;
    using CleanupPolicy::installer::install;
 
    typename std::remove_reference<F>::type f_;
 
public:
    scope_guard(F&& f) : f_(std::forward<F>(f)) {
        install();
    }
    ~scope_guard() {
        cleanup(f_);
    }
};
This class will store a routine to run at the end of the scope, and the CleanupPolicy has the right to decide how to invoke it. More on the CleanupPolicy's installer later.

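Note the use of perfect forwarding to handle both lvalue and rvalue references in one fell swoop. This works because F is deduced by the factory function shown later, so F may itself be an lvalue reference type.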

Installation Policies


These policies define some code to be run at the initialization of a scope_guard.
struct unchecked_install_policy {
    void install() { }
};
 
struct checked_install_policy {
    void install() {
        if (std::uncaught_exception()) {
            std::terminate(); // sorry
        }
    }
};
The installation policies solve an important problem: If an exception is currently in flight, a scope (success) or scope (failure) statement is not able to distinguish between that current exception and any other one thrown after it in the same scope. In this case, the scope (success) and scope (failure) statements behave incorrectly because they can confuse one exception for another.

The only time when an exception is "currently in flight" is during the call to destructors while unwinding the stack. Therefore, we establish and enforce an important invariant to be upheld by the programmer:

ONE CANNOT USE EITHER OF
scope (failure)
scope (success)
WITHIN A DESTRUCTOR!

This includes functions called from destructors.

scope (exit), on the other hand, is always ok. Knock yourself out!

Cleanup Policies


Next, we will define the cleanup policies. Their behaviours are:
  • exit: cleanup will happen come hell or high water.
  • failure: cleanup will happen only if the scope is exited by an exception.
  • success: cleanup will happen only if the scope is exited normally.
An appropriate installation policy is also defined for each cleanup policy.
struct exit_policy {
    typedef unchecked_install_policy installer;

    template<typename F>
    void cleanup(F& f) {
        f();
    }
};

struct failure_policy {
    typedef checked_install_policy installer;

    template<typename F>
    void cleanup(F& f) {
        // Only cleanup if we're exiting from an exception.
        if (std::uncaught_exception()) {
            f();
        }
    }
};

struct success_policy {
    typedef checked_install_policy installer;

    template<typename F>
    void cleanup(F& f) {
        // Only cleanup if we're NOT exiting from an exception.
        if (!std::uncaught_exception()) {
           f();
        }
    }
};
As you can see, exit_policy calls the cleanup unconditionally. Meanwhile, the failure_policy and success_policy use std::uncaught_exception() to check if an exception is currently "in flight" as we are unwinding the stack, which allows them to decide if the cleanup should be made or not. (Note that std::uncaught_exception() was deprecated in C++17 and removed in C++20.)

Furthermore, exit_policy has no checking required during the installation, while failure_policy and success_policy enforce the invariant that no exceptions can be in flight at the time of installing the guard. WITH DEADLY FORCE!

We can test what we have so far:
void SayBye()
{
    std::cout << "Bye bye!\n";
}

int main()
{
    scope_guard<void (*)(), exit_policy>
        sayByeOnExit { SayBye };

    std::cout << "Hello!\n";
}
Output:
Hello!
Bye bye!
As expected, the call to SayBye is deferred until we exit from the scope of main().

Syntactical Sugar


Now that we have nailed down the mechanics, here comes the syntactical sugar.
template<typename CleanupPolicy>
struct scope_guard_builder { };
 
template<typename F, typename CleanupPolicy>
scope_guard<F,CleanupPolicy>
operator+(
    scope_guard_builder<CleanupPolicy> builder,
    F&& f
    )
{
    return std::forward<F>(f);
}

// typical preprocessor utility stuff.
#define PASTE_TOKENS2(a,b) a ## b
#define PASTE_TOKENS(a,b) PASTE_TOKENS2(a,b)

#define scope(condition) \
    auto PASTE_TOKENS(_scopeGuard, __LINE__) = \
        scope_guard_builder<condition##_policy>() + [&]

int main()
{
    scope (exit) {
        std::cout << "Bye bye!\n";
    };

    std::cout << "Hello!\n";
}
Output:
Hello!
Bye bye!
An instance of scope_guard is put on the stack with an automatically generated name made from "_scopeGuard" and the current line number.

Since C++ (before C++17) cannot infer class template arguments from arguments passed to a constructor, we cannot simply assign a lambda to a scope_guard. Instead, we use a factory function, which happens to be operator+.

The first argument is an empty struct scope_guard_builder, which is only used to infer the CleanupPolicy type. The condition argument to the scope macro is used to deduce the cleanup policy to use.

The second argument is a function object, which is used to construct the returned scope_guard. The scope macro passes a lambda, which captures its enclosing scope's automatic variables by reference, as this second argument.

operator+ was chosen simply because it is an infix operator. If a regular function had been used instead of an infix operator, the usage would look something like:
scope (exit) {
    std::cout << "bye\n";
});
Note the extra right parenthesis at the end, which would be required to end the regular (prefix notation) function call to which the lambda would be passed.

Instead, the lambda's capture list is part of the macro ("[&]") and the lambda's body is conveniently used as the syntax for the scope statement's block. This also allows you to enhance scope blocks by configuring their lambda's mutability, exception specification, and attribute specification.
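For instance, an exception specification can be written between the macro and the block. The sketch below uses a stripped-down, exit-only guard so that it fits in a few lines: exit_guard, exit_guard_builder, and the scope_exit macro are simplified stand-ins for the scope_guard machinery in this article, not a replacement for it. Note that the empty parameter list "()" is needed before the specifier.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Simplified, exit-only stand-in for scope_guard<F, exit_policy>.
template <typename F>
struct exit_guard
{
    F f;
    ~exit_guard() { f(); }
};

struct exit_guard_builder { };

template <typename F>
exit_guard<typename std::decay<F>::type>
operator+(exit_guard_builder, F&& f)
{
    return { std::forward<F>(f) };
}

#define PASTE_TOKENS2(a,b) a ## b
#define PASTE_TOKENS(a,b) PASTE_TOKENS2(a,b)

#define scope_exit \
    auto PASTE_TOKENS(_scopeGuard, __LINE__) = \
        exit_guard_builder() + [&]

int run_blocks()
{
    int trace = 0;
    {
        // Plain block: declared first, so it runs last.
        scope_exit { trace = trace * 10 + 1; };

        // Same syntax with an exception specification on the lambda:
        // declared second, so it runs first.
        scope_exit () noexcept { trace = trace * 10 + 2; };

        trace = 3;
    }
    return trace;
}

void check_blocks()
{
    // 3 from the body, then 2 from the noexcept block, then 1.
    assert(run_blocks() == 321);
}
```

Like the original, this relies on the compiler eliding the copy when the guard is returned from operator+ (guaranteed from C++17 onwards).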

Conclusion


To conclude, here is a relatively robust unit test.

The runtime error at the end is intentional - it shows that we enforce the rule that only scope (exit) can be used in destructors.
struct HasScopeExitInDtor {
    ~HasScopeExitInDtor() {
        scope (exit) {
            std::cout << "scope (exit) in dtor success test\n";
        };
        try {
            std::cout << "scope (exit) in dtor failure test\n";
            throw 1;
        } catch (...) { }
    }
};

struct HasScopeSuccessInDtor {
    ~HasScopeSuccessInDtor() {
        std::cout << std::flush;
        scope (success) {
            std::cout << "error: scope (success) used in dtor\n";
        };
    }
};

int main()
{
    {
        const char* captureTest = nullptr;
        scope (exit) {
            std::cout << "scope (exit) success test\n";
            std::cout << "captureTest: " << captureTest;
        };
        scope (success) {
            std::cout << "scope (success) success test\n";
        };
        scope (failure) {
            std::cout << "scope (failure) success test\n";
        };
        captureTest = "ok\n";
    }
    try {
        HasScopeSuccessInDtor s;
  
        scope (exit) {
            std::cout << "scope (exit) failure test\n";
        };
        scope (success) {
            std::cout << "scope (success) failure test\n";
        };
        scope (failure) {
            std::cout << "scope (failure) failure test\n";
        };
  
        HasScopeExitInDtor e;
  
        throw 1;
    } catch (...) { }
}
Output:
scope (success) success test
scope (exit) success test
scope (exit) in dtor failure test
scope (exit) in dtor success test
scope (failure) failure test
scope (exit) failure test
terminate called without an active exception
Beautiful.

Here is the fully functional example: http://ideone.com/2IpHBG

In general, I believe that this makes it easy to write code that is readable and exception-safe. There is only a minor runtime performance cost.

I would personally like scope to be a built-in feature of C++ so that it could be better supported by the compilers themselves.

One downside to this implementation is that the unique name generated by pasting _scopeGuard with __LINE__ disallows using more than one scope statement on the same line. One option is to use __COUNTER__ instead, but it's not standard. This is a minor use case, so I'll let it slide for now.

The major downside is the inability to call scope (success) or scope (failure) within a destructor. Some alternate implementations below address this in interesting ways.
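One such approach, assuming a C++17 standard library is available, is std::uncaught_exceptions() (plural), which returns a count instead of a bool. A guard can record the count at construction and compare it in its destructor, which lets it tell "its own" exception apart from one that was already in flight. The class below is a hypothetical sketch of that idea, not the scope_guard from this article; it uses std::function to keep the example short.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical guard: runs its callback only if a new exception started
// unwinding after the guard was constructed.
class failure_guard
{
    std::function<void()> on_failure_;
    int exceptions_at_entry_;

public:
    explicit failure_guard(std::function<void()> on_failure)
        : on_failure_(std::move(on_failure))
        , exceptions_at_entry_(std::uncaught_exceptions())
    {
    }

    failure_guard(const failure_guard&) = delete;
    failure_guard& operator=(const failure_guard&) = delete;

    ~failure_guard()
    {
        if (std::uncaught_exceptions() > exceptions_at_entry_)
        {
            on_failure_();
        }
    }
};

struct UsesGuardInDtor
{
    int* fired;

    ~UsesGuardInDtor()
    {
        // This destructor runs while the "outer" exception is in flight.
        try
        {
            failure_guard g([this] { *fired += 1; });
            throw std::runtime_error("inner"); // g should fire
        }
        catch (const std::runtime_error&)
        {
        }

        {
            failure_guard g([this] { *fired += 100; });
            // No new exception in this scope: g should not fire.
        }
    }
};

void check_failure_guard_in_destructor()
{
    int fired = 0;
    try
    {
        UsesGuardInDtor object{&fired};
        throw std::runtime_error("outer");
    }
    catch (const std::runtime_error&)
    {
    }
    assert(fired == 1); // only the first guard ran its callback
}
```

A success guard would use the opposite comparison. With this approach the installation check that terminates the program is no longer needed.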

Special Thanks


Thanks to edoceo` in ##C++ on irc.freenode.net for various suggestions and valuable criticism.

Thanks to The One And Only Andrei Alexandrescu for explaining to me the bug of calling scope (success) and scope (failure) within destructors.

Thanks to my colleague, JP Flouret, for helping me remember that, sometimes, less is more.

Further Research


Here are some alternate implementations of the ScopeGuard pattern:

Github


https://github.com/nguillemot/scope
  

Thursday, November 28, 2013

Hello, World!

Hello, welcome to the blog.

I will share some C++ secrets with you on this blog.

Stay tuned.