Monday 23 September 2019

Monday 4 July 2016

Break open that container!

If you happen to follow the world of IT infrastructure, DevOps, or just happen to be an IT professional, you, no doubt, have heard of the container revolution. Propelled into popularity by Docker, containers (and the ecosystem around them) seem to be the answer to all things ops. I, too, like the elegance with which containers solve many of the problems around shipping and running software.

Great, but let's pause for a second to see what problems containers actually solve. I contend that they solve three distinct problems:

1. Ability to package software in its entirety and ship it from the publisher to the consumer. Basically it's about distributing a tarball with all the bits and then running those bits without worrying what else is on the target machine. It's solves the dependency hell problem that package mangers were never quite able to do.

2. Running the software in isolation from other processes/containers on the system. Your contained service can now have its own PID space and its very own TCP port 80. If you happen to blindly schedule services onto machines (e.g. if using an orchestration system like Kubernetes), you can sleep soundly knowing that the services won't conflict.

3. Constraining the resources. When running a container, you can cap the CPU, memory, and I/O that the processes inside the container are allowed to consume.

As I've said before, these are three distinct problems and yet all the container systems try to couple them into a single solution. This is unfortunate for two reasons:

1. Often times, I do not need all three when running a service or an application. In fact, most often I care just about (1) -- pulling an image and running it, without worrying about the dependencies that will get sprinkled onto my system. Or I might want to dedicate a machine to my production database and thus do not need isolation or resource constraints. Yet I would still like to use Docker hub to install the bits. Docker, rkt, systemd-nspawn and others do provide knobs to turn off bits and pieces but it is more painful than it needs to be. On the flip side, I may want to isolate and constrain an application that I installed via apt-get or yum (and no, unshare(1) and cgexec(1) do not quite cut it).

2. It does not allow independent innovation in these areas. The coupling prevents me (without hoop jumping) from using Docker to pull and execute the app while using rkt for resource isolation. This feels very un-UNIXy.

It is easy to see how we got to where we are. Containers were initially viewed as lightweight VMs and so inherited the same semantics as their heavier cousins. VMs, in turn, coupled the aforementioned areas due to their implementations. I think it is safe to say that most people no longer view containers as lightweight VMs. The notion of a "systems container" has been superseded by an "application container". It would be prudent for organisations like Open Container Initiative to consider defining interfaces that would allow innovation of individual components of the container runtimes.

Thursday 8 May 2014

Factory functions and their return types

Suppose we need to write a factory function that constructs a runtime polymorphic object. For the purposes of this post, let's say we want to construct a concrete shape object -- a rectangle, triangle, or an ellipse. Here are our basic declarations:

struct shape {
    virtual ~shape() {}
};

struct ellipse : shape {
    ellipse(int rad_a, int rad_b) {}
};

struct triangle : shape {
    triangle(int base, int height) {}
};

struct rectangle : shape {
    rectangle(int width, int height) {}
};

Basic stuff. Now for the factory function:

enum class shape_type { ellipse, triangle, rectangle };

struct rect {
    int w, h;
};

??? make_shape(shape_type type, rect bounds);

What should make_shape return. A pointer to shape, of course, but which kind. Should it be a raw pointer or a smart pointer like std::unique_ptr and std::shared_ptr. C++11 heavily advocates against raw pointers and I completely agree. That leaves us with a unique_ptr or a shared_ptr. I believe that in vast majority of situations there's a single owner of an object so that begs for returning unique_ptr. At least few other people are of the same opinion: here and here.

The argument goes that a shared_ptr can be constructed from a unique_ptr&& so this will also work just fine for the less common shared ownership cases:

std::shared_ptr<shape> s = make_shape(shape_type::ellipse, { 3, 5 });

While that is certainly true, there is a performance problem with this. C++11 encourages us to use std::make_shared<T> to construct shared ownership objects. Most std::make_shared implementations use a single dynamic memory allocation for both the object and the pointer control block (that stores the ref count). Not only does that save on overhead of calling 'new' twice, it also improves the cache locality by keeping the two close.

That benefit is clearly lost with conversion from unique_ptr to shared_ptr. I would therefore argue that factory functions should come in two flavors: a unique and a shared kind:

std::unique_ptr<shape> make_unique_shape(shape_type type, rect bounds);
std::shared_ptr<shape> make_shared_shape(shape_type type, rect bounds);

We now have two functions that do almost identical work. To avoid code duplication, we should factor out common behavior, right? Right but it turns out to be trickier than I expected. What we want is a helper function that is parameterized on make_shared or make_unique (or similar till will have it in C++14). The solution I came up with uses good old tag dispatching.

First, declare the tags but have them also know their associated smart pointer type:

struct shared_ownership {
    template <typename T> using ptr_t = std::shared_ptr<T>;
};

struct unique_ownership {
    template <typename T> using ptr_t = std::unique_ptr<T>;
}; 

Next, we add two overloads to do the actual construction:

template <typename T, typename... Args>
std::unique_ptr<T> make_with_ownership(unique_ownership, Args... args) {
    // until we have make_unique in C+14
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

template <typename T, typename... Args>
std::shared_ptr<T> make_with_ownership(shared_ownership, Args... args) {
    return std::make_shared<T>(std::forward<Args>(args)...);
}

Finally, we can put it all together to create a generic make_shape along with make_unique_shape and make_shared_shape:

template <typename OwnTag>
typename OwnTag::template ptr_t<shape> make_shape(shape_type type, rect bounds, OwnTag owntag) {
    switch( type ) {
        case shape_type::ellipse:
            return make_with_ownership<ellipse>(owntag, bounds.w / 2, bounds.h / 2);

        case shape_type::triangle:
            return make_with_ownership<triangle>(owntag, bounds.w, bounds.h);

        case shape_type::rectangle:
            return make_with_ownership<rectangle>(owntag, bounds.w, bounds.h);
    }
}

inline std::unique_ptr<shape> make_unique_shape(shape_type type, rect bounds) {
    return make_shape(type, bounds, unique_ownership());
}

inline std::shared_ptr<shape> make_shared_shape(shape_type type, rect bounds) {
    return make_shape(type, bounds, shared_ownership());
}

If you look at the return type of make_shape, it should make you cringe with disgust. Yeah, no bonus points for elegant syntax here. I also dislike the verbose name make_with_ownership. Nevertheless, I believe having a generic function for both unique and shared construction is extremely valuable. I would love to hear proposals for a better implementation and suggestions for a more concise name.

As always, the code is available on GitHub.

Sunday 23 March 2014

forkfs: making any directory into a throw away directory

I recently needed to work on two git branches simultaneously. My particular use case required me to make small changes to both branches, rebuild and run. These were temporary changes that I was going to blow away at the end. One approach could have been to have two clones of the repo but that would have required me to rebuild my project from scratch for the second working copy (it was boost which is large). Having run into the same predicament before, I decided to create a script to "fork" a directory.

The script, called forkfs, works as follows. Suppose you are in your bash session in directory foo:
~/foo$ ls
bar

You then execute forkfs which launches a new bash session and changes your prompt to remind you that you are forked:
~/foo$ sudo ~/bin/forkfs
(forkfs) ~/foo$ ls
bar

So far it looks like nothing has changed but if we start making changes to the contents of foo, they'll only be visible in our forked session:
(forkfs) ~/foo$ touch baz
(forkfs) ~/foo$ ls
bar baz

Open up another bash session and do an ls there:
~/foo$ ls
bar

When you exit out of your forkfs session, all your changes will be lost! You can also make multiple forks of the same directory, just not a fork of a fork. The forkfs script is available on GitHub.

Words Of Caution

Be careful where your fork begins. If you're using it for multiple git branches like I was, be sure to fork at the repository directory -- same place that houses the .git directory. Otherwise git will be confused outside of forked session.

Under The Hood

The script makes use of two technologies that are also used by Docker: aufs and mount namespaces. Aufs is a union filesystem which takes multiple source directories and combines them into a single destination directory. For example, one can mount /home/johndoe such that it's a union of /home/john and /home/doe directories. When you make changes in /home/johndoe, aufs uses a preconfigured policy to figure out which underlying directory gets the changes. One such policy allows forkfs to create Copy-On-Write functionality. When forkfs is forking ~/foo:

1. It creates an empty temporary directory, e.g. /tmp/tmp.Gb33ro1lrU
2. It mounts ~/foo (marked as readonly) + /tmp/tmp.Gb33ro1lrU (marked as read-write) into ~/foo:
mount -t aufs -o br=/tmp/tmp.Gb33ro1lrU=rw:~/foo=ro none ~/foo

Since ~/foo was marked as read only, all the writes go to the temporary directory, achieving copy-on-write.

Notice that the aufs was mounted over ~/foo. An unfortunate consequence of this is that the original ~/foo is no longer accessible. Moreover, it will not be possible to create other forks of ~/foo. This is where mount namespaces come to the rescue.

Mount namespaces allow a process to inherit all of the parent's mounts but have any further mounts not be visible by its parent. Linux actually has other namespaces of which a process can get a private copy: IPC, network, host/domain names, PIDs. Linux containers (LXC) makes use of these to provide light-weight virtualization.

unshare is a handy command to get a process running with a private namespace. forkfs uses "unshare -m bash" to get a bash running with a private mount namespace. It then executes aufs mount without having the rest of the system seeing the fork.

Future Work

If I have time, I'd like to add ability to create named forks and then be able to come back to them (like Python's virtualenv).

Acknowledgements

Special thanks goes to Vlad Didenko for helping me out with bash nuances.

Friday 3 January 2014

LastPark: my first iPhone app gone wrong

I decided to venture out into the unknown -- iOS development. My first app -- LastPark -- saves the last location where the car was parked. I was solely trying to scratch a personal itch as I often forget on what street or what end of the parking lot I parked at. There are plenty of apps out there that allow you to mark the location of the vehicle but that requires remembering to do that before you're lost! Then there are Bluetooth LE devices that you can purchase and install in your car. When you shut your engine the device turns off and the app loses connectivity to it, causing it to save the location. $99 Automatic will also provide a slew of other features while the $25 Find My Car Smarter is very basic.

However my car already has the Bluetooth built in and moreover connects to my iPhone every time I get in to provide me with hands free calling. Why, I thought, would I need to buy another Bluetooth device when I got all the parts already installed. This is when I decided to write LastPark. (Technically I had to spend $99 to join the Apple Developer Program -- the same price as the fancy Automatic).

Unfortunately getting access to plain old Bluetooth (not the new Low Energy one) is not exactly easy in iOS. It seems like the only way was to use a private framework (undocumented). The upside is that a few people have already gone this route and there's even a demo project on GitHub that I used as a starting point. The downside is that Apple does not allow apps that use private frameworks in the AppStore. Not a biggie for me as I was developing it for my own use.

What killed this app was the inability to run it in the background. After some time in the background state the app gets suspended and doesn't get woken up for Bluetooth notifications. I tried specifying various background mode preferences in the plist but to no avail. I realize that Apple tries to improve the battery life by limiting the amount of work the apps can do in the background. However I believe it should be more liberal in allowing apps to register for notifications of ambient activities. These registrations probably take just a few bytes in some table inside a daemon (that's running anyway) and don't take much resources.

I've released what I got on GitHub. Any comments on how to get this to work as desired would be greatly appreciated.

Tuesday 3 September 2013

Lazy approach to assignment operators

Most of the time we don't have to worry about defining copy/move constructors and assignment operators -- the compiler happily generates them for us. Sometimes, however, we must do the dirty work ourselves and code them up manually, often together with the destructor. By hand crafting the assignment operators, we sometimes gain extra efficiency (e.g. std::vector re-using the memory in copy-assignment) but most of the time the code just ends up looking like a copy/paste job of destructor and copy/move constructor.

If we're not lazy and define a swap function, we can use the copy/swap idiom to get a free pass on the copy-assignment operator. Not so for the move-assignment. This post from 2009 provides an interesting trick to reuse destructor/copy constructor to implement the copy-assignment without the swap function. Here's the code provided by that post:

struct A {
    A ();
    A (const A &a);
    virtual A &operator= (const A &a);
    virtual ~A ();
};

A &A::operator= (const A &a) {
    if (this != &a) {
        this->A::~A(); // explicit non-virtual destructor
        new (this) A(a); // placement new
    }
    return *this;
}

The author does warn about the downsides of using this trick but I think the concerns are fairly minor. The technique can also be extended for the move-assignment operator:

template <typename T, typename U>
A &A::operator= (A &&a) {
    if (this != &a) {
        this->A::~A(); // explicit non-virtual destructor
        new (this) A(std::move(a)); // placement new
    }
    return *this;
}

And can be generalized into a utility function that can cover both of those uses. The function and its usage is shown below:

template <typename T, typename U>
T& assign(T* obj, U&& other) {
    if( obj != &other ) {
        obj->T::~T();
        new (static_cast<void*>(obj)) T(std::forward<U>(other));
    }
    return *obj;
}

struct A {
    A ();
    A (const A& a);
    A(A&& a);
    virtual ~A ();
    A& operator= (const A& a) {
        return assign(this, a);
    }
    A& operator=(A&& a) {

        return assign(this, std::move(a));
    }
};

One very nice side affect of this approach is that it becomes possible to create a copy/move assignment operators for those classes that can otherwise only be copy/move constructable. For example, consider:

struct X {
   int& i;
};

The compiler will generate a copy/move constructor pair but will delete the corresponding assignment operators. You'd be hard pressed to define them yourself as well. But destroy/construct trick allows us to side step such limitations!

Note that assign's second argument is a "universal reference" and will bind to anything. Thus the assign function can actually by used to implement any assignment operator (not just copy/move) as long as the corresponding constructor is available.

Now suppose that struct X is located in a third party library and you don't want to modify it to add the assignment operators. By defining a utility class assignable<T>, we can add the desired functionality externally:

template <typename T>
class assignable : public T {
public:
    using T::T;

    assignable(T const& other) :
        T(other) {
    }

    assignable(T&& other) :
        T(std::move(other)) {
    }

    template <typename U>
    assignable& operator=(U&& other) {
        return assign(this, std::forward<U>(other));
    }
};

// and usage:
struct X {
    X(int& ii) : i(ii) {}
    int& i;
};

int i = 1, j = 2;
assignable<X> x(i);
x = assignable<X>(j);

This approach makes me wonder if the language should support a way of auto generating the copy/move assignments not member-wise but from destructor and constructor. We could then opt-in to such goodness like this:


Foo& operator=(Foo&&) = default(via_constructor);
Foo& operator=(Foo const&) = default(via_constructor);

The code (along with a work around for compilers not supporting inheritable constructors) is available on GitHub.

Tuesday 4 June 2013

Folding SSL/TLS into TCP to gain efficiency

In this post, I will divert from my usual topic of C++ to jog down my thoughts about TCP and SSL.  I have limited knowledge of networking and even more limited understanding of security so my ramblings here might be full of flaws and security holes. Nevertheless, I thought it would be fun to share a random idea on how to get the web to run a little faster.

Background

SSL is used by HTTPS (and others) to secure the pipe between the browser and the web server. Once used primarily by sites performing financial transactions (banks, ecommerce), it is more and more used by services which require a login (e.g. Gmail, Twitter). As such, a fast HTTPS connection is more important than ever. HTTPS connection starts out as a TCP 3-way handshake, followed by an SSL/TLS handshake. Let's take a quick look at each one in more detail.

TCP 3-way Handshake

TCP 3-way handshake starts with the client sending a SYN packet to the server. Upon receipt, the server replies with its own SYN packet but also piggybacks the acknowledgement (that it received the client's SYN) in the same packet. Thus, the server replies with a SYN-ACK packet. Finally, when the client receives the server's SYN-ACK packet, it replies with an ACK to signify that it has received the SYN from the server. At this point the TCP connection is established and the client can immediately start sending data. Therefore, the latency cost of connection establishment is 1 round-trip in the case of the first message sent by the client (as in HTTP case) and 1.5 round-trips in the case where the server is the first to speak its mind (e.g. POP3).

There are a number of reasons for SYN-packets and 3-way handshake. First, the SYN (synchronize) packets are used to exchange the initial sequence numbers that will be used by both parties to ensure transport reliability (and flow control). Second, it is used to detect old (duplicate) SYN packets arriving at the host by asking the sender to validate them (see RFC793, Figure 8). Lastly, the handshake is used as a security measure. For example, suppose a TCP server is sitting behind a firewall that only accepts traffic from IP 66.55.44.33. If the connection would become established with the very first SYN packet, an attacker could create TCP/IP packet whose source IP would be 66.55.44.33 instead of his own and put the data in that very packet. The server would receive the SYN+data, create a connection and forward the data to the application. By sending a SYN-ACK and expecting it to echo the sequence number in the final ACK packet, the server can be certain that the source IP was not spoofed.

Interestingly, RFC793 does allow data to be included in the SYN packets. However, for the reasons described above, it requires the stack to queue the data for the delivery to the application only after the successful 3-way handshake.

SSL/TLS Handshake

Once a TCP connection is established, SSL handshake begins with the client sending ClientHello message containing version, random bits, and a list of cipher-suites it supports (e.g. AES-256). It can also include a number of extension fields. The server responds with ServerHello containing selected version and cipher-suite, random bits and extension fields. The server then follows up with its certificate and ServerHelloDone message. The three messages can end up within one TCP packet. Finally, the client sends the cipher key (actually bits that will be used to compute it) and both the client and server send messages to switch from plain-text to encrypted communication.

The takeaway here is that just like a TCP handshake, SSL also requires several packets to be exchanged before the application level communication can commence. These exchanges add another 2 round-trips worth of latency.

Proposal: Combine TCP and SSL Handshakes

What if TCP and SSL handshakes could be combined to save a round trip? Conceptually, TCP resides at layer 4 (transport) of the OSI reference model. Since it seems that nobody can quite figure out where SSL fits in, it doesn't seem all that unnatural to create "Secure TCP" protocol (not unlike IPSec). SSL handshake can then begin with the very first SYN packet:

Client                                                     Server
~~~~~~                                                     ~~~~~~
TCP SYN, SSL ClientHello -->
            <-- TCP SYN+ACK, SSL ServerHello+Cert+ServerHelloDone 
TCP ACK, SSL ClientKeyExch+... -->
                            <-- TCP ACK, SSL ChangeCipherSpec+...

This scheme would almost have to push SSL processing into the kernel (since that's where TCP is usually handled) but perhaps some hybrid solution could be implemented (e.g. PKI done in userspace).

Backward Compatibility

Recall that the original TCP specification (RFC793) allows data to be exchanged in SYN packets (I am not sure, though, how OS kernels actually handle this situation). If a server not supporting this new scheme were to receive a SYN/ClientHello packet, it would simply respond with the usual SYN-ACK and queue the ClientHello until it receives the final ACK. The client would see a SYN-ACK packet with no data, send an ACK and wait for the ServerHello to come a bit later.

Security Considerations

One security vulnerability with the proposed scheme is that it would make SYN flood DoS attack easier to execute. Since the data from ClientHello would need to be stored in the embryonic connection, it would use up more memory, causing the system to run out of memory earlier than today. The new scheme would also be incompatible with SYN cookies, a countermeasure against the SYN flood attack.

What about SSL Sessions?

SSL actually has mechanisms for turning the 4-way handshake into a 2-way handshake for all but the first connection (by a client caching a server cookie). This can continue to function as is, and, combined with the proposed scheme would make the whole TCP/SSL connection establishment take just a single round trip.

TCP Fast Open

A recent proposal from Google, called TCP Fast Open, is designed to reduce the overhead of the TCP 3-way handshake. In a similar way to SSL sessions, TCP Fast Open generates a cryptographically signed cookie that the client can store and then use for subsequent connections. While the first client-server connection uses the full 3-way handshake, subsequent ones can have data sent and delivered to the application via the SYN packet. While TCP Fast Open should be able to peacefully coexist with this proposal, it would not help in reducing the number of round trips for SSL's session establishment.

Related Work

A proposal from 2009, attempts to solve a similar problem in a generic way. It proposes the ability to include a limited amount of data (up to 64 bytes) in the SYN-ACK packet coming from the server. It could then be used to include a first part of the handshake. However, although the proposal aims to be generic (not tied to a specific upper layer protocol), it does not allow data to be included in the client's SYN packet, making it incompatible with protocols that rely on the client to initiate the handshake. The 64 byte limitation may also not be enough for SSL needs.

Conclusion

With more services utilizing HTTPS and a growing number of wireless devices whose communication latencies are often greater than those of their wired counterparts, it is imperative to minimize the number of round-trips made during connection establishment. While I am assuming that the proposed scheme is not adequate for implementation, I hope it can be used as food for thought on how to improve performance of the future web applications.