Messaging in the GNU Objective-C runtime (1 of 2)

I’ve mentioned before that, of all of the C dialects available, Objective-C appeals most to me from the standpoint of simplicity and elegance. The language isn’t perfect but, prior to version 2.0, I found the syntax to be relatively clean and intuitive, unlike that of C++.

When examining the syntax of the language, particularly the use of bracketed statements to indicate method calls (messaging), it’s easy to see that Smalltalk features were imposed on the underlying C syntax rather than being added as a natural evolution of the language — but I’d guess that this was because the initial "compiler" for Objective C was something closer to a preprocessor than a full-blown parser. Nevertheless, the syntax does make method calls easy to recognize in long stretches of code; and the ability to "tag" parameters also helps to clarify the purpose and use of each object.

Unfortunately, Objective C never really caught on as a mainstream language, and Apple seems to have abandoned it in favor of a language with semantics that are closer to C++ (read: uglier, in my opinion). Much of the reluctance to use the language seems to revolve around the idea that Objective C code can never compete with C and C++ in terms of performance: Objective C resolves method calls (messages) to their implementations at runtime, while traditional compilers do so at compile time. In general, the former approach is supposed to provide the most flexibility in the implementation of a given method, while the latter provides the greatest speed. But does the flexibility offered by Objective C actually penalize code execution times?

For the Apple implementation of the Objective C runtime, the answer would appear to be "yes" — at least, this was the case in 2003. But that was more than twenty years ago, and with the traditional Apple runtime. The question is whether the Objective C runtime implementation presently offered by GCC suffers from the same speed penalty.

After digging deep into the runtime source code, it would appear that the GNU runtime implementation of messaging is much faster than the original Apple implementation, though there are a few additional things that can be done to squeeze a little more speed out of not only the messaging implementation, but the runtime as a whole.

In this article, we’ll cover some theory which will provide a basis for understanding the message implementation used in the GNU Objective C runtime. A follow-up article will look at that implementation in depth.


As noted above, Objective C handles method resolution at runtime. To do this, it extends the idea of polymorphism to the level of method calls.

Typically, object-oriented programming languages abstract an object into properties — data managed by an object — and methods: routines offered by the object which can be used to manipulate that data. Objective C, however, further abstracts methods into selectors — somewhat akin to the signature of a specific method — and IMPs: the implementations of those methods. Whereas in other languages the caller of a given method uses the equivalent of an offset within a virtual method table (VMT) to invoke the desired method, in Objective C the caller uses a selector for this purpose.

A selector is not related to a VMT at all; instead, it acts as a key that can be used to look up the implementation of a method. The key is unique to the running process and, unlike with compiler-resolved method implementations, it can be used to invoke a method in any class known to the runtime, even those classes that are not known by the compiler to implement the associated method. The implementation invoked by the selector might be one defined at compile time, or it might be one dynamically provided at runtime: a routine in a DLL or one executing remotely on another machine. As long as the parameter and return types are compatible, everyone's happy. (It is not strictly required that they be compatible, however, but that's a topic for another time.)
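To make the selector/IMP split concrete, here is a minimal sketch in C. All of the names and types here are invented for illustration; the real runtime's structures are more involved, but the essential shape is the same: a selector is an integer key, each class carries its own selector-to-IMP mapping, and dispatch resolves the key at call time.

```c
#include <stddef.h>

typedef unsigned int sel_t;             /* a selector: just a unique integer */
typedef int (*imp_t)(void *self);       /* an IMP: a plain function pointer  */

struct method_entry { sel_t sel; imp_t imp; };

struct class_t {
    struct method_entry *methods;       /* this class's selector->IMP pairs */
    size_t method_count;
};

/* Resolve a selector to an implementation for a given class.
 * Returns NULL when the class does not implement the method. */
static imp_t lookup_imp(const struct class_t *cls, sel_t sel)
{
    for (size_t i = 0; i < cls->method_count; i++)
        if (cls->methods[i].sel == sel)
            return cls->methods[i].imp;
    return NULL;
}

/* "Sending a message": resolve at call time, then invoke.
 * A real runtime would forward or raise instead of returning -1. */
static int send_message(struct class_t *cls, void *receiver, sel_t sel)
{
    imp_t imp = lookup_imp(cls, sel);
    return imp ? imp(receiver) : -1;
}

/* A sample class with one method, for demonstration only. */
static int answer_imp(void *self) { (void)self; return 42; }
static struct method_entry demo_methods[] = { { 7u, answer_imp } };
static struct class_t demo_class = { demo_methods, 1 };
```

Note that `lookup_imp` runs on every call; everything that follows in this article is about making that step cheap.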

The advantages of this approach should be obvious, particularly in a connected world. The ability to invoke a remote object as if it were local, without needing to know whether it is, and the ability to dynamically replace method implementations based on conditions found at runtime, are not easy to find or implement in other languages (see, for example, DCOM). However, the approach is not without its drawbacks, chief among them the fact that resolving method calls at runtime requires time: time that in traditional languages is spent once by the compiler, which performs the lookup and bakes the result into the compiled output. Essentially, each time a method is called, the Objective C runtime must perform the same operation that a traditional object-oriented language compiler typically only has to do once.

Hash tables

There are ways to reduce the amount of time required for this operation, of course. The Apple implementation of 2003 appears to have used a hash table for that purpose, an implementation it no doubt inherited from NeXT. The idea behind a hash table is to reduce the number of elements that must be searched in order to find a desired item. They are implemented in a manner similar to a filing cabinet. A filing cabinet is used to organize files into drawers based on specific criteria — the first letter of a person's last name, for instance — meaning it is only necessary to look through a single drawer for a desired file, instead of sifting through the entire pile of folders. A hash table calls these file drawers "buckets", but the principle is the same: once you have the hash (or key) of a given item, you know which bucket to search for it. It is faster to search one bucket than it is the entire list, and so less time is sacrificed with each lookup.
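A toy chained hash table makes the filing-cabinet analogy concrete: the hash picks a bucket, and only that bucket's chain is searched. This is a generic sketch of the technique, not the NeXT/Apple runtime's actual code; names, the fixed table size, and the trivial modulo hash are all simplifications.

```c
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 8   /* real tables grow; a fixed size keeps the sketch short */

typedef unsigned int sel_t;
typedef int (*imp_t)(void *self);

struct node { sel_t sel; imp_t imp; struct node *next; };
struct hash_table { struct node *buckets[NBUCKETS]; };

/* Trivial hash: the selector modulo the bucket count. */
static unsigned int hash_sel(sel_t sel) { return sel % NBUCKETS; }

static void table_insert(struct hash_table *t, sel_t sel, imp_t imp)
{
    struct node *n = malloc(sizeof *n);
    n->sel = sel;
    n->imp = imp;
    n->next = t->buckets[hash_sel(sel)];
    t->buckets[hash_sel(sel)] = n;      /* push onto the bucket's chain */
}

static imp_t table_lookup(const struct hash_table *t, sel_t sel)
{
    for (struct node *n = t->buckets[hash_sel(sel)]; n; n = n->next)
        if (n->sel == sel)
            return n->imp;              /* found in this "drawer" */
    return NULL;                        /* class doesn't implement it */
}

/* A sample implementation, for demonstration only. */
static int demo_imp(void *self) { (void)self; return 1; }
```

Lookup cost here is the length of one chain rather than the whole table; the trade-offs (hashing time, extra memory, collisions) are exactly the ones discussed above.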

However, the system has trade-offs: it takes a certain amount of time to generate the hash for an arbitrary object and, just as a filing cabinet occupies more physical space than a pile of files on the floor, a hash table generally requires more space in memory than a simple array of entries. The old Apple implementation used a very quick hashing algorithm to match a selector with the implementation provided by any given object, but this came with trade-offs of its own, since the hash table was designed to keep only a single method implementation in each bucket. Quick and simple hash algorithms tend to produce the same hash value for different objects (collisions); thus, the time required to find a method implementation would increase whenever the target bucket was already occupied by a different implementation. In other words, the filing cabinet used by Apple was designed to keep one file per drawer, and if the file in a given drawer wasn't the one desired, then it would be necessary to search through other drawers.

There was, therefore, an unquestionable speed penalty to pay for each method call in the old Apple runtime; and this penalty was magnified when it was necessary to call a method multiple times in succession, such as within a loop. It is possible to work around the penalty by looking up the method implementation once and then caching the result, using the cached implementation for all subsequent calls, but this sacrifices the flexibility offered by dynamic method resolution and limits the calling code to either using only the method implementation provided by a given class or keeping a cached method implementation for each class it uses. There are better ways to reduce the speed penalty, and the GNU Objective C runtime uses one of them.
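The caching workaround mentioned above looks like this in outline: resolve the selector once before the loop, then call through the cached function pointer. `slow_lookup` here is a stand-in for whatever resolution the runtime performs; the point is only that it runs once instead of once per iteration.

```c
#include <stddef.h>

typedef int (*imp_t)(void *self, int arg);

/* Stand-in for an expensive runtime lookup. */
static int add_one(void *self, int arg) { (void)self; return arg + 1; }
static imp_t slow_lookup(void) { return add_one; }

static int sum_with_cached_imp(int n)
{
    imp_t imp = slow_lookup();     /* one lookup, hoisted out of the loop */
    int total = 0;
    for (int i = 0; i < n; i++)
        total = imp(NULL, total);  /* direct call, no per-iteration lookup */
    return total;
}
```

The cost of the flexibility loss is visible in the sketch: if the method's implementation were replaced mid-loop, the cached `imp` would keep calling the old one.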

Alternatives to the hash table

It’s been mentioned that a selector functions as a key. Specifically, it is a unique integer value that can be used to look up the implementation of a given method in any class. The value of a selector is tied to the name of the desired method, and so two methods with different names will have different selectors — different keys. (Yes, we’ve already hinted that two methods with the same name in two different classes will use the same key, even if the implementations of those methods differ — and that will become important in a bit.)

The advantage of using a selector to look up a method implementation, rather than simply using the method name itself, is speed: it is faster for a processor to work with an integer value than with a sequence of characters. However, because it is not known whether or not the target object implements a given method, a selector cannot simply be used as an index into a simple array or a VMT: it is quite possible, even likely, that it would refer to memory that did not exist, or that was not a valid method implementation, and this would have all sorts of nasty implications for the running program.

The NeXT/Apple runtime was (and may still be) built on the idea of constructing a hash table for each class, with all of the methods that class implements, and then using the selector as a way to look up the method implementation from within that table. As a dynamic construct, the hash table implementation can indicate when the target class does not actually implement the desired method, thereby saving the running program from faulting when it attempts to invoke a method implementation that does not exist; this is not something that can be done with a statically-compiled VMT. However, the tradeoff is speed, as noted above; it takes time to flip through every file in the target file drawer to find the one you need. This translates into slower method calls; when your program must make dozens or hundreds of such calls throughout its lifetime, this loss of speed starts to become noticeable. An alternative solution, and one that would greatly improve how fast a method implementation could be found, would be to organize method implementations into an array.

Linear arrays?

You are likely already familiar with the idea of a standard array. In its simplest form, a one-dimensional array is simply a sequential stretch of storage that can contain a fixed number of elements. The advantage of a linear array is that, when the index of the desired element is known ahead of time, as it is with selectors, retrieving the element is very fast. In theory, you could build an array of method implementations, use the selector value to index the implementation you wanted, and call it, with a speed that very nearly matches that of traditional, statically-defined classes and objects. We've mentioned, however, that you cannot simply use a selector as an index into an array; there are complicating factors.
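Before turning to those complicating factors, here is what the ideal looks like: dispatch as a single indexed load plus an indirect call. The table, selectors, and bounds are all invented for the sketch.

```c
#include <stddef.h>

typedef unsigned int sel_t;
typedef int (*imp_t)(void *self);

/* Two sample implementations, for demonstration only. */
static int hello_imp(void *self) { (void)self; return 1; }
static int bye_imp(void *self)   { (void)self; return 2; }

/* A per-class table with one slot per selector; unimplemented
 * selectors are NULL. Selectors 3 and 9 are arbitrary examples. */
#define MAX_SEL 16
static imp_t vtable[MAX_SEL] = { [3] = hello_imp, [9] = bye_imp };

static int call(sel_t sel, void *receiver)
{
    /* Constant time: no search, just index and call. The catch is that
     * every class would need a full-size table like this one. */
    imp_t imp = (sel < MAX_SEL) ? vtable[sel] : NULL;
    return imp ? imp(receiver) : -1;
}
```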

Remember that a selector can be used with any class or object known to the Objective C runtime, even those that don't implement the corresponding method. In other words, selectors are not tied to a specific class. Furthermore, the compiler cannot guarantee that a method invoked via a selector is actually implemented. A class that implements the method at compile time may have that implementation changed at runtime, while a class that does not implement the method at compile time may have an implementation added at runtime. Both scenarios are perfectly valid uses of Objective C, and so the compiler has to allow for them. It will complain, but it will still encode the method lookup.

What does all of this mean for linear arrays? Because selectors are not bound to classes, but method implementations are, and because the compiler must generate method lookups for methods that it cannot guarantee will be implemented, the only way to use a linear array of method implementations would be to allocate, for each class, an array large enough to hold a method implementation for every selector used across the entire program. In other words, each class would have to maintain storage for the method implementations of every class. Even if method invocations were very fast as a result, you can see that this would quickly add up to a lot of wasted memory, especially in a program composed of many classes with multiple methods: most of the array elements maintained by each class would be empty (NULL), since they would refer to methods not implemented by that particular class.
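A back-of-the-envelope calculation shows how quickly this adds up. The class and selector counts below are invented but plausible for a mid-sized program; the pointer size assumes a 64-bit target.

```c
/* Cost of giving every class a full-size selector table:
 * classes x selectors x bytes per function pointer. */
static unsigned long table_bytes(unsigned long classes,
                                 unsigned long selectors,
                                 unsigned long ptr_size)
{
    return classes * selectors * ptr_size;
}

/* e.g. 500 classes x 10,000 selectors x 8-byte pointers
 *      = 40,000,000 bytes, roughly 38 MiB of tables,
 *      almost all of it NULL. */
```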

Memory constraints preclude the use of a standard linear array, and are likely the reason a hash table was chosen for the traditional runtime. However, although a standard linear array may not be suitable, it does not follow that something like an array could not be used instead.

Sparse arrays!

The idea behind a sparse array is both straightforward and quite elegant: present an interface that mimics a traditional array, but only allocate memory for elements that are not empty. When a program asks for an element at an index that is not allocated, simply return a default value. You have almost all of the benefits of a linear array without all of the wasted storage.

You can see the appeal at once. In theory, at least, a sparse array would indeed allow each class to maintain "storage" for the method implementations of every class, without actually needing to allocate storage for those methods; storage would only be allocated for those methods implemented by the class itself. At the same time, we get improved method lookup speed because we aren’t searching through hash table buckets to find the method implementation we want — we can go straight to the array element indicated by the selector and then invoke the method that we find there.

Theory and practice never align quite so simply, however. It should be evident that, in order for a sparse array to avoid allocating memory for empty elements, there must be some way of determining whether or not storage for a given element has been allocated, and a way of returning a default value in cases where storage has not been allocated. This adds some overhead to the process of indexing into the array, meaning that a sparse array does not quite match the speed of a traditional array. However, a carefully designed sparse array implementation can minimize this overhead, providing as much speed as possible.
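One common way to keep that overhead small is a two-level scheme: split the index into a bucket number and an offset within the bucket, allocate buckets lazily, and point every unallocated bucket at a single shared "empty" bucket whose slots hold the default value, so that reads never need a branch. This sketch illustrates the general technique, not the GNU runtime's exact layout; sizes are arbitrary, and bounds checking is omitted for brevity.

```c
#include <stdlib.h>

#define BUCKET_BITS 4
#define BUCKET_SIZE (1 << BUCKET_BITS)   /* 16 slots per bucket */
#define NBUCKETS    64                   /* covers indices 0..1023 */

typedef int value_t;

/* The shared empty bucket: all zeros, i.e. the default value. */
static value_t empty_bucket[BUCKET_SIZE];

struct sparse_array {
    value_t *buckets[NBUCKETS];          /* all start at empty_bucket */
};

static void sparse_init(struct sparse_array *s)
{
    for (int i = 0; i < NBUCKETS; i++)
        s->buckets[i] = empty_bucket;
}

/* Reads are branch-free: index the bucket table, then the bucket.
 * An unallocated index lands in empty_bucket and returns 0. */
static value_t sparse_get(const struct sparse_array *s, unsigned idx)
{
    return s->buckets[idx >> BUCKET_BITS][idx & (BUCKET_SIZE - 1)];
}

/* Writes allocate a private bucket on first use. */
static void sparse_set(struct sparse_array *s, unsigned idx, value_t v)
{
    unsigned b = idx >> BUCKET_BITS;
    if (s->buckets[b] == empty_bucket)
        s->buckets[b] = calloc(BUCKET_SIZE, sizeof(value_t));
    s->buckets[b][idx & (BUCKET_SIZE - 1)] = v;
}
```

The cost of a read is two indexed loads instead of one, which is the "does not quite match a traditional array" overhead mentioned above, while memory is spent only on buckets that actually hold data.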

Next time we’ll delve into the sparse array implementation found in the GNU Objective C runtime, and look at ways we might further tweak the implementation to provide additional gains in messaging speed.