Amin Mesbah


Spelling with Elemental Symbols

Posted on 2017-05-16

(Full source code is available here.)

Sitting in my 5-hour-long chemistry class, my gaze would often drift over to the periodic table posted on the wall. To pass the time, I began to try finding words I could spell using only the symbols of the elements on the periodic table. Some examples: ScAlEs, FeArS, ErAsURe, WAsTe, PoInTlEsSnEsS, MoISTeN, SAlMoN, PuFFInEsS.

I wondered what the longest such word was ('TiNTiNNaBULaTiONS' was the longest one I could come up with). Then I started thinking about how nice it would be to have a tool that could find the elemental spellings of any word. I decided to write a Python program. Given a word as input, it would output all elemental spellings for that word:

Generating Character Groupings

If all the symbols of all the elements were the same length, the task would be trivial. However some elemental symbols are glyphs consisting of two characters ('He'), while others are single-character glyphs ('O'). This turns out to make things much more complex. The 'pu' in 'Amputations' can be represented either by Plutonium ('Pu') or by Phosphorus and Uranium ('PU'). Any given word would need to be broken down into all possible combinations of single and double-character glyphs.

I decided to call each of these permutations a 'grouping'. Groupings specify the division of a word into glyphs. A grouping can be represented as a tuple of 1s and 2s where 1 represents a single character glyph and 2 represents a double character glyph. Every elemental spelling corresponds to some grouping:

In an attempt to explore the problem and find patterns, I wrote the following table in my notebook:

Question: For a string of length n, how many sequences of 1s and 2s exist such that the sum equals n?

n # Groupings Groupings
0 1 ()
1 1 (1)
2 2 (1,1), (2)
3 3 (1,1,1), (2,1), (1,2)
4 5 (1,1,1,1), (2,1,1), (1,2,1), (1,1,2), (2,2)
5 8 (1,1,1,1,1), (2,1,1,1), (1,2,1,1), (1,1,2,1), (1,1,1,2), (2,2,1), (2,1,2), (1,2,2)
6 13 (1,1,1,1,1,1), (2,1,1,1,1), (1,2,1,1,1), (1,1,2,1,1), (1,1,1,2,1), (1,1,1,1,2), (2,2,1,1), (2,1,2,1), (2,1,1,2), (1,2,2,1), (1,2,1,2), (1,1,2,2), (2,2,2)
7 21 (1,1,1,1,1,1,1), (2,1,1,1,1,1), (1,2,1,1,1,1), (1,1,2,1,1,1), (1,1,1,2,1,1), (1,1,1,1,2,1), (1,1,1,1,1,2), (2,2,1,1,1), (2,1,2,1,1), (2,1,1,2,1), (2,1,1,1,2), (1,2,2,1,1), (1,2,1,2,1), (1,2,1,1,2), (1,1,2,2,1), (1,1,2,1,2), (1,1,1,2,2), (2,2,2,1), (2,2,1,2), (2,1,2,2), (1,2,2,2)

Answer: fib(n + 1)!?

I was surprised to find the Fibonacci sequence in this unexpected place. During my subsequent investigations, I was even more surprised to discover that knowledge of this particular pattern goes back almost 2000 years to the prosodists of ancient India, who discovered it in their study of the permutations of short and long syllables of Vedic chants. For a fascinating exploration of this and other advancements in the history of combinatorics, consult section 7.2.1.7 of The Art of Computer Programming by Donald Knuth.

Fascinating as this finding was, I still hadn't accomplished my actual goal: To generate the groupings themselves. Some pondering and experimentation led me to the most straightforward solution I could think of: to generate all possible sequences of 1s and 2s, then filter out any that don't sum to the length of the input word.

from itertools import chain, product

def generate_groupings(word_length, glyph_sizes=(1, 2)):
    cartesian_products = (
        product(glyph_sizes, repeat=r)
        for r in range(1, word_length + 1)
    )

    # include only groupings that represent the correct number of chars
    groupings = tuple(
        grouping
        for grouping in chain.from_iterable(cartesian_products)
        if sum(grouping) == word_length
    )

    return groupings

A Cartesian product is the set of all tuples composed of a given set of elements. The Python Standard Library provides itertools.product(), a function that returns the Cartesian product of the elements in a given iterable. cartesian_products is a generator expression that yields every possible permutation of the items in glyph_sizes up to the given length, word_length.

So, if word_length is 3, cartesian_products will yield:

[
    (1,), (2,), (1, 1), (1, 2), (2, 1), (2, 2), (1, 1, 1), (1, 1, 2),
    (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)
]

These are then filtered so that groupings ends up only including those permutations whose elements add up to word_length.

>>> generate_groupings(3)
((1, 2), (2, 1), (1, 1, 1))

Clearly, there is a lot of extra work being done here. The function calculated 14 permutations, but only ended up using 3. The performance gets rapidly worse as word_length increases. More on that later. Having gotten this function working, I moved on to the next task.

Mapping Words to Groupings

Once all the possible groupings for a word have been calculated, that word can be "mapped" to each grouping:

def map_word(word, grouping):
    chars = (c for c in word)

    mapped = []
    for glyph_size in grouping:
        glyph = ""
        for _ in range(glyph_size):
            glyph += next(chars)
        mapped.append(glyph)

    return tuple(mapped)

The function treats each glyph in the grouping like a cup, and fills it with as many characters from the word as it can hold before moving on to the next one. When all the characters have been placed in the proper glyph, resulting mapped word is returned as a tuple.

>>> map_word('because', (1, 2, 1, 1, 2))
('b', 'ec', 'a', 'u', 'se')

Once a word is mapped, it's ready to be compared to the list of elemental symbols to find any possible spellings.

Finding the Spellings

With the other two functions doing most of the work, I made a spell() function which tied everything together:

def spell(word, symbols=ELEMENTS):
    groupings = generate_groupings(len(word))

    spellings = [map_word(word, grouping) for grouping in groupings]

    elemental_spellings = [
        tuple(token.capitalize() for token in spelling)
        for spelling in spellings
        if set(s.lower() for s in spelling) <= set(s.lower() for s in symbols)
    ]

    return elemental_spellings

spell() gets all possible spellings, returning only those that are entirely composed of elemental symbols. I used sets to efficiently filter out any invalid spellings.

Python's sets are are very similar to mathematical sets. They unordered collections of unique elements. Behind the scenes, they are implemented as dictionaries (hash-tables) with keys, but no values. Since all the elements in a set are hashable, membership tests are very efficient (average case O(1) time). The comparison operators are overloaded to test for subsets by using these efficient membership operations. Sets and dictionaries are deeply explored in the wonderfully informative book, Fluent Python by Luciano Ramalho.

With this last component functional, I finally had a working program!

>>> spell('amputation')
[('Am', 'Pu', 'Ta', 'Ti', 'O', 'N'), ('Am', 'P', 'U', 'Ta', 'Ti', 'O', 'N')]
>>> spell('cryptographer')
[('Cr', 'Y', 'Pt', 'Og', 'Ra', 'P', 'H', 'Er')]

The Longest Word?

Happy to have implemented the core functionality, I gave my program a name (Stoichiograph), and made a command line wrapper for it. The wrapper takes words, either as arguments or from a file, and prints out their spellings. After adding an option to order the words descendingly by length, I let my program loose on my unsuspecting list of words.

$ ./stoichiograph.py --sort --batch-file /usr/share/dict/american-english
NoNRePReSeNTaTiONaL
NoNRePReSeNTaTiONAl
[...]

Nice! That definitely wasn't a word I would have ever thought of myself. My program was already fulfilling its purpose. I played around a bit and found a longer word:

$ ./stoichiograph.py nonrepresentationalisms
NoNRePReSeNTaTiONaLiSmS
NONRePReSeNTaTiONaLiSmS
NoNRePReSeNTaTiONAlISmS
NONRePReSeNTaTiONAlISmS

Interesting. I wondered whether this was truly the longest word (spoiler), and I wanted to investigate longer words. First, though, I needed to make my program run much faster.

Addressing Performance Issues

It took my program about 16 minutes to work through the 119,095 words (of which many were quite short) in my word list:

$ time ./stoichiograph.py --sort --batch-file /usr/share/dict/american-english
[...]
real    16m0.458s
user    15m33.680s
sys     0m23.173s

That's about 120 words per second on average. I was sure I could make it faster. I needed more detailed performance information to know how best to do so.

Line profiler is a tool for identifying performance bottlenecks in Python code. I used it to profile my program as it found spellings for a single 23-character-long word. Here's a condensed version of what it had to say:

Line #   % Time   Line Contents
===============================
    30            @profile
    31            def spell(word, symbols=ELEMENTS):
    32     71.0       groupings = generate_groupings(len(word))
    33
    34     15.2       spellings = [map_word(word, grouping) for grouping in groupings]
    35
    36                elemental_spellings = [
    37      0.0           tuple(token.capitalize() for token in spelling)
    38     13.8           for spelling in spellings
    39                    if set(s.lower() for s in spelling) <= set(s.lower() for s in symbols)
    40                ]
    41
    42      0.0       return elemental_spellings

Line #   % Time   Line Contents
===============================
    45            @profile
    46            def generate_groupings(word_length, glyhp_sizes=(1, 2)):
    47                cartesian_products = (
    48      0.0           product(glyph_sizes, repeat=r)
    49      0.0           for r in range(1, word_length + 1)
    50                )
    51
    52      0.0       groupings = tuple(
    53      0.0           grouping
    54    100.0           for grouping in chain.from_iterable(cartesian_products)
    55                    if sum(grouping) == word_length
    56                )
    57
    58      0.0       return groupings

It is not a great surprise that generate_groupings() takes so long. The problem it tries to solve is a special case of the subset sum problem which is NP-complete. Finding a cartesian product gets expensive quickly, and generate_groupings() finds multiple cartesian products.

We can perform asymptotic analysis to get a sense of just how horrifically bad things are:

  1. We assume that glyph_sizes always has 2 elements (1 and 2).
  2. product() finds the cartesian product of a set of 2 elements r times, so the big O of product() is O(2^r).
  3. product() is called in a for loop which repeats word_length times, so if we set n as word_length, the big O for the whole loop is O(2^r * n).
  4. But r has a different value each time the loop runs, so really it's more like O(2^1 + 2^2 + 2^3 + ... + 2^(n-1) + 2^n).
  5. And since 2^0 + 2^1 + ... + 2^n = 2^(n+1) - 1, our final big O is: O(2^(n+1) - 1), or O(2^n).

With O(2^n) time complexity we can expect the execution time to double for each incrementation of word_length. Horrifying!

I thought about this performance problem for many weeks. I had two related, but distinct, performance cases to consider:

  1. Processing a list of words of varying length.
  2. Processing a single, but very long word.

The second case proved, as one might guess, to be much more important because it affected the first. Though I didn't immediately know how to improve case 2, I had some ideas for case 1, so that's where I started.

Case 1: Being Lazy

Laziness is a virtue not only for programmers, but also for programs themselves. Addressing case 1 required laziness. If my program was going to process a long list of words, how could I get my program to do as little work as possible?

Checking for Invalid Characters

Surely, I thought, there are likely to be words in a given list that contain characters not present in any elemental symbol. Time expended trying to find spellings for these words is time wasted. A word list could be processed faster if these words were quickly identified and skipped.

Unfortunately for me, it turns out that 'j' and 'q' are the only letters not contained in any elemental symbol:

>>> set('abcdefghijklmnopqrstuvwxyz') - set(''.join(ELEMENTS).lower())
('j', 'q')

And only about 3% of the words in my particular dictionary file contain 'j' or 'q':

>>> with open('/usr/share/dict/american-english', 'r') as f:
...     words = f.readlines()
...
>>> total = len(words)
>>> invalid_char_words = len(
...     [w for w in words if 'j' in w.lower() or 'q' in w.lower()]
... )
...
>>> invalid_char_words / total * 100
3.3762962340988287

Skipping those words made the program only about 2% faster:

$ time ./stoichiograph.py --sort --batch-file /usr/share/dict/american-english
[...]
real    15m44.246s
user    15m17.557s
sys     0m22.980s

Not the kind of performance improvement I was hoping for, so on to the next idea.

Memoization

Memoization is the technique of saving a function's output and returning it if the function is called again with the same inputs. A memoized function only needs to generate output once for a given input. This can be very helpful with expensive functions that are called many times with the same few inputs, but only works for pure functions.

generate_groupings() was the perfect candidate. It was unlikely to encounter a very large range of inputs, and was very expensive for long word lengths. The functools package makes memoization trivial by supplying the @lru_cache() decorator.

Memoizing generate_groupings() lead to a notable, though still not satisfactory reduction in execution time:

$ time ./stoichiograph.py --sort --batch-file /usr/share/dict/american-english
[...]
real    11m15.483s
user    10m54.553s
sys     0m17.083s

Still, that's not bad for a single decorator from the standard library!

Case 2: Being Clever

My optimizations so far had helped slightly with case 1, but the core problem, the inefficiency of generate_groupings() hadn't been addressed, and big individual words still took too long to process:

$ time ./stoichiograph.py nonrepresentationalisms
[...]
real    0m20.275s
user    0m20.220s
sys     0m0.037s

Laziness can yield a certain degree of success, but sometimes cleverness is needed.

Recursion and the DAG

One evening as I was dosing off, I had a flash of inspiration and ran to my whiteboard to draw this:

Photograph of whiteboard diagram of recursive algorithm

I realized I can take any string, pop the 1 and 2-character glyphs off the front, then recurse into the remaining substring in both cases. Once I've traversed the whole string, I will have found all the glyphs, and, critically, I will have information about their structure and order. I also realized that a graph could be a very good way to store this information.

So if the series of recursive function calls into our charming example word, 'amputation', looks something like this:

'a' 'mputation'
    'm' 'putation'
        'p' 'utation'
            'u' 'tation'
                't' 'ation'
                    'a' 'tion'
                        't' 'ion'
                            'i' 'on'
                                'o' 'n'
                                    'n' ''
                                'on' ''
                            'io' 'n'
                        'ti' 'on'
                    'at' 'ion'
                'ta' 'tion'
            'ut' 'ation'
        'pu' 'tation'
    'mp' 'utation'
'am' 'putation'

Then, after filtering out all the glyphs that don't match any elemental symbol, we could end up with a nice graph like this:

Diagram of directed acyclic graph representing the spellings of 'Amputation'

Specifically, we will have made a directed acyclic graph (DAG) with each node holding a glyph. Each path from the first node to the last would be a valid elemental spelling of the original word!

I had never worked with graphs before, but I found a very helpful essay on the Python website that explained the basics, including how to efficiently find all paths between two nodes. The wonderful book 500 Lines or Less has a chapter with another example of a graph implementation in Python. I based my graph class on these examples.

With a simple graph class implemented and tested, I turned my whiteboard drawing into a function:

# A single node of the graph. A glyph and its position in the word.
Node = namedtuple('Node', ['value', 'position'])

def build_spelling_graph(word, graph, symbols=ELEMENTS):
    """Given a word and a graph, find all single and double-character
    glyphs in the word. Add them to the graph only if they are present
    within the given set of allowed symbols.
    """

    def pop_root(remaining, position=0, previous_root=None):
        if remaining == '':
            graph.add_edge(previous_root, None)
            return

        single_root = Node(remaining[0], position)
        if single_root.value.capitalize() in symbols:
            graph.add_edge(previous_root, single_root)

            if remaining not in processed:
                pop_root(
                    remaining[1:], position + 1, previous_root=single_root
                )

        if len(remaining) >= 2:
            double_root = Node(remaining[0:2], position)
            if double_root.value.capitalize() in symbols:
                graph.add_edge(previous_root, double_root)

                if remaining not in processed:
                    pop_root(
                        remaining[2:], position + 2, previous_root=double_root
                    )
        processed.add(remaining)

    processed = set()
    pop_root(word)

The Payoff

Where the initial brute-force algorithm operated horrifyingly in O(2^n) time, this recursive one operates in O(n) time. Much better! The first time I timed my newly optimized program on my word list file, I was blown away:

$ time ./stoichiograph.py --sort --batch-file /usr/share/dict/american-english
[...]
real    0m11.299s
user    0m11.020s
sys     0m0.17ys

Execution time had gone from 16 minutes down to 10 seconds, the processing rate from 120 words per second to 10,800!

This was the first time I truly appreciated the power and value of data structures and algorithms.

The Actual Longest Word

With this newfound power I was able to find a new longest elementally spellable word: 'floccinaucinihilipilificatiousness.'

This amazing word comes from 'floccinaucinihilipilification,' defined as "the act or habit of describing or regarding something as unimportant, of having no value or being worthless," and often cited as the longest non-technical English word. 'Floccinaucinihilipilificatiousness' has 54 elemental spellings, all of which can experienced by any explorer brave enough to traverse its beautiful graph:

Diagram of directed acyclic graph representing the spellings of 'floccinaucinihilipilificatiousness'

Time Well Spent

One may be tempted to floccinaucinihilipilificate on this whole endeavor, but I found it both valuable and important. When I began this project, I was relatively inexperienced at programming, and had no idea how to even begin. Progress was slow, and it took a long time to arrive satisfactory solution (look at the commit history to see the large gaps where I took a break to work on other projects).

However, I learned a great deal along the way. This project introduced me to:

Understanding these concepts has helped me again and again. Recursion and trees have been particularly important for my n-body simulation project.

Finally, it feels good to have answered my own initial question. I no longer have to wonder about elemental spellings because I now have a tool to find them for me, and with a quick pip install stoichiograph, you can too.

Discussion
Further Reading

Matus Goljer wrote a follow-up article that explores the generation of character groupings with dynamic programming:

I drew much inspiration from Peter Norvig's elegant solutions to interesting problems:

Two informative articles about performance profiling in Python:

Debugging a Quadtree in C

Posted on 2017-05-12

I've been working on a project that simulates stars or particles and the forces of attraction between them. To make the simulation more efficient and able to handle more particles, I've been working on implementing a Barnes-Hut Algorithm by using a quadtree to partition the simulation space. When it works, it's beautiful:

Gif of the Barnes-Hut spatial partition at work

Today made a breakthrough on a bug in my quadtree implementation. I had been getting very occasional segfaults, and Valgrind was complaining of invalid reads and free()s. I haven't had much time to work on the problem for the past two weeks, but I was often thinking about it (and even, with an unnerving frequency, dreaming about it). When I had an hour here or there, I would return to my investigation and continue my attempt to understand the cause of the problem. Eventually I made the minimal working example below.

//qt_test.c

#include <stdio.h>
#include <stdlib.h>

#define NUM_UPDATES 10
#define QUAD_TREE_LEVELS 3


struct QuadTreeNode
{
    struct QuadTreeNode *ne;
    struct QuadTreeNode *nw;
    struct QuadTreeNode *sw;
    struct QuadTreeNode *se;
};


struct QuadTree
{
    struct QuadTreeNode *root;
};


struct QuadTreeNode* quad_tree_node_init(void)
{
    struct QuadTreeNode *node = (struct QuadTreeNode*)malloc(sizeof (struct QuadTreeNode));
    if (node)
    {
        node->ne = NULL;
        node->nw = NULL;
        node->sw = NULL;
        node->se = NULL;
    }
    return node;
}


void quad_tree_node_free(struct QuadTreeNode *node)
{
    if (node)
    {
        quad_tree_node_free(node->ne);
        quad_tree_node_free(node->nw);
        quad_tree_node_free(node->sw);
        quad_tree_node_free(node->se);
        free(node);
    }
}


void quad_tree_node_subdivide(struct QuadTreeNode *node, int remaining_levels)
{
    if (node && remaining_levels > 0)
    {
        remaining_levels--;
        node->ne = quad_tree_node_init();
        node->nw = quad_tree_node_init();
        node->sw = quad_tree_node_init();
        node->se = quad_tree_node_init();

        quad_tree_node_subdivide(node->ne, remaining_levels);
        quad_tree_node_subdivide(node->nw, remaining_levels);
        quad_tree_node_subdivide(node->sw, remaining_levels);
        quad_tree_node_subdivide(node->se, remaining_levels);
    }
}


struct QuadTree* quad_tree_init(void)
{
    struct QuadTree *qt = (struct QuadTree*)malloc(sizeof (struct QuadTree));
    if (qt)
    {
        qt->root = quad_tree_node_init();
        quad_tree_node_subdivide(qt->root, QUAD_TREE_LEVELS - 1);
    }
    return qt;
}


void quad_tree_free(struct QuadTree *qt)
{
    if (qt)
    {
        quad_tree_node_free(qt->root);
        free(qt);
    }
}


void update(struct QuadTree *qt)
{
    quad_tree_free(qt);
    qt = quad_tree_init();
}


int main(void)
{
    struct QuadTree *qt = quad_tree_init();

    for (int i = 0; i < NUM_UPDATES; ++i)
    {
        printf("Update %d\n", i+1);
        update(qt);
    }

    quad_tree_free(qt);

    return 0;
}

Here's a sample from the output of valgrind --track-origins=yes ./qt_test:

Invalid read of size 8
==16345==    at 0x1088B7: quad_tree_node_free (qt_test.c:43)
==16345==    by 0x1089DB: quad_tree_free (qt_test.c:82)
==16345==    by 0x108A00: update (qt_test.c:90)
==16345==    by 0x108A5D: main (qt_test.c:102)
==16345==  Address 0x5e350a0 is 16 bytes inside a block of size 32 free'd
==16345==    at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16345==    by 0x1088D0: quad_tree_node_free (qt_test.c:45)
==16345==    by 0x1089DB: quad_tree_free (qt_test.c:82)
==16345==    by 0x108A00: update (qt_test.c:90)
==16345==    by 0x108A5D: main (qt_test.c:102)
==16345==  Block was alloc'd at
==16345==    at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16345==    by 0x10885E: quad_tree_node_init (qt_test.c:25)
==16345==    by 0x1089A0: quad_tree_init (qt_test.c:71)
==16345==    by 0x108A30: main (qt_test.c:97)

[...]

==16345== Invalid free() / delete / delete[] / realloc()
==16345==    at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16345==    by 0x1088D0: quad_tree_node_free (qt_test.c:45)
==16345==    by 0x1088AD: quad_tree_node_free (qt_test.c:41)
==16345==    by 0x1088BF: quad_tree_node_free (qt_test.c:43)
==16345==    by 0x1089DB: quad_tree_free (qt_test.c:82)
==16345==    by 0x108A00: update (qt_test.c:90)
==16345==    by 0x108A5D: main (qt_test.c:102)
==16345==  Address 0x5e35570 is 0 bytes inside a block of size 32 free'd
==16345==    at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16345==    by 0x1088D0: quad_tree_node_free (qt_test.c:45)
==16345==    by 0x1088AD: quad_tree_node_free (qt_test.c:41)
==16345==    by 0x1088BF: quad_tree_node_free (qt_test.c:43)
==16345==    by 0x1089DB: quad_tree_free (qt_test.c:82)
==16345==    by 0x108A00: update (qt_test.c:90)
==16345==    by 0x108A5D: main (qt_test.c:102)
==16345==  Block was alloc'd at
==16345==    at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16345==    by 0x10885E: quad_tree_node_init (qt_test.c:25)
==16345==    by 0x10890B: quad_tree_node_subdivide (qt_test.c:54)
==16345==    by 0x108958: quad_tree_node_subdivide (qt_test.c:61)
==16345==    by 0x1089B0: quad_tree_init (qt_test.c:72)
==16345==    by 0x108A30: main (qt_test.c:97)

I was baffled for quite a while by these errors. I spent a lot of time checking and re-checking all the functions that init and free the quadtree and quadtree nodes. Only after expending great effort did I finally determine the cause.

The problem turned out to lie with the way I was passing my struct around to different functions. I'm used to C++, in which one can pass things either by value or by reference, but in C, everything is passed by value. My update() function was changing its local copy of the QuadTree *, qt, but that change was did not apply to the original qt in main(). That meant that all the calls to qt in main() were accessing the old memory that had been freed.

Another Level of Indirection

The first solution I found was to change update() to take a QuadTree ** parameter:

void update(struct QuadTree **qt)
{
    quad_tree_free(*qt);
    *qt = quad_tree_init();
}

Then in main() I called it like:

update(&qt);

Now that the qt parameter was wrapped in an extra pointer, changes to it in update() were now being properly applied to qt in main(). Valgrind's previous complaints were gone. It was a nice example to support the claim that "All problems in computer science can be solved by another level of indirection."

Change Only What Needs Changing

I also found another way of solving the problem. I already had a QuadTree struct which conveniently wrapped the rest of the actual tree and kept track of the root. I realized that there was no reason to continually be freeing and reallocating qt itself.

I simplified the QuadTree functions to basically be "run-once".

struct QuadTree* quad_tree_init(void)
{
    struct QuadTree *qt = (struct QuadTree*)malloc(sizeof (struct QuadTree));
    qt->root = NULL;
    return qt;
}
void quad_tree_free(struct QuadTree *qt)
{
    if (qt)
    {
        quad_tree_node_free(qt->root);
        free(qt);
    }
}

I changed my update function to only operate on qt->root.

void update(struct QuadTree *qt)
{
    quad_tree_node_free(qt->root);
    qt->root = quad_tree_node_init();
    quad_tree_node_subdivide(qt->root, QUAD_TREE_LEVELS - 1);
}

And I updated my main function accordingly:

int main(void)
{
    struct QuadTree *qt = quad_tree_init();
    qt->root = quad_tree_node_init();
    quad_tree_node_subdivide(qt->root, QUAD_TREE_LEVELS - 1);

    for (int i = 0; i < NUM_UPDATES; ++i)
    {
        printf("Update %d\n", i+1);
        update(qt);
    }

    quad_tree_free(qt);

    return 0;
}

Still no errors from Valgrind! qt is happy to point to the same memory address for most of the program.

I'm not sure which of these solutions is more appropriate, but that will become clear with time. I'm happy to have gained a deeper understanding of pointers, the implications of pass-by-value in C, and some C idioms with which to deal with similar situations.

Dangers of the Default Copy Constructor

Posted on 2017-01-07

Learning C++ is hard. Arrays, pointers, compilation, the stack and the heap, and memory allocation all seem straightforward to those versed in their subtleties. However, the beginner is confronted with a cluster of difficult concepts demanding seemingly abstruse knowledge about the C++ spec. and the inner workings of computers. These matters are complicated further when combined with classes, objects, and the host of concepts associated with object-oriented programming. The syntax, subtle and unforgiving, hardly makes the initial steps of the journey any easier.

Tutoring C++ has provided me with ample opportunity to clarify and explain such concepts. I've become familiar with the pain points of the first two semesters of C++, and I'm confident in my ability to help students work through them. Yet I am still a beginner and constantly learning new things. Students occasionally run into perplexing bugs that make quite evident the limits of my understanding.

On one such occasion, I was helping a student debug her code for a string class. Her professor had provided a header file, MyString.h, and a test program, my_string_test.cpp. Both of these, the professor had warned, were not to be altered. The student's task was to implement the class in MyString.cpp.

For the sake of clarity (not to mention adherence to academic ethics), the code here included is greatly simplified. Numerous methods and overloaded operators are omitted.

MyString.h
#ifndef MyString_H
#define MyString_H

#include <iostream>
#include <cstring>

class MyString;
std::ostream &operator <<(std::ostream &strm, MyString &str);


class MyString
{
    public:
        MyString();
        MyString(const char *str);
        ~MyString();

        int str_length() const;
        const char* get_string() const;

        friend std::ostream &operator <<(std::ostream &strm, MyString &str);

    private:
        char *the_string;
        // The number of characters in the string up to, but not
        // including, the null terminator '\0'.
        int length;
};
#endif
my_string_test.cpp
#include "MyString.h"

int main()
{
    // String literal constructor
    MyString first("A string literal.");
    std::cout << "first: " << first << std::endl;

    // Copy constructor
    MyString second(first);
    std::cout << "second: " << second << std::endl;

    return 0;
}

The class keeps track of a null-terminated array of characters. It has two member variables:

When she came in for help, the student had already implemented and tested most of the class. Her implementation looked something like this:

MyString.cpp
#include "MyString.h"
#include <iostream>
#include <cstring>

MyString::MyString()
{
    length = 0;
    the_string = new char[1];
    the_string[0] = '\0';
}

MyString::MyString(const char *str)
{
    length = std::strlen(str);

    the_string = new char[length + 1];

    for (int i=0; i < length; i++)
    {
        the_string[i] = str[i];
    }

    the_string[length] = '\0';
}

MyString::~MyString()
{
    delete [] the_string;
}

int MyString::str_length() const
{
    return length;
}

const char* MyString::get_string() const
{
    return the_string;
}

std::ostream &operator <<(std::ostream &strm, MyString &str)
{
    strm << str.the_string;
    return strm;
}

The Problem

Until she had implemented the destructor, MyString::~MyString(), everything had apparently been working perfectly. Running the program, I was surprised to see the following debug errors:

screenshot visual studio access violation window

screenshot visual studio debug assertion failed window

screenshot visual studio heap corruption detected window

Overlooking these somewhat enigmatic error messages, I commented out the destructor's definition. The program executed without any issues:

screenshot of program successfully running in a terminal

The destructor was simple, merely freeing the memory pointed to by the_string. I wondered how it could be causing these problems. I decided to step through the program line by line in the Visual Studio debugger. We found that the debug errors didn't appear until the execution of return 0 at the end of main(). Baffling.

Visual Studio's disassembly view divulged the details of what the processor was being told to do. There were 2 calls to MyString::~MyString() (1 for each MyString object) after return 0:

[...]
0027645F  cmp         esi,esp  
00276461  call        __RTC_CheckEsp (0271352h)  
    return 0;
00276466  mov         dword ptr [ebp-0F8h],0  
00276470  mov         byte ptr [ebp-4],0  
00276474  lea         ecx,[second]  
00276477  call        MyString::~MyString (02711CCh)  
0027647C  mov         dword ptr [ebp-4],0FFFFFFFFh  
00276483  lea         ecx,[first]  
00276486  call        MyString::~MyString (02711CCh)  
0027648B  mov         eax,dword ptr [ebp-0F8h]  
}
[...]

The first destructor call would execute as expected, but the second would raise the error messages. A few carefully observed runs through the debugger revealed the problem: The first MyString object's destructor was freeing memory for both objects' pointers. This was the state after that first call:

screenshot of msvc local and watch windows after calling the first destructor The first destructor frees memory for both objects! (full screenshot)

Note that first.the_string and second.the_string are both pointing to the same memory address! The second destructor was trying to free the same memory as the first. Memory can't be freed twice, so an exception was raised. Finally we understood the cause of the error messages.

When two pointers reference the same memory address, they are said to be 'aliased'. Pointer aliasing is a harbinger of untold woe, and should generally be avoided. I searched through the student's code, but I couldn't find no statement that could have aliased the two the_string pointers. How could it be happening if the student had written no code to make it happen? Many minutes were spent carefully stepping through each line of the code before we found the unexpected source of the problem: The professor's code.

The professor had included a test of the MyString class copy constructor in my_string_test.cpp, but no prototype for it in MyString.h. The student, having been forbidden to alter these files, had not defined the copy constructor in MyString.cpp. One would expect the compiler to loudly complain about such a situation, but it appeared instead to be implicitly creating a constructor of its own:

[...]
    MyString second(first);
00276415  push        8  
00276417  lea         ecx,[second]  
0027641A  call        MyString::__autoclassinit2 (02713CFh)  
0027641F  mov         eax,dword ptr [first]  
00276422  mov         dword ptr [second],eax  
00276425  mov         ecx,dword ptr [ebp-18h]  
00276428  mov         dword ptr [ebp-28h],ecx  
0027642B  mov         byte ptr [ebp-4],1  
[...]

This was the state after executing that second mov. Note the aliased pointers:

screenshot of msvc local and watch windows after calling default copy constructor The default copy constructor sets both pointers to the same memory address! (full screenshot)

After a bit of research, we learned that when no copy constructor is defined, the compiler will automatically generate a default copy constructor:

A compiler-generated copy constructor sets up a new object and performs a memberwise copy of the contents of the object to be copied. If base class or member constructors exist, they are called; otherwise, bitwise copying is performed.

This copy constructor performs naive memberwise assignment, copying the values of each member variable of one object to the corresponding variable in the other. This can be adequate for extremely basic classes, but proves to be disastrous when classes have pointer member variables. The address of one pointer is directly copied to another, and so it went with the_string.

The Solution

As is the case with many errors, once we understood its cause, it was trivial to fix. The student created a proper copy constructor and, in an entirely justifiable violation of the rules, added a prototype for it to MyString.h.

// MyString.cpp

MyString::MyString(MyString &obj)
{
    length = obj.length;

    the_string = new char[length + 1];

    for (int i=0; i < length; ++i)
    {
        the_string[i] = obj.the_string[i];
    }

    the_string[length] = '\0';
}

Stepping through the code a final time, we could see that, after this new copy constructor was called, each MyString object had its own memory:

screenshot of msvc local and watch windows after calling the custom copy constructor Each pointer has its own address (full screenshot)

There was no more pointer aliasing, and both destructors freed the appropriate memory without issue:

screenshot of msvc local and watch windows after calling the first destructor First destructor successful (full screenshot)

screenshot of msvc local and watch windows after calling the second destructor Second destructor successful (full screenshot)

Lessons Learned

In retrospect, the cause of this bug was right before our eyes. Those more experienced with C++ probably would have recognized it immediately, but it took us quite a while to even figure out where to look. A number of factors contributed to this:

Though time consuming and occasionally frustrating, this was a valuable opportunity to develop skills of careful observation, thoughtful questioning, and systematic experimentation that are so essential to developing software. Both student and tutor departed with new knowledge and understanding, and perhaps a healthy fear of leaving special methods undefined.