Zebra Codes

Writing Generator Functions in C++

19th of November, 2023
C++

Many languages have the ability to let you create generators: a function that can be called multiple times, picking up where it left off and returning a different value each time. Unfortunately C++ does not give you this functionality, however that doesn’t mean that we can’t implement it ourselves.

There is nothing special about a generator function, it is just a syntactic shortcut that wraps up a function and a state machine. The code you write in your generator function is converted by the compiler into a state machine, along with some data to track the current state. It is this state machine that allows the function to pick up where it left off.

Because C++ does not do this for us, we simply have to implement the logic ourselves.

Basic Structure

We are going to implement our generator using an iterator-based approach. This is not the only way to do it, however it means that we will be able to use it in range based for loops and other STL algorithms like this:

for (const auto& item : generator)
{
    // ...
}

This is very convenient to use so it’s worth a little extra boilerplate code. The code requires two classes:

  • The Generator stores the information to be iterated over, and creates the beginning and end iterators.
  • The Iterator stores a reference to the generator over which it is iterating, and the current position of iteration.

Note that a generator may have several iterators iterating over it at the same time, all at different positions. An iterator will in general not modify a generator, although this may not be possible in cases where the data being iterated over is ephemeral, such as an input stream from a network.

The generator must supply two functions called begin() and end() for compatibility with STL and the range-based for loop.

class Generator
{
    public:
    Iterator begin();
    Iterator end();
}

The iterator requires the following three functions to be implemented:

  • operator*(): This is the “dereferencing operator”, and returns the current value.
  • operator++(): This moves the iterator to the next value.
  • operator!=(): This is used to compare the iterator to the end iterator, to see if iteration is complete.

The iterator also requires a reference to the generator, and a way to create an iterator that is “at the end”. I’ll be accomplishing this in the constructor.

class Iterator
{
    private:
    Generator& generator;

    public:
    Iterator(Generator& gen, bool end = false);
    int operator*();
    Iterator& operator++();
    bool operator!=();
}

Generating Numbers in a Range

We will start by writing a simple generator to produce numbers in a range.

Note: I have presented the Generator and Iterator classes separately for ease of reading, however it won’t actually compile like this as they reference each other circularly. If you want to actually compile this example then move the definition of the Iterator class inside the Generator class, as I have done with the “HelloGenerator” example farther down.

As detailed above, the outer “Generator” class is there simply to create the “begin” and “end” iterators, and store the maximum number to iterate to:

/**
 * Generate a range of numbers from zero to n that can be iterated over,
 * for example RangeGenerator(3) will generate 0, 1, 2, 3.
 */
class RangeGenerator
{
    /** The number to iterate to. */
    int max;

    public:

    /**
     * Create the generator.
     *
     * @param iMax The last number in the range.
     */
    RangeGenerator(int iMax) : max(iMax)
    {
    }

    /**
     * Create an iterator at the beginning of the range.
     *
     * @return Returns the beginning iterator.
     */
    Iterator begin()
    {
        return Iterator(*this);
    }

    /**
     * Create an iterator at the end of the range.
     *
     * @return Returns the end iterator.
     */
    Iterator end()
    {
        return Iterator(*this, true);
    }
};

The inner “Iterator” class then does the actual work. In this case, it starts by initializing the current number and then increments it each time the iterator is advanced. Note that the iterator never needs to compare its current value to the generator’s maximum value. This is handled by the algorithm using the iterator by comparing its current iterator to the “end” iterator returned by RangeGenerator::end().

/**
 * The inner iterator.
 */
class Iterator
{
    /** The generator being iterated over. */
    Generator& generator;

    /** The current number. */
    int current = 0;

    public:

    /**
     * Create the iterator.
     *
     * @param inGenerator The generator to iterate over.
     * @param end         True to create an "end" iterator.
     */
    Iterator(Generator& inGenerator, bool end = false) : generator(inGenerator)
    {
        if (end)
            current = generator.max + 1;
    }

    /**
     * Dereference the iterator.
     *
     * @return Returns the current number in the range.
     */
    int operator*()
    {
        return current;
    }

    /**
     * Increment the iterator.
     * This moves the iterator state to the next item in the range.
     *
     * @return Returns the iterator.
     */
    Iterator& operator++()
    {
        ++current;
        return *this;
    }

    /**
     * Compare the iterator to another.
     *
     * @return bool Returns true if the iterators are different, false if they are the same.
     */
    bool operator!=(const Iterator& rhs)
    {
        if (&generator != &rhs.generator)
            return false;

        return current != rhs.current;
    }
};

That’s all there is to it, you can use it like this:

// Print the numbers 0 to 10.
for (const auto n : RangeGenerator(10))
    std::cout << n << "\n";

Generator State

One of the useful properties of generators is their ability to resume execution from where it left off, instead of from the beginning of the function. Consider the PHP generator below. The first time the generator is called, it will return “Hello”. The second time, it will return the name that was passed in as an argument.

function sayHello(string $name): Generator
{
    yield "Hello"
    yield $name;
}

Because C++ does not offer this functionality, we must implement it ourselves by writing a state machine. This simply means using a variable to keep track of which state we are currently in, and switching on that when we use the iterator.

Note that I’ve moved the Iterator class to be inside the Generator class, as it does not need to be exposed.

#include <iostream>
#include <string>

/**
 * Generate "Hello" followed by the user's name.
 */
class HelloGenerator
{
    /** The name to output. */
    std::string name;

    /**
     * The inner iterator.
     */
    class Iterator
    {
        /** The generator being iterated over. */
        HelloGenerator& generator;

        /** The current state. */
        int state = 0;

        public:

        /**
         * Create the iterator.
         *
         * @param inName The name to output.
         */
        Iterator(HelloGenerator& inGenerator, bool isEnd = false) : generator(inGenerator)
        {
            if (isEnd)
                state = 2;
        }

        /**
         * Dereference the iterator.
         *
         * @return Returns a string.
         */
        std::string operator*()
        {
            if (state == 0)
                return "Hello ";
            else
                return generator.name;
        }

        /**
         * Increment the iterator.
         * This moves the iterator to the next state.
         *
         * @return Returns the iterator.
         */
        Iterator& operator++()
        {
            ++state;
            return *this;
        }

        /**
         * Compare the iterator to another.
         *
         * @return bool Returns true if the iterators are different, false if they are the same.
         */
        bool operator!=(const Iterator& rhs)
        {
            if (&generator != &rhs.generator)
                return false;

            return state != rhs.state;
        }
    };

    public:

    /**
     * Create the generator.
     *
     * @param iMax The last number in the range.
     */
    HelloGenerator(const std::string& inName) : name(inName)
    {
    }

    /**
     * Create an iterator at the beginning of the range.
     *
     * @return Returns the beginning iterator.
     */
    Iterator begin()
    {
        return Iterator(*this);
    }

    /**
     * Create an iterator at the end of the range.
     *
     * @return Returns the end iterator.
     */
    Iterator end()
    {
        return Iterator(*this, true);
    }
};

// Outputs: Hello Zebra
int main()
{
    for (const auto s : HelloGenerator("Zebra"))
        std::cout << s;

    return 0;
}

This is all that’s require to write useful, functional generators. It’s a simple matter to extend the generator to store more data or to extend the iterator to enable more states.

As a Callable

Although perfectly serviceable when used with loops and algorithms, the previous example fell slightly short of the ideal: Calling the generator and incrementing it were two separate operations. If you wish to have it truly callable, indistinguishable from any other function, then you can wrap it in a functor (an object callable as a function, implemented by overriding the () operator.)

The functor can be made generic using templates and will therefor work with any generator.

/**
 * A template to turn a generator into a functor.
 *
 * @tparam T The generator type. This can be deduced automatically.
 */
template<typename T>
class GeneratorFunction
{
    /** The iterator being used. */
    decltype(std::declval<T>().begin()) it;

    public:

    /**
     * Initialize the functor.
     *
     * @param g The generator over which to iterate.
     */
    GeneratorFunction(T& g): it(g.begin())
    {
    }

    /**
     * Override the function call operator so that the object
     * can be called as if it were a function.
     *
     * @return Returns the result of *iterator;
     */
    auto operator()()
    {
        auto result = *it;
        ++it;
        return result;
    }

};

// Outputs: Hello Zebra
int main()
{
    // Note! Don't do GeneratorFunction(HelloGenerator("")), as the generator
    // will be a temporary and the functor will be left with a dangling reference.

    HelloGenerator generator("Zebra");
    GeneratorFunction f(generator);
    
    std::cout << f();
    std::cout << f();

    return 0;
}

Further Improvements

To further increase compatibility with standard library algorithms you may wish to implement the LegacyIterator requirements. This means that the iterator must be copy assignable, which means using a pointer to the generator rather than a reference. I chose to use a reference here as it makes it clear in the example code that it can never be null.