Monthly Archives: August 2008

Iterators in Visual Basic

WARNING: This is a speculative post. Caveat emptor.

Actually, in this case I don’t think the above warning is strong enough. This is a super speculative post: I believe the chance of this feature appearing in the next version of the language is not very high. That’s not because it isn’t a worthy feature, but because it’s more than a little work and we’ve got a lot of other very worthy features we’re considering. However, since it’s valuable and something we keep getting requests for, we have decided to at least sketch out generally what iterators would look like in Visual Basic, so that when we have the resources to do them we can jump on it. Consider yourself double warned.

To start with, an iterator is just a structure that returns a sequence of values. Concretely, when the IEnumerable(Of T) interface method GetEnumerator() returns an instance of IEnumerator(Of T), that instance is an iterator. You can walk through the collection by calling MoveNext on the enumerator and reading the Current property each time MoveNext returns True, until it returns False because you’ve run out of values. Or you can use the For Each statement, which does the iteration for you.
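The protocol just described can be sketched in C# as an explicit loop over the enumerator. It is roughly, not exactly, what For Each and foreach expand to (foreach also disposes the enumerator, which the using block below mirrors). ManualEnumeration and Drain are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// ManualEnumeration and Drain are illustrative names, not library APIs.
public static class ManualEnumeration
{
    // Walks a sequence with the enumerator protocol directly,
    // approximately what foreach / For Each does for you.
    public static List<T> Drain<T>(IEnumerable<T> source)
    {
        var result = new List<T>();
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // MoveNext advances to the next value and returns false when none are left.
            while (e.MoveNext())
            {
                result.Add(e.Current); // Current is only meaningful after MoveNext returned true
            }
        }
        return result;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        Console.WriteLine(string.Join(",", Drain(numbers))); // 1,2,3
    }
}
```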

It’s possible to construct iterators in Visual Basic today: all you have to do is implement IEnumerable(Of T) and then implement IEnumerator(Of T). The problem is that doing this involves a whole lot of boilerplate code that is the same in almost all iterators. On top of that, if an iterator involves a lot of conditions or branching, managing its state by hand can be quite tricky. In an ideal world, you’d simply state the values that should be iterated over and have the compiler generate all the boilerplate code and state management for you. C# does this through the yield statement: when a method that returns an iterator type contains yield return, the compiler generates the iterator automatically:

using System;
using System.Collections.Generic;

class Program
{
    public static IEnumerable<int> FromTo(int low, int high)
    {
        if (low <= high)
        {
            yield return low;
            foreach (int i in FromTo(low + 1, high))
            {
                yield return i;
            }
        }
    }

    static void Main(string[] args)
    {
        foreach (int i in FromTo(1, 5))
        {
            Console.WriteLine(i);
        }
    }
}
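To see the boilerplate that the yield statement saves, here is a sketch of roughly what implementing FromTo by hand looks like in C#, with one class implementing IEnumerable<int> and another implementing IEnumerator<int>. It is not the code the C# compiler actually generates (the compiler emits its own state machine), it is written iteratively rather than recursively, and the names FromToSequence, FromToEnumerator and Demo are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Illustrative, hand-written equivalent of FromTo(low, high) without yield.
// FromToSequence, FromToEnumerator and Demo are hypothetical names.
public sealed class FromToSequence : IEnumerable<int>
{
    private readonly int low, high;

    public FromToSequence(int low, int high)
    {
        this.low = low;
        this.high = high;
    }

    // Every call returns a fresh enumerator positioned before the first value.
    public IEnumerator<int> GetEnumerator()
    {
        return new FromToEnumerator(low, high);
    }

    // IEnumerable<T> also requires the non-generic GetEnumerator.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public sealed class FromToEnumerator : IEnumerator<int>
{
    private readonly int low, high;
    private int current;
    private bool started;

    public FromToEnumerator(int low, int high)
    {
        this.low = low;
        this.high = high;
    }

    public int Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    // The state management the compiler would otherwise generate:
    // remember whether iteration has begun and which value is current.
    public bool MoveNext()
    {
        if (!started)
        {
            started = true;
            current = low;
            return low <= high;
        }
        if (current >= high)
        {
            return false; // past the end, or the range was empty
        }
        current++;
        return true;
    }

    public void Reset() { started = false; }

    public void Dispose() { } // nothing to release
}

public static class Demo
{
    public static void Main()
    {
        foreach (int i in new FromToSequence(1, 5))
        {
            Console.WriteLine(i); // prints 1 through 5
        }
    }
}
```

Even for a sequence this simple, the hand-written version spends most of its lines on plumbing, and the state in MoveNext only gets harder to manage as the iteration logic gains branches.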

In this example, the FromTo method returns an iterator (typed as IEnumerable(Of Integer)) which is automatically generated by the compiler. What we’re considering in Visual Basic is essentially the same feature, but with a slightly different design. Instead of tying the iterator to a method, as in C#, we’re considering extending the multi-line lambda syntax that I talked about in the previous post to make “anonymous iterators.” The above example in VB would look something like:

Module Module1
    Function FromTo(ByVal low As Integer, ByVal high As Integer) As IEnumerable(Of Integer)
        Return Iterator
                   If low <= high Then
                       Return low
                       Return Each FromTo(low + 1, high)
                   End If
               End Iterator
    End Function

    Sub Main()
        For Each i In FromTo(1, 5)
            Console.WriteLine(i)
        Next
    End Sub
End Module
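The recursive FromTo shown in both languages has a hidden cost: each value is handed back up through every enclosing iterator, so producing n values executes n(n+1)/2 yield returns rather than n. The sketch below instruments the C# version with a hypothetical static Yields counter to make that visible. It illustrates the kind of nested-iterator overhead that the Each operator discussed in the list below is meant to help with; it does not show how Each would be implemented.

```csharp
using System;
using System.Collections.Generic;

public static class NestedIteratorCost
{
    // Hypothetical instrumentation: counts every yield return at any nesting level.
    public static int Yields;

    // Same recursive shape as the C# FromTo in the post.
    public static IEnumerable<int> FromTo(int low, int high)
    {
        if (low <= high)
        {
            Yields++;
            yield return low;
            foreach (int i in FromTo(low + 1, high))
            {
                Yields++; // every enclosing level re-yields each inner value
                yield return i;
            }
        }
    }

    public static void Main()
    {
        foreach (int n in new[] { 10, 100, 1000 })
        {
            Yields = 0;
            foreach (int i in FromTo(1, n)) { }
            Console.WriteLine(n + " values -> " + Yields + " yields");
        }
        // 10 values -> 55 yields
        // 100 values -> 5050 yields
        // 1000 values -> 500500 yields
    }
}
```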

There are a couple of differences to notice here:

  • An anonymous iterator is just an expression, so it can be used as the return value of a function but could also appear as the target of a For Each, be used in a LINQ query, be stored into a field, etc. (This is particularly useful when a function produces an iterator and you want to do some validation. In the C# model, validation code inside an iterator method is not run until MoveNext is first called on the iterator, rather than when the iterator is produced.)
  • By default the type of the iterator is IEnumerable(Of T), where T is inferred from all the types returned from the iterator using the same algorithm we use for multi-line lambdas. You could also explicitly state the iterator type using an As clause, and we would allow IEnumerator(Of T) as the iterator type just like C# does.
  • We don’t require the yield keyword since we’ve already marked the block as being special by using a new contextual keyword Iterator. (There would also be an Exit Iterator statement, not shown.)
  • Our plan would be to add an Each operator that “unwraps” a collection and returns each of its values one by one. This is generally useful but also addresses some performance issues with nested iterators which you can read more about in Wes Dyer’s post on iterators. (It might also later allow things like unwrapping collections in collection initializers, like {1, 2, 3, Each a, 5, 6, 7}, where a is an array of integers or something…)
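The deferred-validation point in the first bullet is easy to observe in C#: an argument check at the top of an iterator method does not run when the method is called, only on the first MoveNext. The sketch below shows that, along with the usual C# workaround of putting the check in an ordinary (non-iterator) wrapper method that then returns the real iterator. That split is roughly what an anonymous iterator would let you write inside a single function. The names FromToLazy, FromToEager and FromToCore are hypothetical, and rejecting a negative low is an arbitrary rule chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredValidation
{
    // Iterator method: the check below runs lazily, on the first MoveNext.
    public static IEnumerable<int> FromToLazy(int low, int high)
    {
        if (low < 0) throw new ArgumentOutOfRangeException(nameof(low));
        for (long i = low; i <= high; i++) yield return (int)i;
    }

    // Ordinary method: validates immediately, then hands back an iterator.
    public static IEnumerable<int> FromToEager(int low, int high)
    {
        if (low < 0) throw new ArgumentOutOfRangeException(nameof(low));
        return FromToCore(low, high);
    }

    private static IEnumerable<int> FromToCore(int low, int high)
    {
        // long loop variable avoids overflow when high == int.MaxValue
        for (long i = low; i <= high; i++) yield return (int)i;
    }

    public static void Main()
    {
        IEnumerable<int> lazy = FromToLazy(-1, 3); // no exception yet
        Console.WriteLine("lazy iterator created");
        try
        {
            foreach (int i in lazy) { }
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("lazy: threw on first MoveNext");
        }

        try
        {
            FromToEager(-1, 3);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("eager: threw when the iterator was produced");
        }
    }
}
```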

We’re interested in people’s feedback on this proposed design and any other comments you might have.