Local variables: scope vs. lifetime (plus closures)

Over a month ago, I asked what a particular chunk of code should do:

Module Module1
    Sub Main()
        For i As Integer = 0 To 2
            ' x is declared inside the loop body and has no initializer.
            Dim x As Integer
            Console.WriteLine(x)
            x += 1
        Next
        Console.ReadLine()
    End Sub
End Module

I purposefully left the question open and vague because I wanted to see what the community feedback would be without any preconceived notions. I didn’t expect it to take me so long to return to this question, so I apologize if people got frustrated waiting, but I do want to get back to why I asked and what I think about the whole question. Let’s start by getting what actually happens out of the way: the program prints 0, 1 and 2 on successive lines, which I’ll write as “0 1 2”. The reason for this takes a little bit of explaining.

What’s important here are two related but different ideas: scope and lifetime. The scope of a variable decides where a variable’s name can be used in a program. The lifetime of a variable decides how long the storage for that variable exists in memory. (In most programming languages the scope of a local variable is at the very least a subset of the lifetime of the variable, otherwise you’d be able to refer to the local variable after its storage goes away, which would be bad.)

So the question now boils down to: what’s the lifetime of a local variable in VB? Most people who assumed that the answer would be “0 0 0” made the reasonable assumption that the lifetime of a local variable is the same as the scope of the local variable. So they expected that when the code reached the end of the For…Next block, they’d reached the end of both the scope and the lifetime of the local variable x and the storage for x would go away. Then, when the loop started up again, we would give you a whole new storage location for x that (like all storage locations) was initialized to zero.

However, those of you who tried it out discovered that in VB the lifetime of a local variable does not equal its scope. In fact, the lifetime of a local variable runs from the beginning of a method all the way through to the end of the method, regardless of the variable’s scope. Even though x is only in scope within the For…Next loop’s statement block, it lives throughout the entire method. Thus, when you loop, you get the same storage location as you got the last time, and you get “0 1 2” instead of “0 0 0”.

In fact, this is consistent with the way the Common Language Runtime works. When you define a method, you declare the locals that the method is going to use. When you enter the method, the CLR creates storage for those local variables and initializes them to zero. When you exit the method, the CLR throws away that storage. So VB is actually entirely in sync with what its platform does. It’s the same for C#, only they finesse the issue: since you have to explicitly initialize all locals in C# before reading them, there’s no way to observe whether the lifetime of a local variable extends beyond its scope. But their local variables live just as long as the ones in VB.
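To make the single storage location visible, here is a minimal sketch that runs the original loop next to a variant with an explicit initializer. The module and function names are made up for illustration; the only thing that differs between the two functions is the "= 0" on the Dim statement.

```vbnet
Imports System

Module LifetimeSketch

    ' No initializer: x is zeroed once, when the method is entered, and the
    ' same storage location is reused on every pass through the loop.
    Function WithoutInitializer() As String
        Dim output As String = ""
        For i As Integer = 0 To 2
            Dim x As Integer
            output &= x.ToString() & " "
            x += 1
        Next
        Return output.Trim()
    End Function

    ' Explicit initializer: the assignment runs each time the Dim statement
    ' is reached, so the value left over from the last pass is overwritten.
    Function WithInitializer() As String
        Dim output As String = ""
        For i As Integer = 0 To 2
            Dim x As Integer = 0
            output &= x.ToString() & " "
            x += 1
        Next
        Return output.Trim()
    End Function

    Sub Main()
        Console.WriteLine(WithoutInitializer())   ' 0 1 2
        Console.WriteLine(WithInitializer())      ' 0 0 0
    End Sub

End Module
```

The storage for x is the same in both functions; the second one simply overwrites it on every pass, which is roughly what C#'s definite assignment rule forces you to do by hand.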

This whole discussion is something of a minor point, at least until you get to closures, that is. What are closures, you ask? Well, the best way to explain them is by example. Let’s say you’ve got code that looks like this:

Sub Main()
    Dim value As Integer
    Dim xs = { 1, 2, 3, 4 }

    value = 2
    Dim ys = Select x From x In xs Where x < value

    For Each y As Integer In ys
        Console.WriteLine(y)
    Next
    Console.ReadLine()
End Sub

You’ll notice here that the query references the local variable “value”. Those of you well versed in the intricacies of LINQ will know, however, that the way LINQ works is that it pulls the expression “x < value” off into a function, a delegate of which gets passed to the Where method. Then the Where method uses this delegate to determine which members of the xs collection are filtered out. But how can we pull out the expression “x < value” to another method when the expression refers to a local variable? One method can’t see another method’s locals! Or can it…?

What happens in this case is we use a closure. A closure is just a special structure that lives outside of the method and holds the local variables that other methods need to refer to. When a query refers to a local variable (or parameter), that variable is captured by the closure and all references to the variable are redirected to the closure. So the statement “value = 2” assigns the value 2 to a variable location in a closure, not a variable location on the stack. Since the closure lives outside of the method, methods created by a LINQ query can legally refer to the local variables captured in the closure. And it all just works.
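A hand-written sketch may make the redirection easier to picture. This is not what the compiler actually emits; the class ValueClosure and the method IsBelowValue are hypothetical stand-ins, and the query is spelled as a direct call to the Where extension method from System.Linq rather than in query syntax.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the closure: it lives on the heap, outside of
' Main, and holds the captured local.
Class ValueClosure
    Public value As Integer

    ' The expression "x < value", pulled out into its own method.
    Function IsBelowValue(x As Integer) As Boolean
        Return x < value
    End Function
End Class

Module ClosureSketch

    Function RunQuery() As List(Of Integer)
        Dim closure As New ValueClosure()
        Dim xs As Integer() = {1, 2, 3, 4}

        ' "value = 2" becomes an assignment to the closure's field.
        closure.value = 2

        ' The delegate handed to Where points at the closure's method.
        Dim predicate As Func(Of Integer, Boolean) = AddressOf closure.IsBelowValue
        Return xs.Where(predicate).ToList()
    End Function

    Sub Main()
        For Each y As Integer In RunQuery()
            Console.WriteLine(y)   ' prints 1
        Next
    End Sub

End Module
```

Because the delegate carries a reference to the closure object, the filter can read value even though it runs inside Where, far away from the method that declared it.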

I’m purposefully skipping over a lot of the nitty-gritty of how closures work to avoid writing a whole chapter on this subject, but the practical upshot of this is that with closures, the lifetime of a local in an inner block becomes a whole lot more important. Let’s go back to a modified version of our original code:

Module Module1
    Sub Main()
        Dim queries(2) As IEnumerable(Of Integer)
        Dim xs = { -2, -1, 0, 1, 2 }

        For i As Integer = 0 To 2
            Dim y As Integer = i
            queries(i) = Select x From x In xs Where x <= y
        Next

        For Each q As Integer In queries(0)
            Console.WriteLine(q)
        Next
        Console.ReadLine()
    End Sub
End Module

The intent of this code is to create an array of queries that have different upper bounds: queries(2) will return all values less than or equal to 2, queries(1) will return all values less than or equal to 1, and queries(0) will return all values less than or equal to zero. At least, that’s the intent. But if you go try this on the current LINQ code on my machine (not sure if it’ll run on the latest CTP or not), you’ll actually get the following result: “-2 -1 0 1 2”. Huh?

The problem is that, if you’ll remember, the variable y lives for the entire method. Each iteration of the loop doesn’t get its own copy of y; it gets the same copy of y that every other iteration gets. This means that when the query captures the local variable y, each iteration of the loop captures the same copy of y. So when y gets changed inside the loop, every query sees the change. And because a query isn’t evaluated until it’s enumerated, by which time y holds 2, all of the queries are going to return the same set of values.
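The sharing is easy to reproduce with one deliberate change. The sketch below assumes a compiler with lambda expressions (VB 2008 or later) and calls the Where extension method directly. As far as I know, those later compilers give a captured variable declared inside the loop body a fresh copy on each pass, so y is declared outside the loop here to get the single shared variable this entry describes. The module and function names are made up.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module SharedCaptureSketch

    Function FirstQueryResults() As String
        Dim xs As Integer() = {-2, -1, 0, 1, 2}
        Dim queries(2) As IEnumerable(Of Integer)

        ' Declared outside the loop on purpose: one variable, captured by
        ' all three lambdas.
        Dim y As Integer

        For i As Integer = 0 To 2
            y = i
            queries(i) = xs.Where(Function(x) x <= y)
        Next

        ' Where is evaluated lazily, so the filter runs here, after the
        ' loop has finished and the shared y holds 2.
        Dim parts As String() = queries(0).Select(Function(q) q.ToString()).ToArray()
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        Console.WriteLine(FirstQueryResults())   ' -2 -1 0 1 2
    End Sub

End Module
```

queries(0) was built while y was 0, but it never stored that 0 anywhere; it only knows where y lives.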

What you really want in this case is for each iteration of the loop to capture a unique copy of y. In other words, you want to treat y as if its lifetime were only the inner part of the loop, not the whole method. And if you look at what C# does with anonymous methods (and, now, lambda expressions), you’ll see this is what they do: since they require definite assignment, they can behave “as if” variables in inner scopes have shorter lifetimes than the entire method (even though they really don’t). To accomplish this, they have to use nested closures, which is beyond the scope of this entry and is left as an exercise to the reader (for the moment, at least).
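The effect you want can be sketched by hand: allocate a new closure object on every pass through the loop, so each query holds its own y. This only illustrates the behavior, not what any compiler generates, and the names UpperBoundClosure and AtMostY are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for a compiler-generated closure holding y.
Class UpperBoundClosure
    Public y As Integer

    Function AtMostY(x As Integer) As Boolean
        Return x <= y
    End Function
End Class

Module PerIterationSketch

    Function FirstQueryResults() As String
        Dim xs As Integer() = {-2, -1, 0, 1, 2}
        Dim queries(2) As IEnumerable(Of Integer)

        For i As Integer = 0 To 2
            ' A fresh closure object per pass gives each query its own y.
            Dim closure As New UpperBoundClosure()
            closure.y = i
            Dim predicate As Func(Of Integer, Boolean) = AddressOf closure.AtMostY
            queries(i) = xs.Where(predicate)
        Next

        ' Still evaluated after the loop, but queries(0) kept its own y = 0.
        Dim parts As String() = queries(0).Select(Function(q) q.ToString()).ToArray()
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        Console.WriteLine(FirstQueryResults())   ' -2 -1 0
    End Sub

End Module
```

The query is still evaluated lazily; the only difference from the shared case is how many closure objects exist.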

So, the practical upshot is that with the introduction of closures to VB (regardless of whether we expose lambda expressions, which is still a bit of an open question), we’ve got a problem with local variable lifetime. We could use our flow analysis, introduced in VB 2005 for warnings, to perhaps finesse this issue the way C# does, but there are some complications. It’s very much an open issue, which is why I really wanted to see what people’s expectations were — it’s really useful data for understanding how people (at least those who read my blog) think about the problem.

Expect more down the road once we’ve got more of a handle on the problem, and kudos to anyone who made it this far.

Updated 3/29/06: Corrected code error!

24 thoughts on “Local variables: scope vs. lifetime (plus closures)”

  1. David Stone

    "So, the practical upshot is that with the introduction of closures to VB (regardless of whether we expose lambda expressions, which is still a bit of an open question), we’ve got a problem with local variable lifetime. We could use our flow analysis, introduced in VB 2005 for warnings, to perhaps finesse this issue the way C# does, but there are some complications."

    What are they? I don’t use VB.NET (although Erik Meijer gave me a very convincing speech after his session at the PDC as to why I should), but I’d love to understand what these complications are.

    Also, why *not* expose lambda expressions? VB.NET doesn’t have any explicit support for anonymous methods…and it seems like lambdas would be a convenient and easy way to give those to VB devs. They’re much easier to pick up than anonymous methods are (for obvious reasons) and I’d think that, since that fits in with the "ease of use" philosophy of VB.NET, that it’d almost be natural to kill two birds with one stone and expose lambdas.

  2. Philip Rieck

    It took me many long hours to understand *why*

    int i = 0;
    for (int j = 0; j < 100; j++)
    {
        i = j;
        ThreadPool.QueueUserWorkItem(delegate
        {
            Console.WriteLine(String.Format("i = {0}; j= {1}", i, j));
        });
    }

    behaved differently than

    for (int j = 0; j < 100; j++)
    {
        int i = j;
        ThreadPool.QueueUserWorkItem(delegate
        {
            Console.WriteLine(String.Format("i = {0}; j= {1}", i, j));
        });
    }

    And why each did what they did. Long, brain-against-the-wall hours.

    Now, I use both VB and C# (mostly C#, but I try not to fall into the trap of misplaced language snobbery), and it seems to me that you just can’t get away with confusing behavior like this in VB.

    I don’t presume to know if the culture grew from the language, the language from the culture, or if it’s just that the culture found the language, but VB users just get stuff done. And looking at your last example snippet, there’s just no indication that any weirdness would be expected.

    A person looking at the code would not see a delegate hoisted. Using the debugger and stepping slowly would probably make the "problem" vanish.

    My opinion: You can’t create weirdness with no visual indication. I’d prefer it to work like it looks like it should. Plain, simple. However, if that’s not possible there *must* be some syntax there that makes it obvious that you aren’t getting the simple behavior you think you are. A compiler warning won’t cut it. (Sadly, many VB.NET projects I have gone in to modify compile with hundreds of warnings.)

  3. Tim Hall

    I would have thought that the compiler would complain that y was already declared, like it would if you changed the second loop in the second example to

    Dim y As Integer
    For Each y In ys
        Console.WriteLine(y)
    Next

    That would cause the first Dim y As Integer to complain "Error 46 Variable ‘y’ hides a variable in an enclosing block".

    This time my assumption was wrong; it doesn’t complain even though it compiles to the same code (I assume). Maybe that is what you should do to fix that situation.

    I also think it’s a bit moot. I don’t know anyone that would reuse the y variable in that manner.

    As for lambda expressions, I have no idea what they are, but I see no reason to hide them. If exposing them means losing some existing or better future functionality, then it has to be carefully weighed; but if the only reason is to keep things simple and hide the advanced stuff, then I say include them. If people don’t want to use them they don’t have to; they can blissfully ignore them (like the way VB8 has custom events and regular events: we can pick the time-saving method or, if it’s required, go the advanced route).

  4. Bill McCarthy

    Hey Paul,

    In your last bit of code there is no "ys" variable declared. I think you meant queries(0). 😉

    And yes we do really want a unique copy of y in each loop. 🙂

    One thing that might also be nice is the ability to explicitly bypass closures, or at least closures that maintain state with the calling code. Kind of like telling it to take a snapshot of what this value is now, e.g.:

    Let’s assume that magic snapshot keyword was "Eval":

    For i As Integer = 0 To 2
        Dim y As Integer = i
        queries(i) = Select x From x In xs Where Eval(x <= y)
    Next

    There, everything inside of the Eval(…) would be that unique copy, populated with the value at the time the projection is instantiated, not delayed until it is evaluated.

  5. Geoff Appleby

    Hey Bill,

    The eval trick would be cool, but it assumes that the real problem isn’t going to go away – the lack of unique y’s in the loop.

    People’s expectations of what they thought _would_ happen are sorta irrelevant. Sure, a lot of people might have expected ‘0 0 0’, but it’s not just VB that’s making the lifetime of the variable the entire method – as you said, the CLR is dictating it too. No matter how many people were expecting 0 0 0, the current behaviour (and correct, as far as I’m concerned based on the spec) is 0 1 2. This needs to stay as the norm – there’s going to be someone out there _relying_ on this fact.

    So the only option as I see it is to make both happen by changing how these closures work. The closures are needed to allow the query to reference the local, but why do they need to point at the ONE closure each time? Each time a query is constructed, can’t it use a copy of the closure, instead of the closure itself?

    I don’t know. I haven’t looked at the internals yet, but it sounds easy enough 🙂

  6. Shannon McCracken

    The way I would (without reading anything about it) expect it to work is that when a closure is made, the values of any variables are substituted in at that point – not a link to the existing variable.

    Writing something like:

    For i As Integer = 0 To 2
        queries(i) = Select x From x In xs Where x <= i
    Next

    makes (at face value) perfect sense to me.

    I’d expect "select …" to be more like what happens when you call a function/constructor – conceptually:

    For i As Integer = 0 To 2
        queries(i) = New SelectXfromXInXsWhereXLessThanI(i)
    Next

    Yes, this means the closure can’t change the value of i in the function, but that’s fine by me: the same semantics as passing parameters to an object constructor.

    I would find it linking to the variable itself to be confusing (I suspect closures are going to be confusing!). I’d advocate Closure being its own special keyword to highlight the rather different nature of closures.

    But that’s just how I’d look at it. 🙂

  7. K. Kraemer

    So variable scope does not have any effect on when it is first allocated storage? It is always allocated at the beginning of the function.

    Does this mean there is no memory benefit for scoping a variable at the most appropriate level?

  8. Geoff Appleby

    Kraemer: Exactly right. If you dim it, it’ll exist, even if you don’t enter a code path that uses it.

    The trick is that it’s not much of a memory waste with reference types. When you declare an instance of a value type, it’ll use as much memory as needed. But with a reference type, then it will only take up enough space for a ‘pointer’ until you call ‘new’ on it. That’s my understanding anyway – what the size of a pointer is in .net I can’t remember anymore – but I’d presume 32bit.

  9. Bill McCarthy

    Hey Geoff,

    I think an approach like Eval would be useful generally speaking, not just for loop scoped variables.

    Take for example ByRef parameters. In VB.NET if I pass a property in, the actual property gets updated. That’s incredibly cool, but if you don’t want that to happen, then you need to create a temporary variable. So Eval in that scenario would just be shorthand for creating the temporary variable.

    With LINQ, the delayed evaluation of projections is a really cool feature. It allows you to build complex projections (projections on projections) with a greater degree of consistency within. It also delays potentially expensive operations until needed.

    However I think there will be times when you will want to decouple the projection from fields/properties. Here once again Eval would be shorthand for creating a local variable and passing that in (non loop construct that is).

    Now with loop scope variables, Eval would achieve that result as well. I’m not crazy about the idea of having to use it for loop scope variables, but I could see that it would work 😉

    That being said, I think you might be right in saying that closures when using loop scope variables could be new instances of the closure rather than the same closure.

    Shannon: you can always evaluate the entire projection into an array or list, and the evaluation is done at that point in time, so you can make it like a function call. Projections though are pretty much IEnumerators that delay the evaluation until you enumerate them. That’s a good thing, and I don’t think we’d want to lose that. If you want it to behave like a function call, you can call ToArray (not sure if there is a ToList ?) on the projection.

  10. Bill McCarthy

    Geoff: On re-thinking about having new instances of closures for loop scoped variables, I think that might potentially break any code where the loop variable is a reference type and there are reference integrity checks (e.g. If a Is b Then, etc.).

  11. Mike Griffiths

    Geoff –

    Slight confusion here perhaps. I thought that you asked people what they thought it "should do", not what they thought it did do – most knew that I think. I also got the impression that most people thought it "should" treat the variable declared within the loop as having a scope and lifetime confined by that loop.

    If the variable had been a class and the line had read

    Dim myClass as New someClass()

    there would have been no doubt that the programmer intended the class to be re-created from scratch with its internally defined default values in place.

  12. Anthony D. Green

    I would like to make note of the VB parenthetical … operator (). When passing a value to a method expecting it to be passed ByRef you can ‘Force’ ByVal by wrapping the expression in parentheses.

    I think that extending that into the closures would be better than Eval but maybe I’m wrong.

    One thing I can say about VB is that (unlike C#) we haven’t completely exhausted all the symbols on the keywords so we could make up syntax for days and not have any trouble. I also want to thank the VB team for considering the expected behavior of passing a property ByRef (that the C# team so naturally ignored).

    And I’m having mixed feelings about anonymous methods (are they a maintenance nightmare?). I thought they’d make multithreaded UI Control.Invoke look better, but since they are anonymous METHODS not anonymous DELEGATES they really don’t save many keystrokes at all. In VB they’d probably save even fewer. What is the unit-testability/bug-trackability of them?

    Oops, look at the time, I think this comment has left the scope of this post … or has it?

  13. paulvick

    Wow, lots of great comments. I can’t address all of this, but some points:

    * I’m definitely NOT talking about breaking existing code. Code that’s written already needs to keep working. Any scheme we come up with needs to incorporate that, no question.

    * Definite assignment is extremely difficult in the context of On Error, and especially On Error Resume Next, because the branching behavior becomes extremely complex.

    * I think the idea of making capture (or non-capture) explicit is interesting. I’m sympathetic to the idea that most people don’t necessarily expect l-value capture to occur. This is something we’ll have to think further about.

    Paul

  14. Jonathan Allen

    > The way I would (without reading anything about it) expect it to work is that when a closure is made, the values of any variables are substituted in at that point – not a link to the existing variable.

    I agree with this statement. Instead of Eval, there should be a special syntax to say "Don’t just use the value, create a closure".

  15. Pingback: @ Head

  16. Pingback: @ Head

  17. Pingback: Karl Seguin [MVP]

  18. Gilles Michard

    For me the problem is that the variable is passed to the closure by reference and not by value. It should be handled as an input ByVal parameter.

  19. Pingback: PepLluis

  20. Thomas Heim

    What’s important here are two related but different ideas: scope and lifetime. The scope of a variable decides where a variable’s name can be used in a program.

    And thanks for the great comments.

  21. Pingback: Bloggers MSDN Latam

  22. Magnus Markling

    Hi!

    The last example is no longer working as stated. Although it’s a welcome change, what was changed?

    Regards

    Magnus

