Lexical Closures In C

Introduction

Even if one is an experienced C programmer, a successful compilation of your program in the first attempt certainly brings a sense of satisfaction - whatever the logic behind it may be. "Hey, look! The GNU C Compiler has accepted my program!" Take a look at the two code samples given below; these are the two most common methods used by programmers to start coding a program.

/* Sample 1. */
main()
{
}

/* Sample 2. */
#include <stdio.h>

int main()
{
	return 0;
}

What is important while writing code is not the caution but the approach - but at times, it is good to be skeptical about your C programs to some extent. Not necessarily the logic, but the compiler you are using.

The GNU C compiler provides several options for compiling a piece of code; the more options you know, the more useful (and the more confusing) it can be. I have aliased the GCC front-end cc like this:

alias cc='gcc -Wall --pedantic-errors -Wstrict-prototypes'

The option --pedantic-errors helps me make my C programs adhere to strict ANSI standards. GCC provides several extensions to the C language, which are often either unnoticed or taken for granted due to people's assumptions. Here, I am going to give a brief description on one such extension - nesting of functions.

Nesting of Functions and Closures

In the article Functional Programming with Python, a function or a procedure is said to have some analogy to mathematical functions. If 'x' is a variable, then we have a function f(x) which does some operations on 'x' to give some value 'y'. Hence we have:

y = f(x)

The article also briefly describes closures. A closure is a property associated with functions; when a function manipulates input and produces output which holds the features and characteristics of the input, then the function is said to possess (we should say 'satisfy' instead of 'possess') the closure property.

[ The above definition is, perhaps, less rigorous than it could be; the standard definition of 'closure' in programming is a data structure that contains both a function and a set of variables defining the environment in which that function will be executed. -- Ben ]

For example: consider the set of natural numbers 'N'. If x₁ and x₂ are elements in the set N, and the function f(x) is an addition (by binary operator `+') of x₁ and x₂ then addition has the closure property. Since sum of x₁ and x₂ is again a natural number, we can say that the binary operator '+' satisfies the closure property over the set of natural numbers.

Programming languages like Python and LISP support nesting of functions. The above mentioned article explains with an example in Python. An example for LISP is given below:

(defun foo (a) (defun bar (b) (+ b 1)) (+ a (bar 3)))

(setq a (foo 4))
(print a)

The function `bar' is nested and defined inside the definition of `foo'. `bar' increments and returns the parameter that it takes, and `foo' returns the the sum of the return value of `bar' invoked with parameter 3 and the parameter that it takes. The variable `a' then is set to:

3 + 4 = 7

Hence, `a's value is printed as 7.

This feature of function nesting is seen in the C language, as an extension of GCC. Compiling the code below, with the --pedantic-errors option enabled, will tell you that `ISO C forbids nested functions' - but the code will compile cleanly without the option. Check out the code:

/* compile it with gcc --pedantic-errors filename.c*/
#include <stdio.h>
#include <stdlib.h>
int main()
{
	void foo()
	{
		printf("Hello World\n");
	}
	foo();
	return 0;
}

Like local variables, nesting of functions will restrict the scope of the function in which it is defined. For the above example, the binding of function foo is not visible outside main. The association between identifiers and the place to store their values is called binding, and scope refers to the part of the code where the binding of the identifier is visible.

Consider another example given below:

#include <stdio.h>
#include <stdlib.h>
int main()
{
	int x;
	x = 10;
	{
		float x;
		x = 4.2
	}
	return 0;
}

In the above example, `x' has two bindings with respect to main. But if we remove the declaration float x;, then the binding will be same throughout.

Scope

Consider a binary search algorithm performed over a list of sorted numbers. The code can be seen here, in listing 1.

We can localize the array 'A' and the function 'binary_search' to 'main' if we don't have any other functions that need to access 'binary_search'; an example of this can be seen here, in listing2.

Now both 'A' and 'binary_search' are within the lexical scope of 'main'; hence, they are enclosed in the same scope. Let us define lexical scoping a bit more: Lexical Scope is the scope defined by the structure of the code.

A language with lexical scoping can support function definitions within another function. With this, the nested function gets access to the local variables defined in the enclosing scope, and is itself visible during the definition of the function being nested. That is:

#include <stdio.h>
#include <stdlib.h>

int main()
{
	int x=10;
	void foo()
	{
		printf("hello\n");
	}
	int y=20;
	void bar()
	{
		printf("World\n");
	}
}

Here, only the binding of 'x' is visible to 'foo', whereas 'bar' can "see" the binding of 'x', 'foo', and 'y'. We can now say that the textual arrangement determines the lexical scope. Now, what if function 'foo' wants to access function 'bar'? One of the options here would be to declare the prototype of 'bar' before the definition of 'foo'. See the listing below:

#include <stdio.h>
#include <stdlib.h>

int main()
{
	int x=10;
	auto void bar(void);
	void foo()
	{
		printf("Hello\n");
		bar();
	}
	int y=20;
	void bar()
	{
		printf("World\n");
	}
	foo();
	return 0;
}

Thomas M. Bruel's paper on lexical closures in C++ describes this as a method to allow definition of mutually recursive functions at inner lexical levels. Removing the 'auto' keyword will give a warning message. Try it! (Refer to Section A.8.1 in Storage Class Specifiers by Kernighan & Ritchie for clarification and details.)

The other type of scoping is dynamic scoping, in which the active call stack handles name resolution during run time. To make it more clear, see listing3.

'x' in function 'foo' is a non-local reference, but it is local to function 'bar'. The print statement in function 'main' is also a non-local reference. If C was a dynamically scoped language (thank god that it isn't), then reference to 'x' in function 'foo' would be bound to its definition in the function 'bar'. However, C is a lexically scoped language, and thus reference to 'x' in function 'foo' is bound to its global definition. If we run this program, the output will be '1', not '0'.

Now consider the following (listing4): we have the definition of function 'add' within the scope of 'init_add'. It is interesting to note that 'add' refers to the parameter 'i', which is passed to the function 'init_add'. For the function 'init_add', the binding of 'i' is retained (even inside the function 'add') until 'init_add' returns. Now, from the mathematical definition of 'closure', the function 'add' is said to "close over" the parameter 'i'; therefore, 'add' satisfies the closure property over 'i', and this is termed a lexical closure, in which the lexical scoping is preserved (the reference for 'i' is not overridden by any other local definition of 'i' - not that there are any).

It should be clear by now that lexical scoping provides several advantages. Functions can be made reentrant and hence the compiled machine code will be reentrant. Local declarations can be stored in registers (in an optimized way) which eliminates the symbol references upon compilation (an optimization performed by the compiler.) We are no longer restricted to declaring all variables global (a very bad practice leading to problems like variable suicide among others) and passing parameters to every function that is invoked.

Why is there no nesting in ANSI, then?

I had mentioned earlier that nesting of functions is only seen in C as an extension of GCC. While it is an advantage to some extent, we cannot guarantee safe access to variables or functions within the lexical scope just because the closure property has been satisfied. We must not forget that even when the closure property is retained, in certain cases there will be no reference to a variable within the current activation record in the stack.

Conclusion and Further Reading

We have seen what the nesting of functions means and can do to some extent, given the lexical scoping rules of C. If you are interested in further reading, you should go through Structure And Interpretation of Computer Programs and the Wiki entry on Lexical Closure. For the article that inspired me, and which gives more details on the implementation part, see Lexical Closures For C++.

[BIO] I completed my B. Tech in Computer Science & Engineering from a small town called Trichur, in Kerala, God's Own Country in India. Presently I am working in Naturesoft Pvt. Ltd, Chennai, India as a Programmer. I spend my free time reading books on Linux and exploring the same. Also, I have a good appetite for Physics. My motive in life is to go forward, onward and upward.

Copyright © 2005, Hiran Ramankutty. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 112 of Linux Gazette, March 2005

<-- prev | next -->