Garbage collection offers an alternative to
Traditional Memory Management. Instead of the application task or
Mutator explicitly de-allocating objects when they are no longer needed, the
Garbage collector uses program state information to determine whether any valid sequence of
Mutator actions could reference an
object. If no valid sequence of program actions can reference a object, then it is unnecessary in the on-going computation, the object is
Garbage and may be reclaimed by the collector. Conversely, all objects which can be referenced by such a sequence of actions are
Live.
Garbage objects may be isolated, containing no references to other garbage, or they may form lists, trees or other structures of potential arbitrary complexity with other objects, both free and live.
Garbage collection also implications for the operation of execution stack. Studies indicate that the cost of allocating stack frames and associated objects on a garbage collected heap can be comparable the cost of traditional stack allocation. This technique is especially useful for languages with closures, such as Scheme, in which frames can have a lifetime independent in the lifetimes of their callers and callees (those immediately `above' and `below' them on the stack).
In many cases garbage collection is integrated into other system components, for example virtual memory/paging or the saving of the runtime heap to backing store in Smalltalk.
Garbage collection as a graph traversal problem
Objects which can be referenced by a future sequence of valid mutator actions are said to be reachable. Those which will be referenced in the future by the mutator are said to be live, that is, necessary to the ongoing computation.
A garbage collector can be thought of as performing a liveness analysis on heap objects, locating those objects which cannot be referenced either directly from a root set (the stack and global variables), or from a root set via other objects on the heap. Transforming garbage collection into a graph traversal problem enables traditional graph traversal algorithms such as depth- and breadth-first searching to be utilised when tracing the heap to decide whether a object is garbage.
Liveness analysis is, however, inherently undecidable, because the Halting problem prevents determination of the liveness of objects in some situations.
Consider, for example, the following Java class:
/** A class to illustrate the undecidability of reachability analysis */
class Undecidable {
/** A method whose completion is undecidable */
void anUndecidableMethod(){
// ...
}
/** A method which illustrate undecidable reachability. */
void anotherMethod(){
// the object whose reachability is undecidable
Object anObject = new Object();
// the method call during which anObject reachability is undecidable
anUndecidableMethod();
// an access of anObject
anObject.getClass();
// another call method call
anUndecidableMethod();
}
}
anUndecidableMethod() is a method whose completion is an undecidable problem, in the Halting sense. anotherMethod() is another method, which creates a new Object and then calls anUndecidableMethod(). An anObject is only accessed in afterwards, which will never be executed if anUndecidableMethod() never returns. Since anUndecidableMethod()'s return is undecidable, so is the access of anObject().
In practice, all known garbage collectors assume that all method calls return. This is a conservative conservative assumption---it may lead to the retention of reachable but not live objects, but never to the freeing of reachable or live objects.
Most garbage collectors also assume that anObject is live throughout the call to anotherMethod(), even though no code in the last lines access it. This is also a conservative assumption, but it greatly increases the usefulness of debuggers, as variables hold their value until they pass out of scope, rather than having undefined value between their last executed reference and when they pass out of scope.
These two assumptions convert liveness into scope, a well understood notion fundamental to modern programming language design. Some work has been done on the effect of these two assumptions and it appears that in some cases, considerable memory is lost to them.
For example, consider the following Java class, which accepts a number of command-line arguments, reads them into variables and then performs some processing based on their values:
/** A class to illustrate retention of excessive memory */
class AClass {
/* A number of state variables whose value is determined by the command line
arguments to the program */
static boolean _stateInformationA;
static boolean _stateInformationB;
//...
/** A method to read the command line arguments into the state variables. */
static private void handleCommandLineArgs(String argv){
// ...
}
/** A method which actually does the programs work */
static private void doStuff(){
//...
}
/** The entry point into the program */
static public void main(String argv){
// handle the command line arguments
handleCommandLineArgs(argv);
while (true){
doStuff();
}
}
}
The program spends the vast majority of it's time in the while loop, by which time the command-line arguments, stored in argv, are known to be redundant (unnecessary for ongoing computation). While retention of such objects are necessary when a debugger is running (or in a system, such as the Smalltalk runtime environment, where the system debugger may be invoked at any time), in the general case, they may be reclaimed.
It should be noted that any objects used (or potentially used) throughout the nonterminating function are not reclaimable and need not always impact of collector design or correctness.
A good review paper on memory management and garbage collection is: Paul R. Wilson. Uniprocessor garbage collection techniques. In Proc of International Workshop on Memory Management in the Springer-Verlag Lecture Notes in Computer Science series., St. Malo, France, September 1992.
The garbage collection FAQ is at:
http://iecc.com/gclist/GC-faq.html