Txr Internals Guide
Kaz Kylheku

CONTENTS:

0. Overview
1. Coding Practice
1.2 Program File Structure
1.3 Style
1.3 Error Handling
1.4 I/O
1.5 Type Safety
1.6 Regression
2. Dynamic Types
2.1 Two Kinds of Values
2.1 Pointer Bitfield
2.2 Heap Objects
2.3 The COBJ type
2.4 Strings
2.4.1 Encapsulated C Strings
2.4.2 Representation Hacks for 2-byte wchar_t
2.4.3 Representation hacks for 4-byte wchar_t that is 2-byte aligned
3. Garbage Collection
3.1 Root Pointers
3.2 GC-safe Code
3.2.1 Rule One: Full Initialization
3.2.2 Rule Two: Make it Reachable
3.3 Weak Reference Support
3.4 Finalization
3.5 Generational GC
3.5.2 Representation of Generations
3.5.3 Basic Algorithm
3.5.4 Handling Backpointers
3.5.5 Generational GC and Finalization
4. Debugging
4.2. Debugging the Yacc-generated Parser
4.3. Debugging GC Issues
4.4 Object Breakpoint
4.5 Valgrind: Your Friend

0. Overview

This is an internals guide for someone who wants to understand, and possibly
change or extend, the txr program. The purpose is to give explanations,
provide rationale and make coding recommendations.

1. Coding Practice

1.1 Language

Txr is written in a language that consists of the common dialect between C90
and C++98. The code can be built with either the GNU C compiler or the GNU
C++ compiler. Use is made of some Unix functions from before Unix95, which
are requested by means of -D_XOPEN_SOURCE (POSIX.1, POSIX.2, X/Open
Portability Guide 4). Also, the <wchar.h> header is used, which was
introduced by a 1995 addendum to the C language, so it may be said that the
actual C dialect is C95.

In coding new features or fixing bugs, care must be taken to preserve this.
Code must continue to compile as both C and C++, and must not increase the
portability requirements. C++ compilation can be arranged using
./configure --ccname=g++ (for instance).
Note that txr takes some non-portable liberties with the language, such as
encoding bit fields into pointers, and treating automatic storage as a flat
stack which can be treated as an array that can be walked by a garbage
collector looking for references to objects. There are assumptions about the
alignment of objects too.

1.2 Program File Structure

The txr code has a simple flat structure: a collection of .c files (and also
a .l flex file and a .y yacc file) and headers.

The txr project follows the include header style that every C source file
includes all needed headers, in the proper order. Headers do not include
other headers. The generation of the dependency makefile dep.mk depends on
this; the depend.txr script does not scan headers for inclusion of other
headers. If this stylistic decision is ever changed, the dependency
generation will have to be updated.

1.3 Style

Tab characters are avoided in txr source files. The indentation is two
characters. Formatting is similar to K&R, though the yacc grammar files use
a Lispy formatting.

Expression or statement elements which are syntactically parallel, but on
separate lines, must be horizontally aligned with each other:

  if (function(argument1,
               argument))

rather than:

  if (function(argument1,
        argument))

The opening brace of a function goes on a separate line. if/else braces
``cuddle'' into the previous line, except when the condition spans multiple
lines:

  if (multi +
      line +
      condition)
  {
    /* brace doesn't cuddle */
  } else {
  }

switch cases indent with the switch:

  switch (x) {
  case ...
    ...
    break;
  }

Switches handle all enumeration members; default cases have a break even if
they are last in the block.

The following style is permitted:

  if (...) {
    ...
  } else switch (...) {
  case ...
  }

Forward and backward goto are permitted, unless it is /glaringly/ obvious
that the code can be written better without it.

Certain C programming conventions are avoided.
For generic pointers to anything (needed in some low-level code) use the
type mem_t *, not void *, and use casts on conversions to and from this
pointer. The void * pointer, which came into C by way of C++, is
brain-damaged. It allows C programs to subvert the type system without any
cast operators or diagnostics. In C++ it's a little better because
conversions from void * require a cast. In this project, we want all
hazardous pointer conversions to be marked in the code by casts, whose
presence is demanded by compiler diagnostics.

1.3 Error Handling

Multiple return points from functions are encouraged. Txr has a garbage
collector, so there is usually no need to branch to a common cleanup just
for the sake of freeing memory.

Txr also has exceptions; code that must free some resource other than
garbage collected memory if a failure occurs must be exception safe.

Exceptions should be used for both internal errors and environmental
situations. The internal_error macro is preferred to calling abort.

1.4 I/O

Use of the C streams and printf must be avoided. Txr has its own streams and
its own formatter function called format. Printing to a dynamic string is
supported. There are three global streams: std_output, std_input and
std_error. These streams don't do everything that standard I/O streams can
do, such as binary I/O, but their capabilities can be extended.

1.5 Type Safety

The void * type must be avoided in this project. For a generic pointer to
any object, use the mem_t * type, which comes from lib.h.

Do not call malloc directly; use chk_malloc. This function won't return a
null pointer; it throws an exception. It returns mem_t *.

  struct mystruct *foo = coerce(struct mystruct *, chk_malloc(sizeof *foo));

Raw C casts like (ptr *) must be avoided. Three macros are provided for
different kinds of casting: strip_qual, convert and coerce.
Use the strip_qual macro for writing conversions which strip type qualifiers
(const, volatile) from a pointer:

  const char *with_const = "abc";
  char *without_const = strip_qual(char *, with_const);

Use the convert macro for most value conversions that do not involve type
punning, but do not take place implicitly:

  int i = 0;
  enum foo { a, b, c } enum_val = a;
  enum_val = convert(enum foo, i);

Use the coerce macro for type punning conversions:

  char *str = "abc";
  unsigned char *buf = coerce(unsigned char *, str);

  extern void function(int);
  cnum n = coerce(cnum, function);

If you compile TXR using a C++ compiler, it will inform you if you have used
these macros incorrectly: for instance, using a convert for a conversion
that requires a coerce or vice versa, or writing a convert or coerce which
strips qualifiers.

Use ``make enforce'' to check the code for violations of some rules; this
make rule must succeed. Also, the code must build cleanly as C++, not only
as C.

1.6 Regression

All changes must be verified not to break the test cases. This is done by
running ``make tests''. Running ``make tests'' is not possible if the code
is being cross-compiled; in that case run ``make install-tests'' after
``make install''. This will add the test cases and a shell script to run
them to the installation. The cases can then be installed and run on the
cross target.

2. Dynamic Types

The txr code is organized around a dynamic typing paradigm implemented in C.
Values are represented by the C type val, which is a typedef name for a
pointer to obj_t, i.e. obj_t *.

2.1 Two Kinds of Values

A value of type val falls into one of two kinds: heaped and immediate. A
heaped val points to an obj_t object, which is a union of a number of
structure types, discriminated by a type field. An immediate val actually
contains the value inside the pointer, and does not point to anything.
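To make the split between heaped and immediate values concrete, here is a
toy sketch. It assumes the two-bit tag assignment that the Pointer Bitfield
subsection describes (00 heap pointer or nil, 01 NUM, 10 CHR, 11 LIT), and
it uses an integer type as a stand-in for val. All of the toy_* names are
invented for this illustration; they are not txr's own macros or functions,
and txr's real definitions in lib.h may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Tag values follow the guide's text:
   00 = heap pointer (or nil), 01 = NUM, 10 = CHR, 11 = LIT. */
enum toy_tag { TOY_PTR = 0, TOY_NUM = 1, TOY_CHR = 2, TOY_LIT = 3 };

#define TOY_TAG_MASK  3
#define TOY_TAG_SCALE 4   /* 1 << 2: payload sits above the two tag bits */

typedef intptr_t toy_val; /* integer stand-in for val (obj_t *) */

/* Extract the two-bit tag from the least significant bits. */
static enum toy_tag toy_tag_of(toy_val v)
{
  return (enum toy_tag) (v & TOY_TAG_MASK);
}

/* True for a non-nil value whose tag says "heap pointer".
   Like is_ptr, this does not validate the pointer itself. */
static int toy_is_ptr(toy_val v)
{
  return v != 0 && toy_tag_of(v) == TOY_PTR;
}

/* Box a small signed integer as an immediate NUM. */
static toy_val toy_num(intptr_t n)
{
  return n * TOY_TAG_SCALE + TOY_NUM;
}

/* Unbox an immediate NUM; exact division avoids relying on
   the behavior of right-shifting a negative value. */
static intptr_t toy_c_num(toy_val v)
{
  return (v - TOY_NUM) / TOY_TAG_SCALE;
}

/* Box a character code as an immediate CHR. */
static toy_val toy_chr(intptr_t ch)
{
  return ch * TOY_TAG_SCALE + TOY_CHR;
}
```

The sketch shows why immediate values cost no heap allocation: boxing and
unboxing are a multiply-and-add and its inverse, and the type test is a
mask of the low two bits.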
2.1 Pointer Bitfield

Immediate and heaped values are distinguished by a two-bit field in the
least significant bits of the pointer. If the two bits are 00 (i.e. the
pointer is four-byte-aligned) then the value is a pointer to a heaped
object, unless it is the null pointer. The null pointer is understood to be
the object nil.

The is_ptr(v) macro evaluates true for a value v which is not nil, and which
points to a heap object (at least according to its bit field; is_ptr does
not validate the pointer).

The codes 01, 10 and 11 indicate immediate values: values of type NUM, CHR
and LIT, respectively. That is to say, if the tag bits are 01, then the
remaining upper bits of the pointer constitute a signed integer. The range
of this integer is NUM_MIN to NUM_MAX, defined in lib.h.

The code 10 is for characters: the remaining bits of the pointer encode a
wchar_t value.

The bits 11 indicate that the object is a pointer to an encapsulated C
string (of wide characters), which is most often a literal. See the
subsection Encapsulated C Strings below. Only C strings whose first
character is suitably aligned can be represented as LIT objects. The address
of the first character of the string is formed by masking out the 11 code,
leaving a pointer which is four-byte aligned.

2.2 Heap Objects

Heap types are a union of various structures: union obj. The obj_t name is a
typedef. All of the structures are no larger than four pointer-sized words,
including the type tag, and it should be kept that way. Heaps are managed as
arrays of this union obj. If any one of the union members is made larger
than four words, then the heap size will increase.

Though the type tag is defined by an enumeration, for memory management
purposes, the type field is overloaded with additional bitmasked values. The
FREE flag in a type field marks an object on the free list. The REACHABLE
flag is the marking bit used during garbage collection.
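The overloading of the type field can be sketched as follows. The flag
values 0x100 (REACHABLE) and 0x200 (FREE) are the ones quoted in the Garbage
Collection section of this guide; everything else here (the toy_* names, the
two sample type codes, the single-field object) is a hypothetical stand-in
and not txr's real definitions.

```c
#include <assert.h>

/* Flag values as quoted in the Garbage Collection section. */
#define TOY_REACHABLE 0x100
#define TOY_FREE      0x200

/* Hypothetical subset of type codes, small enough not to
   collide with the flag bits. */
enum toy_type { TOY_CONS = 1, TOY_STR = 2 };

struct toy_obj {
  int type;   /* type code, possibly with flag bits OR-ed in */
};

/* Type code with the memory management flags stripped. */
static int toy_base_type(const struct toy_obj *o)
{
  return o->type & ~(TOY_REACHABLE | TOY_FREE);
}

/* Marking sets the REACHABLE bit without disturbing the type code. */
static void toy_mark(struct toy_obj *o)
{
  o->type |= TOY_REACHABLE;
}

/* Sweeping one object: survivors get their mark bit cleared,
   garbage gets the FREE bit. Returns 1 if the object was freed. */
static int toy_sweep_one(struct toy_obj *o)
{
  if (o->type & TOY_REACHABLE) {
    o->type &= ~TOY_REACHABLE;
    return 0;
  }
  o->type |= TOY_FREE;
  return 1;
}

/* An exact comparison fails on a freed object, because the FREE
   bit makes the field differ from every valid type code. */
static int toy_typecheck(const struct toy_obj *o, enum toy_type t)
{
  return o->type == (int) t;
}
```

Because the flags live above the type codes, an ordinary type check doubles
as a cheap detector for objects that were reclaimed too early.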
There is also a MAKEFRESH flag, which is used in conjunction with REACHABLE
to indicate that an object, though reachable, should be kept in, or demoted
to, the baby object generation. The implementation of user-defined
finalization handlers uses this to prevent finalized objects from becoming
mature, just because they were made reachable for the purposes of the
finalization callback.

2.3 The COBJ type

The COBJ type is a mechanism whereby a ``native'' C type can be integrated
into the dynamic type system. Under the COBJ model, the heap allocated
object of type COBJ serves as a handle which points to a separately
allocated C object, which can be an arbitrary structure. The relationship
between the dynamic world and this object is managed through a registered
table of operations. The module managing that object must provide functions
for dealing with garbage collection, printing, equality and hashing.

The garbage collector hooks allow the object's module to be notified when
the associated COBJ handle becomes unreachable. The associated C object may
contain references to dynamic objects (i.e. members of type val). In that
case, it must provide the mark function, which, when invoked, must traverse
the object's members of this type and report to the garbage collector that
they are reachable by invoking mark_obj on them.

2.4 Strings

All string manipulation should be done using the dynamic object system. The
object system provides three kinds of strings: encapsulated C strings,
regular strings and lazy strings (type tags LIT, STR and LSTR,
respectively).

Most code working with strings doesn't have to care about the difference
between these. However, taking advantage of the performance capabilities of
lazy strings requires some special coding (which is backward compatible with
regular strings). For instance, if you want to know whether the length of a
lazy string S is greater than 42, you don't want to do this:

  gt(length_str(S), num(42))
This will force an instantiation of the lazy string. There are functions for
testing whether a string's length is greater than, less than, greater than
or equal to, or less than or equal to some number.

2.4.1 Encapsulated C Strings

The design of the dynamic type system recognizes that programs contain
literals and static strings, and that sometimes transient strings are used
which have temporary lifetimes. Therefore, a special provision is made in
the val type to be able to represent C strings directly, without having to
create dynamically allocated copies in heap storage. These C strings
represented as values of type val are referred to in this document as
encapsulated C strings.

A C string whose address is aligned on a four-byte boundary, or more
strictly, is converted to an encapsulated C string by masking the bits 11
into the least significant two bit positions of its pointer, and then
manipulated as a value of type val (pointer to obj_t). Encapsulated C
strings can be transparently used wherever the other kinds of strings can be
used, so the benefit is immense, for the small cost of a bit operation.

Most often, this feature is used for literals, and the lit macro is provided
for this situation. The macro call lit("abc") produces a value of type val
which represents the wide string L"abc".

However, C strings other than literals can be encapsulated as values also.
The most obvious candidates are static strings which are arrays, rather than
literals, and stack-allocated strings, which C programs often use as
efficient temporary buffers for character manipulation.

Two functions are provided for converting these kinds of strings to
encapsulated strings: the functions static_str and auto_str. They do the
same thing: simply take the wchar_t * pointer and convert it to an obj_t *
pointer with the bits 11 in the tag field (thus requiring that the C string
pointer be aligned such that these bits are originally 00).
Two different functions which do the same thing are provided, because it is
generally much safer to convert a static string to a val (due to its
indefinite lifetime) than an automatic string (which becomes indeterminate
when the enclosing block terminates).

Care should be taken to only ever use auto_str to wrap a stack-allocated
string as a val, so that such usage can be found in the program by searching
for occurrences of ``auto_str''.

Secondly, care should be taken to ensure that values produced by auto_str do
not escape beyond the lifetime of the enclosing block. If they are passed to
functions, those functions must not retain the value in any persistent
place. For instance, if an object is constructed which contains an automatic
string, that object must not be used beyond the lifetime of that string.

Note that it is okay if garbage objects contain auto_str values, which refer
to strings that no longer exist, because the garbage collector will
recognize these pointers by their type tag and not use them.

2.4.2 Representation Hacks for 2-byte wchar_t

On some systems (notably Cygwin), the wide character type wchar_t is only
two bytes wide, and the alignment of string literals and arrays is two
bytes. This creates a problem: we need a two-bit type tag in the pointer,
but pointers have only one spare bit due to their weaker alignment.

It turns out that this is not a problem provided that we can ensure that no
two distinct string objects share the same four byte word, and if we're
willing to incur a small performance penalty to find the beginning of the
string when we need it.

On these systems, what we do is add a null character at the beginning of the
string, and an extra one at the end. So the literal L"abc" is actually
represented by L"\0" L"abc" L"\0". We then take the pointer to the 'a'
character as the string, which falls into one of two cases: it is either
four-byte aligned (case 1), or it is two-byte aligned (case 2).
Either way, it falls into some four byte cell, either at its base or at its
third byte. When we add the tag bits 11 (TAG_LIT), we make this pointer
point to the fourth byte (byte 3) of the four byte cell.

To recover the pointer, we remove the tag (replace it with bits 00), which
leaves us pointing to the base of the four-byte cell. The string either
starts there (case 1) or two bytes higher (case 2). The case is
distinguished by looking at the pointed-at wchar_t. If it is the null
character, then the pointer is incremented to the next character. The
padding at the end of the string ensures that this trick works for the null
string, where the test for the null character always succeeds.

The lit macro, which existed before this hack, takes care of doing this so
most code doesn't know the difference.

The new wli macro helps manage this representation when access is needed to
C string literals which are not used directly, but first assigned to
variables, and also provides type safety by using a different pointer type
for strings which have been treated with the padding.

  const wchli_t *abc = wli("abc");       /* special type */
  val abc_obj = static_str(abc);         /* good: requires const wchli_t * pointer */
  val xyz_obj = static_str(L"xyz");      /* error */
  val def_obj = static_str(lit("abc"));  /* error */

The wini and wref macros manage this representation when character arrays
are used. The wini macro abstracts away the initializer, so the programmer
doesn't have to be aware of the extra null characters:

  wchar_t abc[] = wini("abc");  /* potentially six wchar_t units! */

The wref macro hides the displacement of the first character:

  wchar_t *ptr_a = wref(abc);   /* pointer to "a" */
  wref(abc)[1] = L'B';          /* overwrite 'b' with 'B' */

On a platform where this hack isn't needed, these w* macros are no-ops.
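The tag-and-recover procedure described in this subsection can be sketched
in a self-contained way. This is a simulation under stated assumptions, not
txr's code: a uint16_t stands in for a two-byte wchar_t so the sketch
behaves the same on any platform, a union with a uint32_t member forces
four-byte alignment of the sample data, and the toy_* names and sample cells
are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t toy_wch;   /* stand-in for a two-byte wchar_t */

#define TOY_TAG_MASK ((uintptr_t) 3)
#define TOY_TAG_LIT  ((uintptr_t) 3)

/* Storage that is four-byte aligned thanks to the uint32_t member. */
union toy_cell {
  toy_wch s[8];
  uint32_t align;
};

/* Case 1: first character at a four-byte boundary (s[2]);
   s[0] is unrelated filler, s[1] is the leading null. */
static const union toy_cell toy_case1 = { { 'x', 0, 'a', 'b', 'c', 0, 0 } };

/* Case 2: first character only two-byte aligned (s[1]);
   the leading null occupies the base of its four-byte cell. */
static const union toy_cell toy_case2 = { { 0, 'a', 'b', 'c', 0, 0 } };

/* Empty string laid out as in case 1: leading null, terminator,
   extra trailing null. */
static const union toy_cell toy_empty = { { 'x', 0, 0, 0 } };

/* OR-ing in 11 makes the value refer to byte 3 of the cell
   that holds the first character, in both cases. */
static uintptr_t toy_tag_lit(const toy_wch *first_char)
{
  return (uintptr_t) first_char | TOY_TAG_LIT;
}

/* Clear the tag to reach the base of the cell; if a null sits
   there, step over it to the next character. */
static const toy_wch *toy_untag_lit(uintptr_t tagged)
{
  const toy_wch *p = (const toy_wch *) (tagged & ~TOY_TAG_MASK);
  if (*p == 0)
    p++;
  return p;
}
```

For the empty string, the step over the null lands on the extra trailing
null, which still reads as an empty string; that is the reason for the
padding at the end.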
2.4.3 Representation hacks for 4-byte wchar_t that is 2-byte aligned

On the LLVM compiler on OS X, I ran into the issue that although wchar_t is
four byte aligned, the compiler neglects to make wide string literals four
byte aligned. Cases occur of misaligned literals.

The solution is to borrow some of the logic that is used for handling
two-byte wchar_t. The data is similarly padded, and an adjustment
calculation takes place similarly to recover the pointer.

3. Garbage Collection

Txr has a fairly simple mark-and-sweep garbage collector. The collector
marks objects by performing a depth-first-search over the graph formed by
inter-object references, starting at certain root values. Objects which are
not marked are identified during the sweep phase, which is a linear scan
through the object heaps, and placed on the free list.

During the marking phase, the bit value 0x100 (denoted by the symbolic
constant REACHABLE) is used to mark reachable objects. This flag is reset
during the sweep phase, but the flag 0x200 (the value of the symbolic
constant FREE) is added to the type field of objects on the free list. This
FREE flag has the effect of ``poisoning'' free objects: if an object is
prematurely reclaimed (indicating a bug in the garbage collection system),
uses of that object will see a bad type tag, so that there is a good chance
the program will throw an exception due to a failed type check.

3.1 Root Pointers

The marking phase of the garbage collector looks in two places for root
pointers: by scanning the entire call stack, and by looking at a registered
list of global variables. Scanning the stack means that the garbage
collector is conservative: it could encounter values which look like valid
object references, but are actually only accidentally so due to having the
right bit pattern. When this happens, objects that should be considered
garbage will remain live.
This is called "spurious retention", and can be a bad problem, but it's
better than the opposite problem of premature deallocation.

Global root pointers are registered individually using the prot1 function,
or many at once using the protect function. Care must be taken to properly
null-terminate the variable argument list to protect. It does not use the
nao convention, but rather (val *) 0.

The garbage collector takes care to also scan the machine registers. This is
currently done using a broadly portable approach, namely recording the
machine state into the stack with the setjmp macro.

3.2 GC-safe Code

Since garbage collection is being used in code processed by a compiler which
knows nothing about garbage collection, it is important to obey certain
rules so that the code is gc-safe. Code which is not gc-safe is susceptible
to two potential serious problems: the premature garbage collection of an
object, and accesses, in the garbage collector, to uninitialized parts of an
object.

The rules for gc-safe code are not difficult in txr, due to the immense
simplification that the garbage collector scans the stack and registers. If
a value is in an automatic local variable, or if the code is working with
the value as the result of an expression, function return, or passing it as
a function parameter, that value is visible to the gc and protected. Thus,
the rules only have to be followed in lower-level code which is close to the
allocator. Normal application code does not have to follow any special
rules.

The garbage collector is called implicitly by code which calls make_obj to
pull a raw object from the garbage collector's free list. Code which does
not allocate will not be interrupted by the garbage collector. That's
another helpful simplification, but it comes at the cost of not supporting
multithreading.

However, code that calls make_obj must be written with the assumption that
make_obj may garbage collect on any call.

Now, here come the rules.
3.2.1 Rule One: Full Initialization

A function which calls make_obj must not be hanging on to any references to
a partially initialized object. Any partially initialized object may be
visited by the garbage collector during the call to make_obj.

A partially initialized object may have a type code which still indicates
that it is free. If the garbage collector encounters an object on the stack
which is free, it will simply skip that object. This means that the sweep
phase may then return that object to the free list. If a free object is
encountered during transitive marking, the garbage collector will abort.

In other words, if the program allocates an object from the free list, but
then accidentally invokes the garbage collector prior to completing the
initialization of that object, the object may be reclaimed back to the free
list and the program is then working with a freed object; or the program may
even abort.

If the program initializes only the type field of the object from make_obj,
but not the other fields that may contain a value of type val, and then
invokes the garbage collector, the garbage collector will treat that object
as visible, and then try to mark the val-typed fields of that object,
thereby using uninitialized memory.

The full initialization rule is therefore that after make_obj is called, the
object must be fully initialized before doing any other operation that may
allocate gc memory. Fully initialized means that the type field is
initialized, as well as any other field that is visited during garbage
collection.

3.2.2 Rule Two: Make it Reachable

A function which constructs an object must place it in live, reachable
storage before attempting to construct another object.

The garbage collector does not scan all of memory for root pointers, only
the stack and registered globals.
So for instance, if the only reference to an object is inside a dynamically
allocated structure, and that structure is not visible to the allocator,
then if gc is invoked, that object will be reclaimed. So the following
pattern is incorrect.

  {
    some_struct_type *t = (some_struct_type *) chk_malloc(sizeof *t);
    t->value = cons(foo, bar);
    return cobj((mem_t *) t, some_type_symbol, &some_type_ops);
  }

There are three allocations in the code: the allocation of the structure
assigned to pointer t, the allocation of the cons cell stored in t->value,
and the allocation of the COBJ.

The issue is that the object t is not known to the allocator. It is a
``native'' C type, which the garbage collector will not traverse. The
garbage collector can see the pointer t, because it scans the stack and
registers, but that pointer is not recognized by the garbage collector since
it doesn't point into one of its heaps, and so the collector will not find
and mark the t->value member.

Of course, the operations structure ``some_type_ops'' presumably contains a
mark function which knows how to traverse this object and find values inside
it. But that does not come into play until this object is registered as a
COBJ, which does not happen until the last line in the above block where the
cobj function is called. After the cobj call, the t pointer is hooked into
the COBJ object, and visible to the garbage collector.

So the object allocated by cons(foo, bar) is put into a structure which is
as yet invisible to the allocator, and that reference is the only live
reference which the program has to that cons cell. Consequently, the
subsequent call to the allocator, hidden inside the cobj function, may
trigger gc, and cause this cons cell to be reclaimed into the free list!
The following adjustment does not fix the problem:

  {
    val c = cons(foo, bar);
    some_struct_type *t = (some_struct_type *) chk_malloc(sizeof *t);
    t->value = c; /* still wrong */
    return cobj((mem_t *) t, some_type_symbol, &some_type_ops);
  }

Even though the cons cell is now also held in a local variable, as well as
in the structure, it is still not necessarily visible to the garbage
collector. The problem is that after the ``t->value = c'' assignment, the
variable c is no longer live.

Variable liveness is a concept from dataflow analysis, which is a process
implemented in optimizing compilers. A variable is live at some point in the
code if the value stored in it has a next use: other code can be reached
from that point which uses the value. The variable c has no next use after
the t->value = c assignment. There is only one execution path from that
point in the code, and that path leads to the termination of the block,
which destroys c. Essentially, the t->value structure member is the sink for
the data flow which carries the cons cell: the data flow emanates from the
call cons(foo, bar), and terminates in t->value.

Here is yet one more incorrect way to fix this:

  {
    val co;
    some_struct_type *t = (some_struct_type *) chk_malloc(sizeof *t);
    t->value = nil;
    co = cobj((mem_t *) t, some_type_symbol, &some_type_ops);
    t->value = cons(foo, bar);
    return co;
  }

The above properly initializes the structure, and then associates it with
the COBJ. This makes the structure visible to the garbage collector (through
the co variable, which is live at the point where the cons function is
called, due to having a next use in the return statement!) Now we can safely
stash a newly allocated cons cell into that structure, allowing that
structure to hold the one and only reference to that object.

The issue which renders the above incorrect is with *how* we stash that cons
into the object. The above breaks specifically because of generational
garbage collection.
The issue is that the t->value = cons(foo, bar) uses a plain C assignment.
The problem is that the cons(foo, bar) call can trigger a garbage
collection, which can promote the co object into the mature generation. Yet,
the cons itself is a baby object. And consequently, the assignment now
mutates a mature object to point to a baby object: the forbidden direction.

If the above code structure is used, the assignment must use the set macro:

  {
    val co;
    some_struct_type *t = (some_struct_type *) chk_malloc(sizeof *t);
    t->value = nil;
    co = cobj((mem_t *) t, some_type_symbol, &some_type_ops);
    set(mkloc(t->value, co), cons(foo, bar));
    return co;
  }

This is cumbersome. Another approach, which avoids two-step initialization
of the structure, and the cumbersome set:

  {
    val co;
    val c = cons(foo, bar);
    some_struct_type *t = (some_struct_type *) chk_malloc(sizeof *t);
    co = cobj((mem_t *) t, some_type_symbol, &some_type_ops);
    t->value = c;
    return co;
  }

In this situation, the variable c maintains a live, gc-visible reference to
the cons across the cobj allocation. The variable c is live at the point of
the cobj call because it has a next use: its value is used in the subsequent
assignment to t->value. We don't initialize the structure because even if
the cobj function triggers gc, the gc cannot yet see that structure and so
there is no danger. After cobj returns, the first thing we do is initialize
the structure (obeying the first rule of gc-safe code). Just after cobj
returns, the structure is uninitialized and visible to the garbage
collector, but there is nothing that will trigger gc prior to the
initialization.

The generational issue goes away because if the call to cobj triggers
garbage collection, it will mean that the cons is a mature object. There is
no problem with the assignment because it mutates a baby object to point to
a mature object.
Note that this premature collection problem also affects functions which
simply take an existing object and put it into a structure, where it is not
obvious that an object may have been allocated which is not visible to gc:

  /* Looks harmless: allocate structure, stick the argument
     object into it and make a COBJ! */

  typedef struct {
    val mem;
  } foo;

  val make_foo(val member)
  {
    foo *f = (foo *) chk_malloc(sizeof *f);
    f->mem = member; /* Oops, member is no longer live. */
    return cobj((mem_t *) f, ...);
  }

The problem is that the caller which invokes make_foo might not maintain any
live reference to the argument object either, and so the f->mem = member
might be the one and only sink for the data flow carrying that object; i.e.
the one and only reference to that object in the entire program.

One way that can happen is that the object is just a temporary that is
allocated in the function call expression itself:

  make_foo(string("abc")); /* oops! */

The make_foo function can be corrected like this:

  val make_foo(val member)
  {
    val co;
    foo *f = (foo *) chk_malloc(sizeof *f);
    f->mem = nil; /* do not forget Rule One */
    co = cobj((mem_t *) f, ...);
    f->mem = member;
    return co;
  }

Another possible approach is to use the gc_hint function to ensure liveness:

  val make_foo(val member)
  {
    foo *f = (foo *) chk_malloc(sizeof *f);
    val out;
    f->mem = member; /* member is kept live by the gc_hint below */
    out = cobj((mem_t *) f, ...);
    gc_hint(member);
    return out;
  }

gc_hint provides a data sink for the member, ensuring that this variable
stays live across the call to cobj. The variable is no longer live at the
"return out" statement, but at that point it doesn't matter because it has
been safely stored in f->mem, which has been firmly installed as the handle
of a cobj, and is visible to the garbage collector.

The tradeoff is that the first approach generates two writes to f->mem,
whereas the second makes an external function call.
3.3 Weak Reference Support

COBJ objects can support weak pointers, but there is no fully encapsulated
interface for this; to be more specific, when adding a new module of objects
that have weak references, it is necessary to add a function call into the
garbage collection function.

Modules with weak references should closely follow the design pattern used
by the hash module. Hash tables are implemented using COBJ, and provide weak
key and value support thanks to cooperation with the gc module.

Weak references work as follows. During gc marking, a given COBJ module must
maintain a list of all objects of its kind which are marked (or at least
just that subset of them which contains weak references). It must refrain
from marking the weak references contained in these objects, but rather
leave them unmarked.

After the initial marking phase, gc will call a global function in each
module that manages objects with weak references. (Currently there is only
one such function: hash_process_weak; a similar function must be written for
a new module and added). This function must process and clear the weak list
gathered during the initial marking. Each weak reference in each object on
this weak marked list must be inspected to see whether it refers to an
object which is still reachable.

Weak references which point to values which have not been reached (do not
have the REACHABLE bit) must be lapsed according to the object's rules for
lapsing weak references. For instance, a hash table with weak keys will
delete a key/value pair if the key reference lapses. A weak pointer
container object might convert a lapsed weak reference to the value nil.

Weak objects can defer marking certain other non-weak objects. For instance
the hash module, during marking, does not mark the vector object that serves
as the hash chain table (at least not for weak hashes), and neither does it
mark the conses which make up the hash chains emanating from that vector.
This marking is completed in hash_process_weak. After the lapsed entries are removed (their conses are spliced out of the chains), the vector is marked, which transitively causes the chain conses to be marked. The conses that were removed due to the lapsing of weak keys thus stay unmarked and are reclaimed during the sweep phase of the gc, which soon follows.

3.4 Finalization

Finalization (user-defined finalization hooks associated with objects) is implemented in a fairly straightforward way, with a slight complication having to do with generational GC (see below).

Finalization uses a simple global list of registrations which is processed during every garbage collection pass, in two phases.

In the first phase, just after the regular garbage collection marking phase and weak hash table processing, the finalization registration list is walked twice. In the first pass, those entries whose registered objects are still unreachable are flagged for later processing; then, in the second pass over the list, all registered objects are given a "new lease on life" by being marked as reachable. The registered functions in all entries are marked as reachable also.

Next, the garbage collection sweep pass takes place to reclaim all unreachable objects and reset the GC-related flags in all objects.

When this is done, it becomes safe to call the finalization functions. In this second phase, the list of registrations is walked once again, and the previously flagged entries are called and expunged. The list is walked in a safe way which allows the called handlers to register new finalizations. The newly registered finalizations are combined with the unflagged, unexpunged previous registrations into a new list, which will be processed at the next garbage collection pass.

3.5 Generational GC

3.5.1 Preprocessor Delimitation

Currently, the generational GC code is delimited by #ifdef CONFIG_GEN_GC.
So to understand the differences between the regular GC and the generational one, one just has to read the sections dependent on that preprocessor symbol.

3.5.2 Representation of Generations

Generational garbage collectors are typically copying collectors. In a copying system, objects can be segregated into generations by their physical location. If an object is in a certain area, then it's in a certain generation. Moving it to a different area reassigns it to a different generation.

In TXR, the garbage collection is non-copying. For generational GC support, we simply carve some bits from the type field of an object to indicate the generation. There are only three generation values: -1, 0 and 1. Generation 0 indicates a "baby" object. The value 1 indicates a mature object. A freshly allocated object is put into generation 0.

The generation value -1 has three different meanings:

  1. It is used to mark baby (generation 0) objects which have been stored into the checkobj array, so that they are not placed there twice.

  2. It is used to mark mature (generation 1) objects which have been placed into the mutobj array, also so they are not put there twice.

  3. It is used during marking to flag reachable objects which should not be placed into, or remain in, generation 1.

The checkobj and mutobj arrays have to do with handling backreferences from old to young objects, and are described a few paragraphs below. Meaning 3 has to do with interaction between finalization and generational GC, also described below.

These different uses of -1 don't interfere because when the checkobj and mutobj arrays are processed, which happens early in the marking phase, those objects are changed from generation -1 to generation 0. (A temporary situation, done in the knowledge that all reachable objects in generation 0 that are processed in the sweep phase will go to generation 1).
The third meaning of -1 is used in the last phase of marking having to do with finalization (the prepare_finals function), which applies this -1 generation only to objects that have been left unreachable by all previous marking, and are reachable only through the finalizer registration list. After this phase, only these special objects can possibly have generation -1.

3.5.3 Basic Algorithm

When an object is newly allocated, it is not only assigned generation 0, but is also appended to the freshobj array. This array allows the garbage collector to identify all of the baby objects (because unlike a copying collector, it cannot just traverse a "nursery" area). The array is cleared on every garbage collection, and after each garbage collection, there are only mature objects, since all live objects are promoted to generation 1. So freshobj identifies all baby objects since the last garbage collection, which is the same as all baby objects in existence, period.

Whenever the freshobj array fills up, a generational collection cycle must be triggered, otherwise there is no place to record the next baby object.

Generational collection walks all of the root places like the stack and registered globals. However, generational garbage collection does not traverse generation 1 objects. It traverses only objects whose generation is less than 1. When a generation 1 object is visited, the recursion simply returns. All generation 1 objects are considered reachable, without the necessity of visiting all of them and marking them. This of course may be wrong: there may be generation 1 objects which have become garbage. Generational garbage collection will not find generation 1 garbage; only a full garbage collection pass will.

Under generational GC, a full sweep is also not performed. Since a full mark was not done, it would be pointless.
A full sweep would just waste time visiting all of the heaps and necessarily skipping all the unmarked generation 1 objects, almost defeating the point of generational GC. The full sweep is replaced by a generational sweep which traverses only the baby objects, which, recall, are all in the freshobj array. Those baby objects which were not marked during the marking phase are recycled.

So generational GC saves time by avoiding doing full marking (terminating whenever it meets a generation 1 object) and avoiding a full sweep (processing only the freshobj array).

3.5.4 Handling Backpointers

Under generational GC, there is the problem that objects in generation 1 can be destructively changed (mutated) so that they point to baby objects. This is a problem, because generational GC avoids traversing the generation 1 objects. If the only reference to a baby object is a mutated pointer in a mature object, and the GC doesn't realize this, it will reclaim that baby object, leaving the mature object with an invalid, dangling pointer.

This problem is solved by identifying all such destructive operations in the code base, and ensuring that they go through an appropriate interface rather than a direct C assignment.

In various areas of the code base, a type called loc is used which points to a memory location of type val. When TXR is compiled with the ordinary mark-and-sweep garbage collector, the loc type is just a typedef name for "val *". When TXR is compiled for generational GC support, the loc type becomes a structure holding a pair of values: a val * pointer called "ptr" and a val called "obj". The obj member holds a reference to an object, and ptr points to a specific memory location inside that object, such as the cdr field of a cons cell, the element of an array or whatever.

Any potentially unsafe assignment to a storage location inside a heap object is performed by obtaining a pointer to that location of type loc. The set macro is then used to store a value in it.
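The loc idea can be sketched in miniature as follows. This is a toy model under stated assumptions, not TXR's actual code: toy_cons, toy_loc, toy_mkloc, toy_gc_set, toy_set and toy_checkobj are hypothetical names, and the generation is a plain int member rather than bits carved from a type field. The sketch shows the generational flavour of loc (containing object plus slot pointer) and a store function which notices when a mature object is made to point at a baby, recording the baby in an array that plays the role of checkobj.

```c
#include <stddef.h>

#define TOY_CHECKOBJ_MAX 8

typedef struct toy_cons *val;

struct toy_cons {
  int gen;                  /* -1, 0 (baby) or 1 (mature) */
  val car, cdr;
};

/* Generational flavour of loc: the containing object plus the slot
   inside it. Without generational GC, a plain "val *" would do. */
typedef struct {
  val obj;
  val *ptr;
} toy_loc;

val toy_checkobj[TOY_CHECKOBJ_MAX];
size_t toy_checkobj_count;

toy_loc toy_mkloc(val obj, val *ptr)
{
  toy_loc l;
  l.obj = obj;
  l.ptr = ptr;
  return l;
}

/* Store through a loc. If a mature object is made to point at a baby,
   remember the baby so that a generational pass can treat it as a root.
   Assumption: the array has room; a real collector would instead
   trigger a collection cycle when it is full. */
val toy_gc_set(toy_loc l, val newval)
{
  if (newval != NULL && l.obj->gen == 1 && newval->gen == 0 &&
      toy_checkobj_count < TOY_CHECKOBJ_MAX)
  {
    newval->gen = -1;       /* prevents a duplicate entry for this object */
    toy_checkobj[toy_checkobj_count++] = newval;
  }
  *l.ptr = newval;
  return newval;
}

#define toy_set(place, newval) toy_gc_set(place, newval)

size_t toy_barrier_demo(void)
{
  static struct toy_cons old = { 1, NULL, NULL };
  static struct toy_cons baby = { 0, NULL, NULL };

  toy_set(toy_mkloc(&old, &old.cdr), &baby);  /* recorded */
  toy_set(toy_mkloc(&old, &old.car), &baby);  /* already -1: not recorded */
  return toy_checkobj_count;
}
```

After toy_barrier_demo returns, the recording array holds a single entry even though the baby was stored twice, because the first store changed its generation to -1.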
Under generational GC, the set macro expands to a call to a function called gc_set which performs the necessary checks to see whether a location within a gen 1 object is being assigned to hold a gen 0 object.

When gc_set detects that the address of a gen 0 object is being written into the field of a gen 1 object, it changes the generation of the gen 0 object to -1 and stores it in the next available element in the checkobj array. The change to -1 prevents this action from being repeated for the same object, since duplicates only waste space in the checkobj array. Not only are duplicates wastefully visited more than once, but when checkobj is full, a generational GC cycle is triggered.

During a generational gc, the checkobj array is treated as an additional root area, ensuring that baby objects that might be the target of a backpointer from generation 1 are marked and retained.

In addition to set there is also mpush: a macro for pushing onto a list, which handles the situation of a gen 0 cons cell being pushed onto a list held in a gen 1 location.

In some cases, an additional macro called mut is used instead. This macro is part of an alternative strategy for dealing with the backpointer problem. Under the regular garbage collector, this macro does nothing, but under generational GC, it places objects into the mutobj array, which is similar to the checkobj array. Unlike the checkobj array, which holds baby objects suspected of being reachable from generation 1, the mutobj array holds generation 1 objects which are suspected of referencing baby objects.

As with checkobj, objects placed in mutobj are assigned to generation -1. In the case of mutobj, this is essential, in two ways. Firstly, the mutobj array must not contain duplicates, whereas in the case of checkobj, duplicates only waste CPU cycles.
Secondly, it is essential that these generation 1 objects are reassigned to -1, so that they are treated as babies during marking and are traversed in order to mark the baby objects they reference. Objects marked with generation 1 are not traversed during a generational GC cycle.

During a generational gc, like checkobj, mutobj is treated as an additional root area. Because the objects have been reassigned to generation -1, they are properly traversed and the generation 0 objects they refer to can be reached and marked.

Unlike checkobj, however, the mutobj array is subjected to a sweep. Although no object in the mutobj array is eligible for reclamation, they have to be returned to generation 1, and their REACHABLE flags have to be reset. These actions are carried out by sweep logic.

The mut macro is useful for large aggregate objects which are subject to a big destructive change, such as the modification of multiple elements of a vector. If the set macro were used, then each element would have to be individually handled as an assignment, and possibly generate its own entry in the checkobj array. The mut macro simply allows the vector as a whole to be suspected of now pointing to some babies via its newly assigned elements. The generational garbage collection pass then naturally deals with visiting the individual elements to mark those which are babies.

3.5.5 Generational GC and Finalization

There is an interaction between generational GC and TXR's finalization support. When objects registered for finalization are processed, they and their functions have to be reinstated as reachable objects, so that the finalization handlers can safely execute.

Under plain mark-and-sweep GC, these objects will be collected in the next garbage collection pass, since they will be found to be unreachable again (unless their finalization handler reinstated them into the reachability graph!) and this time they will not be registered for finalization any more.
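The plain mark-and-sweep behaviour just described can be illustrated with a toy model. Everything in it is hypothetical (toy_obj, toy_reg, toy_gc, toy_finalizer and the rest are not TXR names), reachability is reduced to a single "rooted" flag, and handlers registering new finalizations during the final walk are not modelled. It only aims to show the lifecycle: a registered but unreachable object survives one pass so that its handler can run, and is reclaimed by the next pass.

```c
#include <stddef.h>

#define TOY_MAX 4

struct toy_obj {
  int allocated;    /* still on the heap? */
  int rooted;       /* stand-in for "reachable from a root" */
  int reachable;    /* mark bit, meaningful only during a pass */
  int finalized;    /* set by the demo finalizer */
};

struct toy_reg {
  struct toy_obj *obj;
  void (*fun)(struct toy_obj *);
  int flagged;
};

struct toy_obj toy_heap[TOY_MAX];
struct toy_reg toy_regs[TOY_MAX];
size_t toy_nregs;

void toy_finalizer(struct toy_obj *o)
{
  o->finalized = 1;
}

void toy_gc(void)
{
  size_t i, kept = 0;

  /* Regular marking: only rooted objects are reached. */
  for (i = 0; i < TOY_MAX; i++)
    toy_heap[i].reachable = toy_heap[i].allocated && toy_heap[i].rooted;

  /* First walk: flag registrations whose object was not reached. */
  for (i = 0; i < toy_nregs; i++)
    toy_regs[i].flagged = !toy_regs[i].obj->reachable;

  /* Second walk: every registered object gets a new lease on life. */
  for (i = 0; i < toy_nregs; i++)
    toy_regs[i].obj->reachable = 1;

  /* Sweep: reclaim what is still unmarked and reset the mark bits. */
  for (i = 0; i < TOY_MAX; i++) {
    if (!toy_heap[i].reachable)
      toy_heap[i].allocated = 0;
    toy_heap[i].reachable = 0;
  }

  /* Handlers can now run safely; flagged entries are called and expunged,
     the others are carried over to the next pass. */
  for (i = 0; i < toy_nregs; i++) {
    if (toy_regs[i].flagged)
      toy_regs[i].fun(toy_regs[i].obj);
    else
      toy_regs[kept++] = toy_regs[i];
  }
  toy_nregs = kept;
}

int toy_final_demo(void)
{
  struct toy_obj *o = &toy_heap[0];
  int survived_first, gone_after_second;

  o->allocated = 1;
  o->rooted = 0;              /* garbage, but registered for finalization */
  toy_regs[0].obj = o;
  toy_regs[0].fun = toy_finalizer;
  toy_regs[0].flagged = 0;
  toy_nregs = 1;

  toy_gc();                   /* finalizer runs, object survives */
  survived_first = o->allocated && o->finalized;
  toy_gc();                   /* no registration left: object reclaimed */
  gone_after_second = !o->allocated;
  return survived_first && gone_after_second;
}
```

In this model toy_final_demo returns 1: the object is still allocated and finalized after the first pass, and gone after the second.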
Under generational GC, however, objects which are marked as reachable pass into the mature generation. If this happens for objects which are being finalized, a silly situation occurs in which objects known to be unreachable, and which are only temporarily made reachable for finalization, are being promoted to, or retained in, the mature generation, so that they won't be reclaimed until the next full garbage collection pass.

To prevent this silly situation, objects which are marked as reachable during finalization processing are assigned to generation -1. During a generational GC, all such objects are generation 0 objects, but during a full GC, this could include generation 1 objects also. Then, during the sweep phase, objects which carry this flag are assigned to generation 0 and are placed into the freshobj array (the nursery), as if they were just freshly allocated. Doing this is safe even for objects that were previously generation 1: since the objects had just been found to be unreachable, no references to them exist from any other live object, and that implies that no backreferences exist to them from the mature generation. (This takes place before the finalization handlers are called, which could introduce such backreferences.)

4. Debugging

4.1. Using gdb

Debugging txr is mostly easy thanks to the dynamic types. The function d() is provided which makes it easy to print an object. Most of the Lisp-like functions in txr can be invoked from the debugger. You can construct objects, inspect values with complex expressions etc.

If the problem you're debugging can be reproduced in an unoptimized build, then use that. It's much better because values are not optimized out. Simply run "./configure opt_flags=", then "make clean" and "make".

If the program catches an exception and terminates cleanly, then place a breakpoint on the function "uw_throw" to catch this in the debugger.
Sample debug session:

  $ gdb ./txr
  GNU gdb (GDB) Fedora (6.8.50.20090302-23.fc11)
  Copyright (C) 2009 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "i586-redhat-linux-gnu".
  For bug reporting instructions, please see:
  ...
  (gdb) b match_line
  Breakpoint 1 at 0x80503a2: file match.c, line 295.
  (gdb) r -c '@a' -
  Starting program: /home/kaz/txr/txr -c '@a' -
  hello

  Breakpoint 1, match_line (bindings=0x0, specline=0xb7fd163c, dataline=0xb7fd15bc, pos=0x1, spec_lineno=0x5, data_lineno=0x5, file=0xb7fd15fc) at match.c:295
  295       if (specline == nil)
  (gdb) p d(specline)
  ((sys:var a))
  $1 = void
  (gdb) p d(car(specline))
  (sys:var a)
  $2 = void
  (gdb) p d(dataline)
  "hello"
  $3 = void
  (gdb) n
  298       elem = first(specline);
  (gdb) n
  300       switch (elem ? type(elem) : 0) {
  (gdb) p d(elem)
  (sys:var a)
  $4 = void
  (gdb) n
  303       val directive = first(elem);
  (gdb) n
  305       if (directive == var_s) {
  (gdb) n
  306       val sym = second(elem);
  (gdb) n
  307       val pat = third(elem);
  (gdb) p d(sym)
  a
  $5 = void
  (gdb) n
  308       val modifier = fourth(elem);
  (gdb) n
  309       val pair = assoc(bindings, sym); /* var exists alr... */
  (gdb) p d(bindings)
  nil
  $6 = void
  (gdb) n
  311       if (gt(length(modifier), one)) {
  (gdb) p d(length(modifier))
  0
  $7 = void
  (gdb) p d(one)
  No symbol "one" in current context.
  (gdb) n
  316       modifier = car(modifier);
  (gdb) n
  318       if (pair) {
  (gdb) n
  349       } else if (consp(modifier)) { /* regex variable */
  (gdb) n
  363       } else if (nump(modifier)) { /* fixed field */
  (gdb) n
  378       } else if (modifier) {
  (gdb) n
  381       } else if (pat == nil) { /* no modifier, no elem
  (gdb) n
  382       bindings = acons_new(bindings, sym, sub_str(data...
  (gdb) n
  383       pos = length_str(dataline);
  (gdb) p d(bindings)
  ((a . "hello"))
  $8 = void
  (gdb) n
  628       break;
  (gdb) p d(pos)
  5
  $9 = void
  (gdb) n
  646       specline = cdr(specline);
  (gdb) n
  647       }
  (gdb) n

  Breakpoint 1, match_line (bindings=0xb7fd154c, specline=0x0, dataline=0xb7fd15bc, pos=0x15, spec_lineno=0x5, data_lineno=0x5, file=0xb7fd15fc) at match.c:295
  295       if (specline == nil)
  (gdb) n
  649       return cons(bindings, pos);
  (gdb) n
  650       }
  (gdb) n
  match_files (spec=0xb7fd161c, files=0xb7fd15dc, bindings=0x0, first_file_parsed=0xb7feaebc, data_linenum=0x0) at match.c:1995
  1995      if (nump(success) && c_num(success) < c_num(length_st ...
  (gdb) quit

4.2. Debugging the Yacc-generated Parser

To debug the parser, which should rarely be necessary, you have to edit the makefiles (config.make is a good place) to pass the -t option to yacc to build an instrumented parser. To force a regeneration of the parser, remove y.tab.c and run make.

To see the debug trace, you must also set the yydebug variable. Instead of modifying the program, another way is to just set a breakpoint on main in gdb and do a "set yydebug=1".

The file y.output is useful; it summarizes the LALR(1) state machine generated by yacc.

4.3. Debugging GC Issues

Use the --gc-debug option of txr to run it in a mode in which it eagerly reclaims garbage after nearly every operation. This slows it down, but makes it more likely to catch invalid uses of garbage. It works even better with Valgrind integration.

There are other GC issues that are hard to catch, like spurious retention. This is when the code generated by the C compiler hangs on to an object which, in the source code semantics, should be garbage. It can happen, for example, when a variable has gone out of scope, but the stack location where that variable was last stored has not been overwritten. Register-save areas in the stack frame can similarly contain stale data, because when a register value is restored from the save area, the copy remains there.
Spurious retention can also happen if a bit pattern is generated which looks like a reference to an object, by chance. We share this problem with garbage collectors like Boehm. Luckily, unlike Boehm, we do not have this problem over dynamic objects, because we do not scan dynamic memory. All dynamic objects are registered with the garbage collector and are precisely traced. What isn't precisely traced is the call stack and machine context.

4.4 Object Breakpoint

If TXR is compiled with -DEXTRA_DEBUGGING=1 then two global symbols are defined which make it possible to catch GC-related traversals of a particular object.

When compiled with EXTRA_DEBUGGING defined as 1, TXR provides a function called breakpt. The purpose of this function is to serve as a target of a debugger breakpoint; when interesting situations happen, TXR calls that function.

An EXTRA_DEBUGGING build also provides a global variable called break_obj, of type val. This normally holds the value nil, which is uninteresting. The variable can be set, from within the debugger, to hold an arbitrary object. When that object is visited during garbage collection and in certain other interesting situations such as hash table weak processing, the breakpt function is called.

Using this object breakpoint feature, it is possible to investigate various issues, such as spurious retention: how is a particular object being reached?

4.5 Valgrind: Your Friend

To get the most out of running txr under valgrind, build it with valgrind support. Of course, you have to have the valgrind development stuff installed (so the valgrind.h header file is visible), not only the valgrind executables. Do a ./configure --valgrind then rebuild.

If this is enabled, txr uses the Valgrind API to inform valgrind about the state of allocated or unallocated areas on the garbage-collected heap, if it is additionally run with the --vg-debug option. Valgrind will be able to trap uses of objects which are marked as garbage.
Using --gc-debug together with --vg-debug while running txr under valgrind is a pretty good way to catch gc-related errors.

However, Valgrind will not precisely identify individual heap objects. If a freed object is misused, Valgrind will only be able to say something to the effect that the pointer is 536 bytes into a large block allocated in the more function called from make_obj (i.e. a heap). Valgrind will not give you the call trace which led to that particular object being allocated, only the call stack which triggered the containing heap being allocated: an irrelevant piece of information that can confuse you!