summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorKaz Kylheku <kaz@kylheku.com>2014-10-30 07:49:07 -0700
committerKaz Kylheku <kaz@kylheku.com>2014-10-30 07:49:07 -0700
commit0580d0373fc9b9b6a84cfb5749257b095e610e73 (patch)
tree3b8d065dec821f0856313f47305911b579a35dd1
parent6eca4a9313fb8af95d1f0ea961b351aaba487a1e (diff)
downloadtxr-0580d0373fc9b9b6a84cfb5749257b095e610e73.tar.gz
txr-0580d0373fc9b9b6a84cfb5749257b095e610e73.tar.bz2
txr-0580d0373fc9b9b6a84cfb5749257b095e610e73.zip
Implementing finalization hooks.
* gc.c (struct fin_reg): New struct type. (final_list, final_tail, mark_makefresh): New static variables. (mark_obj): Under generational GC, if make_makefresh is in effect, set the generation to -1 on all marked objects. (sweep_one): In an EXTRA_DEBUGGING build, call breakpt if the object being swept is the one in break_obj. Under generational GC, place reachable objects that are in generation -1 the freshobj nursery and assign them to generation 0, rather than sticking them into the mature generation 1. (sweep): Under generational gc, reset the freshobj_idx variable here, so that sweep_one has an empty nursery in which to place the generation -1 objects. (prepare_finals, call_finals): New static functions. (gc): Call prepare_finals before sweep, and call call_finals just before re-enabling GC and returning. Do not reset freshobj_idx to zero; this was done in sweep, which may have added entries into it. (gc_finalize): New function. (gc_late_init): Register gc_finalize as intrinsic function finalize. * txr.1: Documented finalize. * HACKING: Documented finalization, described the additional meaning of the -1 generation, and added a section on debugging with break_obj and breakpt.
-rw-r--r--ChangeLog29
-rw-r--r--HACKING193
-rw-r--r--gc.c114
-rw-r--r--txr.156
4 files changed, 341 insertions, 51 deletions
diff --git a/ChangeLog b/ChangeLog
index b48a1b12..ab68d9e7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,34 @@
2014-10-30 Kaz Kylheku <kaz@kylheku.com>
+ Implementing finalization hooks.
+
+ * gc.c (struct fin_reg): New struct type.
+ (final_list, final_tail, mark_makefresh): New static variables.
+ (mark_obj): Under generational GC, if make_makefresh is in
+ effect, set the generation to -1 on all marked objects.
+ (sweep_one): In an EXTRA_DEBUGGING build, call breakpt
+ if the object being swept is the one in break_obj.
+ Under generational GC, place reachable objects that are
+ in generation -1 the freshobj nursery and assign them to generation 0,
+ rather than sticking them into the mature generation 1.
+ (sweep): Under generational gc, reset the freshobj_idx variable
+ here, so that sweep_one has an empty nursery in which
+ to place the generation -1 objects.
+ (prepare_finals, call_finals): New static functions.
+ (gc): Call prepare_finals before sweep, and call call_finals
+ just before re-enabling GC and returning. Do not reset freshobj_idx to
+ zero; this was done in sweep, which may have added entries into it.
+ (gc_finalize): New function.
+ (gc_late_init): Register gc_finalize as intrinsic function finalize.
+
+ * txr.1: Documented finalize.
+
+ * HACKING: Documented finalization, described the additional
+ meaning of the -1 generation, and added a section on debugging with
+ break_obj and breakpt.
+
+2014-10-30 Kaz Kylheku <kaz@kylheku.com>
+
* gc.h (break_obj): Missing extern added on declaration.
This broke C++ compilation with -DEXTRA_DEBUGGING turned on.
diff --git a/HACKING b/HACKING
index d6983fb4..52a879f0 100644
--- a/HACKING
+++ b/HACKING
@@ -5,40 +5,43 @@ CONTENTS:
SECTION LINE
-0. Overview 44
-
-1. Coding Practice 51
-1.2 Program File Structure 74
-1.3 Style 88
-1.3 Error Handling 150
-1.4 I/O 163
-1.5 Type Safety 173
-1.6 Regression 215
-
-2. Dynamic Types 224
-2.1 Two Kinds of Values 231
-2.1 Pointer Bitfield 242
-2.2 Heap Objects 265
-2.3 The COBJ type 279
-2.4 Strings 296
-2.4.1 Encapsulated C Strings 311
-2.4.2 Representation Hacks for 2 Byte wchar_t 355
-
-3. Garbage Collection 414
-3.1 Root Pointers 431
-3.2 GC-safe Code 454
-3.2.1 Rule One: Full Initialization 480
-3.2.2 Rule Two: Make it Reachable 509
-3.3 Weak Reference Support 644
-3.4 Generational GC 687
-3.4.2 Representation of Generations 696
-3.4.3 Basic Algorithm 716
-3.4.4 Handling Backpointers 751
-
-4. Debugging 828
-4.2. Debugging the Yacc-generated Parser 959
-4.3. Debugging GC Issues 972
-4.4. Valgrind: Your Friend 995
+0. Overview 47
+
+1. Coding Practice 54
+1.2 Program File Structure 77
+1.3 Style 91
+1.3 Error Handling 153
+1.4 I/O 166
+1.5 Type Safety 176
+1.6 Regression 218
+
+2. Dynamic Types 227
+2.1 Two Kinds of Values 234
+2.1 Pointer Bitfield 245
+2.2 Heap Objects 268
+2.3 The COBJ type 288
+2.4 Strings 305
+2.4.1 Encapsulated C Strings 320
+2.4.2 Representation Hacks for 2 Byte wchar_t 364
+
+3. Garbage Collection 423
+3.1 Root Pointers 441
+3.2 GC-safe Code 464
+3.2.1 Rule One: Full Initialization 490
+3.2.2 Rule Two: Make it Reachable 519
+3.3 Weak Reference Support 654
+3.4 Finalization 697
+3.5 Generational GC 721
+3.5.2 Representation of Generations 730
+3.5.3 Basic Algorithm 766
+3.5.4 Handling Backpointers 801
+3.5.5 Generational GC and Finalization 879
+
+4. Debugging 908
+4.2. Debugging the Yacc-generated Parser 1039
+4.3. Debugging GC Issues 1052
+4.4 Object Breakpoint 1075
+4.5 Valgrind: Your Friend 1094
0. Overview
@@ -274,6 +277,12 @@ Though the type tag is defined by a enumeration, for memory management
purposes, the type field is overloaded with additional bitmasked values.
The FREE flag in a type field marks an object on the free list.
The REACHABLE flag is the marking bit used during garbage collection.
+There is also a MAKEFRESH flag, which is used to in conjunction with
+REACHABLE to indicate that an object, though reachable, should be
+kept in, or demoted to to the baby object generation. The implementation of
+user-defined finalization handlers uses this to prevent finalized objects from
+becoming mature, just because they were made reachable for the purposes of the
+finalization callback.
2.3 The COBJ type
@@ -428,6 +437,7 @@ reclaimed (indicating a bug in the garbage collection system), uses
of that object will see a bad type tag, so that there is a good chance
the program will throw an exception due to a failed type check.
+
3.1 Root Pointers
The marking phase of the garbage collector looks in two places for root
@@ -684,16 +694,40 @@ due to the lapsing of weak keys thus stay unmarked and are reclaimed during
the sweep phase of the gc, which soon follows.
-3.4 Generational GC
+3.4 Finalization
+
+Finalization (user-defined finalization hooks associated with objects) is
+implemented in a fairly straightforward way, with a slight complication
+having to do with generational GC (see below).
+
+Finalization uses a simple global list of registrations which is processed
+during every garbage collection pass, in two phases. First, just after
+the regular garbage collection marking phase and weak hash table processing,
+the finalization registration list is walked twice. First, those entries in it whose
+registered objects are still unreachable are flagged for later processing,
+then in the second pass over the list, all objects are given "new lease on
+life" by being marked as reachable. The registered functions in all entries are
+marked as reachable also. Next, the garbage collection sweep pass takes place
+to reclaim all unreachable objects and reset the GC-related flags in all
+objects. When this is done, it becomes safe to call the finalization functions.
+The list of registrations is walked once again, and the previously flagged
+entires are called out and expunged. The list is walked in a safe way which
+allows the called handlers to register new finalizations. The newly registered
+finalizations are combined with the unflagged, unexpunged previous
+registrations into a new list, which will be processed at the next garbage
+collection pass.
+
-3.4.1 Preprocessor Delimitation
+3.5 Generational GC
+
+3.5.1 Preprocessor Delimitation
Currently, the generational GC code is delimited by #ifdef CONFIG_GEN_GC.
So to understand what the differences are between the regular GC and
generational, one just has to read those sections dependent on that
preprocessor symbol.
-3.4.2 Representation of Generations
+3.5.2 Representation of Generations
Generational garbage collectors are typically copying collectors. In a copying
system, objects can be segregated into generations by their physical location.
@@ -706,14 +740,30 @@ generation.
There are only three generation values: -1, 0 and 1. Generation 0 indicate a
"baby" object. The value 1 indicates a mature object. A freshly allocated
-object is put into generation 0. The generation value -1 is used to mark baby
-objects which have been stored into the checkobj array, or generation 1
-objects which have been placed into the mutobj array, to prevent them
-from being added to these arrays twice. These arrays have to do with
-backreferences from old to young objects, and are described a few paragraphs
-below.
-
-3.4.3 Basic Algorithm
+object is put into generation 0. The generation value -1 has three different
+meanings:
+ 1. It is used to mark baby (generation 0) objects which have been stored into
+ the checkobj array, so that they are not placed there twice.
+ 2. It is used to mark mature (generation 1) objects which have been placed
+ into the mutobj array, also so they are not put there twice.
+ 3. It is used during marking to flag reachable objects which should not be
+ placed into, or remain in, generation 1.
+The checkobj and mutobj arrays have to do with handling backreferences from old
+to young objects, and are described a few paragraphs below. Meaning 3 has
+to do with interaction between finalization and generational GC, also described
+below. These different uses of -1 don't interefere because when the checkobj
+and mutobj arrays are processed, which happens early the marking phase,
+those objects are changed from generation -1 to generation 0. (A temporary
+situation, done in the knowledge that all reachable objects in generation 0
+that are processed in the sweep phase will go to generation 1). The
+third meaning of -1 is used in the last phase of marking having to do with
+finalization (the prepare_finals function), which applies this -1 generation
+only to objects that have been left unreachable by all previous marking, and
+are reachable only thorugh the finalizer registration list. After this phase,
+only these special objects can possibly have generation -1.
+
+
+3.5.3 Basic Algorithm
When an object is newly allocated, it is not only assigned generation 0, but is
appended into the freshobj array. This array allows the garbage collector to
@@ -748,7 +798,7 @@ GC saves time by avoiding doing full marking (terminating whenever it meets a
generation 1 object) and avoiding a full sweep (processing only the freshobj
array).
-3.4.4 Handling Backpointers
+3.5.4 Handling Backpointers
Under generational GC, there is the problem that objects in generation 1 can
be destructively changed (mutated) so that they point to baby objects. This is
@@ -825,6 +875,36 @@ of now pointing to some babies via its newly assigned elements. The generational
garbage collection pass then naturally deals with visiting the individual the
elements to mark those which are babies.
+
+3.5.5 Generational GC and Finalization
+
+There is an interaction between generational GC and TXR's finalization
+support. When objects registered for finalization are processed, they
+and their functions have to be reinstanted as reachable objects, so that the
+finalization handlers can safely execute. Under plain mark-and-sweep GC,
+these objects will be collected in the next garbage collection pass, since
+they will be found to be unreachable again (unless their finalization handler
+reinstated them into the reachability graph!) and this time they will not
+registered for finalization any more. Under generational GC, objects which
+are marked as reachable, however, pass into the mature generation.
+If this happens for objects which are being finalized, a silly situation
+occurs in which objects known to be unreachable, and which are only
+temporary made reachable for finalization, are being promoted to, or
+retained in, the mature generation, so that they won't be reclaimed until the
+next full garbage collection pass. To prevent this silly situation, objects
+which are marked as reachable during finalization processing are assigned
+to generation -1. During a generational GC, all such objects are generation
+0 objects, but during a full GC, this could include generation 1 objects
+also. Then, during the sweep phase, objects which carry this flag are assigned
+to generation 0 and are placed into the freshobj array nursery, as if they were
+just freshly allocated. Doing this is safe even for objects that were
+previously generation 1, because since the objects had just been found to be
+unreachable, this means that no references to them exist from any other live
+object, and that implies that no backreferences exist to them from the mature
+generation. (This takes place before the finalization handlers are called
+whcih could introduce such backreferences.)
+
+
4. Debugging
4.1. Using gdb
@@ -992,7 +1072,26 @@ dynamic objects are registered with the garbage collector and are precisely
traced. What isn't precisely traced is the call stack and machine context.
-4.4. Valgrind: Your Friend
+4.4 Object Breakpoint
+
+If TXR is compiled with -DEXTRA_DEBUGGING=1 then two global symbols are defined
+which make it possible to catch GC-related traversals of a particular object.
+
+When compiled with EXTRA_DEBUGGING defined as 1, TXR provides a function called
+breakpt. The purpose of this function is to serve as a target of a debugger
+breakpoint; when interesting situations happen, TXR calls that function.
+
+An EXTRA_DEBUGGING build also provides a global variable called break_obj, of
+type val. This normally holds the value nil, which is uninteresting. The
+variable can be set, from within the debugger, to hold an arbitrary object.
+When that object is visited during garbage collection and in certain other
+interesting situations such as hash table weak processing, the breakpt
+function is called. Using this object breakpoint feature, it is possible
+to investigate various issues, such as spurious retention: how is a particular
+object being reached.
+
+
+4.5 Valgrind: Your Friend
To get the most out running txr under valgrind, build it with valgrind support.
Of course, you have to have the valgrind development stuff installed (so
diff --git a/gc.c b/gc.c
index 2e95eb80..bb30a6d6 100644
--- a/gc.c
+++ b/gc.c
@@ -83,6 +83,13 @@ alloc_bytes_t opt_gc_delta = DFL_MALLOC_DELTA_THRESH;
int gc_enabled = 1;
+static struct fin_reg {
+ struct fin_reg *next;
+ val obj;
+ val fun;
+ type_t obj_type;
+} *final_list, **final_tail = &final_list;
+
#if CONFIG_GEN_GC
static val checkobj[CHECKOBJ_VEC_SIZE];
static int checkobj_idx;
@@ -91,6 +98,7 @@ static int mutobj_idx;
static val freshobj[FRESHOBJ_VEC_SIZE];
static int freshobj_idx;
int full_gc;
+static int mark_makefresh;
#endif
#if EXTRA_DEBUGGING
@@ -284,6 +292,13 @@ tail_call:
if ((t & FREE) != 0)
abort();
+#if CONFIG_GEN_GC
+ if (mark_makefresh)
+ obj->t.gen = -1; /* Will be put into freshobj by sweep_one */
+ else if (obj->t.gen == -1)
+ obj->t.gen = 0; /* Will be promoted to generation 1 by sweep_one */
+#endif
+
obj->t.type = convert(type_t, t | REACHABLE);
#if EXTRA_DEBUGGING
@@ -448,6 +463,11 @@ static int sweep_one(obj_t *block)
const int vg_dbg = 0;
#endif
+#if EXTRA_DEBUGGING
+ if (block == break_obj)
+ breakpt();
+#endif
+
#if CONFIG_GEN_GC
if (!full_gc && block->t.gen > 0)
abort();
@@ -457,10 +477,20 @@ static int sweep_one(obj_t *block)
abort();
if (block->t.type & REACHABLE) {
- block->t.type = convert(type_t, block->t.type & ~REACHABLE);
#if CONFIG_GEN_GC
- block->t.gen = 1;
+ if (block->t.gen == -1) {
+ block->t.gen = 0;
+ if (freshobj_idx < FRESHOBJ_VEC_SIZE)
+ freshobj[freshobj_idx++] = block;
+ /* If freshobj is full, it doesn't matter the next make_obj
+ call will find this situation and set the full_gc flag,
+ and the subsequent full_gc will take care of all
+ these objects. */
+ } else {
+ block->t.gen = 1;
+ }
#endif
+ block->t.type = convert(type_t, block->t.type & ~REACHABLE);
return 0;
}
@@ -519,9 +549,13 @@ static int_ptr_t sweep(void)
#if CONFIG_GEN_GC
if (!full_gc) {
int i;
+ int limit = freshobj_idx;
+
+ freshobj_idx = 0; /* sweep_one can put NOPROMOTE objects into freshobj */
+
/* No need to mark block defined via Valgrind API; everything
in the freshobj is an allocated node! */
- for (i = 0; i < freshobj_idx; i++)
+ for (i = 0; i < limit; i++)
free_count += sweep_one(freshobj[i]);
/* Generation 1 objects that were indicated for dangerous
@@ -532,6 +566,8 @@ static int_ptr_t sweep(void)
return free_count;
}
+
+ freshobj_idx = 0;
#endif
for (heap = heap_list; heap != 0; heap = heap->next) {
@@ -553,6 +589,58 @@ static int_ptr_t sweep(void)
return free_count;
}
+static void prepare_finals(void)
+{
+ struct fin_reg *f;
+
+ if (!final_list)
+ return;
+
+#if CONFIG_GEN_GC
+ mark_makefresh = 1;
+#endif
+
+ for (f = final_list; f; f = f->next)
+ f->obj_type = f->obj->t.type;
+
+ for (f = final_list; f; f = f->next) {
+ mark_obj(f->obj);
+ mark_obj(f->fun);
+ }
+}
+
+static void call_finals(void)
+{
+ struct fin_reg *new_list = 0, *old_list = final_list;
+ struct fin_reg **tail = &new_list, *f, *next;
+
+ if (!final_list)
+ return;
+
+ final_list = 0;
+ final_tail = &final_list;
+
+#if CONFIG_GEN_GC
+ mark_makefresh = 0;
+#endif
+
+ for (f = old_list; f; f = next) {
+ next = f->next;
+
+ if ((f->obj_type & REACHABLE) != 0) {
+ funcall1(f->fun, f->obj);
+ free(f);
+ } else {
+ *tail = f;
+ tail = &f->next;
+ }
+ }
+
+ *tail = 0;
+ *final_tail = new_list;
+ final_tail = tail;
+}
+
void gc(void)
{
val gc_stack_top = nil;
@@ -570,6 +658,7 @@ void gc(void)
gc_enabled = 0;
mark(&mc, &gc_stack_top);
hash_process_weak();
+ prepare_finals();
swept = sweep();
#if CONFIG_GEN_GC
#if 0
@@ -590,9 +679,9 @@ void gc(void)
#if CONFIG_GEN_GC
checkobj_idx = 0;
mutobj_idx = 0;
- freshobj_idx = 0;
full_gc = full_gc_next_time;
#endif
+ call_finals();
gc_enabled = 1;
}
}
@@ -686,10 +775,27 @@ static val gc_wrap(void)
return nil;
}
+static val gc_finalize(val obj, val fun)
+{
+ type_check(fun, FUN);
+
+ if (is_ptr(obj)) {
+ struct fin_reg *f = coerce(struct fin_reg *, chk_malloc(sizeof *f));
+ f->obj = obj;
+ f->fun = fun;
+ f->obj_type = NIL;
+ f->next = 0;
+ *final_tail = f;
+ final_tail = &f->next;
+ }
+ return obj;
+}
+
void gc_late_init(void)
{
reg_fun(intern(lit("gc"), system_package), func_n0(gc_wrap));
reg_fun(intern(lit("gc-set-delta"), system_package), func_n1(gc_set_delta));
+ reg_fun(intern(lit("finalize"), user_package), func_n2(gc_finalize));
}
/*
diff --git a/txr.1 b/txr.1
index c2742885..5b19b877 100644
--- a/txr.1
+++ b/txr.1
@@ -25501,6 +25501,62 @@ of memory. It is for this reason that the garbage collector uses the GC delta.
There is a default GC delta of 64 megabytes. This may be overridden in
special builds of \*(TX for small systems.
+.coNP Function @ finalize
+.synb
+.mets (finalize < object << function )
+.syne
+
+.desc
+The
+.code finalize
+function registers
+.meta function
+to be invoked in the situation when
+.meta object
+is identified by the garbage collector as unreachable.
+This function is called a finalizer.
+
+If and when this situation occurs, the finalizer
+.meta function
+will be called with
+.meta object
+as its only argument.
+
+Finalizers are called in the same order in which they are registered:
+newer registrations are called after older registrations.
+
+If
+.meta object
+is registered multiple times by multiple calls to
+.codn finalize ,
+then if those finalizers are called, they are all called, in the order
+of registration.
+
+After a finalization call takes place, its registration is removed;
+.meta object
+and
+.meta function
+are no longer associated. However, neither
+.meta object
+nor
+.meta function
+are reclaimed immediately; they are treated as if they were reachable objects
+until at least the next garbage collection pass.
+It is therefore safe for
+.meta function
+to store somewhere a persistent reference to
+.meta object
+or to itself, thereby reinstanting these objects as reachable.
+
+.meta function
+is itself permitted to call
+.code finalize
+to register the original
+.code object
+or any other object for finalization. Such registrations made during
+finalization execution are not eligible for the current phase of finalization
+processing; they will be processed in a later garbage collection pass.
+
.SS* Modularization
.coNP Special variable @ *self-path*
.desc