Fall 2005 Programming Language Seminar:
Notes from 05 December

[ Washington University in St Louis ]
Department of Computer Science

Seminar main page Discussion notes Assignment Links

Notes:

This week's discussion led by Morgan Deters.

Tool notes

Some tools I used in class this week are very important. We've seen them a bit before but haven't really discussed them. If you are unfamiliar with readelf and objdump, you should skim their manual pages to see what they do. They are good things to have in your toolbox. I also used the c++filt tool during class, which allows you to understand the C++ and Java-mangled symbols you get from these low-level tools. As always, familiarity with the GNU debugger gdb (which I also use regularly in class) is critical as well.

Profiling support in glibc

Some questions arose last week about how profiling actually operates under the hood. We return to it now.

When compiling with the -p or -pg options, gcc instruments your code to count how many times functions execute and how long they take. First, remember how the profiling process works (profile_test.c available here):

[mdeters@limbo gcc-4.0.1-tests]$ gcc -pg -o profile_test profile_test.c
[mdeters@limbo gcc-4.0.1-tests]$ ./profile_test
0
5000
10000
15000
20000
...
4990000
4995000
done
total == 1642668640
[mdeters@limbo gcc-4.0.1-tests]$ gprof profile_test
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 94.00      1.41     1.41  5000000   282.00   282.00  foo
  6.00      1.50     0.09                             main

...

In the profile_test program, main() calls foo() five million times. This is reflected in the gprof output. How does it do this counting?

Let's disassemble the profile_test binary:

[mdeters@limbo gcc-4.0.1-tests]$ objdump -d profile_test
...

08048504 :
 8048504:       55                      push   %ebp
 8048505:       89 e5                   mov    %esp,%ebp
 8048507:       53                      push   %ebx
 8048508:       83 ec 14                sub    $0x14,%esp
 804850b:       e8 74 fe ff ff          call   8048384 

...

08048568 
: 8048568: 55 push %ebp 8048569: 89 e5 mov %esp,%ebp 804856b: 83 ec 18 sub $0x18,%esp 804856e: e8 11 fe ff ff call 8048384 ...

I'm cutting out a lot of the output here, but notice these calls to the mcount() function in the prologues of both main() and foo(). These don't exist when we aren't compiling with profiling instrumentation! So we've figured out how gcc does function invocation counting, but let's dig into glibc source just to make sure we understand it.

(I'll refer to glibc 2.3.3 sources here.)

mcount() has a platform-independent part to actually increment the counts. However, let's look at its definition (in gmon/mcount.c, line 59 -- remember we're in the glibc source tree, not the gcc one):

_MCOUNT_DECL(frompc, selfpc)    /* _mcount; may be static, inline, etc */

Its actual definition will depend on the platform (the system-dependent headers define _MCOUNT_DECL). For i386, we have these from sysdeps/i386/machine-gmon.h:

extern void mcount_internal (u_long frompc, u_long selfpc) internal_function;

#define _MCOUNT_DECL(frompc, selfpc) \
void internal_function mcount_internal (u_long frompc, u_long selfpc)

Okay, so that means that the platform-independent part of mcount() is actually called mcount_internal() (though really it's called __mcount_internal(), with leading underscores) and is defined with this internal_function thing. This internal_function thing is defined elsewhere and just sets up gcc to pass parameters in registers rather than on the stack.

So we still haven't found mcount(). Well, it turns out it's hand-coded in x86 assembly. You can find it in sysdeps/i386/i386-mcount.S. Why is this part platform-specific, and why does it have to be coded in assembly? To understand this, note that the part written in C, mcount_internal(), takes two arguments: a selfpc (the function that called mcount()) and a frompc (the function that called that function). Normally this information isn't available to C. However, return addresses are on the stack at well-known locations, so we can get to them from assembly. All mcount() does is set up the call to mcount_internal().

First, the mcount() assembly stub saves some registers that will get clobbered (it later restores them, as you can see). It copies its own return address from the stack into %edx (that will be selfpc---remember mcount_internal() expects its arguments in registers, not on the stack!), and copies the previous stack frame's return address into %eax (that's frompc). Then it calls mcount_internal().

mcount_internal() figures out which functions its arguments point into and updates counts in a call graph of the program. The gprof program will tell you not only how many times a function was called, but how many times it was called from each function that called it. That's why the frompc argument is there.

What about timing, though? So far we've only seen counts.

Timing is handled indirectly. GCC could instrument each function with a timer, starting it on entry, recording it on exit, and pausing it on calls to other functions. But this would add a lot of code and runtime overhead. Instead, GCC tells glibc to set up a regular profiling alarm. This is done in sysdeps/posix/sprofil.c at line 351 (note we're still in the glibc 2.3.3 sources here). (You can look up the manual page of setitimer() yourself. It's useful for user applications as well.)

In this case, a profiling timer is set. This timer counts down when the process is in both user- and kernel-space and delivers a SIGPROF signal to the process when it expires. (The granularity of this timer depends on what the operating system is willing to provide.)

When SIGPROF is delivered, profil_count() (sysdeps/posix/sprofil.c line 108) will be run and passed the process's program counter at the point of interruption. profil_count() notes what function was executed when the SIGPROF was delivered, and resumes the process.

The end result is a number of profiling samples across your profiled program. For each function, you know how many times the profiling timer went off while it was executing. This gives you a good idea (though imprecise due to sampling error) of where time is spent in your program.

Okay, we're done with the glibc sources for now. Back to gcc.

Boehm collector

The Boehm garbage collector, designed and written by Hans-J. Böhm, is a conservative garbage collector designed to operate in hostile environments. Conservative here is a technical term meaning that the collector cannot distinguish between pointer and non-pointer data in general. The Boehm collector is designed to operate as a general malloc() replacement in C (imagine!--just malloc() and let the system do the free() for you!). Certainly this environment is "hostile" to garbage collection, and certainly the collector cannot in general distinguish a pointer from a non-pointer (both just being bit-patterns somewhere in memory).

However, in GCJ-compiled Java code, we do know where pointers are and where they aren't. Casting between arithmetic types and pointers (object references) is not permitted in Java, and neither is pointer arithmetic. In this constrained environment, there's no way for an application to hide a pointer from the collector, and, further, heap layout information can be made available at runtime. The collector can use this information---and when running on behalf of GCJ-compiled code, the Boehm collector is not actually conservative.

A full description of the Boehm collector is beyond the scope of these notes. I recommend you check out the boehm-gc/doc directory in the GCC source tree for more information about the collector and its uses. But this brings us to an important point---a version of the Boehm collector sources are available right there in the GCC source tree for us to browse.

GCJ needs to interact with the Boehm collector at runtime. Let's trace what happens. When we covered Runtime Support I, we saw that the GCJ compiler driver produces a main() stub in response to the --main compiler option---a short main() routine, in C, that calls _Jv_RunMain() from libgcj. Peeking at _Jv_RunMain() (remember it's in libjava/prims.cc, we see that it calls _Jv_CreateJava_VM() (also in that file). This function, among other things, calls _Jv_InitGC().

You'll find _Jv_InitGC() in libjava/boehm.cc. This function performs a number of initialization duties for the Boehm collector when it runs for GCJ-compiled code. First, it turns off the collector's feature that detects interior pointers---those pointers that point into the middle, rather than the beginning, of objects (this is very common in C when e.g. traversing an array). It registers the _Jv_Mark_Obj() function as procedure #0 (more on that later!) and sets up a new kind of object, the Java array, that requires special treatment. Java arrays are associated to the _Jv_MarkArray() routine rather than _Jv_MarkObj().

This is probably a good time to mention that the Boehm collector is a mark-&-sweep collector. Such collectors operate in two phases. First, the mark phase finds all objects in the heap that are reachable, through pointers, from the stack, machine registers, and static process data (global variables). All of these objects are marked (a bit is set in their header). Second, the sweep phase scans the entire heap, reclaiming all unmarked objects.

To ensure that the application cannot hide some pointers during the mark phase, the Boehm collector (in its default configuration) stops all application threads while marking. However, the sweep phase proceeds incrementally, as memory is needed. (This reduces unnecessary swapping that you otherwise would get when scanning the entire heap.)

Okay, so how is all of this hooked up, then? I mean, let's say I allocate some storage in heap. How does the collector know that that storage constitutes an object?

The allocation functions used are the collector's. Remember I mentioned above that the Boehm collector can be used as a malloc() replacement. So the GCJ runtime just uses the Boehm collector's memory allocation routines. The collector makes sure there's space for the mark bit and other information it needs to deal with the chunk of allocated memory. Look in libjava/prims.cc, line 425. You'll see a number of different allocation routines. Anytime you write new in a Java program, GCJ will translate it into a call to one of these allocation routines (and a constructor invocation to initialize the object). These functions make use of the lower-level routines _Jv_AllocObj(), _Jv_AllocPtrFreeObj(), and _Jv_AllocArray(), all of which can be found in libjava/boehm.cc. These routines, in turn, use GC_GCJ_MALLOC() and similar---the Boehm collector's allocation routines---to get the storage. Note that _Jv_AllocArray() tells the collector that it's allocating an array---we set up the array object kind in _Jv_InitGC() and so we have to tell the collector that this is one such object that we want treated specially.

Okay, so then, during marking, how does the collector avoid conservatism---how does it know what is a pointer and what is not?

The collector has the help of the compiler and the runtime. During Runtime Support I, we saw that there was a "GC Descriptor" stowed away in each type's vtable. Well that's the part that tells the collector what's a pointer!

In its simplest form, the GC descriptor for a type is a bitmap indicating which object slots are references---a "1" for a reference, a "0" for a non-reference. However, the type might be too large for such a representation, so it can also be a count of the reference fields, if they all occur in objects of the type before non-reference fields. But they might not, so in general, you can also indicate a function for the collector to call so you can mark your object yourself. You specify it by a simple index---for example, by specifying that it's procedure #0. That's what the _Jv_MarkObj() routine is, and that's why we registered it as procedure #0 in _Jv_InitGC().

This descriptor is built in two places---once in GCC's Java front-end (for compiled classes) and once in the runtime (for dynamically-loaded class files). These implementations are, respectively, get_boehm_type_descriptor() (in gcc/java/boehm.c) and _Jv_BuildGCDescr() (in libjava/boehm.cc).

Let's look at get_boehm_type_descriptor(). Several checks are performed to see if a bitmap is possible (line 215), or, if not, if a count is possible as a descriptor (line 209). Finally, at line 226, it is arranged for _Jv_MarkObj() to be called to mark the object.

This descriptor is stuffed into each type's vtable right after the class pointer. Look at line 1467 of gcc/java/class.c. At this point, all the methods are in the vtable already (as a list). Line 1468 prepends the calculated GC descriptor to the vtable list. Line 1471 then prepends the class pointer to the resulting list. Some more stuff is prepended (corresponding to what we studied about vtables before), and the final product is the finished vtable (in tree form).

Okay, so now the GC descriptor is available at runtime to the collector (via the object's vtable), and the collector can figure out what to do. But what if it calls _Jv_MarkObj()? How do we write such a function?

Look at libjava/boehm.cc line 67. This is the _Jv_MarkObj() function. Now skip ahead to line 309. This is the general case. (We're skipping the marking case for instances of java.lang.Class, which are treated as a special case. That part of the code makes sense once you understand what's to follow.) What we have here is an inner loop that visits all fields of the object, marking ones that are reference fields. The outer loop ensures that we look at the reference fields in our superclass too, as well as those of its superclass, etc. The _Jv_MarkArray() is even easier---mark the associated java.lang.Class, then mark all its slots! (Note that _Jv_AllocArray(), which indicates that the object is of the special "array" type, only applies to arrays of reference type---other arrays are allocated as pointer-free objects by _Jv_NewPrimArray() in libjava/prims.cc.

There is "another" garbage collector (though not really) supported by GCJ. This is the "nogc" collector, and is a garbage collector interface for GCJ that does nothing. No collection, nada, zip. So what's its use? Well, maybe you have an irrational fear of garbage collection and can't bring yourself to use it. Or, more likely, you're writing your own garbage collector and want a simple example of how to implement GCJ's interface (see libjava/nogc.cc). The "nogc" "collector" can be selected via the --enable-java-gc=no configure option.

crt stuff

Ever notice how when you compile your programs with gcc -v, you get a lot of extra stuff in the link?

[mdeters@limbo gcc-4.0.1-tests]$ gcc -v -o test test.c
[...]
 /export/home/mdeters/gcc-4.0.1-prefix/libexec/gcc/i686-pc-linux-gnu/4.0.1/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o test /usr/lib/crt1.o /usr/lib/crti.o /export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1/crtbegin.o -L/export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1 -L/export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1/../../.. /tmp/ccUIfP9z.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1/crtend.o /usr/lib/crtn.o
[mdeters@limbo gcc-4.0.1-tests]$

What is all that crap? Pointing to the dynamic linker makes sense, linking in gcc support libraries (as well as the standard C library) makes sense, and the /tmp thing is my source program after compilation and assembly, but crt1, crti, crtbegin... that isn't my code!

Well, it may not be, but it's vital. crt1.o, crti.o, and crtn.o (and some other, special varieties of them) are provided by glibc. crt1.o is always the first thing in the .text section of the output ELF file (the old pre-ELF file format used a crt0.o file, which you might still see around sometimes but I'm not going to discuss further here; the basic purpose is the same). If you disassemble crt1.o you'll find that it has a function called _start which will call into the C library, which (after some initialization) in turn calls your main() function. Ever wondered what happens before your main() function is called? Now you know.

crti.o and crtn.o come first and last for a reason. They contribute infrastructure to the .init and .fini sections of your application---that stuff that happens during start-up and tear-down of your application. If you disassemble crti.o (e.g. with objdump -d), you'll notice it only contains partial definitions of functions---functions that simply trail off and don't return! This is okay, it turns out, because we'll finish them with crtn.o later (disassemble that one to see the finishing bits of the functions). Meanwhile, the .init and .fini sections are open for contributions from other parts of the link. And in particular, crtbegin.o and crtend.o have something to add to them.

crtbegin.o and crtend.o (and some differing varieties of them) are provided by the compiler---remember the other ones came from glibc. crtbegin.o contributes a call to .init to a frame_dummy function (which registers some Java classes, if any, and performs other runtime initialization duties) and schedules .fini to call static data destructors. crtend.o schedules .init to call static data constructors. This is how static data (global variables) get initialized before your main() executes. No magic involved, just more code supplied by the compiler for your convenience!

The source for all the crtbegin.o and crtend.o is in gcc/crtstuff.c. When the compiler is built, this file is compiled multiple times with different preprocessor definitions to produce all the different crt object files necessary.


Assignment:

Continue with final project. Presentations next week.


Links:


Morgan Deters / About me / OpenPGP Public Key / 02 Jan 2006

This page is certified valid HTML 4.01!  This page refers to certified valid CSS!  I support the AnyBrowser campaign