|
Fall 2005 Programming Language Seminar: |
|
| Seminar main page | Discussion notes | Assignment | Links |
This week's discussion led by Morgan Deters.
Some tools I used in class this week are very important. We've seen
them a bit before but haven't really discussed them. If you are
unfamiliar with readelf and objdump, you
should skim their manual pages to see what they do. They are good
things to have in your toolbox. I also used the c++filt
tool during class, which allows you to understand the C++ and Java-mangled
symbols you get from these low-level tools. As always, familiarity with
the GNU debugger gdb (which I also use regularly in class)
is critical as well.
Some questions arose last week about how profiling actually operates under the hood. We return to it now.
When compiling with the -p or -pg options,
gcc instruments your code to count how many times functions execute
and how long they take. First, remember how the profiling process
works (profile_test.c available here):
[mdeters@limbo gcc-4.0.1-tests]$ gcc -pg -o profile_test profile_test.c [mdeters@limbo gcc-4.0.1-tests]$ ./profile_test 0 5000 10000 15000 20000 ... 4990000 4995000 done total == 1642668640 [mdeters@limbo gcc-4.0.1-tests]$ gprof profile_test Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 94.00 1.41 1.41 5000000 282.00 282.00 foo 6.00 1.50 0.09 main ...
In the profile_test program, main() calls
foo() five million times. This is reflected in the gprof
output. How does it do this counting?
Let's disassemble the profile_test binary:
[mdeters@limbo gcc-4.0.1-tests]$ objdump -d profile_test ... 08048504: 8048504: 55 push %ebp 8048505: 89 e5 mov %esp,%ebp 8048507: 53 push %ebx 8048508: 83 ec 14 sub $0x14,%esp 804850b: e8 74 fe ff ff call 8048384 ... 08048568 : 8048568: 55 push %ebp 8048569: 89 e5 mov %esp,%ebp 804856b: 83 ec 18 sub $0x18,%esp 804856e: e8 11 fe ff ff call 8048384 ...
I'm cutting out a lot of the output here, but notice these calls to
the mcount() function in the prologues of both
main() and foo(). These don't exist when we
aren't compiling with profiling instrumentation! So we've figured out
how gcc does function invocation counting, but let's dig into glibc
source just to make sure we understand it.
(I'll refer to glibc 2.3.3 sources here.)
mcount() has a platform-independent part to actually
increment the counts. However, let's look at its definition (in
gmon/mcount.c, line 59 -- remember we're in the
glibc source tree, not the gcc one):
_MCOUNT_DECL(frompc, selfpc) /* _mcount; may be static, inline, etc */
Its actual definition will depend on the platform (the
system-dependent headers define _MCOUNT_DECL). For i386, we have
these from sysdeps/i386/machine-gmon.h:
extern void mcount_internal (u_long frompc, u_long selfpc) internal_function; #define _MCOUNT_DECL(frompc, selfpc) \ void internal_function mcount_internal (u_long frompc, u_long selfpc)
Okay, so that means that the platform-independent part of
mcount() is actually called
mcount_internal() (though really it's called
__mcount_internal(), with leading underscores) and is
defined with this internal_function thing. This
internal_function thing is defined elsewhere and just
sets up gcc to pass parameters in registers rather than on the stack.
So we still haven't found mcount(). Well, it
turns out it's hand-coded in x86 assembly. You can find it in
sysdeps/i386/i386-mcount.S. Why is this part
platform-specific, and why does it have to be coded in assembly?
To understand this, note that the part written in C,
mcount_internal(), takes two arguments: a
selfpc (the function that called mcount())
and a frompc (the function that called that function).
Normally this information isn't available to C. However, return
addresses are on the stack at well-known locations, so we can get to
them from assembly. All mcount() does is set up the
call to mcount_internal().
First, the mcount() assembly stub saves some registers
that will get clobbered (it later restores them, as you can see).
It copies its own return address from the stack into %edx
(that will be selfpc---remember
mcount_internal() expects its arguments in registers, not
on the stack!), and copies the previous stack frame's return address
into %eax (that's frompc). Then it calls
mcount_internal().
mcount_internal() figures out which functions its
arguments point into and updates counts in a call graph of the
program. The gprof program will tell you not only how many
times a function was called, but how many times it was called from
each function that called it. That's why the frompc
argument is there.
What about timing, though? So far we've only seen counts.
Timing is handled indirectly. GCC could instrument each function
with a timer, starting it on entry, recording it on exit, and pausing
it on calls to other functions. But this would add a lot of code and
runtime overhead. Instead, GCC tells glibc to set up a regular
profiling alarm. This is done in sysdeps/posix/sprofil.c
at line 351 (note we're still in the glibc 2.3.3 sources here).
(You can look up the manual page of setitimer() yourself.
It's useful for user applications as well.)
In this case, a profiling timer is set. This timer counts down when the process is in both user- and kernel-space and delivers a SIGPROF signal to the process when it expires. (The granularity of this timer depends on what the operating system is willing to provide.)
When SIGPROF is delivered, profil_count()
(sysdeps/posix/sprofil.c line 108) will be run and
passed the process's program counter at the point of interruption.
profil_count() notes what function was executed when the
SIGPROF was delivered, and resumes the process.
The end result is a number of profiling samples across your profiled program. For each function, you know how many times the profiling timer went off while it was executing. This gives you a good idea (though imprecise due to sampling error) of where time is spent in your program.
Okay, we're done with the glibc sources for now. Back to gcc.
The Boehm garbage collector, designed and written by
Hans-J. Böhm, is a conservative garbage collector
designed to operate in hostile environments. Conservative here
is a technical term meaning that the collector cannot distinguish
between pointer and non-pointer data in general. The Boehm collector
is designed to operate as a general malloc() replacement
in C (imagine!--just malloc() and let the system do the
free() for you!). Certainly this environment is
"hostile" to garbage collection, and certainly the collector cannot in
general distinguish a pointer from a non-pointer (both just being
bit-patterns somewhere in memory).
However, in GCJ-compiled Java code, we do know where pointers are and where they aren't. Casting between arithmetic types and pointers (object references) is not permitted in Java, and neither is pointer arithmetic. In this constrained environment, there's no way for an application to hide a pointer from the collector, and, further, heap layout information can be made available at runtime. The collector can use this information---and when running on behalf of GCJ-compiled code, the Boehm collector is not actually conservative.
A full description of the Boehm collector is beyond the scope of these
notes. I recommend you check out the boehm-gc/doc
directory in the GCC source tree for more information about the
collector and its uses. But this brings us to an important point---a
version of the Boehm collector sources are available right there in
the GCC source tree for us to browse.
GCJ needs to interact with the Boehm collector at runtime. Let's
trace what happens. When we covered
Runtime Support I, we saw that
the GCJ compiler driver produces a main() stub in
response to the --main compiler option---a short
main() routine, in C, that calls
_Jv_RunMain() from libgcj. Peeking at
_Jv_RunMain() (remember it's in
libjava/prims.cc, we see that it calls
_Jv_CreateJava_VM() (also in that file). This function,
among other things, calls _Jv_InitGC().
You'll find _Jv_InitGC() in
libjava/boehm.cc. This function performs a number of
initialization duties for the Boehm collector when it runs for
GCJ-compiled code. First, it turns off the collector's feature that
detects interior pointers---those pointers that point into the
middle, rather than the beginning, of objects (this is very
common in C when e.g. traversing an array). It registers the
_Jv_Mark_Obj() function as procedure #0 (more on
that later!) and sets up a new kind of object, the Java array, that
requires special treatment. Java arrays are associated to the
_Jv_MarkArray() routine rather than
_Jv_MarkObj().
This is probably a good time to mention that the Boehm collector is a mark-&-sweep collector. Such collectors operate in two phases. First, the mark phase finds all objects in the heap that are reachable, through pointers, from the stack, machine registers, and static process data (global variables). All of these objects are marked (a bit is set in their header). Second, the sweep phase scans the entire heap, reclaiming all unmarked objects.
To ensure that the application cannot hide some pointers during the mark phase, the Boehm collector (in its default configuration) stops all application threads while marking. However, the sweep phase proceeds incrementally, as memory is needed. (This reduces unnecessary swapping that you otherwise would get when scanning the entire heap.)
Okay, so how is all of this hooked up, then? I mean, let's say I allocate some storage in heap. How does the collector know that that storage constitutes an object?
The allocation functions used are the collector's. Remember I
mentioned above that the Boehm collector can be used as a
malloc() replacement. So the GCJ runtime just uses the
Boehm collector's memory allocation routines. The collector makes
sure there's space for the mark bit and other information it needs to
deal with the chunk of allocated memory. Look in
libjava/prims.cc, line 425. You'll see a number of
different allocation routines. Anytime you write new in
a Java program, GCJ will translate it into a call to one of these
allocation routines (and a constructor invocation to initialize the
object). These functions make use of the lower-level routines
_Jv_AllocObj(), _Jv_AllocPtrFreeObj(), and
_Jv_AllocArray(), all of which can be found in
libjava/boehm.cc. These routines, in turn, use
GC_GCJ_MALLOC() and similar---the Boehm collector's
allocation routines---to get the storage. Note that
_Jv_AllocArray() tells the collector that it's allocating
an array---we set up the array object kind in
_Jv_InitGC() and so we have to tell the collector that
this is one such object that we want treated specially.
Okay, so then, during marking, how does the collector avoid conservatism---how does it know what is a pointer and what is not?
The collector has the help of the compiler and the runtime. During Runtime Support I, we saw that there was a "GC Descriptor" stowed away in each type's vtable. Well that's the part that tells the collector what's a pointer!
In its simplest form, the GC descriptor for a type is a bitmap
indicating which object slots are references---a "1" for a
reference, a "0" for a non-reference. However, the type might be
too large for such a representation, so it can also be a count
of the reference fields, if they all occur in objects of the type
before non-reference fields. But they might not, so in general, you
can also indicate a function for the collector to call so you can mark
your object yourself. You specify it by a simple index---for example,
by specifying that it's procedure #0. That's what the
_Jv_MarkObj() routine is, and that's why we registered it
as procedure #0 in _Jv_InitGC().
This descriptor is built in two places---once in GCC's Java front-end
(for compiled classes) and once in the runtime (for dynamically-loaded
class files). These implementations are, respectively,
get_boehm_type_descriptor() (in
gcc/java/boehm.c) and _Jv_BuildGCDescr() (in
libjava/boehm.cc).
Let's look at get_boehm_type_descriptor(). Several
checks are performed to see if a bitmap is possible (line 215),
or, if not, if a count is possible as a descriptor (line 209).
Finally, at line 226, it is arranged for
_Jv_MarkObj() to be called to mark the object.
This descriptor is stuffed into each type's vtable right after the
class pointer. Look at line 1467 of
gcc/java/class.c. At this point, all the methods are in
the vtable already (as a list). Line 1468 prepends the
calculated GC descriptor to the vtable list. Line 1471 then
prepends the class pointer to the resulting list. Some more
stuff is prepended (corresponding to what we studied about vtables
before), and the final product is the finished vtable (in tree form).
Okay, so now the GC descriptor is available at runtime to the
collector (via the object's vtable), and the collector can figure out
what to do. But what if it calls _Jv_MarkObj()? How do
we write such a function?
Look at libjava/boehm.cc line 67. This is the
_Jv_MarkObj() function. Now skip ahead to line 309.
This is the general case. (We're skipping the marking case for
instances of java.lang.Class, which are treated as a
special case. That part of the code makes sense once you understand
what's to follow.) What we have here is an inner loop that visits all
fields of the object, marking ones that are reference fields. The
outer loop ensures that we look at the reference fields in our
superclass too, as well as those of its superclass, etc. The
_Jv_MarkArray() is even easier---mark the associated
java.lang.Class, then mark all its slots! (Note that
_Jv_AllocArray(), which indicates that the object is of
the special "array" type, only applies to arrays of reference
type---other arrays are allocated as pointer-free objects by
_Jv_NewPrimArray() in libjava/prims.cc.
There is "another" garbage collector (though not really) supported by
GCJ. This is the "nogc" collector, and is a garbage collector
interface for GCJ that does nothing. No collection, nada, zip. So
what's its use? Well, maybe you have an irrational fear of garbage
collection and can't bring yourself to use it. Or, more likely,
you're writing your own garbage collector and want a simple
example of how to implement GCJ's interface (see
libjava/nogc.cc). The "nogc" "collector" can be selected
via the --enable-java-gc=no configure option.
Ever notice how when you compile your programs with
gcc -v, you get a lot of extra stuff in the link?
[mdeters@limbo gcc-4.0.1-tests]$ gcc -v -o test test.c [...] /export/home/mdeters/gcc-4.0.1-prefix/libexec/gcc/i686-pc-linux-gnu/4.0.1/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o test /usr/lib/crt1.o /usr/lib/crti.o /export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1/crtbegin.o -L/export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1 -L/export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1/../../.. /tmp/ccUIfP9z.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /export/home/mdeters/gcc-4.0.1-prefix/lib/gcc/i686-pc-linux-gnu/4.0.1/crtend.o /usr/lib/crtn.o [mdeters@limbo gcc-4.0.1-tests]$
What is all that crap? Pointing to the dynamic linker makes
sense, linking in gcc support libraries (as well as the standard
C library) makes sense, and the /tmp thing is my source program after
compilation and assembly, but crt1, crti,
crtbegin... that isn't my code!
Well, it may not be, but it's vital. crt1.o,
crti.o, and crtn.o (and some other, special
varieties of them) are provided by glibc. crt1.o is
always the first thing in the .text section of the output ELF file (the
old pre-ELF file format used a crt0.o file, which you might
still see around sometimes but I'm not going to discuss further
here; the basic purpose is the same). If you disassemble
crt1.o you'll find that it has a function called
_start which will call into the C library, which (after
some initialization) in turn calls your main() function.
Ever wondered what happens before your main() function is
called? Now you know.
crti.o and crtn.o come first and last for a
reason. They contribute infrastructure to the .init and .fini
sections of your application---that stuff that happens during start-up
and tear-down of your application. If you disassemble
crti.o (e.g. with objdump -d), you'll
notice it only contains partial definitions of
functions---functions that simply trail off and don't return! This is
okay, it turns out, because we'll finish them with crtn.o
later (disassemble that one to see the finishing bits of the functions).
Meanwhile, the .init and .fini sections are open for contributions
from other parts of the link. And in particular,
crtbegin.o and crtend.o have something to
add to them.
crtbegin.o and crtend.o (and some differing
varieties of them) are provided by the compiler---remember the
other ones came from glibc. crtbegin.o contributes a
call to .init to a frame_dummy function (which registers
some Java classes, if any, and performs other runtime initialization
duties) and schedules .fini to call static data destructors.
crtend.o schedules .init to call static data
constructors. This is how static data (global variables) get
initialized before your main() executes. No magic
involved, just more code supplied by the compiler for your
convenience!
The source for all the crtbegin.o and
crtend.o is in gcc/crtstuff.c. When the
compiler is built, this file is compiled multiple times with different
preprocessor definitions to produce all the different crt object files
necessary.
Continue with final project. Presentations next week.