next up previous contents
Next: Input Parameters Up: Continuous Compilation for Software Previous: Simulation of Real Programs

 

Chapter 5
ProGenitor

In the course of analyzing the performance of real programs with our simulator, we decided that it would be advantageous to modify various characteristics of the programs in order to determine how strongly each characteristic influences performance. Neither making extensive modifications to the actual programs nor writing test programs from scratch seemed feasible. Instead, we developed the ProGenitor program.

The ProGenitor program is a pseudo-program generator. It takes as input the desired characteristics of a program and generates the setup and behavior files required by our continuous compilation simulator. Using ProGenitor we were able to examine the performance of programs fitting the characteristics we desired without the undue expense of creating each program or searching for existing programs exhibiting some particular trait.

The design of ProGenitor was necessarily driven by the use for which it was created. In particular, the defining characteristics of the generated programs are those which we thought might deserve the most consideration when examining their effect upon the efficiency of the continuous compiler.

The most obvious program trait to consider in terms of its effect on program performance is the relative amount of time spent in library routines as opposed to user routines. Library routines, such as those found in system libraries, are precompiled and available to the continuous compiler in native-code form at all times, whereas user routines are defined in the program source files, which start out in interpreted form. As the percentage of time spent in library routines increases, the performance of the program under the continuous compilation model should approach that of a native-code version of the program [7].
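The expected trend can be illustrated with a deliberately simple model. The sketch below is not the simulator's cost model: the function name relative_time, the parameter lib_fraction (the fraction of native execution time spent in library routines), and the parameter interp_slowdown (how many times slower interpreted code runs than native code) are all assumptions introduced for illustration. It also assumes the pessimistic case in which user routines remain interpreted for the whole run.

```python
def relative_time(lib_fraction, interp_slowdown):
    """Run time relative to a fully native program (1.0 means native speed).

    Hypothetical model: library time is always native, while user time is
    scaled by a constant interpretation slowdown and is never compiled.
    """
    if not 0.0 <= lib_fraction <= 1.0:
        raise ValueError("lib_fraction must lie in [0, 1]")
    user_fraction = 1.0 - lib_fraction
    return lib_fraction + user_fraction * interp_slowdown


# Sweep the library share for an assumed 10x interpretation slowdown.
for share in (0.0, 0.5, 0.9, 1.0):
    print(share, relative_time(share, 10.0))
```

Under these assumptions the relative run time falls linearly toward 1.0 as the library share grows, which is the limiting behavior described above. Any compilation of user routines during the run would only move the result closer to native speed.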

ProGenitor thus focuses on the distinction between library routines and user routines, since we expected this to be the factor that most influences the performance of the continuous compiler. ProGenitor allows the defining traits of the library and user routines to be specified independently of each other, and it also allows the interaction between the two disjoint sets to be specified.

The relative amount of time spent in library and user routines is dictated in part by the relative number of routines of each type. However, the library/user time ratio is also dictated by a number of other factors, such as the average length of time spent in each function and the overall call structure.

In addition to its contribution to the library/user time ratio, the call structure is another factor likely to affect the performance of the continuous compiler. By the term "call structure" we mean the overall structure of the program as delineated by its component functions. For example, the call structure of a program would be considered wide and shallow if the program consisted of a large number of functions with a short amount of time spent in each function. At the other extreme, a program would be said to have a narrow and deep call structure if it spent a large amount of time in relatively few functions. While most real programs are not likely to fit neatly into such ordered categories, the distinction is still a helpful generalization, and the ProGenitor program was designed with it in mind.

ProGenitor allows most of the input parameters to be specified as probability distributions rather than as exact values. The purpose of this is twofold. First, some parameters are inherently multivalued and are therefore better described by probability distributions than by single, fixed values. An example is the number of functions defined in each source file, which is not likely to be the same for every source file of a program. Rather than specify this number separately for each source file, we took the more convenient approach of specifying the number of functions per source file as a probability distribution.

The second reason for using probability distributions to specify various parameters is to allow us to generate closely related programs from the same specification by the simple expedient of varying the random seed. For example, by specifying the number of source files as a probability distribution rather than a single value, we can easily generate programs that are similar in structure but differ in the number of source files.
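Both uses of probability distributions can be illustrated with a small sketch. It does not reproduce ProGenitor's input format or implementation: the helper names uniform_int and generate_layout, the dictionary keys, and the choice of a uniform integer distribution are hypothetical, chosen only to show one specification yielding related programs under different seeds.

```python
import random


def uniform_int(low, high):
    """Return a sampler that draws an integer uniformly from [low, high]."""
    def draw(rng):
        return rng.randint(low, high)
    return draw


def generate_layout(spec, seed):
    """Draw a program layout: one function count per generated source file.

    spec maps hypothetical parameter names to samplers; seed fixes the
    random stream so the same (spec, seed) pair always gives the same layout.
    """
    rng = random.Random(seed)
    n_files = spec["source_files"](rng)
    return [spec["functions_per_file"](rng) for _ in range(n_files)]


# One specification (assumed ranges), reused with two different seeds.
spec = {
    "source_files": uniform_int(3, 6),
    "functions_per_file": uniform_int(1, 20),
}

print(generate_layout(spec, seed=1))
print(generate_layout(spec, seed=2))
```

Each call with a different seed produces a layout drawn from the same distributions, so the resulting programs share their overall shape while differing in detail. Reusing a seed reproduces a layout exactly, which keeps individual experiments repeatable.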

The individual parameters that are supplied to ProGenitor are described in detail in Section 5.1 and the allowed probability distributions are described in Section 5.2.



