
5.2 Probability Distributions

Although the defining characteristics of the desired program can be given to ProGenitor as exact values, ProGenitor also accepts probability distributions as values for most of the parameters. In particular, most of the input parameters are best specified as following either a uniform or a normal distribution.

Consider, for example, the parameter specifying the number of functions per source file. If this value is given as a single discrete number, say 3, then every source file will define exactly 3 functions. While this behavior may sometimes be what is desired, it is not very flexible. As an alternative, the number of functions per source file can be specified as following a uniform distribution with lower and upper bounds, say 1 and 5. ProGenitor will then generate the number of functions for each source file randomly: each value is guaranteed to lie between 1 and 5 (inclusive), and the values chosen will follow a uniform distribution.
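The uniform case described above can be illustrated with a minimal Python sketch. It is not ProGenitor's own code: the function name functions_per_file and the use of Python's standard random module are assumptions made purely for illustration.

```python
import random


def functions_per_file(rng, low=1, high=5):
    """Hypothetical helper: draw an integer count uniformly from [low, high].

    Both bounds are inclusive, matching the 1-to-5 example in the text.
    """
    return rng.randint(low, high)


# A seeded generator makes the sequence of draws reproducible.
rng = random.Random(42)

# One draw per source file; here we pretend the program has ten files.
counts = [functions_per_file(rng) for _ in range(10)]
print(counts)
```

Every entry of counts lies between 1 and 5, and over many draws each of the five values appears with roughly equal frequency.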

In addition to a uniform distribution, it is also possible to specify a (truncated) normal distribution. A normal distribution requires the specification of a mean and standard deviation, as well as minimum and maximum boundaries.
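As a rough illustration of a truncated normal parameter, the Python sketch below draws from a normal distribution and discards any value outside the boundaries (rejection sampling). This section does not say how ProGenitor performs the truncation, so the rejection approach, the function name truncated_normal, the max_tries guard, and the rounding to an integer are all assumptions.

```python
import random


def truncated_normal(rng, mean, stddev, minimum, maximum, max_tries=10_000):
    """Hypothetical helper: sample a normal value restricted to [minimum, maximum].

    Uses rejection sampling (an assumption, not necessarily ProGenitor's
    method): draw from the untruncated normal and retry until the value
    falls inside the boundaries.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    for _ in range(max_tries):
        value = rng.gauss(mean, stddev)
        if minimum <= value <= maximum:
            return value
    # Reached only if the boundaries lie far out in the tails.
    raise RuntimeError("no sample fell inside the boundaries")


rng = random.Random(7)

# Example: call sites per function, mean 4, standard deviation 2,
# restricted to the range 0..10 and rounded to a whole number.
call_sites = [
    round(truncated_normal(rng, mean=4.0, stddev=2.0, minimum=0, maximum=10))
    for _ in range(1000)
]
```

Rejection sampling is simple and preserves the shape of the normal curve inside the boundaries, but it becomes slow when the boundaries cover only a small part of the distribution, which is why the sketch caps the number of retries.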

ProGenitor accepts a probability distribution as input for all parameters except the execution cutoff time and the random seed. For some parameters, such as the number of functions per source file or the number of call sites per function, multiple values are needed within a single generated program. For other parameters, such as the number of source files, only one value is needed. Distributions are still sometimes valuable for these single-valued parameters, however: by combining them with various seed values for the random generator, it is possible to generate a family of closely related programs that share certain characteristics but are not exactly the same.
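The idea of a program family can be sketched in Python as follows. The classes Exact and Uniform and the function sketch_program are hypothetical names introduced only for this illustration; they do not reflect ProGenitor's actual input format or implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Exact:
    """Hypothetical parameter spec: always yields the same value."""
    value: int

    def sample(self, rng):
        return self.value


@dataclass(frozen=True)
class Uniform:
    """Hypothetical parameter spec: uniform over [low, high], inclusive."""
    low: int
    high: int

    def sample(self, rng):
        return rng.randint(self.low, self.high)


def sketch_program(seed, num_files, funcs_per_file):
    """Return the number of functions in each source file of one program."""
    rng = random.Random(seed)
    # Single-valued parameter: sampled once per generated program.
    file_count = num_files.sample(rng)
    # Multi-valued parameter: sampled once per source file.
    return [funcs_per_file.sample(rng) for _ in range(file_count)]


# Same parameter specification, different seeds: related but distinct programs.
family = {
    seed: sketch_program(seed, Uniform(3, 6), Uniform(1, 5))
    for seed in (1, 2, 3)
}
for seed, layout in family.items():
    print(seed, layout)
```

Reusing a seed reproduces the same layout exactly, while changing only the seed yields another member of the same family.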

The choice of truncated normal and uniform as the allowable distributions was rather arbitrary. Ideally, the distributions would be modeled after the characteristics of real programs, but as discussed in Section 2.3, no such study of the characteristics of modern modular or object-oriented programs has been done (at least as far as we know). If such information becomes available in the future, it should not be much trouble to extend ProGenitor to take advantage of it.

The methods used to generate values following uniform and normal distributions are based on those presented by Knuth.

