With these experiments we hope to show that the continuous compilation model compares favorably against the traditional compilation model in terms of performance. To do so, we compare the cold-start recovery times of the two models.
Table 6.2: Cold-start with Traditional Compilation System
The cold-start recovery time is defined as the time necessary to reach a particular point in the program's execution when starting from the initial, uncompiled source. For the traditional system, the recovery time includes the time necessary to compile the program and then execute the program up to the specified point of execution. The recovery time for the continuous compilation system is the time necessary to reach that same point in the program's execution using the continuous compiler. The traditional-system recovery time is calculated by simply summing up the times needed to compile, link, and execute the program. The continuous-system recovery time is obtained from our simulator. Table 6.2 shows the cold-start recovery time in seconds for a traditional compilation system for each of the real programs under consideration.
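The traditional-system calculation described above is a plain sum of three sequential phases. A minimal sketch follows; the function name and the sample phase times are hypothetical placeholders chosen for illustration, not measurements from Table 6.2.

```python
def traditional_recovery_time(compile_s, link_s, execute_s):
    """Cold-start recovery time of a traditional system, in seconds.

    The phases run one after another, so the total is simply their sum.
    """
    return compile_s + link_s + execute_s


# Hypothetical phase times, used only to illustrate the sum.
example_total = traditional_recovery_time(compile_s=60.0, link_s=5.0, execute_s=35.0)
print(example_total)
```

The continuous-system figure has no such closed form here, since it is taken from the simulator.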
In Table 6.3 we compare the recovery time, in seconds, of the traditional and continuous compilation systems from a cold start, using the random compilation selection strategy (see Section 3.2), the replace-at-call replacement strategy (see Section 3.3), and an interpretation penalty of 10.
Table 6.3: Comparison between Traditional and Continuous Replace-at-Call
By an interpretation penalty of 10 we mean that we assume it takes 10 times as long to interpret a given piece of code as it would to execute the equivalent native code.
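The penalty can be read as a simple multiplier on native execution time. The sketch below assumes exactly that linear model; the function name and the sample native time are hypothetical, and the simulator itself may account for interpretation cost in more detail.

```python
INTERPRETATION_PENALTY = 10  # the value assumed in these experiments


def interpreted_time(native_s, penalty=INTERPRETATION_PENALTY):
    """Estimated time to interpret code that takes native_s seconds natively."""
    return penalty * native_s


# Hypothetical example: a routine that runs in 0.5 s as native code.
print(interpreted_time(0.5))
```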
In four of the six cases, the continuous compiler shows a significant improvement, recovering 1.5 to 2 times faster than the traditional compiler. However, in the other two test cases, ghostview and render, performance decreases. With render the slowdown is fairly small, but with ghostview the continuous compilation recovery time is more than three times that of the traditional compiler.
We can uncover a clue about the possible cause for the poor performance of ghostview with the continuous compiler using replace-at-call if we examine the performance using replace-preemptive. Table 6.4 shows the performance of ghostview, pico, and render using replace-preemptive instead of replace-at-call. (We are still using the random compilation selection strategy and an interpretation penalty of 10.)
Table 6.4: Comparison between Traditional and Continuous Replace-Preemptive
As expected, we see an improvement in the performance of the continuous compiler when using the replace-preemptive method over the replace-at-call method. With render this improvement is small, but enough to push it past the performance of the traditional compiler. With ghostview, however, the improvement is tremendous. The recovery time dropped from 377.6 seconds to 83.2 seconds, a speedup by a factor of more than 4.5. Again, this is enough to beat the recovery time of the traditional system. Pico is an anomaly; the recovery time of the continuous compiler is identical when using either the replace-at-call or replace-preemptive strategy. In Section 6.4 we examine the effect of the replacement strategies in more detail. First, however, a few comments are needed about the times given in Table 6.3 and Table 6.4.
The times given for the continuous compiler are the times for the Interpreter module only. In other words, we are assuming that there are two processors, one running the Interpreter module and one running the Compiler module. However, the times given for the traditional compiler assume sequential, single-processor execution. To make the comparison fairer, we need to either give the traditional compiler two processors or restrict the continuous compiler to one processor.
Giving the traditional compiler two processors would not simply halve the recovery times. The performance depends very strongly on the implementation of both the compiler and the program being executed. Both would have to be redesigned to take advantage of the additional processor. This would be a very complex process, and we have no easy way to determine what the actual speedup would be.
On the other hand, the continuous compiler is already parallel in nature and can easily be designed to take advantage of the extra processor. However, in order to make the comparison fair, we will compare the performance of the traditional compiler with that of the continuous compiler running on a system with only one processor.
To calculate the recovery time for the continuous compilation system with only one processor, we will simply add the time spent in the Interpreter module to the time spent in the Compiler module. (To be more realistic we should add some extra time to account for the overhead of context switching between the Interpreter and Compiler modules, but we will ignore that for now.)
Table 6.5: Single-processor performance of render
Table 6.5 shows the performance of the render program for each of the compilation strategies on a single-processor system using the replace-preemptive scheme and an interpretation penalty of 10. The values in the "Speedup over Traditional" column were calculated by dividing the 179.1-second recovery time of the traditional compiler (see Table 6.2) by the time in the "Total Time" column.
As expected, the recovery time of the continuous compiler with a single processor is worse than that of the traditional compiler, due to the extra time spent by the continuous compiler interpreting code before the code gets compiled. However, the difference is fairly small. In fact, with the longest-overall compilation strategy, the difference is only 7.6 seconds, which is less than a 5% slowdown. Even in the worst case, produced by the smallest-first compilation strategy, the slowdown is only about 20%.
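The single-processor arithmetic used for render can be sketched as follows. The function names are hypothetical; the only data used are the 179.1-second traditional time and the 7.6-second gap quoted in the text, from which the longest-overall total is derived rather than read from Table 6.5.

```python
TRADITIONAL_RENDER_S = 179.1  # traditional recovery time for render (Table 6.2)


def single_processor_total(interpreter_s, compiler_s):
    """Interpreter and Compiler share one processor.

    Context-switch overhead is ignored, as in the text.
    """
    return interpreter_s + compiler_s


def speedup_over_traditional(traditional_s, total_s):
    """Values below 1.0 mean the continuous system is slower."""
    return traditional_s / total_s


def slowdown_percent(traditional_s, total_s):
    """Extra recovery time as a percentage of the traditional time."""
    return 100.0 * (total_s - traditional_s) / traditional_s


# Longest-overall strategy: total derived from the 7.6-second gap in the text.
long_overall_total = TRADITIONAL_RENDER_S + 7.6
print(speedup_over_traditional(TRADITIONAL_RENDER_S, long_overall_total))
print(slowdown_percent(TRADITIONAL_RENDER_S, long_overall_total))
```

The slowdown comes out a little above 4%, consistent with the "less than a 5% slowdown" stated above.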
Table 6.6: Single-processor performance of pico
When we look at the values for pico (again using replace-preemptive and an interpretation penalty of 10), as shown in Table 6.6, we see a similar trend, except that with pico, all the values for the continuous compiler are slightly better than the recovery time of 129.5 seconds for the traditional compiler.
An important point to bear in mind is that with the traditional compiler, the user must wait for compilation to finish before any useful work can be done. With pico this means waiting over 63 seconds before the program even starts execution. In contrast, execution of the program begins immediately with the continuous compiler.
The immediate response time of the continuous compiler is an extremely important benefit. Imagine a situation where someone is trying to decide whether to purchase a dual-processor system to use with a continuous compiler or, for the same money, a single-processor system, with a processor that is twice as fast as the ones in the dual-processor system, for use with a traditional compiler. Even though compilation now occurs twice as fast as before, the traditional compiler would still take over 30 seconds to compile pico. With the dual-processor continuous compiler, program execution begins immediately. In addition, even if we assume that the recovery time for pico with the twice-as-fast processor is half of the 129.5 seconds given in Table 6.2, it is still greater than the recovery time of around 63.6 seconds for the dual-processor continuous compiler. In actuality, the situation for the twice-as-fast traditional compiler would be even worse. With programs like pico, which spend most of their time waiting for input from the user (see Section 6.6), doubling the processor speed is not going to halve the execution time.
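The purchase comparison above reduces to a few lines of arithmetic. The sketch below uses only the pico figures quoted in the text and makes the same optimistic assumption that a twice-as-fast processor halves every phase; the variable names are hypothetical.

```python
TRADITIONAL_PICO_S = 129.5      # traditional recovery time for pico (Table 6.2)
COMPILE_PICO_MIN_S = 63.0       # the text says compilation takes "over 63 seconds"
CONTINUOUS_DUAL_PICO_S = 63.6   # approximate dual-processor continuous recovery time

# Optimistic assumption: doubling processor speed halves every phase.
fast_compile_wait_s = COMPILE_PICO_MIN_S / 2
fast_traditional_s = TRADITIONAL_PICO_S / 2

print(fast_compile_wait_s)   # lower bound on the wait before execution starts
print(fast_traditional_s)    # best-case recovery time on the faster processor
print(fast_traditional_s > CONTINUOUS_DUAL_PICO_S)
```

Even under this best case the traditional system still recovers later, and the user waits more than 30 seconds before execution starts at all.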
When we look at the values for ghostview, shown in Table 6.7, we get an even greater surprise: the continuous compiler outperformed the traditional compiler's recovery time of 112.9 seconds by up to 7.2 seconds. (This is again with the replace-preemptive strategy and an interpretation penalty of 10.)
Table 6.7: Single-processor performance of ghostview
If we take a closer look at Table 6.7, we find that in four of the cases (mfex-so-far, long-so-far, largest-first, and long-overall) the continuous compiler did not have time to compile the entire program. In fact, in these four cases, the continuous compiler running on a dual-processor system would finish executing the program before a traditional compiler had even finished compiling it.