next up previous contents
Next: Related Work Up: Continuous Compilation for Software Previous: List of Figures

 

Chapter 1
Introduction

Traditionally, the domains of interpretation and compilation have remained fairly separate. In part this is because languages often lend themselves to one paradigm or the other. For example, languages like LISP and Smalltalk have traditionally been interpreted while those like C and Pascal are usually compiled. However, neither paradigm is inviolate. Compilers exist for many LISP variants [8] and interpreters exist for both C and Pascal [4].

Interpretation is often used for fields such as rapid prototyping, due to the interactive nature of development and the quick response to change. Compilation is generally used for production-level applications due to the performance advantage of executing a program in native-code form. In industry, projects are often started in an interpreted environment to take advantage of the relative ease of development and then later moved to a compiled environment, often at high cost to the company.

In this thesis, we examine the feasibility and benefits of a continuous compilation system by examining the extent to which program interpretation and compilation can be overlapped and intermixed in a system that continuously translates programs into native-code form throughout program execution. We will present experimental data that indicates a significant performance gain by the continuous compilation model over more traditional approaches.

In a typical software development environment, the sequence of steps Edit tex2html_wrap_inline1388 Make tex2html_wrap_inline1388 Execute, as depicted in Figure 1.1, is repeated throughout program development, where the ``Make'' step involves

  1. compilation of those files that have been changed,
  2. compilation of those files that (ultimately) depend on files that have changed, and
  3. linking of separately compiled objects into an executable image.
Execution then proceeds until the next ``bug'' is found, at which point execution is stopped and the cycle repeats from the ``Edit'' step.

  figure54
Figure 1.1: The Traditional Development Cycle 

We are interested in the recovery time of the program development cycle, defined as the time interval occupied by the ``Make'' and ``Execute'' steps just up to the point of arresting program execution. During this time interval, the user has essentially abdicated control of the development cycle to the computer. In particular, shortening this interval improves life for the program developer, because the effects of a program edit can be seen more quickly. For example, if recovery time takes a minute or more, then there is a good possibility that the program developer has switched to a different task, such as reading mail or solving other problems. While recovery time can be shortened by purchase of a faster computer, we investigate an alternative in this thesis, namely adopting a better paradigm for program compilation.

  table59
Table 1.1: Compilation Times With and Without Optimization 

Until recently, the practice of well-accepted software design principles (such as modularity and encapsulation) generally resulted in source files that took relatively little time to compile: the ``Make'' step occupied a relatively insignificant portion of the recovery time. However, developments in language and compiler design have conspired to make compilation times grow:

While efforts at improving compiler performance for processing new language constructs and for aggressive program optimization can only be applauded, we propose in this thesis a complementary approach:

The compiler should effectively continuously transform a program from an interpreted to a fully optimized form.

Consider the scenario depicted in Figure 1.2. Program execution begins with interpretation, with compilation proceeding concurrently with execution. As execution continues, the interpreted source code is gradually replaced with natively-executable code. Performance increases until, eventually, the entire program has been translated to a fully native form.

In this thesis we concentrate on the exploring the first step: the transition from interpreted to native code. But it does not have to stop there. The continuous compiler can proceed with improving the code, starting with interprocedural optimization and moving on to intraprocedural optimizations, until a fully optimized form has been produced.

This approach provides an immediate response time, as with interpretation. However, it also gives performance that approaches and, with more time available to spent on optimization, possibly surpasses that of executing native code.

  figure90
Figure 1.2: The Continuous Compilation Cycle 

The reduction of response time in software development environments is becoming more important as the development environments become more interactive. Visual programming, in particular, is one area in which response time is of vital importance. Typically, visual programming environments use interpretation during the development stage with compilation occurring only when requested by the developer, usually when the code is considered ``complete.'' Such an environment could benefit greatly by the introduction of a continuous compilation model.

The possible applications of this continuous compilation paradigm is by no mean limited to software development. We see an even greater impetus for migrating toward the continuous compilation model coming from the expanding field of moving-target computing. By the term ``moving-target computing'' we are referring to two disparate but parallel trends in computing, namely, mobile computing and platform independence.

Mobile computing simply refers to the use of workstations or terminals that are physically mobile. Until recently, these mobile computers were mostly independent units that connected to other computers or networks only at fixed places where wired communication media was available, and only to exchange relatively small amounts of data. However, in spite of the lower bandwidth provided by current wireless technology, wireless data communication is becoming more prevalent. At the same time, we are seeing a shift to a more connected, networked model of computing.

With this shift in mobile computing comes the need for rapid transmission of programs from a remote server to the mobile platform. Interpretable forms, such as bytecodes, are generally more compact than directly executable forms and can therefore be transmitted much more quickly. For example, a desk calculator implemented by recursive-descent parsing takes approximately 20 Kbytes in native Sparc executable form (after stripping) while a bytecode representation of this same program takes only 3 Kbytes. Regardless of the transmission media used, it takes over six times as long to transmit the executable form. With current transmission rates this can mean a difference of over 15 seconds in transmission time.

The drawback to using interpretation is again execution performance. We generally assume that, interpreting a program incurs a penalty factor of 10 over executing the program in native-code form [19]. Reducing the time required to receive a program to one-sixth of its previous value may not provide any benefit if it takes ten times longer for the program to execute. The use of the continuous compilation model can provide the benefits of the short transmission time without sacrificing performance.

The second component of the moving-target paradigm is platform independence. Historically, making an application available on multiple platforms often required maintaining a separate source for each platform. At the very least, separate binaries are needed. However, it is becoming increasingly easy to deliver applications in a target-independent manner. It is becoming feasible for vendors to maintain a single source for an application, distribute the application in machine-independent form (such as bytecodes), and rely on the remote user's platform to generate the native executable code. The explosive popularity of Sun Microsystems' Java [18] language gives testimony to the desire that exists for this computing model.

For machine-independent application distribution, there is no choice but to deliver the application in platform-independent form. The continuous compilation model examined in this thesis allows interpretation of this form to begin immediately, thereby reducing perceived response time, while also reducing the performance overhead imposed by interpretation.

This thesis presents the results of experiments performed using a simulator of a continuous compilation system to determine the possible performance benefits or detriment of such a system. We also examine various system design issues in an attempt to discover their effect on the performance of the system. In particular, we examine the following questions:

  1. How does the recovery time of the continuous compilation system compare with that of a tradition compilation system?
  2. What effect does the order of compilation of files have on the performance of the continuous compilation system?
  3. How great an effect does various methods of replacing source code with native code have on the continuous compilation system?
  4. To what extent do various program characteristics affect the performance of the program under the continuous compilation system?

Chapter 2 presents some background information and related work. Chapter 3 describes the continuous compilation model in more detail. Chapter 4 describes the simulator we built to perform the timing experiments. Chapter 5 describes a utility we built to generate test cases. Chapter 6 describes the various experiments we carried out to explore the performance of the continuous compilation model. In Chapter 7 we give some concluding remarks and discuss future work.


next up previous contents
Next: Related Work Up: Continuous Compilation for Software Previous: List of Figures