Compilation and Automatic Optimization of Network
Protocol Implementations
Grant Status
28 July 1997
Improving Web Performance by Automatic Compilation of Hints
George has worked with student Girish Chandranmenon on improving Web
Performance using compilation techniques. When we first wrote the NSF
grant some two years ago, our focus was on using compilation for
improving the performance of basic network protocols such as TCP and
packet filters. However, we feel that much of the exciting action has
moved to the application domain and to protocols such as the Web and
Corba. Thus, the concentratino has
shifted to Web performance.
Girish and George have invented
the concept of adding performance hints to Web pages. Since
asking users to add such hints would be tedious and error prone, they
suggest the use of compilation to preprocess Web pages to add these
hints. Here is a summary of some of four specific new ideas for hints that
they have invented which reduce Web latencies.
-
Current network technology is bandwidth-rich but latency-poor; thus
round-trip delays will dominate access latency for web traffic. We
have invented four new techniques that reduce the round-trips needed for web
accesses. The techniques are based on the paradigm of compiling a
web page to collect information about links and inline data in the
page. STORED ADDRESS BINDING almost always eliminates the DNS
lookup (which can cost seconds) at the start of a transaction by storing
the IP address of the endpoint of a link.
-
In
INFORMED SERVER PROXYING, a server tells its client that it has
cached pages referenced in a page the client just retrieved; this
allows the client to retrieve the pages from its current connection,
instead of creating a new connection. This requires adding a bit
(locallyCached flag) to all links.
-
In SELECTIVE LINK REDIRECT, a server can direct its client to any
server that has cached a page referenced in the current document; this
allows cooperating servers to balance load.
-
Finally, AUTO INLINE
DOWNLOAD allows a server to send inline data such as images and
applets, without additional client requests.
Girish and George have written a paper and a technical report with
preliminary implementations (based on modifying the publically available
W3C consortium client and server code) and measurements of these techniques.
They submitted a paper to SIGCOMM 97 based on this report which was rejected
(though it had favorable reviews) with the suggestion that they add
more extensive performance tests. Girish will be working on an extensive
set of perfomance tests in the Fall and trying to convince the Web
community that these ideas are viable. A copy of the Technical Report
is enclosed with the hard copy of this report.
Publication (being readied for conference and journal submisison):
Reducing
Web Latencies Using Pre-computed Hints, Girish Chandranmenon and
George Varghese, Wash U Technical Report, 1997.
Automatic packet filtering
Packet filters are a mechanism for efficiently demultiplexing
network packets to application
endpoints.
We seek a
general,
formal specification method for packet filters that
allows for easy or efficient composition of specifications.
We are investigating
an automatic approach that achieves
these goals,
formulating
packet filter specification as a
language recognition problem:
each filter is represented
by a context-free grammar, whose language is the set of packets the
filter should accept. Thus, packet filters can be formulated through
a general,
well defined specification; further, the grammar-based approach
simplifies
filter composition, which is essential where
scalability is important.
Ron and Mahesh have pursued these goals through a simple, LR parsing
technique.
However, packet filters based on standard
LR parsing techniques
suffer from poor performance: they touch every portion of the input,
they check input bit by bit, they occupy large amount of space.
We developed new optimizations
to LR parsing that enable our automatic approach to overcome the
above problems and achieve
performance rivalling hand-crafted approaches.
Our results compare the LR approach to the
BSD packet filter for TCP connections; our approach shows significant
improvement
when there are multiple filters installed: for 50 TCP connections
our approach is 6 times faster.
These results are documented
in their technical report
Efficient Demultiplexing of Network
Packets by Automatic Parsing, the ideas from which were presented
at the first Workshop
on Compiler Support for System Software.
Turning to latency,
Ron and Jonathan have spent Summer '97 working on techniques
that substantially reduce the time needed to add or remove filters from
a composition. This is currently work-in-progress.
Benchmarking and Optimizing High-performance ORB Middleware
Doug has worked with
student Andy Gokhale on
benchmarking and optimizing the performance of distributed systems
middleware, in general, and Object Request Brokers (ORBs), in
particular. The goal of our research is to develop high-performance
ORB middleware that supports bandwidth- and latency-sensitive
applications. Our work on the NSF grant has provided the following
research contributions:
- Systematic benchmarking of ORB performance over high-speed
networks -- This work has systematically measured the performance
of ORB middleware and precisely pinpointed the sources of overhead
over high-speed ATM networks. These results indicate that the
performance of conventional ORB implementations suffer from to (1)
excessive marshaling/demarshaling i.e., presentation layer conversion
overhead, (2) inefficient server request demultiplexing strategies,
(3) long chains of intra-ORB function calls, and (4) lack of
integration with advanced operating systems and network features.
The results of our ORB performance experiments have appeared in SIGCOMM
'96, GLOBECOM
'96, IEEE
Communication Magazine, and ICDCS
'97.
- ORB optimizations -- Based on our systematic experiments on
ATM, we have developed and prototyped a series of optimizations to
improve ORB performance. These optimizations are based on (a)
optimizations based on networking protocol principles and (b)
optimizations based on compiler principles.
Our research on ORB performance optimizations has focused on reducing
the overhead incurred by interpretive marshaling/demarshaling protocol
engines, such as Sunsoft's Internet Inter-ORB Protocol (IIOP) engine.
The optimizations we've applied include optimizing for the common
case, precomputing, adding redundant storage, and optimizing for the
cache (submitted for review to HICSS '98).
In addition, we have developed and benchmarked several
high-performance IIOP request demultiplexing schemes based on
delayered active demultiplexing, perfect hashing,
and dynamic hashing (will appear in GLOBECOM
'97).
During the summer of '97, we are working on implementing an IDL compiler
that will use advanced compiler techniques to generate very efficient,
compiled stubs and skeletons. This is currently work in progress and
we plan to have results of these optimizations by the end of 1997.
Prepared by
Ron Cytron,
cytron@cs.wustl.edu
Prepared 7/97, Last Modified 7/97.