As the WWW continues to grow, providing adequate bandwidth to countries remote from the geographic and topological center of the network, such as those in the Asia/Pacific region, becomes more and more difficult. To meet the growing traffic needs of the Internet, some Network Service Providers are deploying satellite connections. Through discrete event simulation of a real HTTP workload over differing international architectures, this paper gives guidance on the architecture that should be deployed for long distance, high capacity Internet links.
We describe the effect of carrying multiplexed HTTP connections over a high bandwidth-delay circuit. We show that a high degree of multiplexing mitigates TCP's bandwidth-delay product limits, but that using a TCP connection per HTTP request causes a significant increase in delay. The use of a US based proxy for TCP termination, without caching, is shown to allow the workload on the connection to be tuned for best performance.
Topics: Networking and Communications
Keywords: TCP/IP performance, satellite, HTTP
The growth of the Internet is placing strain on the worldwide telecommunications infrastructure. In particular, it is no longer possible to purchase capacity on terrestrial cables to some parts of the world. The demand for network services continues to grow despite the lack of terrestrial capacity. To meet this demand, Network Service Providers (NSPs) must move some traffic to satellite circuits, despite the inability of standard window size TCP [Postel81] implementations to perform well in high bandwidth-delay product environments [Allman98b].
NSPs that wish to provide a satellite based service must choose an architecture for the international component of the network. Depending on the architecture chosen, the traffic on the international circuit might be made up of a large number of independent connections or a small number of connections carrying aggregated traffic. The former would be the case if each HTTP request is carried directly over the link. The latter is the case if user connections are concentrated between a pair of proxies. More tuning is possible if proxies are used because the TCP connections are terminated at devices under the control of the NSP, whose TCP stacks can be tuned to better meet the needs of the satellite connection. Without proxies, TCP connections terminate at the end users, whose TCP implementations are outside the control of the NSP. The satellite case between proxies is shown in figure 1(b). To make the descriptions easier, this figure and the rest of the paper are written in terms of an international connection between New Zealand and the United States. The results are, of course, more widely applicable. This study is similar to [McGregor98]; however, this paper considers symmetric satellite connections and includes the effect of limited buffer sizes, which gives more accurate results.
There are advantages to both design approaches described above. The NZ-only proxy case (figure 1(a)) is simpler to implement and does not require the NSP to deploy and maintain US based proxy equipment. However, the full effect of slow start will be felt by every HTTP request. This effect may be magnified because many HTTP requests are often required for a single HTML page.
The US-proxy case improves slow start behavior because slow start only occurs once for each inter-proxy TCP connection. When an HTTP request is carried over an established inter-proxy connection, slow start does not normally occur for the international part of the connection; it will still occur locally within NZ and the US. Further performance improvements may be gained in the proxy-to-proxy case because the TCP stacks operating over the international link are under the control of the NSP and may be tuned. In particular, a large buffer size may be selected. Finally, the aggregation of several HTTP connections over a single TCP connection may allow TCP to better package the data and to carry more piggy-backed acknowledgments. Opposing these gains, performance in the proxy-to-proxy case may be limited by the number of TCP connections available between the proxies. There are also protocol overheads required to support the multiplexing between the proxies.
In this paper we describe a discrete event simulation that investigates the effect of carrying multiplexed HTTP connections over a high delay bandwidth circuit. The simulations include a real TCP/IP protocol stack and are driven by a trace of HTTP activity collected from the NZ Internet exchange (NZIX). We show that a high degree of multiplexing mitigates TCP's bandwidth-delay product limits, but that if a TCP connection is used for every HTTP request a significant increase in delay is experienced. The simulations indicate that multiplexed HTTP connections between proxies at both ends of the link reduce this additional latency provided sufficient TCP connections are available between the proxies.
The rest of this paper is organised as follows: Section 2 describes the network being simulated, including the capacities of the links and transmission delays. The simulation is driven from an HTTP trace collected at the NZIX. Section 3 describes the workload including the main characteristics and how heavier workloads were formed to simulate the high capacity links. The simulator design is explained in section 4 and the results of the simulation runs are shown in section 5. The paper ends with the primary conclusions we draw from the results.
There are many ways in which a satellite connection might be deployed as part of an NSP's network. This paper considers two architectures.
The main elements of the first of these architectures are shown in figure 1. The diagram shows web clients located in NZ connecting to a NZ proxy. This proxy in turn connects to a US based proxy which, in its turn, connects to web servers based in the US. There are three TCP connections involved in fetching a web page: the first connects the web client to the NZ proxy, the second is between the two proxies and the third is from the US proxy to the US server.
Multiplexing is implemented between the proxies; that is, the data for different replies may be interleaved on a single TCP connection between the proxies. The overhead of multiplexing is assumed to be, on average, 20 bytes per HTTP reply segment received from an HTTP server. It is expected that multiplexing will improve the efficiency of the international link. Because TCP does not maintain the boundaries between application requests, the data from (possibly different) HTTP reply packets may be repackaged for more efficient TCP transmission. Because of this, most TCP segments will be the maximum segment size (MSS).
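The interleaving described above can be sketched as follows. This is a hypothetical framing chosen purely for illustration: the paper specifies only an average overhead of about 20 bytes per reply segment, not a wire format, so the 8-byte header (stream id plus payload length) used here is an assumption.

```python
import struct

# Hypothetical framing for interleaving HTTP replies on one inter-proxy
# TCP connection.  An 8-byte header (stream id, payload length) precedes
# each reply segment; the real inter-proxy protocol is not specified in
# the paper beyond its ~20 byte average overhead.
HEADER = struct.Struct("!II")

def mux(segments):
    """Interleave (stream_id, payload) segments into one byte stream."""
    out = bytearray()
    for stream_id, payload in segments:
        out += HEADER.pack(stream_id, len(payload))
        out += payload
    return bytes(out)

def demux(stream):
    """Recover the (stream_id, payload) segments from the byte stream."""
    segments, offset = [], 0
    while offset < len(stream):
        stream_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        segments.append((stream_id, stream[offset:offset + length]))
        offset += length
    return segments
```

Because the payloads from different replies sit back-to-back in a single byte stream, TCP is free to cut the stream into MSS-sized segments regardless of the original reply boundaries.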
Because the connections between the proxies persist indefinitely the effect of TCP slow start is greatly reduced over the satellite component of the network, which is where it would otherwise have the greatest effect.
A second, simpler, case is also considered. In this second case the US proxy is replaced with a router. Only two TCP connections are involved with fetching a web page. The first, from the web client to the NZ proxy, is the same as in the US-proxy case. The second connection is from the NZ proxy to the US server providing the web page. In this case there is a TCP connection across the international link for each concurrent HTTP request. In the earlier case the number of TCP connections may be limited by the proxies.
Table 1. Network parameters

| Satellite bandwidth    | 34.368 Mbps (E3) |
| TCP buffer size        |                  |
| -- Servers             | as measured      |
| Maximum segment size   |                  |
| -- Between proxies     | 1460             |
| -- Elsewhere           | as measured      |
| Delays in US cloud     | as measured      |
| Delays in NZ cloud     | not simulated    |
The main parameters of the network are shown in table 1. The values have been chosen to match real network parameters where possible.
The bandwidth delay product for this network is given by:
    BDP = (2 * Dsat) * B
where Dsat is the satellite latency and B is the link bandwidth. For this network:

    BDP = (0.320 * 2) * 34.368
        = 22.0 megabits
That is, the TCP buffers need to be larger than 2.75 MB to allow this link to be filled by a single TCP connection. If smaller buffers are used, the buffer will be filled before the first data sent has been acknowledged, and the flow of data into the link will be suspended while the transmitter waits for an acknowledgment. Standard TCP limits the window size to 64 KB [Postel81]. The big windows extension of RFC 1323 [Jacobson92] extends this limit to 2^32. Use of this option is not widespread and is outside the control of the NSP.
If an implementation is limited to 32767 bytes (a common implementation maximum) then the maximum bit rate for a single TCP connection over this link is:
    MBR = (S * 8) / (Dsat * 2)
        = (32767 * 8) / (0.320 * 2)
        ~= 409600 bps

where Dsat is the satellite latency and S is the buffer size.
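The arithmetic above can be reproduced in a few lines; the variable names are illustrative:

```python
# Reproducing the bandwidth-delay product and single-connection maximum
# bit rate for the simulated satellite link (section 2.2).
D_SAT = 0.320          # one-way satellite latency, seconds
B = 34.368e6           # E3 link bandwidth, bits per second
S = 32767              # common TCP buffer-size limit, bytes

rtt = 2 * D_SAT        # satellite round-trip time
bdp = rtt * B          # bits that must be in flight to fill the link
mbr = S * 8 / rtt      # ceiling for one standard-window TCP connection

print(f"BDP ~= {bdp / 1e6:.1f} megabits ({bdp / 8e6:.2f} megabytes)")
print(f"MBR ~= {mbr:.0f} bps")
```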
This effect is seen in simulation of the link and is shown in figure 4. The graph shows that the bandwidth consumed by a single TCP connection plateaus at around 415kbps as the offered load increases. This bandwidth limitation is expected to impact negatively on the performance of the international connection, especially where a US based proxy is used and the number of connections between the US and NZ proxies is small.
Real satellite NSP architectures would not be as simple as the one described in this paper. Most will need more than a single proxy at each end of the international link to support the required load. The NZ proxy would almost certainly include a cache that satisfies some of the HTTP requests locally. There are many routers not shown, some of which are central to the satellite feed. The simpler architecture used in this paper makes the simulation easier and allows the difference between the ways of using a satellite link of this type to be examined without interference from the full range of factors that would impact the performance of a real system.
The simulations are driven from a trace of HTTP requests. Most of the information required to generate the simulation input files (described in the next section) was gathered from HTTP logfiles collected from the New Zealand Internet exchange (NZIX). The trace files used were collected from 3:00pm to 3:10pm in July 1997.
There were, on average, 421 requests per 10-minute interval. (The actual trace includes more requests; this number counts only the successful international requests that were not satisfied by the cache hierarchy.)
To generate higher loads than those experienced when the trace files were collected, traces for the same time on successive days in July were integrated into a single trace. When still higher loads were required, more than one copy of each trace was integrated into the logfile. Each copy was offset in time to minimise the effect of the artificial self-correlation of a trace generated in this way.
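A minimal sketch of this trace construction, assuming each record is a (timestamp, request) pair (the actual logfile format is not reproduced here):

```python
import heapq

def merge_traces(traces):
    """Merge several time-sorted traces into one time-sorted trace."""
    return list(heapq.merge(*traces, key=lambda record: record[0]))

def replicate_trace(trace, copies, offset):
    """Raise the offered load further by adding time-shifted copies of
    a trace.  The per-copy offset weakens the artificial self-correlation
    that identical copies would otherwise introduce."""
    shifted = [[(t + i * offset, request) for t, request in trace]
               for i in range(copies)]
    return merge_traces(shifted)
```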
The TCP MSS and server buffer sizes were not recorded in the HTTP traces we used. To discover these parameters a connection was established to each host while the network traffic was monitored using tcpdump. From the tcpdump output the MSS and window size were discovered for most hosts. Some hosts did not advertise their MSS; in these cases the most common MSS (1480) was used.
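Extracting the advertised window and MSS from the monitored SYN exchange might look like the following sketch; tcpdump's exact line layout varies between versions, so the pattern is illustrative rather than definitive.

```python
import re

# Matches the advertised window and the MSS option on a tcpdump line for
# a SYN segment, e.g. "... Flags [S], seq 0, win 65535, options [mss 1460,...]".
# The layout differs across tcpdump versions; this pattern is illustrative.
SYN_OPTS = re.compile(r"win (\d+).*mss (\d+)")

def parse_syn(line):
    """Return (window, mss) from a tcpdump SYN line, or None if absent."""
    match = SYN_OPTS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```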
Figure 2. Simulator Design
The simulation process can be considered as three interlinked processes: pre-processing, simulation and post-processing.
775 simulations (each requiring from around a minute to 15-20 minutes on a 450 MHz Pentium CPU running Linux) were run to produce the graphs reported in this paper. (For the curious, that is around 35 million million CPU cycles.)
Post-processing is mostly a matter of collecting the results of interest from many simulation runs into a single set of plots. This was done with a set of Perl scripts; gnuplot was used to draw the plots.
The simulator used in this study was based on the ATM-TN simulator[Arlitt95] with modifications for this problem. The changes include replacing the ATM infrastructure with a simpler and more general bit serial interface.
The two main components used from ATM-TN are the conservative (as opposed to parallel) simulation engine and the TCP model. ATM-TN's TCP model includes the actual TCP code from 4.4 BSD Lite, modified to suit the simulation environment. Connections are simulated on a packet by packet basis and include the slow start, congestion control, fast retransmit, and fast recovery algorithms [Stevens97].
The simulator design for the US-proxy case is shown in figure 3. The simulator models the connections between the NZ proxy and the servers in the US. It does not include the NZ-proxy-to-web-client component of the network because this is not significant to the study. Additional delays that are dependent on the type of connection (e.g. modem or direct connection) will be incurred in the NZ component of the real network.
The non-US-proxy case is similar to the US-proxy one with the omission of the US proxy and the replacement of the two TCP connection modules with a single TCP connection module and a single set of end-to-end TCP connections.
The simulation assumes that the proxy has sufficient CPU and memory to manage the workload and that the delay imposed by processing on the proxy not due to TCP queuing and transmission is negligible.
The results of the simulations are presented in the following graphs. Each point on a graph represents a simulation run. The points have been joined with (straight) lines. Most curves represent different numbers of connections between the proxies. The no-US-proxy case is labeled "base".
As discussed in section 2.2 the bandwidth delay product of the network combined with the 32Kb maximum window size limits the throughput that can be obtained by a single TCP connection. A simulation of a single TCP connection over the satellite link is shown in figure 4. The curve is asymptotic to about 415kbps. This corresponds closely to the calculated maximum of 410kbps.
In figure 5(a) the load to NZ is plotted against the total NZ bound traffic presented to the US proxy or router.
Although each TCP connection is limited to around 410kbps, there are many concurrent TCP connections in the no-US-proxy case (see figure 7(a)). This allows the link to saturate under high loads. The slope of the line through most of the graph is about 1.09, indicating that there are very few retransmissions occurring and that most packets are of size MSS. The graph does not tail off until the link is within 2% of being saturated. (Note that this graph shows presented load against load carried, not presented load against useful data transmitted.) Figure 5(b) shows that page latencies increase dramatically at this point.
If there are a large number of concurrent connections between the proxies, the US-proxy case performs in a very similar way to the no-US-proxy case. The lines for 100, 150 and 200 connections have a slightly smaller slope than the no-US-proxy case, indicating a small efficiency gain through repackaging the load on the more heavily used TCP connections into MSS sized packets. The efficiency gain is small and is probably not a significant saving.
For smaller numbers of connections (50 and below) the link does not reach saturation. Instead the TCP connections reach their saturation point and they limit the flow of packets to the international link.
Perhaps the most interesting result of the study is shown in figure 5(b). The graph shows the average time required to fetch a set of sample pages that were present in all simulations. The result for the US-proxy case, with a large number of connections, is around 45% lower (2.5s per HTTP request) than the no-US-proxy case. This results from the reuse of the international TCP connections saving most of the cost of slow start and connection setup. The saving for an HTML page with multiple components may be even greater. A closer examination of the start of the curves shows that for very low loads the gain is less. This is the case because the initial slow start on the international TCP connections is not amortized over as many HTTP requests and consequently it has a larger effect on the total latency.
For smaller numbers of connections the latency rises rapidly as the TCP throughput limit is approached. Comparison of figures 5(b) and 5(a) indicates that this begins to occur when the TCP connections reach about 75% of their capacity. As a consequence, to achieve the best HTTP latency performance, more connections are required than are needed to saturate the international link.
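As a rough cross-check of these connection counts, the per-connection ceiling from section 2.2 can be combined with the 75% knee observed here; the knee fraction and bandwidths are from the text, everything else is back-of-envelope arithmetic rather than a result of the simulation.

```python
import math

LINK_BPS = 34.368e6      # E3 satellite link bandwidth
PER_CONN_BPS = 409600.0  # single-connection ceiling from section 2.2
LATENCY_KNEE = 0.75      # fraction of capacity where latency starts to rise

# Connections needed just to saturate the link, and to keep every
# connection below the latency knee at full load.
to_saturate = math.ceil(LINK_BPS / PER_CONN_BPS)
for_low_latency = math.ceil(LINK_BPS / (PER_CONN_BPS * LATENCY_KNEE))

print(to_saturate, for_low_latency)
```

The estimates (84 and 112 connections) sit somewhat below the roughly 100 and 150 connections seen in the simulations, as would be expected since no real connection carries exactly its ceiling at all times.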
Figure 6 shows the buffer space used in the routers which feed each end of the international link. Figures 6(a) and (c) show the mean usage while figures 6(b) and (d) show the peak usage. Note that the graphs have different scales. The peak usage is more erratic than the mean because of subtle interactions between connections.
In the no-US-proxy case the 512Kb buffer space is filled under high load and some packets will be dropped. This is also true of the US-proxy case if the number of connections is large enough to allow the link to saturate.
Buffer (or link) usage is never heavy in the NZ to US direction in the simulation. (A real US/NZ link would be more heavily used in the US direction because of requests on NZ servers from US clients. These are not simulated here. We assume that sufficient NZ/US capacity exists to carry the client requests to the US.)
Figure 7(a) shows the number of connections between the US proxy and servers for the US-proxy case. In the no-US-proxy case it shows the number of connections from the NZ proxy to US servers. In the latter case this increases rapidly when the international link is saturated because the HTTP requests take a long time to complete (see figure 7(b)). In general the no-US-proxy case uses more connections than the US-proxy case because the connections take longer to complete.
The number of US connections for the single inter-proxy TCP connection curve flattens at around 18Mbps presented load. This is because the TCP throughput on a single connection is insufficient to carry the requests from NZ to the US. This is also apparent in figure 7(c). The typical relationship between inbound and outbound traffic can be seen in figure 7(d). This demonstrates the effect of insufficient capacity to carry the HTTP requests. When there are sufficient TCP connections to carry the load this shows an inbound to outbound ratio of about 1:19.
The difference between the no-US-proxy case and the US-proxy case in figures 7(c) and (d) indicates the saving made by repackaging HTTP requests into a smaller number of larger TCP packets. This has a more significant effect than in the US to NZ direction because HTTP requests are smaller than HTTP replies. The effect is probably not useful in current practice because NZ to US links are not normally saturated. This is because of the requirement to purchase symmetric terrestrial connections.
Multiplexing HTTP over standard window size TCP connections between international proxies connected by a satellite link offers significant performance advantages.
The number of TCP connections between the proxies is important. To avoid the link being under-utilised, around 100 connections were required for an E3 satellite connection from NZ to the US. However, additional connections are needed if the best page latency is required; in this case around 150 connections were required.
[Stevens97] W. Stevens, TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms, Tech. Rep. RFC 2001, IETF, Jan. 1997.
[Allman98b] M. Allman, D. R. Glover, and L. A. Sanchez, Enhancing TCP over satellite channels using standard mechanisms, Tech. Rep. draft-ietf-tcpsat-stand-mech-06, IETF Internet Draft, Sept. 1998.
[Allman98a] M. Allman, V. Paxson, and W. Stevens, TCP congestion avoidance, Tech. Rep. draft-ietf-tcpimpl-cong-control-02, IETF, Dec. 1998.
[Morgan89] W. L. Morgan and G. D. Gordon, Communications satellite handbook, Wiley, New York, 1989.
[Postel81] J. Postel, Transmission control protocol, Tech. Rep. RFC 793, DARPA, Sept. 1981.
[Jacobson92] V. Jacobson, R. Braden, and D. Borman, TCP extensions for high performance, Tech. Rep. RFC 1323, IETF, May 1992.
[Arlitt95] M. Arlitt, Y. Chen, R. Gurski, and C. Williamson, Traffic modeling in the ATM-TN TeleSim project: Design, implementation, and performance evaluation, Proceedings of the 1995 Summer Computer Simulation Conference, Ottawa, Ontario, July 1995.
[McGregor98] A. J. McGregor, M. W. Pearson, and J. Cleary, The effect of multiplexing HTTP connections over asymmetric high bandwidth-delay product circuits, SPIE Conference on Routing in the Internet, Boston, Massachusetts, pp. 398-409, Nov. 1998.