Department of Computer Science
University of Waikato
Hamilton, New Zealand
Currently the most accurate bandwidth measurement techniques directly measure the fastest rate at which traffic can be sent through a network. Wide-scale deployment of these `heavy-weight' bandwidth tests can overwhelm the network with test traffic. Measuring bandwidth accurately without such large data volumes is difficult, but there is current research in this area. We describe existing techniques that attempt to estimate the bandwidth of both links and paths while using as small a quantity of data as possible. These techniques must operate from only the end points of a connection, and must not require specialist software to be deployed into the core of the network. We explain the theory behind two popular bandwidth estimation techniques, single packet and packet pair techniques. The discussion includes the goals of each technique and the problems each encounters. We also describe some of the improvements that have been suggested.
Finally, we explain the differences between current techniques and the goals we are trying to achieve and give an early proposal describing the areas we plan to develop further.
With the increase in use of the Internet, more people are finding themselves dependent on it. Just as happened with the telephone system early last century, businesses and individuals are finding that they cannot operate without the Internet for communication and the gathering of information. Increasingly, business models are based solely around the Internet.
However, with this growth in dependency and use of the Internet, more and more demands are being placed on the performance of the network. Users require consistent monitoring of performance, both to detect faults quickly and to predict and provision for the growth of the network.
Measuring the Internet is difficult, for several reasons. Not all ISPs are forthcoming about details of the loading and performance of their networks. Even with the support of an ISP, the complexity of the network means that multiple providers are normally involved in the end-to-end connection between hosts. This situation makes the monitoring of end-to-end performance by any one ISP nearly impossible.
Users' inability to be confident in the performance of the network is creating a demand for tools that enable them to assess performance without assistance. These tools need to measure the end-to-end performance of the network quickly and easily, while placing no more load on the network than is absolutely necessary. Extra load may restrict the times at which measurements can be made and, depending on the charging system, could create large extra traffic charges.
One possible way to meet this need would be the deployment of special software or hardware on each router in the network. Such a solution, however, is simply not practical: the cost, time and security problems outweigh the gains from this type of instrumentation. The cost lies in the hours spent upgrading software on all of the routers in the network, the charges for this software by the vendor, and the price of replacing older routers that are unable to run it. Nor would such an upgrade be instantaneous. The time required to upgrade the software on every router in the entire network would be huge, leaving a substantial period of inconsistency during which some paths could be measured and others could not.
An alternative approach is to use end-to-end software run on the end hosts. This allows the measurement to be run at the user's discretion and allows for simple deployment. However, this approach requires the software to infer the characteristics of the links involved without being able to measure each link directly.
This paper explains and discusses current research and techniques aimed at meeting this demand. Also discussed are our motivation to become involved in this area and an early proposal of the approach we plan to take.
The rest of this paper is organised as follows. Section II discusses the multiple definitions different groups apply to common terms in this area and, for clarity, gives the definitions we will use. Section III presents our interest in this area. Sections IV through VII describe the techniques already proposed and some example implementations. Section VIII explains the techniques we will investigate to meet our objectives. We conclude in section IX.
Like most specialist research areas, bandwidth measurement and estimation has many specific terms that need to be explained. Unfortunately many of these terms are used differently by different authors. This section clarifies the main terms, and defines how we will use them in this paper.
We begin with the names for the components of a network. Hosts are the end points at which a packet originates or to which it is destined. A router is a machine with two or more network connections that forwards packets from one connection to another, moving each packet closer to its destination. A link is a single connection between routers, or between a router and a host. A path is the collection of links, joined by routers, that carries packets from the source host to the destination host. Two paths are different if any intermediate router differs.
Figure 1 shows a link between two routers. Link latency is the time from when the first byte of a packet is placed on the medium until that byte is taken from the medium. This delay is caused by the rate at which signals propagate through the link (e.g. electrons in a cable) and the length of the medium.
The link bandwidth is the rate at which bits can be inserted into the medium. The higher the bandwidth, the more bits can be placed on the medium in a given time frame.
An important thing to note about computer network routers is that they are normally store-and-forward routers: every byte of a packet must be received from the link and placed into a buffer in the router before the router starts to send it out on the destination link. If packets arrive at a router faster than they can be sent out of the appropriate output port, a packet queue forms for that port. The queue discipline used is almost always FIFO.
In the next sections we describe in more detail the different variations of the terms latency and bandwidth.
There are many separate latencies in a network system. Each of these values is used throughout this paper, so we describe them all here.
The transmission delay is the time it takes for a packet to be placed on the medium. This time is proportional to the packet size and inversely proportional to the bandwidth of the link: it is the time from when the first byte is placed on the network until the last byte has been sent.
The transmission time is the time between the first byte being placed on the medium and the last byte being taken off; it is the sum of the link latency and the transmission delay.
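These timing terms can be summarised in a short Python sketch; the link parameters (a 10 Mbit/s link with 5 ms propagation latency) are hypothetical values chosen for illustration:

```python
def transmission_delay(packet_bytes, bandwidth_bps):
    """Time to place all bits of a packet onto the medium."""
    return packet_bytes * 8 / bandwidth_bps

def transmission_time(packet_bytes, bandwidth_bps, link_latency_s):
    """First byte on the medium until last byte taken off:
    link latency plus transmission delay."""
    return link_latency_s + transmission_delay(packet_bytes, bandwidth_bps)

# A 1500-byte packet on a 10 Mbit/s link with 5 ms link latency:
d = transmission_delay(1500, 10e6)        # ~0.0012 s
t = transmission_time(1500, 10e6, 0.005)  # ~0.0062 s
```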
The path latency is the sum of all of the individual transmission times as well as the queueing time inside the routers. This is the time from the sender issuing the packet until the destination receiving it. Path latency is often referred to as the one-way delay.
Path latency is hard to measure, as it requires synchronised clocks at both ends to timestamp the time the packet was sent and the time it was received: the sending time is stamped from the sender's clock and the reception time from the destination machine's clock.
A more common measurement is the round trip time (RTT). This is the sum of the path latencies in the forward and reverse directions, and can be measured easily by timestamping the sending of a packet, having the destination machine respond to it immediately, and timestamping the return of the response at the original sender.
Unfortunately, as the forward and reverse paths and their latencies may differ, the RTT cannot differentiate between the forward and reverse delay.
As with latency, there are many variants of the term bandwidth. This section will discuss these.
Unlike latencies, link bandwidths do not sum to give the path bandwidth. The path bandwidth is defined by the minimum of the link bandwidths, as this is the fastest rate at which any traffic can make it through the path. The path bandwidth is also known as the path capacity.
The bandwidth of a path is shared by the traffic under consideration and other traffic. This reduces the amount of bandwidth available to the hosts. This other traffic is referred to as cross traffic.
Available bandwidth is the amount of bandwidth ``left over'' after the cross traffic. The link with the lowest available bandwidth will not necessarily be the link with the lowest capacity.
Dovrolis et al.  refer to the link that restricts the path's capacity as the narrow link, and the link that restricts the available bandwidth as the tight link.
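The distinction between the narrow and tight links can be made concrete with a small sketch; the capacities and cross traffic loads below are made-up numbers chosen so that the two links differ:

```python
# Three hypothetical links of a path, with capacity and cross traffic in bit/s.
links = [
    {"name": "L1", "capacity": 100e6, "cross_traffic": 95e6},
    {"name": "L2", "capacity": 10e6,  "cross_traffic": 1e6},
    {"name": "L3", "capacity": 100e6, "cross_traffic": 10e6},
]

# Narrow link: lowest capacity. Tight link: lowest available bandwidth.
narrow = min(links, key=lambda l: l["capacity"])
tight = min(links, key=lambda l: l["capacity"] - l["cross_traffic"])

print(narrow["name"], tight["name"])  # L2 L1
```

Here L2 limits the path capacity to 10 Mbit/s, but heavily loaded L1 leaves only 5 Mbit/s available, so the tight link is not the narrow link.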
A large amount of money and time is currently being spent implementing high speed, next generation networks. These networks are being constructed in order to support the large growth in the Internet, as well as enabling more high bandwidth services to run over the network to more people. There is an increasing demand to know whether the performance obtained from these networks is what is expected of them. The performance of a network is a complicated issue, with many variables affecting different traffic in different ways. While poor values for some performance metrics may only have a strong effect on a small number of applications, a few metrics are almost universally significant for all types of traffic. Bandwidth and latency are the two most commonly quoted of these performance metrics.
RTT measurements are the simplest of the metrics to measure. They are easy to make suitably accurate without any specialist hardware, and place a light load on the network being measured. Because of this, there are several research projects that take regular, frequent RTT measurements between a large and diverse range of hosts on the Internet.
One such project is NLANR's  Active Measurement Project (AMP). As of February 2001 AMP consisted of approximately 120 machines, connected to high speed research networks (such as the vBNS and Abilene) placed throughout the USA. Each machine takes regular RTT measurements to every other machine, resulting in measurements of approximately 14,000 paths.
These results are collected and made available on the Web in real-time, enabling network operators and users of the network to view, not only the current state of RTT performance for any path, but also the history of the RTT measurements back as far as six months.
The AMP group wish to extend the information they provide to their users by including regular available bandwidth measurements for all paths.
The concern is that if the available bandwidth measurements themselves create a large quantity of traffic, this will affect not only the users of the network, but also the results of the bandwidth and RTT measurements from the other machines.
Calculations must also take into account the latency, which varies for each link. As discussed in §II and figure 1, latency is not dependent on the packet size or the bandwidth of the link, but on the time the signal takes to travel down the path. The transmission time (t) of a packet is determined by the packet size (s), the bandwidth of the link (b), plus a fixed latency (l) value:

t = s/b + l     (1)
If the time and packet size are known, equation 1 can be rearranged to give the bandwidth. As latency is fixed for a particular link, it can be treated as a constant offset. When the transmission times of multiple probe packets of varied sizes are taken, a graph such as figure 2 can be produced, plotting packet size against transmission time for each probe. The bandwidth can then be calculated by performing a linear regression to find the slope of the points; the inverse of the slope is the estimated bandwidth.
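As an illustration of this regression step, the following sketch generates synthetic, noise-free timings for a hypothetical 10 Mbit/s link and recovers the bandwidth as the inverse of the fitted slope:

```python
TRUE_BW = 10e6 / 8      # 10 Mbit/s expressed in bytes per second (assumed)
LATENCY = 0.005         # fixed 5 ms latency offset (assumed)

# Transmission times for varied probe sizes, per equation 1: t = s/b + l.
sizes = [64, 256, 512, 1024, 1500]
times = [s / TRUE_BW + LATENCY for s in sizes]

# Least-squares slope of the size-vs-time points: cov(s, t) / var(s).
n = len(sizes)
mean_s = sum(sizes) / n
mean_t = sum(times) / n
slope = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times)) \
        / sum((s - mean_s) ** 2 for s in sizes)

est_bw = 1 / slope               # bytes per second
print(round(est_bw * 8 / 1e6, 3))  # recovered bandwidth in Mbit/s
```

With real measurements the points are noisy, which is what motivates the filtering techniques discussed below.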
There are additional problems when conducting these measurements in the real world. The first hurdle is measuring the time it takes a packet to cross each link in the network path separately.
To avoid the need for special router instrumentation, single packet implementations take advantage of the IP time to live (TTL) field. This field is decremented by one by each router the packet passes through. Once the TTL reaches zero, the packet is discarded and the router sends an ICMP TTL expired error message to the original packet source. By setting the TTL to expire at the router at the end of the link to be measured, the sender can record the RTT to the end of the link and back, by recording the time the packet was sent and the time the ICMP error message returned.
An example of a measurement can be seen in figure 3. In this example, host A is measuring the path to host B; more specifically, the figure depicts the measurement of the second link of the path to B. For the sake of clarity we have omitted the measurement of the first link and of the successive links after this one.
The TTL for the measurement packet is set to 2 as it leaves the machine. The TTL is decremented to 1 in the first router, and 0 in the second router. This router then generates an ICMP error message to be returned to host A. The error message may or may not follow the same path to return to host A and for this reason is displayed as a dashed line. The effect of the return path differing from the forward path will be discussed later in this section.
Although the use of the TTL field allows measurements to be made from the end points without special software deployed in each router in the path, it also causes a number of problems. First, there is no way of measuring any link (with the exception of the first link in the path) independently: the measurements reported for all but the first link also include the effects of all the previous links.
This problem is solved by representing the measurement to a given hop as the sum of the results for the previous links plus the component of the link being measured. The latency for a single link was given in equation 1.
Summing the links to hop n we derive:

t_n = Σ_{i=1..n} (s/b_i + l_i)     (2)
This means that the latency of the link being measured can be calculated by subtracting the latency measured over the previous links. The bandwidth of the link is calculated by subtracting the slope for the previous links from the slope for this link, and taking the inverse of the difference.
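This subtraction step can be sketched as follows, using invented cumulative slopes for a three hop path; slope_to[i] stands for the fitted slope (seconds per byte) of the size-vs-RTT regression for probes expiring at hop i+1:

```python
# Hypothetical cumulative slopes for probes expiring at hops 1, 2 and 3.
slope_to = [8e-7, 1.2e-6, 1.3e-6]   # seconds per byte

link_bw = []
prev = 0.0
for s in slope_to:
    # Each link's own slope is the cumulative slope minus the slope of
    # everything before it; its bandwidth is the inverse of that difference.
    link_bw.append(1 / (s - prev))  # bytes per second
    prev = s

print([round(b * 8 / 1e6, 1) for b in link_bw])  # per-link Mbit/s
```

With these numbers the three links come out at 10, 20 and 80 Mbit/s; note how a small error in an early slope would propagate into every later link's estimate.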
This process has the disadvantage that errors accumulate over each link's measurements, limiting the accuracy that can be expected for measurements of distant links in long hop count paths.
However, this is only one of many problems that single packet techniques encounter. Possibly the most significant is the effect of other traffic on the link (cross traffic): if a packet is delayed by cross traffic, the time estimate is affected in proportion to the volume of that traffic.
Jacobson  addresses this issue by assuming that cross traffic can only ever increase the delay seen by a packet. If enough packets are sent, eventually at least one should get through in the minimum time. Such packets are said to have the shortest observed round trip time (SORTT), discussed by Downey in . The graph shown earlier in figure 2 can now be considered to plot just the SORTT values, ignoring the delayed packets that would appear above each measurement. A graph showing all measurement packets is discussed later in the paper, and is shown in figure 6.
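The SORTT idea can be illustrated with a toy simulation. The link parameters and the exponential model of cross-traffic queueing delay below are our own assumptions for illustration, not taken from any cited tool:

```python
import random

random.seed(1)
BW = 10e6 / 8           # hypothetical capacity, bytes per second
LATENCY = 0.005         # hypothetical fixed RTT offset, seconds

def probe(size):
    """One probe RTT: uncongested time plus a random queueing delay."""
    queueing = random.expovariate(1 / 0.002)   # mean 2 ms of cross-traffic delay
    return size / BW + LATENCY + queueing

# Of many probes per packet size, keep only the fastest: cross traffic is
# assumed only ever to add delay, so the minimum approaches the true time.
sortt = {}
for size in (64, 512, 1500):
    sortt[size] = min(probe(size) for _ in range(200))

for size, t in sortt.items():
    print(size, round(t * 1000, 3), "ms")
```

With 200 probes per size, each minimum lands well within the queueing noise of the ideal uncongested RTT; fewer probes, or a busier path, would leave a visible residual error.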
More packets are required to discover the SORTT for more distant links. As the results for links further from the source combine the effects of all of the previous links, a packet must experience no queueing delay through every node, not just the node being measured.
While single packet techniques have many difficulties, there are a number of common problems in the area of network measurement that do not strongly affect the bandwidth estimates. Asymmetric routing is one such issue.
The term asymmetric routing refers to the situation where the path to and from a node differs and, because of this, the delay in each direction can differ. This can cause problems for many types of active and passive measurement analysis, but it causes few problems for single packet bandwidth estimation. The reason is that while the ICMP error packet needed for timing may come back via a different path, the error packet is of fixed size regardless of the size of the measurement packet. Assuming the route does not change, the time through the return path will therefore be fixed for all packet sizes. However, we cannot distinguish between added delay due to congestion on the forward path and on the reverse path, so the packet must experience zero delay through all routers on both paths, and more packets must be sent to discover the SORTT.
Asymmetric routing will, however, cause problems with the latency estimation described earlier. As the return path may differ from hop to hop, the subtraction of previous delay times no longer holds, resulting in an incorrect estimate of the latency added by the link. An example is shown in figure 4. In this measurement, packets returned from router 3 travel over a low latency link compared to the links the measurement of link two must travel over. Using the example latencies shown in the figure, the measurement for link two is the combined latency of its forward and reverse paths, while the measurement for link three is carried out over three forward links but returns over the different, lower latency path. Subtracting the link two measurement therefore yields an estimated latency for link three that differs from the correct value for the link.
Another issue affecting the latency estimation is the time taken by a router to generate the ICMP error message. ICMP can be used as a form of denial of service attack on a router, so to reduce this risk many routers place a very low priority on the creation of ICMP packets, avoiding overloading the router's CPU. The latency observed will therefore include extra time that would not be seen by a packet travelling through the router. This does not affect the bandwidth estimation, however, as long as the router is consistent in the time introduced for all packet sizes. If the router handles packets of different sizes differently, this introduces errors into the slope calculation used to calculate bandwidth.
The combination of these problems means that this technique does not scale to higher hop counts. The amount of traffic required to find the SORTT accurately also reduces the usefulness of this method on busy paths.
The noise introduced into the measurement by cross traffic, and the other problems discussed here, is often larger than the difference between the transmission time of the smallest packet (40 bytes) and the largest packet (normally 1500 bytes). Downey  discusses measurements where the largest outlier from the fitted slope is far greater than the difference between the largest and smallest packet size measurements. The fixed latency offset is small for a good link, but climbs considerably for long haul links; intercontinental links, such as New Zealand to the USA, can have even greater RTTs.
There have been a number of implementations of single packet techniques. The best known is Jacobson's pathchar . Two independent clones of Jacobson's pathchar tool are clink  and pchar . Each uses a slightly different algorithm for deciding when the SORTT has been discovered, but all use the same basic algorithm described above.
Packet pair techniques are often referred to as packet dispersion techniques, a name that is perhaps more descriptive. A packet experiences a serialisation delay across each link due to the bandwidth of the link. Packet pair techniques send two identically sized packets back-to-back, and measure the difference in time between the packets when they arrive at the destination.
For simplicity, we will initially ignore cross traffic, and discuss its effects later in this section. Figure 5, modified from , shows packet pair measurements diagrammatically. Shown are three links of a path from source S to destination D, and two packets, shown once in each link. Links L1 and L3 have twice the capacity of L2; L2 is the capacity limiting link in the path. The first packet arrives at the router between L1 and L2 and is forwarded out on L2 without queueing delay, as there is no other traffic present. Because L2 is slower than L1, the second packet arrives while the first is still being sent, and is queued. As soon as the first packet has been transmitted down L2, the second packet begins to be sent. As soon as the first packet is received by the router between L2 and L3, that router can forward it out on L3. The first packet will have finished being sent before the second packet is fully received by the second router, as L3 is faster than L2. As soon as the router finishes receiving the second packet, it forwards it on. Since the spacing can only be changed by a slower link, and we have defined L2 to be the capacity limiting (slowest) link, the spacing remains the same through to the destination machine.
The spacing is equal to the time the router at the end of the limiting link spent receiving the second packet after the first one was received: the first packet is sent on as soon as it is fully received, and the router must then wait until the second packet has fully arrived before it can be sent on. This value is equal to the transmission delay. The transmission delay (d) is proportional to the packet size (s) and inversely proportional to the capacity of the link (b):

d = s/b     (3)
Note the difference between equations 1 and 3. Single packet techniques have to take the latency of the link into account in their calculations. Packet pair techniques do not need to estimate the latency of the link, as it is the same for both packets (in the absence of cross traffic) and cancels out.
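A minimal sketch of the packet pair calculation, assuming a measured dispersion and no cross traffic (the 1.2 ms spacing below is an invented example value):

```python
def capacity_from_dispersion(packet_bytes, dispersion_s):
    """Invert equation 3: the narrow link capacity is s / d."""
    return packet_bytes / dispersion_s   # bytes per second

# Two 1500-byte packets arriving 1.2 ms apart imply a 10 Mbit/s narrow link.
b = capacity_from_dispersion(1500, 0.0012)
print(round(b * 8 / 1e6, 3))  # 10.0
```

No latency term appears anywhere: only the spacing between the two packets matters, which is the practical advantage over the single packet approach.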
Cross traffic is the most obvious and serious problem affecting packet pair measurements. If cross traffic delays the first packet, the spacing between the packets is compressed and the bandwidth is overestimated. This is referred to as time or probe compression. If cross traffic arrives in the queue between the first and second packet, the spacing is expanded, resulting in underestimation of the bandwidth.
All recent research into packet pair techniques has focused on filtering out the compressed or expanded measurements to provide the closest estimate of the capacity of the path.
Lai et al.  and Carter et al.  both propose statistical methods for filtering the results to estimate the bandwidth. Both approaches assume that, as the compression and expansion values are random, the actual bandwidth should appear as the most common measurement.
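The most-common-measurement idea might be sketched as follows; the samples are synthetic, and the 1 Mbit/s binning is our own choice for illustration, not taken from the cited papers:

```python
from collections import Counter
import random

random.seed(2)
TRUE_MBPS = 10.0   # hypothetical narrow link capacity

# Synthetic per-pair estimates: many pairs pass undisturbed, the rest are
# compressed or expanded by cross traffic and scatter around the truth.
samples = []
for _ in range(500):
    noise = random.choice([0, random.uniform(-4, 4)])
    samples.append(TRUE_MBPS + noise)

# Bin the estimates to 1 Mbit/s and take the most common bin as the answer.
bins = Counter(round(s) for s in samples)
estimate, _ = bins.most_common(1)[0]
print(estimate)  # 10
```

The undisturbed pairs pile into one bin while the disturbed ones spread out, which is exactly why the filtering fails when, as noted below, cross traffic produces its own dominant modes.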
However, current research from  suggests that this may not be the case, and that there may be other values more common than the actual bandwidth. This result is the focus of current research into packet pair techniques.
Lai and Baker follow up  with a paper describing the design and implementation of a measurement tool . This technique, which they call packet tailgating, can be considered a combination of single packet and packet pair techniques.
This technique aims to discover both link and path capacities. The derivation of the formulas that form the basis of tailgating are discussed in detail in , and are outside of the scope of this paper.
The project aimed to obtain link capacity measurements without some of the limiting factors present in single packet techniques. Asymmetric routing, routers delaying the return of ICMP error messages, rate limiting or firewalling of ICMP packets, and effects added on the return path are some of the problems tailgating avoids.
The basic measurement technique is as follows. Two packets are sent back to back but, unlike in packet pair techniques, the first packet is much larger than the second, so the second packet keeps ``catching up'' to the first. This is because the transmission delay of the first packet is larger than that of the second, meaning the smaller packet is ready to be sent from each router the instant the first packet has been sent. If the larger first packet is set to expire (using the same TTL method discussed in §V) at the link being measured, the second, smaller packet is ``released'' from the delays caused by queueing behind the bigger packet.
The latency experienced by the smaller packet will change if the link that the larger packet is set to expire changes. The link bandwidth is inferred by inspecting this change in latency. More details on this process are contained in .
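Under a toy model with no cross traffic and invented link speeds, the tailgating inference can be sketched as follows. This illustrates the idea only, not Lai and Baker's actual formulas: while the small packet trails the large one it advances at the large packet's serialisation rate, and after the large packet expires it travels at its own rate, so the change in its arrival time between expiry hops reveals the bandwidth of the added link:

```python
LARGE, SMALL = 1500, 40     # probe sizes in bytes (assumed)
links = [10e6, 2e6, 5e6]    # hypothetical link capacities, bit/s

def arrival(expiry_hop):
    """Arrival time of the small packet when the large packet expires at
    expiry_hop. Latencies are omitted: they cancel in the differences."""
    t = 0.0
    for i, b in enumerate(links, start=1):
        # Trailing the large packet up to the expiry hop, free afterwards.
        size = LARGE if i <= expiry_hop else SMALL
        t += size * 8 / b
    return t

estimates = []
for k in range(1, len(links)):
    diff = arrival(k + 1) - arrival(k)            # extra delay added by hop k+1
    estimates.append((LARGE - SMALL) * 8 / diff)  # bandwidth of link k+1, bit/s

print([round(e / 1e6, 3) for e in estimates])  # [2.0, 5.0] Mbit/s
```

In this idealised model the inferred values match the configured 2 and 5 Mbit/s links exactly; real measurements must of course contend with cross traffic and timing noise.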
The two major techniques described in §V and §VI both focus on finding the actual bandwidths of paths and links. These values are likely to interest administrators who wish to discover the topology of the network, or to verify that the bandwidth allocated to them is what they are paying for. They are probably of less interest to the end user, who simply wants to know ``How fast will my connection go?''
As stated in §III-A, AMP is a project that focuses on giving users of the network information about its current performance. As such, the AMP group is interested in, and the focus of our proposed research is on, estimating available bandwidth rather than link or path capacities.
This is the direction this project aims to investigate. Another major difference between our goals and the objectives of current research is that currently proposed techniques aim to give results on any network without any prior knowledge, which allows the tools to be general purpose. AMP wishes to deploy the tool on its existing research infrastructure, so both a recent and a distant history of the performance of the network, as measured by the current tools, will be available.
A primary goal is to investigate whether these measurements can aid and support the bandwidth estimation. For example, the AMP measurements give information about the long term stability of the RTT measurements. This stability of the paths may be useful in the bandwidth estimation process.
It is also likely that the bandwidth estimation will run continuously, allowing it to adapt to the network as it changes, instead of the small snapshot that current methods take: they use only the data collected while the program is running to calculate their results.
Both single packet and packet pair techniques send many probes in order to find the small number of probes they are interested in, discarding the rest. For example, figure 6 is a graph of RTT versus packet size for a single link measurement taken by Downey in . All of the packets above the obvious SORTT baseline are discarded in the bandwidth estimation. We propose to investigate the relationship between the distribution of this extra data and the available bandwidth of the path. Similar methods are proposed for packet pair techniques.
As the demand for performance on the Internet grows, so does the requirement for tools to measure that performance accurately. This growing demand also means that solutions that place a large load on the network will not scale, creating a need for tools that can accurately estimate various types of bandwidth without generating large volumes of traffic.
In this paper we have presented a summary of current bandwidth estimation techniques, comparing single packet techniques with packet pair models. We have explained the areas in which these methods do not provide suitable accuracy, and presented existing suggestions for improving them.
Finally we have outlined the areas of research that this project will concentrate on and the possible differences our techniques will have to existing methods.
This work is funded in part by NSF Cooperative Agreement No. ANI-9807479.