Skip to content

Latest commit

 

History

History
254 lines (159 loc) · 23.6 KB

README.md

File metadata and controls

254 lines (159 loc) · 23.6 KB

[This report is submitted as the README of our public repository on Github.]

Measuring the Initial Congestion Window of Popular Webservers

*Serhat Arslan (sarslan@stanford.edu), Catalin Voss (catalin@cs.stanford.edu)*

In this repository, we attempt to reproduce measurements of the initial congestion window (ICW) size used by popular web servers. We aim to replicate the size measurements presented in the 2001 paper "On Inferring TCP Behavior" by Jitendra Padhye and Sally Floyd on the modern Internet.

Installation and Reproduction Steps

The results are designed to be reproduced on a machine running Ubuntu 14.04. Other platforms (i.e. Windows, macOS) may run the tester, but this requires ensuring that the operating system or the hypervisor (if running on a VM) does not change or interrupt the packet communication generated by our tester. (This may require configuration change on local firewall rules or hypervisor settings. The default configuration uses iptables to reproduce the results on a non-VM Ubuntu machine.

  1. Get a copy of the code

    git clone https://github.com/serhatarslan-hub/cs244-ICW_testing.git
    
  2. Install the python dependencies and make sure they're accessible to the root user:

    cd cs244-ICW_testing
    sudo pip install -r requirements.txt
    
  3. Reproduce the results with a modern list of URLS (this will take some time):

    sudo python run_icw_test.py --mss 64 --url_list urls/QuantcastPopularURLs.txt
    

    Please note that we are using the list of most popular 20 thousand URLS provided by Quantcast.

To estimate the initial congestion window of an IP or URL of your choice YOUR_IP and a specific page of your choice PATH_TO_PAGE, simply run:

sudo python run_icw_test.py --host YOUR_IP --rqst_page PATH_TO_PAGE

To perform the tests in a more realistic environment, set the --mss value to 1460 and specify a large object on the webserver in --rqst_page to fill multiple packets of this size.

Our tests request for a non-existing very long page from every URL. This is implemented to make sure response content is long enough to fill the ICW as suggested by [Rüth et al. '17]. To request for the main page of the URL, simply pass --main_page argument while running the tester, e.g.

sudo python run_icw_test.py --mss 64 --url_list urls/QuantasPopularURLs.txt --main_page

For more options, see:

python run_icw_test.py --help

Brief Description

Padhye & Floyd published their original ICW test in 2001 in an attempt to survey the general behavior of TCP implementations of the Internet at the time. Historically, TCP implementations differed widely, so it remains as important today as it was in 2001 to get a picture on the diversity of parameters used on the Internet to ensure fairness and stability of the network. Understanding the big picture of the Internet can then help in determining standards, simulations, and new applications.

Padhye & Floyd's initial ICW test was presented with the release of the TCP Behavior Inference Tool (TBIT), which performs six different tests on publicly available web servers to understand their TCP behavior. The tests covered the ICW value, the choice of congestion control algorithm (CCA), conformant congestion control (CCC), implementation of selective acknowledgements (SACK), TCP timeout duration, and responses to ECN. In this report, we only attempt to reproduce the ICW results for popular web servers.

The congestion window size is one of two metrics that determine how much data can be sent before any acknowledgement of arrival is received. Popular congestion control algorithms start from a relatively small value of congestion window size and slowly increase until congestion is perceived. The ICW determines how much data to be sent without any feedback on current congestion on the network. As a consequence, the choice of ICW is a trade-off between under-utilizing the available capacity and putting too much stress on the network.

When the original paper was published, IETF suggested RFC 2414 to determine the ICW size according to the following calculation:

If (MSS <= 1095 bytes)  
    then win <= 4 * MSS;  
If (1095 bytes < MSS < 2190 bytes)  
    then win <= 4380;  
If (2190 bytes <= MSS)  
    then win <= 2 * MSS;

In 2002, RFC 3390 "Increasing TCP's Initial Window" set a standard for determining ICW for implementors of TCP congestion control. The ICW was to be set as

min (4*MSS, max (2*MSS, 4380 bytes))

In 2013, RFC 6928 proposed to increase ICW size even more and presented the following formula:

min (10*MSS, max (2*MSS, 14600)) 

where MSS is the maximum segment size (in bytes) which set by the negotiation of the two end-hosts during the TCP handshake. (Both endpoints would send their choice of the MSS and the smaller one was to be chosen.) We discuss our evaluation of this function and its effect on the results of our ICW tests below.

The ICW Test

The ICW test proposed by Padhye & Floyd proceeds as follows:

  • The tester performs a simple TCP handshake with the end host and specifies a small MSS value.
  • The tester sends a packet with an HTTP GET request.
  • Following the request, the tester sends no additional acknowledgements for arriving packets. This results in a timeout on the web server after ICW * MSS amount of data has been sent.
  • Once a timeout is detected by the server but after filling the initial congestion window, it will begin retransmitting the previously sent packets.
  • The tester then interprets this retransmission as the conclusion of the ICW and infers the ICW size as the total number of bytes received up to this point divided by the MSS.

In order to infer the ICW size, we need to ensure that the web server will send at least ICW * MSS amount of data. Finding a large file on every web server is difficult so Padhye & Floyd attempt to solve this problem by keeping the MSS small (the paper uses a value of 100 bytes). This is problematic because the most common MSS size on today's Internet is 1460 bytes. We discuss this consideration below.

Reproduction Philosophy

Our code aims to reproduce only the initial congestion window size measurements of Padhye & Floyd by reproducing Tables 2 and 3 in the original paper. To make our results comparable, we aimed to base our reproduction as closely as possible on the written description in Padhye & Floyd. However, some small modifications were necessary to make this reproduction possible. We then address issues with this reproduction on the modern Internet, specifically the issue of an artificially small MSS and describe experiments for reproducing the test with a larger MSS on the modern Internet.

We did not use TBIT (the tool that authors used for their tests) during our reproduction due to compatibility issues. TBIT was implemented for the BSD operating system 19 years ago (source code available at www.aciri.org/tbit/). Although a patch for Linux compatibility was published in 2004, there remain several compatibility issues with modern Linux distributions. We implemented our own initial congestion window size tester in Python 2.7 using the Scapy packet manipulation module. We chose Scapy for its simplicity and flexibility for packet by packet content manipulation. We discuss complications with this choice below.

During our preliminary tests, we realized that relatively large group of the web servers have adopted an ICW size of >> 5 MSS sized packets. In 2001, this was a rare occurrence, so Padhye & Floyd only list sizes up to 5. We extended the table 3 presented on the paper and with commonly encountered ICW configurations.

Although we did not directly use TBIT, the testing methodology of our implementation follows the descriptions given in the Padhye & Floyd paper step by step. The categorization of web servers follows that of the paper exactly:

  1. "If at least three tests return results, and all the results are the same, the server is added to category 1. We have the highest confidence in these results, as they have been shown to be repeatable. We report summary results only for servers belonging to this category."

  2. "If at least three tests return results, but not all the results are the same, the server is added to category 2. The differing results could be due to several factors, such as confusing packet drop patterns (as discussed in Section 2), which are further discussed in Section 5. We would like to minimize the number of servers that fall in this category."

  3. "If one or two tests return results, and all the results are the same, the server is added to category 3. Further tests are needed to categorize the TCP behavior of this server."

  4. "If one or two tests return results, and not all the results are the same, the server is added to category 4. We would like to minimize the number of servers that fall in this category as well."

  5. "If none of the five tests returned a result, this server was added to category 5. These servers need to be investigated further."

Error cases are implemented as described in the paper with small exceptions described the following sub-sections.

A practically large MSS?

While Padhye & Floyd's TBIT tool error'ed out whenever the server responded with a MSS bigger than advertised, we found this impractical for an MSS of 100 bytes as layed out in the paper. Many modern web servers will ignore requests for such a small MSS and when our tool is run in a commercial cloud environment such as Google Cloud, we found that networking infrastructure as well as virtualization platforms along the way often enforce a larger MSS. We thus don't penalize a server for returning data packets greater than our requested size and simply compute ICW as total_payload_size / MSS.

We measure ICW consistently with the unit definition of cwnd, in bytes. As long as the ICW is filled up, the total amount of payload sent will be equal to cwnd. If packet loss occurs along the way, the received payload size may be smaller than what is sent, but our implementation follows the sequence numbers of the received packets and terminates the test when a loss is detected. We also detect whether the response has finished before filling up the cwnd by catching any FIN, RST packets and/or retransmissions. In order to make sure FIN packets are sent at the end of responses, we use the Connection: Close attribute in our GET requests.

However, as we alluded to, it is reasonable to ask whether running such a test with an MSS of 64 or 100 can reasonably return meaningful results on the modern internet, when common MSS values are far from this. To begin to address this question, we perform a test with 5 different MSS values ranging from 64 to 1460 on a known large file. We discuss more practical MSS further below. We acknowledge that our tests with a limited number of MSS trials will not reveal the complete function that is used by the server to calculate its ICW.

Finding large objects

Padhye & Floyd request the main pages of the URLs during their ICW tests. Then they state the risk of not maxing out the ICW with the content of the main page. As a solution to this risk, we follow a method similar to that of [Rüth et al. '17] for ensuring that the response to our GET request maxes out MSS*ICW bytes. We simply make up a fairly large request URL, i.e.

www.stanford.edu/AAAAAaaaaaBBBBBbbbbb...

This almost always ensures either a 301 Moved Permanently or a 404 Not Found error. In both cases, the response typically contains the initial URL (and more), pushing us past the required window.

Although the large URL trick doesn't guarantee a large response, during our preliminary tests we realized that most of the websites had relatively small main page content. Our tool can be re-run to request the content of specific large objects.

Results

The categorization results of the tests are shown below as the reproduction of table 2 in the original paper:

Table 2: ICW: Server categories
Category Servers
1 3513
2 2721
3 91
4 10
5 12515
Total 18850

We have obtained about 3,500 successful results which help us to see the general distribution of ICW size throughout the Internet. The number of category 2 URLs is about 3,000, comparable with the number of category 1 URLs. Category 2 URLs denote that although we could get successful results from the URL, not all of the tests returned the same result. Packet loss or HTTP timeouts may cause this behavior. More investigation into those URLs is required.

The distribution of ICW sizes as a multiple of MSS is presented in table 3 below. To see the ICW distribution more clearly, we have added rows for common ICW sizes of 8, 10, 16, and 32.

Table 3: ICW: Summary results
ICW size Servers
1 0
2 4
3 10
4 6
5 or more 3493
Total 3513
Special ICW Sizes Servers
8 17
10 1119
16 947
32 5

The minimum observed ICW value was 2, the max was 37. Both the median and average value was 16.

Experiments with a Larger MSS

Below we present small-scale experiments for a single Stanford webserver.

MSS size ICW Values
64 5
128 4, 5
512 4, 5
1000 4, 5
1460 3, 4

Discussion

About two thirds our URL tests failed due to HTTP timeouts, packet losses during transmission, and receiving FIN or RST packets before any retransmission. Especially receiving FIN or RST packets was the most common way of failing a test. The main reason for this issue is because we were sending a request for a non-existing page with a very long name, and some web-sites did not include the name of the page in their responses, caising the responses to be shorter then the ICW size. We did not find that requesting URLs for the main page was a good workaround, so solving this issue likely requires using a scraper to identify large object on the webpages.

Comparing to Results of [Padhye & Floyd '01]

Our reproduction reesults differs from the original paper significantly. Most web-servers used to choose an ICW size of 2 packets in 2001 experiments. However, as of 2019, common values for ICW sizes seem to be much larger. Namely ICW sizes of 10 and 16 appear most popular.

When the experiments of [Padhye & Floyd '01] were conducted, the current RFC would expect 4 MSS sized packets. However, the authors observed mostly ICWs of 2 MSS sized packets. This would be expected only when MSS is set to a large value greater than 2190 bytes. Interestingly, a realistic experiment with MSS of 1460 bytes would return an ICW of 3 packets according to the RFC 2414 (1992), but an ICW of 3 was the least common value for their experiments. Those result could be interpreted as operating systems at the time manually setting their ICW for their own benefit or because of misconfiguration.

By contrast, we observe many ICW values much larger than suggested in the most recent RFC.

Results of Contemporary ICW Tests

The most recent RFC 6928 (2013) proposes to use an ICW of 10*MSS bytes for MSS values that are less than or equal to 1460 bytes (see formula above). When we run our tests with an MSS of 64 or 100, thus, we should get an ICW of ≤ 10*MSS. In fact, both realistic experiments (MSS = 1460) and our small tests (MSS = 64) were expected return 10 for ICW in terms of packets. In our test, only 32% of tests adhered to that standard. The rest of the URLs used larger ICW values, likely motivated by a desire to shorten the time-of-completion for large content transmissions. Our result is consistent with what [Padhye & Floyd '01] have discovered in the sense that most of the web servers even in today's Internet tend to not obey what is standardized by IETF, but rather use their choice of ICW which would improve their performance they provide to their users. However, the pendulum has swung into the other direction this time around. The popularity of ICW sizes such as 16 and 32 shows that the liberal spirit of Internet motivates developers to deploy off-standard solutions to get tailored improvement for their applications. Although such freedom can be practiced throughout the Internet, one should consider the fact that larger ICW sizes can only be useful when the networks drop packets in a very rare fashion. This assumption may hold today empirically, which would motivate this approach.

With a single, somewhat out of date webserver, we did not see drastically different ICW behavior for different MSS values. It is possible that with a more practical MSS value, we would see even more egregious violations of the RFC-recommended ICW. However, we believe it is unlikely that webservers that are "poorly-behaved" on small a MSS would suddenly become "well-behaved" on a larger MSS.

We caution that our experiments do not fully reveal the formula used by the web servers we tested. The complete formula given by the RFC 6928 (2013) is in the form of min (X*MSS, max (Y*MSS, 14600)) where X=10 and Y = 2 < X. Since we are using MSS of 64 bytes, what we are measuring is the value of X assuming that the web server uses the same form of ICW formula. Even if our assumption is true, we still need to measure the value of Y to see the complete formula. Additionally, a value other than 14600 may be implemented in the formula.

To attempt to reveal all of the constants in the function, assuming the fixed form, we could test each URL with different MSS values ranging from 64 to at least 7300 bytes and calculate the transmitted size of data. However, this would require large data to be transmitted that fills the ICW independently from the MSS value. If we assume that most today's servers use an ICW formula of the form presented above with 14600 as the fixed constant, measuring X turns out to be enough for estimating the ICW size of realistic connections which would be bounded by X regardless due to the common practice of 1460 bytes of MSS.

Complications and Limitations

  • Choice of URLs. One of the main issues with reproducing the results of [Padhye & Floyd '01] was using the same URLs that were tested two decades ago. We first attempted to use the original URL list posted on the TBIT website. However, as one would expect, most the URLs on this list would return DNS errors, since they no longer exist. In order to prevent this, we decided to use the list of most popular websites in the United States provided by Quantcast. We capped the list to the 20,000 most popular websites to make sure our tests return in feasible amount of time. Every URL was tested 5 times, adding up to 100,000 thousand tests. One limitation with using modern URLs is that many 21st century webservers now enforce HTTPS and some will drop HTTP connections entirely.

  • Use of Scapy. In using Scapy for our replication, we found that the tool can be slow at setting up sniffers, especially when running in virtualized environments. The default packet-receiving function sniff() requires us to use the slower lfilter() method for filtering packets in virtualized envirionments. With that setting, we consistently observed Scapy missing the first few packets before its sniffer was provisioned right after sending our GET request. We especially observed this case when we worked on a VM provisioned in an extremely high performance data center and queried URLs like www.google.com, likely only a few fiber hops away in the same datacenter (we tested this on Google Cloud). To circumvent this issue, we had to set up the sniffer asynchronously and ensure it was set up by the time we sent our first GET packet out.

  • OS port blocking. When using a tool like Scapy to send out raw L3 packets without sockets, the OS kernel closes any incoming connection with a RST packet before we even have a chance to process it. To avoid this, we had to set up a firewall rule before we start sending out any packets. We went with a blanket rule to avoid sending any RSTs on the source port: iptables -D OUTPUT -p tcp --sport %d --tcp-flags RST RST -j DROP where %d is our current evaluation port (unique for each URL and trial). After each test, we revert the firewall rule and close the connection by manually sending RST packet with Scapy. Please note that the provided firewall rules are for Linux operating system. Using other operating systems for running our implementation may still encounter the given port blocking problem.

  • Default Hypervisor Configurations. During our initial experiments, we realized that our hypervisor (Oracle VM Virtualbox) changed the MSS option in the outgoing packets to 1460 bytes even when we manually set it to different values. This was mainly because of the default behavior of the hypervisor itself as reported in this bug report. Since this issue prevents obtaining the desired behavior from the web servers, the following steps may be helpful to overcome the problem:

    1. Use the bridged networking option for Virtualbox. (Go to Machine > Settings > Network > Adapter 1 and set it to "Bridged")

    2. Set DHCP for the connected interface statically and steal the IP information of the host interface to connect the VM interface to the Internet. Namely, edit /etc/network/interfaces on the VM.

      iface eth1 inet static  
      	address [host ip]  
              gateway [host gateway]  
              broadcast [host broadcast]  
              dns-nameservers 8.8.8.8
      

      You may need to run sudo ifdown eth1 and sudo ifup eth1 after this or reboot the VM.

  • Batched Execution. Scapy's way of handling connections involves opening up a new file for every connection to write information about the incoming packets. However, in some versions and on some OS's, Scapy fails to close those files before the python process finishes and leaks file pointers. Since our tester opens many connections in one run, we encountered [Errno 24] Too many open files errors after some number of connections. This error prevented us to collect information for new connections. As a workaround, we provide a way to run the tests in batches. We have run the tests in 15 batches each with 1257 URLs to test. The batched execution steps are aggregated detailed in the reproduce_results_batched.sh script.

  • Using a smaller MSS for higher fidelity. Although the original paper uses MSS of 100 bytes, we concluded that MSS of 64 bytes would be safer for ICW filling. Our decision was confirmed when we compared the two sizes in our preliminary experiments as more URLs were categorized as Category 1.