
Weekly Report - 19 January 2018




This week I looked into why the timing results I collected vary by a large amount across sequential executions of the experiment. After analysing packet traces, I found that tcpdump wasn't showing all packets on the primary logger (the logger on the primary path, directly connected to the failed link). When a link is taken down, tcpdump crashes silently. I was collecting packet information by piping tcpdump's output to a file, and tcpdump buffers its output. That buffer doesn't get fully flushed when the app exits as the link goes down, so fewer packets appeared in the output, making the results vary by a large margin depending on the state of the buffer and how many packets had been captured. We can fix this problem by running tcpdump with the --immediate-mode flag, which disables the output buffering.
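The failure mode above can be reproduced without tcpdump at all. This is a toy sketch, not the actual logger setup: a child process writes "packets" down a pipe and then dies abruptly with os._exit(), which skips flushing the stdio buffer, just as tcpdump's buffered output is dropped when the link failure kills it.

```python
import subprocess
import sys

# Child process source. FLUSH is substituted below; os._exit() skips the
# normal interpreter shutdown, so any output still sitting in the
# block-buffered stdout pipe is silently lost.
CHILD = '''
import os, sys
for i in range(100):
    sys.stdout.write("packet %d\\n" % i)
    if FLUSH:
        sys.stdout.flush()   # per-packet flush, analogous to disabling buffering
os._exit(1)                  # abrupt exit: pending buffers are dropped
'''

def lines_seen(flush):
    # Run the child with stdout as a pipe (block-buffered by default) and
    # count how many "packet" lines actually made it out.
    code = CHILD.replace("FLUSH", str(flush))
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True).stdout
    return len(out.splitlines())

print(lines_seen(False))  # buffered: far fewer than 100 lines survive
print(lines_seen(True))   # flushed per packet: all 100 lines survive
```

The buffered run loses whatever hadn't been flushed at the moment of the crash, which is exactly why the packet counts (and therefore the timing results) varied from run to run.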

Next, I looked at mitigating the effect of logger location in the network on the measured recovery time. Because I was using packet capture timestamps, the location of a logger could skew the recovery-time calculation depending on the length of the recovery (detour) path. I fixed this by re-implementing the loggers in libtrace and writing a libtrace recovery-time calculation app. The app takes the two packet traces and uses the pktgen timestamps to calculate the recovery time. I also compute packets lost by comparing the pktgen sequence numbers between the traces on the two loggers (the last packet on the primary logger and the first packet on the secondary logger). Using the pktgen fields also lets us place the two loggers on separate switches; previously they had to sit on the same switch to account for clock differences between the virtual switches.
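The actual calculation app is written with libtrace in C; as a rough Python sketch of the arithmetic only, with made-up trace records, each record below is a (pktgen sequence number, pktgen timestamp in seconds) pair already extracted from a capture:

```python
# Hypothetical extracted records; the real app pulls these fields out of
# the pcap traces with libtrace.
primary = [(0, 10.000), (1, 10.001), (2, 10.002), (3, 10.003)]   # primary-path logger
secondary = [(7, 10.250), (8, 10.251), (9, 10.252)]              # detour-path logger

def recovery_stats(primary, secondary):
    last_seq, last_ts = primary[-1]      # last packet seen before the failure
    first_seq, first_ts = secondary[0]   # first packet seen on the detour path
    # pktgen stamps the packet at the sender, so the timestamps are from a
    # single clock and the loggers can sit on different switches.
    recovery_time = first_ts - last_ts
    # Sequence numbers are also assigned at the sender, so the gap between
    # the two loggers counts the packets lost during recovery.
    packets_lost = first_seq - last_seq - 1
    return recovery_time, packets_lost

rt, lost = recovery_stats(primary, secondary)
print(round(rt, 3), lost)
```

Because both the timestamp and the sequence number come from the sender-side pktgen header rather than the loggers' own clocks, the result is independent of where each logger sits in the network.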

After these modifications, I have re-collected the recovery time stats on my lab machine. I am currently cleaning and re-graphing these stats.