Blogs

15 Mar 2017

Spent some time investigating running the Chrome browser in headless mode, and writing code to control it so that it can be used to perform test measurements. The headless mode currently in development works well, and I can build a test program that will load and control the browser. Using the Chrome devtools API I can evaluate arbitrary JavaScript on the page, so I have access to the performance timings (but need to deal with the issue of the Timing-Allow-Origin header not being set by most sites). My old YouTube JavaScript timing code still works, but I need to detect the end of the video in order to collect the final stats (Chrome doesn't appear to want to let me hook into arbitrary events; only a disappointingly small subset is exposed).
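As a rough illustration of the devtools approach, the sketch below builds a `Runtime.evaluate` request that asks the page for its resource timing entries and unpacks the reply. It is only a sketch under assumptions: the actual test program may not be written in Python, the helper names are made up for this example, and the websocket transport that carries these messages to the browser is left out entirely.

```python
import json
from itertools import count

# Each request needs a unique id so that replies can be matched to requests.
_next_id = count(1)

# JavaScript to evaluate in the page. Cross-origin entries will have most
# fields zeroed unless the server sends a Timing-Allow-Origin header.
TIMING_EXPRESSION = "JSON.stringify(performance.getEntriesByType('resource'))"


def build_evaluate_request(expression):
    """Build a devtools protocol Runtime.evaluate request as a JSON string."""
    return json.dumps({
        "id": next(_next_id),
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    })


def parse_evaluate_reply(reply_text):
    """Return the evaluated value from a Runtime.evaluate reply."""
    reply = json.loads(reply_text)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "evaluate failed"))
    return reply["result"]["result"].get("value")
```

The request string would be sent over the browser's debugging websocket, and the matching reply (same `id`) passed to `parse_evaluate_reply()`; the value is itself a JSON string here, so it needs one more `json.loads()`.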

Worked on the BGP SDN proof of concept some more, making it event driven so that routing and topology updates can be read while it runs - it can now respond to changes in received routes or link status so that the routing tables are always up to date.

13 Mar 2017

Went back and finished making libflowmanager work with parallel libtrace. The remaining problem had been that the expiry modules were not thread-safe, so I've rewritten them to be classes so that the expiry lists are local to each module. Testing with lpi_protoident has proven these changes to work (at least when reading from a trace file), so I can continue updating the rest of the libprotoident tools to be parallel-libtrace compatible soon.

Spent the remainder of my week validating some of the FSMs produced by my model generation algorithm. Overall, the results are starting to look fairly good -- most of the machines being generated by my code are close matches to the ground truth machines, and there are very few duplicate or redundant machines. The most obvious outstanding problem is related to "tandem repeats", i.e. sequences of multiple system calls that can be repeated any number of times (such as "read,write,read,write,read,write", where "read,write" simply repeats until the action is over). Started looking into methods for detecting tandem repeats so that I can try to encode them as a single self-repeating state.
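One simple way to detect tandem repeats is sketched below. This is not the method used in the FSM code, just a naive left-to-right scan that prefers the shortest repeating unit at each position; the function names and the `("repeat", unit)` marker are invented for the example.

```python
def find_tandem_repeats(tokens, min_repeats=2):
    """Return (start, unit_length, repeat_count) for each tandem repeat."""
    repeats = []
    i = 0
    n = len(tokens)
    while i < n:
        found = None
        # Try the shortest candidate unit first.
        for unit_len in range(1, (n - i) // min_repeats + 1):
            unit = tokens[i:i + unit_len]
            reps = 1
            # Count how many times the unit repeats back-to-back.
            while tokens[i + reps * unit_len:i + (reps + 1) * unit_len] == unit:
                reps += 1
            if reps >= min_repeats:
                found = (i, unit_len, reps)
                break
        if found:
            repeats.append(found)
            i += found[1] * found[2]  # skip past the whole run
        else:
            i += 1
    return repeats


def collapse_tandem_repeats(tokens):
    """Replace each run with a single ("repeat", unit) marker."""
    out = []
    pos = 0
    for start, unit_len, reps in find_tandem_repeats(tokens):
        out.extend(tokens[pos:start])
        out.append(("repeat", tuple(tokens[start:start + unit_len])))
        pos = start + unit_len * reps
    out.extend(tokens[pos:])
    return out
```

For the sequence `open, read, write, read, write, read, write, close`, this collapses the middle section into one `("repeat", ("read", "write"))` marker, which could then be modelled as a single self-repeating state.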

06 Mar 2017

Finished testing my packet ordering fix for libtrace. Managed to come up with a more efficient method of determining an appropriate order value for int: and ring: so hopefully performance shouldn't be impacted too much by this change. Also fixed a couple of other libtrace bugs that I had noticed, particularly the horrid performance of tracertstats on some live formats. Released a new version of libtrace (4.0.1) that includes these fixes as well as a few others that have come in since the first parallel release.

More tweaks to the FSM generation code. I've found some errors in the way I was determining whether one machine was effectively superseded by another, which were causing me to produce extra redundant machines. I've also come up with a new method for creating the match maps when comparing two sequences -- the old method simply focused on picking the longest match and then finding any other matches that cover unmatched territory, which doesn't work so well for looping sequences that have repetitive sub-sequences within them. My new method tries to find an optimal set of matches that gets the best possible coverage while minimising the amount of overlap, so we avoid matches that only end up covering one token because the rest of the sequence has already been covered by a larger match.
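To give a feel for the "best coverage, least overlap" idea, here is a deliberately stricter simplification: it forbids overlap altogether and picks the set of candidate matches that covers the most tokens, using standard weighted interval scheduling. The real match map code allows some overlap and is not shown in this post, so treat the function name and the `(start, end)` half-open span representation as assumptions.

```python
from bisect import bisect_right


def select_matches(matches):
    """Pick non-overlapping (start, end) spans with maximum total coverage."""
    spans = sorted(matches, key=lambda m: m[1])
    ends = [end for _start, end in spans]
    # best[k] = (coverage, chosen spans) considering only the first k spans.
    best = [(0, [])]
    for k, (start, end) in enumerate(spans):
        # j = how many of the earlier spans finish at or before this start.
        j = bisect_right(ends, start, 0, k)
        take = best[j][0] + (end - start)
        if take > best[k][0]:
            best.append((take, best[j][1] + [(start, end)]))
        else:
            best.append(best[k])
    return best[-1][1]
```

Given candidates `(0, 5)`, `(4, 6)`, `(5, 9)` and `(8, 10)`, this keeps `(0, 5)` and `(5, 9)`: the two short matches would each add only a token or so beyond what the larger matches already cover.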

01 Mar 2017

Libtrace 4.0.1 has been released today.

This release addresses a number of bugs in the new parallel API and updates our DPDK support to be compatible with the latest stable DPDK release (16.07.2).

This release includes the following changes / fixes:
* Fixed bug where libtrace's built-in hasher would always send packets to the same thread.
* Fixed terrible performance for tracertstats when reading from live formats.
* Fixed bug where trace_pstop() would fail for ring: and int: on older kernels.
* Added support for IPv6 within PPP.
* Added support for PPTP when parsing GRE headers.
* Added API function trace_clear_statistics().
* Fixed race conditions when using parallel API to read from a file.
* Generally improved performance for live formats when using the parallel API by removing an unnecessary mutex.
* Fixed bug where the ordered combiner seemed to be returning packets out-of-order.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

28 Feb 2017

Fixed up some outstanding bugs that had been reported against ampweb, and finally got it up on github. Built new packages for deployment on skeptic and brought it all up to date with everything.

Spent some time investigating TCP throughput tests running for longer than they should. This appears to be caused by the receiver not stopping until the end-of-data marker is seen, which can sometimes take a lot longer than expected due to queues or delays. The sender stops at the appropriate time, but we take the timing and byte counts from the receiver, which runs over time. Will have to think about the best way to make sure this works as expected, without bluntly terminating the connection if possible.

Started implementing some proof of concept BGP/routing software. Got it loading a topology and advertised routes from static text files, building a network and generating internal routing based on some naive rules. Next step will be to make the rules more configurable/pluggable.
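The sketch below shows the general shape of this step: read a topology from text and derive next hops for internal routing. Everything specific here is an assumption made for illustration, since the post does not describe the file format or the rules. It assumes a hypothetical `nodeA nodeB cost` line format and uses lowest-cost paths as the "naive rule".

```python
import heapq


def parse_topology(text):
    """Parse 'nodeA nodeB cost' lines into an undirected adjacency map."""
    graph = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # allow comments and blank lines
        if not line:
            continue
        a, b, cost = line.split()
        graph.setdefault(a, {})[b] = int(cost)
        graph.setdefault(b, {})[a] = int(cost)
    return graph


def next_hops(graph, source):
    """Return {destination: first hop} along lowest-cost paths (Dijkstra)."""
    visited = set()
    hops = {}
    heap = [(0, source, source)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node != source:
            hops[node] = first
        for neigh, weight in graph.get(node, {}).items():
            if neigh not in visited:
                # The first hop is fixed once we leave the source.
                hop = neigh if node == source else first
                heapq.heappush(heap, (cost + weight, neigh, hop))
    return hops
```

Making the rules pluggable would roughly correspond to swapping out the cost comparison inside `next_hops()` for a configurable policy.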

27 Feb 2017

Still having some problems with my variant recognition code for the FSM construction. Decided to go back to square one in terms of the set of conditions for variant matching. I've started developing a small dataset of potential variants and tagging them with whether I actually want them to be recognised as variants or not. Using this, I can hopefully look at these as a complete set and try to develop a set of conditions that works well for all scenarios, rather than the previous "whack-a-mole" development strategy where I would focus on the case I'm currently getting wrong, come up with something that fixes that problem and then consequently break several other previously good matches.

Finally managed to track down and fix a nuisance segfault in anomaly_ts that would very occasionally crop up. The biggest challenge was getting the segfault to occur in an environment where I could get a core dump; the problem was obvious once I had a useful dump. Also fixed a handful of interface issues in amp-web and resolved an issue with the event dashboard being slow to load.

Found and resolved a libtrace bug where parallel ring: inputs would appear to produce out-of-order packets, even with the ordered combiner. This was because we were relying on the packet timestamp as an ordering mechanism, but the clock used to timestamp the packets is not strictly monotonic -- using a monotonic clock to determine packet order makes the combiner a lot happier. Packet ordering is now determined per-format as a result, so I'm still testing that ordering works correctly for the other formats.
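The effect is easy to reproduce in miniature. The sketch below is not libtrace code (libtrace is written in C and its combiner is more involved); it just merges two per-thread queues with `heapq.merge`, once keyed on a wall-clock timestamp that steps backwards and once keyed on a monotonic order value. The `Packet` tuple and its field names are made up for the example.

```python
import heapq
from collections import namedtuple

# 'order' stands in for a monotonic per-format order value;
# 'timestamp' is a wall-clock time that may step backwards.
Packet = namedtuple("Packet", ["order", "timestamp", "payload"])


def by_order(packet):
    return packet.order


def by_timestamp(packet):
    return packet.timestamp


def combine(thread_queues, key):
    """Merge per-thread queues into one stream, smallest key first.

    The result is only correctly ordered if every queue is already
    sorted by the chosen key.
    """
    return list(heapq.merge(*thread_queues, key=key))


# The clock is adjusted backwards between packets 2 and 3.
queue_a = [Packet(1, 10.0, "a1"), Packet(3, 9.9, "a3")]
queue_b = [Packet(2, 10.1, "b2"), Packet(4, 10.0, "b4")]
```

Merging `queue_a` and `queue_b` with `by_order` yields packets 1, 2, 3, 4, whereas `by_timestamp` yields 1, 3, 2, 4 because the per-thread queues are no longer sorted by the key the combiner relies on.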

21 Feb 2017

Spent some time investigating a segfault in anomaly_ts and tidied up some of the code around that, but that was mostly cosmetic. Made some very minor fixes that seem unlikely to be responsible for the crashing.

Did a last quick polish pass over ampy to remove old files, fix licensing, etc. Set up and ran my own web instance to make sure that I hadn't broken anything while tidying. Put ampy on github. Started to do the same for ampweb but ran out of time at the end of the week.

21 Feb 2017

Spent some more time working on bugs that had been reported in the amplet2 client. Standalone tests now report more useful messages when no valid targets are specified, as well as if any targets failed to resolve to a useful address. Particularly bad names now won't crash the process when trying to read the DNS response. Started looking at dealing with some of the other reports, but they turned into deeper problems that I need to think more about (I'm being inconsistent in the way I treat errors in different tests).

Did a last quick polish pass over nntsc to remove old files, fix licensing, etc. Split pywandevent from the nntsc repository and made it its own project (nntsc no longer uses it anyway). Put libwandevent, pywandevent and nntsc up on github.

20 Feb 2017

Another solid week of state machine improvements. I've been comparing the machines derived by my algorithm against the machines I can derive manually from the raw data. This has revealed quite a few failures on the part of my algorithm; a lot of the problems fell into one of two categories: 1) creating loops in situations when we probably shouldn't have or 2) a failure in the variant recognition code (both in terms of failing to recognise a variant and being too keen to decide two sequences are variants).

In the process of fixing these problems, I also discovered a bug in my original pattern extraction code that was causing it to halt too early, i.e. as soon as it had extracted a pattern of at least 4 tokens rather than the intended 20 tokens, which explains why many of the patterns I was working with were fragments of a whole sequence. Fixing that has greatly improved the quality of the machines I have been deriving, as well as revealing some patterns that I was previously always missing.

Also spent a day tidying up some of the ampy and amp-web code prior to Brendon releasing them on github. Made the old rrd-smokeping collection work again, as well as removed all of the old LPI and munin collections which we are not interested in maintaining right now.

13 Feb 2017

Another short week of refinement on the FSM generation code. Fixed a major bug in my pattern-mining code that was causing it to return substrings that overlapped as the most common repeated substring. Also spent a lot of time refining the code that determines whether a sequence is a variant of another; now, a short sequence that is entirely encompassed by another much longer sequence is considered a good match despite the number of tokens in the long sequence that are unmatched.
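The overlap problem can be shown with a small counting example. This is only an illustrative sketch, not the pattern-mining code itself: it contrasts a naive count of every occurrence of a pattern with a count that skips past each match so that occurrences cannot share tokens. Both function names are invented here.

```python
def count_overlapping(tokens, pattern):
    """Count every position where pattern occurs, overlaps included."""
    tokens, pattern = list(tokens), list(pattern)
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    return sum(1 for i in range(len(tokens) - m + 1)
               if tokens[i:i + m] == pattern)


def count_non_overlapping(tokens, pattern):
    """Count occurrences of pattern that do not share any tokens."""
    tokens, pattern = list(tokens), list(pattern)
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    total = 0
    i = 0
    while i + m <= len(tokens):
        if tokens[i:i + m] == pattern:
            total += 1
            i += m  # jump past the match so the next one cannot overlap
        else:
            i += 1
    return total
```

In the sequence `read, read, read, read`, the pattern `read, read` occurs three times if overlaps are allowed but only twice as a genuine repeat, so ranking candidates by the overlapping count would overstate how often a pattern really repeats.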

Put together a poster describing the FSM work, as CROW are interested in displaying it at the CultivateIT event next week. Even if they don't use it there, it'll probably be handy to have available at some point.

Helped Brendon test out some code polishing that he has done to NNTSC before putting it up on GitHub. Went through and removed some outdated code in the repo (specifically the LPI modules) and updated the docs to not refer to our non-working modules so hopefully nobody will try to use them.