
Shane Alcock's Blog




Continued researching and writing my application protocol paper. Added quite a lengthy background section, which summarises some of the key events and trends in the history of application protocols and should add a lot of context to the paper.

Also kept investigating the new application protocol patterns that continue to appear on the Waikato passive monitor. Added another 5 protocols this week, so progress is steady.




Libprotoident 2.0.11 has been released.

Firstly, this release updates the existing tools to be compatible with both libflowmanager 3 and parallel libtrace. This means that the tools can now take advantage of any parallelism in the traffic source, e.g. streams on a DAG card or a DPDK-capable NIC.

Secondly, we've added 61 new application protocols to our set of detectable protocols, bringing the total supported number of applications to 407. A further 25 existing protocols have been updated to better match new observed traffic patterns.

Finally, there have been a couple of minor bug fixes as well.
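When the tools take advantage of parallelism in the traffic source, both directions of a flow need to end up on the same thread. Otherwise each thread only sees half of each conversation, which matters for libprotoident because it matches payload in both directions. The sketch below shows one way to get that property: a symmetric (bidirectional) hash over the 5-tuple. It is a minimal Python illustration under stated assumptions. `FiveTuple`, `bidirectional_key` and `assign_thread` are hypothetical names, and CRC32 is only a stand-in for whatever hash a DAG card, DPDK NIC or libtrace hasher actually applies.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical flow identifier for a single packet."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def bidirectional_key(ft: FiveTuple) -> bytes:
    """Order the two endpoints so that A->B and B->A produce the same key."""
    a = (ft.src_ip, ft.src_port)
    b = (ft.dst_ip, ft.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return f"{lo[0]}:{lo[1]}-{hi[0]}:{hi[1]}/{ft.proto}".encode()


def assign_thread(ft: FiveTuple, nthreads: int) -> int:
    """Pick a worker thread; CRC32 is an assumed stand-in for the real hash."""
    return zlib.crc32(bidirectional_key(ft)) % nthreads
```

For example, a request `FiveTuple("10.0.0.1", "192.168.1.5", 51000, 80, 6)` and its reply `FiveTuple("192.168.1.5", "10.0.0.1", 80, 51000, 6)` map to the same thread, so a single per-thread flow table sees the whole flow.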

Note that this release requires both libflowmanager 3 and libtrace 4, so you will likely have to upgrade these libraries before installing libprotoident 2.0.11. The tools are the reason for the upgraded dependencies. If upgrading is problematic for you but you still want the new application protocol rules, you can pass the '--with-tools=no' option to ./configure to prevent the tools from being built.

The full list of updated protocols can be found in the libprotoident ChangeLog.

Download libprotoident 2.0.11 here!




Libflowmanager 3.0.0 has been released today.

The libflowmanager API has been re-written to be thread-safe (and therefore compatible with parallel libtrace), hence the major version number change.

The old libflowmanager API has been removed entirely; there is no backwards compatibility with previous versions of libflowmanager. If you choose to install libflowmanager 3 then you will need to update your existing code to use the new API. This should not be too onerous in most cases, as most of the old global API functions have simply been replaced with method calls to a FlowManager class instance. The README and example programs demonstrate and explain the new API in detail.

Note that much of our other software that relies on libflowmanager, such as the libprotoident tools and lpicollector, has NOT yet been officially released with libflowmanager 3 support. If you are currently using any of this software, you should continue to use libflowmanager 2.0.5 until we are able to test and release new libflowmanager 3 compatible versions.

You can download both libflowmanager 3 and libflowmanager 2.0.5 from our website.




Added another 5 protocols to libprotoident -- having a slightly more powerful PC for installing and running various candidate applications has helped quite a bit. Updated the rules for several more protocols as well.

Made some more progress on my protocol taxonomy -- I'm up to 'P' for the TCP protocols so I'm probably about 1/4 of the way through now.

Continued re-factoring the FSM generation code. It's getting close to done, although I suspect the number of changes and variable renamings will require a fair bit of testing to make sure I've transferred everything across correctly.

Added the ability to choose between TCP and HTTP throughput data on the AMP matrix. To do this, I had to bring the amp-web/nntsc install on prophet back up to date after a few months of being untouched. As always, there were a few issues with dependencies and versioning which slowed everything down, but eventually Brendon and I got it all working correctly.




Another disrupted week, this time due to being ill. Spent most of my available time looking over the output of my new multi-process state machine generation algorithm. The extra sequence fragments that become apparent when considering multiple processes managed to reveal a few new situations where my code wasn't quite doing the right thing. I've fixed those and am reasonably happy again with the machines produced for my test dataset.

Moved on to some code re-factoring, as the existing code-base had become a bit of a mess from hacking in fixes to all of the edge cases I had been dealing with. In particular, I'm aiming to separate code that deals with the machine itself, i.e. the states and their transitions, from the code that compares sequences and determines what needs to be added to (or removed from) the machine to accommodate the variation.
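The separation described above can be illustrated with a small Python sketch. One class owns only the machine (states and labelled transitions). A separate function compares a sequence against the machine and reports what is missing, and a thin glue function applies the difference. This is a hypothetical structure, not the actual FSM code. `Machine`, `missing_suffix` and `accommodate` are invented names, and the "machine" here is just a simple prefix-tree acceptor.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Holds only states and labelled transitions; knows nothing about sequences."""
    start: int = 0
    transitions: dict = field(default_factory=dict)  # (state, symbol) -> state
    next_state: int = 1

    def step(self, state, symbol):
        """Follow a transition, or return None if it does not exist."""
        return self.transitions.get((state, symbol))

    def add_transition(self, state, symbol):
        """Create a fresh state reachable from `state` on `symbol`."""
        new = self.next_state
        self.next_state += 1
        self.transitions[(state, symbol)] = new
        return new


def missing_suffix(machine, seq):
    """Comparison side: walk `seq` through the machine and report where it diverges."""
    state = machine.start
    for i, sym in enumerate(seq):
        nxt = machine.step(state, sym)
        if nxt is None:
            return state, list(seq[i:])
        state = nxt
    return state, []


def accommodate(machine, seq):
    """Glue: extend the machine just enough to accept `seq`."""
    state, rest = missing_suffix(machine, seq)
    for sym in rest:
        state = machine.add_transition(state, sym)
    return state
```

Keeping `missing_suffix` free of any mutation means the comparison logic can be tested on its own, which is the main payoff of the split.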




Slightly disrupted week, with Easter and cyclones having an impact on productivity. Most of my time ended up being spent hunting down more previously unknown protocols: just three new protocols this week, along with fixes for three more.

On the STRATUS side, I worked on creating a way to "combine" the suffix trees for each individual process so that we can account for sequences that appear frequently in the whole dataset but never more than once or twice within a given process. The original implementation would not recognise those sequences as frequent, because it considered each process individually. I think I've got this working now -- but I'm yet to look at the results too closely.
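The problem that combining the trees solves can be shown with plain n-gram counting, used here as a deliberate simplification in place of suffix trees. A sequence that appears once in each of several processes never reaches a per-process frequency threshold, but it does once the counts are pooled across the dataset. The Python sketch below assumes fixed-length sequences and a simple count threshold. `ngram_counts`, `frequent_ngrams` and the threshold value are all hypothetical.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count every length-n window (a cheap stand-in for walking a suffix tree)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def frequent_ngrams(processes, n, threshold):
    """Return (per-process frequent, combined frequent) sets of length-n sequences."""
    per_process = [ngram_counts(p, n) for p in processes]

    # Original behaviour: a sequence must be frequent within some single process.
    locally_frequent = set()
    for counts in per_process:
        locally_frequent |= {g for g, c in counts.items() if c >= threshold}

    # Combined behaviour: pool the counts from every process first.
    combined = sum(per_process, Counter())
    globally_frequent = {g for g, c in combined.items() if c >= threshold}
    return locally_frequent, globally_frequent
```

With three processes that each contain `open, read` exactly once and a threshold of 3, the pair is missing from the per-process set but present in the combined one.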




Continued delving into the unknown traffic on the campus network. Had a mix of frustrating days and successful days -- one protocol (N2Ping) took nearly two days to track down but I got there in the end. Another 8 protocols were added to libprotoident this week, so we're starting to get close to 400 supported protocols.

Another week of refinement on the FSM code. Most of the effort has been focused on loop recognition, particularly in terms of making sure we don't ignore candidates that can be used to identify loops.




Have been using my new daily libprotoident email to make some good progress in terms of adding new protocols to libprotoident. Another 8 protocols added this week, with 5 existing protocols improved as well.

Found a few new bugs in my FSM tandem-repeat code after running it against my full test dataset and doing an initial validation of the resulting machines. Finished up a set of slides describing (broadly) what I'm doing overall with the FSM project and how I'm going about it, i.e. suffix trees, pattern extraction, variant detection and machine building.

Started looking into a parallel RT implementation for libtrace / wdcap, with an eye towards removing the combiner bottleneck from wdcap.




Finished implementing tandem repeat detection within my existing pattern extraction code. The initial results look promising, i.e. the code has been able to identify "write,read" as a repeat in the FTP system call log with no obvious false positives. Next job will be to repeat the machine validation and make sure that I have improved the results overall.
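To make the idea of a tandem repeat concrete, here is a deliberately naive Python sketch that scans a list of system call names for a block repeated back-to-back (such as `write, read, write, read`). It is not the published algorithm mentioned above. It is a greedy scan with roughly cubic worst-case cost. At each position it reports the shortest repeating unit, so it can miss a longer repeat that overlaps a shorter one. The function name and the example log are hypothetical.

```python
def tandem_repeats(seq, min_copies=2):
    """Greedy, naive scan for adjacent repeated blocks in a token list.

    Returns a list of (start, unit, copies) tuples. At each start position the
    shortest unit with at least `min_copies` consecutive copies is reported,
    and the scan then jumps past the whole run.
    """
    found = []
    i, n = 0, len(seq)
    while i < n:
        for p in range(1, (n - i) // 2 + 1):
            unit = seq[i:i + p]
            copies = 1
            # Keep counting while the next block of length p matches the unit.
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= min_copies:
                found.append((i, tuple(unit), copies))
                i += p * copies
                break
        else:
            # No repeat starts here; move on by one token.
            i += 1
    return found
```

On a hypothetical log such as `["user", "pass", "write", "read", "write", "read", "write", "read", "quit"]`, the scan reports `(2, ("write", "read"), 3)`.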

Wrote a libprotoident program to perform daily monitoring of unknown payload patterns on the Waikato capture point and send me an email every morning with the 25 "biggest" patterns by payload, as well as a few example flows matching each pattern. Using this data, I've already been able to add a few new patterns to libprotoident, and I look forward to being more proactive about keeping libprotoident up to date.
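The ranking step of a report like this can be sketched in a few lines. The Python example below groups unclassified flows by the first payload bytes seen in each direction. It assumes 4 bytes, mirroring the payload prefix that libprotoident inspects. It then ranks the groups by total payload and keeps a few example flows per group. The `Flow` record, its fields and `top_unknown_patterns` are hypothetical. The real program works on live libprotoident output, and the email step is left out here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical record for one flow that libprotoident left unclassified."""
    flow_id: str
    out_prefix: bytes   # first payload bytes sent by the initiator
    in_prefix: bytes    # first payload bytes sent by the responder
    payload_bytes: int  # total payload observed for the flow


def top_unknown_patterns(flows, top_n=25, examples=3, prefix_len=4):
    """Group flows by payload prefix pair and rank the groups by total payload."""
    groups = defaultdict(list)
    for f in flows:
        key = (f.out_prefix[:prefix_len].hex(), f.in_prefix[:prefix_len].hex())
        groups[key].append(f)

    ranked = sorted(groups.items(),
                    key=lambda kv: sum(f.payload_bytes for f in kv[1]),
                    reverse=True)

    report = []
    for key, members in ranked[:top_n]:
        total = sum(f.payload_bytes for f in members)
        report.append((key, total, [f.flow_id for f in members[:examples]]))
    return report
```

Ranking by total payload rather than flow count favours patterns that carry the most unidentified bytes, which matches the "biggest patterns by payload" framing above.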




Finished porting the remaining libprotoident tools to be parallel-compatible. Spent a couple of days looking at unknown payload patterns in some recent Uni traffic -- unfortunately I wasn't able to make much tangible progress on identifying the unknown traffic.

Worked on implementing an algorithm for finding tandem repeats in strings, with the eventual aim of porting it over to work with my system call sequences. The published algorithm consists of three phases, but each of those phases has either involved looking up and implementing several other string processing algorithms (LZ-decomposition, longest common extension) or has required modifications to my existing suffix tree code (extracting a suffix array, bottom-up traversal, storing the longest child suffix in each node). As a result, I'm only about half-way through implementing the algorithm.
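Two of the building blocks mentioned above, a suffix array and longest common extension (LCE) queries, can be sketched compactly. This Python sketch makes several simplifying substitutions. It builds the suffix array by naive sorting rather than by extracting it from a suffix tree. It computes the LCP array with Kasai's algorithm. It answers LCE queries as a linear-scan minimum over the LCP array, where a real implementation would use a range-minimum structure. The functions accept strings or lists of system call names. They are illustrative pieces only, not the published tandem repeat algorithm itself.

```python
def suffix_array(seq):
    """Naive construction: sort suffix start positions by the suffixes themselves."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def rank_array(sa):
    """Inverse of the suffix array: rank[i] is the sorted position of suffix i."""
    rank = [0] * len(sa)
    for r, i in enumerate(sa):
        rank[i] = r
    return rank


def lcp_array(seq, sa):
    """Kasai et al.: lcp[r] = longest common prefix of suffixes sa[r-1] and sa[r]."""
    n = len(seq)
    rank = rank_array(sa)
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 symbols
        else:
            h = 0
    return lcp


def lce(i, j, rank, lcp):
    """Longest common extension of suffixes i and j: min LCP between their ranks."""
    if i == j:
        return len(rank) - i
    lo, hi = sorted((rank[i], rank[j]))
    return min(lcp[lo + 1:hi + 1])
```

For `"banana"`, the suffix array is `[5, 3, 1, 0, 4, 2]`, the LCP array is `[0, 1, 3, 0, 0, 2]`, and the LCE of positions 1 and 3 (`"anana"` and `"ana"`) is 3.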

Moved libtrace into its own github organization to reflect that libtrace is now going to be more of a community project than a WAND project. I'll still be helping out with maintaining it for now, but now the workload can be shared amongst a group of trusted libtrace users (including people outside of WAND). This will hopefully keep libtrace well looked-after, even as my available time gets more and more restricted.