Blogs

17

May

2011

The last two weeks have been spent fixing things up on the WAND website and on warlock in general, so you should hopefully notice fewer broken things now. However, I still haven't been able to work out the root cause of some of the bigger issues, such as permission denied messages appearing on various pages when you're logged out. There are also some annoyances, such as the input format of blog posts and pages reverting to the default every time instead of doing something sane like remembering the last used input format.

I still have a long list of things to fix that I'm working my way through, so feel free to report any issues you find. Chances are I've already got most of the problems on my todo list and will get around to them when I get time.

Helped out with open day on Friday, talking to some keen students and flying around New Zealand on the display wall. Built Paul a Squid instance that ended up caching most of Google Earth's high-res imagery of New Zealand.

17

May

2011

I finally have a complete draft for my thesis. My conclusion chapter
requires a second revision but otherwise I'm pretty happy with my
chapters. Tony is kindly reading my thesis from start to finish and I
have already had valuable comments back for my introduction. I intend
to submit at the end of this month (Tuesday 31st).

13

May

2011

This week's focus was on making a start on 3 major assignments (30% +) that are due by the end of the semester.

I have a few more ideas to work on with my 520 project. I met with Jaime and he gave me a good overview of his core network at layers 2 and 3. I'm still cloudy as to what a good way of bringing multiple layers together in a single visualisation will look like, but it is good to have more examples to work with.

Started mapping out a bunch (~20) of papers related to internet topology discovery for the 513 paper. I seem to have pulled together a good list of relevant papers to read for my lit review.

Implemented a random-forest-type machine learning algorithm in Java (called Extra Tree - http://www.montefiore.ulg.ac.be/~ernst/extremely-randomized-trees.pdf) for another assignment. I have regression mostly working; I need to tweak it and make it work for classification too.
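For readers unfamiliar with the technique, here is a minimal sketch of how an Extra-Trees style node might choose a split for regression. It is written in Python rather than Java and is not the assignment code. The function names, the parameter `k`, and the use of plain variance reduction as the score are all assumptions made for illustration.

```python
import random


def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pick_random_split(rows, targets, k, rng):
    """Choose a split from up to k random attributes, each with a random cut.

    rows    -- list of feature vectors (lists of floats)
    targets -- regression outputs, one per row
    k       -- number of candidate attributes to try (hypothetical name)
    rng     -- a random.Random instance
    Returns (attribute_index, cut_point), or None if no attribute varies.
    """
    n_features = len(rows[0])
    # Attributes that are constant within this node cannot be split on.
    candidates = [a for a in range(n_features)
                  if min(r[a] for r in rows) < max(r[a] for r in rows)]
    if not candidates:
        return None

    total_var = variance(targets)
    best, best_score = None, -1.0
    for attr in rng.sample(candidates, min(k, len(candidates))):
        lo = min(r[attr] for r in rows)
        hi = max(r[attr] for r in rows)
        cut = rng.uniform(lo, hi)  # the cut-point is drawn, not optimised
        left = [t for r, t in zip(rows, targets) if r[attr] < cut]
        right = [t for r, t in zip(rows, targets) if r[attr] >= cut]
        # Score the split by how much it reduces the output variance.
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(targets)
        score = total_var - weighted
        if score > best_score:
            best, best_score = (attr, cut), score
    return best
```

Only the scoring step is specific to regression here, so supporting classification would mostly mean swapping in a different score while keeping the random attribute and cut-point selection.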

11

May

2011

Made a few tweaks to my ICT presentation based on feedback I got the previous Friday, mainly adding animations and more diagrams.

Finished up and submitted the libtrace paper to IMC and the inbound session paper to ATNAC.

Made a couple of changes to the libtrace build system to satisfy Debian packaging requirements.

Created a bug tracker for libprotoident and started adding documentation to the trac wiki.

Left for Cyprus on Friday.

10

May

2011

Further refined the spam classification for my existing dataset based on
the SpamAssassin logs. Building the state machine for the new data shows
every flow tagged as spam going through the same set of transitions
(corresponding to 550 errors and exiting), which makes sense seeing as
anything considered spam gets rejected. From that point on it is very
clear which flows are spam and which aren't, but the small amount of spam
left in the dataset isn't enough to differentiate any of the preceding
links.

Started looking at some of our recent ISP traces to build a larger dataset
with more spam flows. The data is more useful with some idea as to which
flows are spam and which are ham, so I've used the Spamhaus block lists to
get an approximate classification. The data is current enough that the
block lists should be fairly relevant and accurate, and if this looks
promising I can capture new data or perhaps try to get access to mail
server logs. At the moment the state machine generation code is being run
over approximately 1400 SMTP flows (of which one third are spam) to see
how this differs from my old dataset.

Also made a few updates to the KAREN weathermap and spent some time on
documentation covering how to make similar updates.

06

May

2011

This week (and some of last week) I took the KAREN weathermap and implemented it using my current network map visualisation. I used a static layout for the POPs and just a basic star layout for regional devices connected to these POPs. The layouts can be applied to a given subnetwork's nodes and can be changed at runtime. As you zoom in on a POP, the nodes and links connected to it become more visible and labels appear. The underlying concept is that subnetworks can contain subnetworks, which can in turn contain further subnetworks, and so on.
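As a rough illustration of the star layout and zoom-dependent detail described above, the sketch below places a POP's regional devices evenly on a circle and computes an opacity that fades them in as the view zooms. The function names, the radius, and the fade thresholds are hypothetical and are not taken from the actual visualisation code.

```python
import math


def star_layout(center, node_ids, radius):
    """Place node_ids evenly on a circle of the given radius around center."""
    cx, cy = center
    count = len(node_ids)
    positions = {}
    for i, node in enumerate(node_ids):
        angle = 2 * math.pi * i / count
        positions[node] = (cx + radius * math.cos(angle),
                           cy + radius * math.sin(angle))
    return positions


def detail_opacity(zoom, fade_start, fade_end):
    """Opacity in [0, 1] for a POP's child nodes at the given zoom level.

    Below fade_start the children are hidden, above fade_end they are
    fully visible, and in between they fade in linearly.
    """
    if fade_end <= fade_start:
        raise ValueError("fade_end must be greater than fade_start")
    t = (zoom - fade_start) / (fade_end - fade_start)
    return max(0.0, min(1.0, t))
```

Because the layout is just a function from a centre and a node list to positions, the same call could be applied again to the children of any regional device, which is one way to read the nested-subnetwork idea.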

04

May

2011

This is a test of a bug that Brad doesn't think exists

EDIT: and done!

04

May

2011

As you may have noticed, we've upgraded the WAND website. Aside from the new theme, the biggest change is the addition of blogs for each WAND member. This provides a means for us to keep the wider world up to date with the discoveries we are making as they happen. The blogs have also been tied to our weekly reporting system, so there will be weekly updates from all research staff and students at the very least. Feel free to comment on the blogs if you have anything useful to add or wish to ask questions about the work we're doing.

At this stage, the site is still somewhat of a work in progress. Now that we've migrated successfully, we'll be auditing the content to remove out-of-date information and replace it with new content that reflects what we are doing now rather than what we did several years ago. Expect to see a few changes over the next few months...

03

May

2011

Finished the Bayesian forecast algorithm and sampling code last week. I also checked out my old Kalman filter and ARIMA code and started changing it to take the output of the sampling code as input. Both pieces of code require some tweaking of their parameters. To do this properly, I borrowed a book called "Time series analysis" from the library and started reading the related chapter.
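To give a sense of the kind of parameters that need tuning, here is a minimal scalar Kalman filter sketch. It assumes a simple random-walk model, which may well differ from the model in the code mentioned above. The names `q` (process noise variance) and `r` (measurement noise variance) are illustrative.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Filter a sequence of scalar measurements with a random-walk model.

    q  -- process noise variance (how quickly the true value may drift)
    r  -- measurement noise variance (how noisy each sample is)
    x0 -- initial state estimate
    p0 -- initial variance of the estimate
    Returns the list of filtered estimates, one per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p = p + q
        # Update: blend the prediction and the measurement using the gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates
```

A larger `q` relative to `r` makes the filter follow the samples closely, while a smaller ratio smooths more heavily, so the ratio is typically what gets tuned.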

02

May

2011

It looks like the major problem with the spam dataset I've been using is
the classification of greylisted flows as spam. Greylisting is a very
common thing to happen to incoming flows on our mail server and mostly
looks to occur early on before data is sent. This meant there was a vast
number of almost identical flows that were being counted as spam. Removing
these flows from consideration gives me a smaller dataset, but one in
which almost every flow traverses at least one link that is entirely
classified as spam or ham. At this point the small number of flows that
don't do this appear to involve TLS and will require closer investigation.
Will also need to expand into newer and larger datasets, hopefully some
without greylisting that see larger volumes of spam.
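The idea of a link that is "entirely classified as spam or ham" can be sketched as follows. This is an illustration only, not the state machine generation code: each flow is assumed to be a list of state names plus a label, and the state names used in the usage example are made up.

```python
from collections import defaultdict


def count_links(flows):
    """Count, per link (state -> next state), how many flows of each label
    traverse it. flows is a list of (states, label) with label in
    {"spam", "ham"}."""
    counts = defaultdict(lambda: {"spam": 0, "ham": 0})
    for states, label in flows:
        # Use a set so a flow that repeats a link is only counted once.
        for link in set(zip(states, states[1:])):
            counts[link][label] += 1
    return counts


def pure_links(counts):
    """Map each link seen by only one class of flow to that class."""
    pure = {}
    for link, c in counts.items():
        if c["ham"] == 0:
            pure[link] = "spam"
        elif c["spam"] == 0:
            pure[link] = "ham"
    return pure


def undecided_flows(flows, pure):
    """Return the state lists of flows that never traverse a pure link."""
    return [states for states, _ in flows
            if not any(link in pure for link in zip(states, states[1:]))]
```

A small usage example with hypothetical state names:

```python
flows = [
    (["EHLO", "MAIL", "RCPT", "550", "QUIT"], "spam"),
    (["EHLO", "MAIL", "RCPT", "DATA", "QUIT"], "ham"),
    (["EHLO", "MAIL"], "ham"),
]
pure = pure_links(count_links(flows))
leftover = undecided_flows(flows, pure)  # flows needing closer investigation
```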