Blogs

12 Jun 2017

Moved the peer and VRF route management out into separate processes, one per peer or VRF, so that each can run its filters and other processing in parallel without blocking the others. Each process is now self-contained and operates via message passing: BGP commands or routes come in, are processed, and are then sent on as further BGP commands or route lists.

Spent some time tracking down the cause of ports not being reused correctly in the AMP throughput test. When run through the server with both IPv4 and IPv6 available, it was not properly closing the socket for the unused address family once a client connected. The test port was therefore still in use when the test later tried to restart the connection to measure the other direction. It should now make sure that only the address family in use has a socket bound to the test port.

12 Jun 2017

Replaced the Python ipaddress module in my BGP program with my own very minimal one to reduce memory usage. Replaced some empty sets/lists that may never hold data with "None" by default, as empty data structures are quite memory heavy when you have millions of them. Also updated a couple of heavily used classes to use __slots__ (explicitly declaring the attributes) rather than leaving the attribute set open-ended, which uses more memory.
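As a rough illustration of the two memory savings described above, here is a minimal sketch. The RouteEntry class and its attribute names are hypothetical and are not taken from the actual BGP program.

```python
import sys


class RouteEntry:
    """Hypothetical route record; attribute names are illustrative only."""

    # Declaring __slots__ removes the per-instance __dict__, so each
    # instance only has room for the attributes listed here.
    __slots__ = ("prefix", "nexthop", "as_path", "communities")

    def __init__(self, prefix, nexthop, as_path):
        self.prefix = prefix
        self.nexthop = nexthop
        self.as_path = as_path
        # Many routes may never carry communities, so start with None
        # and only allocate a set when one is actually added.
        self.communities = None

    def add_community(self, community):
        if self.communities is None:
            self.communities = set()
        self.communities.add(community)


route = RouteEntry("192.0.2.0/24", "198.51.100.1", (64500, 64501))
print(hasattr(route, "__dict__"))  # no instance dict on a slotted class
print(sys.getsizeof(set()), sys.getsizeof(None))  # empty set vs None
route.add_community((64500, 100))
```

The exact byte counts printed depend on the Python version and platform, but the empty set is always much larger than the shared None object.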

Started looking at adding a command interface to allow updating filters and receiving external measurements or metadata about how we should be routing traffic. The current event loop around I/O doesn't really support this (and has other issues with deadlocking against exabgp), so it needed to be rewritten. All exabgp reading and writing now happens in the same place, and in a different thread to the command interface and route management.
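The general shape of this arrangement can be sketched with two threads and a pair of queues. This is only an assumption about the structure: the function names, the queue names and the "handled ..." strings are all hypothetical, and a list of strings stands in for the real exabgp pipe.

```python
import queue
import threading

inbound = queue.Queue()   # lines read from exabgp, for the route manager
outbound = queue.Queue()  # commands from the route manager, for exabgp
STOP = object()           # sentinel used to shut both threads down


def flush_outbound(sent, block):
    """Write queued commands; stop when the queue is empty or STOP arrives."""
    while True:
        try:
            command = outbound.get(block=block)
        except queue.Empty:
            return
        if command is STOP:
            return
        sent.append(command)  # stand-in for writing to exabgp


def exabgp_io(incoming_lines, sent):
    """The single place where exabgp is read from and written to."""
    for line in incoming_lines:  # stand-in for reading exabgp output
        inbound.put(line)
        flush_outbound(sent, block=False)
    inbound.put(STOP)
    flush_outbound(sent, block=True)  # drain until the manager finishes


def route_manager():
    """Runs in its own thread and never touches exabgp directly."""
    while True:
        line = inbound.get()
        if line is STOP:
            outbound.put(STOP)
            return
        outbound.put("handled " + line)


sent = []
io = threading.Thread(target=exabgp_io, args=(["update 1", "update 2"], sent))
manager = threading.Thread(target=route_manager)
io.start()
manager.start()
io.join()
manager.join()
print(sent)
```

Because only one thread ever reads or writes the exabgp side, the route manager can block on its own queue without stalling I/O.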

02 Jun 2017

Libflowmanager 3.0.0 has been released today.

The libflowmanager API has been re-written to be thread-safe (and therefore compatible with parallel libtrace), hence the major version number change.

The old libflowmanager API has been removed entirely; there is no backwards compatibility with previous versions of libflowmanager. If you choose to install libflowmanager 3 then you will need to update your existing code to use the new API. This should not be too onerous in most cases, as most of the old global API functions have simply been replaced with method calls to a FlowManager class instance. The README and example programs demonstrate and explain the new API in detail.

Note that much of our other software that relies on libflowmanager, such as the libprotoident tools and lpicollector, has NOT yet been officially released with libflowmanager 3 support. If you are currently using any of this software, you should continue to use libflowmanager 2.0.5 until we are able to test and release new libflowmanager 3 compatible versions.

You can download both libflowmanager 3 and libflowmanager 2.0.5 from our website.

16 May 2017

Tidied up some unusual entry points in ampweb that web crawlers were hitting (to return proper error codes rather than broken templates) and tried to block a few of them with robots.txt. Fixed YAML schedule generation to not include tests where the source is an explicit destination (we previously removed the source from the mesh description, but missed this case). Spent some time trying (unsuccessfully) to fix some edge cases in the graph browser modals where previously set values weren't being set correctly.

Put together new releases for all the ampweb components, got them up on github and deployed to a test site. Also updated documentation to make it clearer that all these components work together and aren't independent.

Spent some time profiling the memory usage of my BGP program, now that the runtime is down to a reasonable level. I don't have enough memory to announce a million prefixes to all my peers as well as maintaining internal routing for them all. Found some significant savings by not instantiating my set variables until they were required (an empty set in Python is 232 bytes!), but I still need to lower the usage.

16 May 2017

Added options to the ampweb mesh configuration to allow setting the individual tests that should be visible in the matrix view, and whether the mesh is a source for these tests or not. Previously we tried to guess and enable these as tests were scheduled, but this led to issues when meshes were created for convenient display grouping and didn't actually run any tests, and it was not transparent why things were behaving the way they did. This is now an entirely manual process that the user has full control over. Also fixed a bug in the matrix display that meant the throughput matrix was losing configuration options when switching between it and other tests, and updated the URL validity checks to match the new formats so that our URLs are now pushed into browser history.

Spent some more time investigating why the BGP code was so slow. Most of the time appears to be spent copying route entries, so I rewrote the deepcopy function for that class to be much more efficient, and also reduced the number of locations where route entries were copied. Replaced some dictionaries with defaultdict structures that remove the need to check for key presence (in very large data structures) before taking actions. I can now import 1 million routes from a peer in under 60 seconds, including running them through a number of simple filters and copying them into a number of VRFs. Exporting these routes takes more memory than I have available, however, so that will be a job for next week.
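A minimal sketch of both techniques follows, assuming a hypothetical RouteEntry class whose fields are invented for illustration. The hand-written __deepcopy__ copies only the mutable containers, and the defaultdict creates missing entries on first access.

```python
import copy
from collections import defaultdict


class RouteEntry:
    """Hypothetical route entry; the field names are assumptions."""

    def __init__(self, prefix, nexthop, as_path, communities=None):
        self.prefix = prefix
        self.nexthop = nexthop
        self.as_path = as_path
        self.communities = communities

    def __deepcopy__(self, memo):
        # Strings are immutable and can be shared between copies; only
        # the mutable containers need to be duplicated.
        return RouteEntry(
            self.prefix,
            self.nexthop,
            list(self.as_path),
            None if self.communities is None else set(self.communities),
        )


# defaultdict builds the per-peer list on first access, so there is no
# "if peer not in routes_by_peer" check before appending.
routes_by_peer = defaultdict(list)
original = RouteEntry("192.0.2.0/24", "198.51.100.1", [64500, 64501])
routes_by_peer["peer-a"].append(original)

duplicate = copy.deepcopy(original)
duplicate.as_path.insert(0, 64999)  # does not affect the original
print(original.as_path, duplicate.as_path)
```

The generic copy.deepcopy machinery has to inspect every attribute and track a memo of visited objects, which is why a class-specific version can be noticeably cheaper for small, flat objects like this.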

16 May 2017

Added interface elements to ampweb enabling the scheduling of a normal or HTTP POST style throughput test, as well as the database/ampy support to make this work. Updated the graph browser to allow selecting and displaying throughput tests of both sorts as well. Spent some time trying to add support for the new tests to the matrix too, but we haven't done this particular sort of data split before and the best way to go about it is not immediately clear.

Started to look into why my BGP code is scaling so poorly at 10,000+ prefixes.

08 May 2017

Added another 5 protocols to libprotoident -- having a slightly more powerful PC for installing and running various candidate applications has helped quite a bit. Updated the rules for several more protocols as well.

Made some more progress on my protocol taxonomy -- I'm up to 'P' for the TCP protocols so I'm probably about 1/4 of the way through now.

Continued re-factoring the FSM generation code. Getting close to done, although I suspect the number of changes and the amount of variable renaming will require a fair bit of testing to make sure I've transferred everything across correctly.

Added the ability to choose between TCP and HTTP throughput data on the AMP matrix. To do this, I had to bring the amp-web/nntsc install on prophet back up to date after a few months of being untouched. As always, there were a few issues with dependencies and versioning which slowed everything down, but eventually Brendon and I got it all working correctly.

01 May 2017

Another disrupted week, this time due to being ill. Spent most of my available time looking over the output of my new multi-process state machine generation algorithm. The extra sequence fragments that become apparent when considering multiple processes managed to reveal a few new situations where my code wasn't quite doing the right thing. I've fixed those and am reasonably happy again with the machines produced for my test dataset.

Moved on to some code re-factoring, as the existing code-base had become a bit of a mess from hacking in fixes to all of the edge cases I had been dealing with. In particular, I'm aiming to separate code that deals with the machine itself, i.e. the states and their transitions, from the code that compares sequences and determines what needs to be added to (or removed from) the machine to accommodate the variation.

24 Apr 2017

Improved route aggregation to include the AS set of all ASNs involved in the aggregation so that peers can better perform loop detection.
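A simplified sketch of the idea, using the standard library ipaddress module rather than the program's own code. The aggregate function and its input format are hypothetical, and the AS set here is simply the union of every ASN seen on the contributing paths.

```python
import ipaddress


def aggregate(routes):
    """Merge adjacent prefixes and collect the ASNs of all contributors.

    routes is a list of (prefix string, AS path list) pairs; this input
    format is an assumption made for the example.
    """
    networks = [ipaddress.ip_network(prefix) for prefix, _ in routes]
    merged = list(ipaddress.collapse_addresses(networks))
    as_set = set()
    for _, as_path in routes:
        as_set.update(as_path)
    return merged, as_set


merged, as_set = aggregate([
    ("192.0.2.0/25", [64500, 64501]),
    ("192.0.2.128/25", [64500, 64502]),
])
print(merged, sorted(as_set))
```

A peer whose own ASN appears in the AS set can then recognise that the aggregate contains a route that has already passed through it.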

Improved community support so that imported communities are now in a useful format, and can also now be exported to peers. Added a new filter to match the commonly used no-export communities.
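A minimal sketch of such a filter, assuming communities are held as (high, low) integer pairs. The community values are the well-known ones from RFC 1997; the may_export function and the choice to treat all three the same way are assumptions for illustration.

```python
# Well-known communities from RFC 1997, written as (high, low) pairs.
NO_EXPORT = (65535, 65281)            # 0xFFFFFF01
NO_ADVERTISE = (65535, 65282)         # 0xFFFFFF02
NO_EXPORT_SUBCONFED = (65535, 65283)  # 0xFFFFFF03

BLOCKING = frozenset({NO_EXPORT, NO_ADVERTISE, NO_EXPORT_SUBCONFED})


def may_export(communities):
    """Return False if any blocking community is attached to the route.

    This simplification treats all three communities identically and
    ignores the distinction between internal and external peers.
    """
    return not (BLOCKING & set(communities or ()))


print(may_export([(64500, 100)]))
print(may_export([(64500, 100), NO_EXPORT]))
```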

Improved handling of withdraw messages to deal with supernets of advertised prefixes being specified: we can't just remove the exact prefix sent by the peer. Also tidied up some other prefix matching that was using the overlaps() function rather than checking for a strict subset.
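The difference between the two checks can be shown with the standard library ipaddress module (subnet_of requires Python 3.7 or later). The withdraw function below is a hypothetical sketch of one reading of the behaviour described, in which withdrawing a supernet removes every advertised prefix it covers.

```python
import ipaddress


def withdraw(advertised, withdrawn_prefix):
    """Return the advertised prefixes not covered by the withdrawn prefix."""
    withdrawn = ipaddress.ip_network(withdrawn_prefix)
    return {
        prefix
        for prefix in advertised
        # subnet_of raises TypeError across IP versions, so check first.
        if not (prefix.version == withdrawn.version
                and prefix.subnet_of(withdrawn))
    }


advertised = {
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
    ipaddress.ip_network("192.0.2.0/24"),
}
remaining = withdraw(advertised, "10.0.0.0/8")
print(remaining)

# overlaps() is symmetric, so it cannot tell a subnet from a supernet.
big = ipaddress.ip_network("10.0.0.0/8")
small = ipaddress.ip_network("10.1.0.0/16")
print(big.overlaps(small), small.overlaps(big))    # both True
print(big.subnet_of(small), small.subnet_of(big))  # only the second is True
```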

Tidied up AS path modification and community modification via filters to make copies of the route entries, so that the changes are only temporary and are not reused between different peers and VRFs. A clean original copy of the route is kept and modifications are applied to a fresh copy of it, rather than stacking up repeatedly on the same instance.
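The problem and the fix can be sketched as follows. The Route class, the prepend filter and the export_view function are hypothetical names used only for this example.

```python
import copy


class Route:
    """Hypothetical route with a mutable AS path."""

    def __init__(self, prefix, as_path):
        self.prefix = prefix
        self.as_path = as_path


def prepend(asn, count):
    """Build a filter that prepends asn to the AS path count times."""
    def apply(route):
        route.as_path[:0] = [asn] * count
    return apply


def export_view(original, filters):
    """Apply filters to a fresh copy, leaving the original untouched."""
    working = copy.deepcopy(original)
    for route_filter in filters:
        route_filter(working)
    return working


clean = Route("192.0.2.0/24", [64501])
for_peer_a = export_view(clean, [prepend(64500, 2)])
for_peer_b = export_view(clean, [prepend(64500, 1)])
print(clean.as_path, for_peer_a.as_path, for_peer_b.as_path)
```

If the filters were applied to the shared instance instead, the second peer would receive a path carrying the first peer's prepends as well as its own.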

24 Apr 2017

Tidied up exporting routes to peers to remove some that should not be sent: routes should not be advertised back to the peer we got them from in the first place. Also started to filter routes by VRF, so that peers can be limited in which VRFs they receive routes from.
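Both export rules can be expressed as a simple selection. This is a sketch under assumed data structures: the Route tuple, its field names and the routes_for_peer function are hypothetical.

```python
from collections import namedtuple

# Hypothetical route record: where it was learned and which VRF holds it.
Route = namedtuple("Route", ["prefix", "learned_from", "vrf"])


def routes_for_peer(peer, routes, allowed_vrfs):
    """Select the routes that may be exported to the given peer."""
    return [
        route
        for route in routes
        # Never send a route back to the peer it was learned from.
        if route.learned_from != peer
        # Only send routes from VRFs this peer is allowed to see.
        and route.vrf in allowed_vrfs
    ]


routes = [
    Route("192.0.2.0/24", "peer-a", "research"),
    Route("198.51.100.0/24", "peer-b", "research"),
    Route("203.0.113.0/24", "peer-b", "commodity"),
]
selected = routes_for_peer("peer-a", routes, {"research"})
print([route.prefix for route in selected])
```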

Started to build up a fake network topology based on the REANNZ network, with different peers and different relationships between them in order to make sure that the required capabilities are present to build a realistic network.