Blogs

18 Dec 2017

Continued trying to get better performance out of the ndag protocol. Most of my time this week was actually spent resolving issues with the DPDK packet counters, which were proving neither accurate nor useful. After a lot of debugging, I realised that my DPDK packet generator software was reading and clearing the stats for all of the DPDK interfaces, not just the one it was transmitting on. Once I blacklisted the interface that receives the reflected ndag packets, so that the packet generator could no longer touch it, my numbers started making sense again.

Eventually I found that the best way to improve performance was to just enable jumbo frames and therefore send fewer multicast packets. With jumbo frames, I've been able to capture, encapsulate, export and subsequently receive a sustained 10 Gbps with no real issues.
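As a rough back-of-the-envelope illustration of why jumbo frames help, the sketch below estimates how many frames per second are needed to fill a 10 Gbps link at two MTU sizes. The 1500 and 9000 byte MTUs are assumptions (the post doesn't state the sizes used), and the 38 bytes of per-frame overhead is the usual Ethernet preamble, header, FCS and inter-frame gap.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + SFD (8) + MAC header (14) + FCS (4) + inter-frame gap (12).
ETHERNET_OVERHEAD = 38


def packets_per_second(link_bps: float, mtu: int) -> float:
    """Frames per second needed to saturate a link with full-sized frames."""
    wire_bytes = mtu + ETHERNET_OVERHEAD
    return link_bps / (wire_bytes * 8)


# Assumed MTU values; not taken from the post.
standard = packets_per_second(10e9, 1500)
jumbo = packets_per_second(10e9, 9000)

print(f"1500-byte MTU: {standard:,.0f} packets/s")
print(f"9000-byte MTU: {jumbo:,.0f} packets/s")
print(f"reduction factor: {standard / jumbo:.1f}x")
```

Under these assumptions, every per-packet cost (header parsing, system calls, interrupts) is paid several times less often with jumbo frames.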

On the OpenLI side, I've improved the API around my decoding code so that I can easily perform the main tasks that libtrace and libpacketdump will require. Specifically, I now have API functions for getting the timestamp, record length and start of the IP capture field from an ETSI-encoded packet. I've also added a "get next field as a string" function for generating nice tracepktdump output. Hopefully with all of this in place, I should be able to integrate the decoder into libtrace by the end of the year.

15 Dec 2017

At the start of this week, I finished modifying the proactive protection recovery controller to retrieve and use the network topology dynamically when computing paths through the network. I then worked on improving the path splicing algorithm by allowing it to consider more nodes from the primary and secondary path as potential path splice destination and source nodes. I finished implementing a new version of the path splice computation algorithm which seems to be able to produce paths that are more minimally overlapping and closer to the destination node.

I also added a few more failure scenarios and network topologies to my testing framework. I then modified the failure scenario parser/loader and files of my simulation framework to allow different tcpdump logger locations to be specified for each network topology. The loggers are used to calculate recovery time under certain failure scenarios. This modification was needed because different controllers will produce different recovery paths. I also fixed several other bugs and problems that I found along the way.

At the end of the week, I started working on extending the proactive controller to receive link failure notifications and optimise the pre-installed recovery paths based on the new network topology.

15 Dec 2017

Finished updating the RouteEntry code to save and load to/from a raw buffer and put it into testing. The time to transmit a million routes between processes dropped from 20+ seconds to less than a second. Memory usage also shrank massively, such that all million routes could be sent at once, which was not possible when using pickle.

Made some other improvements to memory usage by no longer storing copies of the routes where not required (as it's easier to ask a peer to resend them), and by storing routes in simple lists when a more complicated data structure isn't actually required.

The BGP router still uses more memory than I would like, and takes longer to do things than I would like, but it is now much improved. Half of the time taken is now spent waiting on ExaBGP to send me all routes, and there are still plenty of inefficiencies to fix in the way filtering and fixing of routes happens.

15 Dec 2017

Continued to investigate the performance of the BGP router and discovered that a very significant amount of time is spent pickling and unpickling routes to send between processes. Using a newer version of Python allows me to modify how messages sent through multiprocessing queues get serialised, so I experimented with using protocol buffers, JSON, and a few other approaches to see what might work. Everything I tested was still too slow or memory intensive when dealing with a million route entries. Decided that the best approach was to store all the routes in one "bytes" field in a protocol buffer message (rather than having each route as a distinct part of the message) and to write just the relevant parts of the route entry straight into the buffer. Started work on implementing this.
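The general idea of writing routes straight into one buffer can be sketched with the standard struct module. This is only an illustration, not the router's actual format: the record layout (IPv4 prefix, prefix length, next hop) and the function names pack_routes and unpack_routes are hypothetical, and the resulting bytes would then be placed in the single "bytes" field of the message.

```python
import ipaddress
import struct

# Hypothetical fixed-size record, network byte order, no padding:
# 4-byte IPv4 prefix, 1-byte prefix length, 4-byte IPv4 next hop.
ROUTE = struct.Struct("!4sB4s")


def pack_routes(routes):
    """Pack (prefix, nexthop) string pairs into one contiguous buffer."""
    buf = bytearray(ROUTE.size * len(routes))
    for i, (prefix, nexthop) in enumerate(routes):
        net = ipaddress.ip_network(prefix)
        ROUTE.pack_into(
            buf,
            i * ROUTE.size,
            net.network_address.packed,
            net.prefixlen,
            ipaddress.ip_address(nexthop).packed,
        )
    return bytes(buf)


def unpack_routes(buf):
    """Rebuild the (prefix, nexthop) pairs from a packed buffer."""
    routes = []
    for addr, prefixlen, nexthop in ROUTE.iter_unpack(buf):
        prefix = f"{ipaddress.ip_address(addr)}/{prefixlen}"
        routes.append((prefix, str(ipaddress.ip_address(nexthop))))
    return routes


routes = [("10.0.0.0/8", "192.0.2.1"), ("198.51.100.0/24", "192.0.2.2")]
buf = pack_routes(routes)
print(len(buf), unpack_routes(buf) == routes)
```

Each route costs a fixed 9 bytes here, with no per-object serialisation overhead, which is the property that makes a single flat buffer attractive for very large route sets.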

15 Dec 2017

Found and fixed a couple of small bugs in the pickling implementation of the RouteEntry class used in the BGP router. Updated the unit tests to check that prefixes and route entries could be correctly pickled and unpickled. Also found and fixed various small issues that didn't show up in testing, but did when exposed to real BGP implementations and a more diverse set of routes (more tests required!).
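A pickling round-trip check of the kind described might look like the sketch below. The RouteEntry class here is a hypothetical stand-in with made-up fields, not the real class from the BGP router; it only shows the pattern of a custom `__getstate__`/`__setstate__` pair and a test that the unpickled copy equals the original.

```python
import pickle


class RouteEntry:
    """Hypothetical stand-in for the router's RouteEntry class."""

    __slots__ = ("prefix", "nexthop", "as_path")

    def __init__(self, prefix, nexthop, as_path):
        self.prefix = prefix
        self.nexthop = nexthop
        self.as_path = list(as_path)

    def __getstate__(self):
        # Only the fields worth transmitting, as a compact tuple.
        return (self.prefix, self.nexthop, tuple(self.as_path))

    def __setstate__(self, state):
        self.prefix, self.nexthop, as_path = state
        self.as_path = list(as_path)

    def __eq__(self, other):
        return (
            isinstance(other, RouteEntry)
            and self.__getstate__() == other.__getstate__()
        )


def roundtrip(entry):
    """Pickle and unpickle an entry, as a multiprocessing queue would."""
    return pickle.loads(pickle.dumps(entry, protocol=pickle.HIGHEST_PROTOCOL))


original = RouteEntry("10.0.0.0/8", "192.0.2.1", [64496, 64497])
copy = roundtrip(original)
print(copy == original, copy is original)
```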

Started work on getting useful performance numbers around how long it takes to process and distribute routes, so merged the testing Prometheus code I had previously written and expanded it to cover more of the interesting parts of the code. Every time routes are touched (importing, exporting, filtering, etc) the time taken is recorded and made available to query. So far it looks like most of the time is spent outside of my main functions, moving data around between processes.
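The per-stage timing idea can be sketched with the standard library alone. This is a stand-in, not the Prometheus code itself: the `timed` helper and the `timings` dictionary are hypothetical names, and they simply keep a count and a running total per stage, similar in spirit to the count and sum that a Prometheus summary exposes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Per-stage call count and total elapsed seconds (hypothetical structure).
timings = defaultdict(lambda: {"count": 0, "total": 0.0})


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage]["count"] += 1
        timings[stage]["total"] += elapsed


# Example: time a made-up filtering step over some dummy "routes".
with timed("filter"):
    kept = [r for r in range(1000) if r % 2 == 0]

for stage, stats in timings.items():
    print(f"{stage}: {stats['count']} call(s), {stats['total']:.6f}s total")
```

Comparing the sum of the recorded stage totals against wall-clock time is one way to see how much time falls outside the instrumented functions.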

15 Dec 2017

Had another look into building an AMP test using headless Chrome to measure web (particularly YouTube) performance. I can get my code building within the Chrome build system, but I really want to create a library that I can link my own code against, and nothing like that gets built. They claim it does, but those libraries are missing most of the symbols I need, so I still need to look into this further.

Found and fixed an issue where amplet2-client cert fetching failed once a certain number of certificates had been issued. It turned out to be a simple type issue: comparisons were being made using the wrong type, so the results were sorted incorrectly and the wrong certificate was returned.
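The post doesn't say which types were involved, but one common form of this bug is comparing numeric identifiers as strings, which only goes wrong once the numbers gain an extra digit. The serial values below are made up purely for illustration.

```python
# Made-up certificate serial numbers, stored as strings.
serials = ["8", "9", "10", "11"]

# String comparison is lexicographic, so "9" sorts after "10" and "11".
latest_wrong = max(serials)

# Comparing as integers picks the serial that was actually issued last.
latest_right = max(serials, key=int)

print(sorted(serials))   # lexicographic order
print(latest_wrong, latest_right)
```

Everything works while the serials are single digits, which matches the symptom of a failure appearing only after a certain number of certificates had been issued.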

Spent some time writing installation documentation for the AMP server components and adding it to the GitHub wiki.

15 Dec 2017

Built and released new Debian and Ubuntu packages for amplet2-client, ampy, and ampweb.

Found and fixed a few issues in the netevmon email filtering that were caused by incorrect types being used to make comparisons. Built and installed new packages in one deployment for testing.

Started work tidying up the C modules recently created for some of the more memory hungry parts of the BGP router. Was able to simplify it in a few places, reorganising code to be able to replace custom code with existing library functions, and shrink the amount of memory required slightly further again.

08 Dec 2017

Separated common code that was shared between the telescope prototype and the new wdcapsniffer into its own file so that there is less repeated code to maintain. After a couple of extra bug fixes, I've managed to get my libprotoident daily monitor code working again and now using the ndag export from wdcapsniffer as the packet source. This will help me confirm that the code is generally stable and doesn't drop packets, as I should notice fairly quickly if my libprotoident reports are empty or have bogus data in them.

Added a dpdkndag capture format to libtrace which intercepts ndag multicast on the wire using DPDK, strips the IP, UDP and ndag headers and converts the contained ERF records into libtrace packets. The idea is that this would be faster than joining the multicast group and waiting for ndag packets to work their way through the network stack. This has turned out to be the case, although it is still not enough for a client to keep up with capture rates above ~6.5 Gbps.

Started my OpenLI work by developing my own ETSI-LI decoder. To support this, I've written a simple DER decoder which supports most of the primitives that are present in the ETSI standards. I've also written some code to model the ETSI structural hierarchy. I can now decode an example ETSI-encoded bytestream by walking both the hierarchy and the bytestream, seeing which fields are present and interpreting them according to the type defined in the matching hierarchy entry.
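To give a feel for what walking a DER bytestream involves, here is a minimal tag-length-value reader. It is a sketch only, not the OpenLI decoder: the function names are hypothetical, it handles only low tag numbers and definite lengths, and it returns raw identifier octets rather than interpreting fields against the ETSI hierarchy.

```python
def read_tlv(data, offset=0):
    """Read one DER element; return (identifier, constructed, value, next_offset)."""
    ident = data[offset]
    if ident & 0x1F == 0x1F:
        raise NotImplementedError("high-tag-number form is not handled here")
    constructed = bool(ident & 0x20)
    offset += 1

    length = data[offset]
    offset += 1
    if length == 0x80:
        raise ValueError("indefinite lengths are not allowed in DER")
    if length & 0x80:
        # Long form: low 7 bits give the number of length octets that follow.
        nbytes = length & 0x7F
        length = int.from_bytes(data[offset:offset + nbytes], "big")
        offset += nbytes

    value = data[offset:offset + length]
    return ident, constructed, value, offset + length


def walk(data, depth=0):
    """Flatten nested elements into (depth, identifier, value) tuples."""
    out = []
    offset = 0
    while offset < len(data):
        ident, constructed, value, offset = read_tlv(data, offset)
        if constructed:
            out.append((depth, ident, None))
            out.extend(walk(value, depth + 1))
        else:
            out.append((depth, ident, value))
    return out


# SEQUENCE { INTEGER 5, OCTET STRING "hi" }
sample = bytes.fromhex("300702010504026869")
for depth, ident, value in walk(sample):
    print("  " * depth + f"0x{ident:02x}", value)
```

A real decoder would pair each identifier with the matching entry in the structural hierarchy to decide how the value bytes should be interpreted.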

08 Dec 2017

Continued polishing up my presentation on the proposed OpenLI project. The meeting itself was held on Thursday -- people seemed pretty happy with my design and thinking thus far and the project is now all scheduled to start next week. Had a good chat with Neil from the Police about the ETSI standards and some of the gotchas that I'll need to think about when writing my code.

Finished up the initial libwdcap code and used it to write a wdcapsniffer program that exports via nDAG. Spent some time testing and tweaking the wdcapsniffer on the 10G dev machines before rolling the new and improved version out onto the probe VM.

28 Nov 2017

Started working towards rolling nDAG out onto our own capture environment, so that I can observe how it performs in a slightly more realistic scenario. Had to work around a few environment limitations, such as my VM disk needing to be resized and multicast being heavily rate limited on the path between the probe and my client. I've also started working on a libwdcap that will provide all of our old snapping and anonymisation capabilities, as I won't be able to export full unencrypted packets off the probe.

Started working on an actual design and development plan for the ETSI project, including plenty of architecture diagrams. Put together some slides describing the plan and proposed architecture for presentation at the first project meeting this coming Thursday.