Meenakshee Mungro's blog

12 Mar 2014

Spent the first half of the week finishing up events and double-checking the values, and finally got a sufficient sample size for each of the event group categories that I was using. Then, I spent an unfortunate amount of time calculating the values to use for the different belief fusion methods for each of the groups. Turns out that Google Docs is not that smart and doesn't know exactly what I want it to do, so there was lots of manual cell address entering, formula rechecking, and whatnot.

Finally, I started updating the eventing script to add the recent changes, e.g. the Hidden Markov Model detector. I also added triggers to the eventing script to detect when each of the fusion methods reaches or exceeds a significance probability threshold (currently 95%). This will be used to determine which fusion method is the fastest at declaring an event significant. I also added code to output the order in which the detectors fired for any event group, since it might be interesting to see if there is a pattern in there.
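
As a rough illustration of the triggers described above, here is a minimal sketch. It is not the actual eventing script: the class name `EventGroupTracker`, the method labels `"ds"` and `"bayes"`, and all the fused probabilities are made up for the example. Only the 95% threshold comes from the post.

```python
SIGNIFICANCE_THRESHOLD = 0.95  # the threshold mentioned in the post


class EventGroupTracker:
    """Hypothetical helper that tracks a single event group."""

    def __init__(self, threshold=SIGNIFICANCE_THRESHOLD):
        self.threshold = threshold
        self.detector_order = []  # detectors, in the order they fired
        self.crossed_at = {}      # fusion method -> firing number that crossed

    def update(self, detector, fused):
        """Record one detector firing plus the fused probability per method."""
        self.detector_order.append(detector)
        step = len(self.detector_order)
        for method, prob in fused.items():
            # Only remember the first time a method reaches the threshold.
            if prob >= self.threshold and method not in self.crossed_at:
                self.crossed_at[method] = step

    def fastest_methods(self):
        """Return the method(s) that reached the threshold earliest."""
        if not self.crossed_at:
            return []
        first = min(self.crossed_at.values())
        return sorted(m for m, s in self.crossed_at.items() if s == first)


tracker = EventGroupTracker()
# The probabilities below are invented purely for illustration.
tracker.update("Changepoint", {"ds": 0.80, "bayes": 0.90})
tracker.update("Mode", {"ds": 0.96, "bayes": 0.93})
tracker.update("HMM", {"ds": 0.99, "bayes": 0.99})

print(tracker.detector_order)
print(tracker.crossed_at)
print(tracker.fastest_methods())
```

Counting "speed" in detector firings rather than wall-clock time is a choice made for this sketch; the real script may well measure it differently.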

07 Mar 2014

Was away for half of the week, so spent Thursday and Friday working on categorizing more streams so that I would have a decent sample size of events for each stream group. This is getting trickier since we are running out of streams and the ones remaining don't necessarily belong to the stream groups that have an insufficient sample size.

18 Feb 2014

Spent the past 2 weeks collecting more samples of event groups and updating the data in the spreadsheet, so I now have a better idea of which groups have an insufficient sample size. Andrew had already finalised and entered the data for his HMMDetector for the old streams, so I made sure to include HMM events in the newer streams I analysed (afrinic, lacnic, trademe and apnic).

I also realised I had mislabelled some events detected by the Changepoint detector whenever a loss in measurements occurred, so I spent some time double-checking the events and the graphs and updating the appropriate severity values. We decided to exclude them from the detector probability values, since they are a different type of event (similar to LossDetector and Noisy-to-Constant/Constant-to-Noisy updates).

I'll collect more samples (if needed!) and update the values used by the different detectors and fusion methods, and finally move on to validating the output produced by the fusion methods next.

04 Feb 2014

Last week was rather short (holiday and unwell for a couple of days). Spent a bit of time looking into other fusion methods, but then decided to take a break and look into writing the eventing script's output to a database (for easier inspection). Talked to Shane and he created a separate database that I could play with, just to be safe. After looking at their current schema, I spent some time thinking about an ideal way of storing the probabilities of the different methods in the database. In the end, I finalised the schema, created the tables and started working on inserting the event data into the DB.

23 Jan 2014

Figured out how to use Bayes as a method to combine the beliefs/probabilities from several detectors into a final significance probability. I had to use different values than the DS ones I had previously calculated, so I spent a while calculating and double-checking the values I needed for Bayes. After that, I did some manual calculations/testing before diving into implementing it in the eventing Python script.
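
For illustration, one common way to combine detector results with Bayes' theorem is to update a prior once per detector that fired. The sketch below assumes the detectors are conditionally independent and that each one has two known likelihoods, P(fires | significant) and P(fires | not significant). The function names and every number are hypothetical, and the post does not say that this is exactly how the script does it.

```python
def bayes_update(prior, p_fire_given_sig, p_fire_given_not):
    """Posterior P(significant | detector fired) for a single detector."""
    numerator = p_fire_given_sig * prior
    denominator = numerator + p_fire_given_not * (1.0 - prior)
    if denominator == 0:
        raise ValueError("detector firing has zero probability under the prior")
    return numerator / denominator


def fuse_bayes(prior, likelihoods):
    """Apply bayes_update once per detector that fired.

    likelihoods: list of (P(fire | significant), P(fire | not significant)).
    Assumes the detectors are conditionally independent.
    """
    posterior = prior
    for p_sig, p_not in likelihoods:
        posterior = bayes_update(posterior, p_sig, p_not)
    return posterior


# Made-up likelihoods for two detectors and a neutral prior of 0.5.
result = fuse_bayes(0.5, [(0.8, 0.2), (0.6, 0.3)])
print(round(result, 4))  # 0.8889
```

Needing a pair of likelihoods per detector, rather than a single belief value, is one plausible reason the D-S values could not simply be reused.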

Also read a few other papers regarding different methods of belief fusion, namely the Averaging and Cumulative functions. After talking to Richard, we decided to implement those functions so as to compare the values obtained by each method.
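
The post does not say which formulation of these functions was used. Assuming they are the averaging and cumulative fusion operators from subjective logic, applied to binomial opinions (belief, disbelief, uncertainty) that sum to 1, a minimal sketch could look like this. The `Opinion` type and the example values are hypothetical.

```python
from typing import NamedTuple


class Opinion(NamedTuple):
    """Binomial opinion; the three components are expected to sum to 1."""
    belief: float
    disbelief: float
    uncertainty: float


def cumulative_fusion(a, b):
    """Cumulative fusion: treats the two opinions as independent evidence."""
    k = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if k == 0:
        raise ValueError("both opinions are dogmatic (zero uncertainty)")
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / k,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / k,
        (a.uncertainty * b.uncertainty) / k,
    )


def averaging_fusion(a, b):
    """Averaging fusion: treats the opinions as views of the same evidence."""
    k = a.uncertainty + b.uncertainty
    if k == 0:
        raise ValueError("both opinions are dogmatic (zero uncertainty)")
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / k,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / k,
        (2 * a.uncertainty * b.uncertainty) / k,
    )


# Made-up opinions from two detectors about the same event.
x = Opinion(0.6, 0.1, 0.3)
y = Opinion(0.5, 0.2, 0.3)
cumulative = cumulative_fusion(x, y)
averaged = averaging_fusion(x, y)
print(cumulative)
print(averaged)
```

With these inputs, cumulative fusion ends up with less uncertainty than averaging, which matches the intuition that independent observations should add up while repeated views of the same evidence should not.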

I also read some material on Fuzzy Logic, so I plan on implementing that next.

Modified the eventing script to enable easy addition of different belief fusion methods, since I plan on implementing more methods as I come across them.

13 Jan 2014

Started the week by doing a summary of the Smokeping data that Shane and I collected last year. This included grouping the streams based on average means (i.e. < 5, < 30, < 100, > 100) and summing up the number of FPs and significant/insignificant/unclassified events for the whole stream and also on a per-detector basis. Using these numbers, I was able to work out accurate probability values for each detector. This also made it easy to see exactly where we needed more data, e.g. only having 5 Mode events throughout all the streams with an avg mean of < 5.

Then, I modified my eventing Python script to use different probabilities based on the detector that fired and the average mean of the stream at that time. These probability values will still need to be updated later on since the sample size is too small for some of the detectors. Getting a big enough sample size is tricky, though, since some detectors (especially Mode) only fire occasionally, when the mode of the time series has changed considerably.
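
A lookup like the one described above could be sketched as follows. The bucket boundaries come from the grouping mentioned earlier (< 5, < 30, < 100, > 100); the function names, the treatment of a mean of exactly 100 as belonging to the last bucket, the fallback default, and all probability values are assumptions made up for this example.

```python
import bisect

BUCKET_EDGES = [5, 30, 100]
BUCKET_LABELS = ["<5", "<30", "<100", ">100"]


def mean_bucket(avg_mean):
    """Map a stream's average mean onto one of the four groups."""
    return BUCKET_LABELS[bisect.bisect_right(BUCKET_EDGES, avg_mean)]


# Placeholder values only; the real ones come from the rated events.
PROBABILITIES = {
    ("Mode", "<5"): 0.40,
    ("Mode", "<30"): 0.55,
    ("Changepoint", "<5"): 0.60,
    ("Changepoint", ">100"): 0.75,
}


def detector_probability(detector, avg_mean, table=PROBABILITIES, default=0.5):
    """Look up the probability for a detector given the current mean.

    Falls back to `default` when a (detector, bucket) pair has no entry,
    e.g. because the sample size for that pair is still too small.
    """
    return table.get((detector, mean_bucket(avg_mean)), default)


print(mean_bucket(4.9))
print(detector_probability("Changepoint", 250.0))
print(detector_probability("Mode", 60.0))
```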

Spent some time looking over Bayes' theorem, which I plan on using as a point of comparison against the other fusion methods.

17 Dec 2013

Spent a fair amount of time reading papers on the limits of and alternatives to Dempster-Shafer for combining evidence. The main limitation of D-S is that it can produce counter-intuitive results in case of strong conflict between argument beliefs. However, there are no elements of conflict in the belief functions for the detectors, which makes D-S the preferred option (until I find a better alternative). I also came across a number of other rules (Bayesian, fuzzy logic, TBM, etc.) that I plan on reading about during the break.
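
The conflict problem is usually illustrated with Zadeh's well-known example: two sources almost completely disagree, and Dempster's rule ends up fully believing the one hypothesis that both considered very unlikely. Here is a small sketch of that example; the function name `dempster_combine` and the hypothesis labels A, B and C are just for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses.

    Returns (combined masses, conflict), where conflict is the total mass
    that fell on empty intersections before normalisation.
    """
    combined = {}
    conflict = 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        intersection = s1 & s2
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    total = sum(combined.values())
    if total == 0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalise by the non-conflicting mass.
    return {s: v / total for s, v in combined.items()}, conflict


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
source1 = {A: 0.99, B: 0.01}  # almost certain of A
source2 = {C: 0.99, B: 0.01}  # almost certain of C

fused, conflict = dempster_combine(source1, source2)
print(fused)               # all of the mass ends up on B
print(round(conflict, 4))  # 0.9999
```

Neither source put more than 1% on B, yet the combined belief in B is 1 because B is the only hypothesis they do not rule out between them.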

Also spent a considerable amount of time looking at the events for the AMP-ICMP graphs for the Quarterpounder to Google.com stream. There are a huge number of events, which makes grouping and rating them take forever. I need to do more of the amp-icmp stream analysis before calculating the belief values for each detector, and that's something that I plan on doing during the break too.

09 Dec 2013

I met with Dr. Joshi from the Stats Dept and confirmed that the method I was using was indeed correct. He also mentioned looking into Bayes' theorem as an alternative, and I spent some time reading up on it. There is an element of "undecidedness" with the event significance, which is why Dempster-Shafer is more appropriate than Bayes'.

Also updated Netevmon to periodically send out mean updates to the eventing script. These mean values will be used in deciding which probability values to use in different cases (e.g. when the measurements are noisy/constant, etc.). I also looked at and rated the events for a couple of streams and updated some of the "busier" streams with last week's events.

02 Dec 2013

Spent the first half of the week implementing a version of the Dempster-Shafer belief function in the eventing Python script. After debugging and testing to make sure that it worked properly, I went on to analysing the events for a few Smokeping streams. This consisted of finding the start of a detected event, finding it in the AMP graphs, giving it a significance rating of 0-5, with 0 being a FP and 5 being a very significant event, and then entering details of the event group in a spreadsheet. This was rather tedious and, depending on the stream, sometimes took forever.
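
To give an idea of what a Dempster-Shafer combination looks like for this kind of problem, here is a minimal sketch over a two-outcome frame. It assumes each detector's belief is expressed as three masses (significant, not significant, unknown); that split, the function name `ds_combine` and the example numbers are assumptions for illustration, not the script's actual representation.

```python
def ds_combine(m1, m2):
    """Dempster's rule for masses (significant, not_significant, unknown).

    'unknown' is the mass on the whole frame, i.e. the detector cannot
    say either way. Each triple is expected to sum to 1.
    """
    s1, n1, u1 = m1
    s2, n2, u2 = m2
    # Conflict: one source says significant while the other says not.
    conflict = s1 * n2 + n1 * s2
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    norm = 1.0 - conflict
    significant = (s1 * s2 + s1 * u2 + u1 * s2) / norm
    not_significant = (n1 * n2 + n1 * u2 + u1 * n2) / norm
    unknown = (u1 * u2) / norm
    return significant, not_significant, unknown


# Two detectors that each support "significant" and are otherwise undecided.
fused = ds_combine((0.7, 0.0, 0.3), (0.6, 0.0, 0.4))
print(tuple(round(v, 2) for v in fused))  # (0.88, 0.0, 0.12)
```

When no detector puts mass on "not significant", the conflict term is zero and each extra detector simply shrinks the unknown mass.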

I plan on seeing Dr. Joshi from the Stats Dept next week to confirm the Dempster-Shafer calculations, after which I will have to resume the event analysis.

19 Nov 2013

Finalised the TEntropy detector and committed the changes to Git. Then spent the next few days reading up on belief theory and the Dempster-Shafer belief functions. Started working on a Python script for a server that listens for new connections, understands the protocol used by AnomalyExporter and parses the event data received from the anomaly_ts client.

Plan for the next week is to finish the eventing Python script (which will include event grouping by ts and stream # initially) and start gathering data from the events in order to calculate a confidence level for each detector. This is necessary for using belief theory to determine whether an event is really an event since the various detectors might not produce the same results.
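
Grouping by timestamp and stream could be sketched roughly as below. The event tuple layout, the function name `group_events` and the 300-second window are assumptions for illustration only; the post does not describe how close together events need to be to count as one group.

```python
def group_events(events, window=300):
    """Group events per stream when they occur close together in time.

    events: iterable of (timestamp, stream_id, detector) tuples.
    window: hypothetical maximum gap in seconds between consecutive
            events of the same group.
    """
    groups = []
    latest = {}  # stream_id -> most recent group for that stream
    for ts, stream, detector in sorted(events, key=lambda e: e[0]):
        group = latest.get(stream)
        if group is None or ts - group["end"] > window:
            # Start a new group for this stream.
            group = {"stream": stream, "start": ts, "end": ts, "detectors": []}
            groups.append(group)
            latest[stream] = group
        group["end"] = ts
        group["detectors"].append(detector)
    return groups


# Made-up events: two streams, with a long gap on stream 1.
events = [
    (100, 1, "Changepoint"),
    (160, 1, "Mode"),
    (170, 2, "TEntropy"),
    (900, 1, "Changepoint"),
]
groups = group_events(events)
for g in groups:
    print(g)
```

Each group keeps the detectors in firing order, which is the kind of per-group data needed later to work out a confidence level for each detector.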