### Meenakshee Mungro's blog

### 12

#### Nov

#### 2013

Spent the week testing the TEntropy detector on different streams and refining the Symboliser to reduce the number of FPs.

One problem that I discovered was that the magnitude of an event was not represented correctly by the Symboliser: a severe event had the same t-entropy result as a trivial one, since only one character was inserted at a time. The solution was to insert multiple characters, based on the severity of the change.

Another problem was that small, insignificant changes were triggering events when, ideally, they should not have been creating entropy. Hence, I added a condition that checks whether a measurement is significantly different from the previous mean before emitting a non-default character.
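A minimal sketch of these two refinements, not the actual Netevmon Symboliser. The function name, the characters used, and the `min_change`, `step` and `max_repeat` parameters are all hypothetical, and it assumes that "multiple characters" means repeating the non-default character in proportion to the size of the change.

```python
from statistics import mean


def symbolise(measurement, history, min_change=5.0, step=10.0, max_repeat=5):
    """Return the character(s) to append to the string for one measurement.

    history     -- recent measurements used to compute the previous mean
    min_change  -- assumed threshold below which a change is insignificant
    step        -- assumed extra deviation needed for one more character
    max_repeat  -- assumed cap on characters emitted per measurement
    """
    if not history:
        return "a"  # nothing to compare against yet: default character

    previous_mean = mean(history)
    deviation = abs(measurement - previous_mean)

    # Insignificant changes emit the default character, adding no entropy.
    if deviation < min_change:
        return "a"

    # Severe changes emit more characters than trivial ones.
    repeats = min(max_repeat, 1 + int((deviation - min_change) // step))
    symbol = "u" if measurement > previous_mean else "d"
    return symbol * repeats


history = [100, 100, 100, 100]
print(symbolise(102, history))  # small change, default character
print(symbolise(140, history))  # large increase, several characters
```

With this shape, a large jump changes more of the string than a small one does, so the two no longer look the same to the t-entropy calculation.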

### 06

#### Nov

#### 2013

The past couple of weeks have been less productive than I would have liked, what with it being the end of the semester and final assignments and exam marking needing to be done. I managed to integrate the Symboliser and TEntropyCalculator with the existing Netevmon code, which produced an average t-entropy value that was then put through a PlateauLevelDetector. However, there were a considerable number of False Positives and False Negatives, which was rather discouraging. I knew that the actual t-entropy calculation was correct (after running the original source code and comparing values), so the problem had to be with the strings produced by the Symboliser. This meant that I had to spend a lot of time tinkering with the metrics and the different parameters of the Symboliser so that the strings it produced would accurately represent the nature of the traffic.

Plan for the following week: keep refining the metrics and experiment with different parameter values (e.g. shorter/longer string lengths). Also, possibly start reading up on the Dempster-Shafer belief theory and finding a decent detector that might be implemented by one of the Summer Research students.

### 15

#### Oct

#### 2013

Managed to get a working implementation of Flott which does the necessary initialisation and calculations for obtaining the t-entropy of a given string! It took longer than expected though - I was right about the objects and functions that I would need out of the original source code, but missed a number of lines in different places, which meant that the tokens and values used in the calculations were incorrect, thus resulting in an incorrect output. So, I spent many, many hours adding debugging output to my implementation and the original code after each iteration/processing step and comparing the results to figure out what had gone wrong. I was then able to produce a t-entropy value that was very close to the original program's output. After going over the original code again, I found a scaling factor that I had missed, and adding it fixed the last issue.

Over the next week, the plan is to refactor the code and finalise it for addition to Netevmon.

### 08

#### Oct

#### 2013

Over the last two weeks, I have been working on the TEntropy detector.

During the first week, I used anomaly_ts and anomaly_feed to produce output for a number of different streams, using a combination of different metrics, string lengths, sliding window sizes, and range delimiters. After strings have been produced for each sliding window sample, a Python script calls the external t_Entropy function with each string as a parameter to obtain its average t-Entropy, and pipes the output to a file. I then wrote another Python script that generates a Gnuplot script for producing time-series graphs, so that I could inspect the results. At this point, it was apparent that the t-Entropy detector was a feasible option, and hence I had to start implementing the actual t-entropy calculations within Netevmon.
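As a rough illustration of the second script, here is a sketch of generating a Gnuplot script from Python. The file names, the image size, and the assumption that each data file holds two whitespace-separated columns (Unix timestamp, average t-entropy) are hypothetical and not taken from the original scripts.

```python
def gnuplot_script(data_file, output_png, title):
    """Build a Gnuplot script that plots entropy (column 2) against
    Unix timestamps (column 1) as a time series."""
    lines = [
        "set terminal png size 1024,480",
        f'set output "{output_png}"',
        f'set title "{title}"',
        "set xdata time",                  # treat the x axis as time
        'set timefmt "%s"',                # input times are epoch seconds
        'set format x "%d/%m\\n%H:%M"',    # tick labels: date over time
        'set xlabel "Time"',
        'set ylabel "Average t-entropy"',
        f'plot "{data_file}" using 1:2 with lines notitle',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical stream name; run the result with `gnuplot stream1.gp`.
    with open("stream1.gp", "w") as handle:
        handle.write(gnuplot_script("stream1.dat", "stream1.png", "Stream 1"))
```

Generating one script per stream and parameter combination keeps the graphs directly comparable, since every plot shares the same axes and formatting.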

Spent last week going over a T-Entropy library that I found called Fast Low Memory T-transform (flott), which computes the T-complexity of a string, which in turn is used to compute the t-Entropy. Unfortunately, the library consists of around a dozen .c and header files, which made it somewhat tricky to determine which parts I would need. So, I spent around 3 days looking over the source code and trying to understand it before starting to work on adding the necessary bits to a new detector. Found the function that is used for calculating the actual t-complexity, t-information and t-entropy values, so have been working on duplicating those calculations. However, there are a number of other initialisation functions that are required before the t-* values can be calculated, so I have to look into them at some point.

Also had a bunch of marking to do, so couldn't spend all week working on the flott adaptation.

### 24

#### Sep

#### 2013

Still working on the TEntropy detector, but have made good progress this week. Ironed out the bugs that I found, and have output in the correct format. Then, I spent a great deal of time collecting output for 8 different streams, each with different character bin sizes and string lengths. Also wrote a Python script which takes the output files for the different streams (which include the strings used for entropy measurements) and passes them to an external script which calculates an average t-entropy measurement for each timestamp. So, I now have a bunch of output files with entropy values that need to be plotted to determine which combination of string length and character bin size would be optimal.

After a brief look at a couple of graphs, it seemed that a greater string length (50) had no benefit over a smaller one (20). The patterns were practically identical for each string length, which implies that the additional computational cost of calculating the t-entropy of 50 characters for every single timestamp is not worth it.

### 17

#### Sep

#### 2013

Spent the week working on the TEntropy Detector. Added a few different metrics that will be used to determine the most suitable/appropriate combination of metrics (by trial and error). Choosing the correct metric would allow the samples to be transformed into a time series of average entropy values, which will then be used to detect anomalies. Worked on converting the measurements into characters and started implementing a buffer to store the characters as they are added.
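One simple way to build such a character buffer is a fixed-length queue that drops the oldest character whenever a new one arrives. The sketch below is only an illustration: the class name, its methods, and the default length are hypothetical, not Netevmon code.

```python
from collections import deque


class SymbolBuffer:
    """Keeps the most recent `length` characters for entropy calculation."""

    def __init__(self, length=20):
        # A deque with maxlen silently discards the oldest items when full.
        self._chars = deque(maxlen=length)

    def add(self, symbols):
        """Append one or more characters, oldest characters falling off."""
        self._chars.extend(symbols)

    def is_full(self):
        """True once enough characters exist to form a complete string."""
        return len(self._chars) == self._chars.maxlen

    def as_string(self):
        """The current buffer contents, oldest character first."""
        return "".join(self._chars)


buf = SymbolBuffer(length=4)
buf.add("ab")
buf.add("cde")
print(buf.as_string())  # the four most recent characters
```

Waiting for `is_full()` before computing an entropy value avoids comparing short start-up strings against full-length ones.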

### 10

#### Sep

#### 2013

Read a paper about T-Entropy and started implementing a detector that uses sliding windows. It calculates some statistics for each window and assigns an appropriate character/"class" to it; the characters are then concatenated into a string, which in turn is used to obtain the average T-Entropy for the sliding window. However, NNTSC/Netevmon was down until Wednesday so I didn't get to test it after that.
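The windowing and classing step might look roughly like the sketch below. It assumes the window mean is the only statistic and that classes are fixed-width bins; the function names, the alphabet, and the parameter values are hypothetical, and the T-Entropy calculation itself is left out.

```python
from statistics import mean


def window_symbol(window, bin_size=10.0, alphabet="abcdefgh"):
    """Assign a character class to one window based on its mean value."""
    index = int(mean(window) // bin_size)
    # Clamp so out-of-range means fall into the first or last class.
    index = max(0, min(index, len(alphabet) - 1))
    return alphabet[index]


def symbol_string(values, window_size=5, bin_size=10.0):
    """Slide a window over the measurements, one step at a time, and
    concatenate the class character of each window into a string."""
    symbols = []
    for end in range(window_size, len(values) + 1):
        window = values[end - window_size:end]
        symbols.append(window_symbol(window, bin_size))
    return "".join(symbols)


# A level shift from 5 to 25 shows up as a change of characters.
print(symbol_string([5, 5, 5, 5, 5, 25, 25, 25, 25, 25]))
```

A stable stream yields a string of one repeated character, while a shift in the measurements introduces new characters, which is what the entropy measurement is meant to pick up.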

The rest of the week was spent taking care of GA duties, marking a ridiculous number of assignments, updating Moodle grades, yadda yadda. Didn't manage to get any more work done on the project, unfortunately.

Next week, I plan on working on the T-Entropy detector some more, especially adding new statistics and trying to figure out a combination of stats that "work".

### 29

#### Aug

#### 2013

Spent the week doing more reading into possible detectors, and read a couple of related surveys that compared different anomaly detection techniques. A lot of them seemed inappropriate for Netevmon, unfortunately. Did find some Bayesian methods that might be relevant and possible to implement.

Next week: investigate T-entropy and find out how it could be implemented within Netevmon.

### 19

#### Aug

#### 2013

Wrote a proposal for the project, so as to get a better understanding and also formalise what I broadly need to focus on. Fell sick on Tuesday morning and it got worse by the evening, so was not able to get any work done for the rest of the week.

Next week should still involve more reading and looking at possible methods that could be used in new detectors.

### 12

#### Aug

#### 2013

Spent a couple of days at the start of the week reading papers which compared different anomaly detection methods for sequential time series data. Found some methods which might be worth looking into (Hidden Markov Models, Finite State Automata, and some window-based methods such as kNN and STIDE). Spent the next couple of days reading up on HMMs, the Baum-Welch algorithm that they use, and also a Genetic Algorithm (which is supposedly more efficient at estimating the parameters of HMMs than the Baum-Welch algorithm). The papers' results seem quite promising, but unfortunately none disclose their methods in detail and I can't figure out how to implement HMMs within Netevmon yet. Did find a C++ and a Java implementation, so it might be worth going over those.
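For context, the core scoring step of an HMM-based detector is usually the forward algorithm, which gives the likelihood of an observation sequence under a trained model; unusually low likelihoods can then be flagged as anomalies. The sketch below illustrates that step only. The two-state model and its probabilities are made up for illustration, and training (e.g. with Baum-Welch) is not shown.

```python
def forward_likelihood(observations, start, trans, emit):
    """Likelihood of an observation sequence under a discrete HMM.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability of state i emitting observation o
    """
    if not observations:
        return 1.0  # the empty sequence is certain

    n_states = len(start)
    # alpha[i]: probability of the observations so far, ending in state i.
    alpha = [start[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Made-up model: two "sticky" states, each favouring one observation.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]

steady = forward_likelihood([0, 0], start, trans, emit)
abrupt = forward_likelihood([0, 1], start, trans, emit)
print(steady > abrupt)  # an abrupt change is less likely under this model
```

For long sequences the probabilities underflow, so practical implementations rescale `alpha` at each step or work in log space.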

Plan for next week: write up a brief proposal of the Masters project, talk to someone from the Stats dept. for additional help on implementing a working version of a detector that uses HMM.