Weekly Report - 7/8/15




Started examining connections between multiple log files. In other words, if an event happens in one file, are there related events in the other files around the same time?

Firstly, to be able to check a time window, all the log file timestamps needed to be standardised, so I wrote a script that parses each timestamp and converts it into a Unix timestamp (seconds since 1 January 1970 UTC). This makes comparisons easy: subtracting one timestamp from the other gives the difference in seconds between two events.
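A minimal Python sketch of the conversion step follows. It assumes the logs use one of a few `strptime` layouts (the `FORMATS` list is hypothetical; the real files may use others) and that every time is in UTC. Formats without a year, such as classic syslog lines, would need the year supplied separately.

```python
import calendar
from datetime import datetime

# Hypothetical timestamp layouts; extend this list to match the real log files.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",   # e.g. 2015-08-03 14:05:09
    "%Y-%m-%dT%H:%M:%S",   # ISO 8601 without a timezone suffix
    "%d/%b/%Y:%H:%M:%S",   # Apache-style, e.g. 03/Aug/2015:14:05:09
]


def to_unix(raw: str) -> int:
    """Convert a timestamp string to Unix seconds, treating it as UTC."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # not this layout, try the next one
        # calendar.timegm interprets the time tuple as UTC (time.mktime would use local time).
        return calendar.timegm(dt.timetuple())
    raise ValueError(f"unrecognised timestamp: {raw!r}")


def seconds_between(a: str, b: str) -> int:
    """Absolute difference in seconds between two timestamp strings."""
    return abs(to_unix(a) - to_unix(b))


if __name__ == "__main__":
    print(seconds_between("2015-08-03 14:05:09", "03/Aug/2015:14:06:39"))  # 90
```

Once every line carries an integer like this, the time-window check reduces to a single subtraction, regardless of which layout the original file used.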

Once I had usable timestamps, I created a program to compare two files against one another. It goes through the first file event by event and, for each event, searches the second file for events within a sixty-second window of it. When such an event is found, the similarity between the two events is calculated and the matching tokens are stored.
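The window comparison could look roughly like the sketch below. It assumes each event has already been reduced to a Unix timestamp and a set of tokens (the `Event` class is hypothetical), and it uses Jaccard similarity (shared tokens divided by all distinct tokens) as a stand-in for whatever similarity measure the program actually computes. Rather than rescanning the whole second file for every event, it sorts the second file once and uses binary search (`bisect`) to jump straight to the events within 60 seconds either side.

```python
import bisect
from dataclasses import dataclass

WINDOW = 60  # seconds, the window size used in the report


@dataclass(frozen=True)
class Event:
    """Hypothetical pre-processed log event."""
    ts: int            # Unix timestamp in seconds
    tokens: frozenset  # tokens extracted from the log line


def jaccard(a, b):
    """Shared tokens divided by all distinct tokens (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlate(events_a, events_b, window=WINDOW):
    """Pair each event in events_a with every event in events_b at most `window` seconds away."""
    events_b = sorted(events_b, key=lambda e: e.ts)
    times_b = [e.ts for e in events_b]
    matches = []
    for ea in events_a:
        # Binary search for the first and one-past-last event inside [ts - window, ts + window].
        lo = bisect.bisect_left(times_b, ea.ts - window)
        hi = bisect.bisect_right(times_b, ea.ts + window)
        for eb in events_b[lo:hi]:
            score = jaccard(ea.tokens, eb.tokens)
            # Store the pair, its similarity and the tokens they have in common.
            matches.append((ea, eb, score, ea.tokens & eb.tokens))
    return matches


if __name__ == "__main__":
    a = [Event(100, frozenset({"sshd", "failed", "root"}))]
    b = [Event(130, frozenset({"sshd", "accepted", "root"})),
         Event(200, frozenset({"sshd"}))]  # 100 s away, outside the window
    for ea, eb, score, shared in correlate(a, b):
        print(eb.ts, score, sorted(shared))  # 130 0.5 ['root', 'sshd']
```

With the second file sorted, each lookup costs O(log m) plus the number of events actually inside the window, instead of a full pass over all m events of the second file.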

A lot of the lines need to be stripped of characters such as =, [ and < so that the useful information can be compared. I have also found that since many of the IP addresses start with 130.217, there are a lot of matches on that prefix alone. Because IP addresses can be assigned more weight than ordinary words, it may be useful to implement a blacklist for that network prefix.
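One possible way to combine the character stripping, IP weighting and prefix blacklist is sketched below in Python. The set of stripped characters, the weights (`IP_WEIGHT`, `WORD_WEIGHT`) and the choice to give blacklisted addresses zero weight are all assumptions for illustration. Tokens that carry a port, such as `130.217.1.2:22`, would need further splitting before `ipaddress` recognises them.

```python
import ipaddress
import re

# Characters to replace with spaces: =, [ and < from the report, plus similar brackets (assumption).
STRIP_RE = re.compile(r"[=\[\]<>(){},;]")
# Network prefix whose addresses are ignored when scoring.
BLACKLIST = [ipaddress.ip_network("130.217.0.0/16")]
IP_WEIGHT = 3.0    # hypothetical: how much more an IP address counts than a word
WORD_WEIGHT = 1.0


def tokenize(line):
    """Strip punctuation and split a log line into a set of tokens."""
    return set(STRIP_RE.sub(" ", line).split())


def weight(token):
    """Ordinary words get WORD_WEIGHT, IPs get IP_WEIGHT, blacklisted IPs get 0."""
    try:
        addr = ipaddress.ip_address(token)
    except ValueError:
        return WORD_WEIGHT  # not an IP address
    if any(addr in net for net in BLACKLIST):
        return 0.0
    return IP_WEIGHT


def weighted_similarity(line_a, line_b):
    """Weighted Jaccard: weight of shared tokens over weight of all distinct tokens."""
    a, b = tokenize(line_a), tokenize(line_b)
    total = sum(weight(t) for t in a | b)
    if total == 0:
        return 0.0
    return sum(weight(t) for t in a & b) / total


if __name__ == "__main__":
    print(weighted_similarity("src=130.217.1.2 accepted", "src=130.217.1.2 failed"))  # about 0.333
    print(weighted_similarity("from [10.0.0.1] ok", "from <10.0.0.1> fail"))          # about 0.667
```

Without the blacklist, the first pair would also score 4/6, because the shared campus address alone would contribute a weight of 3; with it, only the shared word `src` counts, giving 1/3.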




Log Data

Is there any chance you could provide the dataset of logs you used?
(I'd like to try my own ML prediction method on it)