RapidMiner

Evaluating Anomaly and signature detection methods

Regular Contributor

Hi,

I am a fourth-year student planning an experiment that compares signature-based and anomaly-based detection methods, using decision tree and random forest algorithms. The end goal is to measure the false positive rate of each method and conclude which is better for detecting cyber attacks.

I am not yet sure how to carry out this experiment, but RapidMiner looks very helpful. I have publicly available security logs to use as data, I have installed the Anomaly Detection and Text Mining extensions, and I have set up an IDS in Security Onion to monitor my network.

Any advice or solutions would be much appreciated. Apologies if I haven't made clear what I'm doing; I am far from an expert on the subject.

Cheers

Community Manager

Re: Evaluating Anomaly and signature detection methods

So the first thing I would ask is whether your logs are labeled. Do they have a tag for "intrusion" or "no intrusion"?

I would then load the data and use a Process Documents from Data operator with Tokenize, Transform Cases, etc. nested inside to create TF-IDF word vectors, and then run a Cross Validation with, say, a Naive Bayes learner inside. Connect the per (performance) output port to see how well the model classifies the data; a confusion matrix will be generated.
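To make the TF-IDF step less of a black box, here is a minimal Python sketch of what such word vectors contain. It assumes plain lower-casing, splitting on non-alphanumeric characters, tf as relative term frequency and idf = log(N / df); RapidMiner's own normalisation may differ, and the function names and sample log lines are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(line: str) -> list[str]:
    """Lower-case a log line and split it into alphanumeric tokens
    (a rough stand-in for Transform Cases + Tokenize)."""
    return re.findall(r"[a-z0-9]+", line.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = relative frequency in this document, idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


logs = [
    "failed login for root",
    "accepted login for alice",
    "failed login for root",
]
for vec in tfidf_vectors(logs):
    print(vec)
```

Terms that appear in every log line (here "login" and "for") get weight 0, so the classifier relies on the rarer, more distinctive tokens.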

Regards,
Thomas - Community Manager
LinkedIn: Thomas Ott
Regular Contributor

Re: Evaluating Anomaly and signature detection methods

Yes, my data is labelled as 'attack' or 'normal'. Will cross-validating with Naive Bayes give me what I need, i.e. the false positive rate, classification accuracy, etc.? Also, I have a signature database containing a list of known attacks; do I have to train the model on it manually, or does RapidMiner handle that?
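Cross-validation gives you a confusion matrix, and the false positive rate and accuracy are simple ratios of its cells. A minimal Python sketch, assuming the labels are the two strings 'attack' and 'normal' and that 'attack' is treated as the positive class (the function names are hypothetical):

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP/FP/TN/FN, treating `positive` as the intrusion class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summary(counts):
    """Derive the usual rates; a rate with an empty denominator is reported as 0.0."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        # share of normal traffic wrongly flagged as an attack
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


actual = ["attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "attack", "normal", "normal", "normal"]
counts = confusion_counts(actual, predicted)
print(counts)  # {'tp': 1, 'fp': 1, 'tn': 2, 'fn': 1}
print(summary(counts))
```

Computing these figures for each detection method on the same test data gives directly comparable false positive rates.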

Regular Contributor

Re: Evaluating Anomaly and signature detection methods

Also, do I have to load two sets of data into the Cross Validation operator, one for training and one for testing, or a single set that the operator splits up itself?

Community Manager

Re: Evaluating Anomaly and signature detection methods

The Cross Validation operator automatically splits the data into training and testing sets, based on the number of folds (k) and the sampling type you choose.
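In other words, you hand the operator a single labelled example set. The splitting it does internally can be sketched in Python roughly as follows; this assumes plain shuffled sampling (stratified sampling, which keeps the attack/normal ratio equal across folds, is not shown), and the function name is hypothetical.

```python
import random


def k_fold_indices(n_rows: int, k: int, seed: int = 0):
    """Yield (train, test) row-index lists for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible folds
    # Spread rows evenly: the first n_rows % k folds get one extra row.
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = rows[start:start + size]
        train = rows[:start] + rows[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(10, k=3), start=1):
    print(f"fold {fold}: train on {len(train)} rows, test on {len(test)} rows")
```

Each row is used for testing exactly once, and the k performance results are then averaged into one estimate.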

Regards,
Thomas - Community Manager
Community Manager

Re: Evaluating Anomaly and signature detection methods

I should move your thread to the Studio forum; hardly anyone comes to this board.

Regards,
Thomas - Community Manager
Regular Contributor

Re: Evaluating Anomaly and signature detection methods

My data is the labelled NSL-KDD intrusion detection dataset. I want to show the number of false positives as well, but the only performance operator that shows this seems to require a binominal label, which my data doesn't have: each row is labelled with the specific attack rather than just 'normal' or 'attack'. Is it possible to collapse the labels into one or the other so that the label can be set as binominal?

Cheers

F

Moderator

Re: Evaluating Anomaly and signature detection methods

Hi,

Can you simply generate a new attribute with attack/noAttack from your given signature and use it as the label for comparison?

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Regular Contributor

Re: Evaluating Anomaly and signature detection methods

Could I do that inside RapidMiner, or would I have to do it on the original dataset? I know the Generate Attributes operator, but honestly I'm not sure how it works.

Moderator

Re: Evaluating Anomaly and signature detection methods

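Of course.

Generate Attributes is the way to go. Assuming your original label attribute is called signature and normal records carry the value "normal" (as they do in NSL-KDD), the expression would be something like

if(signature=="normal","noAttack","attack")

Then use Set Role to make the new attribute the label.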
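If you would rather relabel the file before importing it, the same rule is a one-liner in Python. This sketch assumes, as in NSL-KDD, that normal records carry the value 'normal' and every other value names a specific attack; the function name and sample labels are illustrative.

```python
from collections import Counter


def to_binominal(label: str) -> str:
    """Collapse a specific NSL-KDD-style label into 'noAttack' or 'attack'."""
    return "noAttack" if label.strip().lower() == "normal" else "attack"


labels = ["normal", "neptune", "smurf", "normal", "satan"]
binary = [to_binominal(label) for label in labels]
print(Counter(binary))
```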
