RapidMiner

Learner I lanem

Outlier detection operators seem to work really slow with larger data sets

Hi

I have a data set of about 160,000 rows and 25 attributes. I am trying to detect outliers in the numeric attributes using the Detect Outlier operators, but they seem to take forever to run and sometimes simply run out of memory.

Any advice on a more efficient way to identify outliers in a data set using RapidMiner Studio would be much appreciated.

Regards Michael

5 REPLIES
RM Certified Expert

Re: Outlier detection operators seem to work really slow with larger data sets

Have you downloaded the Outlier Detection extension? Its operators are very fast, and it offers many more of them than core RapidMiner does.

Learner I lanem

Re: Outlier detection operators seem to work really slow with larger data sets

Hi Thomas

When I search the Marketplace for the Outlier Detection extension, it doesn't return any results. Am I using the wrong search term? I do have the Anomaly Detection extension installed.

Regards Michael

RM Certified Expert

Re: Outlier detection operators seem to work really slow with larger data sets

Ah, I meant the Anomaly Detection extension. OK, so you have it installed already. My guess is that RapidMiner does not have enough memory available. How much do you have, and what is your license type? Community? Educational?
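
As a rough illustration of why memory could be the limit: the sketch below assumes (this is not confirmed anywhere in the thread) that a distance-based detector compares every pair of rows and that each distance is stored as an 8-byte float. The function name `pairwise_cost` and its parameters are made up for this example and are not part of RapidMiner.

```python
def pairwise_cost(n_rows, bytes_per_distance=8):
    """Return (unordered row pairs, bytes needed for a full n x n distance matrix).

    Assumption: every row is compared with every other row, and each
    distance takes `bytes_per_distance` bytes.
    """
    pairs = n_rows * (n_rows - 1) // 2
    full_matrix_bytes = n_rows * n_rows * bytes_per_distance
    return pairs, full_matrix_bytes


# 160,000 rows, as in the original question
pairs, matrix_bytes = pairwise_cost(160_000)
print(f"{pairs:,} pairs, {matrix_bytes / 1e9:.1f} GB for a full matrix")
```

Under those assumptions, 160,000 rows means roughly 12.8 billion pairwise comparisons, and a full distance matrix would need about 205 GB. An operator that avoids storing the matrix would need far less memory but would still pay the quadratic run time.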

Learner I lanem

Re: Outlier detection operators seem to work really slow with larger data sets

I have 16 GB of memory and am using an educational license of RapidMiner.

RM Certified Expert

Re: Outlier detection operators seem to work really slow with larger data sets

Can you break the data into subsets and iterate over those?
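
To make the subset idea concrete, here is a minimal sketch in plain Python. It is not a RapidMiner process and not how the Anomaly Detection extension works internally. It assumes a simple k-nearest-neighbour distance score as the outlier measure, and the names `knn_scores` and `chunked_outliers` and all parameters are hypothetical. Rows are shuffled first so that any ordering in the data is spread across the subsets, then each subset is scored on its own.

```python
import math
import random


def knn_scores(block, k):
    """Score each row by the distance to its k-th nearest neighbour within the block."""
    scores = []
    for i, a in enumerate(block):
        dists = sorted(math.dist(a, b) for j, b in enumerate(block) if j != i)
        # A block with a single row has no neighbours; give it a score of 0.0.
        scores.append(dists[min(k, len(dists)) - 1] if dists else 0.0)
    return scores


def chunked_outliers(rows, chunk_size=1000, k=5, top_n=10, seed=0):
    """Return (row_index, score) pairs for the top_n highest-scoring rows."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # spread any ordering in the data across subsets
    scored = []
    for start in range(0, len(order), chunk_size):
        ids = order[start:start + chunk_size]
        block = [rows[i] for i in ids]
        scored.extend(zip(ids, knn_scores(block, k)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(499)]
    data.append((50.0, 50.0))  # one planted outlier at index 499
    print(chunked_outliers(data, chunk_size=100, k=5, top_n=3))
```

With subsets of size c, the work drops from about n x n comparisons to about n x c, and only one subset has to be held in memory at a time. The trade-off is that neighbours in other subsets are ignored, so the scores are approximations, and a very small final subset will produce unreliable scores.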
