Tackle large files

choose_username Member Posts: 33 Maven
Hello all,

I have a large data set (15 attributes and almost 50,000 records). The problem is: if I use the Detect Outlier operator, for example, RapidMiner needs a very long time to perform it. Is there a solution to this (I mean without using a different computer)? Or do I need to look for a new data set?

Thanks in advance



  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Well, there is no general answer for this. There simply are some algorithms with long runtimes (like neural networks, relevance vector machines and, as it seems, also the outlier detection operator). In contrast to other data mining solutions, RapidMiner does not remove such algorithms, since they work quite well on smaller data sets (or faster machines  ;) ). Actually, there is not much you can do besides
    • using only a sample of the data
    • trying different schemes or approaches for your problem, in this case for outlier detection
    • checking whether the algorithm is available in a parallel working mode and using more than one CPU core
    • inspecting the source code and checking whether it can be optimized / parallelized, which we would then be happy to include in RapidMiner if you allow it
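    The first suggestion can be illustrated outside of RapidMiner. Distance-based outlier detection (the approach behind the Detect Outlier operator) compares every record against every other, which is O(n²) and is exactly what gets slow around 50,000 records. A common workaround is to score each record against a small random reference sample instead, cutting the cost to O(n·m) at some loss of precision. The sketch below is a hypothetical pure-Python illustration of that idea, not RapidMiner's actual implementation; all names and the synthetic 1-D data are made up for the example.

    ```python
    import heapq
    import random

    def knn_score(point, reference, k=5):
        """Mean distance from `point` to its k nearest neighbours in `reference`.
        Large scores indicate likely outliers (a simple distance-based score,
        similar in spirit to k-NN outlier detection)."""
        dists = heapq.nsmallest(k, (abs(point - q) for q in reference))
        return sum(dists) / k

    random.seed(0)
    # Synthetic 1-D data: 5,000 "normal" records plus two planted outliers.
    data = [random.gauss(0, 1) for _ in range(5_000)] + [25.0, -30.0]

    # Instead of comparing all n records pairwise (O(n^2)), score each record
    # against a random sample of m = 200 reference records (O(n * m)).
    reference = random.sample(data, 200)
    scores = [knn_score(p, reference) for p in data]

    # The two planted outliers receive by far the largest scores.
    top2 = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:2]
    ```

    The trade-off: with a very small reference sample, rare records may look more (or less) unusual than they really are, so the sample size has to be tuned against the runtime you can afford.
    
    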
  • choose_username Member Posts: 33 Maven
    Thank you for your fast answer  :).  I think I will look for another data set.

