Process failed exception, any clue?

confusedMonMon Member Posts: 14 Contributor I
edited October 17 in Help
I've created a process that works fine on a sample dataset. However, when I run it on my whole dataset, it fails. I'm not sure whether this is because of the size of the processed files/documents. Is there a size limit for processed documents in RapidMiner, or is something wrong with the process itself? The exception I'm getting:
  • Exception: java.lang.StackOverflowError
  • Message: null
  • Stack trace:
  • sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  • sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  • sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  • java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  • java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
  • java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1005)
  • com.rapidminer.studio.concurrency.internal.AbstractConcurrencyContext.collectResults(AbstractConcurrencyContext.java:206)
  • com.rapidminer.studio.concurrency.internal.StudioConcurrencyContext.collectResults(StudioConcurrencyContext.java:33)
  • com.rapidminer.studio.concurrency.internal.AbstractConcurrencyContext.call(AbstractConcurrencyContext.java:141)
  • com.rapidminer.studio.concurrency.internal.StudioConcurrencyContext.call(StudioConcurrencyContext.java:33)
  • com.rapidminer.Process.executeRootInPool(Process.java:1349)
  • com.rapidminer.Process.execute(Process.java:1314)
  • com.rapidminer.Process.run(Process.java:1291)
  • com.rapidminer.Process.run(Process.java:1177)
  • com.rapidminer.Process.run(Process.java:1130)
  • com.rapidminer.Process.run(Process.java:1125)
  • com.rapidminer.Process.run(Process.java:1115)
  • com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)
  • Cause
  • Exception: java.lang.StackOverflowError
  • Message: null
  • Stack trace:
  • java.util.regex.Pattern$Branch.match(Pattern.java:4606)
  • java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
  • java.util.regex.Pattern$LazyLoop.match(Pattern.java:4849)
  • .........................

Thank you

Best Answer


  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 179  RM Research

    Could you perhaps share the process with us? From the error message it's not clear what caused the error.
    It would also help if you told us which version of RapidMiner you are using, as we are continuously improving the product.

  • Michael Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 14  RM Data Scientist
    edited July 4
    The error message hints at a regular expression failing because its recursive matching exhausted the stack on a large input, which might explain why it works for the smaller sample data set.

    Are you using complex regular expressions to parse a document, or a built-in function or operator?
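
    To illustrate the mechanism behind the Pattern$Branch / Pattern$GroupHead / Pattern$LazyLoop frames in the trace: java.util.regex evaluates a repeated group containing alternation recursively, so the stack depth grows with the length of the input rather than with how complicated the pattern looks. The following is only a minimal sketch. The pattern and the input sizes are made up for illustration and are not taken from the poster's process, and whether the large input actually overflows depends on the JVM's thread stack size (-Xss).

```java
import java.util.regex.Pattern;

/** Illustrative only: input length, not pattern complexity, can exhaust the stack. */
final class RegexStackDepthDemo {

    // Hypothetical pattern: a lazily repeated group with alternation. Each
    // repetition adds several nested match() calls inside java.util.regex.
    static final Pattern LAZY_ALTERNATION = Pattern.compile("(foo|bar)*?;");

    /** Builds "foo" repeated {@code repetitions} times, followed by ";". */
    static String buildInput(int repetitions) {
        StringBuilder sb = new StringBuilder(repetitions * 3 + 1);
        for (int i = 0; i < repetitions; i++) {
            sb.append("foo");
        }
        return sb.append(';').toString();
    }

    /** Returns "matched", "no match", or "stack overflow" for the given input. */
    static String tryMatch(CharSequence input) {
        try {
            return LAZY_ALTERNATION.matcher(input).matches() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            // The same error type as in the trace above, caught here for display.
            return "stack overflow";
        }
    }

    public static void main(String[] args) {
        System.out.println("100 repetitions: " + tryMatch(buildInput(100)));
        // With default stack sizes this input is likely, but not guaranteed, to overflow.
        System.out.println("2,000,000 repetitions: " + tryMatch(buildInput(2_000_000)));
    }
}
```

    If a pattern like this turns out to be the culprit, typical options are splitting the input into smaller pieces or rewriting the pattern so it does not repeat a group with alternatives.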
  • confusedMonMon Member Posts: 14 Contributor I
    Hi @Michael, I'm actually using simple regexes to filter tokens. However, to test this assumption I removed the operators with the complex regexes and kept only the ones with very simple regexes, but I still get the same exception :(

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 179  RM Research
    How large is your full data set, and how much memory does your machine have?
  • confusedMonMon Member Posts: 14 Contributor I
    edited July 4
    Hi @David_A, I forgot to mention that I'm mining source code files. The dataset is quite big. I have now run the process on just a subset (~118 Java projects, 778 MB), but I'm still getting the same exception. My RapidMiner version is 9.3.001, and my machine has 16 GB of RAM installed, of which about 1 GB is free. Maybe this is what caused the problem; I will try to fix it. Thanks
  • confusedMonMon Member Posts: 14 Contributor I
    Hi @David_A
    I've tried it again with the following changes:
    (1) Ran it on a different machine with 1.5 GB of available memory.
    (2) Got rid of all Filter Tokens operators with complex regexes.
    (3) Changed the match condition in the Filter Tokens operator to "contains" with a single word.
    I still have the same problem. Any clue how to fix it? Thanks
  • confusedMonMon Member Posts: 14 Contributor I
    Hi again @David_A
    I tried to run the process again on another machine with 24 GB of available memory, got rid of the parallel operators and regular expressions, and still got the same exception. At one point I tried to run it with only one filtering operator and unfortunately got the same exception.
    However, what made it work was extracting the zipped files and filtering them by type before having RapidMiner process them. Previously, I had nested loops as part of my process. Now it seems to work fine.
    Thank you again for your help and suggestions.
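
    For anyone who wants to do the same preprocessing outside RapidMiner, the workaround described above (unzip first, keep only files of one type) can be sketched roughly as follows. This is not the poster's actual script: the class name, the method name and the ".java" extension in the usage note are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: extracts only entries with a given extension from a zip file. */
final class ZipFilterExtractor {

    /**
     * Extracts entries whose names end with {@code extension} (case-insensitive)
     * from {@code zipFile} into {@code targetDir} and returns the paths written.
     */
    static List<Path> extractByExtension(Path zipFile, Path targetDir, String extension)
            throws IOException {
        List<Path> written = new ArrayList<>();
        Path root = targetDir.toAbsolutePath().normalize();
        String wanted = extension.toLowerCase(Locale.ROOT);
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.toLowerCase(Locale.ROOT).endsWith(wanted)) {
                    continue; // skip folders and unwanted file types
                }
                Path out = root.resolve(name).normalize();
                if (!out.startsWith(root)) {
                    continue; // ignore entries that would land outside targetDir
                }
                Files.createDirectories(out.getParent());
                // Copies the bytes of the current entry only; the zip stream stays open.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                written.add(out);
            }
        }
        return written;
    }
}
```

    A call such as ZipFilterExtractor.extractByExtension(zipPath, outputDir, ".java") would leave only the Java source files in outputDir, which a process could then read with a single loop over files.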

  • confusedMonMon Member Posts: 14 Contributor I
    edited August 2
    Hi again @David_A
    As I couldn't run RapidMiner on the whole dataset at once, because it has tens of folders (and the results will be spread over tens of files), I'm looking for a way to automate this. Is there a way to run a RapidMiner process from the command prompt on different small datasets and save the results as we go, instead of manually setting the read-folder and save-file parameters in the GUI?
    Many thanks
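
    One possible shape for this kind of automation is a small driver that loops over the dataset's subfolders and starts one command-line run per folder, passing the input folder and a per-folder result file as arguments. The sketch below is only that, a sketch: the launcher name rapidminer-batch.sh, the -f and -M arguments, the macro names input_dir and output_file, and the .csv result naming are all assumptions that should be checked against the documentation for your RapidMiner version. Only the folder loop and the placeholder substitution are meant literally.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical driver: one external run per dataset subfolder. */
final class BatchRunner {

    /** Lists the immediate subfolders of {@code root} in sorted order. */
    static List<Path> subfolders(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(root)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Fills the {input} and {output} placeholders of a command template for one folder. */
    static List<String> buildCommand(List<String> template, Path inputFolder, Path resultsDir) {
        // Result file naming (<folder name>.csv) is an illustrative choice.
        Path output = resultsDir.resolve(inputFolder.getFileName() + ".csv");
        List<String> command = new ArrayList<>();
        for (String arg : template) {
            command.add(arg.replace("{input}", inputFolder.toString())
                           .replace("{output}", output.toString()));
        }
        return command;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // ASSUMPTION: launcher name, flags and macro names are placeholders, not verified.
        List<String> template = Arrays.asList(
                "rapidminer-batch.sh", "-f", "process.rmp",
                "-Minput_dir={input}", "-Moutput_file={output}");
        Path datasetRoot = Paths.get(args[0]);
        Path resultsDir = Files.createDirectories(Paths.get(args[1]));
        for (Path folder : subfolders(datasetRoot)) {
            int exit = new ProcessBuilder(buildCommand(template, folder, resultsDir))
                    .inheritIO().start().waitFor();
            System.out.println(folder.getFileName() + " finished with exit code " + exit);
        }
    }
}
```

    Running the folders one at a time keeps each run small and writes each result as soon as its folder is done, so a failure in one folder does not lose the results of the others.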