Process failed - X-means clustering

albertoarenal Member Posts: 10 Contributor II
edited December 2018 in Help

Hi everyone,

I'm trying to run an X-Means clustering process on a set of texts stored in an Excel file, but every time I try, I get the same failure:

 

"Process Failed
The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem."

 

The log messages are:

Jun 28, 2017 6:32:59 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...

Jun 28, 2017 6:32:59 PM SEVERE: Here:

Jun 28, 2017 6:32:59 PM SEVERE: Process[1] (Process)

Jun 28, 2017 6:32:59 PM SEVERE: subprocess 'Main Process'

Jun 28, 2017 6:32:59 PM SEVERE: +- Read Excel[1] (Read Excel)

Jun 28, 2017 6:32:59 PM SEVERE: +- Process Documents from Data[1] (Process Documents from Data)

Jun 28, 2017 6:32:59 PM SEVERE: subprocess 'Vector Creation'

Jun 28, 2017 6:32:59 PM SEVERE: | +- Tokenize[2096] (Tokenize)

Jun 28, 2017 6:32:59 PM SEVERE: | +- Transform Cases[2096] (Transform Cases)

Jun 28, 2017 6:32:59 PM SEVERE: | +- Filter Stopwords (English)[2096] (Filter Stopwords (English))

Jun 28, 2017 6:32:59 PM SEVERE: | +- Stem (Snowball)[2096] (Stem (Snowball))

Jun 28, 2017 6:32:59 PM SEVERE: ==> +- X-Means[1] (X-Means)

Jun 28, 2017 6:32:59 PM SEVERE: java.lang.ArrayIndexOutOfBoundsException

The structure of the process is shown in the attached image. If I run the same process with only the X-Means operator replaced by a K-Means operator, it works without problems and I get the corresponding clustering results.
I have also tried X-Means clustering with a different data input (reading directly from a folder of PDF files), and that process fails as well.

Could anyone help me?

I really appreciate your help!

Thank you very much
Alberto

Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn
    Solution Accepted

    Try toggling off the 'keep text' option on the Process Documents operator and run again. Sometimes this can confuse the X-means operator. 

     

    Another side note, you should probably prune more. I normally don't like to have wide data sets feeding into a clustering algorithm but that's just me. 
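
To make the pruning advice concrete, here is a minimal sketch of percentage-based pruning by document frequency, written in plain Python rather than as a RapidMiner process. The function name `prune_vocabulary`, its parameters, and the toy documents are hypothetical and only illustrate the idea: terms that occur in too few or too many documents are dropped, which narrows the word vector before it reaches the clustering step. It does not reproduce what the Process Documents operator does internally.

```python
from collections import Counter


def prune_vocabulary(token_lists, below_pct=3.0, above_pct=30.0):
    """Return the terms whose document frequency, as a percentage of all
    documents, lies within [below_pct, above_pct]. Hypothetical helper."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))

    kept = []
    for term, df in doc_freq.items():
        pct = 100.0 * df / n_docs
        if below_pct <= pct <= above_pct:
            kept.append(term)
    return sorted(kept)


# Toy corpus (already tokenized); thresholds chosen only for this example.
docs = [
    ["cluster", "text", "excel"],
    ["cluster", "text", "pdf"],
    ["cluster", "stem", "token"],
    ["cluster", "text", "token"],
]
vocab = prune_vocabulary(docs, below_pct=30.0, above_pct=80.0)
print(vocab)  # ['text', 'token']
```

Here "cluster" is dropped because it appears in every document, and "excel", "pdf", and "stem" are dropped because each appears in only one of the four. Suitable thresholds for a real corpus are a judgment call and would need to be tuned.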

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    This type of error typically means that there is a problem with a data type in your data. Did you check the output of the Process Documents operator before it goes into the X-Means operator?

  • albertoarenal Member Posts: 10 Contributor II

    Hi Thomas, thank you very much for your quick answer.

     

    I have also tried other kinds of input data, such as a set of PDF files, and the process failed too.

    Attached is an image showing the contents of "Process Documents from Data". As you can see, it is a typical preprocessing chain (tokenize, transform cases, filter stopwords, stemming) that creates a TF-IDF vector.

    Besides, if I replace the X-Means operator with a K-Means operator, there is no problem and I get the clustering result.

    How do you think I should proceed? Thanks again

    Alberto Arenal

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    Yes, but did you check what comes out of the Process Documents operator? Did you put a breakpoint and inspect the data?
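
As an illustration of what inspecting the data can look for, the sketch below scans a small table for cells that are not finite numbers, such as a leftover text column or a missing value. It is plain Python with a hypothetical helper `find_bad_cells` and made-up sample values. It is an assumption that a stray non-numeric attribute is the kind of data-type problem meant here, and the code says nothing about how X-Means itself validates its input.

```python
import math


def find_bad_cells(rows):
    """Return (row, column, value) for every cell that is not a finite number.
    Hypothetical helper for a quick sanity check before clustering."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not math.isfinite(value):
                bad.append((r, c, value))
    return bad


# Made-up example: a TF-IDF-like table with one missing value and one text cell.
table = [
    [0.0, 0.41, 0.0],
    [0.12, float("nan"), 0.3],
    [0.0, 0.0, "raw text kept by mistake"],
]
problems = find_bad_cells(table)
for row_idx, col_idx, value in problems:
    print(f"row {row_idx}, column {col_idx}: {value!r}")
```

A check like this flags the NaN in the second row and the text cell in the third. In RapidMiner, the equivalent is looking at the attribute types and statistics of the example set at the breakpoint.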

  • albertoarenal Member Posts: 10 Contributor II

    Thank you Thomas, I really appreciate your help.

     

    Yes, I put a breakpoint right after the Process Documents operator and got a regular TF-IDF vector (image attached). I can't see a problem with it, but I may be overlooking something.

     

    Could the problem be the number of rows (1048 examples) or the number of attributes (1 special attribute, 3197 regular attributes)?

    Alberto

  • albertoarenal Member Posts: 10 Contributor II

    Dear Thomas,

    Problem solved, thank you very much. I tried toggling off the "keep text" option, but the process continued to fail.

    However, after following your advice to prune more, it works.

    I really appreciate your help, you saved me a lot of time and frustration.

    best regards

    Alberto
