[SOLVED] Select attributes only shows metadata and no variables?

kasper2304 Member Posts: 28 Contributor II
edited November 2018 in Help
Hi out there.

I am working on a text mining project where I need to create a subset of variables for further dimensionality reduction before training my model. Having watched the videos online, I have come to the conclusion that the "select attributes" node is the one I have to use.

Here is what I have done so far.

I have created two folders on my hard drive: one containing positive cases and another containing negative cases, giving me a total of 300 cases. Somehow RapidMiner manages to get two extra cases, which I believe are the "folders" themselves and which I will have to remove, but first things first.

I used "Process documents from files" and loaded the two directories with class names "1" and "0". Within the "process documents from files" node I have "transform cases", "tokenize", "filter stop words", "extract token number", "extract length", "aggregate token length", "stem snowball" and "filter tokens".

The settings of "process documents from files" node are:

use file extension as type = TRUE
create word vector = TRUE
add meta information = TRUE
prune method = PERCENTUAL

This gives me around 150 variables, some of which I need to kick out before doing dimensionality reduction. As an example, "names" does not make much sense to do any analysis with in my case.
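
For readers who think in code, the operator chain and settings described above amount roughly to the following. This is only a minimal Python sketch, not RapidMiner's implementation: the stop-word list, the pruning thresholds and the names `tokenize` and `build_word_vector` are assumptions made for illustration, and stemming is left out.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list (an assumption, not RapidMiner's list).
STOP_WORDS = {"the", "a", "is", "and", "of"}

def tokenize(text):
    """Lower-case the text, split on non-letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_word_vector(docs, min_pct=3.0, max_pct=90.0):
    """Return (columns, rows) after percentual pruning.

    A token is kept only if the share of documents containing it lies
    between min_pct and max_pct (both thresholds are assumed values).
    """
    tokenized = [tokenize(text) for text, _label in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each token once per document
    n = len(docs)
    columns = sorted(
        t for t, df in doc_freq.items()
        if min_pct <= 100.0 * df / n <= max_pct
    )
    rows = []
    for (_text, label), tokens in zip(docs, tokenized):
        counts = Counter(tokens)
        rows.append({"label": label, **{c: counts[c] for c in columns}})
    return columns, rows

# Four made-up documents with class labels "1" and "0".
docs = [
    ("The service is great and fast", "1"),
    ("Great product", "1"),
    ("The delivery is slow", "0"),
    ("Slow and great", "0"),
]
columns, rows = build_word_vector(docs, min_pct=30.0, max_pct=90.0)
print(columns)
```

The point to notice is that `columns` is computed from the documents themselves, so the variable names do not exist until the texts have actually been read.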

THE PROBLEM:

The problem arises when I use the "select attribute" node. In my world it should be straightforward to attach the node to my "process documents from files" node and then simply select/de-select the variables I want to continue with. BUT the only variables that are displayed when I try to use the subset option are four metadata attributes... In my world all 150 variables should be displayed... So is this a bug, or do I have some settings wrong somewhere?

Best
Kasper

Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    The attribute names are determined from the data at run time, so the metadata can't get hold of them. A workaround is to store the example set in the repository using "Store" and fetch it again using "Retrieve".

    regards

    Andrew
  • kasper2304 Member Posts: 28 Contributor II
    Thanks Andrew.

    I was actually just about to try that workaround.

    Best
    Kasper
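
Andrew's explanation can be illustrated outside RapidMiner. The following is a minimal Python sketch under stated assumptions: `design_time_columns`, `run_process`, `store` and `retrieve` are hypothetical helper names (not RapidMiner APIs), a JSON file stands in for the repository, and the four metadata column names are assumed because the thread does not list them. It shows that before the process runs only the fixed metadata columns are known, whereas the word columns appear once the documents have been read; storing the result and reading it back makes the full column list available up front.

```python
import json
import os
import re
import tempfile

# Assumed names for the four fixed metadata columns (not given in the thread).
META_COLUMNS = ["label", "metadata_file", "metadata_path", "metadata_date"]

def design_time_columns():
    """What can be known before any document is read: metadata columns only."""
    return list(META_COLUMNS)

def run_process(docs):
    """Read the documents; the word columns only become known here."""
    vocabulary = sorted(
        {word for text in docs for word in re.findall(r"[a-z]+", text.lower())}
    )
    return META_COLUMNS + vocabulary

def store(columns, path):
    """Stand-in for the "Store" step: persist the column list."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(columns, fh)

def retrieve(path):
    """Stand-in for the "Retrieve" step: read the stored column list back."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)

docs = ["good service", "bad service"]
print(design_time_columns())  # only the four metadata columns

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_set.json")
    store(run_process(docs), path)
    stored_columns = retrieve(path)

print(stored_columns)  # metadata columns followed by the word columns
```

In this analogy, a selection step placed after `retrieve` can list every column, because the names were fixed when the data was stored rather than being computed later in the same run.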