[SOLVED] Select attributes only shows metadata and no variables?

kasper2304 Member Posts: 28 Contributor II
edited November 2018 in Help
Hi out there.

I am working on a text mining project where I need to create a subset of variables for further dimensionality reduction before training my model. Having watched the videos online, I have come to the conclusion that the "select attributes" node is the one I have to use.

Here is what I have done so far.

I have created two folders on my hard drive: one containing positive cases and one containing negative cases, giving me a total of 300 cases. Somehow RapidMiner ends up with two extra cases, which I believe are the "folders" themselves and which I will have to remove, but first things first.

I used "Process documents from files" and loaded the two directories with class names "1" and "0". Within the "process documents from files" node I have "transform cases", "tokenize", "filter stop words", "extract token number", "extract length", "aggregate token length", "stem snowball" and "filter tokens".

The settings of "process documents from files" node are:

use file extension as type = TRUE
create word vector = TRUE
add meta information = TRUE
prune method = PERCENTUAL
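
For readers unfamiliar with what these settings produce, here is a minimal Python sketch of a word vector with percentage-based pruning. It is not RapidMiner code: the function name, the sample documents and the 30% threshold are all made up for illustration, and it uses plain term counts rather than whatever vector weighting the process is actually configured with.

```python
from collections import Counter

def build_word_vectors(docs, prune_below_pct=0.0, prune_above_pct=100.0):
    """Return (attribute_names, rows) of term counts, one row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each token occur?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    # Keep only tokens whose document frequency (in percent) is within bounds.
    kept = sorted(
        tok for tok, df in doc_freq.items()
        if prune_below_pct <= 100.0 * df / n_docs <= prune_above_pct
    )
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[tok] for tok in kept])
    return kept, rows

# Hypothetical documents; "plot" and "twist" occur in only 25% of them.
docs = ["good film good cast", "bad film", "good plot twist", "bad cast"]
names, rows = build_word_vectors(docs, prune_below_pct=30.0)
print(names)
print(rows[0])
```

The point to notice is that `names`, the list of attributes, is computed from the documents themselves, so it cannot be known before the documents have been read.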

This gives me around 150 variables, some of which I need to kick out before doing dimensionality reduction. As an example, "names" does not make much sense to analyze in my case.


The problem arises when I use the "select attribute" node. In my world it should be straightforward to attach the node to my "process documents from files" node and then simply select/de-select the variables I want to continue with. BUT the only variables that are displayed when I try to use the subset option are four metadata attributes... In my world all 150 variables should be displayed... So is this a bug, or do I have some settings wrong somewhere?



    awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn

    The attribute names are determined from the data at run time, so the meta data can't get hold of them. A workaround is to store the example set in the repository using "Store" and fetch it again using "Retrieve".
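
The idea behind the workaround can be sketched in plain Python (this is an analogy, not RapidMiner code). Once the example set has been written out, its attribute names can be read back without re-running the text processing, and a subset can then be chosen by name. The column names, file name and helper functions below are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical example set as it exists only after the process has run:
# two metadata columns plus token columns derived from the documents.
rows = [
    {"label": "1", "file": "a.txt", "good": 2, "film": 1, "names": 1},
    {"label": "0", "file": "b.txt", "good": 0, "film": 1, "names": 0},
]

def store(rows, path):
    """Write the example set to disk, header included (stand-in for "Store")."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

def retrieve_attribute_names(path):
    """Read only the header row (stand-in for what "Retrieve" can expose)."""
    with open(path, newline="") as fh:
        return next(csv.reader(fh))

def select_attributes(rows, keep):
    """Keep only the named attributes in each row."""
    return [{name: row[name] for name in keep} for row in rows]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wordvector.csv")
    store(rows, path)
    attribute_names = retrieve_attribute_names(path)

# With the full list of names available, dropping "names" is a simple filter.
keep = [name for name in attribute_names if name != "names"]
subset = select_attributes(rows, keep)
print(attribute_names)
print(subset[0])
```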


    kasper2304 Member Posts: 28 Contributor II
    Thanks Andrew.

    I was actually just about to try that workaround.
