How to select the regular attributes?

heron_oliveira Member Posts: 6 Newbie
I'm using 'Process Documents from Files'. Inside it I use the operators Tokenize, Transform Cases, Generate n-Grams, Filter Token and Filter Stop words. In the main process I then use 'Select Attributes'. But when I choose the attribute filter type 'a subset' and click the button for selecting attributes, the only attributes I can choose are the special attributes. There are supposed to be 42K columns according to the results. I can choose only ones like 'label', 'text' and 'metadata', but none of the other columns that appear in the results when I select 'all attributes'.

Best Answer

    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,511 RM Data Scientist
    Solution Accepted
    Hey,
    to understand this, you have to understand a bit of what is going on with the metadata.

    When you load a data table, you also load its metadata, which is effectively the schema of the data set. Every operator implements two functions: one to do the data processing, and one to transform the metadata. This is how we can offer you the attribute selection at a given operator, and also provide hints etc., before the process has run.

    Now, there are operators for which you cannot calculate the output schema from the input schema. The usual examples are Transpose, Pivot and Process Documents. For Process Documents, the resulting columns depend on the text input, so we cannot calculate them before the process runs. That is why only the special attributes show up in the selection dialog.
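
    To illustrate the idea, here is a minimal Python sketch. It is not RapidMiner code, and all function names are hypothetical; it only mimics the "one function for data, one for metadata" split. A column selection can predict its output schema from the input schema alone, while a text-vectorizing step only learns its columns once it has read the documents.

```python
from collections import Counter


def select_schema(input_columns, keep):
    """Metadata transform of a column selection (hypothetical).

    The output schema follows from the input schema alone, no data needed.
    """
    return [c for c in input_columns if c in keep]


def word_vector_schema(input_columns):
    """Metadata transform of a text-vectorizing step (hypothetical).

    The token columns depend on the text itself, so at design time
    the output schema is simply unknown.
    """
    return None


def word_vector_data(documents):
    """Data transform: one column per distinct token found in the documents."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    columns = sorted(set().union(*counts))
    rows = [[c[col] for col in columns] for c in counts]
    return columns, rows


# The real columns only exist after the data has been processed.
columns, rows = word_vector_data(["Red apple", "green apple"])
print(columns)
print(rows)
```

    With a different set of documents the same step would produce different columns, which is why a design-time dialog has nothing to list until the process has been executed at least once.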

    There are three ways to approach this:
    - Store the data set and, at least in the design phase, work on the stored version (maybe also with Subprocess (Caching)).
    - Use the Synchronize Meta Data with Real Data option under Process. You can then execute the process once and keep using the metadata from that run.
    - Just type in the attribute names manually. You don't have to use the drop-downs; entering the names by hand works as well.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany