RapidMiner

removing columns from a dataset/analysis

Contributor II

Hi all,
I'm new to RapidMiner, so this is a basic question I'd appreciate your help with.

I'm running a decision tree (ID3Numerical).

How do I:
1) remove columns (variables) from a dataset before processing
2) alternatively, list the subset of the dataset's variables that I want to put through the tree?

On a related note, how do I store and view a dataset once it has been read in, without having to rerun the whole ETL process and pause it after the import?

Thanks in advance for all your help.

Richie
6 REPLIES
Regular Contributor

Re: removing columns from a dataset/analysis

Hi,

Welcome to the whacky world of RM! Here's an answer to your questions....

<operator name="Root" class="Process" expanded="yes">
    <!-- Generate 200 random examples to serve as demo data -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <!-- Remove every attribute whose name matches a.*, except att5 -->
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <!-- Write the filtered example set to the file "bla" -->
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="bla"/>
    </operator>
</operator>
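
For your second question (listing only the variables you want to keep), the same operator can probably be turned around: skip everything and exempt the names you want. This is only a sketch. It assumes that except_features_with_name accepts a regular expression just like skip_features_with_name does, and att1 and att3 are placeholder names, so check it against the operator reference before relying on it.

```xml
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <!-- Assumption: both parameters take regular expressions.
         Skip every attribute name, except the ones listed here. -->
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value=".*"/>
        <parameter key="except_features_with_name" value="att1|att3"/>
    </operator>
</operator>
```

Have a look at the resulting example set to confirm which attributes survived before feeding it to the learner.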


Some say that reading the manuals and working through the tutorial and examples helps, others that it takes all the fun out of guessing.


Contributor II

Re: removing columns from a dataset/analysis

Thanks for addressing the first of my questions and for producing an easy-to-follow example. I used a different operator, AttributeFilter, with the same result.

How do I now view the datasets created so I can see the results of my operators? For example, in the example you give, how do I:
- view the first dataset before filtering
- view the last dataset after the FeatureNameFilter

I ask because the input and data preparation may be computationally expensive and I don't want to have to rerun them again.

Where is the reference in the documentation that you mention by the way? I always search the documentation first but found nothing on this basic ETL stuff.

Thanks,
R
Regular Contributor

Re: removing columns from a dataset/analysis

How do I now view the datasets created so I can see the results of my operators? For example, in the example you give, how do I:
- view the first dataset before filtering
- view the last dataset after the FeatureNameFilter


By using breakpoints, like this....


<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <!-- breakpoints="before" pauses the process just before this operator runs -->
    <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="before">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="bla"/>
    </operator>
</operator>



The process halts before the attributes are removed. To see the data, check out the data view of the data table. Then continue and do the same to see what got written.

Where is the reference in the documentation that you mention by the way? I always search the documentation first but found nothing on this basic ETL stuff.


Curious: if I open rapidminer-4.2-tutorial.pdf and search for "remove attribute", the first hit is the section on the FeatureNameFilter operator. Equally, if you work through the tutorial (Help->Rapidminer Tutorial), you'll come across examples which use breakpoints, the first in example four...

That being said, I've griped before about the documentation, but believe you me, RM is much better and easier to use than the documentation suggests. Being a halfwit myself, perhaps I should offer up an idiot's guide to the data underworld...

Good weekend ;)
Contributor II

Re: removing columns from a dataset/analysis

Thanks for the help, haddock.

I ran Help\Rapidminer Tutorial, but it has no search feature, and when I closed the tutorial dialog my whole process tree had been lost. So I gave up on that pretty quickly! I'll check the PDF you mention.

I think once these folks sort out their documentation they'll have a really excellent product that people will use. For an industry user wanting to get up and running quickly it's pretty lacking alright.

BTW, I have used breakpoints. Once you move on through a breakpoint, however, there's no way to go back and view the datasets. I'm coming from a SAS background where this is easy to do. Also, it's important to be able to start a process at any stage, since it's pointless to have to keep reading in the data before running any analytics.
Any pointers on where I can find out about that?

Thanks and have a good weekend yourself.

R

Regular Contributor

Re: removing columns from a dataset/analysis

Nay probs, just copy the example set, like this....

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <!-- IOMultiplier copies the ExampleSet, so a copy stays available alongside the filtered one -->
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="bla"/>
    </operator>
</operator>


I realise that you may not wish to clog up memory, so this is not perfect in the sense of a good debugger. Still, for what it is worth, there is your answer.
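
If the aim is to avoid re-reading the raw data altogether, one alternative to copying in memory is to write the prepared example set to disk once and start later processes from that file. The two processes below are only a sketch of that idea. They assume the RM 4.x operators ExampleSetWriter (with attribute_description_file) and ExampleSource (with attributes); the file names prepared.dat and prepared.aml are placeholders, and the generator merely stands in for your real import and preparation steps.

```xml
<!-- Process 1: run the expensive preparation once and save the result -->
<operator name="Root" class="Process" expanded="yes">
    <!-- Stand-in for the real (slow) import and preparation -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <!-- Placeholder file names: data file plus attribute description -->
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="prepared.dat"/>
        <parameter key="attribute_description_file" value="prepared.aml"/>
    </operator>
</operator>
```

```xml
<!-- Process 2: start from the saved data instead of repeating the ETL -->
<operator name="Root" class="Process" expanded="yes">
    <!-- Reads the example set described by the placeholder .aml file -->
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="prepared.aml"/>
    </operator>
    <!-- Add the learner here, e.g. the ID3Numerical from the original question -->
</operator>
```

The trade-off is disk space for run time: the second process can be rerun as often as needed without touching the original import.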

For serious users I'd really recommend a course up at RM: in two days I learnt more than in the preceding two months of grappling with the guesswork. Besides which, Ralf is a very approachable tutor and genial lunch host 8)
Contributor II

Re: removing columns from a dataset/analysis

Well, if there's lunch...!
Thanks, that's exactly what we wanted. We're not short of storage here but can't afford the time to repeatedly run through a long ETL process.

Thanks,
Richie