removing columns from a dataset/analysis

User401 Member Posts: 5 Contributor II
edited November 2018 in Help
Hi all,
I'm new to RapidMiner, so this is a basic question I'd appreciate your help with.

I'm running a decision tree (ID3Numerical).

How do I:
1) remove columns (variables) from a dataset before processing
2) alternatively, list the subset of the dataset's variables that I want to put through the tree?

On a related note, how do I store and view a dataset that has been read in, without rerunning the whole ETL process and pausing it after the import?

Thanks in advance for all your help.

Richie

Answers

  • haddock Member Posts: 849 Maven
    Hi,

    Welcome to the whacky world of RM! Here's an answer to your questions....
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="200"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter">
            <parameter key="skip_features_with_name" value="a.*"/>
            <parameter key="except_features_with_name" value="att5"/>
        </operator>
        <operator name="ExampleSetWriter" class="ExampleSetWriter">
            <parameter key="example_set_file" value="bla"/>
        </operator>
    </operator>
    Some say that reading the manuals and working through the tutorial and examples helps, others that it takes all the fun out of guessing.
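
    For the second question (listing only the attributes that should go through the tree), the same operator can probably be turned around: skip everything, then name the exceptions. This is only a sketch, assuming both parameters accept regular expressions, as the `a.*` value above suggests. The names `att1` and `att3` are placeholders, not taken from any real dataset.

    ```xml
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <!-- Assumption: ".*" matches every attribute name, so all of them are candidates for removal -->
        <parameter key="skip_features_with_name" value=".*"/>
        <!-- Hypothetical names: only attributes matching this pattern are kept -->
        <parameter key="except_features_with_name" value="att1|att3"/>
    </operator>
    ```

    Whether the label survives this filter is worth checking with a breakpoint before feeding the result to a learner.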


  • User401 Member Posts: 5 Contributor II
    Thanks for addressing the first of my questions and for producing an easy-to-follow example. I used a different operator, AttributeFilter, with the same result.

    How do I now view the datasets created so I can see the results of my operators? For example, in the example you give, how do I:
    - view the first dataset before filtering
    - view the last dataset after the FeatureNameFilter

    I ask because the input and data preparation may be computationally expensive and I don't want to have to rerun them again.

    Where is the reference in the documentation that you mention by the way? I always search the documentation first but found nothing on this basic ETL stuff.

    Thanks,
    R
  • haddock Member Posts: 849 Maven
    How do I now view the datasets created so I can see the results of my operators? For example, in the example you give, how do I:
    - view the first dataset before filtering
    - view the last dataset after the FeatureNameFilter
    By using breakpoints, like this....

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="200"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="before">
            <parameter key="skip_features_with_name" value="a.*"/>
            <parameter key="except_features_with_name" value="att5"/>
        </operator>
        <operator name="ExampleSetWriter" class="ExampleSetWriter">
            <parameter key="example_set_file" value="bla"/>
        </operator>
    </operator>

    The process halts before the attributes are removed; to see the data, check the data view of the data table. Then continue and do the same to see what got written.
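
    To look at the set once the filter has run as well, a second breakpoint on the same operator may be enough. A sketch, assuming the `breakpoints` attribute accepts a comma-separated list (only `before` appears in the process above, so treat `before,after` as unverified):

    ```xml
    <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="before,after">
        <!-- Halts once with the unfiltered set, then again with the filtered set -->
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    ```
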
    Where is the reference in the documentation that you mention by the way? I always search the documentation first but found nothing on this basic ETL stuff.
    Curious: if I open rapidminer-4.2-tutorial.pdf and search for "remove attribute", the first hit is the section on the FeatureNameFilter operator. Equally, if you work through the tutorial (Help->Rapidminer Tutorial) you'll come across examples which use breakpoints, the first in example four...

    That being said, I've griped before about the documentation, but believe you me, RM is much better and easier to use than the documentation suggests. Being a halfwit myself, perhaps I should offer up an idiot's guide to the data underworld...

    Good weekend  ;)
  • User401 Member Posts: 5 Contributor II
    Thanks for the help haddock.

    I ran Help\Rapidminer Tutorial, but it has no search feature, and when I closed the tutorial dialog my whole process tree had been lost. So I gave up on that pretty quickly! I'll check the PDF you mention.

    I think once these folks sort out their documentation they'll have a really excellent product that people will use. For an industry user wanting to get up and running quickly it's pretty lacking alright.

    BTW, I have used breakpoints. Once you move on through a breakpoint, however, there's no way to go back and view the datasets. I'm coming from a SAS background, where this is easy to do. Also, it's important to be able to start a process at any stage, since it's pointless to have to keep reading in the data before running any analytics.
    Any pointers on where I can find out about that?

    Thanks and have a good weekend yourself.

    R

  • haddock Member Posts: 849 Maven
    Nay probs, just copy the example set, like this....
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="200"/>
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter">
            <parameter key="skip_features_with_name" value="a.*"/>
            <parameter key="except_features_with_name" value="att5"/>
        </operator>
        <operator name="ExampleSetWriter" class="ExampleSetWriter">
            <parameter key="example_set_file" value="bla"/>
        </operator>
    </operator>
    I realise that you may not wish to clog up memory, so this is not perfect in the sense of a good debugger; still, for what it is worth, there is your answer.
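
    Since the process already writes the filtered set with ExampleSetWriter, another way to avoid repeating the ETL might be to run the analysis as a separate process that only reads that output back in. A rough sketch, assuming the writer above is also given an `attribute_description_file` (here the made-up name `bla.aml`) and that `ExampleSource` picks it up through its `attributes` parameter. Both parameter names are assumptions and should be checked against the operator reference.

    ```xml
    <operator name="Root" class="Process" expanded="yes">
        <!-- Assumption: bla.aml is the attribute description written by the earlier process -->
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="bla.aml"/>
        </operator>
        <!-- The learner mentioned in the original question, run on the stored data -->
        <operator name="ID3Numerical" class="ID3Numerical">
        </operator>
    </operator>
    ```

    The expensive import and preparation then run once, and the tree can be rerun as often as needed from the stored files.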

    For serious users I'd really recommend a course up at RM; in two days I learnt more than in the preceding two months of grappling with the guesswork. Besides which, Ralf is a very approachable tutor and genial lunch host  8)
  • User401 Member Posts: 5 Contributor II
    Well, if there's lunch...!
    Thanks, that's exactly what we wanted. We're not short of storage here but can't afford the time to repeatedly run through a long ETL process.

    Thanks,
    Richie