"Filtering (projecting) an ExampleSet according to a feature (attribute) set"

misaghb Member Posts: 7 Contributor II
edited May 2019 in Help
Assume that I have an example set with 62 samples and 2000 attributes (features). I also have 25 different attribute description files (both <.att> and <.aml>), each with 50 features. How can I project the full example set (with 2000 attributes) onto my desired feature set (with 50 attributes)? I tried attributeConstructionLoader and attributeConstructionWriter but did not get any meaningful result.

Actually, I want to enable 50 features and disable the remaining 1950 features according to a <.att> or <.aml> file that lists these 50 features, given the full dataset with 62*2000 cells.

Can anybody help me?
Please help me find the operator tree to solve this problem.

Best thanks.


    steffen Member Posts: 347 Maven
    Hello misaghb

    As far as I know, there is no better way (in your special case, although it is the standard procedure) than to use the operator "FeatureNameFilter". Either you define a regular expression that matches exactly your attributes, or you specify a list of all attributes like this:
    attr1||attr2||attr3 etc..
    and pass this as an argument to the mentioned operator.
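
To make the idea of filtering by name concrete outside RapidMiner, here is a minimal Python sketch. It is not how FeatureNameFilter is implemented; the function names `build_name_list`, `build_name_regex`, and `project` are hypothetical helpers, and the data set is assumed to be a plain header list plus rows of values.

```python
import re

def build_name_list(names, separator="||"):
    """Join attribute names into one list string such as 'attr1||attr2||attr3'."""
    return separator.join(names)

def build_name_regex(names):
    """Build a regular expression that matches exactly the given names."""
    # Escape each name so characters such as '.' or '(' are matched literally.
    return re.compile("|".join(re.escape(n) for n in names))

def project(header, rows, pattern):
    """Keep only the columns whose full name matches the pattern."""
    keep = [i for i, name in enumerate(header) if pattern.fullmatch(name)]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows
```

Matching the full name matters: with a partial match, a pattern for attr1 would also keep attr10.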

    hope this was helpful

    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Actually, there is a way, but not by using the attribute constructions. Those are for constructing new features only, not for feature selection. Search this forum and you will come up with this thread


    At the end of this thread, the solution is described: use the operators ExampleSet2AttributeWeights, the corresponding readers and writers and the operator AttributeWeightSelection for this task instead of the construction files. This should do the trick.
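
The logic behind the weight-based approach can be sketched in Python as follows. This is only an illustration of the idea, not RapidMiner's implementation: it assumes that the attributes of the reduced set receive weight 1, that every other attribute is treated as weight 0, and that the selection then keeps the k highest-weighted attributes. The function names are hypothetical.

```python
def weights_from_selection(all_names, selected):
    """Assign weight 1.0 to selected attributes and 0.0 to all others (assumed convention)."""
    chosen = set(selected)
    return {name: (1.0 if name in chosen else 0.0) for name in all_names}

def select_top_k(weights, k):
    """Return the k attribute names with the highest weights.

    Python's sort is stable, so attributes with equal weights keep
    their original order.
    """
    ranked = sorted(weights, key=lambda name: weights[name], reverse=True)
    return ranked[:k]
```

With 2000 attributes and 50 selected names, `select_top_k(weights_from_selection(all_names, selected), 50)` would return exactly the selected names in their original order.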

    Another way would be to use your .aml files, since single columns can be loaded on their own as well. You would, however, have to ensure that the column index in the reduced .aml files matches the actual column in the complete data file. Since this can hardly be automated, the preferred way is probably to use the approach described above.
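
The column matching mentioned for the .aml route amounts to looking up each desired attribute's position in the complete file. A minimal Python sketch, assuming the full attribute order is available as a list and that columns are counted from 1 (an assumption about the file format, switchable via `one_based`); `source_columns` is a hypothetical helper:

```python
def source_columns(full_names, wanted, one_based=True):
    """Map each wanted attribute name to its column position in the full data file."""
    position = {name: i for i, name in enumerate(full_names)}
    missing = [name for name in wanted if name not in position]
    if missing:
        # Fail loudly instead of silently writing a wrong column index.
        raise KeyError("attributes not found in full file: " + ", ".join(missing))
    offset = 1 if one_based else 0
    return {name: position[name] + offset for name in wanted}
```

The resulting indices would still have to be written into each reduced .aml file by hand or by a separate script.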

    misaghb Member Posts: 7 Contributor II
    Dear Steffen and Mierswa,
    Thanks a lot for your consideration and for such fast and helpful comments.
    This was the first question I submitted to this forum, and fortunately it was a nice experience.

    I will use this operator tree. I hope it will solve my main problem.

    <operator name="Root" class="Process" expanded="yes">
        <!-- Step 1: load the full data, reduce it to 50 features, and write them out as attribute weights -->
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="D:\ACADEMICs\MISAGH\rapidminer_workspace\Temp\temp-09-Forum\main.aml"/>
        </operator>
        <operator name="RandomSelection" class="RandomSelection">
            <parameter key="number_of_features" value="50"/>
        </operator>
        <operator name="ExampleSet2AttributeWeights" class="ExampleSet2AttributeWeights"/>
        <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter" breakpoints="after">
            <parameter key="attribute_weights_file" value="D:\ACADEMICs\MISAGH\rapidminer_workspace\Temp\temp-09-Forum\weights.wgt"/>
        </operator>
        <!-- Step 2: discard the intermediate example set and weights -->
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="IOConsumer (2)" class="IOConsumer">
            <parameter key="io_object" value="AttributeWeights"/>
        </operator>
        <!-- Step 3: reload the full data and select attributes using the saved weights -->
        <operator name="ExampleSource (2)" class="ExampleSource">
            <parameter key="attributes" value="D:\ACADEMICs\MISAGH\rapidminer_workspace\Temp\temp-09-Forum\main.aml"/>
        </operator>
        <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
            <parameter key="attribute_weights_file" value="D:\ACADEMICs\MISAGH\rapidminer_workspace\Temp\temp-09-Forum\weights.wgt"/>
        </operator>
        <operator name="AttributeWeightSelection" class="AttributeWeightSelection" breakpoints="after">
            <parameter key="k" value="50"/>
            <parameter key="p" value="1.0"/>
        </operator>
    </operator>

    Thanks again.