"Filtering the Results"

Legacy User · Member · Posts: 0 · Newbie
edited May 2019 in Help
Hey guys,

For a university project, I want to compare RapidMiner with IBM OmniFind. To do that, I'd like to run the same scenario in both applications. Don't worry, it's a really simple one. I'll give you the description first and then explain my problem.

Scenario:
I use the NHTSA database, which contains a large number of problem reports about cars in the US. I split every report into a separate file. Now I want to compare the problem reports in a correlation matrix and filter it for the keyword "fire". What I want to see is, for example, a strong correlation between a car brand and a part of a car.

How to do this in RapidMiner:

I split the main file so that I have 1000 files, each containing one problem report. Then I load the files via:
TextInput->StringTokenizer->EnglishStopwordFilter->TokenLengthFilter->PorterStemmer.
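The preprocessing chain above might look roughly like the following process fragment. This is a hedged sketch that assumes the RapidMiner 4.x Text plugin, where TextInput is an operator chain whose inner operators are applied to each document in the listed order and which outputs an example set with one TF-IDF attribute per word stem. The TextInput parameters (input directories, word-vector settings, etc.) are deliberately left out and have to be set in the operator's parameter panel.

```xml
<!-- Assumption: TextInput nests the preprocessing operators as inner operators
     and applies them to every document in this order.
     TextInput parameters (text directories, vector settings) are omitted. -->
<operator name="TextInput" class="TextInput">
    <!-- split each report into tokens -->
    <operator name="StringTokenizer" class="StringTokenizer"/>
    <!-- drop common English stop words -->
    <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter"/>
    <!-- drop tokens that are too short or too long (defaults used here) -->
    <operator name="TokenLengthFilter" class="TokenLengthFilter"/>
    <!-- reduce the remaining tokens to their word stems -->
    <operator name="PorterStemmer" class="PorterStemmer"/>
</operator>
```

Because of the stemming step, the keyword attribute in the resulting example set is the stem of "fire", not necessarily the word exactly as typed.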

After that I apply the CorrelationMatrix operator. The problem is that I get too much data. I want to filter the results so that only the files containing the keyword I'm interested in, in my case "fire", are used. Is that possible? At the moment I get a very large correlation matrix that I can't really use, and plotting the results is not possible because there is too much data.

I hope that you can help me.

Cheers
Benjamin

Answers

  • Legacy User · Member · Posts: 0 · Newbie
    OK, let me be more specific. I'd like to filter my dataset for some keywords and then compute a CorrelationMatrix, so that, when I filter for my keyword "fire", I can see, for example, that there is a strong correlation between Ford and door. Maybe I have to use AttributeWeightSelection?

    Please help.
  • IngoRM · Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor · Posts: 1,751 · RM Founder
    Hi,

    the solution is quite simple: use the "ExampleFilter" operator before applying the correlation matrix and filter out all examples where the TFIDF value for the keyword (here: Fire) or its corresponding word stem is 0. After that, apply a "RemoveUselessAttributes" operator to remove all attributes that are now constant. Then apply the correlation matrix.

    Cheers,
    Ingo
  • Legacy User · Member · Posts: 0 · Newbie
    About the ExampleFilter: I set the parameter string to "fire", but I don't really know how to set the condition class. Can you tell me what I need to set there? Whenever I set the parameter string, every condition class I try reports that it doesn't work with a parameter.
  • TobiasMalbrecht · Moderator, Employee, Member · Posts: 295 · RM Product Management
    Hi,

    you have to use the "attribute_value_filter" option of the "condition_class" parameter. As "parameter_string" you have to specify a condition: whenever an example does not fulfill the condition, it is removed from the example set. The following code should work for your example.

        <operator name="ExampleFilter" class="ExampleFilter">
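            <parameter key="condition_class" value="attribute_value_filter"/>
            <!-- keeps only examples where Fire <> 0; "<" and ">" must be escaped inside XML attribute values (in the GUI, type Fire<>0) -->
            <parameter key="parameter_string" value="Fire&lt;&gt;0"/>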
        </operator>
    Hope that helps,
    Tobias
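Putting the two answers together, the steps after text input can be sketched as the process fragment below: filter, then drop constant attributes, then correlate. It is a minimal sketch that assumes the RapidMiner 4.x operator class names used in this thread (ExampleFilter, RemoveUselessAttributes, CorrelationMatrix) with default parameters, placed directly after the TextInput operator. "Fire" is a placeholder: it must match the keyword attribute exactly as it appears in your example set, which after stemming may be a lowercase or stemmed form.

```xml
<!-- 1. Keep only documents whose TF-IDF value for the keyword is non-zero.
     "Fire" is a placeholder attribute name; in the GUI the condition is typed
     as Fire<>0, in XML the angle brackets are escaped. -->
<operator name="ExampleFilter" class="ExampleFilter">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="Fire&lt;&gt;0"/>
</operator>
<!-- 2. Remove attributes that are constant on the remaining documents,
     e.g. word stems that occur in none of them (defaults assumed). -->
<operator name="RemoveUselessAttributes" class="RemoveUselessAttributes"/>
<!-- 3. Compute the correlation matrix on the reduced example set. -->
<operator name="CorrelationMatrix" class="CorrelationMatrix"/>
```

Filtering first shrinks the number of documents; removing the now-constant attributes is what actually shrinks the correlation matrix, since a constant attribute has no meaningful correlation with anything.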