"Group by Correlation"

ratheesan Member Posts: 68 Maven
edited June 2019 in Help
Hi,

I have 4 attributes: 1 is nominal and the other 3 are numerical. My objective is to calculate the pairwise correlation coefficients between the 3 numerical attributes, grouped by the nominal attribute. That is, if the nominal attribute contains 2 distinct values, city1 and city2, then I need the correlation coefficients between the other attributes for city1 and for city2 separately. I tried some operators, but I am not getting a correlation per group. This is my process.

<operator name="Root" class="Process" expanded="yes">
   <operator name="ExcelExampleSource" class="ExcelExampleSource">
       <parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\dummy.xls.xls"/>
       <parameter key="first_row_as_names" value="true"/>
   </operator>
   <operator name="GroupBy" class="GroupBy">
       <parameter key="attribute_name" value="aa"/>
   </operator>
   <operator name="AttributeFilter" class="AttributeFilter">
       <parameter key="condition_class" value="is_numerical"/>
   </operator>
   <operator name="CorrelationMatrix" class="CorrelationMatrix">
   </operator>
</operator>

Thanks,
Ratheesan

Answers

  • Options
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    did I understand you correctly: you want to calculate the correlation on each subset of the example set, one containing only city1 and one containing only city2?
    Then you could use a ValueIterator in combination with a nested ExampleFilter.

    By the way: have you ever thought of becoming an enterprise customer? You have quite a lot of questions, and I would be able to answer in much more detail during consulting. I could then simply post an example process here...

    Greetings,
      Sebastian
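
For readers who want to cross-check the numbers outside RapidMiner, the idea behind the ValueIterator plus ExampleFilter suggestion can be sketched in plain Python. This is only an illustration of the technique (split the data by the nominal attribute, then correlate each pair of numerical attributes inside each subset), not RapidMiner's implementation. The function names `pearson` and `correlation_by_group`, the attribute names `city`, `x`, `y`, `z`, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when an attribute is constant
    return cov / sqrt(var_x * var_y)


def correlation_by_group(rows, group_attr, numeric_attrs):
    """Return {group value: {(attr_a, attr_b): r}} for every attribute pair."""
    # Step 1: split the rows by the nominal attribute
    # (the role played by ValueIterator + ExampleFilter).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_attr]].append(row)

    # Step 2: inside each subset, correlate every pair of numerical attributes
    # (the role played by CorrelationMatrix).
    result = {}
    for value, subset in groups.items():
        result[value] = {
            (a, b): pearson([r[a] for r in subset], [r[b] for r in subset])
            for a, b in combinations(numeric_attrs, 2)
        }
    return result


# Hypothetical sample data: "city" is nominal, x/y/z are numerical.
rows = [
    {"city": "city1", "x": 1.0, "y": 2.0, "z": 9.0},
    {"city": "city1", "x": 2.0, "y": 4.0, "z": 7.0},
    {"city": "city1", "x": 3.0, "y": 6.0, "z": 5.0},
    {"city": "city2", "x": 1.0, "y": 5.0, "z": 1.0},
    {"city": "city2", "x": 2.0, "y": 3.0, "z": 2.0},
    {"city": "city2", "x": 3.0, "y": 1.0, "z": 3.0},
]

matrices = correlation_by_group(rows, "city", ["x", "y", "z"])
for city, pairs in matrices.items():
    print(city, pairs)
```

Each group gets its own set of pairwise coefficients, so city1 and city2 can show opposite signs for the same attribute pair, which a single correlation matrix over the whole example set would hide.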
  • Options
    ratheesan Member Posts: 68 Maven
    Hi Sebastian,

    I am already using RapidMiner Enterprise Edition. When I use an Excel sheet as input, I get a separate correlation matrix for each class. But when I read the same data from SQL Server, I get the error message "Cannot instantiate 'attribute_value_filter': com.rapidminer.example.set.AttributeValueFilter: cannot invoke condition (Parameter string must have the form 'attribute {=|<|>|<=|>=|!=} value')". I am attaching the processes for both.

    Excel Input

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\tfq.xls"/>
            <parameter key="first_row_as_names" value="true"/>
            <parameter key="label_column" value="14"/>
        </operator>
        <operator name="ValueIterator" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="dept"/>
            <parameter key="iteration_macro" value="mmm"/>
            <operator name="ExampleFilter" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="dept=%{mmm}"/>
            </operator>
            <operator name="AttributeFilter" class="AttributeFilter">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="parameter_string" value="tfq_score||tenure"/>
            </operator>
            <operator name="CorrelationMatrix" class="CorrelationMatrix">
            </operator>
        </operator>
    </operator>



    SQL Input

    <operator name="Root" class="Process" expanded="yes">
        <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
            <parameter key="database_system" value="Microsoft SQL Server (Microsoft)"/>
            <parameter key="database_url" value="jdbc:sqlserver://COMPUTER-647;databaseName=DataMart"/>
            <parameter key="username" value="sa"/>
            <parameter key="password" value="VNfe8QITNRw19hgf6f6UpA=="/>
            <parameter key="query" value="select * from F_TFQ"/>
        </operator>
        <operator name="ValueIterator" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="DEPT"/>
            <parameter key="iteration_macro" value="mmm"/>
            <operator name="ExampleFilter" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="DEPT=%{mmm}"/>
            </operator>
            <operator name="AttributeFilter" class="AttributeFilter">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="parameter_string" value="TFQ_SCORE||TENURE"/>
            </operator>
            <operator name="CorrelationMatrix" class="CorrelationMatrix">
            </operator>
        </operator>
    </operator>



    Thanks
    Ratheesan
  • Options
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    sorry, but enterprise customers usually use their account on our online support ticket system to ask questions...

    This is strange, but I cannot reproduce it because I don't have your database. Did you check whether the DEPT attribute has the desired type?

    Greetings,
      Sebastian