"Group by Correlation"
Hi,
I have 4 attributes, of which 1 is nominal and the other 3 are numerical. My objective is to calculate the pairwise correlation coefficients between the 3 numerical attributes, grouped by the nominal attribute. That is, if the nominal attribute contains 2 distinct values, city1 and city2, then I need the correlation coefficients between the other attributes for city1 and for city2 separately. I tried some operators, but I am not getting a grouped correlation. This is my process.
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\dummy.xls.xls"/>
<parameter key="first_row_as_names" value="true"/>
</operator>
<operator name="GroupBy" class="GroupBy">
<parameter key="attribute_name" value="aa"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="is_numerical"/>
</operator>
<operator name="CorrelationMatrix" class="CorrelationMatrix">
</operator>
</operator>
Thanks,
Ratheesan
Answers
Did I understand you correctly that you want to calculate the correlation on the subsets of the example set containing either city1 or city2?
Then you could use a ValueIterator in combination with a nested ExampleFilter.
By the way: have you ever thought of becoming an enterprise customer? You have quite a bunch of questions, and I would be able to answer in much more detail during consulting. I could then simply post an example process here...
Greetings,
Sebastian
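For readers who want to see what the ValueIterator plus ExampleFilter idea computes, here is a minimal sketch in plain Python rather than RapidMiner. It splits the rows by each distinct value of the nominal attribute and then computes the Pearson correlation for every pair of numerical attributes within each subset. The function names, the attribute names `x`, `y`, `z` and the sample rows are made up for illustration; only the nominal attribute name `aa` and the values city1 and city2 come from the question.

```python
import math
from collections import defaultdict
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one attribute is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def grouped_correlations(rows, group_key, numeric_keys):
    """Return {group value: {(attr_a, attr_b): correlation}}."""
    # Step 1: one subset per distinct value of the nominal attribute
    # (the role played by the value iteration and the example filter).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Step 2: pairwise correlation inside each subset.
    result = {}
    for value, members in groups.items():
        result[value] = {
            (a, b): pearson([m[a] for m in members], [m[b] for m in members])
            for a, b in combinations(numeric_keys, 2)
        }
    return result


# Made-up sample data, not taken from the poster's spreadsheet.
rows = [
    {"aa": "city1", "x": 1.0, "y": 2.0, "z": 5.0},
    {"aa": "city1", "x": 2.0, "y": 4.0, "z": 3.0},
    {"aa": "city1", "x": 3.0, "y": 6.0, "z": 1.0},
    {"aa": "city2", "x": 1.0, "y": 3.0, "z": 1.0},
    {"aa": "city2", "x": 2.0, "y": 2.0, "z": 2.0},
    {"aa": "city2", "x": 3.0, "y": 1.0, "z": 4.0},
]

correlations = grouped_correlations(rows, "aa", ["x", "y", "z"])
for city, pairs in correlations.items():
    for (a, b), r in pairs.items():
        print(f"{city}: corr({a}, {b}) = {r:.3f}")
```

The result is one small correlation table per city, which is the same shape of output that running CorrelationMatrix once per iteration would give.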
I am using RapidMiner Enterprise Edition only. When I use an Excel sheet as input, I get a separate correlation for each class. But when I read the same data from SQL Server, I get the error message "Cannot instantiate 'attribute_value_filter': com.rapidminer.example.set.AttributeValueFilter: cannot invoke condition (Parameter string must have the form 'attribute {=|<|>|<=|>=|!=} value')". I am attaching the process for both.
Excel Input
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\tfq.xls"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="label_column" value="14"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="dept"/>
<parameter key="iteration_macro" value="mmm"/>
<operator name="ExampleFilter" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dept=%{mmm}"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="tfq_score||tenure"/>
</operator>
<operator name="CorrelationMatrix" class="CorrelationMatrix">
</operator>
</operator>
</operator>
SQL Input
<operator name="Root" class="Process" expanded="yes">
<operator name="DatabaseExampleSource" class="DatabaseExampleSource">
<parameter key="database_system" value="Microsoft SQL Server (Microsoft)"/>
<parameter key="database_url" value="jdbc:sqlserver://COMPUTER-647;databaseName=DataMart"/>
<parameter key="username" value="sa"/>
<parameter key="password" value="VNfe8QITNRw19hgf6f6UpA=="/>
<parameter key="query" value="select * from F_TFQ"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="DEPT"/>
<parameter key="iteration_macro" value="mmm"/>
<operator name="ExampleFilter" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="DEPT=%{mmm}"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="TFQ_SCORE||TENURE"/>
</operator>
<operator name="CorrelationMatrix" class="CorrelationMatrix">
</operator>
</operator>
</operator>
Thanks
Ratheesan
Sorry, but enterprise customers usually use their account on our online support ticket system for asking questions...
This is strange, but I cannot reproduce it because I don't have your database. Did you check whether the DEPT attribute is of the desired type?
Greetings,
Sebastian
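One way to narrow this down without access to the database is to list the distinct DEPT values and check which of them would produce a parameter string that does not have the form quoted in the error message. The sketch below does that in plain Python. The regular expression is only an approximation of the quoted form 'attribute {=|<|>|<=|>=|!=} value', not RapidMiner's actual parser, and the function name and the sample values are hypothetical. Whether missing or blank values are really the cause here is an assumption that would need to be confirmed against the F_TFQ table.

```python
import re

# Approximation of the form quoted in the error message:
#   attribute {=|<|>|<=|>=|!=} value
# This is an assumption for illustration; the real parser may differ.
CONDITION_FORM = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(\S.*?)\s*$")


def malformed_conditions(attribute, values):
    """Return the values for which 'attribute=value' lacks a usable value part."""
    problems = []
    for value in values:
        # A missing database value is treated as empty text here.
        text = "" if value is None else str(value)
        condition = f"{attribute}={text}"
        if CONDITION_FORM.match(condition) is None:
            problems.append(value)
    return problems


# Hypothetical distinct values, e.g. from: select distinct DEPT from F_TFQ
sample_values = ["Sales", "HR", None, "", "   ", "R&D"]
bad_values = malformed_conditions("DEPT", sample_values)
print(bad_values)
```

If the list of problem values is not empty for the real data, those rows are candidates to inspect first, for example by excluding them in the SQL query and running the process again.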