
Normalized mutual information matrix

mp95 Member Posts: 2 Contributor I
edited December 2018 in Help

Good evening, I tried to calculate a normalized mutual information matrix by passing my data through a Normalize operator set to min-max (0-1), as follows:

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\ThomasOtt\Downloads\AccXYZ.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="k_means" compatibility="7.5.001" expanded="true" height="82" name="Clustering" width="90" x="179" y="34">
<parameter key="k" value="5"/>
</operator>
<operator activated="true" class="extract_prototypes" compatibility="7.5.001" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="313" y="34"/>
<operator activated="true" class="mututal_information_matrix" compatibility="7.5.001" expanded="true" height="82" name="Mutual Information Matrix" width="90" x="447" y="34"/>
<connect from_op="Read CSV" from_port="output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
<connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Mutual Information Matrix" to_port="example set"/>
<connect from_op="Mutual Information Matrix" from_port="example set" to_port="result 1"/>
<connect from_op="Mutual Information Matrix" from_port="matrix" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

The output is sent to the Mutual Information Matrix operator; however, the values in the matrix are not normalized to the 0-1 range. What am I missing?

Thanks in advance.

Answers

  • Maerkli Member Posts: 84 Guru

    Hello Mp95,

    I am unable to run your XML process, probably because of this line, which points to a file on a local drive:

    <parameter key="csv_file" value="C:\Users\ThomasOtt\Downloads\AccXYZ.csv"/>

    Maerkli

  • mp95 Member Posts: 2 Contributor I

    Thanks for your time. As I'm new to the software, I pasted the wrong XML for the project; the correct one is as follows:

    I've also attached the dataset, which is in the public domain anyway.

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" automodel="EXPORTED" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Data" width="90" x="45" y="238">
    <parameter key="repository_entry" value="//Local Repository/data/ENB2012_data"/>
    <description align="center" color="transparent" colored="false" width="126">Load data.</description>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="145" name="Multiply" width="90" x="179" y="238"/>
    <operator activated="true" class="normalize" compatibility="8.2.000" expanded="true" height="103" name="Normalize" width="90" x="447" y="187">
    <parameter key="method" value="range transformation"/>
    </operator>
    <operator activated="true" class="mututal_information_matrix" compatibility="8.2.000" expanded="true" height="82" name="Mutual Information Matrix" width="90" x="648" y="238"/>
    <operator activated="true" class="concurrency:correlation_matrix" compatibility="8.2.000" expanded="true" height="103" name="Correlation Matrix" width="90" x="514" y="442">
    <parameter key="normalize_weights" value="false"/>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="subprocess" compatibility="8.2.000" expanded="true" height="82" name="Preprocessing" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" automodel="EXPORTED" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Obiettivo" width="90" x="380" y="34">
    <parameter key="attribute_name" value="Heating Load"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Rimozione Colonne" width="90" x="514" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="\QCooling Load\E"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <connect from_port="in 1" to_op="Obiettivo" to_port="example set input"/>
    <connect from_op="Obiettivo" from_port="example set output" to_op="Rimozione Colonne" to_port="example set input"/>
    <connect from_op="Rimozione Colonne" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Target selection and removal of Cooling Load.</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="weight_by_correlation" compatibility="8.2.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="447" y="34">
    <parameter key="normalize_weights" value="true"/>
    <parameter key="sort_weights" value="false"/>
    <parameter key="sort_direction" value="descending"/>
    <description align="center" color="transparent" colored="false" width="126">Weight by correlation&lt;br/&gt;</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="weights_to_data" compatibility="8.2.000" expanded="true" height="68" name="Weights to Data" width="90" x="581" y="34">
    <description align="center" color="transparent" colored="false" width="126">Plot of the weights&lt;br&gt;</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="sort" compatibility="8.2.000" expanded="true" height="82" name="Sort" width="90" x="715" y="34">
    <parameter key="attribute_name" value="Weight"/>
    <parameter key="sorting_direction" value="decreasing"/>
    <description align="center" color="transparent" colored="false" width="126">Weights in descending order&lt;br/&gt;</description>
    </operator>
    <operator activated="true" automodel="EXPORTED" class="order_attributes" compatibility="8.2.000" expanded="true" height="82" name="Reorder Attributes" width="90" x="849" y="34">
    <parameter key="attribute_ordering" value="Attribute|Weight"/>
    <description align="center" color="transparent" colored="false" width="126">Attributes in the first column&lt;br/&gt;</description>
    </operator>
    <connect from_op="Retrieve Data" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Preprocessing" to_port="in 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Correlation Matrix" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 3" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 4" to_port="result 5"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Mutual Information Matrix" to_port="example set"/>
    <connect from_op="Mutual Information Matrix" from_port="example set" to_port="result 3"/>
    <connect from_op="Mutual Information Matrix" from_port="matrix" to_port="result 4"/>
    <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
    <connect from_op="Preprocessing" from_port="out 1" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
    <connect from_op="Weights to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    <portSpacing port="sink_result 6" spacing="0"/>
    </process>
    </operator>
    </process>

    The normalization of the input data works as planned (I checked the results). However, when I feed the normalized data into the Mutual Information Matrix operator, the values do not lie in the 0-1 range; I get exactly the same matrix I would have gotten without any normalization. Am I missing something?

    Thanks

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    cc'ing @Thomas_Ott as this looks like one of his processes :) 

     

            <parameter key="csv_file" value="C:\Users\ThomasOtt\Downloads\AccXYZ.csv"/>

     

    [Note from moderator: I HIGHLY recommend upgrading your RapidMiner Studio from 7.5 to the current version!]

     

    Scott

     

  • Maerkli Member Posts: 84 Guru

    Good day Mp95,

    Thanks for posting another XML file; I am now able to reproduce your RapidMiner process. As far as I understand, mutual information is not bounded to the 0-1 range: it can take any value from 0 to +∞. The RapidMiner 8 operator reference says:

    "Mutual information is one of many quantities that measures how much one attribute tells us about another. It is a dimensionless quantity, and can be thought of as the reduction in uncertainty about one attribute given the knowledge of another. High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two attributes means the variables are independent."
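    In formula form, for discrete (or binned) attributes X and Y, the quantity described in the quote is the standard definition below. The bound in the second line explains why raw values need not fall in 0-1: an entropy can itself exceed 1 (for example, 10 equally likely bins give H = ln 10 ≈ 2.30 nats). The last line is one common normalization, chosen here for illustration; other conventions divide by the arithmetic mean or the maximum of the two entropies, and nothing in this thread suggests the operator applies any of them.

```latex
\begin{aligned}
I(X;Y) &= \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y),\\
0 &\le I(X;Y) \le \min\{H(X),\, H(Y)\},\\
\mathrm{NMI}(X;Y) &= \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}} \in [0, 1].
\end{aligned}
```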

     

    If you look at the Orientation attribute in both the Correlation Matrix and the Mutual Information Matrix, you can see that it is almost uncorrelated with the other attributes.
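    As for why the normalized input gives exactly the same matrix: mutual information depends only on how the values of two attributes co-occur, not on their scale, so an increasing, invertible rescaling such as min-max normalization cannot change it. The stdlib-only Python sketch below illustrates this under an assumption not verified against RapidMiner's source: that numerical attributes are discretized into equal-width bins before MI is computed. The helpers `equal_width_bins` and `min_max`, the bin count `N_BINS`, and the toy data are all hypothetical. The last step shows one way to get a 0-1 value, dividing by the geometric mean of the two entropies.

```python
import math
from collections import Counter


def min_max(values):
    """Rescale values to [0, 1], like a min-max (range) normalization."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Map each value to one of n_bins equal-width bins over [min, max] (assumed estimator)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy, in nats, of a discrete labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) in nats from the empirical joint distribution of two labelings."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b).
    return sum((c / n) * math.log(c * n / (count_a[i] * count_b[j]))
               for (i, j), c in joint.items())


# Hypothetical toy data (not the ENB2012 dataset from this thread).
x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
y = [0, 1, 1, 2, 4, 3, 5, 8, 7]
N_BINS = 4  # assumed bin count, for illustration only

bx, by = equal_width_bins(x, N_BINS), equal_width_bins(y, N_BINS)
mi_raw = mutual_information(bx, by)
mi_scaled = mutual_information(equal_width_bins(min_max(x), N_BINS),
                               equal_width_bins(min_max(y), N_BINS))

# Dividing by the geometric mean of the entropies gives a value in [0, 1].
nmi = mi_raw / math.sqrt(entropy(bx) * entropy(by))

print(mi_raw, mi_scaled, nmi)  # the first two numbers are identical
```

    With equal-width binning, min-max scaling maps every bin edge onto the corresponding bin edge, so each example lands in the same bin before and after normalization, and the MI estimate cannot change. To get a 0-1 matrix, the normalization has to be applied to the MI values themselves, not to the input data.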

     

    Please take my response with care, as I am not a data scientist.

    Maerkli.

     

     

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @sgenzer I don't remember this process; it could be a leftover from another process. :)

  • jozeftomas_2020 Member Posts: 40

    Hello

    I want to use NMI (normalized mutual information) to evaluate clusters. Does the Mutual Information Matrix operator calculate the same NMI? Does anyone have a typical process for this?
    Thank you
    With respect
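    For reference, NMI for cluster evaluation is usually computed between two labelings of the same examples, for example a known class attribute and the cluster IDs assigned by k-Means. Below is a stdlib-only Python sketch using the arithmetic-mean normalization, which is one of several conventions (the geometric mean or the maximum of the two entropies are also used). It is not RapidMiner's implementation. Judging by this thread, the Mutual Information Matrix operator returns raw, unnormalized mutual information between attributes, so its values would need to be normalized before they are compared with NMI. The function and variable names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in nats, of a discrete labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) in nats from the contingency counts of two labelings."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (count_a[i] * count_b[j]))
               for (i, j), c in joint.items())


def nmi(true_classes, cluster_ids):
    """NMI with arithmetic-mean normalization: 2 * I(T;C) / (H(T) + H(C))."""
    if len(true_classes) != len(cluster_ids):
        raise ValueError("both labelings must cover the same examples")
    h_sum = entropy(true_classes) + entropy(cluster_ids)
    if h_sum == 0:
        return 1.0  # both labelings are constant: treat as perfect agreement
    return 2 * mutual_information(true_classes, cluster_ids) / h_sum


# Hypothetical example: true classes vs. cluster IDs from a clustering run.
classes = ["a", "a", "a", "b", "b", "b"]
same_partition = [2, 2, 2, 0, 0, 0]  # same grouping, different cluster names
mixed = [0, 1, 2, 0, 1, 2]           # clusters carry no class information

print(nmi(classes, same_partition))  # 1.0 up to rounding
print(nmi(classes, mixed))           # 0.0 up to rounding
```

    Because NMI only compares groupings, renaming the clusters (as in `same_partition`) does not change the score.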

  • Maerkli Member Posts: 84 Guru

    Hello,

    Could you rephrase your question, please?

    Thanks,

    Maerkli
