
Please help!!! Does anyone know how to compute bivariate correlations between two questions?

Corina Member Posts: 4 Newbie
I have a questionnaire and my responses are nominal (letters). I need to transform the responses into numbers and compute a correlation between them. Does anyone know how I can do it? I need it for my master's degree project! Thanks a lot!

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Corina

    I am not sure why you want a correlation for categorical variables. Correlation is defined for continuous variables, where a covariance can be calculated, so there is little practical benefit in computing it here. You can still do it by applying the Nominal to Numerical operator and then the Correlation Matrix operator; please find the XML below. To load it, open a new process, open the XML panel (View --> Show Panel --> XML), copy the code from here, paste it into the XML panel, and click the green check mark so that the process appears. @mschmitz can say more about the issues with correlation on nominal attributes.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Titanic Unlabeled" width="90" x="112" y="85">
    <parameter key="repository_entry" value="//Samples/data/Titanic Unlabeled"/>
    </operator>
    <operator activated="true" class="nominal_to_numerical" compatibility="9.2.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="85">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="file_path"/>
    <parameter key="block_type" value="single_value"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="single_value"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="coding_type" value="unique integers"/>
    <parameter key="use_comparison_groups" value="false"/>
    <list key="comparison_groups"/>
    <parameter key="unexpected_value_handling" value="all 0 and warning"/>
    <parameter key="use_underscore_in_name" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:correlation_matrix" compatibility="9.2.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="514" y="85">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="normalize_weights" value="true"/>
    <parameter key="squared_correlation" value="false"/>
    </operator>
    <connect from_op="Retrieve Titanic Unlabeled" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Correlation Matrix" to_port="example set"/>
    <connect from_op="Correlation Matrix" from_port="example set" to_port="result 3"/>
    <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 1"/>
    <connect from_op="Correlation Matrix" from_port="weights" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>

    Hope this helps.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing
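
For readers who want to see the same idea outside RapidMiner, here is a minimal sketch in plain Python of what the process above does conceptually: code each nominal answer as an integer, then compute a Pearson correlation on the codes. The answer lists `q1` and `q2` are made-up data, and coding labels in order of first appearance is an assumption of this sketch; it may not match the coding that Nominal to Numerical actually produces.

```python
from math import sqrt


def integer_code(values):
    """Map each distinct label to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical answers to two questionnaire items (letters a-c).
q1 = ["a", "b", "c", "a", "b", "c"]
q2 = ["b", "b", "c", "a", "b", "c"]

q1_codes, q1_map = integer_code(q1)
q2_codes, q2_map = integer_code(q2)

# The result depends on the integer codes, which are arbitrary for
# truly nominal answers; this is the weakness discussed in the thread.
r = pearson(q1_codes, q2_codes)
print(q1_map, q2_map, round(r, 3))
```

Note that `q2` gets the coding b=0, c=1, a=2 simply because "b" appears first, so the coefficient changes if the rows are reordered. That is a concrete way to see why the result is hard to interpret for nominal data.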

  • Corina Member Posts: 4 Newbie
    I need to compute correlations like SPSS does, but I do not know the steps. I need the correlations to process questionnaire data.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    edited May 2019
    Hi,
    "Correlation" as a term is not clearly defined. The most common definition of correlation is Pearson correlation, and Pearson correlation is not defined for non-numerical data.
    So either you do not want to use Pearson correlation, or you need some preprocessing before calculating the correlation, e.g. the Nominal to Numerical operator.

    I recommend using another dependency measure that is well defined for nominal data. My two go-to options are Gini-Index and Entropy (aka Information Gain).

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
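
As an illustration of the two measures mentioned above, here is a small sketch in plain Python that computes the entropy-based gain (information gain) and the Gini-based gain of one nominal question with respect to another. The function names and the example answers are hypothetical, and this is not how RapidMiner implements its weighting operators; it only shows the textbook definitions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of nominal labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(impurity, x, y):
    """Reduction in the impurity of y after grouping the rows by x."""
    n = len(y)
    groups = {}
    for xv, yv in zip(x, y):
        groups.setdefault(xv, []).append(yv)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(y) - weighted


# Hypothetical answers to two questionnaire items.
q1 = ["a", "a", "b", "b", "c", "c"]
q2 = ["x", "x", "y", "y", "x", "y"]

info_gain = gain(entropy, q1, q2)  # entropy-based dependency of q2 on q1
gini_gain = gain(gini, q1, q2)     # Gini-based dependency of q2 on q1
print(round(info_gain, 3), round(gini_gain, 3))
```

For this toy data, knowing `q1` fixes `q2` for the answers "a" and "b" but not for "c", so the information gain is about 0.667 bits out of a possible 1, and the Gini gain is about 0.333 out of a possible 0.5. Neither measure needs the letters to be turned into numbers.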
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @Corina, since you are doing this for a project that you need to present, I recommend @mschmitz's suggestion of Entropy or Gini-Index. If you still want a correlation, you can use the solution from my earlier post.
    Regards,
    Varun

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    you can try using polychoric correlation:


    Note that it is valid only for ordinal variables and assumes that the underlying latent variables are normally distributed. It is not included in RapidMiner, but that's ok because you seem to be interested in statistical analysis and not in machine learning.

    Regards,
    Sebastian


  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    You can use the "Statistics Extension" for this. Pearson correlation does not apply to non-numeric attributes. You can, however, apply the "Chi Square Test" (from that extension) to pairs of your categorical attributes. The best choice for you, though, is Spearman correlation, which is most appropriate for survey data; again, you can generate a Spearman correlation matrix with the above-mentioned extension. In this case, you first need to carefully remap all your letter responses to numeric values while preserving their order. You can do this outside RM using a text editor, or you can use the "Map" operator or the text functions of "Generate Attributes".

    Jacob
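
The two measures mentioned above can be sketched in plain Python as follows. This is an illustration of the standard definitions (Spearman correlation as Pearson correlation on average ranks, and the Pearson chi-square statistic on a contingency table), not the extension's implementation. The `SCALE` mapping and the answer lists are hypothetical; the sketch returns the chi-square statistic and degrees of freedom only, without a p-value.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def chi_square(x, y):
    """Pearson chi-square statistic and degrees of freedom for two nominal lists."""
    n = len(x)
    row, col, cell = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat, (len(row) - 1) * (len(col) - 1)


# Hypothetical order-preserving mapping from letter answers to numbers.
SCALE = {"a": 1, "b": 2, "c": 3, "d": 4}
q1 = ["a", "b", "c", "d", "b", "c"]
q2 = ["a", "c", "b", "d", "b", "d"]

rho = spearman([SCALE[v] for v in q1], [SCALE[v] for v in q2])
stat, dof = chi_square(q1, q2)  # works directly on the letters
print(round(rho, 3), round(stat, 3), dof)
```

The chi-square statistic never looks at the order of the letters, while Spearman depends entirely on `SCALE` preserving that order, which is why the remapping step needs care.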
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    The Nominal to Numerical operator will also perform the conversion for you if you use the "unique integers" coding type, as long as the original answers appear in the correct order of the numerical scale (i.e., the sorted nominal order is the same order as you want in your resulting numbers). If they do not, then you would need to re-map them manually as noted.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
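
To make the ordering caveat concrete, here is a tiny Python sketch with made-up answer labels. It only illustrates why an automatic coding based on sorted label order can disagree with the intended scale; it does not reproduce the exact coding that Nominal to Numerical applies.

```python
# Hypothetical answer labels whose alphabetical order is not the scale order.
answers = ["low", "high", "medium", "low", "high"]

# Automatic coding by sorted (alphabetical) label order.
sorted_codes = {label: i for i, label in enumerate(sorted(set(answers)))}

# Manual mapping that preserves the intended order low < medium < high.
explicit_codes = {"low": 0, "medium": 1, "high": 2}

coded_sorted = [sorted_codes[a] for a in answers]
coded_explicit = [explicit_codes[a] for a in answers]

print(sorted_codes)    # alphabetical: high, low, medium
print(coded_sorted)
print(coded_explicit)
```

With letter answers such as a, b, c, d the two codings coincide, which is the case where the automatic conversion is enough.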