# Is there a way to process data that is stored as a ratio?

Member Posts: 1 Learner I
edited November 2018 in Help

I have a university ranking dataset, and one of the columns is the gender ratio. Is there a way to analyze it to answer my research question, "Does gender distribution affect the ranking of a university?"

• Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

Hi @apiphuh

First, I used the Write Excel operator to convert your .csv file into an Excel file. Then, in Excel, I ran a macro as a preprocessing step on your female_male_ratio attribute to create a new numerical attribute, female_male_ratio_2 (for example, 33:67 => 0.49).
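If you would rather not use an Excel macro, the same conversion can be sketched in a few lines of Python. This is only an illustration, not the macro I used: the function name `parse_ratio` is hypothetical, and it assumes the values look like `33:67` (female share, colon, male share).

```python
from typing import Optional


def parse_ratio(text: str) -> Optional[float]:
    """Convert a 'female:male' string such as '33:67' into female / male.

    Returns None for missing or malformed values (hypothetical helper,
    assuming the 'female:male' layout described in the thread).
    """
    parts = text.strip().split(":")
    if len(parts) != 2:
        return None  # empty cell or unexpected format
    try:
        female, male = float(parts[0]), float(parts[1])
    except ValueError:
        return None  # non-numeric content
    if male == 0:
        return None  # the ratio is undefined when the male share is zero
    return female / male


print(round(parse_ratio("33:67"), 2))  # 0.49
```

Depending on the question, the female share `female / (female + male)` (0.33 in this example) may be easier to interpret than the ratio, since it stays between 0 and 1.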

The new Excel file is in the attached zip file.

1. After visual analysis, there seems to be no obvious relationship between "world rank" and "female_male_ratio_2" (see screenshot).

2. To confirm this observation, I used the "Correlation Matrix" operator: the correlation coefficient between "world rank" and "female_male_ratio_2" is 0.138.

A coefficient this close to 0 means there is at most a weak linear relationship between these two attributes; it does not rule out a non-linear one.
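For reference, here is a rough Python sketch of the Pearson coefficient that a correlation matrix typically reports. The function name `pearson` and the sample values are made up for illustration; they are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only, not taken from the dataset.
world_rank = [1, 2, 3, 4, 5, 6]
ratio = [0.49, 1.20, 0.85, 0.60, 1.05, 0.92]
r = pearson(world_rank, ratio)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring the coefficient gives a feel for its size: with r = 0.138, r² is about 0.019, so a straight line would account for roughly 2% of the variance.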

You can go further by applying some modeling algorithms.

Here is the process:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\timesData_Excel.xlsx"/>
        <parameter key="imported_cell_range" value="A1:O2604"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="world_rank.true.integer.attribute"/>
          <parameter key="1" value="university_name.true.polynominal.attribute"/>
          <parameter key="2" value="country.true.polynominal.attribute"/>
          <parameter key="3" value="teaching.true.numeric.attribute"/>
          <parameter key="4" value="international.true.polynominal.attribute"/>
          <parameter key="5" value="research.true.numeric.attribute"/>
          <parameter key="6" value="citations.true.numeric.attribute"/>
          <parameter key="7" value="income.true.polynominal.attribute"/>
          <parameter key="8" value="total_score.true.numeric.attribute"/>
          <parameter key="9" value="num_students.true.polynominal.attribute"/>
          <parameter key="10" value="student_staff_ratio.true.numeric.attribute"/>
          <parameter key="11" value="international_students.true.polynominal.attribute"/>
          <parameter key="12" value="female_male_ratio.true.polynominal.attribute"/>
          <parameter key="13" value="female_male_ratio_2.true.numeric.attribute"/>
          <parameter key="14" value="year.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="correlation_matrix" compatibility="8.0.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="34"/>
      <connect from_op="Read Excel" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
      <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
```

I hope these first elements of a response will be helpful.

Regards,

Lionel

• RapidMiner Certified Analyst, Member Posts: 344 Unicorn

You could use a statistical test to answer the question, for example a chi-squared test of independence.
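As a rough sketch of what that test computes, here is a plain-Python version. A chi-squared test of independence works on counts of categories, so this assumes you first bin both attributes (for instance top vs. bottom half of the ranking, and ratio below vs. above its median). The function names and the counts are made up for illustration, and no continuity correction is applied.

```python
from math import erfc, sqrt


def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table of counts.

    Assumes every row and every column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value of the chi-squared distribution with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))


# Made-up counts for illustration only.
# Rows: top half / bottom half of the ranking.
# Columns: ratio below / above its median.
table = [[30, 20],
         [22, 28]]
stat, dof = chi_squared_statistic(table)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value_one_dof(stat):.3f}")
```

The p-value helper is only valid for a 2x2 table (1 degree of freedom); for larger tables a statistics library would be the safer choice. A small p-value would suggest that the binned rank and the binned ratio are not independent.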