
Is there a way to process data that is expressed as a ratio?

apiphuh Member Posts: 1 Learner I
edited December 2018 in Help

I have a university ranking dataset, and one of its columns is the gender ratio. Is there a way to analyze it to answer my research question: "Does gender distribution affect the ranking of a university?"



    lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @apiphuh


    First, I used the Write Excel operator to convert your .csv file into an Excel file. Then, in Excel, I ran a macro on your female_male_ratio attribute as a preprocessing step to create a new numerical attribute, female_male_ratio_2 (for example, 33:67 => 33/67 ≈ 0.49).

    The new Excel file is in the attached zip file.
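For readers who prefer to do this conversion outside Excel, here is a minimal Python sketch of the same idea. It assumes the values look like "33:67" and that the new attribute is the female share divided by the male share, which matches the 33:67 => 0.49 example above. The function name `ratio_to_float` is hypothetical, and the original macro is not shown in this thread, so it may handle edge cases differently.

```python
from typing import Optional


def ratio_to_float(ratio: str) -> Optional[float]:
    """Convert a 'female:male' string such as '33:67' to female / male.

    Returns None for missing or malformed values, and when the male
    share is zero (the ratio is undefined in that case).
    """
    parts = ratio.replace(" ", "").split(":")
    if len(parts) != 2:
        return None  # empty cell or unexpected format
    try:
        female, male = float(parts[0]), float(parts[1])
    except ValueError:
        return None  # non-numeric content
    if male == 0:
        return None  # avoid division by zero
    return female / male


print(round(ratio_to_float("33:67"), 2))  # 0.49
```

An alternative would be the female share, female / (female + male), which always stays between 0 and 1; the sketch keeps female / male only to stay consistent with the example in the post.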


    1. After visual analysis, there seems to be no obvious relationship between "world_rank" and "female_male_ratio_2". See the following screenshot:



    2. To confirm this observation, I used the Correlation Matrix operator: the correlation coefficient between "world_rank" and "female_male_ratio_2" is 0.138.

    This low score means that there is no strong linear relationship between these two attributes.
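To make the meaning of that coefficient concrete, here is a small Python sketch of the Pearson correlation coefficient, assuming that this is the measure reported in the correlation matrix. The function name `pearson_r` and the toy values are hypothetical and are not taken from the university ranking data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, only to show the call.
ranks = [1, 2, 3, 4, 5]
ratios = [0.49, 1.10, 0.85, 0.60, 1.30]
print(round(pearson_r(ranks, ratios), 3))
```

A value close to +1 or -1 indicates a strong linear relationship and a value close to 0 indicates a weak one. A weak linear correlation does not rule out a non-linear relationship.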

    You can go further by applying some algorithms.

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\timesData_Excel.xlsx"/>
    <parameter key="imported_cell_range" value="A1:O2604"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="world_rank.true.integer.attribute"/>
    <parameter key="1" value="university_name.true.polynominal.attribute"/>
    <parameter key="2" value="country.true.polynominal.attribute"/>
    <parameter key="3" value="teaching.true.numeric.attribute"/>
    <parameter key="4" value="international.true.polynominal.attribute"/>
    <parameter key="5" value="research.true.numeric.attribute"/>
    <parameter key="6" value="citations.true.numeric.attribute"/>
    <parameter key="7" value="income.true.polynominal.attribute"/>
    <parameter key="8" value="total_score.true.numeric.attribute"/>
    <parameter key="9" value="num_students.true.polynominal.attribute"/>
    <parameter key="10" value="student_staff_ratio.true.numeric.attribute"/>
    <parameter key="11" value="international_students.true.polynominal.attribute"/>
    <parameter key="12" value="female_male_ratio.true.polynominal.attribute"/>
    <parameter key="13" value="female_male_ratio_2.true.numeric.attribute"/>
    <parameter key="14" value="year.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="correlation_matrix" compatibility="8.0.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="34"/>
    <connect from_op="Read Excel" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
    <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
    <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope these first elements of a response will be helpful.







    SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    You could use a statistical test to answer the question, for example a chi-squared test of independence.
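As a rough illustration of that suggestion, the sketch below computes the chi-squared statistic for a contingency table in plain Python. It assumes both attributes have first been binned into categories (for example rank bands and ratio bands). The binning, the function name `chi_squared_statistic`, and the counts are hypothetical and are not derived from the dataset in this thread.

```python
from typing import Sequence


def chi_squared_statistic(table: Sequence[Sequence[float]]) -> float:
    """Chi-squared statistic for a table of observed counts.

    Expected counts assume independence:
    expected[i][j] = row_total[i] * col_total[j] / grand_total.
    Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = rank band (top half, bottom half),
# columns = ratio band (ratio above 1, ratio at or below 1).
counts = [[30, 20], [20, 30]]
stat = chi_squared_statistic(counts)
dof = (len(counts) - 1) * (len(counts[0]) - 1)
print(stat, dof)  # 4.0 1
```

With 1 degree of freedom, the critical value at the 5% level is about 3.84, so a statistic of 4.0 on these made-up counts would lead to rejecting independence at that level. A statistics library would normally be used to get the exact p-value.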
