"Machine Learning with the Neural Network"

florianklimek Member Posts: 3 Contributor I
edited June 2019 in Help

Hey guys,

I'm a student and have a question about a university project. We have data on broadband penetration and the Human Development Index (HDI) for all countries in the world for the years 2000, 2005 and 2010-2014.

We would now like to predict how broadband penetration could affect the HDI in the future, using a neural network or another method you would recommend. In the end, we would like to be able to say that if, for example, Brazil raises its broadband penetration by 1%, its HDI will rise by some amount x.

Can you help us with that?

Thank you and kind regards!

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,074   Unicorn

    Hi @florianklimek,

     

    I don't know yet whether your project is a regression problem or a time series problem.

    In the first case, you can use the Linear Regression or Vector Linear Regression operators (HDI vs. broadband penetration).

    Can you share your dataset(s) so that I can understand the problem better?

     

    Regards, 

     

    Lionel

     

     

     

     

  • florianklimek Member Posts: 3 Contributor I

    Thank you for your reply!

    Of course, here is the data. Please ignore the other columns; for now we are focusing on broadband and the overall HDI.

    Thanks!

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,074   Unicorn

    Hi @florianklimek

     

    You have little data, but it is fitted well by a Linear Regression model (R2 micro = 0.977):

    [Figure: HDI_vs_Broadband.png]

    1. You can choose the country to study by setting it in the Set country operator.

    2. I ran this study with the Human Development Index (HDI) attribute. If you want to study the HDI attribute instead, set HDI in the Select Attributes and Set Role operators.

    3. You can enter your future broadband values in the Future Broadband operator to predict the corresponding future HDI.
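
For readers who want to see the same idea outside RapidMiner, the three steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the rows in `ROWS` are invented placeholder numbers (not the values from the attached CSV files), "Chile" is a made-up second country, and the helper names are hypothetical. The RapidMiner process uses 7 folds, which for one country's seven yearly rows amounts to leave-one-out validation, so that is what the sketch does.

```python
import numpy as np

# Hypothetical placeholder rows: (year, country, broadband penetration, HDI).
# These numbers are invented for illustration and are NOT the thread's data.
ROWS = [
    (2000, "Brazil", 1.0, 0.51), (2000, "Chile", 2.0, 0.70),
    (2005, "Brazil", 2.0, 0.52), (2005, "Chile", 4.0, 0.72),
    (2010, "Brazil", 4.0, 0.54), (2010, "Chile", 6.0, 0.74),
    (2011, "Brazil", 5.0, 0.55), (2011, "Chile", 7.0, 0.75),
    (2012, "Brazil", 6.0, 0.56), (2012, "Chile", 8.0, 0.76),
    (2013, "Brazil", 8.0, 0.58), (2013, "Chile", 9.0, 0.77),
    (2014, "Brazil", 9.0, 0.59), (2014, "Chile", 10.0, 0.78),
]


def select_country(rows, country):
    """Step 1: keep one country's rows (the role of the Set country macro)."""
    kept = [r for r in rows if r[1] == country]
    x = np.array([r[2] for r in kept])  # broadband penetration
    y = np.array([r[3] for r in kept])  # HDI (the label)
    return x, y


def fit_line(x, y):
    """Step 2: least-squares fit of label = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def leave_one_out_rmse(x, y):
    """Refit without each row in turn and measure the error on that row."""
    errors = []
    for i in range(len(x)):
        slope, intercept = fit_line(np.delete(x, i), np.delete(y, i))
        errors.append(y[i] - (slope * x[i] + intercept))
    return float(np.sqrt(np.mean(np.square(errors))))


x, y = select_country(ROWS, "Brazil")
slope, intercept = fit_line(x, y)
rmse = leave_one_out_rmse(x, y)

# Step 3: apply the fitted line to assumed future broadband values.
future_broadband = np.array([12.0, 13.0, 14.0])
future_hdi = slope * future_broadband + intercept

print(f"slope={slope:.4f} intercept={intercept:.4f} loo_rmse={rmse:.2e}")
print(dict(zip(future_broadband.tolist(), np.round(future_hdi, 3).tolist())))
```

Swapping the country name in `select_country` plays the same role as changing the value in the Set country operator.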

     

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2000" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2000 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="82" name="Set country" width="90" x="179" y="34">
    <parameter key="macro" value="country"/>
    <parameter key="value" value="Brazil"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="238">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2005" width="90" x="45" y="187">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2005 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="179" y="136">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2010" width="90" x="45" y="289">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2010 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (3)" width="90" x="179" y="289">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2011" width="90" x="45" y="442">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2011 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (4)" width="90" x="179" y="391">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2012" width="90" x="45" y="544">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2012 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (5)" width="90" x="179" y="544">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2013" width="90" x="45" y="646">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2013 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (6)" width="90" x="179" y="697">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Year 2014" width="90" x="45" y="748">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Broadband_vs_HDI\Year 2014 Gesamt.csv"/>
    <parameter key="column_separators" value=",\s*|;\s*"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Country.true.polynominal.attribute"/>
    <parameter key="1" value="Broadband Penetration.true.real.attribute"/>
    <parameter key="2" value="Gross Domestic Product (GDP).true.polynominal.attribute"/>
    <parameter key="3" value="Human Development Index (HDI).true.real.attribute"/>
    <parameter key="4" value="HDI .true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (7)" width="90" x="179" y="799">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Country.equals.%{country}"/>
    </list>
    </operator>
    <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="Year" width="90" x="45" y="952">
    <parameter key="Input Csv" value="Year&#10;2000,&#10;2005,&#10;2010,&#10;2011,&#10;2012,&#10;2013,&#10;2014,"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.0.001" expanded="true" height="208" name="Append" width="90" x="447" y="391"/>
    <operator activated="true" class="generate_id" compatibility="8.0.001" expanded="true" height="82" name="Generate ID" width="90" x="581" y="493"/>
    <operator activated="true" class="generate_id" compatibility="8.0.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="952"/>
    <operator activated="true" class="join" compatibility="8.0.001" expanded="true" height="82" name="Join" width="90" x="715" y="493">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="849" y="493">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Broadband Penetration|Human Development Index (HDI)"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="983" y="493"/>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="1117" y="544">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value="Broadband Penetration"/>
    <parameter key="attributes" value="Broadband Penetration|Human Development Index (HDI)"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="1117" y="442">
    <parameter key="attribute_name" value="Human Development Index (HDI)"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="1251" y="442">
    <parameter key="number_of_folds" value="7"/>
    <process expanded="true">
    <operator activated="true" class="linear_regression" compatibility="8.0.001" expanded="true" height="103" name="Linear Regression" width="90" x="179" y="34"/>
    <connect from_port="training set" to_op="Linear Regression" to_port="training set"/>
    <connect from_op="Linear Regression" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_regression" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
    <parameter key="correlation" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply (2)" width="90" x="1318" y="646"/>
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1385" y="544">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="Future Broadband" width="90" x="1117" y="748">
    <parameter key="Input Csv" value="Broadband Penetration&#10;12.0,&#10;13.0,&#10;14.0,"/>
    </operator>
    <operator activated="true" class="numerical_to_real" compatibility="8.0.001" expanded="true" height="82" name="Numerical to Real" width="90" x="1251" y="748"/>
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="1452" y="748">
    <list key="application_parameters"/>
    </operator>
    <connect from_op="Year 2000" from_port="output" to_op="Set country" to_port="through 1"/>
    <connect from_op="Set country" from_port="through 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Year 2005" from_port="output" to_op="Filter Examples (2)" to_port="example set input"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Year 2010" from_port="output" to_op="Filter Examples (3)" to_port="example set input"/>
    <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Year 2011" from_port="output" to_op="Filter Examples (4)" to_port="example set input"/>
    <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Append" to_port="example set 4"/>
    <connect from_op="Year 2012" from_port="output" to_op="Filter Examples (5)" to_port="example set input"/>
    <connect from_op="Filter Examples (5)" from_port="example set output" to_op="Append" to_port="example set 5"/>
    <connect from_op="Year 2013" from_port="output" to_op="Filter Examples (6)" to_port="example set input"/>
    <connect from_op="Filter Examples (6)" from_port="example set output" to_op="Append" to_port="example set 6"/>
    <connect from_op="Year 2014" from_port="output" to_op="Filter Examples (7)" to_port="example set input"/>
    <connect from_op="Filter Examples (7)" from_port="example set output" to_op="Append" to_port="example set 7"/>
    <connect from_op="Year" from_port="output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Append" from_port="merged set" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Apply Model (3)" to_port="model"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 3"/>
    <connect from_op="Future Broadband" from_port="output" to_op="Numerical to Real" to_port="example set input"/>
    <connect from_op="Numerical to Real" from_port="example set output" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it will be helpful,

     

    Regards, 

     

    Lionel

     

     

  • florianklimek Member Posts: 3 Contributor I

    Thank you very much, Lionel! 

    This really is a great community! But we have two more questions. The process calculates the numbers from all of the years, right? And how can we interpret the results?

    Thanks a lot!

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,074   Unicorn

    Hi @florianklimek,

     

    The process fits a Linear Regression model to your data.

    In your case, the equation of the model is:

    Human Development Index (HDI) = 0.006 * Broadband Penetration + 0.685

    So you can use this equation to predict the Human Development Index (HDI) for a given Broadband Penetration.
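
As a small worked example of reading that equation, the sketch below plugs in the three values entered in the Future Broadband operator (12, 13 and 14) and also shows the effect of a one-unit increase. It assumes Broadband Penetration is expressed in percentage points, which the thread does not state explicitly; the function names are hypothetical.

```python
# Coefficients copied from the regression equation quoted in this post.
SLOPE = 0.006      # change in HDI per one-unit change in broadband penetration
INTERCEPT = 0.685  # predicted HDI at zero broadband penetration


def predict_hdi(broadband_penetration: float) -> float:
    """Predicted HDI for a given broadband penetration."""
    return SLOPE * broadband_penetration + INTERCEPT


def hdi_change(delta_broadband: float) -> float:
    """Predicted change in HDI for a change in broadband penetration."""
    return SLOPE * delta_broadband


# Values taken from the Future Broadband operator in the process.
for bb in (12.0, 13.0, 14.0):
    print(f"broadband={bb:.1f} -> predicted HDI={predict_hdi(bb):.3f}")

# The "x" from the original question is simply the slope of the line.
print(f"+1 unit of broadband -> HDI change of {hdi_change(1.0):+.3f}")
```

Under that assumption, the answer to the original question is that each additional point of broadband penetration corresponds to a predicted HDI increase of about 0.006 for the selected country.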

     

    NB: Connect one of the Multiply (2) outputs (which carry the model) to a result port (res) to see the parameters of the regression model.

     

    I hope it will be helpful,

     

    Regards, 

     

    Lionel

     

     
