Decision tree, random forest and classification of data set

g_pawar Member Posts: 3 Contributor I
edited December 2018 in Help

Hi All,

I am new to RapidMiner. Could someone please help me create a decision tree and a random forest? I have 1 target attribute and 12 parameters influencing it. I also need to classify the data (with regression) based on the output. The main objective is to check whether a single parameter, or a combination of 2, 4 or 5 parameters, significantly or moderately influences the main target attribute. The data is attached for your reference. I tried selecting attributes and setting roles, but got errors such as missing labels and a missing parameter.

Thanks, 

Gopal 

GP.csv 47.4K

Best Answer

  • lionelderkrikor Posts: 619   Unicorn
    Solution Accepted

    Hi Gopal,

     

    It seems there is a problem with your XML code: it cannot be loaded. Can you check it?

    In the meantime, here is an example process that builds a decision tree model on your data:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Decision_tree_basic\GP.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="1.true.real.attribute"/>
    <parameter key="1" value="2.true.real.attribute"/>
    <parameter key="2" value="3.true.real.attribute"/>
    <parameter key="3" value="4.true.integer.attribute"/>
    <parameter key="4" value="5.true.integer.attribute"/>
    <parameter key="5" value="6.true.integer.attribute"/>
    <parameter key="6" value="7.true.integer.attribute"/>
    <parameter key="7" value="8.true.integer.attribute"/>
    <parameter key="8" value="9.true.real.attribute"/>
    <parameter key="9" value="10.true.real.attribute"/>
    <parameter key="10" value="11.true.real.attribute"/>
    <parameter key="11" value="12.true.real.attribute"/>
    <parameter key="12" value="Main attribute.true.real.attribute"/>
    <parameter key="13" value="13.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="34">
    <parameter key="attribute_name" value="Main attribute"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.000" expanded="true" height="145" name="Cross Validation" width="90" x="514" y="34">
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.000" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
    <parameter key="criterion" value="least_square"/>
    </operator>
    <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_regression" compatibility="8.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <parameter key="correlation" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_port="result 2"/>
    <connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

     

    Regards,

     

    Lionel
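
The Decision Tree operator in the process above is set to `criterion = least_square`, which is what lets it handle a numerical label. As a rough illustration of the idea only (this is not RapidMiner's implementation), the following Python sketch picks the threshold on a single attribute that minimizes the summed squared error around the mean of each side. The function names and the toy data are hypothetical and are not taken from GP.csv.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the least-squares split of ys on xs.

    The threshold is None if no split beats predicting the overall mean.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data (hypothetical): one attribute and a numerical label.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, error = best_split(xs, ys)
print(threshold, error)
```

A tree learner would repeat this search over every attribute and then recurse into each side, so an attribute that keeps being chosen near the root is one that explains a lot of the label's variation.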

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    @g_pawar please post your XML code too using the </> button. See the Read Before Posting instructions to your right.

  • g_pawar Member Posts: 3 Contributor I
    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="380" y="391">
    <parameter key="column_separators" value=";"/>
    <parameter key="trim_lines" value="false"/>
    <parameter key="use_quotes" value="true"/>
    <parameter key="quotes_character" value="&quot;"/>
    <parameter key="escape_character" value="\"/>
    <parameter key="skip_comments" value="false"/>
    <parameter key="comment_characters" value="#"/>
    <parameter key="parse_numbers" value="true"/>
    <parameter key="decimal_character" value="."/>
    <parameter key="grouped_digits" value="false"/>
    <parameter key="grouping_character" value=","/>
    <parameter key="date_format" value=""/>
    <parameter key="first_row_as_names" value="true"/>
    <list key="annotations"/>
    <parameter key="time_zone" value="SYSTEM"/>
    <parameter key="locale" value="English (United States)"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="read_all_values_as_polynominal" value="false"/>
    <list key="data_set_meta_data_information"/>
    <parameter key="read_not_matching_values_as_missings" value="true"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    </operator>
    </process>

    Hi Thomas,

    Thanks for the reply. Please find the code.

    Cheers

    Gopal
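
One way to inspect process XML before pasting it is a quick structural check. The Python sketch below parses the XML (which fails if it is not well-formed), lists the operators it contains, and reports whether there is a top-level operator with `class="process"`, as in the working example in the accepted answer. That a missing root operator is what prevented this snippet from loading is an assumption; the thread does not confirm the cause. The function name and the shortened sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_process(xml_text):
    """Return (has_root_process_operator, [(name, class), ...]).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    # All operators at any depth, in document order.
    operators = [(op.get("name"), op.get("class")) for op in root.iter("operator")]
    # Assumption: a complete process has a direct child operator of class "process".
    has_root = any(
        child.tag == "operator" and child.get("class") == "process"
        for child in root
    )
    return has_root, operators


# Shortened, hypothetical samples shaped like the two snippets in this thread.
standalone = (
    '<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">'
    '<operator class="read_csv" name="Read CSV"/></process>'
)
wrapped = (
    '<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">'
    '<operator class="process" name="Process"><process>'
    '<operator class="read_csv" name="Read CSV"/>'
    '</process></operator></process>'
)

print(summarize_process(standalone))
print(summarize_process(wrapped))
```

A well-formed result only means the file is valid XML; RapidMiner may still reject a process for other reasons.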

  • g_pawarg_pawar Member Posts: 3 Contributor I

    Thanks Lionel. Now it's working.

    Regards,

    Gopal
