I have an automobile dataset. How can I make predictions with it?

atifraza127 Member Posts: 1 Learner I
edited November 2018 in Help

My data is attached as an Excel file. I want to make predictions on this data. Which prediction method should I use?


Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @atifraza127, welcome to the community. Have you gone through the tutorials on how to do predictive analytics? You'll find lots of information here: https://community.rapidminer.com/t5/Getting-Started-Forum/bd-p/GettingStartForum

    Scott


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @atifraza127,

    Once you have read the getting-started resources and worked through the tutorials:

    The first step is to build one or more classification models that learn the relationship between your label attribute (in your case Chance of Stolen, I suppose) and the other attributes, using your training dataset (your file). Then choose the model with the best performance (accuracy, recall, precision, etc.).
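    To make "best performance" concrete, here is a minimal plain-Python sketch (not RapidMiner code) of how accuracy, precision and recall are computed from actual vs. predicted labels. The "High"/"Low" class values and both label lists are invented for illustration; the real values of Chance of Stolen in your file may differ.

```python
def classification_metrics(actual, predicted, positive):
    """Return (accuracy, precision, recall), treating `positive` as the class of interest."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical "Chance of Stolen" labels; the class values are assumptions.
actual    = ["High", "Low", "High", "Low", "High", "Low", "High"]
predicted = ["High", "Low", "Low",  "Low", "High", "High", "Low"]
acc, prec, rec = classification_metrics(actual, predicted, positive="High")
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# prints: accuracy=0.57 precision=0.67 recall=0.50
```

    The three numbers can disagree, which is why it is worth deciding up front which one matters most for your problem before picking a "best" model.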

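    To perform these tasks, here is a process that compares the performance of 5 models (Decision Tree, Random Forest, Gradient Boosted Trees, Naive Bayes and Deep Learning) using cross-validation. The Read CSV operator points to a file on my machine, so change its csv_file path (and the filename of the Log operator) to match your own setup.

    NB: Don't hesitate to also test the other classifier models offered by RapidMiner.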

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Automotive\Automobile_data_alternate data.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Make.true.polynominal.attribute"/>
    <parameter key="1" value="Fuel-Type.true.polynominal.attribute"/>
    <parameter key="2" value="aspiration.true.polynominal.attribute"/>
    <parameter key="3" value="Number of Doors.true.polynominal.attribute"/>
    <parameter key="4" value="Body Style.true.polynominal.attribute"/>
    <parameter key="5" value="Wheels Drive.true.polynominal.attribute"/>
    <parameter key="6" value="Engine Location.true.polynominal.attribute"/>
    <parameter key="7" value="Horse Power.true.integer.attribute"/>
    <parameter key="8" value="MPG.true.integer.attribute"/>
    <parameter key="9" value="Price.true.integer.attribute"/>
    <parameter key="10" value="Chance of Stolen.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
    <parameter key="attribute_name" value="Chance of Stolen"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="loop" compatibility="8.0.001" expanded="true" height="82" name="Loop" width="90" x="313" y="34">
    <parameter key="set_iteration_macro" value="true"/>
    <parameter key="iterations" value="5"/>
    <process expanded="true">
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="112" y="34">
    <process expanded="true">
    <operator activated="true" class="select_subprocess" compatibility="8.0.001" expanded="true" height="82" name="Select Subprocess" width="90" x="179" y="34">
    <parameter key="select_which" value="%{iteration}"/>
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.0.001" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34"/>
    <connect from_port="input 1" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_random_forest" compatibility="8.0.001" expanded="true" height="103" name="Random Forest" width="90" x="112" y="34"/>
    <connect from_port="input 1" to_op="Random Forest" to_port="training set"/>
    <connect from_op="Random Forest" from_port="model" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="h2o:gradient_boosted_trees" compatibility="7.6.001" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="45" y="34">
    <list key="expert_parameters"/>
    </operator>
    <connect from_port="input 1" to_op="Gradient Boosted Trees" to_port="training set"/>
    <connect from_op="Gradient Boosted Trees" from_port="model" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="naive_bayes" compatibility="8.0.001" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34"/>
    <connect from_port="input 1" to_op="Naive Bayes" to_port="training set"/>
    <connect from_op="Naive Bayes" from_port="model" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning" width="90" x="45" y="34">
    <enumeration key="hidden_layer_sizes">
    <parameter key="hidden_layer_sizes" value="50"/>
    <parameter key="hidden_layer_sizes" value="50"/>
    </enumeration>
    <enumeration key="hidden_dropout_ratios"/>
    <list key="expert_parameters"/>
    <list key="expert_parameters_"/>
    </operator>
    <connect from_port="input 1" to_op="Deep Learning" to_port="training set"/>
    <connect from_op="Deep Learning" from_port="model" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_port="training set" to_op="Select Subprocess" to_port="input 1"/>
    <connect from_op="Select Subprocess" from_port="output 1" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="log" compatibility="8.0.001" expanded="true" height="82" name="Log (3)" width="90" x="313" y="30">
    <parameter key="filename" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Automotive\log_automotive.log"/>
    <list key="log">
    <parameter key="Model" value="operator.Loop.value.iteration"/>
    <parameter key="mean performance" value="operator.Cross Validation.value.performance main criterion"/>
    </list>
    <parameter key="persistent" value="true"/>
    </operator>
    <connect from_port="input 1" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_op="Log (3)" to_port="through 1"/>
    <connect from_op="Log (3)" from_port="through 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
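    The Loop / Select Subprocess / Cross Validation / Log combination above boils down to: for each candidate model, estimate its performance with k-fold cross-validation, record the mean, and keep the best one. Below is a minimal plain-Python sketch of that logic, not RapidMiner code. It assumes two stand-in learners (a majority-class baseline and 1-nearest-neighbour) instead of the five models in the process, 3 shuffled non-stratified folds for brevity, and a small invented dataset using only Horse Power and MPG.

```python
import random
from collections import Counter

# Hypothetical training data (invented, not the poster's file):
# each row is (Horse Power, MPG); labels stand in for "Chance of Stolen".
rows = [(111, 21), (154, 19), (102, 24), (115, 18), (110, 19), (140, 17),
        (160, 16), (68, 31), (70, 38), (62, 27), (69, 31), (76, 30)]
labels = ["High"] * 7 + ["Low"] * 5


def train_majority(train_rows, train_labels):
    """Baseline learner: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


def train_1nn(train_rows, train_labels):
    """1-nearest-neighbour learner using squared Euclidean distance."""
    data = list(zip(train_rows, train_labels))

    def predict(row):
        nearest = min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], row)))
        return nearest[1]

    return predict


def cross_validate(train, rows, labels, k=3, seed=0):
    """Mean accuracy of `train` over k shuffled (non-stratified) folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(model(rows[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Plays the role of Loop + Select Subprocess: one entry per candidate model.
candidates = {"Majority baseline": train_majority, "1-NN": train_1nn}
# Plays the role of the Log operator: record each model's mean performance.
results = {name: cross_validate(train, rows, labels) for name, train in candidates.items()}
for name, score in results.items():
    print(f"{name}: mean accuracy {score:.2f}")
best = max(results, key=results.get)
print("Best model:", best)
```

    With real data, put the numeric attributes on a comparable scale before using a distance-based model such as 1-NN; otherwise a large-valued attribute like Price would dominate the distance.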

    Once you have determined the best model, you can apply it (with the Apply Model operator) to a scoring dataset, i.e. new examples without a label, to predict the label attribute.
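    As a plain-Python illustration of that scoring step (not RapidMiner code), the sketch below assumes 1-NN turned out to be the best model, retrains it on all labeled rows and predicts two new cars. All data is invented, and the output key only imitates the prediction(&lt;label&gt;) attribute that Apply Model adds to the scored examples.

```python
# Hypothetical labeled data (invented): (Horse Power, MPG) -> "Chance of Stolen".
train_rows = [(111, 21), (154, 19), (102, 24), (140, 17), (68, 31), (70, 38), (69, 31)]
train_labels = ["High", "High", "High", "High", "Low", "Low", "Low"]

# Hypothetical scoring set: new cars whose label is unknown.
score_rows = [(150, 18), (65, 33)]


def train_1nn(rows, labels):
    """1-nearest-neighbour learner (assumed to be the 'best model' in this sketch)."""
    data = list(zip(rows, labels))

    def predict(row):
        return min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], row)))[1]

    return predict


# Retrain the chosen model on all labeled rows, then score the new rows.
model = train_1nn(train_rows, train_labels)
scored = [
    {"Horse Power": hp, "MPG": mpg, "prediction(Chance of Stolen)": model((hp, mpg))}
    for hp, mpg in score_rows
]
for example in scored:
    print(example)
```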

     

    I hope this helps,

    Regards,

    Lionel
