Options

Predictions

PapadPapad Member Posts: 68 Guru
Hi, I want to be sure that I have fully understand the way that I have to work with a data set.
First of all, in order to see the accuracy of my model, I use this process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve customer-churn-data" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//MyFirstPrediction/Data/customer-churn-data"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
        <parameter key="attribute_name" value="Churn"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Churn.is_not_missing."/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.2.001" expanded="true" height="103" name="Decision Tree" width="90" x="447" y="34">
        <parameter key="criterion" value="gain_ratio"/>
        <parameter key="maximal_depth" value="10"/>
        <parameter key="apply_pruning" value="true"/>
        <parameter key="confidence" value="0.1"/>
        <parameter key="apply_prepruning" value="true"/>
        <parameter key="minimal_gain" value="0.01"/>
        <parameter key="minimal_leaf_size" value="2"/>
        <parameter key="minimal_size_for_split" value="4"/>
        <parameter key="number_of_prepruning_alternatives" value="3"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" width="90" x="581" y="34">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="performance" compatibility="9.2.001" expanded="true" height="82" name="Performance" width="90" x="715" y="136">
        <parameter key="use_example_weights" value="true"/>
      </operator>
      <connect from_op="Retrieve customer-churn-data" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 3"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <connect from_op="Performance" from_port="example set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

--------------------------------------------------------------------------------------------------------



in this case i'm using a decision tree model. With Filter Examples I select all the examples where Churn attribute is not missing.

After that, if I want to predict who is churn or loyal, in case I don't know the result, I use this:



<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve customer-churn-data" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//MyFirstPrediction/Data/customer-churn-data"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
        <parameter key="attribute_name" value="Churn"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.2.001" expanded="true" height="103" name="Multiply" width="90" x="313" y="34"/>
      <operator activated="true" class="filter_examples" compatibility="9.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Churn.is_not_missing."/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.2.001" expanded="true" height="103" name="Decision Tree" width="90" x="581" y="34">
        <parameter key="criterion" value="gain_ratio"/>
        <parameter key="maximal_depth" value="10"/>
        <parameter key="apply_pruning" value="true"/>
        <parameter key="confidence" value="0.1"/>
        <parameter key="apply_prepruning" value="true"/>
        <parameter key="minimal_gain" value="0.01"/>
        <parameter key="minimal_leaf_size" value="2"/>
        <parameter key="minimal_size_for_split" value="4"/>
        <parameter key="number_of_prepruning_alternatives" value="3"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="447" y="238">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Churn.is_missing."/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" width="90" x="715" y="187">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <connect from_op="Retrieve customer-churn-data" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>



In the second filter examples operator I have selected the examples with missing attribute churn.
As a result I have the decision tree which is based on my first filter and as a prediction I predict if a customer will churn, based on the dataset of the first filter. Have I understand it well? The main thing I want to understand is the correlation between the two datasets(two filter examples).
Thanks in advance.

Comments

  • Options
    varunm1varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Papad

    In the first XML code, you didn't connect test dataset to apply the model. I am providing you with a model that is used to get reliable performance results and also predict in unlabelled data. I used 5 fold cross validation operator and inside that, I placed a decision tree in training and apply model and performance in testing. This 5 fold cross validation will provide you with the performance of your labeled data (which has churn values), then I connected the apply model to make predictions on unlabelled data. You can take a look from below code. This is one of the best ways to train and test your model. To use this XML code you need to copy this an paste it in XML Window (View --> Show Panel --> XML) and then click on the green tick mark on XML window.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve customer-churn-data" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//MyFirstPrediction/Data/customer-churn-data"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
            <parameter key="attribute_name" value="Churn"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Churn.is_not_missing."/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.2.001" expanded="true" height="145" name="Cross Validation" width="90" x="514" y="34">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.2.001" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="34">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" width="90" x="179" y="85">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.2.001" expanded="true" height="82" name="Performance" width="90" x="313" y="85">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="648" y="238">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <connect from_op="Retrieve customer-churn-data" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Filter Examples" from_port="unmatched example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Cross Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    



    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Options
    PapadPapad Member Posts: 68 Guru
    edited June 2019
    Thanks for the answer.
    I know that cross validation is the right way to have a prediction and a validation together, but what I wanted to understand with my question is if my thought about a general success is write. For example, I know tha my way is something very simple but is it correct as a procedure? 

    Also, I saw in your process that you have 2 apply model operators. I have seen lot of times the first(inside cross validation) but what is the difference with the second one, which takes as an input unm from filter examples?
    Thanks in advance for your time.
  • Options
    varunm1varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Papad

    Based on your second XML process I thought you have some data without churn labels, so I used apply model to predict these missing labels. 

    Coming to your process, I see the first XML is not correct as I don't see a testing data attached to apply model, but your second XML is good for predicting unlabelled data. Actually, in my process, I combined both your first and second XML into a single code.

    Thanks
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

Sign In or Register to comment.