
Applying a Cluster Model to a Different Path

JKSJKS Member Posts: 1 Contributor I
Hello, 

I'm trying to use cluster analysis to match regions in Texas that are not receiving cable spend with similar regions in Texas that are receiving cable spend. The goal is a test-and-control experiment: compare web traffic between the regions with spend and those without to measure traffic lift. I used the Multiply operator to create two paths, one for regions with $0 cable spend and another for regions with >$0 cable spend. I connected the k-Means clustering operator to the $0-spend path and am trying to apply the resulting cluster model to the regions with cable spend. I'm not sure this is the best way to do the analysis, and I'm having a difficult time applying the model to the regions with cable spend.
<?xml version="1.0" encoding="UTF-8"?><process version="9.10.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="false" class="split" compatibility="9.10.001" expanded="true" height="82" name="Split (2)" width="90" x="179" y="850">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="split_pattern" value=","/>
        <parameter key="split_mode" value="ordered_split"/>
      </operator>
      <operator activated="false" class="sample" compatibility="9.10.001" expanded="true" height="82" name="Sample" width="90" x="179" y="1054">
        <parameter key="sample" value="probability"/>
        <parameter key="balance_data" value="false"/>
        <parameter key="sample_size" value="500"/>
        <parameter key="sample_ratio" value="0.1"/>
        <parameter key="sample_probability" value="0.15"/>
        <list key="sample_size_per_class"/>
        <list key="sample_ratio_per_class"/>
        <list key="sample_probability_per_class"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="false" class="concurrency:k_means" compatibility="9.10.001" expanded="true" height="82" name="Clustering (2)" width="90" x="849" y="544">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="5"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="determine_good_start_values" value="true"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="SquaredEuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="false" class="append" compatibility="9.10.001" expanded="true" height="68" name="Append" width="90" x="715" y="850">
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="merge_type" value="all"/>
      </operator>
      <operator activated="false" class="nominal_to_numerical" compatibility="9.10.001" expanded="true" height="103" name="Nominal to Numerical (3)" width="90" x="782" y="646">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="coding_type" value="dummy coding"/>
        <parameter key="use_comparison_groups" value="false"/>
        <list key="comparison_groups"/>
        <parameter key="unexpected_value_handling" value="all 0 and warning"/>
        <parameter key="use_underscore_in_name" value="false"/>
      </operator>
      <operator activated="false" class="cluster_distance_performance" compatibility="9.10.001" expanded="true" height="103" name="Performance (4)" width="90" x="1050" y="697">
        <parameter key="main_criterion" value="Avg. within centroid distance"/>
        <parameter key="main_criterion_only" value="false"/>
        <parameter key="normalize" value="false"/>
        <parameter key="maximize" value="false"/>
      </operator>
      <operator activated="false" class="cluster_distance_performance" compatibility="9.10.001" expanded="true" height="103" name="Performance" width="90" x="1318" y="595">
        <parameter key="main_criterion" value="Avg. within centroid distance"/>
        <parameter key="main_criterion_only" value="false"/>
        <parameter key="normalize" value="false"/>
        <parameter key="maximize" value="false"/>
      </operator>
      <operator activated="false" class="time_series:multi_label_performance_evaluator" compatibility="9.10.001" expanded="true" height="124" name="Multi Label Performance" width="90" x="1318" y="952">
        <parameter key="auto_detect_label_and_prediction_attributes" value="true"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="add_macros" value="false"/>
        <parameter key="current_label_name_macro" value="current_label_attribute"/>
        <parameter key="current_label_type_macro" value="current_label_type"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <portSpacing port="source_labelled set" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="concurrency:join" compatibility="9.10.001" expanded="true" height="82" name="Join" width="90" x="1117" y="544">
        <parameter key="remove_double_attributes" value="true"/>
        <parameter key="join_type" value="outer"/>
        <parameter key="use_id_attribute_as_key" value="true"/>
        <list key="key_attributes">
          <parameter key="DMA" value="DMA"/>
        </list>
        <parameter key="keep_both_join_attributes" value="false"/>
      </operator>
      <operator activated="false" class="concurrency:k_means" compatibility="9.10.001" expanded="true" height="82" name="Clustering (3)" width="90" x="1318" y="493">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="5"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="determine_good_start_values" value="true"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="SquaredEuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="false" class="cluster_distance_performance" compatibility="9.10.001" expanded="true" height="103" name="Performance (3)" width="90" x="1184" y="340">
        <parameter key="main_criterion" value="Avg. within centroid distance"/>
        <parameter key="main_criterion_only" value="false"/>
        <parameter key="normalize" value="false"/>
        <parameter key="maximize" value="false"/>
      </operator>
      <operator activated="false" class="subprocess" compatibility="9.10.001" expanded="true" height="82" name="Preprocessing Segmentation" width="90" x="1050" y="391">
        <process expanded="true">
          <operator activated="false" class="multiply" compatibility="9.10.001" expanded="true" height="68" name="Multiply" width="90" x="179" y="442"/>
          <operator activated="false" class="filter_examples" compatibility="9.10.001" expanded="true" height="103" name="&gt;$0 filter" width="90" x="313" y="391">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Cable Active.equals.Yes"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="false" class="filter_examples" compatibility="9.10.001" expanded="true" height="103" name="$0 spend filter examples" width="90" x="380" y="238">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Cable Active.equals.No"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:k_means" compatibility="9.10.001" expanded="true" height="82" name="Clustering" width="90" x="179" y="34">
            <parameter key="add_cluster_attribute" value="true"/>
            <parameter key="add_as_label" value="false"/>
            <parameter key="remove_unlabeled" value="false"/>
            <parameter key="k" value="5"/>
            <parameter key="max_runs" value="10"/>
            <parameter key="determine_good_start_values" value="true"/>
            <parameter key="measure_types" value="MixedMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="SquaredEuclideanDistance"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="max_optimization_steps" value="100"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
          </operator>
          <operator activated="true" class="cluster_distance_performance" compatibility="9.10.001" expanded="true" height="103" name="Performance (2)" width="90" x="514" y="85">
            <parameter key="main_criterion" value="Avg. within centroid distance"/>
            <parameter key="main_criterion_only" value="false"/>
            <parameter key="normalize" value="false"/>
            <parameter key="maximize" value="false"/>
          </operator>
          <operator activated="false" class="join_paths" compatibility="9.10.001" expanded="true" height="68" name="Join Paths" width="90" x="648" y="544"/>
          <connect from_port="in 1" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Performance (2)" to_port="cluster model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Performance (2)" to_port="example set"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="join_paths" compatibility="9.10.001" expanded="true" height="68" name="Join Paths (2)" width="90" x="1452" y="493"/>
      <operator activated="false" class="concurrency:k_means" compatibility="9.10.001" expanded="true" height="82" name="Clustering (4)" width="90" x="715" y="544">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="5"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="determine_good_start_values" value="true"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="SquaredEuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.10.001" expanded="true" height="68" name="Retrieve GME Texas DMA Features w population per DMA" width="90" x="45" y="187">
        <parameter key="repository_entry" value="//Local Repository/data/GME Texas DMA Features w population per DMA"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.10.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="179" y="187">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Age 18-34|Age 35-49|Age 50-64|Age 65 or more|Area City|Area Rural|Area Suburb|Area Town|Cable Spend|DMA|Education 2-year|Education 4-year|Education HS graduate|Education No HS|Education Post-grad|Education Some college|Employment disabled|Employment Taking care of home or family|Employment Unemployed|Employment Working full time|Employment Working part time|EmploymentRetired|EmploymentStudent|EmploymentTemporarily unemployed|Gender Female|HHI100k or more|HHI 30-50k|HHI 50-70k|HHI 70-100k|HHI Less than 30k|HHSize4|HHSize5 or more|HHSize 1|HHSize 2|HHSize 3|Hometype Apartment|Hometype Mobile home|Hometype Single-family detached|Hometype Townhouse|Language English primarily but can speak Spanish|Language Spanish and English equally|Language Spanish primarily|Military Service|Parent|Politics Democrat|Politics Independent|Politics Republican|Race Black|Race Hispanic|Race Other|Race White|Relationship Married"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.10.001" expanded="true" height="103" name="Normalize (2)" width="90" x="313" y="136">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Population Estimate|Age 18-34|Age 35-49|Age 50-64|Age 65 or more|Area City|Area Rural|Area Suburb|Area Town|Education 2-year|Education 4-year|Education HS graduate|Education No HS|Education Post-grad|Education Some college|Employment disabled|Employment Taking care of home or family|Employment Unemployed|Employment Working full time|Employment Working part time|EmploymentRetired|EmploymentStudent|EmploymentTemporarily unemployed|Gender Female|HHI100k or more|HHI 30-50k|HHI 50-70k|HHI 70-100k|HHI Less than 30k|HHSize4|HHSize5 or more|HHSize 1|HHSize 2|HHSize 3|Hometype Apartment|Hometype Mobile home|Hometype Single-family detached|Hometype Townhouse|Language English primarily but can speak Spanish|Language Spanish and English equally|Language Spanish primarily|Military Service|More than 50 hours per week|Parent|Politics Democrat|Politics Independent|Politics Republican|Race Black|Race Hispanic|Race Other|Race White|Relationship Married"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.10.001" expanded="true" height="82" name="Set Role (3)" width="90" x="447" y="136">
        <parameter key="attribute_name" value="Cable Spend"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.10.001" expanded="true" height="103" name="Multiply (2)" width="90" x="581" y="136"/>
      <operator activated="true" class="filter_examples" compatibility="9.10.001" expanded="true" height="103" name="$0 spend filter examples (2)" width="90" x="782" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Cable Spend.eq.0"/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="concurrency:k_means" compatibility="9.10.001" expanded="true" height="82" name="Clustering (5)" width="90" x="1050" y="85">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="4"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="determine_good_start_values" value="true"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="SquaredEuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.10.001" expanded="true" height="103" name="&gt;$0 filter (2)" width="90" x="782" y="238">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Cable Spend.gt.0"/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.10.001" expanded="true" height="82" name="Apply Model" width="90" x="1184" y="187">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="append" compatibility="9.10.001" expanded="true" height="82" name="Append (2)" width="90" x="1318" y="85">
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="merge_type" value="all"/>
      </operator>
      <connect from_op="Retrieve GME Texas DMA Features w population per DMA" from_port="output" to_op="Select Attributes (3)" to_port="example set input"/>
      <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Normalize (2)" to_port="example set input"/>
      <connect from_op="Normalize (2)" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
      <connect from_op="Set Role (3)" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="$0 spend filter examples (2)" to_port="example set input"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="&gt;$0 filter (2)" to_port="example set input"/>
      <connect from_op="$0 spend filter examples (2)" from_port="example set output" to_op="Clustering (5)" to_port="example set"/>
      <connect from_op="Clustering (5)" from_port="cluster model" to_op="Apply Model" to_port="model"/>
      <connect from_op="&gt;$0 filter (2)" from_port="original" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Append (2)" to_port="example set 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <connect from_op="Append (2)" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="868" y="1133">cable spend greater than $0&lt;br&gt;changed to cable active &amp;quot;Yes&amp;quot; and top one &amp;quot;No&amp;quot;</description>
      <description align="center" color="yellow" colored="false" height="467" resized="false" width="180" x="409" y="633">the clusters that we originally got include all DMAs with spend in the same cluster. for that reason, i added a preprocessing segmentation operator - not sure if this is correct but i need some way to split the data (filter for DMAs with no spend vs with) and then create the model with &lt;br/&gt;brandon says after i split the data into two, i can drop the spend variable from both paths (after) using select attributes. so when i apply the model im answering the query &amp;quot;based on a weighted average of the N most similar DMAs, what would traffic be?&amp;quot;</description>
      <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="1440" y="846">counterfactual = what would traffic be like in the DMAs receiving spend if they didnt receive spend</description>
      <description align="center" color="yellow" colored="false" height="108" resized="false" width="180" x="1433" y="681">need to create a model using only DMAs without spend and build the model without the spend variable included?</description>
      <description align="center" color="yellow" colored="false" height="183" resized="false" width="180" x="127" y="509">if the preprocessing works, should i select different attributes and set role within the preprocessing piece (basically what i have here) and then change these here to the real label (traffic) based on treatment (spend)? will that work...</description>
    </process>
  </operator>
</process>



I've pasted the process XML above for reference. I'm fairly new to RapidMiner, so I'm hoping someone can provide guidance on how to apply this model to the other path, or point out a better way to do this.
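
One wiring that may be worth trying is sketched below. It is an untested fragment, not a confirmed fix. It reuses the operator names from the process (`$0 spend filter examples (2)`, `Clustering (5)`, `>$0 filter (2)`, `Apply Model`, `Append (2)`) and relies on two points: in Filter Examples, the `example set output` port carries only the rows that match the filter, while `original` passes the unfiltered input through; and Append is assumed to expose a second input port named `example set 2`. It also assumes both inputs to Append end up with the same attributes, including the cluster attribute.

```xml
<!-- Untested sketch: alternative connect lines between the two filters and Append (2). -->
<!-- Fit k-Means on the regions with $0 cable spend only. -->
<connect from_op="$0 spend filter examples (2)" from_port="example set output" to_op="Clustering (5)" to_port="example set"/>
<connect from_op="Clustering (5)" from_port="cluster model" to_op="Apply Model" to_port="model"/>
<!-- Score only the filtered rows with spend; the "original" port would pass every region through. -->
<connect from_op="&gt;$0 filter (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<!-- Append both clustered sets; "example set 2" is an assumed port name. -->
<connect from_op="Clustering (5)" from_port="clustered set" to_op="Append (2)" to_port="example set 1"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Append (2)" to_port="example set 2"/>
<connect from_op="Append (2)" from_port="merged set" to_port="result 1"/>
```

With this wiring, every region should carry a cluster assignment from the same model, so regions with and without spend that share a cluster can be compared as candidate test and control groups.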

I'm struggling 