Options

Nominal to numerical and nominal again, failed.

laavilalaavila Member Posts: 4 Contributor I
edited December 2018 in Help
Hello there.

I have this problem.
I am working in a clustering problem, and I've already tried with a few of proposed solutions on other topics on the forum (yup, I did the search).  I have only Polynomial attributes, then I used the nominal to numerical operator (with unique integers) I used K-Means (mixed euclidean distance), and I joined the final example set with the original one (prior to the nominal to numerical).

After all this, at the final result, I've got is just the example set with unique integers values (I don't understand very well the data with this values on it).

I couldn't get the nominal values again. Does anyone have an idea what I am doing wrong?   
Thanks! 

ps: Here's my xml process too

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.001" expanded="true" height="68" name="Retrieve 2017-2018" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../SAMU (Jaime)/2017-2018"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.001" expanded="true" height="103" name="Subprocess" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="FF|Fecha|H-420|H-Llegada S.U.|H-Pedido|H-Positivo|H-Salida|Móvil|S|OP|Origen|QTC|Actividad"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Clave.is_not_missing."/>
              <parameter key="filters_entry_key" value="Base.is_not_missing."/>
              <parameter key="filters_entry_key" value="Comuna.is_not_missing."/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="9.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="QTC|Sexo|Origen|Destino|Actividad"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="unknow"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.0.001" expanded="true" height="82" name="Replace Sexo" width="90" x="179" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Sexo"/>
            <parameter key="attributes" value="Sexo|Destino"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="0"/>
            <parameter key="replace_by" value="unknow"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.0.001" expanded="true" height="82" name="Generate ID" width="90" x="313" y="187">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="238"/>
          <operator activated="true" class="nominal_to_numerical" compatibility="9.0.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="514" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="coding_type" value="unique integers"/>
            <parameter key="use_comparison_groups" value="false"/>
            <list key="comparison_groups"/>
            <parameter key="unexpected_value_handling" value="all 0 and warning"/>
            <parameter key="use_underscore_in_name" value="false"/>
          </operator>
          <connect from_port="in 1" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Replace Sexo" to_port="example set input"/>
          <connect from_op="Replace Sexo" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_port="out 2"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize" width="90" x="313" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="|id"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
      </operator>
      <operator activated="false" class="sample" compatibility="9.0.001" expanded="true" height="82" name="Sample" width="90" x="313" y="340">
        <parameter key="sample" value="absolute"/>
        <parameter key="balance_data" value="false"/>
        <parameter key="sample_size" value="5000"/>
        <parameter key="sample_ratio" value="0.1"/>
        <parameter key="sample_probability" value="0.1"/>
        <list key="sample_size_per_class"/>
        <list key="sample_ratio_per_class"/>
        <list key="sample_probability_per_class"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="false" class="k_medoids" compatibility="9.0.001" expanded="true" height="82" name="K-Medoids" width="90" x="179" y="340">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="5"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="GeneralizedIDivergence"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
      </operator>
      <operator activated="true" class="concurrency:k_means" compatibility="9.0.001" expanded="true" height="82" name="K-Means" width="90" x="447" y="34">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="5"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="determine_good_start_values" value="false"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="SquaredEuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="cluster_distance_performance" compatibility="9.0.001" expanded="true" height="103" name="Performance" width="90" x="581" y="34">
        <parameter key="main_criterion" value="Davies Bouldin"/>
        <parameter key="main_criterion_only" value="false"/>
        <parameter key="normalize" value="true"/>
        <parameter key="maximize" value="false"/>
      </operator>
      <operator activated="true" class="guess_types" compatibility="9.0.001" expanded="true" height="82" name="Guess Types" width="90" x="514" y="187">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Tipo|Sexo|Destino|Comuna|Clave|Base"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="decimal_point_character" value="."/>
      </operator>
      <connect from_op="Retrieve 2017-2018" from_port="output" to_op="Subprocess" to_port="in 1"/>
      <connect from_op="Subprocess" from_port="out 1" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="K-Means" to_port="example set"/>
      <connect from_op="K-Means" from_port="cluster model" to_op="Performance" to_port="cluster model"/>
      <connect from_op="K-Means" from_port="clustered set" to_op="Performance" to_port="example set"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <connect from_op="Performance" from_port="example set" to_op="Guess Types" to_port="example set input"/>
      <connect from_op="Guess Types" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>


Best Answers

Answers

  • Options
    laavilalaavila Member Posts: 4 Contributor I
    Hello David, I really appreciate your response. It was very helpfull.

    I managed to run the K-Means operator with your specifications, but I'm not sure how  to evaluate the perfomance for the whole process. I have tried with Cluster distance operator, and cluster count perfomance, but I'm not sure about which element is the best to evaluate perfomance from the clusterization operator. I have tried with Optimize parameter, as you said, but I couldn't use it. 

    Thanks for your comments again. 
Sign In or Register to comment.