All values of attributes on one example set, problem with memory

javemarjavemar Member Posts: 23 Contributor II
edited November 2018 in Help
Hello

I want to concatenate some attributes and then, put all the values of this attribute in other atrribute in one example set. Something like this:

    att1  -  att 2  -  att 3 
1 casa  - perro  - carro
2 avion-  teja    -  amigo
3 mujer  - bonita -  leal

And then create:
                                        att4
  1. casa perro carro avion teja avion teja amigo mujer bonita leal

I did the below process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
    <process expanded="true" height="370" width="614">
      <operator activated="true" class="generate_data_user_specification" compatibility="5.2.006" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="179" y="300">
        <list key="attribute_values">
          <parameter key="recortar" value="cut(&quot;%{valor}&quot;,0,length(&quot;%{valor}&quot;)-3)"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="5.2.006" expanded="true" height="60" name="Extract Macro" width="90" x="380" y="300">
        <parameter key="macro" value="valor"/>
        <parameter key="macro_type" value="data_value"/>
        <parameter key="attribute_name" value="recortar"/>
        <parameter key="example_index" value="1"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="5.2.006" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="120">
        <list key="annotations"/>
        <list key="data_set_meta_data_information">
          <parameter key="1" value="a.true.nominal.regular"/>
          <parameter key="2" value="b.true.nominal.regular"/>
          <parameter key="3" value="c.true.nominal.regular"/>
          <parameter key="4" value="d.true.nominal.regular"/>
          <parameter key="5" value="e.true.nominal.regular"/>
          <parameter key="6" value="f.true.nominal.regular"/>
        </list>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.2.006" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="179" y="120">
        <list key="function_descriptions">
          <parameter key="nuevo" value="concat(a,b,c,d,e,f)"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="120">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="nuevo"/>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.2.006" expanded="true" height="76" name="Loop Values" width="90" x="447" y="120">
        <parameter key="attribute" value="nuevo"/>
        <process expanded="true" height="383" width="611">
          <operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Set Macro (3)" width="90" x="246" y="30">
            <parameter key="macro" value="valor"/>
            <parameter key="value" value="%{valor}  %{loop_value}  or"/>
          </operator>
          <connect from_port="example set" to_op="Set Macro (3)" to_port="through 1"/>
          <connect from_op="Set Macro (3)" from_port="through 1" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_port="result 1"/>
      <connect from_op="Read Excel (2)" from_port="output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Loop Values" to_port="example set"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="234"/>
      <portSpacing port="sink_result 2" spacing="18"/>
    </process>
  </operator>
</process>

I have, in average, 9800 exampleset, and I concatenate five or six attributes. My problem is about memory. This preprocess only use more than 1,5gb of RAM. My objetive is make a sql query whit this attribute, but the query is a little big so the process crash for memory.

Please, Could you give some advice about this topic? How can I reduce the memory used for the above process



Answers

  • haddockhaddock Member Posts: 849 Maven
    Hi there,

    You could use the following trick, which avoids loops and macros by looking for column separators which are not present in the data. No column separators means only one column is possible.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <description>This demo

    1. generates nominal data
    2. writes out the data using " " as a separator
    3. reads the data back in using ";" as a separator

    Because ";" is never found the columns are not separated.</description>
       <process expanded="true" height="296" width="748">
         <operator activated="true" class="generate_nominal_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="120">
           <parameter key="number_examples" value="9800"/>
         </operator>
         <operator activated="true" class="write_csv" compatibility="5.2.003" expanded="true" height="76" name="Write CSV" width="90" x="308" y="114">
           <parameter key="column_separator" value=" "/>
           <parameter key="write_attribute_names" value="false"/>
           <parameter key="quote_nominal_values" value="false"/>
           <parameter key="format_date_attributes" value="false"/>
         </operator>
         <operator activated="true" class="read_csv" compatibility="5.2.003" expanded="true" height="60" name="Read CSV" width="90" x="447" y="165">
           <parameter key="column_separators" value=","/>
           <parameter key="use_quotes" value="false"/>
           <parameter key="parse_numbers" value="false"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations"/>
           <list key="data_set_meta_data_information"/>
         </operator>
         <connect from_op="Generate Nominal Data" from_port="output" to_op="Write CSV" to_port="input"/>
         <connect from_op="Write CSV" from_port="file" to_op="Read CSV" to_port="file"/>
         <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    That should ease your memory problems a bit.

    Good weekend

    Best

    H
  • javemarjavemar Member Posts: 23 Contributor II
    Hello Haddock

    I'm a little suppruised about the use of write and read csv operator. How do you disapper the "csv file" parameter that normally is mandatory? Why do you do that?? Does the second port(fil) of the write csv operator has the same data than the first port(thr)?

    I have other issue. I want put all the rows in one row. If I have this

     row  att1
      1         a
      2         b
      3         c
      4         d

    I would like have this

     row   att2
      1         abcd




    I really appreciate your help , have a good day   Haddock


  • haddockhaddock Member Posts: 849 Maven
    Hola!

    Check out the "Pivot" operator, and apply that before the trick I showed you ( yes , fil==file as you spotted  !)

    Have a good one!

    Best

    H
  • javemarjavemar Member Posts: 23 Contributor II
    Buenos dìas haddock, muchas gracias! !I think this solve my problem.

    I have other question. How do you hide the "excel file" parameter from the read csv operator? I was reading the xml that you sent me and I don't understand how make this change.

    I really appreciate your help. Thanks a lot haddock.
  • haddockhaddock Member Posts: 849 Maven
    Hola,

    If your data is in Excel you will need to load it and write it out as a CSV, like this
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <description>This demo

    1. generates nominal data
    2. writes Excel
    3. reads Excel
    2. writes out the data using " " as a separator
    3. reads the data back in using "," as a separator

    Because ";" is never found the columns are not separated.</description>
        <process expanded="true" height="370" width="748">
          <operator activated="true" class="generate_nominal_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="9800"/>
          </operator>
          <operator activated="true" class="write_excel" compatibility="5.2.003" expanded="true" height="76" name="Write Excel" width="90" x="179" y="30">
            <parameter key="excel_file" value="file"/>
          </operator>
          <operator activated="true" class="read_excel" compatibility="5.2.003" expanded="true" height="60" name="Read Excel" width="90" x="313" y="165">
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.2.003" expanded="true" height="76" name="Write CSV" width="90" x="447" y="165">
            <parameter key="csv_file" value="file"/>
            <parameter key="column_separator" value="  "/>
            <parameter key="write_attribute_names" value="false"/>
            <parameter key="quote_nominal_values" value="false"/>
            <parameter key="format_date_attributes" value="false"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.2.003" expanded="true" height="60" name="Read CSV" width="90" x="581" y="165">
            <parameter key="column_separators" value=","/>
            <parameter key="use_quotes" value="false"/>
            <parameter key="parse_numbers" value="false"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Write Excel" to_port="input"/>
          <connect from_op="Write Excel" from_port="file" to_op="Read Excel" to_port="file"/>
          <connect from_op="Read Excel" from_port="output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="file" to_op="Read CSV" to_port="file"/>
          <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Best

    H
Sign In or Register to comment.