How to join automatically

nic__o Member Posts: 7 Newbie
edited November 2019 in Help
Hi, I want to analyse the content of two files. One of them is organised by line and the other by column, because that is how the data are grouped. For now I transpose the first one twice, and I split the two documents according to the number of columns of the first one and the number of lines of the second one. I have to type the split ratios in manually (for example three ratios of 0.33 if there are 3 lines/columns), and I want this to be automatic.
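The hand-typed ratios are simply 1/n for n parts, so they can be derived from the data instead of typed in. A minimal sketch of that idea in plain Python (not RapidMiner), assuming the rows or columns to split are available as a list; `equal_ratios` and `split_equal` are hypothetical helper names, and keeping the original order is a choice made here, not necessarily what Split Data does with its default sampling:

```python
# Hypothetical helpers (not a RapidMiner API): derive the split from n.

def equal_ratios(n):
    """Return n equal ratios of 1/n, i.e. the values typed in by hand."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [1.0 / n] * n


def split_equal(rows, n):
    """Split rows into n contiguous, (almost) equal-sized partitions.

    The original order is kept: partition i holds the i-th block of rows.
    When len(rows) is not a multiple of n, the first partitions get one
    extra row each.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    size, rest = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rest else 0)
        parts.append(rows[start:end])
        start = end
    return parts


columns = ["col 1", "col 2", "col 3"]   # toy data: 3 columns -> 3 parts
print(equal_ratios(len(columns)))
print(split_equal(columns, len(columns)))  # [['col 1'], ['col 2'], ['col 3']]
```

Because n is read from `len(...)`, adding a fourth column changes the split without editing any ratio.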

Then I join the split parts that share the same id.
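Joining the parts that share an id is what the `Join` operators in the process do with `join_type` set to outer and `test` as the key attribute. As a rough illustration of that behaviour only (plain Python over lists of dicts, with made-up toy rows; `outer_join` is a hypothetical name, not a RapidMiner API), rows whose key exists on one side only are kept and their missing columns are left empty:

```python
# Hypothetical sketch of a full outer join on one key column.

def outer_join(left, right, key):
    """Full outer join of two lists of dict rows on `key`.

    Unmatched rows from either side are kept; columns they lack are set
    to None (the analogue of missing values). On a column-name clash the
    left value wins.
    """
    all_cols = sorted({c for r in left + right for c in r})
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    left_keys = {row[key] for row in left}

    joined = []
    for lrow in left:
        # [None] means "no partner on the right": keep the left row alone.
        for rrow in right_by_key.get(lrow[key], [None]):
            merged = dict.fromkeys(all_cols)
            if rrow is not None:
                merged.update(rrow)
            merged.update(lrow)
            joined.append(merged)
    for rrow in right:
        if rrow[key] not in left_keys:   # right-only rows
            merged = dict.fromkeys(all_cols)
            merged.update(rrow)
            joined.append(merged)
    return joined


sensors = [{"test": "s1", "att_1": 10}, {"test": "s2", "att_1": 20}]
rules = [{"test": "s1", "Conditions": "> 5"}, {"test": "s3", "Conditions": "< 0"}]
for row in outer_join(sensors, rules, "test"):
    print(row)
```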

Here is the code:

Thank you very much!

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve" width="90" x="45" y="391">
        <parameter key="repository_entry" value="//Local Repository/format_alarme_11"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="8.2.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="45" y="493">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Conditions"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="124" name="Split Data" width="90" x="45" y="595">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.3333333333"/>
          <parameter key="ratio" value="0.3333333333"/>
          <parameter key="ratio" value="0.3333333333"/>
        </enumeration>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.2.000" expanded="true" height="82" name="Nominal to Text" width="90" x="112" y="799">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Rule operators|Result IF TRUE|Result IF FALSE|Independent test?|Action if false"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="313" y="493">
        <list key="function_descriptions">
          <parameter key="test" value="[Checked variables]"/>
        </list>
      </operator>
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve (2)" width="90" x="45" y="187">
        <parameter key="repository_entry" value="//Local Repository/Alarm 11 data"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose" width="90" x="45" y="34"/>
      <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="124" name="Split Data (2)" width="90" x="112" y="238">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.3333333333"/>
          <parameter key="ratio" value="0.3333333333"/>
          <parameter key="ratio" value="0.3333333333"/>
        </enumeration>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="8.2.000" expanded="true" height="68" name="Extract Macro (3)" width="90" x="246" y="238">
        <parameter key="macro" value="capteur_3"/>
        <parameter key="macro_type" value="data_value"/>
        <parameter key="attribute_name" value="id"/>
        <parameter key="example_index" value="1"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose (4)" width="90" x="380" y="340"/>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="581" y="289">
        <list key="function_descriptions">
          <parameter key="test" value="%{capteur_3}"/>
        </list>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="8.2.000" expanded="true" height="68" name="Extract Macro (2)" width="90" x="246" y="136">
        <parameter key="macro" value="capteur_2"/>
        <parameter key="macro_type" value="data_value"/>
        <parameter key="attribute_name" value="id"/>
        <parameter key="example_index" value="2"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose (3)" width="90" x="380" y="136"/>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="447" y="136">
        <list key="function_descriptions">
          <parameter key="test" value="%{capteur_2}"/>
        </list>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="8.2.000" expanded="true" height="68" name="Extract Macro" width="90" x="179" y="34">
        <parameter key="macro" value="capteur_1"/>
        <parameter key="macro_type" value="data_value"/>
        <parameter key="attribute_name" value="id"/>
        <parameter key="example_index" value="1"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose (2)" width="90" x="313" y="34"/>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="34">
        <list key="function_descriptions">
          <parameter key="test" value="%{capteur_1}"/>
        </list>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="514" y="391">
        <parameter key="join_type" value="outer"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="test" value="test"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="340">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="test|id|att_1"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.2.000" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="246" y="595">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Action if false|Independent test?|Result IF FALSE|Result IF TRUE|Rule operators"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="380" y="595">
        <list key="function_descriptions">
          <parameter key="test" value="[Checked variables]"/>
        </list>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (6)" width="90" x="246" y="748">
        <list key="function_descriptions">
          <parameter key="test" value="[Checked variables]"/>
        </list>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (2)" width="90" x="648" y="544">
        <parameter key="join_type" value="outer"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="test" value="test"/>
        </list>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (3)" width="90" x="514" y="799">
        <parameter key="join_type" value="outer"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="test" value="test"/>
        </list>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="8.2.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="715" y="799">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Rule operators|Result IF TRUE|Result IF FALSE|Independent test?|Conditions|Action if false"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (7)" width="90" x="983" y="799">
        <list key="function_descriptions"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="8.2.000" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="782" y="544">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Rule operators|Result IF TRUE|Result IF FALSE|Independent test?|Conditions|Action if false"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="8.2.000" expanded="true" height="103" name="Replace Missing Values (3)" width="90" x="715" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Rule operators|Result IF TRUE|Result IF FALSE|Independent test?|Conditions|Action if false"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (8)" width="90" x="849" y="34">
        <list key="function_descriptions"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Generate Attributes (4)" to_port="example set input"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Nominal to Text (2)" to_port="example set input"/>
      <connect from_op="Split Data" from_port="partition 3" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Generate Attributes (6)" to_port="example set input"/>
      <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Split Data (2)" to_port="example set"/>
      <connect from_op="Split Data (2)" from_port="partition 1" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Split Data (2)" from_port="partition 2" to_op="Extract Macro (2)" to_port="example set"/>
      <connect from_op="Split Data (2)" from_port="partition 3" to_op="Extract Macro (3)" to_port="example set"/>
      <connect from_op="Extract Macro (3)" from_port="example set" to_op="Transpose (4)" to_port="example set input"/>
      <connect from_op="Transpose (4)" from_port="example set output" to_op="Generate Attributes (3)" to_port="example set input"/>
      <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Join (2)" to_port="left"/>
      <connect from_op="Extract Macro (2)" from_port="example set" to_op="Transpose (3)" to_port="example set input"/>
      <connect from_op="Transpose (3)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Join (3)" to_port="left"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Transpose (2)" to_port="example set input"/>
      <connect from_op="Transpose (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values (3)" to_port="example set input"/>
      <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Generate Attributes (5)" to_port="example set input"/>
      <connect from_op="Generate Attributes (5)" from_port="example set output" to_op="Join (2)" to_port="right"/>
      <connect from_op="Generate Attributes (6)" from_port="example set output" to_op="Join (3)" to_port="right"/>
      <connect from_op="Join (2)" from_port="join" to_op="Replace Missing Values (2)" to_port="example set input"/>
      <connect from_op="Join (3)" from_port="join" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate Attributes (7)" to_port="example set input"/>
      <connect from_op="Generate Attributes (7)" from_port="example set output" to_port="result 3"/>
      <connect from_op="Replace Missing Values (2)" from_port="example set output" to_port="result 2"/>
      <connect from_op="Replace Missing Values (3)" from_port="example set output" to_op="Generate Attributes (8)" to_port="example set input"/>
      <connect from_op="Generate Attributes (8)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>



Answers

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Without the input Alarm data, I can understand very little about the steps before joining. Maybe you can try a loop operator to automate the data split?
  • nic__o Member Posts: 7 Newbie
    I would really like to do this. How can I do that? The file I split has n lines, so I have to split it n times and get n files in return. Is that possible?
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    nic__o so I really want to help you here. I took your XML and put it into my Studio and I see this:

    [Screenshot of the imported process in Studio]

    Can you please help us a wee bit by cleaning it up with subprocesses and some notes, and of course by sharing the data sets you're retrieving? ;)

    Scott
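yyhuang's loop suggestion amounts to letting n come from the data: one iteration per row, each producing a one-row part tagged with its id and paired with the matching rows of the other file. A minimal sketch of that control flow in plain Python (not RapidMiner), with hypothetical names and toy rows; in Studio the equivalent would presumably be a loop operator whose iteration count is read from the data rather than three hand-built branches:

```python
# Hypothetical sketch (not a RapidMiner API): one partition per row,
# so n rows give n parts without typing n anywhere.

def loop_split_and_pair(sensor_rows, rule_rows, sensor_key, rule_key):
    """Return one (id, sensor_part, rule_part) tuple per sensor row.

    Each iteration plays the role of one hand-built branch in the process
    (Split Data partition -> Extract Macro -> Generate Attributes 'test'):
    it takes a single row, reads its id, and collects the rule rows that
    carry the same id.
    """
    results = []
    for row in sensor_rows:              # len(sensor_rows) iterations
        row_id = row[sensor_key]
        sensor_part = [row]              # the one-row "split file"
        rule_part = [r for r in rule_rows if r[rule_key] == row_id]
        results.append((row_id, sensor_part, rule_part))
    return results


sensors = [{"id": "s1", "att_1": 10},
           {"id": "s2", "att_1": 20},
           {"id": "s3", "att_1": 30}]
rules = [{"Checked variables": "s1", "Conditions": "> 5"},
         {"Checked variables": "s3", "Conditions": "< 0"}]

for row_id, sensor_part, rule_part in loop_split_and_pair(
        sensors, rules, "id", "Checked variables"):
    print(row_id, len(sensor_part), len(rule_part))
```

With three sensor rows the loop yields three parts; a fourth row would yield a fourth part with no change to the code.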