"split operator - export data not complete for further use (operators)"

joei Member Posts: 6 Contributor II
edited June 2019 in Help
Hello,

The Split operator only makes the first three of its output columns available for further use, even when it creates more. In the results view I can see all the split columns (more than three), but in downstream operators I can only choose the first three.

Here is a simple table to try it with:
bla   split
asdf  2345x2134
dsaf  2345x2345x345x456x356x3546
sadf  2435x2345

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    My quick test process worked fine; I could select attributes up to "split_6" in downstream operators:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.1.000-SNAPSHOT" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Retrieve 123" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/123"/>
          </operator>
          <operator activated="true" class="split" compatibility="7.1.000-SNAPSHOT" expanded="true" height="82" name="Split" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="split"/>
            <parameter key="split_pattern" value="x"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="7.1.000-SNAPSHOT" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="split_6.contains.35"/>
            </list>
          </operator>
          <connect from_op="Retrieve 123" from_port="output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <connect from_op="Filter Examples" from_port="original" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Can you post the XML of the process that does not work?

    Regards,
    Marco
  • joei Member Posts: 6 Contributor II
    Of course. (My post wasn't complete; I accidentally created two posts...)
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="read_excel" compatibility="5.3.013" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
           <parameter key="excel_file" value="rapidminer_split_text.xlsx"/>
           <parameter key="imported_cell_range" value="A1:B4"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="bla.true.polynominal.attribute"/>
             <parameter key="1" value="split.true.nominal.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="split" compatibility="5.3.013" expanded="true" height="76" name="Split" width="90" x="180" y="52">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|split"/>
           <parameter key="include_special_attributes" value="true"/>
           <parameter key="split_pattern" value="x"/>
         </operator>
         <operator activated="true" class="multiply" compatibility="5.3.013" expanded="true" height="94" name="Multiply" width="90" x="315" y="30"/>
         <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="450" y="30"/>
         <connect from_op="Read Excel" from_port="output" to_op="Split" to_port="example set input"/>
         <connect from_op="Split" from_port="example set output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
         <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    1. RapidMiner 5.3 is old. Like, really old. We can no longer provide help for it here. Please consider using RapidMiner Studio 7.0 instead.
    2. You are using the Split operator after "Read Excel". The problem is that the output of Read Excel depends on actually reading the Excel file at runtime, so until then we don't know what the result will be. The Split operator therefore creates dummy output to show an example of what the result could look like.
    To use actual data, load it into the repository first, then access it with a "Retrieve" operator. That way, you have full metadata available and the split operator preview will be correct.
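    A minimal sketch of that one-time import, written as operator XML to paste inside the <process expanded="true"> element of a new process. It reuses the Read Excel settings from the process above. The Store operator's "input" port name and the repository entry "//Local Repository/split_data" are assumptions (the entry name is hypothetical; pick your own). The second fragment shows how the analysis process would then start from Retrieve instead of Read Excel.

    ```xml
    <!-- Fragment 1: one-time import process (Read Excel -> Store).
         "//Local Repository/split_data" is a hypothetical entry name. -->
    <operator activated="true" class="read_excel" compatibility="5.3.013" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
      <parameter key="excel_file" value="rapidminer_split_text.xlsx"/>
      <parameter key="imported_cell_range" value="A1:B4"/>
      <parameter key="first_row_as_names" value="false"/>
      <list key="annotations">
        <parameter key="0" value="Name"/>
      </list>
      <list key="data_set_meta_data_information">
        <parameter key="0" value="bla.true.polynominal.attribute"/>
        <parameter key="1" value="split.true.nominal.attribute"/>
      </list>
    </operator>
    <operator activated="true" class="store" compatibility="5.3.013" expanded="true" height="60" name="Store" width="90" x="180" y="75">
      <parameter key="repository_entry" value="//Local Repository/split_data"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Store" to_port="input"/>

    <!-- Fragment 2: the analysis process reads the stored data with Retrieve,
         so Split gets real metadata at design time. -->
    <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve split_data" width="90" x="45" y="75">
      <parameter key="repository_entry" value="//Local Repository/split_data"/>
    </operator>
    <connect from_op="Retrieve split_data" from_port="output" to_op="Split" to_port="example set input"/>
    ```

    After running the import once, the Split preview should list all split attributes of the stored data (split_1 to split_6 for the sample table).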

    Regards,
    Marco
  • joei Member Posts: 6 Contributor II
    The Filter Examples operator also works in my process.
    But I still can't see the split columns beyond the third in the Select Attributes, Rename, and Remove Duplicates (subset) operators.
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    Yes, that is expected because of the "can't know beforehand" problem. You can still set those parameters manually if you know you will end up with, for example, 6 splits.
    But the easiest solution is to store the data in your repository and then use only the repository data in your process. That way you have the actual information available at design time.
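    For the manual route, a sketch of what Select Attributes could look like with the split attribute names typed in by hand. It assumes the sample table, whose longest value yields six parts (split_1 to split_6), and separates names in the attributes list with "|", as the Split operator's attributes value in the process above does.

    ```xml
    <!-- Select Attributes with names entered manually; split_4 to split_6
         are assumed to exist only at runtime, so the GUI cannot offer them. -->
    <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="450" y="30">
      <parameter key="attribute_filter_type" value="subset"/>
      <parameter key="attributes" value="split_4|split_5|split_6"/>
    </operator>
    ```

    Because the names are only checked at runtime, the process may show a metadata warning at design time while still running correctly.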

    Regards,
    Marco
  • joei Member Posts: 6 Contributor II
    OK, thank you.
  • joei Member Posts: 6 Contributor II
    How does the manual change work? The data is too big to load into the repository.
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    Your local repository sits on your file system, so data cannot be too big for that ;)
    How to do it manually depends on the parameter. For "Remove Duplicates", for example, select 'subset', then type the attribute name (e.g. "split_6") into the text field at the upper right and press +.
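    In process XML, those Remove Duplicates steps would presumably produce the same attribute_filter_type/attributes pair that Split uses in the process above. This is a sketch; the exact keys are worth checking against the XML RapidMiner itself writes after you make the change in the GUI.

    ```xml
    <!-- Remove Duplicates restricted to a manually entered attribute.
         "split_6" exists only at runtime, so it is typed in rather than picked. -->
    <operator activated="true" class="remove_duplicates" compatibility="5.3.013" expanded="true" height="76" name="Remove Duplicates" width="90" x="450" y="165">
      <parameter key="attribute_filter_type" value="subset"/>
      <parameter key="attributes" value="split_6"/>
    </operator>
    ```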

    Regards,
    Marco