Loop Examples Error: Too Few Examples

karim_keshavjee Member, University Professor Posts: 9 University Professor
edited December 2018 in Help

Hi,

 

I'm running a Loop Examples operator to identify different items in my dataset.  I have 73 items that I'm looking for, which I've put into a macro.  The macro reads a file of 74 lines, the first line being a header.  When Loop Examples gets to line 74, instead of exiting the loop, it tells me that I have Too Few Examples.  I think it might be reading the header row as an example, so it's looking for one more example, which doesn't exist.

 

I've told RapidMiner that I have a header row and I've even tried changing some other parameters to see if that would make a difference, but it didn't.  I couldn't find any documentation about this problem.

 

Thanks,

 

Karim

Answers

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist

    Hi Karim,

    Could you please share the XML code of the RapidMiner process you have built (the one in the screenshot you shared)?

    This would help us recreate the process with the exact operator parameters you have set and check the error.

     

    Cheers,

  • karim_keshavjee Member, University Professor Posts: 9 University Professor

    Here's the XML.

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <parameter key="logverbosity" value="status"/>
    <process expanded="true">
    <operator activated="false" class="read_excel" compatibility="7.5.003" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="34">
    <parameter key="excel_file" value="C:\Users\karim\Google Drive\InfoClin Analytics\Data Cleaning\Data Cleaning Algorithms\To Be Cleaned\1 Million Sample\Drug Database Aug 9 2017 v2.xlsx"/>
    <parameter key="sheet_number" value="3"/>
    <parameter key="imported_cell_range" value="A1:B74"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="METFORMINS.true.polynominal.attribute"/>
    <parameter key="1" value="DINS.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="false" class="extract_macro" compatibility="7.5.003" expanded="true" height="68" name="Extract Macro" width="90" x="179" y="34">
    <parameter key="macro" value="DIN_Metformin"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="DINS"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="false" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="112" y="340">
    <list key="function_descriptions">
    <parameter key="Metformin" value="if(contains(DIN,%{DIN_Numbers}),1,0)"/>
    </list>
    </operator>
    <operator activated="false" class="concurrency:loop_values" compatibility="7.5.003" expanded="true" height="82" name="Loop Values" width="90" x="112" y="238">
    <parameter key="attribute" value="DIN"/>
    <parameter key="iteration_macro" value="DIN_Metformin"/>
    <process expanded="true">
    <operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="DIN.equals.%{DIN_Metformin}"/>
    </list>
    <parameter key="filters_logic_and" value="false"/>
    </operator>
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes" width="90" x="112" y="442">
    <list key="function_descriptions">
    <parameter key="Name_New" value="lower(Name_orig)"/>
    </list>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve" width="90" x="112" y="136">
    <parameter key="repository_entry" value="//Local Repository/processes/Medication Data for Processing"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (4)" width="90" x="313" y="391">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="DIN.equals.NULL"/>
    <parameter key="filters_entry_key" value="DIN.equals.?"/>
    </list>
    <parameter key="filters_logic_and" value="false"/>
    </operator>
    <operator activated="true" class="sample" compatibility="7.5.003" expanded="true" height="82" name="Sample" width="90" x="514" y="136">
    <parameter key="sample_size" value="10000"/>
    <list key="sample_size_per_class"/>
    <list key="sample_ratio_per_class"/>
    <list key="sample_probability_per_class"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (5)" width="90" x="648" y="136">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Name_New.contains.metform"/>
    </list>
    </operator>
    <operator activated="true" class="sample" compatibility="7.5.003" expanded="true" height="82" name="Sample (2)" width="90" x="581" y="289">
    <parameter key="sample_size" value="10000"/>
    <list key="sample_size_per_class"/>
    <list key="sample_ratio_per_class"/>
    <list key="sample_probability_per_class"/>
    </operator>
    <operator activated="true" class="loop_examples" compatibility="7.5.003" expanded="true" height="103" name="Loop Examples" width="90" x="782" y="238">
    <parameter key="iteration_macro" value="Loop"/>
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="7.5.003" expanded="true" height="68" name="Read Excel (3)" width="90" x="179" y="34">
    <parameter key="excel_file" value="C:\Users\karim\Google Drive\InfoClin Analytics\Data Cleaning\Data Cleaning Algorithms\To Be Cleaned\1 Million Sample\Drug Database Aug 9 2017 v2.xlsx"/>
    <parameter key="sheet_number" value="3"/>
    <parameter key="imported_cell_range" value="B1:B74"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="1" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="DINS.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="7.5.003" expanded="true" height="68" name="Extract Macro (2)" width="90" x="313" y="34">
    <parameter key="macro" value="DIN_Metformin"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="DINS"/>
    <parameter key="example_index" value="%{Loop}"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (2)" width="90" x="313" y="187">
    <parameter key="parameter_string" value="DIN=%{DIN_Metformin}"/>
    <parameter key="condition_class" value="attribute_value_filter"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="DIN.equals.%{DIN_Metformin}"/>
    </list>
    <parameter key="filters_logic_and" value="false"/>
    </operator>
    <operator activated="true" class="append" compatibility="7.5.003" expanded="true" height="82" name="Append" width="90" x="581" y="289"/>
    <connect from_port="example set" to_op="Filter Examples (2)" to_port="example set input"/>
    <connect from_op="Read Excel (3)" from_port="output" to_op="Extract Macro (2)" to_port="example set"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Filter Examples (2)" from_port="unmatched example set" to_port="example set"/>
    <connect from_op="Append" from_port="merged set" to_port="output 1"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve" from_port="output" to_op="Filter Examples (4)" to_port="example set input"/>
    <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Sample" to_port="example set input"/>
    <connect from_op="Filter Examples (4)" from_port="unmatched example set" to_op="Sample (2)" to_port="example set input"/>
    <connect from_op="Sample" from_port="example set output" to_op="Filter Examples (5)" to_port="example set input"/>
    <connect from_op="Filter Examples (5)" from_port="example set output" to_port="result 1"/>
    <connect from_op="Sample (2)" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
    <connect from_op="Loop Examples" from_port="output 1" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • karim_keshavjee Member, University Professor Posts: 9 University Professor

    Hi @Pavithra_Rao,

     

    Have you had a chance to work on this?

     

    Thanks,

     

    Karim

     

     

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @karim_keshavjee - I looked at your process.  I notice that your Read Excel operator in the root process (the one that is grayed out) has the "first row as names" parameter checked, but the one inside the Loop Examples operator (not grayed out) does NOT have this parameter checked.  Is this your problem?


    Scott

  • karim_keshavjee Member, University Professor Posts: 9 University Professor

    Thanks for the quick response Scott,

     

    I fixed that, but it didn't solve the problem.  I had already tried both settings before.

     

    The error appears inside the Loop Examples subprocess, at the Extract Macro operator.

     

    Karim

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @karim_keshavjee - ok I have looked at this again.  There are a lot of rather unusual things going on here and it is very hard to unpack.  Some observations:

    - when you loop examples, you are looping the examples in "Medication Data for Processing".  But when you extract the macro, you're doing it from "Drug Database...".

    - in Filter Examples (4), you're only selecting those examples where DIN is NULL or ?.

    - your Append operator inside the loop has only one connection.

     

    I would highly advise you to look at these issues.  A good way to debug is to use breakpoints at each step along the way of your process so you can see what your dataset looks like.
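
    To make the first observation concrete, here is a small Python sketch (not RapidMiner code). It assumes that the loop counts from 1 over the examples arriving at its input, that the macro extraction looks up that same index in the separate Excel table, and that the lookup fails once the index is larger than the table's row count. The function names and the DIN values are hypothetical; the 73 rows and the sample size of 10000 come from this thread and from the Sample (2) setting in the posted XML.

```python
def extract_value(lookup, example_index):
    """Return the value at a 1-based index, failing when the index is out of range."""
    if example_index < 1 or example_index > len(lookup):
        raise IndexError(
            f"too few examples: asked for {example_index}, have {len(lookup)}"
        )
    return lookup[example_index - 1]


def first_failing_iteration(n_looped, lookup):
    """Loop 1..n_looped and report the first index the lookup table cannot serve."""
    for i in range(1, n_looped + 1):
        try:
            extract_value(lookup, i)
        except IndexError:
            return i
    return None  # every iteration found a matching row


# Hypothetical stand-in for the 73 DIN rows read from the Excel sheet.
din_lookup = list(range(1000, 1073))

# Looping over a 10000-row sample while indexing a 73-row table.
print(first_failing_iteration(10000, din_lookup))

# Looping over the lookup table itself never runs past its end.
print(first_failing_iteration(len(din_lookup), din_lookup))
```

    Under these assumptions the first call reports iteration 74, which matches where the error was reported, and the second reports no failure. That would point to looping over the Drug Database rows themselves rather than over the medication sample.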

     

    Scott

  • Pavithra_Rao Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist

    Hi Karim,

     

    Apologies for the delay in responding. I got tied up in some work.

     

    @sgenzer Thanks for looking into this. Please feel free to let me know if any further help is needed here.

     

    Cheers,

  • karim_keshavjee Member, University Professor Posts: 9 University Professor

    I still have not solved this problem.  Could you offer some more suggestions?

     

    This is no longer urgent because I've found another way to solve my problem, but it would be good to know how to make Loop Examples work because I'm sure I'll need it at some point!


    Thanks,

     

    Karim

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hi @karim_keshavjee - there are lots of resources both online and built into RapidMiner to learn how to use Loop Examples and macros.  Have you completed the tutorials?  The one called "Data Handling" would be the one you want.  In addition, the "Getting Started with RapidMiner" video series is extremely helpful.

     

    Scott
