"Clustering with Loops?"

Prably Member Posts: 3 Contributor I
edited June 12 in Help
Hi RM Masters!

I am a novice with RM and inexperienced with loops and macros, and I need advice on how to structure a process that loops over clustering runs. I am trying to get three centroids (low / medium / high) for each location and illness combination (see below). The goal: when future data arrives about how long a contract from [location B] for [pain] is taking, I can tell whether it is taking too long, on track, or ahead of schedule.

I'm pretty sure I want to run clustering (k-means) in a loop over all unique combinations of the attributes Location and Illness. So I want three centroids for the [Location A & Ebola] subset, three centroids for [Location B & Cold], three for [Location C & Cold], etc. The attributes Milestone 1, Milestone 2, and Milestone Final are the numerical attributes I want to use for my clustering.

My data set has about 13,000 examples, and I have some other polynominal attributes that aren't listed here.

Please forgive the formatting; here is a representative sample of the example set:


Contract ID  Location  Illness        Contract Status  Contract Type  Begin Date  Milestone 1  Milestone 2  Milestone Final
1            A         Ebola          Finished         Big            1/10/2013   78           133          154
2            A         Aids           Unfinished       Small          1/5/2009    1            125          162
3            A         Cold           Finished         Big            8/17/2012   40           118          214
7            B         Awesomeness    Finished         Small          9/27/2007   42           150          209
8            C         Upset Stomach  Unfinished       Small          12/20/2009  10           101          219
9            D         Ebola          Finished         Big            1/16/2009   9            111          246
10           D         Headache       Unfinished       Big            9/11/2005   57           127          238
11           D         Club Foot      Unfinished       Small          12/2/2005   55           141          204
12           D         Aids           Finished         Small          2/3/2012    15           106          191
13           D         Upset Stomach  Finished         Small          11/27/2009  48           103          194
14           D         Ebola          Finished         Big            5/18/2005   86           101          160
15           D         Ebola          Finished         Big            11/15/2009  7            148          164
16           D         Pain           Unfinished       Small          5/25/2005   29           117          242
18           D         Club foot      Unfinished       Big            4/28/2011   41           147          190
19           D         Club foot      Unfinished       Small          4/20/2007   48           113          229

Also, any thoughts on how to learn to work with loops and macros would be wonderful.

Thanks in advance for the advice!

Answers

  • jaysunice3401 Member Posts: 6 Contributor II
    This might help. First, use the Generate Concatenation operator to create a new attribute that concatenates Illness and Location (named Illness_Location in the process below). Then, feed that into a Loop Values operator set to loop over the new attribute. Inside the loop's subprocess, filter the examples on the concatenated attribute. The trick is to use the %{loop_value} macro, which holds the value for the current iteration -- that is, Illness_Location=%{loop_value}. Each iteration then sees only the examples for one combination, and you can continue from there with your clustering. Hope this helps.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
       <process expanded="true" height="460" width="547">
         <operator activated="true" class="generate_concatenation" compatibility="5.3.000" expanded="true" height="76" name="Generate Concatenation" width="90" x="179" y="165">
           <parameter key="first_attribute" value="Illness"/>
           <parameter key="second_attribute" value="Location"/>
         </operator>
         <operator activated="true" class="loop_values" compatibility="5.3.000" expanded="true" height="76" name="Loop Values" width="90" x="313" y="165">
           <parameter key="attribute" value="Illness_Location"/>
           <process expanded="true" height="663" width="887">
             <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="30">
               <parameter key="condition_class" value="attribute_value_filter"/>
               <parameter key="parameter_string" value="Illness_Location=%{loop_value}"/>
             </operator>
             <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
             <connect from_op="Filter Examples" from_port="example set output" to_port="out 1"/>
             <portSpacing port="source_example set" spacing="0"/>
             <portSpacing port="sink_out 1" spacing="0"/>
             <portSpacing port="sink_out 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Generate Concatenation" from_port="example set output" to_op="Loop Values" to_port="example set"/>
         <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
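
To make the "continue from there" step concrete, here is a rough sketch of the same process with a clustering step added inside the loop. It is not part of the original reply and has not been run against your data. It assumes the operator classes `select_attributes` and `k_means`, the port names `example set`, `cluster model`, `example set input` and `example set output`, and the parameter keys `attribute_filter_type`, `attributes` and `k` as they appear in RapidMiner 5.3; the operator names and layout coordinates are arbitrary. As in the process above, your data source still has to be connected to the input of Generate Concatenation.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="460" width="547">
      <!-- Build the grouping key: one nominal value per Illness/Location combination -->
      <operator activated="true" class="generate_concatenation" compatibility="5.3.000" expanded="true" height="76" name="Generate Concatenation" width="90" x="179" y="165">
        <parameter key="first_attribute" value="Illness"/>
        <parameter key="second_attribute" value="Location"/>
      </operator>
      <!-- Run the inner subprocess once per distinct value of Illness_Location -->
      <operator activated="true" class="loop_values" compatibility="5.3.000" expanded="true" height="76" name="Loop Values" width="90" x="313" y="165">
        <parameter key="attribute" value="Illness_Location"/>
        <process expanded="true" height="663" width="887">
          <!-- Keep only the examples that belong to the current combination -->
          <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Illness_Location=%{loop_value}"/>
          </operator>
          <!-- Assumed step: restrict the data to the three milestone attributes before clustering -->
          <operator activated="true" class="select_attributes" compatibility="5.3.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Milestone 1|Milestone 2|Milestone Final"/>
          </operator>
          <!-- Assumed step: k-means with three clusters (low / medium / high) -->
          <operator activated="true" class="k_means" compatibility="5.3.000" expanded="true" height="76" name="Clustering" width="90" x="313" y="30">
            <parameter key="k" value="3"/>
          </operator>
          <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <!-- Each iteration delivers one cluster model, which holds that combination's centroids -->
          <connect from_op="Clustering" from_port="cluster model" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Concatenation" from_port="example set output" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

One thing to watch: a combination with fewer than three examples cannot produce three distinct centroids, and several combinations in the posted sample have only one or two rows, so those groups would probably need to be filtered out or handled separately before the clustering step.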