
"Help - Clustering?"

JEdward
edited May 2019 in Help
I'm very new to this data mining lark, so apologies in advance.

I have an example set containing only "yes" data, and I have been asked to score records in a new example set based on their similarity to the records in the "yes" set. I don't really know what I'm doing, but I have a feeling clustering might be involved somehow. So far, all I have done is create clusters from the "yes" set and then label the new records with a prediction of which cluster each would fall into.
That's not quite what I'm after: the desired result is to give each record a label from 1 to 10 indicating how closely it matches the "yes" set.

Any pointers would be appreciated.
Thanks,
JEdward

Answers

  • IngoRM
    Hi,

    Well, if I understood you correctly, this sounds like a scenario where one-class modeling is most appropriate. You could try the one-class SVM offered by RapidMiner: first train the model on the "yes" data set, then apply the trained model to your prediction data set. Finally, rescale the predictions (confidences) from [0, 1] to [1, 10] and round them to integers. That's it.

    Cheers,
    Ingo
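A minimal sketch of that final rescaling step, written in Python rather than as a RapidMiner process. It assumes the model's confidence values (e.g. the confidence(inside) attribute mentioned later in this thread) lie in [0, 1], as described above; the function name confidence_to_score is hypothetical, and values outside [0, 1] are simply clamped.

```python
def confidence_to_score(confidence: float) -> int:
    """Map a one-class confidence in [0, 1] to an integer score from 1 to 10.

    Assumption: the confidence is expected to lie in [0, 1];
    anything outside that range is clamped before rescaling.
    """
    c = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    # Linear rescale: 0 -> 1 and 1 -> 10, then round to the nearest integer.
    # Note: Python's round() rounds exact .5 ties to the even integer.
    return round(1 + 9 * c)


confidences = [0.0, 0.25, 0.8, 1.0]
print([confidence_to_score(c) for c in confidences])  # prints [1, 3, 8, 10]
```

Because of the round-half-to-even rule, a confidence of exactly 0.5 (1 + 9 * 0.5 = 5.5) maps to 6; if the confidences you see only cover part of [0, 1], rescaling by their observed minimum and maximum instead would spread the scores more evenly.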
  • JEdward
    Thanks, Ingo,
    That sounds like exactly what I'm looking for. I'll give it a try.

    JEdward.
  • JEdward
    Hello,

    When I try to store the labelled data in the repository, I get a 'ConcurrentModificationException' error.
    I think this is caused by the Apply Model operator creating two special attributes, 'confidence(inside)' and 'prediction(LabelT)', since that is the only difference from the original dataset.

    Can anyone point me in the right direction to resolve this? 

    Thanks,
    JEdward.
  • land
    Hi,
    Please post the process as well as the stack trace for this exception, and we will see if we can help.

    Greetings,
    Sebastian
  • JEdward
    Hi Sebastian,

    Here's the process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input>
          <location>ProcessApplic</location>
          <location>Model</location>
        </input>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="612" width="710">
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="246" y="210">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.006" expanded="true" height="60" name="Store" width="90" x="447" y="165">
            <parameter key="repository_entry" value="LabelledData"/>
          </operator>
          <connect from_port="input 1" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_port="input 2" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Store" to_port="input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    I'm not sure what you mean by a stack trace. Is this it? (Copied from the log window.)

    May 20, 2011 10:25:12 AM INFO: Process //RapidMinerLocalRepository/Process/3_ApplyModel starts
    May 20, 2011 10:26:57 AM SEVERE: Process failed: Cannot store data in repository at entry 'LabelledData'. Reason: Cannot store data at 'U:\RapidMinerRepository\Process\LabelledData.ioo': java.util.ConcurrentModificationException.
    May 20, 2011 10:26:57 AM SEVERE: Here:          Process[1] (Process)
              subprocess 'Main Process'
                +- Apply Model[1] (Apply Model)
          ==>  +- Store[1] (Store)
    Thanks,
    JEdward
  • JEdward
    Hi,

    I have solved the problem by changing the process to rename the attribute confidence(inside) and remove prediction(LabelT) before storing. :) Could it be that the brackets in the name caused the Store operator's problems? I had to type the attribute names into the Rename and Select Attributes operators by hand, because attributes created by Apply Model are not available in the menus and drop-down lists.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input>
          <location>ProcessApplic</location>
          <location>Model</location>
        </input>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="612" width="710">
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="112" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.1.006" expanded="true" height="76" name="Rename (2)" width="90" x="246" y="210">
            <parameter key="old_name" value="confidence(inside)"/>
            <parameter key="new_name" value="confidence"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="prediction(LabelT)"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.006" expanded="true" height="60" name="Store" width="90" x="581" y="210">
            <parameter key="repository_entry" value="LabelledData"/>
          </operator>
          <connect from_port="input 1" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_port="input 2" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Store" to_port="input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>