Evaluation of EMClustering

nguyenxuanhaunguyenxuanhau Member Posts: 22 Contributor II
edited November 2018 in Help
I ran the EMClustering operator with the initial parameter k = 30 (number of clusters), but in the result 18 of the 30 clusters contain no data. Why is that? (In theory, each cluster must contain at least one data object.)
Help me!
Thanks

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    which version of RapidMiner do you use? I remember there was a problem with this clustering algorithm in the past.

    If it's the most current version of RapidMiner, please post a bug report on bugs.rapid-i.com, if possible with the process and the input data.

    Greetings,
     Sebastian
  • nguyenxuanhaunguyenxuanhau Member Posts: 22 Contributor II
    Currently, I use RapidMiner 4.6.
    - I ran the EMClustering operator with the initial parameter k = 30 (number of clusters), but in the result 18 of the 30 clusters contain no data. Why is that? (In theory, each cluster must contain at least one data object.)
    The result of Kernel K-Means clustering shows the same problem.
    Help me
    Thanks
  • Marco_BoeckMarco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    your RapidMiner version is the problem. I would strongly suggest you update RapidMiner to the latest version 5.1 to fix your problem.

    Regards,
    Marco
  • nguyenxuanhaunguyenxuanhau Member Posts: 22 Contributor II
    Hi
    The RapidMiner 4.6 tutorial explains how to extend RapidMiner and write new operators, but the RapidMiner 5.1 tutorial I read does not.
    Do you have any material that explains how to extend RapidMiner and write new operators in RapidMiner 5.1?
    Help me
    Thanks
  • nguyenxuanhaunguyenxuanhau Member Posts: 22 Contributor II
    I ran RapidMiner 5.0 and 5.1, and the results of EMClustering are the same, i.e. some clusters contain no data.
    Why is that?
    Help me
    thanks
  • IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi again,

    about the empty clusters: well, I don't see a real problem with empty clusters in EM clustering. The clusterer starts with a set of random distributions and assigns points to those distributions. At the end, every example in the data set gets a probability for each cluster, but it can easily happen that the hard decision (which cluster has the highest probability) favors some clusters over others and leaves some without any examples. This is often the case if you have defined too high a number of clusters.
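
    The effect of the hard decision can be sketched in a few lines of Python. This is only an illustration under assumed values, not RapidMiner's implementation: the one-dimensional mixture, its weights, means and standard deviations, and the function names are all hypothetical. The third component is wide and has a low weight, so it receives a nonzero probability for every point but never the highest one.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Soft assignment of one point: posterior probability of each component."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_cluster_sizes(points, weights, means, stds):
    """Assign each point to its most probable component and count the members."""
    sizes = [0] * len(weights)
    for x in points:
        probs = responsibilities(x, weights, means, stds)
        sizes[probs.index(max(probs))] += 1
    return sizes


# Hypothetical data and mixture parameters (assumptions for illustration only).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
weights = [0.45, 0.45, 0.10]
means = [0.1, 5.1, 2.5]
stds = [0.5, 0.5, 3.0]

sizes = hard_cluster_sizes(points, weights, means, stds)
# Probability mass the third component receives for each point (soft view).
soft_share = [responsibilities(x, weights, means, stds)[2] for x in points]

print("cluster sizes after hard decision:", sizes)
print("soft probability of third cluster:", [round(p, 3) for p in soft_share])
```

    With these assumed parameters, the third cluster ends up with no members after the hard decision even though its probability is greater than zero for every point, which is the situation described above.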

    Here is a simple process showing the cluster model and clustered data set Iris for k = 30:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="145" width="279">
          <operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="expectation_maximization_clustering" compatibility="5.1.008" expanded="true" height="76" name="Clustering" width="90" x="179" y="30">
            <parameter key="k" value="30"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Cheers,
    Ingo
  • nguyenxuanhaunguyenxuanhau Member Posts: 22 Contributor II
    Hi
    So why does W-EM clustering with k = 30 (or k > 30) produce no empty clusters, while EMClustering produces some? (W-EM and EM are implementations of the same algorithm.)
    I expected the two results to differ only slightly, but they are completely different.
    Why is that?
    Help me
    Thanks