"Text Clustering Example"

Legacy User Member Posts: 0 Newbie
edited May 2019 in Help

All of my questions about text clustering come down to one thing: this "text clustering example" outputs "no results produced". The newsgroup data is present in the directories noted.

Why does it not generate the output identified in the description?

<operator name="Root" class="Process" expanded="yes">
      <description text="#ylt#h3#ygt#Clustering text documents#ylt#/h3#ygt##ylt#p#ygt#In this experiment, texts from two newsgroups are read and clustered. To make the clusters better comprehensible, three keywords are extracted for each cluster and added to the cluster description.#ylt#/p#ygt#"/>
      <parameter key="logverbosity" value="status"/>
      <operator name="TextInput" class="TextInput" expanded="yes">
          <parameter key="default_content_language" value="english"/>
          <list key="namespaces">
          </list>
          <parameter key="prune_above" value="10"/>
          <parameter key="prune_below" value="5"/>
          <list key="texts">
            <parameter key="graphics" value="../data/newsgroup/graphics"/>
            <parameter key="hardware" value="../data/newsgroup/hardware"/>
          </list>
          <operator name="StringTokenizer" class="StringTokenizer">
          </operator>
          <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
          </operator>
          <operator name="TokenLengthFilter" class="TokenLengthFilter">
              <parameter key="min_chars" value="5"/>
          </operator>
          <operator name="PorterStemmer" class="PorterStemmer">
          </operator>
      </operator>
      <operator name="KMeans" class="KMeans">
      </operator>
      <operator name="AttributeSumClusterCharacterizer" class="AttributeSumClusterCharacterizer">
      </operator>
</operator>


    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    This sample, which comes from a plugin (!), was never updated after the automatic cluster characterization was removed some time ago. I can hardly believe that this process worked at all (did you really run it on a fresh RM 4.4 installation?), since I would expect the operator "AttributeSumClusterCharacterizer" to be deprecated, if not removed altogether. But I could be mistaken.

    Before you ask: the characterization took a long time even if you were not interested in it, and it did not work well enough. Much better characterizations can be found with the approaches I sketched in the other thread, which is why it was removed.
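
    For readers who only want the "three keywords per cluster" that the sample's description promises, here is a rough Python sketch of one way to get them outside RapidMiner: sum the term weights of all documents in a cluster and keep the heaviest terms. This is only a guess based on the operator's name, not RapidMiner's actual implementation. The function name `top_keywords_per_cluster` and the toy data are made up for illustration.

```python
from collections import defaultdict


def top_keywords_per_cluster(doc_vectors, cluster_ids, k=3):
    """Sum term weights over each cluster's documents and return the k heaviest terms.

    doc_vectors: list of {term: weight} dicts, one per document (hypothetical input format).
    cluster_ids: list of cluster labels, aligned with doc_vectors.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for vector, cluster in zip(doc_vectors, cluster_ids):
        for term, weight in vector.items():
            sums[cluster][term] += weight

    result = {}
    for cluster, term_sums in sums.items():
        # Highest summed weight first; ties are broken alphabetically so the output is stable.
        ranked = sorted(term_sums.items(), key=lambda item: (-item[1], item[0]))
        result[cluster] = [term for term, _ in ranked[:k]]
    return result


# Toy term weights (invented, not taken from the newsgroup data).
docs = [
    {"pixel": 2.0, "render": 1.0, "image": 1.0},
    {"pixel": 1.0, "image": 2.0, "color": 0.5},
    {"drive": 2.0, "board": 1.0, "cable": 0.5},
    {"drive": 1.0, "board": 2.0, "power": 0.25},
]
clusters = [0, 0, 1, 1]

keywords = top_keywords_per_cluster(docs, clusters)
for cluster, terms in sorted(keywords.items()):
    print(cluster, terms)
```

    In practice the weights would come from whatever word vectors the text preprocessing produces, and the cluster labels from the clustering step.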
