Multiple value attribute

mdc Member Posts: 58 Maven
edited November 2018 in Help

I am trying to cluster literature (text documents) and am wondering how to use the author names as an attribute. I believe that most documents with common authors should fall into the same cluster.

So far, I have created a regular attribute named 'Author' and used it, together with the word vector, as input to KMedoids. My problem is that I can only add one author per document. Since documents usually have several authors, is it possible to have multiple authors (ORed?) per attribute? Or is there another way to include the author names as attributes in clustering?

Thanks,
Matthew

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello Matthew,

    In any case, it would be better to encode the authors as binominal attributes, i.e. to have one attribute per author stating whether that author is one of the authors of the corresponding document. Besides the fact that this would probably deliver better results for most distance measures, it also naturally allows multiple authors: a document simply has a "true" value for several author attributes. If you have something like a comma-separated list of authors for each document, the transformation to the binominal format should be possible with the new split operator (RM 4.4 - coming soon) and the nominal2binominal operator.

    Hope that helps. Cheers,
    Ingo
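
For readers outside RapidMiner, the encoding described above can be sketched in plain Python. This is only an illustration of the idea, not the split or nominal2binominal operators themselves; the function name `authors_to_indicators` and the sample author strings are made up for the example, and it assumes each document carries a single comma-separated author string.

```python
def authors_to_indicators(author_lists, sep=","):
    """Turn one separated author string per document into 0/1 indicator columns.

    Hypothetical helper: mirrors the "one binominal attribute per author"
    idea, not any RapidMiner operator.
    """
    # Split each string on the separator, trim whitespace, drop empty entries.
    parsed = [
        {name.strip() for name in text.split(sep) if name.strip()}
        for text in author_lists
    ]
    # One column per distinct author, sorted so the column order is stable.
    columns = sorted(set().union(*parsed)) if parsed else []
    # 1 if the author appears on the document, 0 otherwise.
    rows = [[1 if author in names else 0 for author in columns] for names in parsed]
    return columns, rows


# Made-up sample data: three documents with overlapping authors.
docs = ["Smith, Jones", "Jones", "Lee, Smith"]
columns, rows = authors_to_indicators(docs)

for doc, row in zip(docs, rows):
    print(doc, dict(zip(columns, row)))
```

Each row can then be appended to the document's word vector before clustering, so documents sharing an author agree on that author's column.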
  • mdc Member Posts: 58 Maven
    Hi Ingo,

    Thanks for the reply. I think that makes more sense. I'll try that, and can't wait for that split operator.

    Matthew