Classification using Inputted keywords

mdc Member Posts: 58 Maven
edited November 2018 in Help
Hi,

Is it possible in RM to define a class using user-specified keywords? My understanding of classification is that you have to train a model for a given class and then apply that model to new documents.

What I really want is to input keywords such as xxx and yyy and have RM find all the relevant documents, for example using similarity or classification.

thanks,
Matthew

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Matthew,

    just a suggestion: I would index your documents by TF-IDF with the text input operator. Store this example set together with the word list. Then build a new document containing only your keywords and index it using the same word list. Instead of classification, you can now merge both example sets and calculate a similarity measure such as cosine similarity. Keep only those similarities that involve the keyword document and sort by similarity. The basic operators are all part of RM and the text plugin.

    Cheers,
    Ingo
  • mdc Member Posts: 58 Maven
    Hi Ingo,

    What operator is used to "merge both example sets"?

    thanks,
    Matthew
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    You won't believe it: it's called "ExampleSetMerge"  ;D

    Cheers,
    Ingo
  • mdc Member Posts: 58 Maven

    Thanks, but I guess "ExampleSetMerge" is in 4.3. My PC has 4.2 and I couldn't find it. I'm just waiting for 4.4 to upgrade.

    Matthew
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Ah, yes, that could be. Updating to 4.3 does not make much sense, since 4.4 is currently in final testing before its release.

    Cheers,
    Ingo
  • mdc Member Posts: 58 Maven
    Hi Ingo,

    I have not implemented it yet (I'm waiting for RM 4.4). However, I have a question about this method. With this approach, the similarity will be computed for every document against every other document. Is this correct? Is there a way to compute the similarity of a single document against several documents?

    thanks
    Matthew
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    only with a trick: iterate over the examples with an IteratingOperatorChain whose number of iterations is taken from a macro defined by the new DataMacroDefinition operator (number of examples). In each iteration, filter the examples down to the current one with the ExampleRangeFilter and merge the document with this single example. Calculate the similarity and store it via ProcessLog. After the loop, you can transform the ProcessLog back into a data set, sort it...

    Cheers,
    Ingo
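
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the approach discussed in this thread: index the documents by TF-IDF, index a keyword-only document with the same word list, then compute cosine similarity of that one document against every other document and sort. It is not a RapidMiner process and does not use any RapidMiner operators. The function names (build_word_list, tfidf_vector, cosine, rank_by_keywords), the whitespace tokenizer, and the particular TF-IDF weighting are assumptions made for illustration; the text plugin may tokenize and weight terms differently.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return text.lower().split()


def build_word_list(docs):
    """Build the shared word list and IDF weights from the corpus only."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    vocab = sorted(doc_freq)
    # Assumption: smoothed IDF (log(N/df) + 1); other variants are common.
    idf = {word: math.log(n_docs / doc_freq[word]) + 1.0 for word in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Index one text with an existing word list; unknown words are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[word] / total * idf[word] for word in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_keywords(docs, keywords):
    """Return (doc_index, similarity) pairs, most similar first."""
    vocab, idf = build_word_list(docs)
    # The keyword "document" is indexed with the same word list as the corpus.
    query = tfidf_vector(" ".join(keywords), vocab, idf)
    scored = [
        (index, cosine(query, tfidf_vector(doc, vocab, idf)))
        for index, doc in enumerate(docs)
    ]
    # Sort by descending similarity; ties fall back to document order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    documents = [
        "neural network training data",
        "cooking pasta recipes",
        "network security firewall",
    ]
    for index, score in rank_by_keywords(documents, ["network", "training"]):
        print(index, round(score, 3), documents[index])
```

Because the keyword vector is compared directly against each document vector, this computes one-against-many similarities only, rather than the full all-pairs matrix that the question about per-document similarity was trying to avoid.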