
"Finding Document Similarity Based Mostly on Keywords and Title"

joandcruz Member Posts: 10 Contributor I
edited June 2019 in Help

Hi all,
Sorry for the long title, but I could not come up with a shorter one :)
I am new to RapidMiner (RM) and am using it to find document similarities. My sources are web pages; I read them in and compare them.
So far so good, but here is the problem:
I want to determine keywords and a title for each document, and I also want to assign weights to the keywords. When I run the process, the title and keywords show up as '?'. So, is there a way to enter the keywords and title manually for now? And for later stages: how can RM extract keywords from web pages automatically?

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    Hi,

    If you just want to assign a score based on the keywords, you might want to have a look at this thread: http://rapid-i.com/rapidforum/index.php/topic,8638.0.html

    If you want to find the words automatically, you can do standard text mining on the documents. The trick is to cluster the documents first. Afterwards, you can use the cluster assignment as the label and run a feature selection against it. That way you get the important words per cluster.

    Cheers,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
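
For readers who want to see the cluster-then-select idea from the answer outside of RapidMiner, here is a minimal Python sketch. It is not the RapidMiner process itself: the function names, the sample documents, and the `seeds` parameter are all made up for illustration. It assumes TF-IDF vectors with cosine similarity, a small k-means started from hand-picked seed documents, and, in place of a real feature-selection operator, a simple score (mean TF-IDF weight inside the cluster minus mean weight outside it).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def normalise(vec):
    """Scale a sparse {word: weight} vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length {word: tf-idf} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf, so words present in every document keep a small weight.
        vectors.append(normalise(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        ))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def centroid(members):
    """Unit-length mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in members:
        for w, v in vec.items():
            total[w] += v
    return normalise({w: v / len(members) for w, v in total.items()})


def kmeans(vectors, seeds, max_iter=20):
    """Cluster by cosine similarity; `seeds` are indices of the starting documents."""
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = centroid(members)
    return labels


def top_words(vectors, labels, cluster, n=3):
    """Rank words by mean weight inside the cluster minus mean weight outside."""
    inside = [v for v, lab in zip(vectors, labels) if lab == cluster]
    outside = [v for v, lab in zip(vectors, labels) if lab != cluster]

    def mean(word, group):
        return sum(v.get(word, 0.0) for v in group) / len(group) if group else 0.0

    vocab = {w for v in inside for w in v}
    scores = {w: mean(w, inside) - mean(w, outside) for w in vocab}
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Hypothetical sample documents standing in for the scraped web pages.
DOCS = [
    "Decision trees and random forests for classification",
    "Training a classification model with decision trees",
    "Baking bread with sourdough starter and flour",
    "Sourdough bread recipe: flour, water, salt",
]
vectors = tfidf_vectors(DOCS)
labels = kmeans(vectors, seeds=[0, 2])
for c in sorted(set(labels)):
    print(c, top_words(vectors, labels, c))
```

The same `cosine` function gives the pairwise document similarity asked about in the question. In a real process, the seed-based initialisation and the mean-difference score would likely be swapped for a proper clustering operator and a label-based weighting scheme such as information gain.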