Search for Keywords

Tim91 Member Posts: 5 Contributor II

Hello community,

I am currently doing my master's degree, and in one of our courses my group and I have to work on a project with RapidMiner. We have no background in programming and this is the first time we are working with RapidMiner. Our task is to create a text mining tool that crawls a list of Excel files and, as a first step, lets us search for a list of keywords. We then need to know whether the texts contain those keywords or not. We would also like to know how often each keyword appears in those texts.
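To make the goal concrete, here is a minimal Python sketch of the task itself, not a RapidMiner process. The function name `count_keywords` and the sample texts are made up for illustration, and matching is assumed to be case-insensitive and on whole words only.

```python
import re

def count_keywords(text, keywords):
    """Return {keyword: number of whole-word, case-insensitive matches in text}."""
    counts = {}
    for kw in keywords:
        # \b anchors the match at word boundaries, so "data" does not match "database"
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical sample data standing in for the texts read from the Excel files
texts = ["Data mining finds patterns in data.", "Text analysis is fun."]
keywords = ["data", "text"]

results = [count_keywords(t, keywords) for t in texts]
# A keyword is "contained" in a text when its count is greater than zero
contains = [{kw: n > 0 for kw, n in row.items()} for row in results]
```

Each entry in `results` answers "how often", and the matching entry in `contains` answers "whether or not".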

We tried using the following operators:

1. Select Attributes

2. Filter Documents (by Content) (we created a loop that goes through the Excel file and writes each text to a separate document)

3. Filter Examples

However, we don’t really know how to use those operators, because nothing we’ve tried (playing with the different options of the operators) has worked.

Another idea we had is to build the intersection of the texts and the keyword list and see which elements the two files have in common (but again, we don’t know how to implement this).
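The intersection idea can be sketched in a few lines of Python (again only an illustration, not a RapidMiner process). The helper names `tokenize` and `shared_keywords` are hypothetical, and the sketch assumes single-word keywords, since a set of tokens cannot represent multi-word phrases.

```python
import re

def tokenize(text):
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def shared_keywords(text, keywords):
    """Return the keywords that also occur as tokens in the text."""
    return tokenize(text) & {k.lower() for k in keywords}

# Hypothetical example: two of the three keywords occur in the text
found = shared_keywords("The quick brown fox", ["fox", "dog", "Quick"])
```

An intersection only says which keywords occur, not how often, so it covers the "contains or not" part of the task but not the counting.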

Are we heading in the right direction, or do you have any tips on how we should start?

I hope you can help us.

 

Cheers

Tim

Answers

  • Tim91 Member Posts: 5 Contributor II
    Hi Martin,
    Thank you for your answer, we'll try that.
  • Tim91 Member Posts: 5 Contributor II
    One follow-up question:
    Right now we have two streams: the first reads the Excel list with the texts and the second reads the one with the keywords. We then used Process Documents with Tokenize in both paths (1. Read Excel, 2. Nominal to Text, 3. Process Documents from Data, 4. Data to Documents, followed by Process Documents). We can see all the words and the rows in which they occur, but is it possible to filter the results or change the settings so that we only see the keywords? That would be much more convenient, because we wouldn't have to look through all the words.
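As an illustration of the filtering being asked for, the following Python sketch (not a RapidMiner process) counts every token per text and then keeps only the columns that belong to the keyword list. The function name `keyword_counts_per_text` and the sample inputs are hypothetical, and keywords are assumed to be single words.

```python
from collections import Counter
import re

def keyword_counts_per_text(texts, keywords):
    """For each text, count all tokens, then keep only the keyword counts."""
    wanted = [k.lower() for k in keywords]
    rows = []
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        all_counts = Counter(tokens)  # counts for every word in the text
        # Counter returns 0 for missing keys, so absent keywords show up as 0
        rows.append({k: all_counts[k] for k in wanted})
    return rows

# Hypothetical example: one row per text, one column per keyword
rows = keyword_counts_per_text(["a b a c", "c c d"], ["a", "c"])
```

The result has one row per text and one column per keyword, so none of the other words have to be scanned by eye.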