Search for Keywords

Tim91 Member Posts: 4 Newbie

Hello community,

I am currently doing my master's degree, and in one of our courses my group and I have to work on a project with RapidMiner. We have no background in programming, and this is the first time we are working with RapidMiner. Our task is to create a text-mining tool that crawls a list of Excel files and, as a first step, lets us search for a list of keywords. We then need to know whether the texts contain those keywords or not. We would also like to know how often each keyword appears in those texts.

We tried using the following operators:

1. Select Attributes

2. Filter Documents (by Content) (we created a loop that goes through the Excel file and writes each text to a separate document)

3. Filter Examples

However, we don’t really know how to use those operators, because nothing we’ve tried (playing with the operators' different options) has worked.

Another idea we had is to compute the intersection of the texts and the keyword list to see which elements the two files have in common (but again, we don’t know how to implement this).
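For illustration only, the intersection idea can be sketched outside RapidMiner in a few lines of Python. This is a minimal sketch, not a RapidMiner process: the function name `keywords_present`, the sample texts and the keyword list are all made up, and it assumes single-word keywords and a simple split of the text into lowercase word tokens.

```python
import re

def keywords_present(text, keywords):
    """Return the set of keywords that occur in the text (case-insensitive)."""
    # Split the text into lowercase word tokens and keep the unique ones.
    tokens = set(re.findall(r"\w+", text.lower()))
    # The set intersection keeps only the tokens that are also keywords.
    return tokens & {k.lower() for k in keywords}

# Hypothetical sample data standing in for the Excel contents.
texts = [
    "The pump failed after the valve was replaced.",
    "No issues reported.",
]
keywords = ["pump", "valve", "leak"]

for row, text in enumerate(texts):
    print(row, sorted(keywords_present(text, keywords)))
```

An empty result for a row means that none of the keywords occur in that text.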

Are we heading in the right direction, or do you have any tips on how we should start?

I hope you can help us.

Cheers

Tim


Best Answer

Answers

  • Tim91 Member Posts: 4 Newbie
    Hi Martin,
    Thank you for your answer, we'll try that.
  • Tim91 Member Posts: 4 Newbie
    One follow-up question:
    Right now we have two streams: the first reads the Excel list with the texts, and the second reads the one with the keywords. We then used Process Documents with Tokenize on both paths (1. Read Excel 2. Nominal to Text 3. Process Documents from Data 4. Data to Documents - Process Documents). We can see all the words and the rows in which they occur, but is it possible to filter the results or change the settings so that we only see the keywords? This would be much more convenient, because we wouldn't have to look through all the words.
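To make the desired result concrete, here is a small Python sketch of "count every token, then keep only the keywords". It is an illustration of the idea, not a RapidMiner setting: the function name `keyword_counts` and the sample data are hypothetical, and it assumes single-word keywords matched case-insensitively.

```python
import re
from collections import Counter

def keyword_counts(text, keywords):
    """Count how often each keyword occurs in the text (case-insensitive)."""
    # Tokenize the text into lowercase words and count every token.
    counts = Counter(re.findall(r"\w+", text.lower()))
    # Report only the keywords; a missing keyword gets a count of 0.
    return {k: counts[k.lower()] for k in keywords}

# Hypothetical sample data standing in for the two Excel lists.
texts = ["Pump A and pump B share one valve.", "Nothing to report."]
keywords = ["pump", "valve", "leak"]

for row, text in enumerate(texts):
    print(row, keyword_counts(text, keywords))
```

Each row then shows one count per keyword instead of the full word list, which is the filtered view asked about above.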