I need an operator that is the inverse of the Filter Stopwords (Dictionary) operator

Fatmah Member Posts: 6 Contributor I
edited July 2019 in Help

Hi

Thanks for reading my post

I am working on my master's thesis, and I found the same problem I have described in this thread:

http://community.rapidminer.com/t5/RapidMiner-Studio/SOLVED-Filter-text-from-a-list-of-word/td-p/21459

He solved the problem by changing the code of the Filter Stopwords (Dictionary) operator.

I read the document "How to Extend RapidMiner".

I prepared the environment by downloading Java and Eclipse Neon.

I now know how to create my own operator, but I don't know how to copy an existing operator's code and modify it.

 

Thank you again

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Have you checked out the Filter Stopwords by Dictionary operator? There you can provide a custom txt file for stopwords.

  • Fatmah Member Posts: 6 Contributor I
    Hi Thomas
    Thanks for the reply.
    Yes, I have checked it many times.
    That operator removes from the document the words listed in the text file.
    I want the opposite: a filter that removes every word from the document except the words listed in the text file.
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Would Filter by Content help? It has an inverse condition. What about Filter by POS tags? That also has an inverse condition.

  • Fatmah Member Posts: 6 Contributor I

    I tried Filter by Content, but it is insufficient for my situation because the text file has tens of words.
    Filter by Content works well only for a very small number of expressions.
    With Filter by POS Tags, I can't specify the words I want.

     

    Thank you again for your help. If you have any suggestions, please tell me, or let me know how I can get to the operator's source code.

  • hmhsing Member Posts: 29 Maven
    I converted the dictionary txt file to Excel and then used Filter Tokens Using ExampleSet (the invert filter option needs to be checked), and it works. See the attached file.
  • kayman Member Posts: 662 Unicorn
    In theory you could use the Process Documents from Data operator and supply your reversed stoplist (or whitelist) as a wordlist, so that only the words in your list are accepted. There is no real out-of-the-box operator to create your own wordlist, but this thread goes into more detail:

    https://community.rapidminer.com/discussion/35707/creating-a-comparing-white-list-of-words-to-a-wordlist-from-a-data-mined-webpage
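
For anyone who still wants to write a custom operator, the core of an "inverse stopword" (whitelist) filter can be sketched in plain Java as below. This is only a minimal illustration of the keep-only-listed-words logic, not RapidMiner's own code: the class name `WhitelistTokenFilter` and the methods `loadWhitelist` and `keepOnly` are hypothetical, no RapidMiner API is used, and the case-insensitive matching is an assumption. Wiring this into an operator would still have to follow the extension guide mentioned in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of RapidMiner: keeps only whitelisted tokens.
public class WhitelistTokenFilter {

    // Build a lookup set from dictionary lines (one word per line).
    // Assumption: matching is case-insensitive, so words are lower-cased.
    public static Set<String> loadWhitelist(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) { // skip blank lines in the dictionary
                words.add(word);
            }
        }
        return words;
    }

    // Inverse of a stopword filter: drop every token NOT in the whitelist.
    // Token order and original casing are preserved.
    public static List<String> keepOnly(List<String> tokens, Set<String> whitelist) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (whitelist.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the lines of the dictionary text file.
        Set<String> whitelist = loadWhitelist(Arrays.asList("data", "mining", "", " Text "));
        List<String> tokens = Arrays.asList("Text", "mining", "is", "fun", "with", "data");
        System.out.println(keepOnly(tokens, whitelist)); // prints [Text, mining, data]
    }
}
```

In a real process the dictionary lines would be read from the text file (for example with `java.nio.file.Files.readAllLines`), and the token list would come from whatever tokenization step runs before the filter.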
     