"[SOLVED] Add custom Stopwords Dictionary"

ezoulias Member Posts: 28 Maven
edited June 2019 in Help
Dear sirs,

I am surprised by the very nice things that your product can do.
I am doing some new research on text mining in the Greek language, and I would like to know whether it is possible to add my own custom stopwords dictionary, and how.

Thank you in advance
Manolis

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    You can easily use your dictionary. Just add a "Filter Stopwords (Dictionary)" operator from Text Processing/Filtering (Text mining extension) to your document processing and select the file.

    Regards

    Balázs
  • ezoulias Member Posts: 28 Maven
    Thank you for your immediate reply.

    Are there any specifications that I have to follow for the txt document? For example, should it be comma-delimited, or use some other separator between words?

    Is UTF-8 supported?

    Thank you in advance

    Manolis
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    The documentation of that operator contains all required information: one stop word per line, and the encoding (UTF-8 or whatever you like) can be selected with the encoding parameter.

    Best regards,
    Marius
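
To illustrate the file format described above (one stop word per line, saved in a chosen encoding such as UTF-8), here is a minimal Python sketch. It is not part of RapidMiner and does not reproduce how the operator works internally; the file name `greek_stopwords.txt`, the helper functions, and the sample words are all hypothetical and only show how such a dictionary file could be produced and checked before selecting it in the operator.

```python
import os
import tempfile


def write_stopwords(path, words, encoding="utf-8"):
    """Write one stop word per line in the given encoding."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        for word in words:
            f.write(word.strip() + "\n")


def load_stopwords(path, encoding="utf-8"):
    """Read the dictionary back as a set, skipping blank lines."""
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def filter_tokens(tokens, stopwords):
    """Drop every token that exactly matches a stop word."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical sample list; replace with your own Greek stop words.
greek_stopwords = ["και", "το", "να", "με"]

# Hypothetical file name, written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "greek_stopwords.txt")
write_stopwords(path, greek_stopwords)

stopwords = load_stopwords(path)
tokens = ["η", "γάτα", "και", "το", "σκυλί"]
filtered = filter_tokens(tokens, stopwords)
print(filtered)  # ['η', 'γάτα', 'σκυλί']
```

The matching here is exact and case-sensitive, which is an assumption of this sketch; whether you need to lowercase the tokens first depends on how your own process is set up.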
  • ezoulias Member Posts: 28 Maven
    Hi again after a long time,

    1. Yes, but is there a tutorial on how to add a custom Filter Stopwords operator?

    2. Do I have to develop something? Is there a tutorial for that?

    3. Where can I find the documentation of the filter operator? Is it helpful as a tutorial?

    Sorry for the silly questions, but I cannot find the answer.

    Thank you in advance
    Manos
  • ezoulias Member Posts: 28 Maven
    Problem Solved
  • rayana_azus Member Posts: 1 Contributor I

    Good morning!

    How did you manage to solve the problem?

    I need to create a stopwords dictionary for pt-BR.
