Filter Stopwords operator's result

Mohamad1367 Member Posts: 22 Contributor I
edited May 2020 in Help
Hi. I want to see the result of the Filter Stopwords operator on my data set, but after applying it I receive a collection of documents in the Results view. I put this operator (Filter Stopwords) inside a Loop Collection operator. What should I do to solve this problem?



Best Answer

  • sara20 Member Posts: 110 Unicorn
    Solution Accepted
    @Mohamad1367

    Hello

    Please look at the screenshot. As shown there, first download the .rmp file, then import it into your RapidMiner (RM).


    I hope this helps
    sara

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @Mohamad1367,

    This is the normal behavior. To see the result of the Filter Stopwords operator, select an element of the collection: the Results view then shows that document after stopword filtering.
    But I guess that your final goal is not just to look at your documents after applying the Filter Stopwords operator, right?

    So it would be more useful to share your data (presumably the example set called "test") and describe explicitly what you ultimately want to do.
    That way we can help you more efficiently.

    Regards,

    Lionel
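To make the behavior described above concrete, here is a minimal Python sketch (an illustration only, not RapidMiner's implementation): applying a filter inside a loop over a collection returns a collection of filtered documents, and you see one filtered document by picking one element. The stopword list and documents are made-up toy data, with English tokens used only for readability.

```python
def filter_stopwords(tokens, stopwords):
    """Return one document's tokens with every stopword removed."""
    return [t for t in tokens if t not in stopwords]


def filter_collection(collection, stopwords):
    """Apply the filter to each document, like an operator nested in a loop.

    The result is again a collection: one filtered document per input.
    """
    return [filter_stopwords(doc, stopwords) for doc in collection]


# Hypothetical toy data: a tiny stopword list and two tokenized documents.
stopwords = {"the", "was", "and"}
collection = [
    ["the", "movie", "was", "great"],
    ["boring", "and", "slow"],
]

filtered = filter_collection(collection, stopwords)
# The whole result is a collection; selecting one element shows a single
# filtered document.
print(filtered[0])  # prints ['movie', 'great']
```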

  • Mohamad1367 Member Posts: 22 Contributor I
    Thanks for your response, @lionelderkrikor. Here is what I want to achieve: I have a data set in Persian and I want to do sentiment analysis on it. Each row in my data set has a sentiment label; for example, label=5 means that the sentence is very positive.
    I want to apply some text preprocessing steps to it, such as tokenization, stopword filtering, stemming, etc.
    For tokenization, I installed the Rosette extension, which supports Persian.
    I am sharing my data set here. Which operators should I use to achieve this goal, and in what order?
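One common order for these steps is tokenize, then filter stopwords, then stem. A minimal Python sketch of that order, under loud assumptions: the regex tokenizer is only a stand-in for a real tokenizer such as the one in the Rosette extension, and `toy_stem` is a hypothetical placeholder (it only drops a trailing "s"), not a Persian or English stemmer.

```python
import re


def tokenize(text):
    # Stand-in tokenizer: split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text) if t]


def remove_stopwords(tokens, stopwords):
    # Exact, case-insensitive dictionary lookup.
    return [t for t in tokens if t.lower() not in stopwords]


def toy_stem(token):
    # Hypothetical placeholder, NOT a real stemmer: drops one trailing "s".
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def preprocess(text, stopwords):
    tokens = tokenize(text)                       # 1. tokenization
    tokens = remove_stopwords(tokens, stopwords)  # 2. stopword filtering
    return [toy_stem(t) for t in tokens]          # 3. stemming


print(preprocess("The actors and the songs were great", {"the", "and", "were"}))
# prints ['actor', 'song', 'great']
```

Filtering before stemming keeps the dictionary lookup on the original word forms, which is what a plain stopword list contains; stemming first could alter a stopword so that it no longer matches its dictionary entry.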

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Mohamad1367,

    Unfortunately, I'm not aware of stopword filtering, stemming, etc. operators for Persian in the Rosette extension or in RapidMiner.
    You could take a look at this Python text-processing library:

    https://github.com/sobhe/hazm

    Regards,

    Lionel
  • sara20 Member Posts: 110 Unicorn
    @Mohamad1367
    Hello

    There are some good posts about Persian text mining, and there is also a Persian stopword list you can use in RM. I recommend searching the community; you can find a lot of useful posts on this.

    Best regards
    sara
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Mohamad1367,

    @sara20 is right: there are resources for Persian text processing, including a stopwords dictionary (sorry for my previous post, I hadn't checked the community site... :/ ).
    In particular, look at this thread, which includes a post by @sgenzer explaining where to find a dictionary of Persian stopwords:
    https://community.rapidminer.com/discussion/55674/persian-dictionary

    Hope this helps,

    Regards,

    Lionel

  • sara20 Member Posts: 110 Unicorn
    There is also another stopword list here ;)
  • Mohamad1367 Member Posts: 22 Contributor I
    @sara20 @lionelderkrikor thanks for your responses. I have a Persian stopword dictionary; I forgot to upload it in my previous comment. My problem is that when I apply the stopword filter operator to my data set, I want to see the filtered result in the Results view, but I can't.
    I use the Rosette extension only for tokenization. For the other tasks, such as stemming and stopword filtering, I use the Text Processing extension, which is language-independent and only needs a dictionary.
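Because a dictionary-based stopword filter is language-independent, it can only remove tokens that match a dictionary entry exactly. A minimal Python sketch of that idea, assuming a UTF-8 text file with one stopword per line (the file name in the comment is hypothetical). As an added assumption, it maps the Arabic kaf (U+0643) and yeh (U+064A) to the Persian kaf (U+06A9) and yeh (U+06CC) on both sides before comparing, since Persian texts and stopword lists can mix the two spellings, and a mismatch makes an exact-match filter silently keep the word.

```python
import io

# Map Arabic kaf/yeh to the Persian letters so both spellings compare equal.
PERSIAN_FORMS = str.maketrans({"\u0643": "\u06a9", "\u064a": "\u06cc"})


def normalize(word):
    return word.strip().translate(PERSIAN_FORMS)


def load_stopwords(lines):
    """Build a stopword set from lines holding one word each; skip blank lines."""
    return {normalize(line) for line in lines if line.strip()}


def filter_tokens(tokens, stopwords):
    return [t for t in tokens if normalize(t) not in stopwords]


# With a real file (hypothetical name), "utf-8-sig" also drops a leading BOM:
#   with open("persian_stopwords.txt", encoding="utf-8-sig") as f:
#       stopwords = load_stopwords(f)
ke_arabic = "\u0643\u0647"         # "ke" spelled with the Arabic kaf
ke_persian = "\u06a9\u0647"        # "ke" spelled with the Persian kaf
film = "\u0641\u06cc\u0644\u0645"  # "film"

stopwords = load_stopwords(io.StringIO(ke_arabic + "\n\n"))
print(filter_tokens([film, ke_persian], stopwords) == [film])  # prints True
```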
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Mohamad1367,

    Looking at your dataset, I think I understand what you want to achieve: ultimately, you want to create a model for sentiment classification, right?
    In that case, you will need the Process Documents from Data operator, with all your text processing steps (Tokenize (again), Filter Stopwords) placed INSIDE this operator.
    Please check the process in the attached file. At the output of this process you will see a word vector with the (Persian) stopwords filtered out. (Don't forget to set the path where your stopwords dictionary file is stored.)
    From this starting point, you can create a model to perform sentiment classification by adding a Set Role operator and a model (a classifier) of your choice after the Process Documents from Data operator.

    Hope this helps,

    Regards,

    Lionel
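To illustrate the word vector plus label described above, here is a minimal Python sketch under stated assumptions: the tokens are already filtered, the weighting is one common TF-IDF variant (term frequency times log(N/df)) and not necessarily the exact formula RapidMiner uses, and the documents and labels are made-up toy data. Each row is one example whose attributes are the vocabulary terms; the appended label is the column that Set Role would mark as the target for a classifier.

```python
import math
from collections import Counter


def word_vector(docs):
    """Turn tokenized docs into TF-IDF rows over a shared, sorted vocabulary."""
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        rows.append([(counts[t] / total) * math.log(n / df[t]) for t in vocab])
    return vocab, rows


# Hypothetical toy data: filtered tokens and sentiment labels (5 = very positive).
docs = [["great", "film"], ["boring", "film"], ["great", "acting"]]
labels = [5, 1, 4]

vocab, rows = word_vector(docs)
# Append the label as the last column, the role Set Role would assign.
example_set = [row + [label] for row, label in zip(rows, labels)]

print(vocab + ["label"])
for r in example_set:
    print([round(v, 3) for v in r])
```

A term that occurs in every document gets weight 0 under this variant, which is one reason very frequent words contribute little even before stopword filtering.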
  • Mohamad1367 Member Posts: 22 Contributor I
    Thanks for your answer, @lionelderkrikor. I know this may be obvious, but could you explain a bit more about the attached file? How can I run it? Do I drag and drop the attached file into the Design view and just connect it to the result port?

  • Mohamad1367 Member Posts: 22 Contributor I
    @sara20 thank you very much

  • Mohamad1367 Member Posts: 22 Contributor I
    @lionelderkrikor I ran the process that you attached in your previous post, but I only receive the tokenized result; the stopwords were not filtered. I have attached a screenshot of my result. Can you help me, please?