Python scripting operator

Mohamad1367 Member Posts: 22 Contributor I
edited June 2020 in Help
Hi dear community, I have a question about the Python scripting operator. I have a dataset in Persian that I want to tokenize and then filter for stop words. For tokenization I use the Rosette extension, but for stopword removal I want to use code written in Python. Where should I put this operator in my process? Can anyone help me?

Answers

  • kayman Member Posts: 662 Unicorn
    Is there a real need to use an external stopwordlist or Rosette? 
    The text mining extension has all of these options also, and this way you could keep your work flow a bit more organized. 
  • Mohamad1367 Member Posts: 22 Contributor I
    @kayman Yes, I need the Rosette extension to tokenize my dataset because it is in Persian, which Rosette supports but the text mining extension does not support for this purpose.
  • kayman Member Posts: 662 Unicorn
    Clear, Farsi is indeed not in the standard toolkit.
    I'm not familiar with the Rosette output, but since it seems to be an ExampleSet, you could just add the Python operator straight after your Rosette operator. This way you can reuse your existing code.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    @Mohamad1367 there's a Persian stopword dictionary in the Community Samples repo:



    Scott
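
For anyone following the suggestion above to place the Python operator straight after the Rosette operator, here is a minimal sketch of what the script could look like. It assumes the Execute Python operator's usual convention of an `rm_main` function that receives the incoming ExampleSet as a pandas DataFrame and returns a DataFrame. The column name `text`, the assumption that Rosette's tokens arrive as one whitespace-separated string per row, and the tiny stopword set are all hypothetical placeholders, so adjust them to your actual Rosette output and your full Persian stopword list.

```python
# Hypothetical, deliberately tiny stopword set for illustration only;
# replace it with a complete Persian stopword list.
PERSIAN_STOPWORDS = {"و", "در", "به", "از", "که", "این", "را", "با"}

# Assumed name of the column holding the tokenized text (not from the thread).
TEXT_COLUMN = "text"


def remove_stopwords(text, stopwords=PERSIAN_STOPWORDS):
    """Drop stopwords from a whitespace-separated token string."""
    if not isinstance(text, str):
        # Leave missing or non-text values untouched.
        return text
    tokens = text.split()
    return " ".join(token for token in tokens if token not in stopwords)


def rm_main(data):
    """Entry point called by the Execute Python operator.

    `data` is expected to be a pandas DataFrame built from the ExampleSet
    connected to the operator's first input port.
    """
    data = data.copy()
    data[TEXT_COLUMN] = data[TEXT_COLUMN].apply(remove_stopwords)
    return data
```

If the Rosette operator delivers one token per row or per attribute instead of one string per document, only `remove_stopwords` and the column handling in `rm_main` would need to change; the position of the operator in the process stays the same.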