Textual ETL: Stemming from dictionary

Wanttoknow Member Posts: 6 Contributor II
edited November 2019 in Help
Hi,

First of all I have to say that RM5.0 is a wonderful tool. :o Congratulations.

I started preprocessing text for classification and I am having some problems with the "Stem (Dictionary)" component.

I am pointing it to a text file for the patterns, but I am not sure about the syntax of the entries/records in that file. The help is very brief about this.

Right now the first line in my designated TXT file looks like this:

"move: moving moved move"

But it is not replacing any of the terms with their stem.

Any idea?

Answers

  • arminmania Member Posts: 7 Contributor II
    Hi,

    I am not sure, but I think you have to write it as follows:

    move , moving moved move
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,
    Wanttoknow wrote:

    Right now the first line in my designated TXT file looks like this:

    "move: moving moved move"
    Did you try putting a blank before the colon?

    Kind regards,
    Tobias
  • Wanttoknow Member Posts: 6 Contributor II
    Well, after a lot of trial and error, this seems to work:

    "
    aanleveren:aanlever.*
    aanleveren:aangelever.*
    zorgverzekering:zorgverzeker.*
    "
    But putting multiple patterns on one line, like "aanleveren : aanlever* aangelever*", doesn't work.

  • Wanttoknow Member Posts: 6 Contributor II
    Another question:

    Is it possible to use an external list for the ReplaceToken component? That would be more convenient than entering records with the list editor of the component.
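
For anyone who wants to check a dictionary file before loading it into "Stem (Dictionary)": the format reported to work above, one stem:regex pair per line, can be tried out outside RapidMiner. The following is only a minimal Python sketch and not RapidMiner's actual implementation. It assumes that each pattern has to match the whole token, and load_rules and stem_token are hypothetical helper names made up for this illustration.

```python
import re


def load_rules(lines):
    """Parse 'stem:regex' lines into (stem, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        stem, pattern = line.split(":", 1)
        rules.append((stem.strip(), re.compile(pattern.strip())))
    return rules


def stem_token(token, rules):
    """Return the stem of the first rule whose pattern matches the whole token.

    Assumption: full-token matching; the real operator may behave differently.
    """
    for stem, pattern in rules:
        if pattern.fullmatch(token):
            return stem
    return token  # no rule matched, keep the token unchanged


# The three lines from the post above, one pattern per line.
dictionary_lines = [
    "aanleveren:aanlever.*",
    "aanleveren:aangelever.*",
    "zorgverzekering:zorgverzeker.*",
]
rules = load_rules(dictionary_lines)

tokens = ["aangeleverd", "zorgverzekeraar", "fiets"]
stemmed = [stem_token(t, rules) for t in tokens]
print(stemmed)  # ['aanleveren', 'zorgverzekering', 'fiets']
```

Under the same assumption, a line such as "aanleveren : aanlever* aangelever*" is read as a single regex containing a literal space, so it cannot match any single token. That is consistent with the behavior described above, although the actual cause inside RapidMiner is not confirmed here.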