Textual ETL: Stemming from dictionary
Wanttoknow
Member Posts: 6 Contributor II
Hi,
First of all I have to say that RM5.0 is a wonderful tool. Congratulations.
I started preprocessing text for classification and I am having some problems with the "Stem (Dictionary)" component.
I am pointing it to a text file for the patterns, but I am not sure about the syntax of the entries/records in that file. The help is very brief about this.
Right now the first line in my designated TXT file looks like this:
"move: moving moved move"
But it is not replacing any of the terms with their stem.
Any idea?
Answers
I am not sure, but I think you have to write it as follows:
move , moving moved move
Kind regards,
Tobias
"
aanleveren:aanlever.*
aanleveren:aangelever.*
zorgverzekering:zorgverzeker.*
"
But putting multiple patterns on one line, like this: "aanleveren : aanlever* aangelever*", doesn't work.
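For readers who want to sanity-check such a dictionary outside RapidMiner, here is a minimal Python sketch of the one-pattern-per-line format quoted above. It is not the Stem (Dictionary) operator's actual implementation. It assumes each line is `stem:regex` and that the regular expression must match the whole token. The names `parse_stem_dictionary` and `stem_token` are hypothetical helpers made up for this illustration.

```python
import re

# The three dictionary lines quoted in the post above.
DICTIONARY = """\
aanleveren:aanlever.*
aanleveren:aangelever.*
zorgverzekering:zorgverzeker.*
"""


def parse_stem_dictionary(text):
    """Parse 'stem:regex' lines into (stem, compiled pattern) pairs.

    Assumption: one pattern per line, split on the first colon.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        stem, pattern = line.split(":", 1)
        rules.append((stem.strip(), re.compile(pattern.strip())))
    return rules


def stem_token(token, rules):
    """Return the stem of the first rule whose pattern matches the whole token."""
    for stem, pattern in rules:
        if pattern.fullmatch(token):
            return stem
    return token  # no rule matched: leave the token unchanged


rules = parse_stem_dictionary(DICTIONARY)
tokens = ["aangeleverd", "zorgverzekeraar", "fiets"]
stemmed = [stem_token(t, rules) for t in tokens]
print(stemmed)  # ['aanleveren', 'zorgverzekering', 'fiets']
```

Under the same assumptions, the "move" example from the first post would become separate lines such as `move:moving` and `move:moved`, or a single line with one regular expression such as `move:mov(ing|ed)`. Whether the operator itself accepts these exact lines would still need to be checked in RapidMiner.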
Is it possible to use an external list for the ReplaceToken component? That would be more convenient than entering records with the list editor of the component.