How to tag the documents in a directory based on the max term occurrence
Hi,
I want to add a separate column that specifies which category each document belongs to.
In brief, this is what I need:
I have directories that contain various text files.
For each file, I want to count how many times these words occur: is || am || are || has || have (present)
was || were || had (past)
will || shall (future)
The occurrence counts of these words tell whether the document is in the present, past, or future state.
We need to store the number of present, past, and future occurrences in three variables (x, y, z).
After this, we calculate max(x, y, z).
Based on that, we assign the column value: present if x is the maximum, past if y is the maximum, or future if z is the maximum.
In the result, we need one extra column where the row for each document has the value present, past, or future.
Please help me out by letting me know how to do that.
Thanks and Regards:
Sukh
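The counting and max logic described above can be sketched in plain Python. This is only an illustration of the idea, not a RapidMiner process. The function names (`count_tenses`, `tag_tense`, `tag_directory`) are hypothetical, and the sketch assumes the files are UTF-8 `.txt` files, that matching is case-insensitive on whole words, and that ties (including a document with no marker words at all) resolve in the order present, past, future.

```python
import re
from collections import Counter
from pathlib import Path

# Marker words taken from the post; the dictionary name is illustrative.
TENSE_WORDS = {
    "present": {"is", "am", "are", "has", "have"},
    "past": {"was", "were", "had"},
    "future": {"will", "shall"},
}


def count_tenses(text):
    """Return the occurrence count per tense (x, y, z in the post)."""
    # Lowercase the text and split it into runs of letters.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        tense: sum(words[w] for w in markers)
        for tense, markers in TENSE_WORDS.items()
    }


def tag_tense(text):
    """Return the tense with the highest count."""
    counts = count_tenses(text)
    # Assumption: max() keeps the first maximum, so ties fall back to
    # the dictionary order present, past, future.
    return max(counts, key=counts.get)


def tag_directory(directory):
    """Return one (file name, tense) row per .txt file in `directory`."""
    rows = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        rows.append((path.name, tag_tense(text)))
    return rows
```

Calling `tag_directory` on a folder gives one row per document, and the second element of each row is the extra column value. Contractions such as "wasn't" or "I'll" are not counted in this sketch, because the tokenizer splits on the apostrophe.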