"[SOLVED] Two near words"

zahrahnnx Member Posts: 9 Contributor II
edited June 2019 in Help
Hi everyone,

I have an Excel file with 20 rows. Each row contains a description related to business analysis.
The words "problem" and "solving" are among the common words, but they may appear in a different order in each document, e.g. "solving the problems", "problem solving skills", "solving technical problems", etc.

I want to capture all of these combinations of "problem" and "solving" in one attribute. For example, I'll add an attribute called "problem-solving". If a document contains the words "problem" and "solving" next to each other or with 1~4 words in between, the value of the attribute "problem-solving" is set to 1, otherwise 0.

I did a similar thing for "Database"-related words: if a document contains sql or mysql, the value of "Database" will be 1. That works, but I don't know how to do it when there are two words.

Please let me know if you have any idea. Thanks
Zahrahnnx

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,512 RM Data Scientist
    Hi,

    My first idea would be to generate n-grams and then use Select Attributes to pick the ones containing "problem" and "solving". Maybe use Generate Aggregation afterwards.
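
    Outside of RapidMiner, the n-gram idea can be sketched in plain Python. This is only an illustration under assumptions, not the RapidMiner operators themselves: the function name `has_ngram_with_both`, the prefix matching used as a crude stand-in for stemming, and the window size `max_n=6` (two target words plus up to four words in between) are all hypothetical choices.

```python
import re

def has_ngram_with_both(text, first="problem", second="solving", max_n=6):
    """Return 1 if some n-gram of at most max_n tokens contains both words, else 0.

    Prefix matching (startswith) is an assumption standing in for stemming,
    so that "problems" counts as "problem".
    """
    tokens = re.findall(r"\w+", text.lower())
    for n in range(2, max_n + 1):
        # Slide a window of n tokens over the document.
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            has_first = any(t.startswith(first) for t in gram)
            has_second = any(t.startswith(second) for t in gram)
            if has_first and has_second:
                return 1
    return 0

docs = [
    "solving the problems",
    "problem solving skills",
    "we are solving a b c d e problems",  # five words in between
]
ngram_flags = [has_ngram_with_both(d) for d in docs]
```

    With `max_n=6`, the third document is not flagged because the two words are more than four words apart.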

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • zahrahnnx Member Posts: 9 Contributor II
    Thanks for the response. Yes, n-grams work :)
    I also came up with another solution. I'll share it in case someone else faces the same problem.

    Use the "Extract Information" operator inside "Process Documents from Data", with the regular expression below:
    (problem\W+(?:\w+\W+){0,5}?solving)|(solving\W+(?:\w+\W+){0,5}?problem)

    It adds a new attribute, which I called "Problem_Solving". Then, in the main process, I used the "Select Attributes" operator to check "Problem_Solving".
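
    To see what the regular expression above does outside of RapidMiner, here is a minimal Python sketch. The helper name `problem_solving_flag`, the sample documents, and the case-insensitive flag are assumptions for illustration; Python's `re` syntax may also differ in details from the regex engine RapidMiner uses.

```python
import re

# The pattern from the post, unchanged. IGNORECASE is an assumption;
# the post does not say whether matching should be case-sensitive.
PATTERN = re.compile(
    r"(problem\W+(?:\w+\W+){0,5}?solving)|(solving\W+(?:\w+\W+){0,5}?problem)",
    re.IGNORECASE,
)

def problem_solving_flag(text: str) -> int:
    """Return 1 if the pattern matches anywhere in the text, else 0."""
    return 1 if PATTERN.search(text) else 0

docs = [
    "solving the problems",
    "problem solving skills",
    "solving technical problems",
    "strong database skills",
]
flags = [problem_solving_flag(d) for d in docs]
```

    Two things to keep in mind: `{0,5}` allows up to five words in between, which is slightly wider than the 1~4 words mentioned in the question, and `problem\W+` requires a non-word character directly after "problem", so "problems solving" is not matched.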

    Both ways work ;)
    Thanks again
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,512 RM Data Scientist
    Hi,

    I think I like your idea a bit more. It seems to be a bit faster :)

    Thanks for the message!

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany