"Using Word Vector in Model"

DuffyDuffy Member Posts: 6 Contributor I
edited June 2019 in Help

Hi

 

Issue/Problem: 

  • I have created a word vector using the "Process Documents from Data" operator.
  • I configured the operator to create Term Occurrences.
  • This generated over 3,000 attributes, each showing the number of times a word appears in an example. E.g. the word "good" appears 10 times in row 5 (so far so good).
  • I now want to select some of these words and use them as attributes when building a model.
  • I thought the Select Attributes operator would do this, but it only shows the original attributes and not the new word-vector attributes that were created.

Can someone point me to the correct operator so that I can select the word vectors I want to use?

 

Thanks

Duffy

Best Answer

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi Duffy,

     

    The problem here is metadata propagation. RM cannot predict from the metadata alone which attributes will be present after the text processing step. What you can try is to take the metadata from the last execution. To do this, choose Process->Synchronize Meta Data with Real Data and run the process once.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • DuffyDuffy Member Posts: 6 Contributor I

    Thanks Martin for your reply.

     

    I understand the problem.

    Your solution solved the problem.

    However, before marking this thread as "Solved": I would prefer to avoid the long process of generating a full word vector and just generate word occurrences for a pre-defined set of words.

    For example, I have 5 words or phrases (good, great, wonderful, bad, not good) and I want to know how frequently they are mentioned in the text.

    What operator would I use to extract this information?

     

    Duffy

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    Good question. I would build a dummy word vector on one text to get a word list. Afterwards you can plug this word list into your usual Process Documents operator to get just the 5 words you want.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
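
For anyone who wants to sanity-check the resulting counts outside RapidMiner, here is a minimal Python sketch of the same idea: count occurrences of a fixed list of words or phrases per text. It is not what the Process Documents operator does internally; the `TERMS` list, the `count_terms` helper, and the sample rows are hypothetical and only mirror the five terms from the question. Matching is assumed to be case-insensitive and on whole words.

```python
import re

# Hypothetical term list, copied from the example in the question.
TERMS = ["good", "great", "wonderful", "bad", "not good"]


def count_terms(text, terms=TERMS):
    """Return {term: number of whole-word, case-insensitive matches in text}."""
    lowered = text.lower()
    counts = {}
    for term in terms:
        # \b stops "good" from matching inside "goodness"; \s+ lets the
        # words of a phrase be separated by any run of whitespace.
        words = [re.escape(w) for w in term.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


# Made-up sample rows, one text per example.
rows = [
    "Good food, great service, not good parking.",
    "Bad, bad, bad.",
]
for row in rows:
    print(count_terms(row))
```

Note that in this sketch "good" is also counted inside "not good", so the first row reports 2 for "good" and 1 for "not good". Whether that overlap is wanted depends on how the counts are used in the model.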
  • DuffyDuffy Member Posts: 6 Contributor I

    Thanks
