"Using Word Vector in Model"

Duffy Member Posts: 6 Contributor I
edited June 2019 in Help

Hi

 

Issue/Problem: 

  • I have created a word vector using the "Process Documents from Data" operator.
  • I configured the operator to create Term Occurrences.
  • This generated over 3,000 attributes, which show the number of times each word appears in an example. E.g., the word "good" appears 10 times in row 5 (so far so good).
  • I now want to select some of these words and use them as attributes when building a model.
  • I thought the Select Attributes operator would do this, but it only shows the original attributes and not the new word vector attributes that were created.

Can someone point me to the correct operator so that I can select the word vectors I want to use?

 

Thanks

Duffy

Best Answer

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,055  RM Data Scientist
    Solution Accepted

    Hi Duffy,

     

    The problem here is metadata propagation. Based on the metadata alone, RM cannot predict which attributes will be present after the text processing step, which is why Select Attributes only offers the original ones. What you can try is to use the metadata from the last execution: choose Process->Synchronize Meta Data with Real Data and run the process once.

     

    ~Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • Duffy Member Posts: 6 Contributor I

    Thanks Martin for your reply.

     

    I understand the problem.

    Your solution solved the problem.

    However, before marking this thread as "Solved": I would prefer to avoid the long process of generating a full word vector and just generate word occurrences for a predefined set of words.

    For example, I have 5 words or phrases (good, great, wonderful, bad, not good), and I want to know how frequently they are mentioned in the text.

    What operator would I use to extract this information?

     

    Duffy

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,055  RM Data Scientist

    Hi,

     

    Good question. I would build a dummy word vector on one text to get a word list. Afterwards you can plug this word list into your usual Process Documents operator to get just the 5 words you want.
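
For anyone who wants to see the idea outside RapidMiner, here is a rough Python sketch of counting only a fixed word list per text. It is not a RapidMiner operator or API; the function name `count_terms`, the sample rows, and the simple lowercase tokenization are all assumptions made for illustration.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count occurrences of each term (a word or multi-word phrase) in text."""
    # Assumption: simple lowercase tokenization on letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Map each term's token tuple back to the term as the user wrote it.
    wanted = {tuple(t.lower().split()): t for t in terms}
    counts = Counter({t: 0 for t in terms})
    # Slide an n-token window over the text for every phrase length in use.
    for n in {len(key) for key in wanted}:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in wanted:
                counts[wanted[gram]] += 1
    return dict(counts)

# The five words/phrases from the question; the rows are made-up examples.
terms = ["good", "great", "wonderful", "bad", "not good"]
rows = [
    "The food was good, really good, but the service was not good.",
    "Great location and a wonderful view.",
]
table = [count_terms(row, terms) for row in rows]
for row_counts in table:
    print(row_counts)
```

Note that in this sketch a "not good" also counts as one occurrence of "good", because the single word is matched independently of the phrase. If that is not wanted, the phrase count would have to be subtracted from the word count.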

     

    ~Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Duffy Member Posts: 6 Contributor I

    Thanks
