top TF-IDF keyword

ahootanha Member Posts: 69 Contributor I
edited December 2018 in Help

Hello,
I want to extract the five words with the highest TF-IDF values from the output TF-IDF matrix.
How should I do this?
Thanks

Also, how do I remove the '@' and '#' characters and URLs from a sentence in RapidMiner?


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @ahootanha,

     

    To answer your first question, here is a process that does what you want (it sorts the word list by the total attribute and keeps the first five rows):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="tesla"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.000" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="34">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="380" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="648" y="85"/>
    <operator activated="true" class="sort" compatibility="8.1.000" expanded="true" height="82" name="Sort" width="90" x="782" y="85">
    <parameter key="attribute_name" value="total"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="8.1.000" expanded="true" height="82" name="Filter Example Range" width="90" x="916" y="85">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="5"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
    <connect from_op="WordList to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Sort" from_port="original" to_port="result 3"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

    Regards,

    Lionel

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you Tokenize on non-letters, all the special characters will be stripped from the resulting words that comprise the word vector.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
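
    A hedged sketch for the URL part of the question: tokenizing on non-letters strips '@' and '#', but it also breaks a link into fragments such as "https", "t" and "co", which then show up in the word list. Assuming the tweets sit in the Text attribute, as in the process above, a Replace operator placed somewhere before Process Documents from Data could delete the links first. The regular expression and the operator's position are assumptions and have not been tested on this thread's data; the connect elements of the process would also need to be rewired, which is easiest to do by dragging the operator into the canvas.

    ```xml
    <!-- Hypothetical fragment: intended to sit before "Process Documents from Data" -->
    <operator activated="true" class="replace" compatibility="8.1.000" expanded="true" name="Replace">
      <parameter key="attribute_filter_type" value="single"/>
      <parameter key="attribute" value="Text"/>
      <!-- Assumed pattern: http:// or https:// followed by everything up to the next whitespace -->
      <parameter key="replace_what" value="https?://\S+"/>
      <!-- replace_by is left out on purpose, on the assumption that an unset value replaces each match with nothing -->
    </operator>
    ```

    In the GUI this corresponds to filling in "replace what" and leaving "replace by" empty in the Replace operator's Parameters panel.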
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @ahootanha what @Telcontar120 says is true. My suggestion is to use the Specify Characters mode in the Tokenize operator to select what to split on. I do a lot of Twitter extraction and I don't want #hashtag to get wiped out by default, so I split on stuff like !.?"[ but not on #.
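
    As a rough illustration of that suggestion, the Tokenize operator inside Process Documents from Data could be configured along these lines. The exact character list is an assumption: it takes the characters mentioned above and adds a space and a comma so that ordinary words are still separated, while '#' is left out so hashtags stay intact.

    ```xml
    <!-- Hypothetical replacement for the Tokenize operator nested in "Process Documents from Data" -->
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" name="Tokenize">
      <!-- Split only on the listed characters instead of on every non-letter -->
      <parameter key="mode" value="specify characters"/>
      <!-- Assumed split set: space . , ! ? " [   (the quote is written as &quot; inside the XML attribute) -->
      <parameter key="characters" value=" .,!?&quot;["/>
    </operator>
    ```

    The same settings can be made without touching any XML: select the Tokenize operator, set "mode" to "specify characters" in the Parameters panel, and type the split characters into the "characters" field.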

  • ahootanha Member Posts: 69 Contributor I

    Hello,
    Thank you, but I am a beginner and I did not understand where to use this code.
    How do I write a regular expression in the Filter Tokens operator?
    Please guide me.
    Thanks

  • ahootanha Member Posts: 69 Contributor I

    Can you give more guidance and an example?


    @Thomas_Ott wrote:

    @ahootanha what @Telcontar120 says is true. My suggestion is to use the Specify Characters mode in the Tokenize operator to select what to split on. I do a lot of Twitter extraction and I don't want #hashtag to get wiped out by default, so I split on stuff like !.?"[ but not on #.


     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @ahootanha grab the process here: http://www.neuralmarkettrends.com/use-rapidminer-discover-twitter-content/

  • ahootanha Member Posts: 69 Contributor I

    Hello,
    Thank you very much, but I do not know where to use this code in my RapidMiner program.
    Please guide me and send me a screenshot showing how the operators are set up.
    Thanks


    @Thomas_Ott wrote:

    @ahootanha grab the process here: http://www.neuralmarkettrends.com/use-rapidminer-discover-twitter-content/


     


  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @ahootanha, welcome to the community! Some quick recommendations for you (pretty much exactly what @Thomas_Ott was recommending):


    • Post your XML process here in this thread (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on the right when you reply)
    • Attach your dataset if possible (use a fictionalized version if there are privacy concerns)
    • Make sure you have all dependent extensions installed (see https://youtu.be/pjBqG3xtXx4)

     

    Scott

  • ahootanha Member Posts: 69 Contributor I

    Hello,
    I watched the YouTube links and installed all the packages,
    but I still cannot extract the ten most frequent words from the TF-IDF matrix.
    Please guide me.
    Thanks


  • ahootanha Member Posts: 69 Contributor I

    Should I run the program after writing the XML code?
    How?

     

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @ahootanha, I really need to see your data and your XML process in order to help. Can you please post both here in this thread?

     

    Scott

     

  • ahootanha Member Posts: 69 Contributor I

    Hello,
    Thanks.
    I did not use any code.
    I just entered the data and used the Process Documents operator (TF-IDF).
    Please help me, and thank you.
