"opinion mining/sentiment analysis-rapidminer5"

lina · January 2011

Hi!
I would appreciate any information you can give me; this is really important to me.
I have created an Excel file filled with comments about a specific topic.
Now I am trying to classify these comments (they are short sentences collected from various sources on the net)
into positive, neutral, and negative.
How can I proceed?
Please note that all the comments are written in Greek. I hope that is not a problem!
Since I am new to this topic, I would be really grateful for any help.
Thanks in advance, I am looking forward to your reply!

el_chief · January 2011

See my blog: vancouverdata.blogspot.com

I have a five-part video series on text mining, including how to do classification (sentiment analysis) in the 5th part.

Good luck,

neil

lina · February 2011

Neil McGuigan wrote:

See my blog: vancouverdata.blogspot.com

I have a five-part video series on text mining, including how to do classification (sentiment analysis) in the 5th part.

Good luck,

neil

Thank you very much, Neil! I'm going to visit your blog and watch the videos!

lina · March 2011

Hi!
I'm still working on opinion mining, but I have a few problems.
I have watched the videos from vancouverdata. By the way, I found them really helpful, thanks Neil!
First of all, the language I use is Greek, so I want to create a text file for the Filter Stopwords operator. Does anyone know what the file should look like? I've created one like this: "word1|word2....", but unfortunately it is not recognized. Any ideas, please?
Also, there is no stem operator for my language. How can I create one, as it seems to be very important?
Apart from these problems, I have followed the method shown in the 5th part of the video series, but I ran into a problem there too. The Naive Bayes operator reports: "cannot check whether input example set has special attribute "label""
What about this? Should I specify a label or an attribute in the file I use? Specifically, I use an Excel file instead of the database used in the video.
Sorry for the long post.
I'm looking forward to your answers and your help!

B_ · March 2011

Lina

Filter Stopwords by Dictionary allows you to create your own stoplist - it reads from a file that you create.

You can try using regular expressions to create a basic stemmer if the endings of Greek words are consistent for cases and gender.
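As a rough illustration of the regular-expression idea, here is a minimal standalone Python sketch of suffix stripping. The suffix list, the `min_stem` threshold, and the function names are assumptions made up for this example; it is not a linguistically complete Greek stemmer and is not part of RapidMiner.

```python
import re
import unicodedata

# Illustrative, incomplete list of common Greek endings (an assumption for
# this sketch, not a full description of Greek morphology).
SUFFIXES = ["ους", "ες", "ος", "ης", "ας", "ων", "ου", "οι", "α", "η", "ο"]

# Put longer endings first in the alternation, anchored at the end of the word.
SUFFIX_RE = re.compile(
    "(?:" + "|".join(sorted(SUFFIXES, key=len, reverse=True)) + ")$"
)


def strip_accents(word):
    """Remove accent marks so that 'καλός' and 'καλος' are treated alike."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word, min_stem=3):
    """Strip one known ending, unless the remaining stem would be too short."""
    normalized = strip_accents(word.lower())
    candidate = SUFFIX_RE.sub("", normalized)
    if len(candidate) >= min_stem:
        return candidate
    return normalized


for w in ["καλός", "καλή", "καλοί", "το"]:
    print(w, "->", stem(w))
```

The `min_stem` guard keeps very short words such as articles from being reduced to a single letter. Whether a crude rule set like this helps or hurts classification would have to be checked on the actual comments.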

Here is a simple classifier you can adapt:
http://rapid-i.com/rapidforum/index.php/topic,2993.0.html

Remove the N-Gram operator and change input to Excel. The column that contains the opinion should be set as Label in the Set Role operator.

B.
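To show why the learner asks for a label, here is a small standalone Python sketch of a word-count Naive Bayes classifier with add-one smoothing. It is not RapidMiner's implementation; the function names and the toy rows are assumptions for illustration, and English tokens are used only for readability (Greek tokens would be handled the same way).

```python
import math
from collections import Counter, defaultdict


def train(rows):
    """rows: list of (tokens, label) pairs. Returns the counts used to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in rows:
        if label is None:
            # Without a label there is nothing for the learner to learn from.
            raise ValueError("every training row needs a label")
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocab = model
    total_rows = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_rows in label_counts.items():
        score = math.log(n_rows / total_rows)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data: the second element of each pair plays the role of the
# column that gets the "label" role.
rows = [
    (["good", "service"], "positive"),
    (["bad", "service"], "negative"),
    (["ok"], "neutral"),
]
model = train(rows)
print(predict(model, ["good"]))
print(predict(model, ["bad"]))
```

The column holding positive/neutral/negative in the Excel file corresponds to the second element of each pair here; a learner that cannot tell which column that is has nothing to train against.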

lina · March 2011

Thank you so much, B. I do appreciate your help!

Filter Stopwords by Dictionary allows you to create your own stoplist - it reads from a file that you create.

I'm trying to create this file, but it is not recognized by RapidMiner. What should the format of this file be?
I've tried something like "word1|word2...", but it doesn't work! Any ideas?
Regarding the classifier and the example given, I'm going to check it out, and I hope I manage to classify my own documents!

B_ · March 2011

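In Windows it's a plain txt file with one stopword per line.

In rmstop.txt:
one
two
three

The process below reads that file in Filter Stopwords (Dictionary):
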
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
    <process expanded="true" height="386" width="413">
      <operator activated="true" class="text:create_document" compatibility="5.1.001" expanded="true" height="60" name="Create Document" width="90" x="98" y="61">
        <parameter key="text" value="Apples are green and red.&#10;Lemons are yellow.&#10;One lemon and two oranges.&#10;Three apples."/>
        <parameter key="add label" value="true"/>
        <parameter key="label_type" value="text"/>
        <parameter key="label_value" value="textlabel"/>
      </operator>
      <operator activated="true" class="text:documents_to_data" compatibility="5.1.001" expanded="true" height="76" name="Documents to Data" width="90" x="112" y="210">
        <parameter key="text_attribute" value="textlabel"/>
        <parameter key="add_meta_information" value="false"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.1.001" expanded="true" height="76" name="Process Documents from Data" width="90" x="313" y="165">
        <parameter key="vector_creation" value="Term Frequency"/>
        <list key="specify_weights"/>
        <process expanded="true" height="505" width="774">
          <operator activated="true" class="text:tokenize" compatibility="5.1.001" expanded="true" height="60" name="Tokenize" width="90" x="112" y="75"/>
          <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="5.1.001" expanded="true" height="60" name="Filter Stopwords (Dictionary)" width="90" x="514" y="75">
            <parameter key="file" value="M:\Data\rmstop.txt"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
          <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
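For Greek words, the same one-word-per-line layout as rmstop.txt can be produced from a script. The following is a minimal Python sketch, assuming UTF-8 is an acceptable encoding for the file (the encoding has to match whatever the operator is set to read, which is worth checking in its parameters). The three sample stopwords, the sample sentence, and the helper names are assumptions for illustration only.

```python
import os
import tempfile


def write_stoplist(path, words):
    """Write one stopword per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in words:
            f.write(word + "\n")


def load_stoplist(path):
    """Read the file back into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_tokens(tokens, stoplist):
    """Keep only the tokens that are not in the stoplist."""
    return [t for t in tokens if t not in stoplist]


# Hypothetical location and sample words, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "rmstop.txt")
write_stoplist(path, ["και", "το", "να"])

stoplist = load_stoplist(path)
tokens = "το προϊόν είναι καλό και φθηνό".split()
kept = filter_tokens(tokens, stoplist)
print(kept)
```

Note that a pipe-separated line such as "word1|word2" would be read here as a single entry, which is consistent with that format not being recognized as a list of separate stopwords.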

fritmore · March 2011

lina wrote:

Thank you so much, B. I do appreciate your help!

Filter Stopwords by Dictionary allows you to create your own stoplist - it reads from a file that you create.

I'm trying to create this file, but it is not recognized by RapidMiner. What should the format of this file be?
I've tried something like "word1|word2...", but it doesn't work! Any ideas?
Regarding the classifier and the example given, I'm going to check it out, and I hope I manage to classify my own documents!

Create an ASCII file with a txt or csv extension.
Sample of the file's data structure:

attrib1,attrib2,attrib3
apple,monkey,brick
orange,monkey,stick
