
"Dissertation advice for sentiment analysis."

lum-x Member Posts: 3 Contributor I
edited June 2019 in Help
Hello,

I am working on my BSc dissertation, in which I have to integrate a sentiment analysis engine into a system that gives lecturers continuous feedback from students. I am at the planning stage and am having difficulty deciding how to proceed. I would like some advice, since I am using RapidMiner and do not want to switch away from it because so far I am loving it.

During my planning I came to the conclusions below, but since I am not really proficient with RapidMiner, I need some advice before I invest more time in it.

1) If I want to use RapidMiner, it would be nice to be able to customize the algorithms:

  -RapidMiner takes a set of documents that are classified as positive or negative
  -It generates the rules from those documents, so if I do not have a good training dataset I will get bad results with new documents
  -Keeping in mind how the classifier should work:
    1-Clean data (bad characters, etc.)
    2-NLP for filtering some words (stopwords, stemming?, etc.). This is the part where I do not know how to start customizing RapidMiner.
    3-Extraction of classes and association rules that classify new documents as positive or negative

The consequence is that with RapidMiner I need to adjust task 2 (custom stop words, etc.) and, if possible, add my own classification rules. That would be nice, but I do not need anything to do with grammars, etc., because RapidMiner infers the rules automatically: if n documents containing "is" and "good" are classified as positive, then new documents containing "is" and "good" will also be classified as positive. If I change the training dataset, these rules could change.
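
The three steps above (clean, filter with a custom stop-word list, learn from labelled documents) can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration under assumptions: the stop-word list, the tiny training set, and the function names `preprocess`, `train` and `classify` are all hypothetical, and the learner is a simple Naive Bayes word-count model, not whatever RapidMiner uses internally.

```python
import math
import re
from collections import Counter

# Hypothetical custom stop-word list; extend it for your own domain.
CUSTOM_STOPWORDS = {"the", "a", "is", "was", "this", "and", "of", "to"}

def preprocess(text, stopwords=CUSTOM_STOPWORDS):
    """Steps 1 and 2: keep only letter tokens, lowercase, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]

def train(labelled_docs):
    """Step 3: count how often each token occurs in each class."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(preprocess(text))
    return word_counts, doc_counts

def classify(text, model):
    """Pick the class with the highest Naive Bayes log-probability."""
    word_counts, doc_counts = model
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        total = sum(counts.values())
        for token in preprocess(text):
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[token] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training data, for illustration only.
TRAINING = [
    ("The lecture was good and clear", "positive"),
    ("Good examples, very helpful", "positive"),
    ("The lecture was boring and confusing", "negative"),
    ("Bad slides, too fast", "negative"),
]

model = train(TRAINING)
print(classify("good and helpful lecture", model))  # positive
```

Changing `TRAINING` or `CUSTOM_STOPWORDS` changes what the model learns, which is the same dependence on the training dataset described above.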


2) From scratch. In this second case, if I prefer to use LingPipe, the steps are similar but I can manage the natural language processing at a low level:

  1-Lexical analysis of data: stopwords, step
  2-Syntactic analysis to identify categories such as verbs, nouns, etc. (English grammars are available)
  3-Entity extraction
  4-I can have my own list of "positive" verbs and "positive" adjectives (which can be expanded using WordNet) and create my own rules, or use an external library such as Weka or Mahout, customizing, at most, the input training dataset
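
The hand-made word list idea in step 4 could look roughly like the sketch below. It is plain Python rather than LingPipe, and the word lists and the names `lexicon_score` and `lexicon_label` are hypothetical; a real list would be much larger and could be expanded with WordNet synonyms.

```python
import re

# Hypothetical hand-made lexicons, for illustration only.
POSITIVE_WORDS = {"good", "clear", "helpful", "enjoy", "like"}
NEGATIVE_WORDS = {"bad", "boring", "confusing", "hate", "dislike"}

def lexicon_score(text):
    """Count positive words minus negative words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(t in POSITIVE_WORDS for t in tokens)
    negatives = sum(t in NEGATIVE_WORDS for t in tokens)
    return positives - negatives

def lexicon_label(text):
    """Turn the score into a label; ties are reported as neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_label("The examples were clear and helpful"))  # positive
print(lexicon_label("Boring and confusing"))                 # negative
```

A rule set this simple scores "not good" as positive, because it only sees the word "good"; handling such cases needs extra hand-written rules.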

I think that, given time restrictions and my level of experience, this custom solution is not suitable for now.


Best regards.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    If you have a training set, in most cases a learning algorithm will outperform custom rules, since it can also capture interactions between words. E.g. "good" is different from "not good", which simple custom rules would not catch.
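
One common way to let a learner see the difference between "good" and "not good" is to add word pairs (bigrams) as extra features. The sketch below is a minimal Python illustration of that idea, not RapidMiner code; the function names `unigrams` and `with_bigrams` are hypothetical.

```python
import re

def unigrams(text):
    """Split the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def with_bigrams(text):
    """Return single words plus adjacent word pairs joined by '_'."""
    tokens = unigrams(text)
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

print(with_bigrams("not good"))  # ['not', 'good', 'not_good']
```

With features like `not_good` available, a learning algorithm can give the pair its own (negative) weight, independent of the weight it learns for "good" alone.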

    To customize RapidMiner, you could create your own extension, provided that you know Java. Here is a whitepaper describing the basic steps: http://docs.rapid-i.com/files/howtoextend/How%20to%20Extend%20RapidMiner%205.pdf

    You should be aware, though, that extending and customizing RapidMiner also takes time: you first need to get to grips with RapidMiner's internals, and then actually implement your operators.

    Best regards,
    Marius