Text Mining --> Producing Automatic Clusters of Correlated Text
I have installed the text plugin and managed to read in my text files successfully.
My next big challenge of course is to do the actual mining operation on this word bag.
This is the scenario.
I have an input text file containing many paragraphs of comments made by people. Each person's comment/statement is one paragraph, and the paragraphs are separated by a newline (\n).
I therefore want to read in this file and then assign each paragraph to an attribute within RapidMiner. Is this possible?
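To make this first step concrete, here is a minimal Python sketch of the splitting I have in mind. It runs outside RapidMiner and is purely illustrative: the function name `split_paragraphs` and the sample comments are made up, and it assumes that blank lines between comments should simply be skipped.

```python
def split_paragraphs(raw_text):
    """Return one string per non-empty line; each line is one person's comment."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


# Hypothetical sample input: one comment per line, with a stray blank line.
sample = (
    "The delivery was late again.\n"
    "Great support team, very helpful.\n"
    "\n"
    "Delivery took far too long.\n"
)

paragraphs = split_paragraphs(sample)
for index, paragraph in enumerate(paragraphs):
    print(index, paragraph)
```

Each element of `paragraphs` is what I would like to end up with as a separate unit inside RapidMiner.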
Then, once this step is successful, I want to perform some kind of operation on each of these attributes (paragraphs) so that RapidMiner ends up creating clusters of correlated paragraphs. It should determine by itself how many distinct groups there are in the current input file, and then assign each paragraph to the cluster it thinks it belongs to.
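To show the kind of result I am after, here is a rough Python sketch. It is not how RapidMiner does it and not a recommendation. It assumes TF-IDF word weights, cosine similarity, and a simple rule that links any two paragraphs whose similarity is at least 0.3 (an arbitrary threshold I picked). The number of clusters then falls out of the data instead of being set in advance. All function names, the tiny stop-word list, and the sample comments are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "was", "my", "a", "an", "of", "to", "is"}


def tokenize(text):
    """Lower-case the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (dict of word -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n = len(documents)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is zero.
        vectors.append({
            word: count * (math.log((1 + n) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_threshold(documents, threshold=0.3):
    """Link documents whose similarity reaches the threshold (single-link
    grouping) and return one cluster label per document."""
    vectors = tfidf_vectors(documents)
    n = len(documents)
    parent = list(range(n))  # union-find: every document starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    # Renumber cluster roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


comments = [
    "The delivery was late and the package arrived damaged.",
    "Late delivery again, and the package was damaged.",
    "The support team answered my billing question quickly.",
    "Billing support answered my question quickly and politely.",
]
labels = cluster_by_threshold(comments)
print(labels)
```

Here the two delivery comments end up in one cluster and the two billing comments in another, without the number of clusters being specified. The result does depend heavily on the chosen threshold, which is part of what I need guidance on.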
Of course there is an element of pattern matching and correlation finding here, so I need some guidance with this.
I have heard of supervised and unsupervised learning, but cannot get my head around which is more suitable for this particular task.
Also, people have talked about creating a learning model beforehand on sample data. This I understand, but again, which operator(s) do I use to create this initial learning model?
It would be extremely useful if you were able to provide me with a high-level functional flow of the exact steps of processes/operators that could be performed to meet my objectives.
I understand this might be asking a lot, but I really would appreciate it. I am very excited about the prospect of using RapidMiner for many future tasks; I just need to get past this initial 'first-time new user' hurdle.