dataset and coding problem

bangbadabang · May 2009

Hi,

I have to cluster texts (course lectures) into different categories using hierarchical clustering.
My input data comes from a MySQL database.

First question: what format does my data have to be in?

I also need to connect RM to the system at the nonprofit organization I work for, so that it is triggered to run automatically after a certain period.
But I am having problems with the coding.
I have looked into the documentation, but it is really confusing.
Could you tell me which functions, and in which location, do the following:
load the data
pick the learner type
construct the output model

Thanks so much!

fischer · May 2009

bangbadabang wrote:

First question: what format does my data have to be in?

You can read the data directly from your database, e.g. using a DatabaseExampleSource. Just make sure your texts are marked as String attributes; use a Nominal2String operator for that purpose.

bangbadabang wrote:

I also need to connect RM to the system at the nonprofit organization I work for, so that it is triggered to run automatically after a certain period.
But I am having problems with the coding.
I have looked into the documentation, but it is really confusing.
Could you tell me which functions, and in which location, do the following:
load the data
pick the learner type
construct the output model

Probably the easiest method would be not to do any coding at all. Just set up a process that does the work, and use cron or whatever scheduler you will be using to trigger a call to RapidMiner <yourprocess.xml>. If that does not suit your needs, you will have to be more specific about what your problem is and what confuses you. The functions you are asking about are implemented in the respective operators' apply methods. Which operators these are depends only on your choice.
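As a rough sketch of the scheduler approach: a small Python wrapper that cron could call. It assumes the RapidMiner command-line launcher is reachable as `rapidminer` on the PATH and accepts the process file as its argument, as in the call above. The launcher name, the process file path, and the function names are placeholders for illustration, so adjust them to your installation.

```python
import subprocess

# Assumption: name or path of the RapidMiner command-line launcher.
RAPIDMINER_LAUNCHER = "rapidminer"
# Hypothetical location of the process file built in the GUI.
PROCESS_FILE = "/path/to/yourprocess.xml"


def build_command(process_file, launcher=RAPIDMINER_LAUNCHER):
    """Return the argument list for running one process file."""
    return [launcher, process_file]


def run_process(process_file, launcher=RAPIDMINER_LAUNCHER):
    """Run the process and return the launcher's exit code."""
    completed = subprocess.run(build_command(process_file, launcher))
    return completed.returncode
```

Calling `run_process(PROCESS_FILE)` at the end of the script and adding a crontab entry such as `0 2 * * * python3 /path/to/run_rm.py` (script path hypothetical) would then run the process every day at 02:00.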

Cheers,
Simon

bangbadabang · May 2009

Hello,

Sorry, Fischer, I only just got to look at this post.
Thanks a lot.

Now I'm not sure whether RM or Weka is easier to use, and the deadline is closing in.
I think RM is quite complex; just learning how to write the XML format takes some time.

I want to take input data (from any source for now), do the clustering, and be able to refer back to the model, so I need to keep the model somewhere (for example, write it to a database or a text file), then test it, etc.
There might be some threshold issues as well. From studying the tutorial, I only understand about the first half of the examples provided. How do you learn which type of input will be needed in the next phase?
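To make the workflow described here concrete (cluster the texts, stop merging at a threshold, keep the result so it can be referred to later), here is a minimal sketch in plain Python. It is not RapidMiner's implementation: it assumes a whitespace bag-of-words representation, cosine distance, and single-linkage merging, and all function names are made up for illustration.

```python
import json
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-count mapping."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two term-count mappings."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(texts, threshold):
    """Single-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters until the closest pair
    is farther apart than `threshold`. Returns lists of text indices.
    """
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        best = None  # (distance, position_a, position_b)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(cosine_distance(vectors[i], vectors[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


def model_to_json(clusters):
    """Serialize the cluster assignment to a string for a file or DB column."""
    return json.dumps({"clusters": clusters})


def model_from_json(text):
    """Restore the cluster assignment from its serialized form."""
    return json.loads(text)["clusters"]
```

The string returned by `model_to_json` can be written to a text file or stored in a database column and read back later with `model_from_json`. The threshold controls how dissimilar two groups may be and still get merged, so it needs tuning on real lecture texts.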

And:

fischer wrote:

use cron or whatever scheduler you will be using to trigger a call to RapidMiner <yourprocess.xml>.

How?

One more thing: what is a model applier?

bangbadabang · May 2009

I mean, where does the program take the XML input?

And which stop word languages does RM support?
