dataset and coding problem

bangbadabang Member Posts: 6 Contributor II
edited November 2018 in Help
Hi,

I have to cluster texts (course lectures) into different categories using hierarchical clustering.
My input data comes from a MySQL database.

First question: what format does my data have to be in?

I also need to connect RM to the system I work on (for a non-profit organization), so that it is triggered to run automatically after a certain period.
But I am having problems with the coding.
I have looked into the documentation, but it's really confusing.
Could you point me to the functions, and where they are located, that do the following:
load the data
pick the learner type
construct the output model

Thanks so much!

Answers

  • fischer Member Posts: 439 Maven
    bangbadabang wrote:

    First question: what format does my data have to be in?
    You can read the data directly from your database, e.g. using a DatabaseExampleSource. Just make sure your texts are marked as String attributes. Use a Nominal2String operator for that purpose.
    bangbadabang wrote:

    I also need to connect RM to the system I work on (for a non-profit organization), so that it is triggered to run automatically after a certain period.
    But I am having problems with the coding.
    I have looked into the documentation, but it's really confusing.
    Could you point me to the functions, and where they are located, that do the following:
    load the data
    pick the learner type
    construct the output model
    Probably the easiest method would be not to do any coding at all. Just set up a process that does the work, then use cron or whatever scheduler you prefer to trigger a call to RapidMiner <yourprocess.xml>. If that does not suit your needs, you would have to be more specific about what your problem is and what confuses you. The functions you are asking about are implemented in the apply methods of the respective operators. Which operators these are depends only on your choice.
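
    As a rough sketch of the scheduler approach, the small Python wrapper below is the kind of script cron could invoke. It only builds the call "launcher, then process XML" described above and runs it. The launcher path and the process file path are hypothetical placeholders, and the assumption that the launcher takes the process XML as its single argument comes from the description above and should be checked against your RapidMiner version.

```python
import subprocess
import sys
from typing import List

# ASSUMPTION: both paths are hypothetical placeholders; adjust them
# to wherever RapidMiner and your process file actually live.
RAPIDMINER_LAUNCHER = "/opt/rapidminer/scripts/rapidminer"
PROCESS_FILE = "/home/me/processes/cluster_lectures.xml"


def build_command(launcher: str, process_file: str) -> List[str]:
    """Return the argument list: the launcher followed by the process XML.

    ASSUMPTION: the launcher accepts the process file as its only
    argument, as in "RapidMiner <yourprocess.xml>".
    """
    return [launcher, process_file]


def run_process(launcher: str, process_file: str) -> int:
    """Run the process once and return the launcher's exit code."""
    completed = subprocess.run(build_command(launcher, process_file), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Pass the exit code on, so the scheduler can see whether the run failed.
    sys.exit(run_process(RAPIDMINER_LAUNCHER, PROCESS_FILE))
```

    If this were saved as, say, run_clustering.py (a hypothetical name), a crontab entry such as "0 2 * * * python3 /path/to/run_clustering.py" would run it every night at 02:00.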

    Cheers,
    Simon


  • bangbadabang Member Posts: 6 Contributor II
    Hello,

    Sorry, Fischer, and thanks a lot. I only just got to look at this post.

    Now I'm not sure whether RM or Weka is easier to use, and the deadline is closing in.
    I think RM is quite complex. Just learning how to write the XML format takes some time.

    I want to take data input (from any source now), do clustering, and be able to refer back to the model afterwards, so I need to keep the model somewhere (for example, write it to a database or a text file), then test, etc.
    There might be some threshold issues as well. From studying the tutorial, I only understand about the first half of the examples provided. How do you learn which type of input will be needed in the next phase?


    and
    <quote>use cron or whatever scheduler you will be using to trigger a call to RapidMiner <yourprocess.xml>. I</quote>

    how?


    One more thing: what is a model applier?
  • bangbadabang Member Posts: 6 Contributor II
    I mean, where does the program take in the XML input?

    And which stop-word languages does RM support?