"SVM or Regression from data in database - how to??"

noah977 Member Posts: 32  Guru
edited May 2019 in Help
I'm VERY new to RM.  Just installed it today :)

So far, I'm very impressed and a bit overwhelmed by all the options it has.

I was hoping someone could help me design a model/workflow in the GUI for a simple problem.

-My data is stored in MySQL.  (I do understand how to use DatabaseExampleSource to access the raw data.)

-The input is 4 columns.  The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313

I would like to use RM to create a "predictor" for this data.  Build a model based on many training examples.  One thought is regression, the other is an SVM.  I might also expand into a model with 50-60 features.  In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.

As I wrote above, I can connect to my database and select the data.  I'm not sure what to do with the data once I have it.

Any advice?

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 291  RM Product Management
    Hi Noah,
    noah977 wrote:

    I'm VERY new to RM.  Just installed it today :)

    So far, I'm very impressed and a bit overwhelmed by all the options it has.
    Congratulations on finding RM and taking your first steps. RM can be a bit overwhelming at the beginning, but once you have toyed around with it for a while and understood the general principle of how to build a process, I am sure you will appreciate the vast possibilities it offers for designing data mining processes.

    But enough of advertising .. ;)
    noah977 wrote:

    (I do understand how to use DatabaseExampleSource to access the raw data.)

    -The input is 4 columns.  The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
    Fields: ID, first_measure, Second_measure, resulting_score
    Example Data: 1, 13.5, 57.2, 6.12312313

    I would like to use RM to create a "predictor" for this data.  Build a model based on many training examples.  One thought is regression, the other is an SVM.  I might also expand into a model with 50-60 features.  In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.

    As I wrote above, I can connect to my database and select the data.  I'm not sure what to do with the data once I have it.
    The first step is to designate your ID and your resulting_score columns as special attributes, namely as (who would have thought ;)) id and label, respectively. This can be done by setting the parameters [tt]id_attribute[/tt] and [tt]label_attribute[/tt] of the [tt]DatabaseExampleSource[/tt] operator to the appropriate column names. Note that this designation can also be done separately with the [tt]ChangeAttributeRole[/tt] operator, one instance for each attribute.
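
    As a rough sketch of the separate designation, the two [tt]ChangeAttributeRole[/tt] instances might look like the fragment below. The parameter keys [tt]name[/tt] and [tt]target_role[/tt] and the operator names [tt]SetIdRole[/tt] and [tt]SetLabelRole[/tt] are assumptions for illustration, so please check them against the operator's parameter list in your RM version.

    ```xml
    <!-- Sketch only: the keys "name" and "target_role" are assumed, not confirmed in this thread -->
    <operator name="SetIdRole" class="ChangeAttributeRole">
        <parameter key="name" value="ID"/>
        <parameter key="target_role" value="id"/>
    </operator>
    <operator name="SetLabelRole" class="ChangeAttributeRole">
        <parameter key="name" value="resulting_score"/>
        <parameter key="target_role" value="label"/>
    </operator>
    ```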

    The second step is to simply place the [tt]LinearRegression[/tt] or e.g. the [tt]LibSVM[/tt] operator in the process. If you then run the process, it should give you a regression or SVM model, respectively.
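
    Put together, the two steps might look roughly like the process below. Only [tt]id_attribute[/tt], [tt]label_attribute[/tt] and the operator classes [tt]DatabaseExampleSource[/tt] and [tt]LinearRegression[/tt] come from the description above. The root operator, the connection parameters ([tt]database_url[/tt], [tt]username[/tt], [tt]query[/tt]) and all of their values are placeholders and assumptions that need to be adapted to your setup.

    ```xml
    <!-- Sketch only: root operator, connection parameters and the query are assumed placeholders -->
    <operator name="Root" class="Process" expanded="yes">
        <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
            <parameter key="database_url" value="jdbc:mysql://localhost:3306/mydb"/>
            <parameter key="username" value="myuser"/>
            <parameter key="query" value="SELECT ID, first_measure, Second_measure, resulting_score FROM my_table"/>
            <!-- Mark the special attributes directly while loading -->
            <parameter key="id_attribute" value="ID"/>
            <parameter key="label_attribute" value="resulting_score"/>
        </operator>
        <!-- The learner receives the example set and produces the regression model -->
        <operator name="LinearRegression" class="LinearRegression"/>
    </operator>
    ```

    Swapping the learner for the SVM operator should leave the rest of the process unchanged, since both learners take the same example set with id and label already assigned.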

    The task of genetic feature selection is a bit more complicated. I strongly advise you to have a look at the RM built-in tutorial (i.e. the example processes that come with RM). It also contains examples for feature selection, from which you should easily get an idea of how this works.

    Hope that helps,
    Tobias
  • noah977 Member Posts: 32  Guru
    Tobias,

    Thank you for the quick answer.

    I can't wait to get good with RM.  I see so many great possibilities!

    One additional question:  Can I specify some details about a feature?  For example, one of my features is the ID number of a category.  We keep it as an Integer in our DB.  I want to tell RM that it is not an actual number to average, etc., but just an identifier of a category.  (I guess one way would be to translate it into a string "ID-1", etc., but I was hoping there was a nicer way to do this in RM.)

    Thanks again!!!!

    -N
  • TobiasMalbrecht Moderator, Employee, Member Posts: 291  RM Product Management
    Hi Noah,
    noah977 wrote:

    Thank you for the quick answer.

    I can't wait to get good with RM.  I see so many great possibilities!
    Wow, great to see someone that eager to learn RM ... lets me answer even well outside office hours! ;)
    noah977 wrote:

    One additional question:  Can I specify some details about a feature?  For example, one of my features is the ID number of a category.  We keep it as an Integer in our DB.  I want to tell RM that it is not an actual number to average, etc., but just an identifier of a category.  (I guess one way would be to translate it into a string "ID-1", etc., but I was hoping there was a nicer way to do this in RM.)
    Nothing easier than that. Just use a [tt]Numerical2Polynominal[/tt] operator inside an [tt]AttributeSubsetPreprocessing[/tt] operator, with the attribute specified as a parameter. Here is the XML code snippet:

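    <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
        <!-- Regex selecting the attribute(s) to convert; adjust "ID" to your category column's name -->
        <parameter key="attribute_name_regex" value="ID"/>
        <parameter key="condition_class" value="attribute_name_filter"/>
        <!-- Inner operator: converts the selected numerical attribute(s) to polynominal (categorical) -->
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
    </operator>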
    Regards,
    Tobias