"SVM or Regression from data in database - how to??"

noah977 · November 2008

I'm VERY new to RM. Just installed it today

So far, I'm very impressed and a bit overwhelmed by all the options it has.

I was hoping someone could help me design a model/workflow in the GUI for a simple problem.

-My data is stored in MySQL (I do understand how to use DatabaseExampleSource to access the raw data)

-The input is 4 columns. The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313

I would like to use RM to create a "predictor" for this data. Build a model based on many training examples. One thought is regression, the other is an SVM. I might also expand into a model with 50-60 features. In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.

As I wrote above, I can connect to my database and select the data. I'm not sure what to do with the data once I have it.

Any advice?

TobiasMalbrecht · November 2008

Hi Noah,

noah977 wrote:

I'm VERY new to RM. Just installed it today

So far, I'm very impressed and a bit overwhelmed by all the options it has.

Congratulations on discovering RM and taking the first steps. Of course, RM is a bit overwhelming at the beginning, but once you have toyed around for a while and understood the general principle of how to build a process, I am sure you will highly appreciate the vast possibilities RM offers for designing data mining processes.

But enough advertising...

noah977 wrote:

(I do understand how to use DatabaseExampleSource to access the raw data)

-The input is 4 columns. The first is a unique ID, the next 2 are various features (numbers), the last column is the result.
Fields: ID, first_measure, Second_measure, resulting_score
Example Data: 1, 13.5, 57.2, 6.12312313

I would like to use RM to create a "predictor" for this data. Build a model based on many training examples. One thought is regression, the other is an SVM. I might also expand into a model with 50-60 features. In that case, it would be nice to use some kind of genetic algorithm to learn the best features and correlation for the most accurate prediction.

As I wrote above, I can connect to my database and select the data. I'm not sure what to do with the data once I have it.

The first step of your task is to designate your ID and your resulting_score as special attributes, namely as (who would have thought) id and label, respectively. This can be done by setting the parameters [tt]id_attribute[/tt] and [tt]label_attribute[/tt] of the [tt]DatabaseExampleSource[/tt] operator to the appropriate column names. Note that this designation can also be done separately with the operator [tt]ChangeAttributeRole[/tt], one for each attribute.
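A rough sketch of the [tt]ChangeAttributeRole[/tt] route, using the column names from the question. The parameter keys [tt]name[/tt] and [tt]target_role[/tt] are assumptions here and should be checked against the operator's parameter list in your RM version:

```xml
<!-- Sketch only: parameter keys "name" and "target_role" are assumed, verify in the GUI -->
<!-- Mark the ID column as the example id -->
<operator name="SetIdRole" class="ChangeAttributeRole">
    <parameter key="name" value="ID"/>
    <parameter key="target_role" value="id"/>
</operator>
<!-- Mark the result column as the label to be predicted -->
<operator name="SetLabelRole" class="ChangeAttributeRole">
    <parameter key="name" value="resulting_score"/>
    <parameter key="target_role" value="label"/>
</operator>
```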

The second step is to simply place the [tt]LinearRegression[/tt] or e.g. the [tt]LibSVM[/tt] operator in the process. If you then run the process, it should give you a regression or SVM model, respectively.
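Put together, the two steps might look roughly like the process below. This is a sketch under assumptions, not a tested process: the database URL, user name, and table name [tt]measurements[/tt] are hypothetical placeholders, and the parameter keys other than [tt]id_attribute[/tt] and [tt]label_attribute[/tt] (as well as the root class [tt]Process[/tt]) should be verified against your RM installation:

```xml
<!-- Sketch only: connection details and table name are hypothetical -->
<operator name="Root" class="Process" expanded="yes">
    <!-- Step 1: read the data and assign the id and label roles -->
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://localhost:3306/mydb"/>
        <parameter key="username" value="myuser"/>
        <parameter key="query" value="SELECT ID, first_measure, Second_measure, resulting_score FROM measurements"/>
        <parameter key="id_attribute" value="ID"/>
        <parameter key="label_attribute" value="resulting_score"/>
    </operator>
    <!-- Step 2: learn a regression model from the labeled examples -->
    <operator name="LinearRegression" class="LinearRegression">
    </operator>
</operator>
```

To try an SVM instead, the learner operator in step 2 would be swapped for the SVM operator.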

The task of genetic feature selection is a bit more complicated. I strongly advise you to have a look at the RM built-in tutorial (i.e. the example processes that come with RM). These include examples of feature selection, which should give you a good idea of how it works.
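For orientation, such a feature selection process is typically a wrapper: a genetic search operator repeatedly evaluates attribute subsets by cross-validating a learner on them. The sketch below shows that nesting only. The operator class names ([tt]GeneticAlgorithm[/tt], [tt]XValidation[/tt], [tt]OperatorChain[/tt], [tt]ModelApplier[/tt], [tt]RegressionPerformance[/tt]) and the performance parameter key are assumptions that should be compared with the example processes shipped with your RM version:

```xml
<!-- Sketch only: class names and parameter keys are assumed, compare with the bundled examples -->
<!-- Place this after the operator that loads the labeled data -->
<operator name="GeneticAlgorithm" class="GeneticAlgorithm" expanded="yes">
    <!-- Each candidate feature subset is scored by cross-validation -->
    <operator name="XValidation" class="XValidation" expanded="yes">
        <!-- Training part: learn a model on the training folds -->
        <operator name="Learner" class="LinearRegression">
        </operator>
        <!-- Testing part: apply the model and measure the prediction error -->
        <operator name="ApplierChain" class="OperatorChain" expanded="yes">
            <operator name="Applier" class="ModelApplier">
            </operator>
            <operator name="Evaluator" class="RegressionPerformance">
                <parameter key="root_mean_squared_error" value="true"/>
            </operator>
        </operator>
    </operator>
</operator>
```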

Hope that helps,
Tobias

noah977 · November 2008

Tobias,

Thank you for the quick answer.

I can't wait to get good with RM. I see so many great possibilities!

One additional question: Can I specify some details about a feature. For example, one of my features is the ID number of a category. We keep it as an Integer in our DB. I want to tell RM that it is not an actual number to average, etc, but just an identifier of a category. (I guess one way would be to translate it into a string "ID-1", etc. but I was hoping there was a nicer way to do this in RM.)

Thanks again!!!!

-N

TobiasMalbrecht · November 2008

Hi Noah,

noah977 wrote:

Thank you for the quick answer.

I can't wait to get good with RM. I see so many great possibilities!

Wow, great to see someone that eager to learn RM ... lets me answer even well outside office hours!

noah977 wrote:

One additional question: Can I specify some details about a feature. For example, one of my features is the ID number of a category. We keep it as an Integer in our DB. I want to tell RM that it is not an actual number to average, etc, but just an identifier of a category. (I guess one way would be to translate it into a string "ID-1", etc. but I was hoping there was a nicer way to do this in RM.)

Nothing easier than that. Just use a [tt]Numerical2Polynominal[/tt] operator inside an [tt]AttributeSubsetPreprocessing[/tt] operator, with the attribute specified as a parameter. Here is the XML code snippet:


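<!-- Apply the inner operator only to attributes whose name matches the regex -->
<operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
    <!-- "ID" stands for the name of the category attribute to convert -->
    <parameter key="attribute_name_regex" value="ID"/>
    <parameter key="condition_class" value="attribute_name_filter"/>
    <!-- Convert the selected numerical attribute into a nominal one -->
    <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
    </operator>
</operator>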

Regards,
Tobias
