
# Getting started with my data set in RM 5.0

Hi all,

I'm new to RapidMiner and data mining (although I've done what, in retrospect, was some very basic data mining in the past). I do have some university level statistics under my belt, but that is about it.

I've created a data set that I would like to work on. In general terms, I have two numeric inputs which more or less follow a linear regression. As for the "more or less": I have a handful of non-numeric categorizations associated with each data pair on the linear regression. I suspect these non-numerics will explain some of the directional wobble around the regression line (if that makes sense), and so I would like to run some data mining trials against the data.

Now, from what I can understand, these non-numeric attributes are 'polynominal' according to RapidMiner, so I am having a difficult time finding a mining function that works with the data set I've described. What are some good options for me to start with?

Thanks in advance.

## Answers

I came to RapidMiner primarily because it provided a nice environment for testing Support Vector Machines against large stacks of data. Why SVMs? Partly because of their speed compared to induction or neural nets, partly because they avoid the dreaded neural local pothole problem, and partly because they are like Swiss Army knives and can handle just about any combo of data types. The weird thing is that it worked as I had planned, because well-tuned SVMs are competitive, and because RM enables testing harnesses to be implemented quickly, even by mental midgets such as myself.

You could transform the polynominal attributes to binominal attributes with the polynominal to binominal operator. That turns them into binary 0/1 coded attributes that can be used by numerical methods like SVMs. This is a common way to handle such attributes.
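To make the 0/1 coding concrete, here is a minimal sketch of the idea in plain Python. It is not RapidMiner code and does not reproduce the operator's exact output; the function name `binary_code`, the column names, and the sample values are all made up for illustration. Each distinct value of a nominal column becomes its own column that holds 1 when the row has that value and 0 otherwise.

```python
# Illustration only: plain Python, not RapidMiner. The function name,
# column names, and sample values are hypothetical.

def binary_code(rows, nominal_columns):
    """Replace each nominal column with one 0/1 column per distinct value."""
    # Collect the distinct values of each nominal column, sorted so the
    # generated columns always appear in a reproducible order.
    levels = {
        col: sorted({row[col] for row in rows})
        for col in nominal_columns
    }
    coded = []
    for row in rows:
        # Keep the numeric columns unchanged.
        new_row = {k: v for k, v in row.items() if k not in nominal_columns}
        # Add one 0/1 indicator column per distinct nominal value.
        for col in nominal_columns:
            for level in levels[col]:
                new_row[f"{col} = {level}"] = 1 if row[col] == level else 0
        coded.append(new_row)
    return coded


# Two numeric inputs plus one non-numeric categorization per data pair.
rows = [
    {"x": 1.0, "y": 2.1, "region": "north"},
    {"x": 2.0, "y": 3.9, "region": "south"},
    {"x": 3.0, "y": 6.2, "region": "east"},
]

for coded_row in binary_code(rows, ["region"]):
    print(coded_row)
```

After this step every column is numeric, so the table can be handed to a numerical learner such as a regression or an SVM.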

Greetings,

Sebastian