"Samples for X-Prediction?"

mstuebner · August 2010

Hello,

I'm quite new to RapidMiner, so I still struggle with some details, e.g. X-Prediction. The German manual in particular starts to describe a very useful example in which a label/target variable is known for some records and is to be predicted for the rest.

Unfortunately, the description of this scenario breaks off before giving any details, e.g. which operator to use.

I think that X-Prediction is the right operator, but I didn't find any samples of how to use it. I assume that the training and test parts of an X-Prediction need to be filled with some blocks/operators by the user?

I would think that predicting a feature for records based on a given labeled subset is a very common scenario. Can someone point me to some samples or websites with information that can guide me further?

Thanks in advance,
Matthias

haddock · August 2010

Greets, and welcome!

In your position I'd check out the videos on the main website http://rapid-i.com/content/view/189/198/, and then work through the examples in Help->Tutorials (which cover XVal).

mstuebner · August 2010

haddock wrote:

Greets, and welcome!

In your position I'd check out the videos on the main website http://rapid-i.com/content/view/189/198/, and then work through the examples in Help->Tutorials (which cover XVal).

Thanks for the welcome. What you recommend is exactly what I'm doing right now. I found that X-Validation is partly covered by the examples, but X-Prediction isn't. Or am I wrong to be looking at X-Prediction at all?

In the end I need to predict values for records based on records I measured before. Is there a better way?
Besides, as my background isn't statistics but the telecoms industry, are there some pointers to material that would improve my background knowledge?

Thanks,
Matthias

fischer · August 2010

Hi,

The X-Prediction won't help if you want to make predictions on unlabeled data. For that, you only need an "Apply Model" operator. We offer a variety of training courses at http://rapid-i.com/component/option,com_virtuemart/Itemid,180/lang,de/vmcchk,1/.
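For readers who want to see the idea behind "train on the labeled records, then apply the model to the unlabeled ones" in code, here is a minimal sketch in plain Python. It is not RapidMiner code: the 1-nearest-neighbour learner is just a stand-in for whatever learner would sit in front of "Apply Model", and the helper names (train_1nn, apply_model) and the toy records are hypothetical.

```python
# Minimal sketch: train on labeled records, then apply the model to unlabeled ones.
# Plain-Python stand-in for "learner + Apply Model"; names and data are hypothetical.
from math import dist


def train_1nn(records):
    """'Train' a 1-nearest-neighbour model: it simply memorizes the labeled records."""
    return [(r["x"], r["label"]) for r in records]


def apply_model(model, records):
    """Predict a label for each record by copying the label of its closest training point."""
    predictions = []
    for r in records:
        _, label = min(model, key=lambda pair: dist(pair[0], r["x"]))
        predictions.append({**r, "prediction": label})
    return predictions


# Toy data: two attributes per record, label None means "not yet labeled".
data = [
    {"x": (1.0, 1.0), "label": "low"},
    {"x": (1.2, 0.8), "label": "low"},
    {"x": (5.0, 5.5), "label": "high"},
    {"x": (4.8, 5.1), "label": "high"},
    {"x": (1.1, 0.9), "label": None},
    {"x": (5.2, 4.9), "label": None},
]

# Train only on records that have a label, score only the ones that don't.
labeled = [r for r in data if r["label"] is not None]
unlabeled = [r for r in data if r["label"] is None]

model = train_1nn(labeled)
scored = apply_model(model, unlabeled)
for r in scored:
    print(r["x"], r["prediction"])
```

The key design point is that the unlabeled records never take part in training; they only receive predictions.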

Best,
Simon

mstuebner · August 2010

Simon Fischer wrote:

The X-Prediction won't help if you want to make predictions on unlabeled data. For that, you only need an "Apply Model" operator. We offer a variety of training courses at http://rapid-i.com/component/option,com_virtuemart/Itemid,180/lang,de/vmcchk,1/.

I'm aware of these training courses. Actually, I'm trying to get RapidMiner into our software pool for data analysis, so it is a chicken-and-egg situation: either I demonstrate that the software does what we need, or we pay for a training first (which will not happen).

What I still do not fully understand: if I have a number of records where only a few already have a label, which model fits best? Most of the ones I tried cannot work with missing values. (What would the label of those unlabeled records be? If I set it to some value using "Replace Missing Values", wouldn't that change the result of the model?)

(As far as I understand the mechanism, an SVM, for example, takes the attributes and tries to find a formula that reproduces the given label as well as possible. If that is true, filling in a random or average value for the label to avoid missing values would change the formula, wouldn't it?)

br Matthias

fischer · August 2010

Hi,

having missing values is different from having missing labels. Missing regular values can easily be replaced, but replacing a missing label does not make a lot of sense. Probably you want to filter them out.
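The distinction can be sketched in a few lines of plain Python: records with a missing label are filtered out of the training data, while missing values in regular attributes are replaced by that attribute's mean (one common choice, assumed here; it is not necessarily what "Replace Missing Values" does with its default settings). The function names and toy rows are hypothetical.

```python
# Sketch: filter out missing labels, replace missing regular values by the attribute mean.
# Hypothetical helper names and toy data; mean imputation is an assumed strategy.
from statistics import mean

rows = [
    {"a": 1.0, "b": 10.0, "label": "yes"},
    {"a": None, "b": 12.0, "label": "no"},
    {"a": 3.0, "b": None, "label": "yes"},
    {"a": 4.0, "b": 14.0, "label": None},  # no label: not usable for training
]


def filter_missing_labels(rows, label="label"):
    """Keep only records whose label is known."""
    return [r for r in rows if r[label] is not None]


def replace_missing_with_mean(rows, attributes):
    """Replace None in each regular attribute by the mean of the known values."""
    means = {a: mean(r[a] for r in rows if r[a] is not None) for a in attributes}
    return [
        {**r, **{a: means[a] if r[a] is None else r[a] for a in attributes}}
        for r in rows
    ]


# Filter first, so the means are computed on the records actually used for training.
training = filter_missing_labels(rows)
training = replace_missing_with_mean(training, ["a", "b"])
for r in training:
    print(r)
```

Note that the label itself is never imputed; only the regular attributes are.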

Sorry for recommending things that cost money, but I don't believe this forum is the right place to search for the answers you need at this point in time. The questions you are asking require some understanding of and experience in data mining and cannot be answered in a single post. Your question in parentheses confirms this assumption.

Just so you don't get the impression I am trying to withhold the answer to your question: The answer is "Yes", but that is already all I can say without knowing more about your problem and the data, so it is probably of zero use for you.
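The "Yes" can be illustrated with a toy example. The sketch below uses ordinary least squares on one attribute instead of an SVM (a deliberate simplification), with made-up data: the labeled records follow y = 2x exactly, and two unlabeled records get the average label filled in before refitting.

```python
# Sketch: filling missing labels with the average label changes the fitted model.
# Ordinary least squares is used instead of an SVM for simplicity; data is made up.


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Labeled records: the label follows y = 2x exactly.
xs_labeled = [1.0, 2.0, 3.0, 4.0]
ys_labeled = [2.0, 4.0, 6.0, 8.0]
# Records whose label is unknown.
xs_unlabeled = [10.0, 12.0]

_, slope_clean = fit_line(xs_labeled, ys_labeled)

# "Fill" the missing labels with the average known label, then refit on everything.
avg_label = sum(ys_labeled) / len(ys_labeled)
_, slope_imputed = fit_line(
    xs_labeled + xs_unlabeled,
    ys_labeled + [avg_label] * len(xs_unlabeled),
)

print(f"slope without imputed labels: {slope_clean:.3f}")
print(f"slope with average labels:    {slope_imputed:.3f}")
```

The slope drops from 2 to roughly 0.1: the invented labels pull the model towards the average and away from the relationship present in the real data.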

Best,
Simon

mstuebner · August 2010

Simon Fischer wrote:
Just so you don't get the impression I am trying to withhold the answer to your question: The answer is "Yes", but that is already all I can say without knowing more about your problem and the data, so it is probably of zero use for you.

What else would you need to know? As in the opening post: the example the German manual for version starts with is exactly what I'm looking for; that's why I was so excited about that manual. Unfortunately, the example wasn't carried through to its practical part. Is there any document that covers that example or a similar one? I like to work things out myself, but some pointers are always welcome.

As I surely lack some of the statistics background: is there some guidance on which model to use for which scenario, or is it more like "you have to know it yourself"?

Thanks for your time,
Matthias

fischer · August 2010

mstuebner wrote:

What else would you need to know? As in the opening post: the example the German manual for version starts with is exactly what I'm looking for,

I don't think so. The fact that you have unlabelled examples makes me believe you are trying to do something else. Maybe you want to split your data into training and test data. Or you want a transductive learner. Maybe you want clustering rather than classification. These are all things I don't know.
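Splitting labeled data into training and test data, one of the options mentioned here, can be sketched as a simple random holdout split in plain Python. The function name, the 30% test fraction and the fixed seed are assumptions chosen for the example, not RapidMiner defaults.

```python
# Sketch: random holdout split of labeled records into training and test sets.
# Hypothetical helper; the 30% test fraction and seed are illustrative choices.
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and cut off the first part as the test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))  # stand-ins for ten labeled records
train_set, test_set = train_test_split(records)
print("train:", train_set)
print("test: ", test_set)
```

The model is then trained on train_set only and evaluated on test_set, whose labels are known but hidden from the learner; that is what distinguishes a test set from genuinely unlabeled data.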

mstuebner wrote:

As I surely lack some of the statistics background: is there some guidance on which model to use for which scenario, or is it more like "you have to know it yourself"?

There are rules of thumb, and there are also methods to find that out more or less automatically. It's similar when dealing with missing values.
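One of the more automatic methods is to compare candidate models by cross-validation (the idea behind X-Validation) and keep the one with the lowest estimated error. The sketch below is plain Python, not RapidMiner; the two candidate models (predict the mean vs. fit a least-squares line), the interleaved fold assignment and the toy data are all assumptions for illustration.

```python
# Sketch: choose between two models by k-fold cross-validated mean squared error.
# Candidate models, fold scheme and data are illustrative assumptions.
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests every k-th index starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test


def fit_mean(xs, ys):
    """Baseline model: always predict the mean training label."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=5):
    """Average squared error on held-out folds, training on the remaining folds."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        errors.extend((model(xs[j]) - ys[j]) ** 2 for j in test)
    return mean(errors)


# Toy data: roughly linear with a small alternating offset.
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + (0.5 if int(x) % 2 else -0.5) for x in xs]

scores = {name: cv_mse(fit, xs, ys) for name, fit in [("mean", fit_mean), ("line", fit_line)]}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Because every record is used for testing exactly once, the comparison does not depend on a single lucky or unlucky split.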

Best,
Simon

mstuebner · August 2010

Simon Fischer wrote:

I don't think so. The fact that you have unlabelled examples makes me believe you are trying to do something else. Maybe you want to split your data into training and test data. Or you want a transductive learner. Maybe you want clustering rather than classification. These are all things I don't know.

There are rules of thumb, and there are also methods to find that out more or less automatically. It's similar when dealing with missing values.

Thank you very much for your time and for confirming that such rules exist. I will try to find them. You were a great help.


Altair RapidMiner Community
