joining multiple result sets (example sets) and exporting them to a file at once

d1m0s · June 2010

Hi guys,

I have N variables with their historical evolution...for each variable I build a linear/quantile regression with several predictors (day of week, day of month, holiday etc) to forecast daily values for those N variables one month ahead.

In the end I get N example sets (predicted) with the same number of rows/examples and the same ID...I would like to join them all at once and write them to an Excel file. The problem is that the Join operator supports only 2 example sets, while I need to join N at once.

Is there a way to do this without using join iteratively N-1 times?

Thanks a lot in advance

Here is a rough scheme of what I want to do:

land · June 2010

Hi,
No, this operator does not exist right now. But I think for n=4 it should be possible to insert n-1=3 Join operators, shouldn't it?
If n becomes bigger, you could build a combination of some operators to join several sets. If you store this operation in a building block, it's nearly as easy as having a special operator for this.
We might add such an operator, but it's not planned so far.

Greetings,
Sebastian
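
[Editor's sketch] The "n-1 joins" idea above can be illustrated outside RapidMiner. This is only a plain-Python sketch of the logic, not a RapidMiner operator or API: the function names `join_on_id` and `join_all` and the sample values are hypothetical, and each example set is assumed to be a dict mapping a shared ID to a row of attributes.

```python
from functools import reduce


def join_on_id(left, right):
    """Inner-join two tables (dicts mapping ID -> row dict) on their shared IDs."""
    return {
        key: {**left[key], **right[key]}
        for key in left
        if key in right
    }


def join_all(tables):
    """Join N tables by folding the pairwise join over the list (N-1 joins)."""
    return reduce(join_on_id, tables)


# Hypothetical predictions for three variables, all keyed by the same ID.
pred_a = {1: {"pred_a": 10.0}, 2: {"pred_a": 11.5}}
pred_b = {1: {"pred_b": 3.2}, 2: {"pred_b": 2.9}}
pred_c = {1: {"pred_c": 7.0}, 2: {"pred_c": 7.4}}

joined = join_all([pred_a, pred_b, pred_c])
```

The list passed to `join_all` can hold 4 or 30 tables without any change to the code, which is the same effect a reusable building block of chained joins would give. Note that `reduce` raises a `TypeError` on an empty list, so at least one table is assumed.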

d1m0s · June 2010

Thanks Sebastian,

In fact there are about 20-30 of them, I put 4 just for simplicity.

This operator is coming from real life needs, could be very useful:)

Do you have any examples of using blocks? a screenshot maybe?

haddock · June 2010

Hi,

Looking at the pictures it looks as though you want to try linear regression with the same attributes, but against different labels; if that is the case then you can rather simply loop through labels, like this...

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="235" width="547">
      <operator activated="true" class="generate_multi_label_data" expanded="true" height="60" name="Generate Multi-Label Data" width="90" x="45" y="30"/>
      <operator activated="true" class="loop_labels" expanded="true" height="76" name="Loop Labels" width="90" x="246" y="30">
        <process expanded="true" height="403" width="890">
          <operator activated="true" class="decision_tree" expanded="true" height="76" name="Decision Tree" width="90" x="283" y="29"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="445" y="27">
            <list key="application_parameters"/>
          </operator>
          <connect from_port="example set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select" expanded="true" height="60" name="Select" width="90" x="447" y="30">
        <parameter key="unfold" value="true"/>
      </operator>
      <connect from_op="Generate Multi-Label Data" from_port="output" to_op="Loop Labels" to_port="example set"/>
      <connect from_op="Loop Labels" from_port="out 1" to_op="Select" to_port="collection"/>
      <connect from_op="Select" from_port="selected" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

d1m0s · June 2010

Thanks,

Do you have a screenshot? My RM doesn't generate the process diagram for this XML (I'm not good at reading XML).

It's ok. I've got the diagram.

haddock · June 2010

Mmm,

If I copy all of my XML code into my XML tab and press the green tick to parse it up all works fine; posting pix is almost useless for debug purposes.

d1m0s · June 2010

Thanks, pasting the XML didn't work, I don't know why. Usually it works.

I created an XML file in Notepad and then used the import functionality of RM. It worked.

I'll try to modify my process according to this method that you proposed. This Loop Labels operator is too tough:) Many many thanks again:)

haddock · June 2010

Cool, hope it goes well

RM is good in many many ways, but documentation is not one of them, so finding your way around takes time. I wish they would understand that beginners will judge the product by the manual, because that is what they need most when they begin.

d1m0s · June 2010

IMHO they intentionally don't document properly to push their paid services

the learning curve is just too steep

haddock · June 2010

You could well be right, on the other hand they need to earn a crust. The question is whether this strategy is better for them. There could well be potential paying customers who are put off by the feeling that they would be on the hook - they would only have to look at this forum to see how difficult some people find even the most obvious tasks.

Underneath this there are generic stresses caused by the transition of Machine Learning in the uni seminar to Datamining in the accounts department - it would be difficult to cater for the entire spectrum of ignorance. But not impossible! Surely some decision tree could be induced to assist ;D but only if someone can be bothered I guess, or someone dips into their pocket..

But now it is lunchtime here in southern France, and it is Friday, which in England is dedicated to poets...

( P#ss Off Early Tomorrow's Saturday )

land · June 2010

Hi folks,
as you have probably noticed, we do in fact read the entire forum (although, thanks to community members like haddock, we don't have to answer every single post). And as shown on each startup of RapidMiner, we do have a manual. Currently it's stuck in the translation to English, but for all German users it's already available, and it's free.
Of course it cannot give you a detailed step-by-step explanation for every single case that might occur in real-life projects. We have around 700 operators, we cannot describe every single combination! And nobody would like to read such a manual, I think.

So we have decided to go the following way:
The paper manual and the video tutorials are for beginners. They introduce the basics of RapidMiner and are meant to help you climb the steep learning curve.
After you have become familiar with the concepts of RapidMiner, you can read the operator documentation to learn more about each operator. And if you have a detailed problem in process design: you are welcome in the forum, where you will get highly qualified responses from the people responsible for that part of the program, who are experts in the data mining domain. This is a service you will not even get as a customer of a bigger software company. And here you get it for free...
The third and last pillar of our support is the paid courses, for people who want to get all the knowledge in a few days without digging for it on their own.

I know you will probably point out that the operator documentation is sometimes a mess. We already tried to clean this up and improve the documentation, but it took too much time. Since we don't get paid for this and "need to earn a crust", we had to stop this effort. But we are now going to launch another project aimed at this weakness: we will create a Wiki where the operator documentation will be stored. We are hoping the community will help us improve the documentation and share their knowledge with other users.

I hope this gives you a better understanding of what we are trying to achieve, one more accurate than the simple assumption that we only want to push our paid service.

Which of course isn't wrong at all, but we regard our free services as our best advertisement. Just imagine getting even more detailed answers than here in the forum just five minutes after the problem has occurred! That can save valuable work time.

Greetings,
Sebastian

haddock · June 2010

Bonjour Seb,

Unfortunately your post got me thinking, so the following is all your fault.

Let's be brutally honest, you'd have to stretch the imagination to claim that the users of this forum form a community. It serves a different role in practice, one of unpaid support.

As you know, I'm pig ignorant on most things, so I read around, and I came across this...

http://www.drdobbs.com/open-source/222002070

according to which you could quite reasonably charge for posting questions, once each operator gets an adequate entry in OperatorsCoreDocumentation.xml and a usage example. People are quite happy to pay for service, and even happier not to pay for it.

Either way we should all acknowledge that you, as a company, contribute good support and get precious little in return.
