"Read CSV to example set"

Monaco Member Posts: 5 Contributor II
edited June 12 in Help
Hi,

I'm just beginning to experiment with RapidMiner and am having trouble with the "Read CSV" operator.
I can send the data to the res port (and see the ExampleSet), but when other operators require an example set as input, no data is available. Is this a limitation of Read CSV, or is there a way to make the data available as an example set?
Regards.

Answers

  • haddock Member Posts: 849  Guru
    Hi, and welcome!

    Start RapidMiner and go to Help->Tutorial; that will load runnable examples, so you get some idea of what RM can and cannot do. Believe me, it saves time in the long run!

  • colo Member Posts: 236  Guru
    Hi Monaco,

    if your operator provides an example set to the results port of the process, it will do the same for other operators. Did you check the connection from the output port of "Read CSV" to the input port of the following operator? Perhaps you might want to post your process (code from XML tab) here to reveal possible mistakes in process design.

    Regards
    Matthias
  • Monaco Member Posts: 5 Contributor II
    Hi Colo,

    Many thanks for your quick reply.
    Here is the code (nothing fancy). It doesn't work with Read CSV but works well with Read Excel or Retrieve.
    When the file that has been stored as a Data Table in the repository is modified, do you know how to update this Data Table automatically?

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="426" width="673">
          <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
            <parameter key="csv_file" value="D:\Data.csv"/>
            <parameter key="date_format" value="yyyyMMdd"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="locale" value="French"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.date.id"/>
              <parameter key="1" value="Data.true.integer.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="179" y="30">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="1"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="Data"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Monaco Member Posts: 5 Contributor II
    haddock wrote:

    Hi, and welcome!

    Start RapidMiner and go to Help->Tutorial; that will load runnable examples, so you get some idea of what RM can and cannot do. Believe me, it saves time in the long run!


    Hi Haddock,

    Thank you for your insight. I studied this tutorial last week, and the resource is indeed amazingly powerful and educational. But I haven't found an answer to my current problem. I've posted the code, but I don't think it will help. You can try it yourself with a very simple CSV file: when you hover the mouse cursor over the operator output, it indicates "number of examples=-1".
    Regards
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643  RM Founder
    Ahem, only a quick question: did you actually execute the process (i.e. press the "Play" icon in the toolbar)? Does it work then?

    Cheers,
    Ingo
  • Monaco Member Posts: 5 Contributor II
    Hi Ingo,

    When I execute the process, it works fine for displaying the data (even though the number of examples is shown as -1). But when I add a Windowing operator, which requires the number of examples to be greater than the horizon (set to 1), it fails.
    Cheers
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643  RM Founder
    Ok, then try the following:

    1. Load the data with "Read CSV", add a "Store" operator and save the data set directly in your repository.
    2. Drag the freshly saved data from your repository into the process (it will be transformed into a new operator named "Retrieve", which will load the data for you from the repository).

    Try again with this data set loaded with "Retrieve". Expected behaviour: everything works as expected. Reason for your confusion: search the forum for "Repository" and "meta data". Best solution for you: book a training at Rapid-I - it will definitely help  :D
    This would probably also be the best option if you do not know what I mean by "Repository"  ;D
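
    Step 1 above could look roughly like the following process XML. This is only a sketch, not a process posted in this thread: the repository path "//Local Repository/data/Data" is a placeholder, the "store" operator class with its "repository_entry" parameter and "input"/"through" ports is assumed to match RapidMiner 5.1, and the Read CSV parameters are abbreviated from the process posted earlier.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true">
          <!-- Read the CSV file; add the remaining Read CSV parameters from the original process here -->
          <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" name="Read CSV">
            <parameter key="csv_file" value="D:\Data.csv"/>
          </operator>
          <!-- Write the example set to the repository; the entry path below is a placeholder -->
          <operator activated="true" class="store" compatibility="5.1.006" expanded="true" name="Store">
            <parameter key="repository_entry" value="//Local Repository/data/Data"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    Re-running such an import process whenever the CSV file changes should overwrite the stored entry, so a "Retrieve" operator pointing at the same entry would then pick up the fresh data together with its meta data.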

    Cheers,
    Ingo

    P.S. (for the more experienced readers here...): I never expected that this - definitely very unique and innovative - feature of RapidMiner called "meta data propagation" would cause so much uncertainty for some users. I am open to all suggestions on how we could make the difference between "meta data" and "actual data" clearer, and explain why it is sometimes impossible to provide meta data (like for CSV files...)
  • colo Member Posts: 236  Guru
    Hi Monaco,

    just to be sure... you didn't use the "Window Document" operator after "Read CSV", did you? Which operators did you try?
    I hoped you would post your process with this second operator to reveal possible problems ;)

    Regards
    Matthias
  • Monaco Member Posts: 5 Contributor II
    Hey Ingo,

    Just read your post at http://rapid-i.com/rapidforum/index.php/topic,2902.msg11559.html#msg11559
    Frequent updates to my CSV files are the reason I don't use the repository (unless there is a way to update it easily and automatically).
    I don't understand why the same data can be output when it is in XLS format but not when it is in CSV format. Fortunately I have found alternative ways to deal with this issue properly, but I would have preferred (it's not crucial) to output directly from Read CSV.
    Many thanks for your support.

    Best regards.
  • dragoljub Member Posts: 241  Maven
    Read CSV should pass the data correctly, assuming you have set all the attribute types and special attributes correctly. Most of the time Read CSV just produces the raw data; you still need to set things like labels, special attributes etc. Also, maybe your values are not read in as reals or integers but are imported as some wrong data type like polynominal. This can cause all kinds of problems. It might just be easier to run an import process right before you run your analysis, to make sure your data is perfect.

    -Gagi
  • SKOM Member Posts: 1 Contributor I
    I've just run into a similar problem, with "Read CSV" output number of examples = -1, and one of the subsequent nodes not working. Since apparently it's a feature and not a bug  :P , shouldn't the operator description include something like "recommended use with Store and Retrieve modes"?

    Best,
    PK
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    SKOM wrote:

    I've just run into a similar problem, with "Read CSV" output number of examples = -1, and one of the subsequent nodes not working. Since apparently it's a feature and not a bug  :P , shouldn't the operator description include something like "recommended use with Store and Retrieve modes"?

    Best,
    PK

    Good idea, we should probably promote the complete repository-based approach better to our users and explain why it is often easier to use than file-based approaches.

    Best regards,
    Marius