Problem with Store / Retrieve

hughesfleming Member Posts: 14 Contributor II
edited November 2018 in Help
I am reading two CSV files generated by an application that sets up training data and out-of-sample data; these CSVs are updated daily. When I start the process in RapidMiner, I store the CSVs in the repository under a name and then retrieve them. The CSVs come in properly with the Read CSV operator, but sometimes the Retrieve operator brings in the previous day's stored data instead of the current day's. I don't remember having this problem under OS X, reading the CSVs from a network drive. I now have this problem running my process under Windows 7 64-bit, and I have to run the process a couple of times before it brings in the correctly stored CSV. I am at a loss, as this used to be straightforward. Does anyone have any ideas?

Many thanks,

Alex Fleming

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Alex,

    Can you please attach sample processes in which you store and retrieve the data?
    Which repository type are you using? Is it a local repository, or a remote repository on a RapidAnalytics server?

    Best,
    Marius
  • hughesfleming Member Posts: 14 Contributor II
    Thanks Marius, this is running on a local repository. Today I moved the Read CSV / Store components into a separate process and ran that first, and that seems to help. The basic process starts with the attached XML, and my process continues with the data retrieved from the repository. I did this to get around some metadata issues that I was getting when trying to read the CSV directly into the process.

    regards,

    Alex

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="633" width="1224">
          <operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV" width="90" x="179" y="120">
            <parameter key="csv_file" value="C:\MT4-2\Broco Trader\experts\files\EURUSDTD,D1.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.date_time.attribute"/>
              <parameter key="1" value="Time.false.binominal.attribute"/>
              <parameter key="2" value="Open.true.real.attribute"/>
              <parameter key="3" value="High.true.real.attribute"/>
              <parameter key="4" value="Low.true.real.attribute"/>
              <parameter key="5" value="Close.true.real.attribute"/>
              <parameter key="6" value="ACLV.true.real.attribute"/>
              <parameter key="7" value="Range1.true.real.attribute"/>
              <parameter key="8" value="Range2.true.real.attribute"/>
              <parameter key="9" value="Range3.true.real.attribute"/>
              <parameter key="10" value="Range4.true.real.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store" width="90" x="313" y="120">
            <parameter key="repository_entry" value="Training Data"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV (2)" width="90" x="179" y="210">
            <parameter key="csv_file" value="C:\MT4-2\Broco Trader\experts\files\EURUSDNN,D1.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.date_time.attribute"/>
              <parameter key="1" value="Time.false.binominal.attribute"/>
              <parameter key="2" value="Open.true.real.attribute"/>
              <parameter key="3" value="High.true.real.attribute"/>
              <parameter key="4" value="Low.true.real.attribute"/>
              <parameter key="5" value="Close.true.real.attribute"/>
              <parameter key="6" value="ACLV.true.real.attribute"/>
              <parameter key="7" value="Range1.true.real.attribute"/>
              <parameter key="8" value="Range2.true.real.attribute"/>
              <parameter key="9" value="Range3.true.real.attribute"/>
              <parameter key="10" value="Range4.true.real.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store (2)" width="90" x="313" y="210">
            <parameter key="repository_entry" value="OutofSample Data"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve (2)" width="90" x="514" y="210">
            <parameter key="repository_entry" value="OutofSample Data"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="514" y="120">
            <parameter key="repository_entry" value="Training Data"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
          <connect from_op="Read CSV (2)" from_port="output" to_op="Store (2)" to_port="input"/>
          <connect from_op="Retrieve (2)" from_port="output" to_port="result 2"/>
          <connect from_op="Retrieve" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    So basically you have your processes working now?

    The process you posted above looks fine. However, because the Retrieve operators have no input port, nothing ties them to the Store operators, so the execution order can seem a bit "random" (in fact it is deterministic, but it depends on the order in which you dragged the operators into the process :) ). The Retrieve operators might therefore be executed before the Store operators, in which case they would still load the previous day's data; that might have caused the seemingly strange behaviour of your process. To control the operator execution order, click the blue up-down-arrow icon at the top right of the process pane. Hint: to make an operator the first one to be executed, right-click it and select "bring to front".

    Best,
    Marius
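
    One way to sidestep the ordering question, sketched here as an assumption rather than as the fix used in this thread: take the data from the Store operator's pass-through output instead of using a separate Retrieve operator, so that the connection itself forces Store to run first. The sketch assumes that Store exposes an output port named "through" in this RapidMiner version, and it shows only the training-data branch; the remaining Read CSV parameters are the ones from the process posted above.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="633" width="1224">
          <operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV" width="90" x="179" y="120">
            <parameter key="csv_file" value="C:\MT4-2\Broco Trader\experts\files\EURUSDTD,D1.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <!-- Copy the annotations and meta data lists from the posted process here. -->
          </operator>
          <operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store" width="90" x="313" y="120">
            <parameter key="repository_entry" value="Training Data"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
          <!-- Assumption: "through" is Store's pass-through output port. -->
          <!-- The result is delivered only after Store has written the entry. -->
          <connect from_op="Store" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    The out-of-sample branch would follow the same pattern with "Read CSV (2)" and "Store (2)". If the stored data is needed in a different process, Retrieve is still required there, and that process simply has to run after the storing process.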
  • hughesfleming Member Posts: 14 Contributor II
    Hi Marius, yes, in a way I have things working, but not quite the way I expected. Basically, I am importing and storing 12 CSV files in one process and then executing six more processes, each with two SVMs, so seven processes in total. I am now trying to figure out how to execute all of these in sequence at a specific time. Is there a tutorial somewhere that explains how to do this?

    Many thanks,

    Alex
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, create a new process that uses the Execute Process operator to run all the other processes.
    To schedule the process, you might want to have a look at our RapidAnalytics server. If you think that's overkill, just create a cron job (if you are on Unix) that calls RapidMiner and specifies the process you want to run as a command-line argument.

    Best, Marius
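
    A minimal sketch of such a master process, under stated assumptions: the operator class is taken to be execute_process with a process_location parameter, and the repository entries "Import CSV Files" and "SVM Process 1" are hypothetical names standing in for the seven processes described above. Because the operators are not connected to each other, they should run in the order shown in the execution order view mentioned earlier, so it is worth checking that view after building the process.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="633" width="1224">
          <!-- Hypothetical entry: the process that reads and stores the CSV files. -->
          <operator activated="true" class="execute_process" compatibility="5.1.017" expanded="true" height="60" name="Execute Import" width="90" x="45" y="30">
            <parameter key="process_location" value="Import CSV Files"/>
            <list key="macros"/>
          </operator>
          <!-- Hypothetical entry: the first of the six SVM processes. -->
          <operator activated="true" class="execute_process" compatibility="5.1.017" expanded="true" height="60" name="Execute SVM 1" width="90" x="179" y="30">
            <parameter key="process_location" value="SVM Process 1"/>
            <list key="macros"/>
          </operator>
          <!-- Add one Execute Process operator for each remaining SVM process. -->
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    Running the import first in the same master process keeps the stored entries up to date before any of the SVM processes retrieve them.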