rloserrloser Member Posts: 1 Contributor I
edited June 2020 in Help
I am new to Dataminer and programming.  We have a MSSQL 2008 database which I need to query.
Is there some way to create an input interface which would allow me change the data-time variable which is the only thing
that changes in the query?  We use a "start-time" and "stop-time" variable.

That is:  WHERE start-time  > '2011-11-01 03:00:00' and start-time < '2011-11-01 04:45:00'

I would like to have two variables StartTime1  and StartTime2 such that I could use that as a variable instead of the having to
input the times.  We have anywhere from two to nine areas where the time is referenced.  Having a variable instead of needing to
change each entry would greatly speed up the process as I need to run queries on over six months worth of data with various start-stop times and durations.

Thanks in advance.


  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, the process below shows an example on how to use so called Macros in RapidMiner in combination with Read Database.

    Set Macro defines a macro, i.e. a variable, called birthdate with a certain value. To use the macro later on, just enter the macro name into any parameter field of any operator like this: %{birthdate}.

    Read Database executes a query:
    SELECT *
    FROM `dimlead` WHERE birthdate < ?
    Please not the questionmark: it indicates, that this statement is a prepared statement. That means, that any questionmarks will be replaced by the operator during execution with certain values. The values can be specified if you enable the "prepare statement" parameter and then add an entry to the "parameters". There you specify VARCHAR as type and as value "%{birthdate}".

    When you execute this operator, two things will happen:
    1. %{birthdate} is replaced by '1954-01-01'
    2. The questionmark in the SQL statement is replaced by '1954-01-01'

    Even though prepared statements may seem a bit complicated, it's quite easy in reality, and they prevent nasty things like SQL injection etc.

    Hope this helps! If you have any question left, please ask.

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <process expanded="true" height="670" width="727">
          <operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="60" name="Set Macro" width="90" x="45" y="30">
            <parameter key="macro" value="birthdate"/>
            <parameter key="value" value="1954-01-01"/>
          <operator activated="true" class="read_database" compatibility="5.2.006" expanded="true" height="60" name="Read Database" width="90" x="179" y="30">
            <parameter key="connection" value="cecile_test"/>
            <parameter key="query" value="SELECT *&#10;FROM `dimlead` WHERE birthdate &lt; ?"/>
            <parameter key="table_name" value="dimlead"/>
            <parameter key="prepare_statement" value="true"/>
            <enumeration key="parameters">
              <parameter key="parameter" value="VARCHAR.%{birthdate}"/>
          <connect from_op="Read Database" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
  • AsokaAsoka Member Posts: 14 Contributor II
    This is almost exactly my question I was hoping to find an answer for.  In my case, I'm looking to provide a list of dates (or dates and times) to a Read Database command, and use that list in the WHERE clause of the Read Database.

    So if I had 5 dates:

    The Read Database would execute once for each date, along with the downstream processing (that includes writing the result back to the database).  Then loop and repeat for the next date.

    The real problem I'm working around is that the data sets I am querying from the database are too large to fit into memory, but if I work through them in smaller chunks, it all runs fine.

    Hints are greatly appreciated - it looks like I'll also be looking for more information about macros.

  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    By chunks you mean the dataset for each date? Then you could use Loop Values to iterate the example set containing the dates, and then use the technique described in my first post (prepare query) to retrieve the data of the current date.

  • AsokaAsoka Member Posts: 14 Contributor II
    I know I'm getting closer Marius, but I think there's basic "use LOOP" syntax that I'm not understanding.

    I've done some hunting for documentation on any flavor of Loop, but I'm not finding anything.  Do you know of a short "here's how to use Loop Value" video or other primer?

    In effect, we're using the LOOP VALUE operator to establish a cursor, and then using the cursor variable on the inner set of queries to do some processing.  The concept is clear, but getting them linked back and forth isn't working at a very basic level (I can get a straight passthrough on the inner loop to show me the loop values, but nothing else will pass the error stage).

    And yes - by 'chunks', I mean those date (or in this case, hour) increments of the data set.

  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    Loop Values iterates all unique values of the specified attribute, and sets a so-called macro to that value. Then it executes its inner process. To use the value of the macro, you write its name into a percentage sign followed by curly braces, like this: %{myMacro}

    The attached process simply outputs the current loop value to the console, but you can use macros in any parameter text field (e.g. in Read Database)
    This requires of course that you already retrieved the list of dates over which you want to iterate.

    Best regards,
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="328" width="803">
          <operator activated="true" class="generate_nominal_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30"/>
          <operator activated="true" class="loop_values" compatibility="5.3.000" expanded="true" height="76" name="Loop Values" width="90" x="246" y="30">
            <parameter key="attribute" value="att1"/>
            <process expanded="true" height="346" width="821">
              <operator activated="true" class="print_to_console" compatibility="5.3.000" expanded="true" height="76" name="Print to Console" width="90" x="246" y="30">
                <parameter key="log_value" value="current loop value: %{loop_value}"/>
              <connect from_port="example set" to_op="Print to Console" to_port="through 1"/>
              <connect from_op="Print to Console" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Loop Values" to_port="example set"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
Sign In or Register to comment.