Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

"Read URL (CSV)

amundamund Member Posts: 22 Maven
edited June 2019 in Help

Hi,

 

I'm trying to use the Read URL-operator to get some financial data. This data is real-time, and I need to re-load it every now and then without having to manually download and edit the data. Example:

 

https://chartapi.finance.yahoo.com/instrument/1.0/AAPL/chartdata;type=quote;range=1d/csv

 

The Read URL fails because there are some lines on the top of the document (17 lines) that are only informational and not useful for the data set. I need to remove these lines.

 

Is there a way to remove or ignore these lines using RapidMiner operators?

 

Tagged:

Best Answer

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    Are these streaming delayed quotes? I usually download the historical quotes with no problem. I wonder if it has to do with the data feed itself. 

     

    So this works but you'd need to rename the columns and verify that's correct.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="open_file" compatibility="7.5.001" expanded="true" height="68" name="Open File" width="90" x="112" y="34">
    <parameter key="resource_type" value="URL"/>
    <parameter key="url" value="https://www.google.com/finance/getprices?q=AAPL&amp;i=300&amp;i=300&amp;p=100d&amp;f=d,c,h,l,o,v"/>
    </operator>
    <operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="246" y="34">
    <parameter key="csv_file" value="C:\Users\THOMAS~1\AppData\Local\Temp\rm_file_4131079488985567346.dump"/>
    <parameter key="column_separators" value="[0-9]+,[0-9]+,[0-9]+,[0-9]+,[0-9]+,[0-9]+"/>
    <parameter key="comment_characters" value="\"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Comment"/>
    <parameter key="1" value="Comment"/>
    <parameter key="2" value="Comment"/>
    <parameter key="3" value="Comment"/>
    <parameter key="4" value="Comment"/>
    <parameter key="5" value="Comment"/>
    <parameter key="6" value="Comment"/>
    <parameter key="7" value="Unit"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="att1.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="split" compatibility="7.5.001" expanded="true" height="82" name="Split" width="90" x="380" y="34"/>
    <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    In this particular case I would use an Open File operator and a Read CSV operator. The Read CSV allows you to comment out the lines you don't want. See below:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="open_file" compatibility="7.5.001" expanded="true" height="68" name="Open File" width="90" x="112" y="34">
    <parameter key="resource_type" value="URL"/>
    <parameter key="url" value="https://chartapi.finance.yahoo.com/instrument/1.0/AAPL/chartdata;type=quote;range=1d/csv"/>
    </operator>
    <operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="246" y="34">
    <parameter key="csv_file" value="C:\Users\THOMAS~1\AppData\Local\Temp\rm_file_737176258094901243.dump"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Comment"/>
    <parameter key="1" value="Comment"/>
    <parameter key="2" value="Comment"/>
    <parameter key="3" value="Comment"/>
    <parameter key="4" value="Comment"/>
    <parameter key="5" value="Comment"/>
    <parameter key="6" value="Comment"/>
    <parameter key="7" value="Comment"/>
    <parameter key="8" value="Comment"/>
    <parameter key="9" value="Comment"/>
    <parameter key="10" value="Comment"/>
    <parameter key="11" value="Comment"/>
    <parameter key="12" value="Comment"/>
    <parameter key="13" value="Comment"/>
    <parameter key="14" value="Comment"/>
    <parameter key="15" value="Comment"/>
    <parameter key="16" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="volume:0.true.integer.attribute"/>
    </list>
    </operator>
    <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • amundamund Member Posts: 22 Maven

     

    Hi Thomas,

    I didn't realise I can read an online CSV directly with this operator - that simplifies the process.

     

    The process works except it only returns the volume column (there are also close, high, low, open and volume).

     

    I tried to add the rest of the columns in the data set meta information, but I get the error "the columns does not exist in input data." It must have something to do with how RapidMiner understands the file. Does RapidMiner require column names separated by commas above the example set? I can't find a way to define the columns.

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    From what I see if I paste the URL into the browser is that it shows the time series data in CSV format but when selected the comma as delimeter it only shows the Volume. Hmm you might have to use the REGEX option. What does the API documentaiton say about importing this data?

     

    Is an XML data export option available? You might want to try that if it's available. If yes, try the Read XML operator then. 

  • amundamund Member Posts: 22 Maven

    Ok Thomas, 

     

    The data is available as xml: https://chartapi.finance.yahoo.com/instrument/1.0/AAPL/chartdata;type=quote;range=1d/xml

     

    I will look into it.

     

     

  • amundamund Member Posts: 22 Maven

    Could there be an issue with how RapidMiner handles data using the Open File (URL)  / Read CSV?

     

    I tried using another data source:

    http://www.google.com/finance/getprices?q=AAPL&i=300&i=300&p=100d&f=d,c,h,l,o,v

     

    But I have the same problem with only the first column being selected while the others are left out and not recognized.

    It seems like even though I rule out the first rows using annotations, RapidMiner still remembers that the first row has only one column, and sticks with that

  • amundamund Member Posts: 22 Maven

    Thanks Thomas, that works great!

     

    According to my sources these are live data. I checked the timestamp and it is todays date. Counting minutes from the timestamp it looks like live data to me.

     

    This is the link for minute data (i=xx is seconds):

    http://www.google.com/finance/getprices?q=AAPL&i=60&p=1d&f=d,c,h,l,o,v

     

    The previous one was 5 minute intervals.

     

    Thanks again!

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yeah Google and Yahoo get cranky when you start extracting live data from them so they make it hard to get it. Good luck!

Sign In or Register to comment.