RapidMiner

‎08-29-2017 06:42 PM

 

I believe I have finally found a decent alternative to the old Yahoo Finance API, so you can bring financial market data directly into RapidMiner for analysis. A company called Alpha Vantage has developed an API that appears to perform the same functions.

 

You can get a free API key on their website, and you can paste the process below directly into RapidMiner 7.6:

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="set_macros" compatibility="7.6.000" expanded="true" height="68" name="Set Macros" width="90" x="45" y="238">
<list key="macros">
<parameter key="apiKey" value="YROTKAFWV8NITPGL"/>
<parameter key="tickerSymbol" value="AAPL"/>
<parameter key="function" value="TIME_SERIES_DAILY"/>
<parameter key="outputSize" value="compact"/>
</list>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="238">
<list key="attribute_values"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="238">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="url" value="https://www.alphavantage.co/query?function=%{function}&amp;outputsize=%{outputSize}&amp;symbol=%{tickerSymbol}&amp;apikey=%{apiKey}"/>
<list key="request_properties"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.000" expanded="true" height="82" name="Subprocess" width="90" x="447" y="238">
<process expanded="true">
<operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="45" y="34">
<parameter key="select_attributes_and_weights" value="true"/>
<list key="specify_weights">
<parameter key="foo" value="1.0"/>
</list>
</operator>
<operator activated="true" class="text:combine_documents" compatibility="7.5.000" expanded="true" height="82" name="Combine Documents" width="90" x="179" y="34"/>
<operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="313" y="34"/>
<operator activated="true" class="numerical_to_real" compatibility="7.6.000" expanded="true" height="82" name="Numerical to Real" width="90" x="447" y="136">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="Time.*"/>
</operator>
<operator activated="true" class="de_pivot" compatibility="7.6.000" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="34">
<list key="attribute_name">
<parameter key="value" value="Time.*"/>
</list>
<parameter key="index_attribute" value="date"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<operator activated="true" class="split" compatibility="7.6.000" expanded="true" height="82" name="Split" width="90" x="648" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date"/>
<parameter key="split_pattern" value="[.]\s"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.000" expanded="true" height="82" name="Replace" width="90" x="782" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date_1"/>
<parameter key="replace_what" value="Time Series \(Daily\)[.]"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.000" expanded="true" height="82" name="Replace (2)" width="90" x="916" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date_1"/>
<parameter key="replace_what" value="[.][0-9]"/>
</operator>
<operator activated="true" class="pivot" compatibility="7.6.000" expanded="true" height="82" name="Pivot" width="90" x="1050" y="34">
<parameter key="group_attribute" value="date_1"/>
<parameter key="index_attribute" value="date_2"/>
<parameter key="consider_weights" value="false"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.6.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="1184" y="34">
<parameter key="replace_what" value="value[_]"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.6.000" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="1318" y="34">
<parameter key="replace_what" value="Meta Data[.][0-9][.]\s"/>
</operator>
<operator activated="true" class="rename" compatibility="7.6.000" expanded="true" height="82" name="Rename" width="90" x="1452" y="34">
<parameter key="old_name" value="date_1"/>
<parameter key="new_name" value="Date"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.6.000" expanded="true" height="82" name="Nominal to Date" width="90" x="1586" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.000" expanded="true" height="82" name="Set Role" width="90" x="1720" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<connect from_port="in 1" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
<connect from_op="Combine Documents" from_port="document" to_op="JSON To Data" to_port="documents 1"/>
<connect from_op="JSON To Data" from_port="example set" to_op="Numerical to Real" to_port="example set input"/>
<connect from_op="Numerical to Real" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
<connect from_op="De-Pivot" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
<connect from_op="Replace (2)" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
<connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">clean up</description>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_op="Subprocess" to_port="in 1"/>
<connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="201" resized="false" width="640" x="35" y="18">Alpha Vantage API(alphavantage.co)&lt;br&gt;Author: Scott Genzer&lt;br&gt;Published: &amp;#8206;&amp;#8206;8-25-2017&lt;br&gt;Link: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Real-Time-Financial-Data-via-Alpha-Venture-API-alternative-to/ta-p/41119&lt;br&gt;&lt;br&gt;Note: For each process below, enter your Alpha Vantage API key, ticker symbol and other request elements in the Set Macros operator before running (see https://www.alphavantage.co/documentation/)&lt;br&gt;&lt;br&gt;</description>
</process>
</operator>
</process>
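
For readers who want to see what the "clean up" subprocess is doing outside of RapidMiner, here is a minimal Python sketch of the same reshaping: find the time-series block, strip the numeric prefixes from the field names, and emit one row per date. The function name `time_series_to_rows` and the sample payload are illustrative assumptions (the values are made up); only the general response layout ("Meta Data" plus a "Time Series (...)" block keyed by date) is taken from the API.

```python
import json
from typing import Any, Dict, List


def time_series_to_rows(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten an Alpha Vantage style payload into one row per date."""
    # Locate the time-series block, e.g. "Time Series (Daily)".
    series_key = next(k for k in payload if k.startswith("Time Series"))
    rows = []
    for date, fields in sorted(payload[series_key].items()):
        row: Dict[str, Any] = {"Date": date}
        for name, value in fields.items():
            # "1. open" -> "open", similar to the Split/Replace steps above.
            clean_name = name.split(". ", 1)[-1]
            row[clean_name] = float(value)
        rows.append(row)
    return rows


# Hand-written sample in the response layout; the numbers are invented.
SAMPLE = json.loads("""
{
  "Meta Data": {"2. Symbol": "AAPL"},
  "Time Series (Daily)": {
    "2017-08-29": {"1. open": "160.10", "4. close": "162.91", "5. volume": "29516910"},
    "2017-08-28": {"1. open": "160.14", "4. close": "161.47", "5. volume": "25966000"}
  }
}
""")

rows = time_series_to_rows(SAMPLE)
for row in rows:
    print(row)
```

Because the sketch looks for any key starting with "Time Series", it does not depend on the "(Daily)" suffix that the Replace operator in the process hard-codes.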

 Feedback welcome.  Enjoy!

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Comments
Unicorn

Wonderful.  Works like a charm!

Hi Scott,

 

I built your XML into a subprocess and building block, but I don't understand the result sets of this process.

See the following pictures. Each example set from the Alpha Vantage process contains different attributes. Can you explain why?

 

TSLA3.jpeg: For any subsequent operator, the Alpha Vantage process delivers an example set with just 3 attributes.

TSLA2.jpeg: The Alpha Vantage result example set written to CSV delivers a completely different example set.

TSLA1.jpeg: The Alpha Vantage example set connected to a process output port.

I don't understand. What is happening here?

Why does the result example set of the Alpha Vantage process show three different formats depending on the subsequent operator?

 

Best regards,

Luc

Community Manager

Hello Luc,

 

I think what is happening here is that the metadata is not being pushed through. This is typical for Enrich Data by Webservice, as RapidMiner has no idea what attributes are coming its way. It only picks up the ones that I rename manually afterwards: Date, date_3 and value. Don't worry: all your attributes are there, and you can "select" as many or as few as you want. You may just need to add them manually rather than using the nice arrow in Select Attributes.

 


 

Scott

Thanks Scott 🙂

To other users of this building block:

To get TIME_SERIES_INTRADAY example sets, don't forget to adjust the url parameter in the "Enrich Data by Webservice (3)" operator, which compiles the URL string. For TIME_SERIES_INTRADAY example sets, the required "interval" parameter and its value are necessary in the URL. To accomplish this, add an extra macro "requiredInterval" like this:

requiredInterval.jpeg: Add requiredInterval Macro

Then adjust the url parameter in the "Enrich Data by Webservice (3)" operator like this:

https://www.alphavantage.co/query?function=%{function}&outputsize=%{outputSize}&symbol=%{tickerSymbo...}&interval=%{requiredInterval}
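
As a rough cross-check of the URL that the operator should end up requesting, here is a small Python sketch that assembles the same query string. The helper `build_query_url` and the placeholder key are assumptions for illustration; the parameter names (`function`, `outputsize`, `symbol`, `apikey`, `interval`) are the ones used in the process XML and in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(function, symbol, api_key, output_size="compact", interval=None):
    """Assemble the query URL from the same values the Set Macros operator holds."""
    params = {"function": function, "symbol": symbol, "outputsize": output_size}
    if function == "TIME_SERIES_INTRADAY":
        # Intraday requests need the extra interval value (the requiredInterval macro).
        if interval is None:
            raise ValueError("TIME_SERIES_INTRADAY requires an interval")
        params["interval"] = interval
    params["apikey"] = api_key
    return BASE_URL + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_query_url("TIME_SERIES_INTRADAY", "AAPL", "YOUR_API_KEY", interval="5min")
print(url)
```

Printing the URL and opening it in a browser is a quick way to confirm the request works before debugging the rest of the process.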

33312defgj
Contributor II

@luc_bartkowski

Hi Luc, I'm a novice. Could you please tell me, or post a sample showing, how to adjust the date-time elements of this process when using INTRADAY?

I've made the changes as specified, but the process fails. Any help or advice is appreciated. Regards, Lee

fail.png: fail

Hi Lee,
Sorry for my late response.
I'm sorry to say that I didn't continue using @sgenzer's process.

 

I created a database using MySQL for daily and intraday stock prices.

I wrote a Python program that continuously updates this database.

Accessing stock price data from RapidMiner is then no more than accessing tables/views from this database.
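
To give an idea of what such a setup could look like, here is a minimal sketch of the "store and keep updated" part. It is not the actual program described above: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server, and the table name `daily_prices`, the column names, and the helper functions are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) price table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_prices (
               symbol TEXT NOT NULL,
               date   TEXT NOT NULL,
               open REAL, high REAL, low REAL, close REAL, volume REAL,
               PRIMARY KEY (symbol, date)
           )"""
    )
    return conn


def upsert_prices(conn, symbol, rows):
    """Insert new rows and overwrite rows that already exist for (symbol, date)."""
    conn.executemany(
        "INSERT OR REPLACE INTO daily_prices "
        "(symbol, date, open, high, low, close, volume) VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (symbol, r["Date"], r.get("open"), r.get("high"),
             r.get("low"), r.get("close"), r.get("volume"))
            for r in rows
        ],
    )
    conn.commit()


conn = open_store()
# Invented values; a re-run with a corrected close overwrites the first row.
upsert_prices(conn, "AAPL", [{"Date": "2017-08-28", "close": 161.40}])
upsert_prices(conn, "AAPL", [{"Date": "2017-08-28", "close": 161.47}])
print(conn.execute("SELECT symbol, date, close FROM daily_prices").fetchall())
```

With a composite key on symbol and date, repeated updates stay idempotent, and RapidMiner only needs a database read operator pointed at the table or a view on it.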

33312defgj
Contributor II

@luc_bartkowski

 

Hi Luc, sorry for my even later reply 🙂

Do you have a working intraday model I could borrow? :/

I got it working on the daily prices, but I can't seem to get it to run on the intraday prices. Maybe it's the formatting of the date, I don't know.

Regards, Lee

 

 

Hi Lee,

 

The thing with the Alpha Vantage API is that Intraday returns an extra attribute, the interval, e.g. 1min, 5min, 15min etc. @sgenzer's RM process has to be extended to support this extra parameter.

I don't have an RM intraday model. I discovered that AV doesn't always return the correct answer; sometimes I receive an API error instead of a result set. In my Python script I included a retry function for when such a response is received. Secondly, I want to store the data in a database. That requires CRUD functionality, which I also included in my Python script. Sorry that I can't help you with your request.
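
For anyone who wants to try the retry idea, here is a minimal sketch of how it might look. It is not the script mentioned above. It assumes that a failed call comes back as JSON containing an "Error Message" or "Note" key instead of the time series, and the names `fetch_with_retry` and `ApiError` are made up for this example. The `fetch` argument is any function that returns the parsed JSON response.

```python
import time


class ApiError(Exception):
    """Raised when every attempt returned an error payload."""


def fetch_with_retry(fetch, retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it returns a payload without an error marker."""
    last_problem = None
    for attempt in range(1, retries + 1):
        payload = fetch()
        # Assumption: error responses carry one of these keys instead of data.
        problem = payload.get("Error Message") or payload.get("Note")
        if problem is None:
            return payload
        last_problem = problem
        if attempt < retries:
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    raise ApiError(f"giving up after {retries} attempts: {last_problem}")


# Simulated responses: one error, then a (tiny, invented) good payload.
responses = iter([
    {"Error Message": "Invalid API call."},
    {"Time Series (Daily)": {"2017-08-29": {"4. close": "162.91"}}},
])
waits = []
payload = fetch_with_retry(lambda: next(responses), sleep=waits.append)
print(payload)
```

Passing `sleep` in as an argument keeps the example testable without real waiting; in normal use the default `time.sleep` applies.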