RapidMiner

‎08-29-2017 06:42 PM

 

 yahoo.pngSo I believe I have finally found a decent alternative to the old Yahoo Finance API so you can bring financial market data directly into RapidMiner for data analysis.  A company called Alpha Vantage has developed an API that appears to perform the same functions. 

 

You can get a free API key on their website, and you can use the process below directly into RapidMiner 7.6:

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="set_macros" compatibility="7.6.000" expanded="true" height="68" name="Set Macros" width="90" x="45" y="238">
<list key="macros">
<parameter key="apiKey" value="YROTKAFWV8NITPGL"/>
<parameter key="tickerSymbol" value="AAPL"/>
<parameter key="function" value="TIME_SERIES_DAILY"/>
<parameter key="outputSize" value="compact"/>
</list>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="238">
<list key="attribute_values"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="238">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="url" value="https://www.alphavantage.co/query?function=%{function}&amp;outputsize=%{outputSize}&amp;symbol=%{tickerSymbol}&amp;apikey=%{apiKey}"/>
<list key="request_properties"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.000" expanded="true" height="82" name="Subprocess" width="90" x="447" y="238">
<process expanded="true">
<operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="45" y="34">
<parameter key="select_attributes_and_weights" value="true"/>
<list key="specify_weights">
<parameter key="foo" value="1.0"/>
</list>
</operator>
<operator activated="true" class="text:combine_documents" compatibility="7.5.000" expanded="true" height="82" name="Combine Documents" width="90" x="179" y="34"/>
<operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="313" y="34"/>
<operator activated="true" class="numerical_to_real" compatibility="7.6.000" expanded="true" height="82" name="Numerical to Real" width="90" x="447" y="136">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="Time.*"/>
</operator>
<operator activated="true" class="de_pivot" compatibility="7.6.000" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="34">
<list key="attribute_name">
<parameter key="value" value="Time.*"/>
</list>
<parameter key="index_attribute" value="date"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<operator activated="true" class="split" compatibility="7.6.000" expanded="true" height="82" name="Split" width="90" x="648" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date"/>
<parameter key="split_pattern" value="[.]\s"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.000" expanded="true" height="82" name="Replace" width="90" x="782" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date_1"/>
<parameter key="replace_what" value="Time Series \(Daily\)[.]"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.000" expanded="true" height="82" name="Replace (2)" width="90" x="916" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date_1"/>
<parameter key="replace_what" value="[.][0-9]"/>
</operator>
<operator activated="true" class="pivot" compatibility="7.6.000" expanded="true" height="82" name="Pivot" width="90" x="1050" y="34">
<parameter key="group_attribute" value="date_1"/>
<parameter key="index_attribute" value="date_2"/>
<parameter key="consider_weights" value="false"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.6.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="1184" y="34">
<parameter key="replace_what" value="value[_]"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.6.000" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="1318" y="34">
<parameter key="replace_what" value="Meta Data[.][0-9][.]\s"/>
</operator>
<operator activated="true" class="rename" compatibility="7.6.000" expanded="true" height="82" name="Rename" width="90" x="1452" y="34">
<parameter key="old_name" value="date_1"/>
<parameter key="new_name" value="Date"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.6.000" expanded="true" height="82" name="Nominal to Date" width="90" x="1586" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.000" expanded="true" height="82" name="Set Role" width="90" x="1720" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<connect from_port="in 1" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
<connect from_op="Combine Documents" from_port="document" to_op="JSON To Data" to_port="documents 1"/>
<connect from_op="JSON To Data" from_port="example set" to_op="Numerical to Real" to_port="example set input"/>
<connect from_op="Numerical to Real" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
<connect from_op="De-Pivot" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
<connect from_op="Replace (2)" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
<connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">clean up</description>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_op="Subprocess" to_port="in 1"/>
<connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="201" resized="false" width="640" x="35" y="18">Alpha Vantage API(alphavantage.co)&lt;br&gt;Author: Scott Genzer&lt;br&gt;Published: &amp;#8206;&amp;#8206;8-25-2017&lt;br&gt;Link: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Real-Time-Financial-Data-via-Alpha-Venture-API-alternative-to/ta-p/41119&lt;br&gt;&lt;br&gt;Note: For each process below, enter your Alpha Vantage API key, ticker symbol and other request elements in the Set Macros operator before running (see https://www.alphavantage.co/documentation/)&lt;br&gt;&lt;br&gt;</description>
</process>
</operator>
</process>

 Feedback welcome.  Enjoy!

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Comments
RM Certified Expert
RM Certified Expert

Wonderful.  Works like a charm!

Contributor II luc_bartkowski
Contributor II

Hi Scott,

 

I implemented your XML into a subprocess and building block. But I don't understand the result sets of this process.

See the following pictures. Each example set from the Alpha Vantage process contains different attributes. Can you explain why?

 

For any subsequent operator the Alpha Vantage process delivers an example set with just 3 attributesFor any subsequent operator the Alpha Vantage process delivers an example set with just 3 attributes

Alpha Vantage result example set written to CSV delivers a completely other example set.Alpha Vantage result example set written to CSV delivers a completely other example set.

Alpha Vantage example set connected to a process output port.Alpha Vantage example set connected to a process output port.

I don't understand. What is happening here?

Why does the result example set of the Alpha Vantage process shows 3 types of formatting dependant of the subsequent operator?

 

Best regards,

Luc

Community Manager Community Manager
Community Manager

Hello Luc,

 

I think what is happening here is that the metadata is not being pushed through.  This is typical for Enrich Data via Webservice as RapidMiner has no idea what attributes are coming its way.  It's only picking up the ones that I rename manually afterwards: Date, date_3 and value.  Don't worry - all your attributes are there and you can "select" as many/few as you want.  It's just that you may need to add them manually rather than the nice arrow in Select Attributes.

 

Screen Shot 2017-10-08 at 9.41.11 PM.png

 

Scott

Contributor II luc_bartkowski
Contributor II

Thanks Scott 🙂

Contributor II luc_bartkowski
Contributor II

To other users of this building block:

To get TIME_SERIES_INTRADAY example sets, don't forget to adjust the url REGEX in the "Enrich Data by Webservice (3)" operator that compiles the URL string. For TIME_SERIES_INTRADAY examples sets the required "interval" parameter and related value are neccesary in the url. To accomplish this add an extra Macro "requiredInterval" like this:

 

Add requiredInterval MacroAdd requiredInterval Macro

And adjust the url REGEX in the "Enrich Data by Webservice (3)" operator like this:

https://www.alphavantage.co/query?function=%{function}&outputsize=%{outputSize}&symbol=%{tickerSymbo...}&interval=%{requiredInterval}