Real-Time Financial Data via Alpha Vantage API (alternative to Yahoo Finance API)
sgenzer
Community Manager · Posts: 2,959
So I believe I have finally found a decent alternative to the old Yahoo Finance API, so you can bring financial market data directly into RapidMiner for analysis. A company called Alpha Vantage has developed an API that appears to perform the same functions.
You can get a free API key on their website, and you can paste the process below directly into RapidMiner 7.6:
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="set_macros" compatibility="7.6.000" expanded="true" height="68" name="Set Macros" width="90" x="45" y="238">
<list key="macros">
<parameter key="apiKey" value="your-KEY-here"/>
<parameter key="tickerSymbol" value="AAPL"/>
<parameter key="function" value="TIME_SERIES_DAILY"/>
<parameter key="outputSize" value="compact"/>
</list>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="238">
<list key="attribute_values"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="238">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="url" value="https://www.alphavantage.co/query?function=%{function}&amp;outputsize=%{outputSize}&amp;symbol=%{tickerSymbol}&amp;apikey=%{apiKey}"/>
<list key="request_properties"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.000" expanded="true" height="82" name="Subprocess" width="90" x="447" y="238">
<process expanded="true">
<operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="45" y="34">
<parameter key="select_attributes_and_weights" value="true"/>
<list key="specify_weights">
<parameter key="foo" value="1.0"/>
</list>
</operator>
<operator activated="true" class="text:combine_documents" compatibility="7.5.000" expanded="true" height="82" name="Combine Documents" width="90" x="179" y="34"/>
<operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="313" y="34"/>
<operator activated="true" class="numerical_to_real" compatibility="7.6.000" expanded="true" height="82" name="Numerical to Real" width="90" x="447" y="136">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="Time.*"/>
</operator>
<operator activated="true" class="de_pivot" compatibility="7.6.000" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="34">
<list key="attribute_name">
<parameter key="value" value="Time.*"/>
</list>
<parameter key="index_attribute" value="date"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<operator activated="true" class="split" compatibility="7.6.000" expanded="true" height="82" name="Split" width="90" x="648" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date"/>
<parameter key="split_pattern" value="[.]\s"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.000" expanded="true" height="82" name="Replace" width="90" x="782" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date_1"/>
<parameter key="replace_what" value="Time Series \(Daily\)[.]"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.000" expanded="true" height="82" name="Replace (2)" width="90" x="916" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date_1"/>
<parameter key="replace_what" value="[.][0-9]"/>
</operator>
<operator activated="true" class="pivot" compatibility="7.6.000" expanded="true" height="82" name="Pivot" width="90" x="1050" y="34">
<parameter key="group_attribute" value="date_1"/>
<parameter key="index_attribute" value="date_2"/>
<parameter key="consider_weights" value="false"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.6.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="1184" y="34">
<parameter key="replace_what" value="value[_]"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.6.000" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="1318" y="34">
<parameter key="replace_what" value="Meta Data[.][0-9][.]\s"/>
</operator>
<operator activated="true" class="rename" compatibility="7.6.000" expanded="true" height="82" name="Rename" width="90" x="1452" y="34">
<parameter key="old_name" value="date_1"/>
<parameter key="new_name" value="Date"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.6.000" expanded="true" height="82" name="Nominal to Date" width="90" x="1586" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.000" expanded="true" height="82" name="Set Role" width="90" x="1720" y="34">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<connect from_port="in 1" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
<connect from_op="Combine Documents" from_port="document" to_op="JSON To Data" to_port="documents 1"/>
<connect from_op="JSON To Data" from_port="example set" to_op="Numerical to Real" to_port="example set input"/>
<connect from_op="Numerical to Real" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
<connect from_op="De-Pivot" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
<connect from_op="Replace (2)" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
<connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">clean up</description>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_op="Subprocess" to_port="in 1"/>
<connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="201" resized="false" width="640" x="35" y="18">Alpha Vantage API (alphavantage.co)&lt;br/&gt;Author: Scott Genzer&lt;br/&gt;Published: 8-25-2017&lt;br/&gt;Link: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Real-Time-Financial-Data-via-Alpha-Venture-API-alternative-to/ta-p/41119&lt;br/&gt;&lt;br/&gt;Note: For each process below, enter your Alpha Vantage API key, ticker symbol and other request elements in the Set Macros operator before running (see https://www.alphavantage.co/documentation/)&lt;br/&gt;&lt;br/&gt;</description>
</process>
</operator>
</process>
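For readers who prefer to see the request outside RapidMiner: the process above simply assembles a query URL from the four macros and fetches it. Here is a minimal Python sketch of that URL construction (illustration only, not part of the RapidMiner process; the macro names and URL template are taken from the process XML above):

```python
from urllib.parse import urlencode

def build_query_url(api_key, ticker, function="TIME_SERIES_DAILY",
                    output_size="compact"):
    """Mirror the Set Macros + Enrich Data by Webservice operators:
    substitute the four macro values into Alpha Vantage's query endpoint."""
    params = {
        "function": function,
        "outputsize": output_size,
        "symbol": ticker,
        "apikey": api_key,
    }
    return "https://www.alphavantage.co/query?" + urlencode(params)

# Example with the placeholder values from the Set Macros operator above:
url = build_query_url("your-KEY-here", "AAPL")
```

Fetching that URL (e.g. with `urllib.request.urlopen`) returns the same JSON document that the JSON To Data operator parses inside the subprocess.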
Feedback welcome. Enjoy!
Scott
Comments
Wonderful. Works like a charm!
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi Scott,
I put your XML into a subprocess and a building block, but I don't understand the result sets this process produces.
See the following pictures. Each example set from the Alpha Vantage process contains different attributes. Can you explain why?
For any subsequent operator, the Alpha Vantage process delivers an example set with just 3 attributes.
The Alpha Vantage result example set written to CSV is a completely different example set.
Alpha Vantage example set connected to a process output port.
I don't understand what is happening here.
Why does the result example set of the Alpha Vantage process show three different formats depending on the subsequent operator?
Best regards,
Luc
Hello Luc,
I think what is happening here is that the metadata is not being pushed through. This is typical for Enrich Data by Webservice, as RapidMiner has no idea what attributes are coming its way. It is only picking up the ones that I rename manually afterwards: Date, date_3 and value. Don't worry: all your attributes are there, and you can select as many or as few as you want. You may just need to type their names manually rather than picking them from the list in Select Attributes.
Scott
Thanks, Scott!
To other users of this building block:
To get TIME_SERIES_INTRADAY example sets, don't forget to adjust the URL in the "Enrich Data by Webservice (3)" operator that compiles the URL string. For TIME_SERIES_INTRADAY example sets, the required "interval" parameter and its value are necessary in the URL. To accomplish this, add an extra macro "requiredInterval" like this:
Add requiredInterval Macro
And adjust the URL in the "Enrich Data by Webservice (3)" operator like this:
https://www.alphavantage.co/query?function=%{function}&outputsize=%{outputSize}&symbol=%{tickerSymbol}&apikey=%{apiKey}&interval=%{requiredInterval}
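In the same spirit as the daily query, the intraday URL can be sketched in Python like this (a hypothetical helper, not part of the process; it just appends the required "interval" parameter described above):

```python
from urllib.parse import urlencode

def build_intraday_url(api_key, ticker, interval="5min",
                       output_size="compact"):
    """Build a TIME_SERIES_INTRADAY query URL; unlike the daily endpoint,
    intraday requests require the extra "interval" parameter."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "outputsize": output_size,
        "symbol": ticker,
        "apikey": api_key,
        "interval": interval,
    }
    return "https://www.alphavantage.co/query?" + urlencode(params)
```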
@luc_bartkowski
Hi Luc, I'm a novice. Could you please tell me, or post a sample of, how to adjust the date/time elements of this process when applying the INTRADAY function?
I've made the changes as specified, but the process fails. Any help or advice appreciated. Regards, Lee
Hi Lee,
Sorry for my late response.
I'm sorry to say, but I didn't continue to use @sgenzer's process.
I created a database using MySQL for daily and intraday stock prices.
I wrote a python program that continuously updates this database.
Accessing stock price data from RapidMiner is then no more than accessing tables/views from this database.
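Luc didn't post his code, but the setup he describes could be sketched roughly like this (a hypothetical illustration: sqlite3 stands in for MySQL, and the table and column names are made up; the dict keys match the shape of Alpha Vantage's "Time Series (Daily)" JSON object):

```python
import sqlite3

def upsert_daily_prices(conn, symbol, rows):
    """Insert or update one symbol's daily bars.

    `rows` maps 'YYYY-MM-DD' dates to bar dicts keyed like Alpha Vantage's
    daily time-series JSON ("1. open", "2. high", ..., "5. volume").
    """
    conn.execute("""CREATE TABLE IF NOT EXISTS daily_prices (
        symbol TEXT, date TEXT, open REAL, high REAL, low REAL,
        close REAL, volume INTEGER, PRIMARY KEY (symbol, date))""")
    for date, bar in rows.items():
        # INSERT OR REPLACE keeps reruns idempotent, so the updater
        # can be scheduled continuously as Luc describes.
        conn.execute(
            "INSERT OR REPLACE INTO daily_prices VALUES (?,?,?,?,?,?,?)",
            (symbol, date, float(bar["1. open"]), float(bar["2. high"]),
             float(bar["3. low"]), float(bar["4. close"]),
             int(bar["5. volume"])))
    conn.commit()
```

RapidMiner would then read the `daily_prices` table through its database connectors, as Luc notes.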
@luc_bartkowski
Hi Luc, sorry for my even later reply.
Do you have a working intraday model I could borrow?
I got it working on the daily prices, but I can't seem to get it to run on the intraday prices; maybe it's the formatting of the date, I don't know.
Regards, Lee
Hi Lee,
The thing with the Alpha Vantage API is that intraday requests take an extra parameter, the interval, e.g. 5min, 10min, 15min, etc. @sgenzer's RM process has to be extended to support this extra parameter.
I don't have an RM intraday model. I discovered that AV doesn't always return the correct answer; sometimes I receive an API error instead of a result set. In my Python script I included a retry function for when such a response is received. Secondly, I want to store the data in a database, which requires CRUD functionality that I also included in my Python script. Sorry I can't help you with your request.
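The retry idea Luc mentions could look something like this (a hypothetical sketch, not his actual script; Alpha Vantage signals problems with an "Error Message" key for bad requests or a "Note" key when rate-limited, instead of returning a time-series object):

```python
import time

def fetch_with_retry(fetch_json, max_tries=3, delay_s=1.0):
    """Retry when Alpha Vantage answers with an error payload instead of data.

    `fetch_json` is any zero-argument callable returning the parsed JSON
    response (e.g. a wrapper around urllib.request + json.load).
    """
    for attempt in range(max_tries):
        payload = fetch_json()
        # A successful response has neither error key.
        if "Error Message" not in payload and "Note" not in payload:
            return payload
        if attempt < max_tries - 1:
            time.sleep(delay_s)  # back off before retrying
    raise RuntimeError("Alpha Vantage kept returning an error response")
```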
Hi,
Is it possible to get step by step tutorial on this?
Kind regards,
Mareks
hi @10329563 - so it's even easier now than it was when I wrote that article. Just do the following:
1. Go to the Alpha Vantage website and get an API key.
2. Go to the Alpha Vantage Process which is already in the Community Repository by clicking on this link.
3. Follow these steps:
Click "Apply" and run the process. Done.
Happy RapidMining!
Scott
Hi, has anyone here ever accessed Accurint (https://secure.accurint.com/app/bps/main) via RapidMiner? This is the best fee-based site for people searches and many other public records. Any tips at all would be greatly appreciated, as I'm just a beginner at this.
hi @cleisner - nope not to my knowledge. Happy to collaborate on making it connect if you are a subscriber.
Scott
Hi, @sgenzer. It looks like I won't be using Accurint after all. It turns out that this is a highly regulated data source and I can't use it as planned.
Hello Scott,
thanks for uploading the process. I still have the problem of getting the attributes to be usable.
For example, I want to calculate the moving average of the close data, but I'm not able to select the data.
Thanks for your help
Björn
hello @bjoernbalu can you post your XML so I can see what you're doing?
Hello @sgenzer
here is the XML. The API key is removed.
hmm, not sure... can you show what the ExampleSet looks like right at that Multiply operator (put a breakpoint)?
Hi,
there are still 13 regular attributes, but only 3 attributes (value, date_3, Date) are available to use in the list.
The interesting thing is that everything works correctly when I store the data in the repository. After retrieving it, the attributes are available in the moving average operator. I don't understand that.
Thanks for your help and your tutorials.
Björn
yes so it is quite common for a RapidMiner process to lose its metadata along the way for various reasons. You can always type the name of the attribute directly in the parameters box even if it does not show up in the pull-down menu:
Hi, thanks, this seems to be highly useful.
1) The Alpha Vantage process makes use of other extensions, Web Mining and Text Processing, so be prepared to install those. The Alpha Vantage process appeared in red; it was straightforward enough to click the warning message and follow the steps.
2) When I run it now, only the Date column appears. No price or instrument info. I can't see anything obvious; I will try with breakpoints to inspect the data.
Thanks again
Philip
Hi, so it looks like the attributes are not available for selection. Might it be a result of the pivot? I don't know enough about this yet to be able to tell. The data is definitely there, right up until Set Role (connect the Set Role exa port to Out and the result shows exactly what I expect).
I guess I am wondering whether I need the Select Attributes at all.
I will have to see when I try to analyse the data.
Thank you.
Thanks @PhilipD. Keep me posted and let me know if I need to make changes to the template.
Scott
Scott
Hi there, I'm new to this and to machine learning, but not new to algorithmic trading. I'm sorry for the many questions here, but I had no idea how complicated getting financial data ready for use with ML would be.
Using Excel files and RM:
I'm assuming that if I want to update Excel and get more data from Yahoo Finance (via a Run Web Query), I will then have to import a newer version of the Excel file into RM and lose the RM work I have done on, e.g., an older Excel file that only has data up to May 2020, as opposed to having the very latest data up to the present (10/06/2020)?
It would seem that it's easier if real-time data is pulled directly into RM rather than going via Excel? So I read Scott's post above, which contained:
"2. Go to the Alpha Vantage Process which is already in the Community Repository by clicking on this link" (I've had to quote it as there is no URL to reference for individual posts?)
But the link does nothing, and I am not sure how to achieve or get to what the first image shows in RM.
Nor is there an Alpha Vantage process in the Marketplace when I search manually.
Also, how do you bring in/call some of those financial indicators on the Alpha Vantage website so that they sit alongside the Open/High/Low/Close/Volume data? With my Excel solution I will have the O/H/L/C/V data plus the indicator values for each day.
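For what it's worth, the indicator endpoints on the Alpha Vantage site follow the same query-URL pattern as the time-series ones, just with a different "function" and a few extra parameters. A hypothetical sketch for SMA (parameter names as given in the AV documentation; verify against the current docs before relying on them):

```python
from urllib.parse import urlencode

def build_sma_url(api_key, ticker, interval="daily",
                  time_period=20, series_type="close"):
    """Build a simple-moving-average query URL; the indicator values could
    then be joined to the O/H/L/C/V example set by date inside RapidMiner."""
    params = {
        "function": "SMA",
        "symbol": ticker,
        "interval": interval,
        "time_period": time_period,
        "series_type": series_type,
        "apikey": api_key,
    }
    return "https://www.alphavantage.co/query?" + urlencode(params)
```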
Also, do you know why this website freezes/is very slow in Chrome (compared to Firefox, where I had to finish off this post)?
Thanks very much.
I was just wondering how regularly these forums are monitored, or if it would have been better to write a fresh post?
Thanks.
hello @SkyTrader these forums are monitored all the time but it is highly variable whether or not someone has an answer to a specific question.
As for this post, I will be honest that I have not used Alpha Vantage for a long time (pretty much since I wrote that post back in 2017). I know there are a bunch of RM users here in the community who work with financial data... my friend @hughesfleming68 immediately comes to mind. Perhaps he has some thoughts?
Scott
RapidMiner processes are designed to be run and rerun very easily. Once you have created a process that takes data in from an underlying source and does additional ETL, it is very simple to rerun that process when you have more data that needs processing. As long as nothing about the underlying data structure has changed, the simple addition of more rows will not inhibit RapidMiner from doing its work efficiently. So losing work on an older workbook is not typically a concern, since that work can be easily recreated.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts