Open File - not returning data from url
The "Open File" operator does not return anything.
I have tried the following example
http://www.neuralmarkettrends.com/Extracting-OpenStreetMap-Data-In-RapidMiner/
which returned an error.
Analysing the cause, I found that the Open File operator does not seem to return anything useful (even though it reports that it does).
After I modified the Read CSV operator, everything works from there on.
But this is not a solution for someone who wants to retrieve information from a URL.
Can someone please verify my experience and/or explain what's wrong.
Best Answer
sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959
ok that was a fun puzzle.
So that URL, http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv, redirects to an https link. This is why Open File did not work. If you change your URL to https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv, it works perfectly.
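For anyone who wants to check a URL for this kind of problem before pasting it into Open File, here is a minimal Python sketch. It rests on an assumption that is not confirmed anywhere in this thread: that Open File fetches URLs with Java's standard `HttpURLConnection`, which does not follow a redirect that switches protocol (http to https). The function names `is_cross_scheme_redirect` and `force_https` are made up for illustration and are not part of RapidMiner.

```python
from urllib.parse import urljoin, urlsplit


def is_cross_scheme_redirect(original_url: str, location: str) -> bool:
    """Return True if a redirect's Location header switches the URL scheme."""
    # A Location header may be relative, so resolve it against the original URL.
    target = urljoin(original_url, location)
    return urlsplit(original_url).scheme.lower() != urlsplit(target).scheme.lower()


def force_https(url: str) -> str:
    """Rewrite an http:// URL to https://; leave any other URL unchanged."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return url
    return parts._replace(scheme="https").geturl()


old_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"
new_url = force_https(old_url)

# The server answers the http request with a redirect to the https address,
# which is the case a client that refuses cross-protocol redirects gives up on.
print(is_cross_scheme_redirect(old_url, new_url))
print(new_url)
```

The process below applies the same fix by hand: the `url` parameter of Open File carries the https address.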
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="open_file" compatibility="7.6.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
<parameter key="resource_type" value="URL"/>
<parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
<parameter key="url" value="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
<description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
</operator>
<operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
<parameter key="csv_file" value="/Users/genzerconsulting/Desktop/2.5_day.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="UTF-8"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="time.true.polynominal.attribute"/>
<parameter key="1" value="latitude.true.real.attribute"/>
<parameter key="2" value="longitude.true.real.attribute"/>
<parameter key="3" value="depth.true.real.attribute"/>
<parameter key="4" value="mag.true.real.attribute"/>
<parameter key="5" value="magType.true.polynominal.attribute"/>
<parameter key="6" value="nst.true.integer.attribute"/>
<parameter key="7" value="gap.true.integer.attribute"/>
<parameter key="8" value="dmin.true.real.attribute"/>
<parameter key="9" value="rms.true.real.attribute"/>
<parameter key="10" value="net.true.polynominal.attribute"/>
<parameter key="11" value="id.true.polynominal.attribute"/>
<parameter key="12" value="updated.true.polynominal.attribute"/>
<parameter key="13" value="place.true.polynominal.attribute"/>
<parameter key="14" value="type.true.polynominal.attribute"/>
<parameter key="15" value="horizontalError.true.real.attribute"/>
<parameter key="16" value="depthError.true.real.attribute"/>
<parameter key="17" value="magError.true.real.attribute"/>
<parameter key="18" value="magNst.true.integer.attribute"/>
<parameter key="19" value="status.true.polynominal.attribute"/>
<parameter key="20" value="locationSource.true.polynominal.attribute"/>
<parameter key="21" value="magSource.true.polynominal.attribute"/>
</list>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Scott
Answers
hi @kludikovsky - Open File is probably not what you're looking for here. That's usually for opening a local file. I know it has a URL option...I would highly recommend trying the Get Page operator in the Web Mining extension instead.
Scott
hi @sgenzer,
thanks for replying.
I am trying out some things here just to understand it.
So using something different is helpful, but it does not solve the issues with the operators themselves.
If an operator that is supposed to do something does not do it, that costs enormous effort when applying it in real tasks, because you keep searching for your own bug while the operator itself is misbehaving, which you have no way of knowing.
So it should be clear whether the operator behaves properly and what the settings should be; otherwise either the operator or my understanding of it needs correcting.
I also can't see how Get Page would provide me with a file that can be used as an example set.
hi @Thomas_Ott,
I tried the given USGS link and that worked in the browser.
So the problem does not seem to come from there.
If I download the file and read the CSV from the local file, it works.
Something seems to go wrong between the open and the read.
By the way: I am using the latest version of RM (7.6.001).
hi @kludikovsky - your feedback on the operators is welcome and noted. The "Open File" operator is used to open a data file, not the HTML from a web page, as stated in the Help page: "[Open File] Opens a file for processing by parsing operators. Even if this file points to a data file, like Excel or CSV, this operator returns an uninterpreted blob.". Hence if you enter a URL that points to a datafile, it will work fine as long as you realize that it will pull it in as a blob. As @Thomas_Ott mentioned above, the link you are trying to use is a dead link (it used to point to a CSV) and it sounds like he'll fix it as soon as he can.
As for the "Get Page" operator, this is used to retrieve the HTML of a page, not a data file. You use whichever is more applicable to your use case.
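To illustrate the difference between an "uninterpreted blob" and parsed data, here is a small Python sketch. It is only an analogy, not how RapidMiner implements either operator: the raw bytes play the role of what Open File hands over, and the hypothetical helper `parse_csv_blob` plays the role of a parsing operator such as Read CSV. The sample row is invented for the example and is not real USGS data.

```python
import csv
import io


def parse_csv_blob(blob: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Interpret raw bytes as CSV, using the first row as column names."""
    text = io.StringIO(blob.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Invented sample bytes standing in for what a download would return.
sample_blob = b"time,latitude,longitude,mag\n2017-09-01T00:00:00Z,10.5,-20.25,4.7\n"

# Until it is parsed, the blob is just bytes with no columns or rows.
print(type(sample_blob).__name__, len(sample_blob))

# After parsing, each row is addressable by column name.
rows = parse_csv_blob(sample_blob)
print(rows[0]["mag"])
```

An empty blob parses to an empty list of rows, which is comparable to a parsing step that receives nothing and returns an empty result.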
Hope that all makes sense.
Scott
Hi @sgenzer,
thanks for the explanation.
Actually this is what I expected.
As the operator [screenshot] indicated, there is a file at [screenshot], which I can download without any problem, as can be seen here: [screenshot]
Now, according to your statement, that file should be passed on to Read CSV. Open File tells me that it has got a file.
The Read CSV fails as it does not recognise any input and returns an empty example set.
BUT
if I use the Read CSV to read the downloaded file from the local system (remove the Open File and specify the file in the Read CSV directly), the remainder of the process works as expected.
This rules out a problem with the format of the CSV.
So I don't see the issue at USGS. I suspect it lies in either Open File or Read CSV.
Regards,
Kurt
I can confirm @sgenzer's fix. The original URL was 'http', but they must've changed it to 'https'.
Anywho, I updated the process on the tutorial page so you can copy and paste it back in.
Thanks guys.
Hi @sgenzer,
excellent work.
Thanks.