RapidMiner

Contributor II kludikovsky

Open File - not returning data from url

The "Open File" operator does not return anything.

 

I have tried the following example

http://www.neuralmarkettrends.com/Extracting-OpenStreetMap-Data-In-RapidMiner/ 

which returned an error.

 

Looking into the cause, I found that the Open File operator does not seem to return anything useful, even though it claims to.

I modified the Read CSV operator, and everything works from there on.

But this is not a solution for someone who wants to retrieve information from a URL.

Can someone please verify my experience and/or explain what's wrong?

Community Manager

Re: Open File - not returning data from url

hi @kludikovsky - Open File is probably not what you're looking for here.  That's usually for opening a local file.  I know it has a URL option...I would highly recommend trying the Get Page operator in the Web Mining extension instead.

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Certified Expert

Re: Open File - not returning data from url

That's my old post! I noticed that the USGS website changed how you access the earthquake CSV file. This needs to be updated when I get back. In the meantime, just download the CSV file manually and use a Read CSV operator.
Contributor II kludikovsky

Re: Open File - not returning data from url

hi @sgenzer,

thanks for replying.

I am trying out some things here just to understand it.

So using something different is helpful but does not solve some of the issues with the operators.

If a function is supposed to do something and does not do it, this costs an enormous effort when trying to apply it in real tasks, because you always search for your own bug, unaware that the function itself misbehaves.

So it should be clear whether the function behaves properly and what the settings should be; otherwise either the function or my understanding of it needs correcting.

 

I also can't see how Get Page would provide me with a file that can be used as an example set.

Contributor II kludikovsky

Re: Open File - not returning data from url

hi @Thomas_Ott,

 

I tried the given USGS link and that worked in the browser.

So the problem does not seem to come from there.

If I download the file and read the CSV from the local file, it works.

 

Something seems to go wrong between Open File and Read CSV.

 

BY THE WAY: I am using the latest version of RM (7.6.001)

RM Certified Expert

Re: Open File - not returning data from url

Yeah, like I said they changed something so people can't do what I did. I'll try to fix it when I get back to the States.
Community Manager

Re: Open File - not returning data from url

hi @kludikovsky - your feedback on the operators is welcome and noted.  The "Open File" operator is used to open a data file, not the HTML from a web page, as stated in the Help page: "[Open File] Opens a file for processing by parsing operators. Even if this file points to a data file, like Excel or CSV, this operator returns an uninterpreted blob."  Hence if you enter a URL that points to a data file, it will work fine as long as you realize that it will pull it in as a blob.  As @Thomas_Ott mentioned above, the link you are trying to use is a dead link (it used to point to a CSV) and it sounds like he'll fix it as soon as he can.

 

As for the "Get Page" operator, this is used to retrieve the HTML of a page, not a data file.  You use whichever is more applicable to your use case.

 

Hope that all makes sense.

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
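
To make the "uninterpreted blob" idea concrete outside RapidMiner, here is a minimal Python sketch. It is an analogy only, not how Open File or Read CSV are implemented. The names `parse_csv_blob` and `SAMPLE_BLOB` are made up for illustration; the sample bytes are the header and first row of the USGS feed quoted later in this thread.

```python
import csv
import io
from typing import Dict, List


def parse_csv_blob(blob: bytes, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Interpret raw bytes as CSV with a header row (the 'parsing operator' step)."""
    # Until this point the blob is just bytes; nothing knows it is CSV.
    text = io.StringIO(blob.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Stand-in for what an "open file" step hands over: bytes, not rows.
SAMPLE_BLOB = (
    b"time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,"
    b"place,type,horizontalError,depthError,magError,magNst,status,"
    b"locationSource,magSource\n"
    b"2017-09-06T06:35:18.550Z,38.8358345,-122.7823334,1.58,2.68,md,50,47,"
    b"0.006764,0.05,nc,nc72886105,2017-09-06T06:40:03.593Z,"
    b'"5km WNW of Cobb, California",earthquake,0.15,0.2,0.15,38,automatic,nc,nc\n'
)

rows = parse_csv_blob(SAMPLE_BLOB)
print(rows[0]["place"])  # prints: 5km WNW of Cobb, California
```

The point of the split is that an empty or wrong blob (for example, the body of a redirect response instead of the CSV) would only show up as a problem in the parsing step, which matches the symptom of an empty example set.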
Contributor II kludikovsky

Re: Open File - not returning data from url

 

Hi @sgenzer,

 

thanks for the explanation. 

Actually this is what I expected.

As the operator 

              <operator activated="true" class="open_file" compatibility="6.5.002" expanded="true" height="60" name="Open File" width="90" x="45" y="30">
                <parameter key="resource_type" value="URL"/>
                <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
                <parameter key="url" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
                <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
              </operator>

indicates, there is a file at

http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv

which I can perfectly download, as can be seen here:

time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2017-09-06T06:35:18.550Z,38.8358345,-122.7823334,1.58,2.68,md,50,47,0.006764,0.05,nc,nc72886105,2017-09-06T06:40:03.593Z,"5km WNW of Cobb, California",earthquake,0.15,0.2,0.15,38,automatic,nc,nc
2017-09-06T06:01:32.201Z,60.0517,-148.3932,4.8,2.7,ml,,,,0.8,ak,ak16774505,2017-09-06T06:06:38.834Z,"57km ESE of Bear Creek, Alaska",earthquake,,0.2,,,automatic,ak,ak
2017-09-06T04:37:44.590Z,42.6118,-111.5234,5.77,4.5,mwr,,65,0.347,0.94,us,us2000agiz,2017-09-06T06:42:45.073Z,"8km SE of Soda Springs, Idaho",earthquake,3,5.5,0.032,94,reviewed,us,us
2017-09-06T04:36:29.180Z,42.5637,-111.3478,5,3.8,ml,,39,0.271,0.87,us,us2000agiu,2017-09-06T06:37:45.031Z,"23km ESE of Soda Springs, Idaho",earthquake,3.8,2,0.04,82,reviewed,us,us
2017-09-06T04:27:42.390Z,42.6271,-111.4606,1.89,3,ml,,49,0.299,0.59,us,us2000agis,2017-09-06T07:02:13.232Z,"12km ESE of Soda Springs, Idaho",earthquake,1.8,5.6,0.043,72,reviewed,us,us
2017-09-06T03:20:55.280Z,36.1657,-96.9196,3.79,2.9,mb_lg,,38,0.122,0.21,us,us2000aghy,2017-09-06T05:58:26.438Z,"13km ENE of Stillwater, Oklahoma",earthquake,1.2,0.6,0.053,93,reviewed,us,us
2017-09-06T02:38:44.730Z,42.5643,-111.4644,3.33,2.9,ml,,48,0.335,0.64,us,us2000aghl,2017-09-06T04:57:47.883Z,"15km SE of Soda Springs, Idaho",earthquake,1.9,9.6,0.038,90,reviewed,us,us
2017-09-06T02:36:17.870Z,42.5871,-111.3521,3.79,2.6,ml,,47,0.257,1.27,us,us2000aghi,2017-09-06T04:09:09.040Z,"22km ESE of Soda Springs, Idaho",earthquake,1.9,9.9,0.048,58,reviewed,us,us
2017-09-06T02:26:41.730Z,42.5733,-111.4019,5,3,ml,,47,0.293,1.17,us,us2000aghe,2017-09-06T04:03:47.040Z,"18km ESE of Soda Springs, Idaho",earthquake,2.5,2,0.042,76,reviewed,us,us
2017-09-06T02:22:11.020Z,42.5801,-111.4469,5,3.2,ml,,41,0.476,0.96,us,us2000aghb,2017-09-06T02:46:14.135Z,"15km ESE of Soda Springs, Idaho",earthquake,2.9,0.4,0.037,94,reviewed,us,us
2017-09-06T01:45:14.900Z,41.3737,47.7606,10,4.1,mb,,104,2.599,1.06,us,us2000aggx,2017-09-06T03:43:39.040Z,"9km SSE of Akhty, Russia",earthquake,8.1,1.4,0.114,21,reviewed,us,us
2017-09-06T01:08:42.380Z,42.5479,-111.4465,5,2.9,ml,,48,0.335,0.76,us,us2000aggl,2017-09-06T03:30:36.040Z,"17km SE of Soda Springs, Idaho",earthquake,2.5,2,0.047,60,reviewed,us,us
2017-09-06T01:00:16.390Z,42.5729,-111.4143,5,2.8,ml,,31,0.301,0.71,us,us2000aggh,2017-09-06T03:21:07.040Z,"18km ESE of Soda Springs, Idaho",earthquake,1.9,2,0.038,90,reviewed,us,us
2017-09-06T00:37:10.470Z,42.5584,-111.3873,5,2.7,ml,,40,0.295,0.61,us,us2000aggc,2017-09-06T03:15:37.040Z,"20km ESE of Soda Springs, Idaho",earthquake,4.2,2,0.041,80,reviewed,us,us
2017-09-06T00:10:55.760Z,10.242,93.1278,86.75,5.4,mb,,76,1.454,0.99,us,us2000agg9,2017-09-06T00:29:47.040Z,"162km SSE of Port Blair, India",earthquake,6.5,5.2,0.045,166,reviewed,us,us
2017-09-06T00:01:03.200Z,34.202,-117.007,7.25,2.63,ml,92,34,0.09341,0.15,ci,ci37755871,2017-09-06T06:28:35.601Z,"9km E of Running Springs, CA",earthquake,0.15,0.61,0.135,23,automatic,ci,ci
2017-09-05T23:00:17.250Z,42.5339,-111.4026,5,3.6,ml,,27,0.321,0.83,us,us2000ages,2017-09-06T01:27:54.040Z,"21km SE of Soda Springs, Idaho",earthquake,3.4,2,0.035,106,reviewed,us,us
2017-09-05T22:46:58.860Z,42.5647,-111.4516,5,2.9,ml,,48,0.327,0.44,us,us2000agep,2017-09-05T23:31:01.040Z,"16km SE of Soda Springs, Idaho",earthquake,2.7,2,0.036,102,reviewed,us,us
2017-09-05T22:41:01.180Z,29.7088,70.0371,29.19,4.4,mb,,132,7.917,1.22,us,us2000agek,2017-09-05T23:03:22.040Z,"36km WNW of Dajal, Pakistan",earthquake,13,8,0.12,20,reviewed,us,us
2017-09-05T22:32:47.830Z,42.561,-111.4427,5,3.4,ml,,26,0.324,0.98,us,us2000age2,2017-09-06T01:08:53.040Z,"16km SE of Soda Springs, Idaho",earthquake,4.2,0.7,0.036,100,reviewed,us,us
2017-09-05T22:20:36.170Z,32.8429,59.1485,10,4.7,mb,,97,8.28,0.92,us,us2000agdr,2017-09-06T05:13:52.343Z,"7km WSW of Birjand, Iran",earthquake,9,1.9,0.073,57,reviewed,us,us
2017-09-05T21:43:19.140Z,42.582,-111.3851,5,3,ml,,56,0.278,0.91,us,us2000agcy,2017-09-06T04:18:38.879Z,"19km ESE of Soda Springs, Idaho",earthquake,2.4,2,0.042,76,reviewed,us,us
2017-09-05T21:37:41.300Z,32.8169,-100.918,5,2.5,mb_lg,,42,0.05,0.65,us,us2000agcu,2017-09-05T21:50:53.040Z,"10km N of Snyder, Texas",earthquake,1.9,2,0.081,41,reviewed,us,us
2017-09-05T21:23:19.360Z,42.5959,-111.4299,5,4.3,mwr,,40,0.296,0.84,us,us2000agc9,2017-09-06T04:08:48.499Z,"15km ESE of Soda Springs, Idaho",earthquake,2.4,2,0.033,87,reviewed,us,us
2017-09-05T21:01:11.260Z,38.7900009,-122.7623367,1.15,2.52,md,51,40,0.01944,0.07,nc,nc72885795,2017-09-06T00:44:02.355Z,"1km NNW of The Geysers, California",earthquake,0.15,0.26,0.19,33,automatic,nc,nc
2017-09-05T20:54:15.970Z,42.5648,-111.4192,5,4.3,mwr,,12,0.309,1.07,us,us2000agbd,2017-09-06T05:03:06.124Z,"18km ESE of Soda Springs, Idaho",earthquake,3.3,2,0.032,93,reviewed,us,us
2017-09-05T20:44:52.463Z,58.3754,-137.0915,0,2.5,ml,,,,0.7,ak,ak16774091,2017-09-05T21:01:19.135Z,"79km W of Gustavus, Alaska",earthquake,,0.5,,,automatic,ak,ak
2017-09-05T20:02:09.300Z,37.4211655,-121.8203354,1.89,2.51,md,54,30,0.02991,0.07,nc,nc72885755,2017-09-06T05:42:52.574Z,"4km N of East Foothills, California",earthquake,0.11,0.37,0.12,60,automatic,nc,nc
2017-09-05T19:59:07.614Z,51.7631,-166.3474,32.6,2.9,ml,,,,0.48,ak,ak16774083,2017-09-05T21:55:35.058Z,"215km SE of Nikolski, Alaska",earthquake,,9.1,,,reviewed,ak,ak
2017-09-05T18:54:38.780Z,-6.6875,130.2825,99.72,5.1,mb,,42,1.634,0.72,us,us2000ag8q,2017-09-05T19:07:47.040Z,"180km NW of Saumlaki, Indonesia",earthquake,7.3,8.1,0.114,25,reviewed,us,us
2017-09-05T16:08:37.830Z,49.062,-125.5178333,4.86,3.76,ml,7,109,0.102,0.77,uw,uw61303592,2017-09-06T06:45:42.966Z,"14km N of Ucluelet, Canada",earthquake,3.05,65.08,0.289,8,reviewed,uw,uw
2017-09-05T15:48:10.680Z,44.3213333,-124.3326667,20.35,2.76,ml,12,252,0.1696,0.13,uw,uw61303577,2017-09-05T16:53:10.038Z,"24km WSW of Waldport, Oregon",earthquake,2.39,1.42,0.273,7,reviewed,uw,uw
2017-09-05T15:21:24.750Z,9.9467,126.579,10,4.7,mb,,91,9.148,0.56,us,us2000ag3v,2017-09-05T16:40:04.040Z,"49km ENE of General Luna, Philippines",earthquake,11.9,1.9,0.083,44,reviewed,us,us
2017-09-05T15:20:10.737Z,60.3235,-143.0442,1.3,3.1,ml,,,,0.83,ak,ak16772339,2017-09-05T15:33:25.151Z,"44km NW of Cape Yakataga, Alaska",earthquake,,0.2,,,automatic,ak,ak
2017-09-05T14:46:42.090Z,42.6226,-111.5078,4.33,2.7,ml,,144,0.332,0.59,us,us2000ag3c,2017-09-05T22:49:10.254Z,"8km ESE of Soda Springs, Idaho",earthquake,2.3,9.6,0.047,60,reviewed,us,us
2017-09-05T14:29:17.565Z,60.8854,-147.0403,36.1,2.5,ml,,,,0.79,ak,ak16771766,2017-09-05T14:38:31.237Z,"46km SW of Valdez, Alaska",earthquake,,1.3,,,automatic,ak,ak
2017-09-05T13:35:18.331Z,60.8946,-147.0655,16.3,2.7,ml,,,,0.64,ak,ak16771200,2017-09-05T13:50:13.984Z,"46km SW of Valdez, Alaska",earthquake,,0.3,,,automatic,ak,ak
2017-09-05T12:28:25.440Z,53.61,-162.883,10,4.1,mb,,164,1.789,1.08,us,us2000ag26,2017-09-05T12:56:45.040Z,"142km SSE of False Pass, Alaska",earthquake,9.3,2,0.077,46,reviewed,us,us
2017-09-05T11:28:34.330Z,8.3592,-82.8546,33.42,4.5,mb,,84,0.213,0.98,us,us2000ag1l,2017-09-05T12:13:34.364Z,"1km NNW of Finca Corredor, Panama",earthquake,3.8,8.3,0.124,19,reviewed,us,us
2017-09-05T10:02:02.740Z,42.5665,-111.4148,5,3.1,ml,,24,0.305,1.15,us,us2000ag0q,2017-09-05T21:18:20.947Z,"18km ESE of Soda Springs, Idaho",earthquake,1.5,2,0.037,98,reviewed,us,us
2017-09-05T09:52:00.030Z,43.6422,-127.4131,10,4.3,mb,,190,2.377,1.16,us,us2000ag0i,2017-09-05T19:07:30.636Z,"250km WNW of Bandon, Oregon",earthquake,8.1,2,0.069,59,reviewed,us,us
2017-09-05T09:47:11.190Z,42.5984,-111.4196,5,3.2,ml,,35,0.288,1.15,us,us2000ag0e,2017-09-05T13:50:38.040Z,"16km ESE of Soda Springs, Idaho",earthquake,1.7,2,0.038,92,reviewed,us,us
2017-09-05T08:23:57.811Z,55.7583,-152.9956,33.1,4.2,ml,,,,0.85,ak,ak16768973,2017-09-05T09:28:30.040Z,"164km E of Chirikof Island, Alaska",earthquake,,1.8,,,reviewed,ak,ak
2017-09-05T08:13:16.060Z,42.5924,-111.429,5,4.3,mb,,47,0.297,0.93,us,us2000afyp,2017-09-06T05:02:41.198Z,"15km ESE of Soda Springs, Idaho",earthquake,2.9,2,0.075,50,reviewed,us,us
2017-09-05T08:12:21.010Z,42.5975,-111.433,5,3.6,ml,,55,0.297,0.91,us,us2000afyn,2017-09-05T15:31:15.188Z,"15km ESE of Soda Springs, Idaho",earthquake,1.4,2,0.052,48,reviewed,us,us
2017-09-05T07:40:34.040Z,19.9193,-64.0931,78,3.72,md,14,317,2.3345,0.32,pr,pr2017248004,2017-09-05T11:36:30.994Z,"175km NNE of Road Town, British Virgin Islands",earthquake,5.76,15.87,0.17,9,reviewed,pr,pr

 

Now, according to your statement, that should be passed on to Read CSV. Open File tells me that it has got a file.

The Read CSV fails as it does not recognise any input and returns an empty example set.

 

BUT

if I use Read CSV to read the downloaded file from the local system (removing Open File and specifying the file in Read CSV directly), the remainder of the process works as expected.
This rules out a problem with the format of the CSV.

 

So I don't see the issue at USGS. I suspect it lies in either Open File or Read CSV.

 

Regards,

Kurt

Community Manager
Solution

Re: Open File - not returning data from url

OK, that was a fun puzzle.  The URL http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv redirects to an https link, which is why Open File did not work.  If you change your URL to https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv, it works perfectly.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="7.6.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
        <parameter key="resource_type" value="URL"/>
        <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <parameter key="url" value="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
      </operator>
      <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
        <parameter key="csv_file" value="/Users/genzerconsulting/Desktop/2.5_day.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="time.true.polynominal.attribute"/>
          <parameter key="1" value="latitude.true.real.attribute"/>
          <parameter key="2" value="longitude.true.real.attribute"/>
          <parameter key="3" value="depth.true.real.attribute"/>
          <parameter key="4" value="mag.true.real.attribute"/>
          <parameter key="5" value="magType.true.polynominal.attribute"/>
          <parameter key="6" value="nst.true.integer.attribute"/>
          <parameter key="7" value="gap.true.integer.attribute"/>
          <parameter key="8" value="dmin.true.real.attribute"/>
          <parameter key="9" value="rms.true.real.attribute"/>
          <parameter key="10" value="net.true.polynominal.attribute"/>
          <parameter key="11" value="id.true.polynominal.attribute"/>
          <parameter key="12" value="updated.true.polynominal.attribute"/>
          <parameter key="13" value="place.true.polynominal.attribute"/>
          <parameter key="14" value="type.true.polynominal.attribute"/>
          <parameter key="15" value="horizontalError.true.real.attribute"/>
          <parameter key="16" value="depthError.true.real.attribute"/>
          <parameter key="17" value="magError.true.real.attribute"/>
          <parameter key="18" value="magNst.true.integer.attribute"/>
          <parameter key="19" value="status.true.polynominal.attribute"/>
          <parameter key="20" value="locationSource.true.polynominal.attribute"/>
          <parameter key="21" value="magSource.true.polynominal.attribute"/>
        </list>
      </operator>
      <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
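
For readers who want to check a feed URL for this kind of redirect before wiring it into Open File, the following is a rough Python sketch, run outside RapidMiner. It is an illustration under assumptions: the helper names `first_hop`, `redirect_changes_scheme`, and `force_https` are hypothetical, and this thread does not establish how Open File handles redirects internally, only that the http URL failed and the https one worked.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit, urlunsplit


def redirect_changes_scheme(requested_url: str, location: str) -> bool:
    """True if a redirect to `location` would switch scheme (e.g. http to https)."""
    # Location may be relative, so resolve it against the requested URL first.
    target = urljoin(requested_url, location)
    return urlsplit(target).scheme != urlsplit(requested_url).scheme


def force_https(url: str) -> str:
    """Rewrite an http URL to https; leave any other URL unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def first_hop(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request WITHOUT following redirects.

    Returns the status code and the Location header (None if absent).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Example use (needs network access, so it is left as a comment):
#   url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"
#   status, location = first_hop(url)
#   if location and redirect_changes_scheme(url, location):
#       print("Use this URL instead:", force_https(url))
```

A 301 or 302 status with a Location header on a different scheme is the pattern described above; in that case, entering the https URL directly in the operator's url parameter avoids the redirect altogether.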
RM Certified Expert

Re: Open File - not returning data from url

I can confirm @sgenzer's fix. The original URL was 'http', but they must've changed it to 'https'.

 

Anywho, I updated the process on the tutorial page so you can copy and paste it back in.

 

Thanks guys. 
