Open File - not returning data from url

kludikovsky Member Posts: 30 Maven
edited December 2018 in Help

The "Open File" operator does not return anything.

 

I have tried the following example

http://www.neuralmarkettrends.com/Extracting-OpenStreetMap-Data-In-RapidMiner/ 

which returned an error.

 

Analysing the cause, I found that the Open File operator did not seem to return anything useful (even though it reported that it did).

I changed Read CSV to read a local copy of the file instead, and everything works from there on.

But this is not a solution if someone wants to retrieve information from a URL.

Can someone please verify my experience and/or explain what's wrong?

Best Answer

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted

    OK, that was a fun puzzle. :) The URL http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv redirects to an https link, which is why Open File did not work. If you change your URL to https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv, it works perfectly. :)

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="open_file" compatibility="7.6.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
    <parameter key="resource_type" value="URL"/>
    <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
    <parameter key="url" value="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
    <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
    </operator>
    <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
    <parameter key="csv_file" value="/Users/genzerconsulting/Desktop/2.5_day.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="time.true.polynominal.attribute"/>
    <parameter key="1" value="latitude.true.real.attribute"/>
    <parameter key="2" value="longitude.true.real.attribute"/>
    <parameter key="3" value="depth.true.real.attribute"/>
    <parameter key="4" value="mag.true.real.attribute"/>
    <parameter key="5" value="magType.true.polynominal.attribute"/>
    <parameter key="6" value="nst.true.integer.attribute"/>
    <parameter key="7" value="gap.true.integer.attribute"/>
    <parameter key="8" value="dmin.true.real.attribute"/>
    <parameter key="9" value="rms.true.real.attribute"/>
    <parameter key="10" value="net.true.polynominal.attribute"/>
    <parameter key="11" value="id.true.polynominal.attribute"/>
    <parameter key="12" value="updated.true.polynominal.attribute"/>
    <parameter key="13" value="place.true.polynominal.attribute"/>
    <parameter key="14" value="type.true.polynominal.attribute"/>
    <parameter key="15" value="horizontalError.true.real.attribute"/>
    <parameter key="16" value="depthError.true.real.attribute"/>
    <parameter key="17" value="magError.true.real.attribute"/>
    <parameter key="18" value="magNst.true.integer.attribute"/>
    <parameter key="19" value="status.true.polynominal.attribute"/>
    <parameter key="20" value="locationSource.true.polynominal.attribute"/>
    <parameter key="21" value="magSource.true.polynominal.attribute"/>
    </list>
    </operator>
    <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott
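
To check whether a URL answers with a cross-scheme redirect like the one described in this answer, you can send a single request without following redirects and inspect the `Location` header. The following is a minimal Python sketch that runs outside RapidMiner. The helper names `crosses_scheme` and `first_hop` are made up for illustration, and the sketch makes no claim about how Open File handles redirects internally.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def crosses_scheme(original_url: str, location: str) -> bool:
    """Return True if a redirect target uses a different scheme (e.g. http -> https)."""
    # The Location header may be relative, so resolve it against the original URL.
    target = urljoin(original_url, location)
    return urlsplit(original_url).scheme.lower() != urlsplit(target).scheme.lower()


def first_hop(url: str, timeout: float = 10.0):
    """Send one HEAD request without following redirects.

    Returns a tuple (status_code, Location header or None).
    Requires network access.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

If `first_hop` returns a 3xx status together with a `Location` value for which `crosses_scheme` is true, entering the https URL directly in Open File (as in the process above) avoids the redirect altogether.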

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hi @kludikovsky - Open File is probably not what you're looking for here.  That's usually for opening a local file.  I know it has a URL option...I would highly recommend trying the Get Page operator in the Web Mining extension instead.

     

    Scott

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    That's my old post! I noticed that the USGS website changed how you access the earthquake CSV file. This needs to be updated when I get back. In the meantime, just download the CSV file manually and read it with Read CSV.
  • kludikovsky Member Posts: 30 Maven

    hi @sgenzer,

    thanks for replying.

    I am trying out some things here just to understand it.

    So using something different is helpful but does not solve some of the issues with the operators.

    If a function is supposed to do something and does not do it, this costs an enormous effort when applying it to real tasks, because you keep searching for your own bug when it is actually the function that misbehaves, which you have no way of knowing.

    So it should be clear whether the function behaves properly and what the settings should be; otherwise either the function or my understanding of it needs correcting.

     

    I also can't see how Get Page would provide me with a file that can be used as an example set.

  • kludikovsky Member Posts: 30 Maven

    hi @Thomas_Ott,

     

    I tried the given USGS link and it worked in the browser.

    So the problem does not seem to come from there.

    If I download the file and read the CSV from the local file, it works.

     

    Something seems to go wrong between Open File and Read CSV.

     

    BY THE WAY: I am using the latest version of RM (7.6.001)

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Yeah, like I said they changed something so people can't do what I did. I'll try to fix it when I get back to the States.
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hi @kludikovsky - your feedback on the operators is welcome and noted.  The "Open File" operator is used to open a data file, not the HTML from a web page, as stated in the Help page: "[Open File] Opens a file for processing by parsing operators. Even if this file points to a data file, like Excel or CSV, this operator returns an uninterpreted blob.".  Hence if you enter a URL that points to a datafile, it will work fine as long as you realize that it will pull it in as a blob.  As @Thomas_Ott mentioned above, the link you are trying to use is a dead link (it used to point to a CSV) and it sounds like he'll fix it as soon as he can.

     

    As for the "Get Page" operator, this is used to retrieve the HTML of a page, not a data file.  You use whichever is more applicable to your use case.

     

    Hope that all makes sense.

     

    Scott

  • kludikovsky Member Posts: 30 Maven

     

    Hi @sgenzer,

     

    thanks for the explanation. 

    Actually this is what I expected.

    The operator

    <operator activated="true" class="open_file" compatibility="6.5.002" expanded="true" height="60" name="Open File" width="90" x="45" y="30">
    <parameter key="resource_type" value="URL"/>
    <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
    <parameter key="url" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
    <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
    </operator>

    indicates that there is a file at

    http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv

    which I can download without any problem, as can be seen here:

    time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
    2017-09-06T06:35:18.550Z,38.8358345,-122.7823334,1.58,2.68,md,50,47,0.006764,0.05,nc,nc72886105,2017-09-06T06:40:03.593Z,"5km WNW of Cobb, California",earthquake,0.15,0.2,0.15,38,automatic,nc,nc
    2017-09-06T06:01:32.201Z,60.0517,-148.3932,4.8,2.7,ml,,,,0.8,ak,ak16774505,2017-09-06T06:06:38.834Z,"57km ESE of Bear Creek, Alaska",earthquake,,0.2,,,automatic,ak,ak
    2017-09-06T04:37:44.590Z,42.6118,-111.5234,5.77,4.5,mwr,,65,0.347,0.94,us,us2000agiz,2017-09-06T06:42:45.073Z,"8km SE of Soda Springs, Idaho",earthquake,3,5.5,0.032,94,reviewed,us,us
    2017-09-06T04:36:29.180Z,42.5637,-111.3478,5,3.8,ml,,39,0.271,0.87,us,us2000agiu,2017-09-06T06:37:45.031Z,"23km ESE of Soda Springs, Idaho",earthquake,3.8,2,0.04,82,reviewed,us,us
    2017-09-06T04:27:42.390Z,42.6271,-111.4606,1.89,3,ml,,49,0.299,0.59,us,us2000agis,2017-09-06T07:02:13.232Z,"12km ESE of Soda Springs, Idaho",earthquake,1.8,5.6,0.043,72,reviewed,us,us
    2017-09-06T03:20:55.280Z,36.1657,-96.9196,3.79,2.9,mb_lg,,38,0.122,0.21,us,us2000aghy,2017-09-06T05:58:26.438Z,"13km ENE of Stillwater, Oklahoma",earthquake,1.2,0.6,0.053,93,reviewed,us,us
    2017-09-06T02:38:44.730Z,42.5643,-111.4644,3.33,2.9,ml,,48,0.335,0.64,us,us2000aghl,2017-09-06T04:57:47.883Z,"15km SE of Soda Springs, Idaho",earthquake,1.9,9.6,0.038,90,reviewed,us,us
    2017-09-06T02:36:17.870Z,42.5871,-111.3521,3.79,2.6,ml,,47,0.257,1.27,us,us2000aghi,2017-09-06T04:09:09.040Z,"22km ESE of Soda Springs, Idaho",earthquake,1.9,9.9,0.048,58,reviewed,us,us
    2017-09-06T02:26:41.730Z,42.5733,-111.4019,5,3,ml,,47,0.293,1.17,us,us2000aghe,2017-09-06T04:03:47.040Z,"18km ESE of Soda Springs, Idaho",earthquake,2.5,2,0.042,76,reviewed,us,us
    2017-09-06T02:22:11.020Z,42.5801,-111.4469,5,3.2,ml,,41,0.476,0.96,us,us2000aghb,2017-09-06T02:46:14.135Z,"15km ESE of Soda Springs, Idaho",earthquake,2.9,0.4,0.037,94,reviewed,us,us
    2017-09-06T01:45:14.900Z,41.3737,47.7606,10,4.1,mb,,104,2.599,1.06,us,us2000aggx,2017-09-06T03:43:39.040Z,"9km SSE of Akhty, Russia",earthquake,8.1,1.4,0.114,21,reviewed,us,us
    2017-09-06T01:08:42.380Z,42.5479,-111.4465,5,2.9,ml,,48,0.335,0.76,us,us2000aggl,2017-09-06T03:30:36.040Z,"17km SE of Soda Springs, Idaho",earthquake,2.5,2,0.047,60,reviewed,us,us
    2017-09-06T01:00:16.390Z,42.5729,-111.4143,5,2.8,ml,,31,0.301,0.71,us,us2000aggh,2017-09-06T03:21:07.040Z,"18km ESE of Soda Springs, Idaho",earthquake,1.9,2,0.038,90,reviewed,us,us
    2017-09-06T00:37:10.470Z,42.5584,-111.3873,5,2.7,ml,,40,0.295,0.61,us,us2000aggc,2017-09-06T03:15:37.040Z,"20km ESE of Soda Springs, Idaho",earthquake,4.2,2,0.041,80,reviewed,us,us
    2017-09-06T00:10:55.760Z,10.242,93.1278,86.75,5.4,mb,,76,1.454,0.99,us,us2000agg9,2017-09-06T00:29:47.040Z,"162km SSE of Port Blair, India",earthquake,6.5,5.2,0.045,166,reviewed,us,us
    2017-09-06T00:01:03.200Z,34.202,-117.007,7.25,2.63,ml,92,34,0.09341,0.15,ci,ci37755871,2017-09-06T06:28:35.601Z,"9km E of Running Springs, CA",earthquake,0.15,0.61,0.135,23,automatic,ci,ci
    2017-09-05T23:00:17.250Z,42.5339,-111.4026,5,3.6,ml,,27,0.321,0.83,us,us2000ages,2017-09-06T01:27:54.040Z,"21km SE of Soda Springs, Idaho",earthquake,3.4,2,0.035,106,reviewed,us,us
    2017-09-05T22:46:58.860Z,42.5647,-111.4516,5,2.9,ml,,48,0.327,0.44,us,us2000agep,2017-09-05T23:31:01.040Z,"16km SE of Soda Springs, Idaho",earthquake,2.7,2,0.036,102,reviewed,us,us
    2017-09-05T22:41:01.180Z,29.7088,70.0371,29.19,4.4,mb,,132,7.917,1.22,us,us2000agek,2017-09-05T23:03:22.040Z,"36km WNW of Dajal, Pakistan",earthquake,13,8,0.12,20,reviewed,us,us
    2017-09-05T22:32:47.830Z,42.561,-111.4427,5,3.4,ml,,26,0.324,0.98,us,us2000age2,2017-09-06T01:08:53.040Z,"16km SE of Soda Springs, Idaho",earthquake,4.2,0.7,0.036,100,reviewed,us,us
    2017-09-05T22:20:36.170Z,32.8429,59.1485,10,4.7,mb,,97,8.28,0.92,us,us2000agdr,2017-09-06T05:13:52.343Z,"7km WSW of Birjand, Iran",earthquake,9,1.9,0.073,57,reviewed,us,us
    2017-09-05T21:43:19.140Z,42.582,-111.3851,5,3,ml,,56,0.278,0.91,us,us2000agcy,2017-09-06T04:18:38.879Z,"19km ESE of Soda Springs, Idaho",earthquake,2.4,2,0.042,76,reviewed,us,us
    2017-09-05T21:37:41.300Z,32.8169,-100.918,5,2.5,mb_lg,,42,0.05,0.65,us,us2000agcu,2017-09-05T21:50:53.040Z,"10km N of Snyder, Texas",earthquake,1.9,2,0.081,41,reviewed,us,us
    2017-09-05T21:23:19.360Z,42.5959,-111.4299,5,4.3,mwr,,40,0.296,0.84,us,us2000agc9,2017-09-06T04:08:48.499Z,"15km ESE of Soda Springs, Idaho",earthquake,2.4,2,0.033,87,reviewed,us,us
    2017-09-05T21:01:11.260Z,38.7900009,-122.7623367,1.15,2.52,md,51,40,0.01944,0.07,nc,nc72885795,2017-09-06T00:44:02.355Z,"1km NNW of The Geysers, California",earthquake,0.15,0.26,0.19,33,automatic,nc,nc
    2017-09-05T20:54:15.970Z,42.5648,-111.4192,5,4.3,mwr,,12,0.309,1.07,us,us2000agbd,2017-09-06T05:03:06.124Z,"18km ESE of Soda Springs, Idaho",earthquake,3.3,2,0.032,93,reviewed,us,us
    2017-09-05T20:44:52.463Z,58.3754,-137.0915,0,2.5,ml,,,,0.7,ak,ak16774091,2017-09-05T21:01:19.135Z,"79km W of Gustavus, Alaska",earthquake,,0.5,,,automatic,ak,ak
    2017-09-05T20:02:09.300Z,37.4211655,-121.8203354,1.89,2.51,md,54,30,0.02991,0.07,nc,nc72885755,2017-09-06T05:42:52.574Z,"4km N of East Foothills, California",earthquake,0.11,0.37,0.12,60,automatic,nc,nc
    2017-09-05T19:59:07.614Z,51.7631,-166.3474,32.6,2.9,ml,,,,0.48,ak,ak16774083,2017-09-05T21:55:35.058Z,"215km SE of Nikolski, Alaska",earthquake,,9.1,,,reviewed,ak,ak
    2017-09-05T18:54:38.780Z,-6.6875,130.2825,99.72,5.1,mb,,42,1.634,0.72,us,us2000ag8q,2017-09-05T19:07:47.040Z,"180km NW of Saumlaki, Indonesia",earthquake,7.3,8.1,0.114,25,reviewed,us,us
    2017-09-05T16:08:37.830Z,49.062,-125.5178333,4.86,3.76,ml,7,109,0.102,0.77,uw,uw61303592,2017-09-06T06:45:42.966Z,"14km N of Ucluelet, Canada",earthquake,3.05,65.08,0.289,8,reviewed,uw,uw
    2017-09-05T15:48:10.680Z,44.3213333,-124.3326667,20.35,2.76,ml,12,252,0.1696,0.13,uw,uw61303577,2017-09-05T16:53:10.038Z,"24km WSW of Waldport, Oregon",earthquake,2.39,1.42,0.273,7,reviewed,uw,uw
    2017-09-05T15:21:24.750Z,9.9467,126.579,10,4.7,mb,,91,9.148,0.56,us,us2000ag3v,2017-09-05T16:40:04.040Z,"49km ENE of General Luna, Philippines",earthquake,11.9,1.9,0.083,44,reviewed,us,us
    2017-09-05T15:20:10.737Z,60.3235,-143.0442,1.3,3.1,ml,,,,0.83,ak,ak16772339,2017-09-05T15:33:25.151Z,"44km NW of Cape Yakataga, Alaska",earthquake,,0.2,,,automatic,ak,ak
    2017-09-05T14:46:42.090Z,42.6226,-111.5078,4.33,2.7,ml,,144,0.332,0.59,us,us2000ag3c,2017-09-05T22:49:10.254Z,"8km ESE of Soda Springs, Idaho",earthquake,2.3,9.6,0.047,60,reviewed,us,us
    2017-09-05T14:29:17.565Z,60.8854,-147.0403,36.1,2.5,ml,,,,0.79,ak,ak16771766,2017-09-05T14:38:31.237Z,"46km SW of Valdez, Alaska",earthquake,,1.3,,,automatic,ak,ak
    2017-09-05T13:35:18.331Z,60.8946,-147.0655,16.3,2.7,ml,,,,0.64,ak,ak16771200,2017-09-05T13:50:13.984Z,"46km SW of Valdez, Alaska",earthquake,,0.3,,,automatic,ak,ak
    2017-09-05T12:28:25.440Z,53.61,-162.883,10,4.1,mb,,164,1.789,1.08,us,us2000ag26,2017-09-05T12:56:45.040Z,"142km SSE of False Pass, Alaska",earthquake,9.3,2,0.077,46,reviewed,us,us
    2017-09-05T11:28:34.330Z,8.3592,-82.8546,33.42,4.5,mb,,84,0.213,0.98,us,us2000ag1l,2017-09-05T12:13:34.364Z,"1km NNW of Finca Corredor, Panama",earthquake,3.8,8.3,0.124,19,reviewed,us,us
    2017-09-05T10:02:02.740Z,42.5665,-111.4148,5,3.1,ml,,24,0.305,1.15,us,us2000ag0q,2017-09-05T21:18:20.947Z,"18km ESE of Soda Springs, Idaho",earthquake,1.5,2,0.037,98,reviewed,us,us
    2017-09-05T09:52:00.030Z,43.6422,-127.4131,10,4.3,mb,,190,2.377,1.16,us,us2000ag0i,2017-09-05T19:07:30.636Z,"250km WNW of Bandon, Oregon",earthquake,8.1,2,0.069,59,reviewed,us,us
    2017-09-05T09:47:11.190Z,42.5984,-111.4196,5,3.2,ml,,35,0.288,1.15,us,us2000ag0e,2017-09-05T13:50:38.040Z,"16km ESE of Soda Springs, Idaho",earthquake,1.7,2,0.038,92,reviewed,us,us
    2017-09-05T08:23:57.811Z,55.7583,-152.9956,33.1,4.2,ml,,,,0.85,ak,ak16768973,2017-09-05T09:28:30.040Z,"164km E of Chirikof Island, Alaska",earthquake,,1.8,,,reviewed,ak,ak
    2017-09-05T08:13:16.060Z,42.5924,-111.429,5,4.3,mb,,47,0.297,0.93,us,us2000afyp,2017-09-06T05:02:41.198Z,"15km ESE of Soda Springs, Idaho",earthquake,2.9,2,0.075,50,reviewed,us,us
    2017-09-05T08:12:21.010Z,42.5975,-111.433,5,3.6,ml,,55,0.297,0.91,us,us2000afyn,2017-09-05T15:31:15.188Z,"15km ESE of Soda Springs, Idaho",earthquake,1.4,2,0.052,48,reviewed,us,us
    2017-09-05T07:40:34.040Z,19.9193,-64.0931,78,3.72,md,14,317,2.3345,0.32,pr,pr2017248004,2017-09-05T11:36:30.994Z,"175km NNE of Road Town, British Virgin Islands",earthquake,5.76,15.87,0.17,9,reviewed,pr,pr

     

    Now, according to your statement, that should be passed on to Read CSV. Open File tells me that it has got a file.

    The Read CSV fails as it does not recognise any input and returns an empty example set.

     

    BUT

    if I use Read CSV to read the downloaded file from the local system (removing Open File and specifying the file in Read CSV directly), the remainder of the process works as expected.
    That rules out a problem with the format of the CSV.

     

    So I don't think the issue is at USGS. I suspect it is in either Open File or Read CSV.

     

    Regards,

    Kurt
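
The format check described in this post (confirming that the downloaded text is well-formed CSV) can also be done outside RapidMiner. Below is a minimal Python sketch that uses the header and the first data row exactly as pasted above; the function name `column_counts` is hypothetical.

```python
import csv
import io
from typing import List

# Header and first data row copied from the download shown in this post.
SAMPLE = (
    "time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,"
    "place,type,horizontalError,depthError,magError,magNst,status,"
    "locationSource,magSource\n"
    "2017-09-06T06:35:18.550Z,38.8358345,-122.7823334,1.58,2.68,md,50,47,"
    "0.006764,0.05,nc,nc72886105,2017-09-06T06:40:03.593Z,"
    '"5km WNW of Cobb, California",earthquake,0.15,0.2,0.15,38,automatic,nc,nc\n'
)


def column_counts(text: str) -> List[int]:
    """Return the number of fields in each CSV row of `text`."""
    # csv.reader handles quoted fields, so the comma inside the quoted
    # "place" value is not treated as a separator.
    return [len(row) for row in csv.reader(io.StringIO(text))]


counts = column_counts(SAMPLE)
# A well-formed file has the same number of fields in every row.
consistent = len(set(counts)) == 1
```

Both rows parse to 22 fields, which matches the 22 column entries (0 to 21) listed in the Read CSV meta data of the accepted answer. That is consistent with the conclusion that the CSV itself is not the problem.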

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I can confirm @sgenzer's fix. The original URL was 'http', but they must've changed it to 'https'.

     

    Anywho, I updated the process on the tutorial page so you can copy and paste it back in.

     

    Thanks guys. 

  • kludikovsky Member Posts: 30 Maven

    Hi @sgenzer,

     

    excellent work.

    Thanks.
