Options

"Send xls file to Rapidanalytics service"

juliojulio Member Posts: 17 Contributor II
edited June 2019 in Help
Dear all,

Is it possible to process a plain .xls file from a Rapidanalytics service (external input and READ Excel operator)? There is a good video explaining how .csv files work, but how about .xls? It is supported? Some trick to take into account?

I am getting a "file cannot be loaded error".
With rapidminer and local .xls files, the service works fine.

Thank you,

Julio
Tagged:

Answers

  • Options
    JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Julio,

    Are you meaning using the Read Excel within a process setup as a service?  Or do you mean loading the file directly within the RA web interface? 
    The Read Excel method works (to the best of my knowledge). 

    J
  • Options
    juliojulio Member Posts: 17 Contributor II
    Hi J,

    Read Excel directly works fine. However, we could not make it work posting a .xls file into a service.
    FYI, as an additional test, I read the posted file and saved it. Excel then warned that the file was not correct. Excel was though able to retrieve most of the info.

    We are on Rapidminer 5.2.8, RapidAnalytics 1.2 and running on Linux. The excel was generated under windows. We tried to work with different encodings, but no difference.

    Any help will be appreciated.

    Thank you,

    Julio
  • Options
    JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Julio,

    You tried several encodings, but did you try all of them?  ;)

    I used the loop parameters operator & handle exception to go through every encoding & try to encode, read an xls file then store it in the repository. 
    The Handle Exception would skip any errors on ones it couldn't open & then (when it found an encoding that it could read) it would store it. 
    Apparently, the winning encoding from the sample xls I used was ISO-8859-1 so try this on your process first & if not I've included my loop in the sample process below, you'd just need to reenable it. 

    Hope it helps,
    J
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.012">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key>mparam</key>
            <value>"1"</value>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="web:get_webpage" compatibility="5.3.001" expanded="true" height="60" name="Get Page" width="90" x="45" y="165">
            <parameter key="url" value="http://www.econ.yale.edu/~shiller/data/ie_data.xls"/>
            <list key="query_parameters"/>
            <list key="request_properties"/>
            <parameter key="encoding" value="windows-1250"/>
          </operator>
          <operator activated="false" class="loop_parameters" compatibility="5.3.012" expanded="true" height="76" name="Loop Parameters" width="90" x="45" y="300">
            <list key="parameters">
              <parameter key="Write Document.encoding" value="ISO-8859-1,ISO-8859-13,ISO-8859-15,ISO-8859-2,ISO-8859-3,ISO-8859-4,ISO-8859-5,ISO-8859-6,ISO-8859-7,ISO-8859-8,ISO-8859-9,JIS_X0201,JIS_X0212-1990,KOI8-R,KOI8-U,Shift_JIS,TIS-620,US-ASCII,UTF-16,UTF-16BE,UTF-16LE,UTF-32,UTF-32BE,UTF-32LE,UTF-8,windows-1250,windows-1251,windows-1252,windows-1253,windows-1254,windows-1255,windows-1256,windows-1257,windows-1258,windows-31j,x-Big5-HKSCS-2001,x-Big5-Solaris,x-euc-jp-linux,x-EUC-TW,x-eucJP-Open,x-IBM1006,x-IBM1025,x-IBM1046,x-IBM1097,x-IBM1098,x-IBM1112,x-IBM1122,x-IBM1123,x-IBM1124,x-IBM1364,x-IBM1381,x-IBM1383,x-IBM33722,x-IBM737,x-IBM833,x-IBM834,x-IBM856,x-IBM874,x-IBM875,x-IBM921,x-IBM922,x-IBM930,x-IBM933,x-IBM935,x-IBM937,x-IBM939,x-IBM942,x-IBM942C,x-IBM943,x-IBM943C,x-IBM948,x-IBM949,x-IBM949C,x-IBM950,x-IBM964,x-IBM970,x-ISCII91,x-ISO-2022-CN-CNS,x-ISO-2022-CN-GB,x-iso-8859-11,x-JIS0208,x-JISAutoDetect,x-Johab,x-MacArabic,x-MacCentralEurope,x-MacCroatian,x-MacCyrillic,x-MacDingbat,x-MacGreek,x-MacHebrew,x-MacIceland,x-MacRoman,x-MacRomania,x-MacSymbol,x-MacThai,x-MacTurkish,x-MacUkraine,x-MS932_0213,x-MS950-HKSCS,x-MS950-HKSCS-XP,x-mswin-936,x-PCK,x-SJIS_0213,x-UTF-16LE-BOM,X-UTF-32BE-BOM,X-UTF-32LE-BOM,x-windows-50220,x-windows-50221,x-windows-874,x-windows-949,x-windows-950,x-windows-iso2022jp"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="text:write_document" compatibility="5.3.001" expanded="true" height="76" name="Write Document" width="90" x="45" y="210">
                <parameter key="file" value="C:\Users\john.heath\Desktop\TestDataRM\ie_data.xls"/>
                <parameter key="encoding" value="x-JISAutoDetect"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.3.012" expanded="true" height="76" name="Log" width="90" x="112" y="120">
                <list key="log">
                  <parameter key="lparameter" value="operator.Write Document.parameter.encoding"/>
                </list>
              </operator>
              <operator activated="true" class="log_to_data" compatibility="5.3.012" expanded="true" height="94" name="Log to Data" width="90" x="179" y="30"/>
              <operator activated="true" class="extract_macro" compatibility="5.3.012" expanded="true" height="60" name="Extract Macro" width="90" x="313" y="30">
                <parameter key="macro" value="mparam"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="lparameter"/>
                <parameter key="example_index" value="1"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="handle_exception" compatibility="5.3.012" expanded="true" height="76" name="Handle Exception" width="90" x="246" y="255">
                <process expanded="true">
                  <operator activated="true" class="read_excel" compatibility="5.3.012" expanded="true" height="60" name="Read Excel (3)" width="90" x="45" y="75">
                    <parameter key="sheet_number" value="3"/>
                    <parameter key="imported_cell_range" value="A1:L2423"/>
                    <parameter key="first_row_as_names" value="false"/>
                    <list key="annotations">
                      <parameter key="0" value="Comment"/>
                      <parameter key="1" value="Comment"/>
                      <parameter key="2" value="Comment"/>
                      <parameter key="3" value="Comment"/>
                      <parameter key="4" value="Comment"/>
                      <parameter key="5" value="Comment"/>
                      <parameter key="6" value="Name"/>
                    </list>
                    <list key="data_set_meta_data_information">
                      <parameter key="0" value="Date.true.real.attribute"/>
                      <parameter key="1" value="P.true.real.attribute"/>
                      <parameter key="2" value="D.true.real.attribute"/>
                      <parameter key="3" value="E.true.real.attribute"/>
                      <parameter key="4" value="CPI.true.real.attribute"/>
                      <parameter key="5" value="Fraction.true.real.attribute"/>
                      <parameter key="6" value="Rate GS10.true.real.attribute"/>
                      <parameter key="7" value="Price.true.real.attribute"/>
                      <parameter key="8" value="Dividend.true.real.attribute"/>
                      <parameter key="9" value="Earnings.true.real.attribute"/>
                      <parameter key="10" value="CAPE.true.polynominal.attribute"/>
                    </list>
                  </operator>
                  <operator activated="true" class="store" compatibility="5.3.012" expanded="true" height="60" name="Store" width="90" x="179" y="120">
                    <parameter key="repository_entry" value="//Local Repository/data/AllExcel/%{mparam}"/>
                  </operator>
                  <connect from_port="in 1" to_op="Read Excel (3)" to_port="file"/>
                  <connect from_op="Read Excel (3)" from_port="output" to_op="Store" to_port="input"/>
                  <connect from_op="Store" from_port="through" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
                <process expanded="true">
                  <connect from_port="in 1" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Write Document" to_port="document"/>
              <connect from_op="Write Document" from_port="file" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
              <connect from_op="Log to Data" from_port="exampleSet" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Log to Data" from_port="through 1" to_op="Handle Exception" to_port="in 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:write_document" compatibility="5.3.001" expanded="true" height="76" name="Write Document (2)" width="90" x="179" y="210">
            <parameter key="file" value="C:\Users\john.heath\Desktop\TestDataRM\ie_data.xls"/>
            <parameter key="encoding" value="ISO-8859-1"/>
          </operator>
          <operator activated="true" class="read_excel" compatibility="5.3.012" expanded="true" height="60" name="Read Excel (2)" width="90" x="313" y="255">
            <parameter key="sheet_number" value="3"/>
            <parameter key="imported_cell_range" value="A1:L2423"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Comment"/>
              <parameter key="1" value="Comment"/>
              <parameter key="2" value="Comment"/>
              <parameter key="3" value="Comment"/>
              <parameter key="4" value="Comment"/>
              <parameter key="5" value="Comment"/>
              <parameter key="6" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.real.attribute"/>
              <parameter key="1" value="P.true.real.attribute"/>
              <parameter key="2" value="D.true.real.attribute"/>
              <parameter key="3" value="E.true.real.attribute"/>
              <parameter key="4" value="CPI.true.real.attribute"/>
              <parameter key="5" value="Fraction.true.real.attribute"/>
              <parameter key="6" value="Rate GS10.true.real.attribute"/>
              <parameter key="7" value="Price.true.real.attribute"/>
              <parameter key="8" value="Dividend.true.real.attribute"/>
              <parameter key="9" value="Earnings.true.real.attribute"/>
              <parameter key="10" value="CAPE.true.polynominal.attribute"/>
            </list>
          </operator>
          <connect from_op="Get Page" from_port="output" to_op="Write Document (2)" to_port="document"/>
          <connect from_op="Write Document (2)" from_port="file" to_op="Read Excel (2)" to_port="file"/>
          <connect from_op="Read Excel (2)" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Options
    juliojulio Member Posts: 17 Contributor II
    Hi J,

    Sorry (again) about the misunderstanding. Apparently I am just not using enough words!

    Getting from Rapidminer an xls from a remote web page works fine (Rapidminer picking it up). No encoding issues. That is the example that you published (actually we explored this as a potential workaround but we need basic authentication/site-login from RapidAnalytics, which we were not able to figure out (but I just saw that someone answered on that other thread).

    What we cannot make work is the following:

    We define a Rapidminer process. We publish the process in RapidAnalytics as a service. We then POST a .xls file to the RapidAnalytics service. That does not work.

    Still, we will play around with the encoding you mentioned from the sending side and see if that helps!!

    Will be back!

    Julio
  • Options
    JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Julio,

    Can you post a sample process as an example? 

    Thanks,
    J.
  • Options
    juliojulio Member Posts: 17 Contributor II
    Hi!

    For the purpose of simplicity:
    We created a one operator process that just reads an excel sheet with format.
    We published this as a service (how can you tell Analytics what file it should connect to what input port if you have multiple files?)


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key>mparam</key>
            <value>"1"</value>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel_format" compatibility="5.3.013" expanded="true" height="60" name="Read Excel with Format" width="90" x="179" y="30"/>
          <connect from_port="input 1" to_op="Read Excel with Format" to_port="file"/>
          <connect from_op="Read Excel with Format" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    We then created a php script to send a test.xls file

    <?php

            $target_url = 'http://192.168.0.30:8090/RA/public_process/PostXLSWS';

            $revisionFile = realpath('test.xls');

            $postData = array(
                    'revision' => '@'.$revisionFile,
            );

            $response = sendPost($target_url, $postData);

            echo "<p><b>Response Code:</b></p>";
            echo "<p>" . $response['code'] . "</p>";
            echo "<p><b>Response Content:</b></p>";
            echo "<div style=\"border: solid 1px red; min-width: 100px; min-height: 100px; padding: 10px;\">";
                    echo "<p>" . $response['content'] . "</p>";
            echo "</div>";
            echo "<p><a href=\"\">REFRESH</a></p>";


            function sendPost($address, $data) {

                    $ch = curl_init();

                    curl_setopt($ch, CURLOPT_URL, $address);
                    curl_setopt($ch, CURLOPT_POST, 1);
                    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
                    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

                    $responseContent = curl_exec($ch);
                    $responseCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

                    $response = array();
                    $response['content'] = $responseContent;
                    $response['code'] = $responseCode;

                    curl_close($ch);

                    return $response;

            }

    ?>
    We are getting the following error:
    de.rapidanalytics.ejb.service.ServiceDataSourceException: Error executing process /home/anonymous/CRMDocuments/PostXLS for service PostXLSWS: Could not read file 'null': Unable to recognize OLE stream.

    I assume that the file we post should be in standard format, and not a blob...?

    Thank you!

    Julio
Sign In or Register to comment.