RapidMiner

Placing files on the cloud server without conversion to data

SOLVED
Contributor II

When doing text analysis, I like to use the Filter Stopwords (Dictionary) operator. The dictionary lets me supplement the words that are normally in the English stopwords file. The stopwords dictionary requires a link to a file. If I store my stopwords list, a text file, on the cloud, it is converted into data, and I then cannot proceed without an error because the process expects a file and sees data. I've tried changing the resource type from file to "repository blob entry", but that does not help. Any suggestions would be greatly appreciated.

2 REPLIES
Moderator

Re: Placing files on the cloud server without conversion to data

Hi Michael,

 

You can store the file object you get from, e.g., read file in the cloud repository. This file object (purple line) is then usable on the cloud.

 

 The process below depicts how it works.

Best,

Martin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="false" class="text:create_document" compatibility="7.2.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="238">
        <parameter key="text" value="a&#10;b&#10;c"/>
      </operator>
      <operator activated="false" class="text:write_document" compatibility="7.2.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="238"/>
      <operator activated="false" class="store" compatibility="7.3.000" expanded="true" height="68" name="Store" width="90" x="380" y="238">
        <parameter key="repository_entry" value="//Cloud Repository/Forum/Trashpit/Michael/data/Stopwordlist"/>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="7.2.001" expanded="true" height="68" name="Create Document (2)" width="90" x="112" y="34">
        <parameter key="text" value="a b c d e f g"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Stopwordlist" width="90" x="380" y="85">
        <parameter key="repository_entry" value="data/Stopwordlist"/>
      </operator>
      <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.2.001" expanded="true" height="82" name="Filter Stopwords (Dictionary)" width="90" x="514" y="34"/>
      <operator activated="true" class="store" compatibility="7.3.000" expanded="true" height="68" name="Store (2)" width="90" x="648" y="34">
        <parameter key="repository_entry" value="../data/result"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="file" to_op="Store" to_port="input"/>
      <connect from_op="Create Document (2)" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
      <connect from_op="Retrieve Stopwordlist" from_port="output" to_op="Filter Stopwords (Dictionary)" to_port="file"/>
      <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_op="Store (2)" to_port="input"/>
      <connect from_op="Store (2)" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="190" resized="true" width="559" x="20" y="185">Use this to store a stopword list in the cloud</description>
    </process>
  </operator>
</process>
--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor II

Re: Placing files on the cloud server without conversion to data - XML code

I had to modify your process so I could read a file from my PC.

 

The default setting did not work, so here is the modified process, should anyone else have this problem.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:read_document" compatibility="7.2.001" expanded="true" height="68" name="Read Document" width="90" x="112" y="238">
        <parameter key="file" value="C:\Users\Michael\Downloads\stop_words_ignore_2.txt"/>
        <parameter key="extract_text_only" value="false"/>
        <parameter key="use_file_extension_as_type" value="false"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="7.2.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="238"/>
      <operator activated="true" class="store" compatibility="7.3.000" expanded="true" height="68" name="Store" width="90" x="380" y="238">
        <parameter key="repository_entry" value="//Cloud Repository/Samples/data/stop_words_ignore_2"/>
      </operator>
      <connect from_op="Read Document" from_port="output" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="file" to_op="Store" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="190" resized="true" width="559" x="43" y="176">Use this to store a stopword list in the cloud</description>
    </process>
  </operator>
</process>
</process>
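
For anyone who wants the matching retrieval side: below is a minimal sketch of how the stored file object might be fed into Filter Stopwords (Dictionary). It reuses only the operators and port names from Martin's process. It assumes the file object was stored at the repository path used in the process above. The sample text and the operator name "Retrieve stop_words_ignore_2" are placeholders, and the XML comments are only annotations. I have not verified this against other RapidMiner versions.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- Placeholder input text; replace with your own documents -->
      <operator activated="true" class="text:create_document" compatibility="7.2.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
        <parameter key="text" value="a b c d e f g"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
      <!-- Retrieve the stored file object (assumed path: the one written by the Store operator above) -->
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve stop_words_ignore_2" width="90" x="246" y="136">
        <parameter key="repository_entry" value="//Cloud Repository/Samples/data/stop_words_ignore_2"/>
      </operator>
      <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.2.001" expanded="true" height="82" name="Filter Stopwords (Dictionary)" width="90" x="380" y="34"/>
      <connect from_op="Create Document" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
      <!-- The retrieved file object goes to the "file" port, so the operator sees a file rather than data -->
      <connect from_op="Retrieve stop_words_ignore_2" from_port="output" to_op="Filter Stopwords (Dictionary)" to_port="file"/>
      <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

The key point is the connection from the Retrieve operator's output to the "file" port of Filter Stopwords (Dictionary): because the repository entry was written from Write Document's file port, it should come back as a file object and not as an example set.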