RapidMiner

Use Document text for macro value

Hi,

I am quite new to RM, but I have searched the internet with no luck. So here is my problem.

I am reading HTML links via GETPAGES. In the next step I use Data to Documents to retrieve the content of the HTML. This results in a collection of documents (everything is fine at this stage). I then run the Loop Collection operator to loop through all contained documents and save each one as a text file. However, I want to specify the filename depending on an attribute value or some part of the text of the specific document.

So I used the Extract Information operator to get the specific string (with a regex) that should become the filename. How can I store this value in a macro or other variable so that I can use it as the filename?

I can't use the Extract Macro operator because the input is a document, not an example set, so I have no idea how to solve this.

 

I hope my question is not too stupid and that someone can help me.

 

Best regards,

 

Makus


Re: Use Document text for macro value

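It would be nice if there were an operator for documents that worked the same way as Extract Macro from Annotation. Until someone tells me otherwise, my workaround is to convert the document to data with Documents to Data, which carries the metadata extracted by Extract Information along as an attribute, and then read that attribute into a macro with Extract Macro. The example process below extracts the text between "<" and ">" as saveFilename and uses %{saveFilename} as the file parameter of Write Document.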

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="136">
        <parameter key="text" value="Here is a document and the filename should be: &lt;myFile.txt&gt;"/>
      </operator>
      <operator activated="true" class="text:extract_information" compatibility="7.3.000" expanded="true" height="68" name="Extract Information" width="90" x="246" y="136">
        <list key="string_machting_queries">
          <parameter key="saveFilename" value="&lt;.&gt;"/>
        </list>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.3.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="238"/>
      <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data" width="90" x="514" y="136">
        <parameter key="text_attribute" value="textData"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="7.3.000" expanded="true" height="68" name="Extract Macro" width="90" x="648" y="136">
        <parameter key="macro" value="saveFilename"/>
        <parameter key="macro_type" value="data_value"/>
        <parameter key="attribute_name" value="saveFilename"/>
        <parameter key="example_index" value="1"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="7.3.000" expanded="true" height="82" name="Write Document" width="90" x="782" y="238">
        <parameter key="file" value="%{saveFilename}"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
      <connect from_op="Extract Information" from_port="document" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Write Document" to_port="document"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Write Document" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
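
For the original question, the same chain can be placed inside Loop Collection so that every document in the collection gets its own filename. The fragment below is only a sketch under assumptions, not a tested process: the inner operators are copied from the example above, while the Loop Collection class name and its port names ("single", "output 1") are written from memory and should be checked in RapidMiner Studio. It is a single operator meant to be pasted into an existing process, with the document collection connected to the loop's input.

```xml
<!-- Sketch only: Loop Collection class and port names are assumptions to verify in Studio -->
<operator activated="true" class="loop_collection" compatibility="7.3.000" expanded="true" height="82" name="Loop Collection" width="90" x="380" y="136">
  <process expanded="true">
    <!-- Pull the future filename out of the current document (text between "<" and ">") -->
    <operator activated="true" class="text:extract_information" compatibility="7.3.000" expanded="true" height="68" name="Extract Information" width="90" x="112" y="136">
      <list key="string_machting_queries">
        <parameter key="saveFilename" value="&lt;.&gt;"/>
      </list>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.3.000" expanded="true" height="103" name="Multiply" width="90" x="246" y="136"/>
    <!-- One document becomes one example; the extracted metadata becomes an attribute -->
    <operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data" width="90" x="380" y="34">
      <parameter key="text_attribute" value="textData"/>
    </operator>
    <!-- Read the attribute of the single example into the macro saveFilename -->
    <operator activated="true" class="extract_macro" compatibility="7.3.000" expanded="true" height="68" name="Extract Macro" width="90" x="514" y="34">
      <parameter key="macro" value="saveFilename"/>
      <parameter key="macro_type" value="data_value"/>
      <parameter key="attribute_name" value="saveFilename"/>
      <parameter key="example_index" value="1"/>
    </operator>
    <!-- Listed after Extract Macro so the macro is set before the file is written -->
    <operator activated="true" class="text:write_document" compatibility="7.3.000" expanded="true" height="82" name="Write Document" width="90" x="648" y="238">
      <parameter key="file" value="%{saveFilename}"/>
    </operator>
    <connect from_port="single" to_op="Extract Information" to_port="document"/>
    <connect from_op="Extract Information" from_port="document" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Write Document" to_port="document"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Write Document" from_port="document" to_port="output 1"/>
  </process>
</operator>
```

Because the macro is overwritten on every iteration, each document is written under the name extracted from its own text. If the filename comes from a regex instead, the query would go into regular_expression_queries rather than the string matching list.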

 
