RapidMiner

[SOLVED] Process for concatenating files ?

Regular Contributor

Is there any process for concatenating files?
7 REPLIES
Moderator

Re: Process for concatenating files ?

Hi,

Once you have imported your files via one of the import operators or one of the import wizards (Files -> Import Data), you can concatenate example sets with the "Append" operator.

Regards,
Marco
_________________________________________________________
Team Lead Software Engineering | RapidMiner GmbH
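For readers who do want to concatenate example sets, here is a minimal sketch of the "Append" approach described above. It assumes two CSV files with identical attributes; the file paths are hypothetical placeholders, and the port names are as I remember them from RapidMiner 5.3, so check them in the Process window.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <!-- Hypothetical input files; both must have the same attributes -->
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" name="Read CSV">
        <parameter key="csv_file" value="/path/to/first.csv"/>
      </operator>
      <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" name="Read CSV (2)">
        <parameter key="csv_file" value="/path/to/second.csv"/>
      </operator>
      <!-- Append stacks the rows of the second example set below the first -->
      <operator activated="true" class="append" compatibility="5.3.005" expanded="true" name="Append"/>
      <connect from_op="Read CSV" from_port="output" to_op="Append" to_port="example set 1"/>
      <connect from_op="Read CSV (2)" from_port="output" to_op="Append" to_port="example set 2"/>
      <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    </process>
  </operator>
</process>
```

Note that this merges rows of tabular data; it does not join the raw text of files.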
Regular Contributor

Re: Process for concatenating files ?

Thanks. I have already imported the files, both from a remote database and from my local desktop.

I do not need to concatenate Example Sets; I need to concatenate the actual text files, as in plain file concatenation.

Let me explain why:

1. We want to associate a series of files with each other, e.g. the HR documents for one employee.
2. We want to perform text classification on the entire series of documents, not just one.
3. So I thought a crude setup would be to concatenate all the files for one employee and treat the result as a single file.

D
Moderator

Re: Process for concatenating files ?

Hi,

To read multiple files for text classification, you can use the "Process Documents from Files" operator. There should be plenty of help available in the forums, because text mining questions are pretty common ;)

Regards,
Marco
_________________________________________________________
Team Lead Software Engineering | RapidMiner GmbH
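As a rough illustration of the "Process Documents from Files" suggestion, the sketch below reads every file in a directory and tokenizes it. The class label ("hr_documents") and the directory path are hypothetical, and the operator and port names are as I recall them from the Text Processing extension 5.3, so treat this as a starting point rather than a tested process.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.3.000" expanded="true" name="Process Documents from Files">
        <!-- Each entry maps a class label (key) to a directory (value); both are placeholders -->
        <list key="text_directories">
          <parameter key="hr_documents" value="/path/to/hr_documents"/>
        </list>
        <process expanded="true">
          <!-- Inner process runs once per file; here it only tokenizes -->
          <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" name="Tokenize"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
        </process>
      </operator>
      <connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
    </process>
  </operator>
</process>
```

Each file becomes its own example in the resulting example set, so on its own this does not merge the documents belonging to one employee into a single text.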
mdc
Regular Contributor

Re: Process for concatenating files ?


Hi,

I quickly created a process that reads files, concatenates their contents, and writes the result to another file. See if you can adapt it to your needs.

Enjoy,
Matthew



<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <!-- Seed the "doc" store with an initial document so the first Recall inside the loop has something to return; its text ends up at the start of the output file -->
      <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="75">
        <parameter key="text" value="This is to initialize the content of Remember/Recall operators.&#10;&#10;"/>
      </operator>
      <operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember (2)" width="90" x="246" y="75">
        <parameter key="name" value="doc"/>
        <parameter key="io_object" value="Document"/>
      </operator>
      <operator activated="true" class="loop_files" compatibility="5.3.005" expanded="true" height="60" name="Loop Files" width="90" x="380" y="165">
        <parameter key="directory" value="/Users/mdc/Texts"/>
        <process expanded="true">
          <!-- Per file: read it, recall the text accumulated so far, combine the two, and remember the result under "doc" again -->
          <operator activated="true" class="text:read_document" compatibility="5.3.000" expanded="true" height="60" name="Read Document (2)" width="90" x="112" y="120">
            <parameter key="file" value="%{file_path}"/>
          </operator>
          <operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall" width="90" x="112" y="30">
            <parameter key="name" value="doc"/>
            <parameter key="io_object" value="Document"/>
          </operator>
          <operator activated="true" class="text:combine_documents" compatibility="5.3.000" expanded="true" height="94" name="Combine Documents (2)" width="90" x="313" y="120"/>
          <operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember" width="90" x="447" y="120">
            <parameter key="name" value="doc"/>
            <parameter key="io_object" value="Document"/>
          </operator>
          <connect from_op="Read Document (2)" from_port="output" to_op="Combine Documents (2)" to_port="documents 2"/>
          <connect from_op="Recall" from_port="result" to_op="Combine Documents (2)" to_port="documents 1"/>
          <connect from_op="Combine Documents (2)" from_port="document" to_op="Remember" to_port="store"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
        </process>
      </operator>
      <!-- After the loop: recall the accumulated document and write it to a single file -->
      <operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall (2)" width="90" x="514" y="75">
        <parameter key="name" value="doc"/>
        <parameter key="io_object" value="Document"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="5.3.000" expanded="true" height="76" name="Write Document" width="90" x="648" y="75">
        <parameter key="file" value="/Users/matthewgarong/concatenated_text.txt"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Remember (2)" to_port="store"/>
      <connect from_op="Recall (2)" from_port="result" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Regular Contributor

Re: Process for concatenating files ?

Thanks, Matthew.

It works! Impressed.

Could you kindly tell me how to make this a separate process by itself, like an I/O box to use in other processes? I am not sure how to do this in general, i.e. how to build my own processes out of others.

Dara
mdc
Regular Contributor

Re: Process for concatenating files ?



To make it a separate process, save it and call it from your own process using the 'Execute Process' operator. I have not tried this, though.
You can also add it to your process directly: just copy and paste it into your process (at the top level or inside a 'Subprocess' operator). Do this in the Process window, not in the XML.

Matthew
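A minimal sketch of the 'Execute Process' route, in case it helps. It assumes the concatenation process has been saved in the repository; the location "//Local Repository/processes/concatenate_files" is a made-up example, so point it at wherever you actually saved yours. Like the suggestion above, this is untested.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <!-- Runs the saved process; process_location below is a hypothetical repository path -->
      <operator activated="true" class="execute_process" compatibility="5.3.005" expanded="true" name="Execute Process">
        <parameter key="process_location" value="//Local Repository/processes/concatenate_files"/>
        <list key="macros"/>
      </operator>
      <!-- The saved process delivers its concatenated document on its first result port -->
      <connect from_op="Execute Process" from_port="result 1" to_port="result 1"/>
    </process>
  </operator>
</process>
```

The saved process still has its input directory and output file hard-coded, so those parameters need to be edited there before reusing it for another set of files.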
Regular Contributor

Re: Process for concatenating files ?

Thanx mdc

Got it to work. I really appreciate everyone's help.
D