
"[SOLVED] How to compare text files for equality"

tennenrishin Member Posts: 177 Contributor II
edited June 2019 in Help
I need to conditionally execute some operators when the contents of two files (s.txt and last_s.txt) differ. The files contain two CSV fields (a regex and a comment) separated by ';'.

If the files contain multiple lines, the only solution I can think of involves looping over the lines and comparing them one by one, maintaining a boolean macro along the way. Does anyone have a better idea for comparing whole text files? (Perhaps an easy way to obtain or construct some kind of file hash?)

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey,

    you can use the text extension for this: read the two files as documents, then use Documents to Data to store the complete documents, i.e. the contents of the files, in a single attribute, so that each row is an example holding one complete document. For easier comparison, transpose the example set so that the two documents end up side by side as attributes, then use Generate Attributes to compare them (att_1 != att_2 in the attached process).

    Please have a look at the attached process, and replace the Create Document operators with Read Document.

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
        <process expanded="true" height="555" width="746">
          <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document" width="90" x="112" y="30">
            <parameter key="text" value="Hey&#10;this&#10;is an&#10;example"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document (2)" width="90" x="112" y="120">
            <parameter key="text" value="Hey&#10;this&#10;is an&#10;example1"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="5.2.001" expanded="true" height="94" name="Documents to Data (2)" width="90" x="313" y="30">
            <parameter key="text_attribute" value="text"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="5.2.007" expanded="true" height="76" name="Transpose" width="90" x="447" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.2.007" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
            <list key="function_descriptions">
              <parameter key="changed" value="att_1 != att_2"/>
            </list>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data (2)" to_port="documents 2"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • tennenrishin Member Posts: 177 Contributor II
    Thank you!
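
For readers who want to try the file-hash idea from the question outside RapidMiner, here is a minimal Python sketch. It is not part of the process posted above and does not use any RapidMiner API; the function names `file_digest` and `files_differ` are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(path_a, path_b):
    """Return True if the two files have different SHA-256 digests."""
    return file_digest(path_a) != file_digest(path_b)
```

Calling `files_differ("s.txt", "last_s.txt")` would then give the boolean needed to decide whether to run the conditional branch. The comparison is byte for byte, so a change in line endings or a trailing newline counts as a difference.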