"[SOLVED] How to compare text files for equality"

tennenrishin Member Posts: 177 Maven
edited June 2019 in Help
I need to conditionally execute some operators when the contents of two files (s.txt and last_s.txt) differ. Each file contains two CSV fields (a regex and a comment) separated by ';'.

If the files contain multiple lines, the only solution I can think of is to loop over the lines and compare them one by one, maintaining a boolean macro along the way. Does anyone have a better idea for comparing a whole text file? (Perhaps an easy way to obtain or construct some kind of file hash?)


  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    You can use the Text extension for this: read the two files as documents, then use Documents to Data to store the complete documents, i.e. the contents of the files, in a single attribute. Each row then holds one example containing a complete document. For easier comparison, transpose the example set so that the two documents become two attributes of a single example, then use Generate Attributes to compare them.

    Please have a look at the attached process, and replace the Create Document operators with Read Document.

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.007">
      <operator activated="true" class="process" compatibility="5.2.007" expanded="true" name="Process">
        <process expanded="true" height="555" width="746">
          <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document" width="90" x="112" y="30">
            <parameter key="text" value="Hey&#10;this&#10;is an&#10;example"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document (2)" width="90" x="112" y="120">
            <parameter key="text" value="Hey&#10;this&#10;is an&#10;example1"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="5.2.001" expanded="true" height="94" name="Documents to Data (2)" width="90" x="313" y="30">
            <parameter key="text_attribute" value="text"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="5.2.007" expanded="true" height="76" name="Transpose" width="90" x="447" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.2.007" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
            <list key="function_descriptions">
              <parameter key="changed" value="att_1 != att_2"/>
            </list>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data (2)" to_port="documents 2"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
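
For comparison, the file-hash idea raised in the question can be sketched outside RapidMiner. The Python snippet below is only an illustration, assuming the check may run in a separate script; `file_digest` and `files_differ` are hypothetical helper names, not RapidMiner operators, and the comparison is byte-for-byte, so differing line endings or encodings count as a difference.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in binary chunks so that large files do not
    have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(path_a, path_b):
    """Return True if the two files have different byte content."""
    return file_digest(path_a) != file_digest(path_b)
```

With the file names from the question, `files_differ("s.txt", "last_s.txt")` would return `True` when the contents have changed. If only a one-off equality check is needed and no stored hash, the standard library's `filecmp.cmp("s.txt", "last_s.txt", shallow=False)` compares the contents directly.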
  • tennenrishin Member Posts: 177 Maven
    Thank you!