
Compare 2 pdf texts

c_sabine Member Posts: 8 Contributor I
edited December 2018 in Help


I'm trying to create a process that compares two PDFs that are subtly different.

I process my documents (tokenize, filter stopwords, generate n-grams...) from two different files, merge the results into one common example set with the "Append" operator, and then use the "Remove Duplicates" operator to see the differences between the PDFs. Please find my process attached. I have 2 questions:

1) Is it possible to convert my example set result into a wordlist, so that the table is organized by row rather than by column?

2) Something seems to have gone wrong: words that appear in both files show up in the output, whereas it should show only the words that are present in one document and absent from the other, and vice versa.


Thanks !








  • c_sabine Member Posts: 8 Contributor I

    Please find attached a screenshot of my process. The second picture shows what is inside the two "Process Documents from Files" operators.

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    When you generate the original wordlist from each PDF, you can use the "WordList to Data" operator to create example sets of the words and their counts. You could then add a source field (with Generate Attributes or via a macro) for each PDF, and then merge/join those two datasets. That should make it easy to see which words are common to both files and which are unique to one or the other.
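
For readers who want to see the logic outside RapidMiner, here is a rough Python sketch of the same idea: build a word-count table per document, join the two tables on the word, and label each word by where it occurs. It is only an illustration under stated assumptions, not the RapidMiner process itself. The function name `compare_wordlists`, the labels `pdf_a`/`pdf_b`, and the sample tokens are all hypothetical, and the tokens are assumed to have already been extracted and preprocessed.

```python
from collections import Counter


def compare_wordlists(tokens_a, tokens_b, name_a="pdf_a", name_b="pdf_b"):
    """Outer-join two word-count tables and label where each word occurs.

    Hypothetical helper: the names and labels are illustrative only.
    """
    counts_a = Counter(tokens_a)  # word -> count in the first document
    counts_b = Counter(tokens_b)  # word -> count in the second document

    rows = []
    # The union of both vocabularies plays the role of an outer join on "word".
    for word in sorted(set(counts_a) | set(counts_b)):
        in_a = word in counts_a
        in_b = word in counts_b
        if in_a and in_b:
            source = "both"
        elif in_a:
            source = name_a
        else:
            source = name_b
        rows.append({
            "word": word,
            name_a: counts_a[word],  # a Counter returns 0 for missing words
            name_b: counts_b[word],
            "source": source,
        })
    return rows


# Made-up tokens standing in for the preprocessed text of each PDF.
tokens_a = ["contract", "price", "price", "date"]
tokens_b = ["contract", "price", "deadline"]

rows = compare_wordlists(tokens_a, tokens_b)
unique = [row["word"] for row in rows if row["source"] != "both"]
```

Filtering on the `source` column keeps only the words found in one document. Note that "price" is still recognized as common even though its counts differ (2 vs. 1), because the join is on the word alone rather than on the whole row.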


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts