[SOLVED] Remove Duplicates Weird Problem!

aryan_hosseinza Member Posts: 74 Contributor II
Hi,

I used the Remove Duplicates operator on a dataset with 5 million records, both on my laptop and on a powerful Linux system, and it ran in 7 seconds on both.

Now I have a dataset with 8 million records (same number of attributes). It runs on my laptop in 10 seconds, but on the same remote server it doesn't finish: I let it run for 8 hours and it still hadn't completed.

It's very weird! It's a very simple operation.

How could something like this happen? Is there any way to solve it?



  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Arian,

    Maybe the server ran out of memory? Do you use the same memory settings on both machines?

    Best regards,
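
One way to compare the memory settings mentioned above is to print the maximum heap available to the JVM on each machine. The following is only a minimal sketch: the class name `HeapCheck` and the helper `maxHeapMiB` are hypothetical, and it reports the limit of the JVM that runs the snippet, so it only reflects RapidMiner's setting if it is started with the same JVM options (for example the same `-Xmx` value) that RapidMiner uses on that machine.

```java
public class HeapCheck {
    // Hypothetical helper: maximum heap this JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run on both the laptop and the server and compare the two numbers.
        System.out.println("Max JVM heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the server reports a much smaller value than the laptop, a process that is constantly short of heap can spend most of its time in garbage collection, which would be one possible explanation for a job that never seems to finish.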
  • aryan_hosseinza Member Posts: 74 Contributor II
    Actually, it was very weird. I was replacing missing values before this step, and after changing the way I did that, it now works. (The results of both missing-value replacements seem to be the same, but Remove Duplicates doesn't work on one of them.)

    Thanks, by the way.
