[SOLVED] Remove duplicates selecting which examples must remain
I have a large customer dataset that contains duplicate records. Let me try to make myself clear: the dataset has over 200k contracts with over 50 attributes each. One of these attributes is contract_status. Some of these statuses are valid and some are invalid, so I've created a boolean attribute named is_valid_status.
I'd like to remove duplicates based on a subset of attributes and keep only the examples where is_valid_status is true.
How can I do it?
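To make the goal concrete, here is a minimal sketch in plain Python (not a RapidMiner process) of one possible reading of the question: for each group of examples that share the same values in the chosen subset of attributes, keep a single example, preferring one where is_valid_status is true. The attribute names customer_id and product and the sample rows are hypothetical, used only for illustration.

```python
def dedupe_prefer_valid(rows, key_attrs, flag="is_valid_status"):
    """Keep one row per key, preferring rows whose flag is true."""
    best = {}
    for row in rows:
        # The key is the tuple of values in the chosen subset of attributes.
        key = tuple(row[a] for a in key_attrs)
        kept = best.get(key)
        # Keep the first row seen for a key; replace it only when the
        # kept row is invalid and the new row is valid.
        if kept is None or (row[flag] and not kept[flag]):
            best[key] = row
    return list(best.values())


# Hypothetical sample data: customer_id and product are assumed key attributes.
contracts = [
    {"customer_id": 1, "product": "A", "contract_status": "cancelled", "is_valid_status": False},
    {"customer_id": 1, "product": "A", "contract_status": "active", "is_valid_status": True},
    {"customer_id": 2, "product": "B", "contract_status": "expired", "is_valid_status": False},
]

result = dedupe_prefer_valid(contracts, ["customer_id", "product"])
for row in result:
    print(row)
```

Under this reading, a group that contains only invalid examples still keeps one of them. If only valid examples should remain at all, the result could additionally be filtered on is_valid_status.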