Outlier detection: How to change an outlier value to the mean value of the attribute
Ecclesiastes
Member Posts: 1 Learner III
Hi,
I'm working on high-dimensional data (>250 attributes) to compare different outlier detection methods.
I have already tested CoF and the distance-based method. They produce totally different results, but that was expected.
However, for further comparison I would like to treat the detected outliers in a simple workflow like this:
1) run outlier detection
2) replace each detected outlier value with the mean value of the attribute
3) run a classifier on the preprocessed data.
Both CoF and density-based outlier detection create a new boolean variable outlier = (true / false),
which means I just need something like a filter that selects the affected value of the attribute and
a simple replacement with the mean value of the attribute.
I have only found a "replace missing value" function, which offers mean replacement,
but not for outliers.
Is there a way to do this sort of value replacement in RapidMiner?
I used RapidMiner today for the first time, so I'm no expert.
Any comments are appreciated.
marvin
Answers
This is possible, but unfortunately a little bit complicated. I have appended a process that first performs an outlier detection on artificial data and then selects only the examples where outlier = true. The process then iterates over each of these examples and sets the value of the attribute att1 to unknown, so that you can use the Replace Missing Values operator to assign a new value.
Another, probably more elegant solution would be as follows:
Sorry, but I hope that one of the processes will suit your needs.
Greetings,
Sebastian
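For readers who want to see the logic outside of RapidMiner, here is a minimal Python sketch of the mask-then-impute idea from the answer above. It is not the appended process. The attribute att1 and the boolean outlier flag come from the thread; the helper names, the second attribute att2 and the sample values are made up for illustration. It also assumes the mean is computed only from the values that remain after the outliers have been set to unknown.

```python
from statistics import mean


def mask_outliers(rows, flags, attribute):
    """Return copies of the rows with `attribute` set to None where the flag is True."""
    return [
        {**row, attribute: None} if is_outlier else dict(row)
        for row, is_outlier in zip(rows, flags)
    ]


def replace_missing_with_mean(rows, attribute):
    """Fill None entries of `attribute` with the mean of the values still present."""
    present = [row[attribute] for row in rows if row[attribute] is not None]
    if not present:
        raise ValueError(f"no known values left for {attribute!r}")
    fill = mean(present)
    return [
        {**row, attribute: fill} if row[attribute] is None else dict(row)
        for row in rows
    ]


# Hypothetical data: the last example is flagged as an outlier.
rows = [
    {"att1": 1.0, "att2": 10.0},
    {"att1": 2.0, "att2": 11.0},
    {"att1": 3.0, "att2": 12.0},
    {"att1": 100.0, "att2": 13.0},
]
flags = [False, False, False, True]  # stands in for the "outlier" attribute

masked = mask_outliers(rows, flags, "att1")          # step 1: set to unknown
cleaned = replace_missing_with_mean(masked, "att1")  # step 2: impute the mean
print(cleaned[-1])
```

Because the outlier is masked before the mean is taken, the extreme value does not pull the replacement value towards itself. If the mean over all examples (outliers included) is wanted instead, it would have to be computed before the masking step.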
For future reference, outlier detection operators based on neighbors should not be run with the parameter number of neighbors = 1, because the nearest neighbor of a given example is then the example itself. This would cause distance-based outlier detection methods to detect outliers improperly.
Please correct me if I am wrong.
Regards,
--Motaz
I think you are right, depending on the definition of neighbor. If it excludes the actual point, 1 is a good value. But one should at least ensure that it is then meant in a reasonable way. I will try to remember to look in the code.
Greetings,
Sebastian
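The point about the definition of "neighbor" can be illustrated with a small Python sketch. This is not RapidMiner's code and says nothing about how its operators are actually implemented; the function name, the include_self switch and the sample points are assumptions made only for this example. It compares the distance to the first nearest neighbor when the example itself is counted and when it is excluded.

```python
import math


def kth_neighbor_distance(points, index, k, include_self):
    """Distance from points[index] to its k-th nearest neighbor.

    If include_self is True, the point itself takes part in the search,
    so its own distance of 0 always ranks first.
    """
    query = points[index]
    distances = sorted(
        math.dist(query, other)
        for j, other in enumerate(points)
        if include_self or j != index
    )
    return distances[k - 1]


# Three points close together and one far away (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]

with_self = [kth_neighbor_distance(points, i, 1, True) for i in range(len(points))]
without_self = [kth_neighbor_distance(points, i, 1, False) for i in range(len(points))]

print("k = 1, self included:", with_self)
print("k = 1, self excluded:", without_self)
```

With the example itself included, every distance is 0 and no point can stand out. With it excluded, the far-away point gets a clearly larger distance than the other three, which is what a distance-based method needs in order to flag it.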
Warm Greetings
--Motaz